Enhancing Molecules using OpenGL ES 2.0
The 2.0 version of Molecules brings with it a brand new rendering engine that utilizes OpenGL ES 2.0 to deliver realistic 3-D representations of molecular structures. This is a long way from the original OpenGL ES 1.1 renderer that I first wrote about here, so I want to describe in detail how this new version works. The source code for Molecules is available under the BSD license, so you are free to download the project from the main application page and follow along as I walk through the process.
Up until now, Molecules had used OpenGL ES 1.1 to display 3-D representations of molecules, which worked, but I was never happy with the results. For the last two and a half years I've had a research paper sitting on my desk that describes a technique for rendering molecules with stunning results, but I had no idea how to implement this on iOS, or even if it was possible on a mobile device. It turns out that not only is it possible, but with proper tuning a newer iOS device can deliver nearly the same rendering quality as a desktop, and do so at a pretty good framerate.
To do this, I used the newer OpenGL ES 2.0 API and its programmable shaders, instead of OpenGL ES 1.1 and its fixed function pipeline. OpenGL ES 1.1 has been supported on all iOS devices since the first iPhone, but OpenGL ES 2.0 capabilities were introduced with the iPhone 3GS and have been present in every iOS device since then (iPhone 4, third- and fourth-generation iPod touches, and both iPads).
What do I mean when I talk about improved image quality using OpenGL ES 2.0? I used these comparison images in my post announcing the new version, but I'll repeat them because I believe they clearly illustrate the huge difference this new approach makes. The old OpenGL ES 1.1 implementation is on the left, and the new OpenGL ES 2.0 one on the right:
OpenGL ES 2.0 was the key to accomplishing this style of graphics, but it took me a long while to understand the new API. It allows you to write little programs, called shaders, that run on the GPU to perform custom effects. If you want to learn more about the API, you can check out the video for the class I taught about it on iTunes U, Jeff LaMarche's posted chapters from his unfinished book, or Philip Rideout's great iPhone 3D Programming book. I don't have the space in this article to bring you completely up to speed on OpenGL ES 2.0 and its shaders, so I'll assume a little familiarity with them as I describe things here.
As I mentioned earlier, this rendering technique is drawn entirely from the work of Marco Tarini, Paolo Cignoni, and Claudio Montani at the Università dell’Insubria and I.S.T.I. - C.N.R., as described in their paper titled "Ambient Occlusion and Edge Cueing to Enhance Real Time Molecular Visualization," published in 2006 in the IEEE Transactions on Visualization and Computer Graphics. I owe a tremendous debt to them for developing this process. They also created a companion application for Mac, Windows, and Linux called QuteMol, which embodies this rendering model in a GPL-licensed open source project. While QuteMol provides an excellent example, I chose to write my own implementation from scratch so that it would integrate well with iOS and the capabilities of mobile GPUs.
There are several key elements involved in rendering a frame for a molecule onscreen: procedurally generated impostors, a depth texture to manage impostor intersections, and a precalculated ambient occlusion texture for shadowing the surface of the molecule. I'll step through each of these stages, as well as some optimizations I put into place to help make this all render at a reasonable framerate.
The sphere is a lie
One of the challenges I encountered early on when building the original version of Molecules was how to express spheres and cylinders using 3-D geometry. To make truly smooth versions of these objects by normal means, you need to use an extremely large number of triangles. If you have a molecule with thousands of atoms, the number of triangles required to represent the structure could be staggering, and would challenge even the fastest of desktop GPUs. Not only that, but this geometry would have a very large memory footprint.
The solution proposed by Tarini, et al. was to fool the eye by drawing squares and rectangles that always face the user, then procedurally color each pixel within those simple 2-D faces as if they were windows looking in on 3-D spheres and cylinders. These so-called procedural impostors are conceptually similar to texture billboards, only instead of using prerendered images, these sphere and cylinder representations are drawn on the fly.
By doing this pixel-based raster drawing of the atoms and bonds, a minimal amount of geometry is used (only two triangles for each sphere or cylinder), yet these objects look perfectly sharp at any magnification.
Custom vertex and fragment shaders are used to do this drawing. For spheres, four identical vertices are sent to the GPU corresponding to the center of the sphere, as well as four coordinates that represent the corners of the sphere impostor square (-1, -1; 1, -1; -1, 1; 1, 1). The vertex shader then takes each of the four coordinates, transforms them according to the model view and orthographic matrices (to handle rotation and scaling of the model, as well as the rectangular nature of the OpenGL scene), and then displaces them relative to the viewer using the impostor space coordinates so that the square is always facing the user. This process is depicted below:
Once the square has been generated, every fragment within that square (roughly, every pixel) needs to be colored as if it were from a lit sphere behind that point. For this, the normal of the sphere at that point is calculated as the vector (impostor space X, impostor space Y, normalized depth). The calculation and use of the depth component is discussed later. The dot product of the normal and the light direction is calculated and used to determine the strength of the illumination at that point for both ambient light and the specular highlight. The resulting color is written to the screen at that point, except for fragments that lie outside of the sphere, which are output as transparent.
Cylinders are a little more complicated, but the same general process applies. Four vertices are fed to the GPU (two for the starting coordinate and two for the ending coordinate), along with four impostor space coordinates and four direction vectors that point from the beginning to the end of the cylinder center. The beginning and ending points are transformed at each vertex, then by using the transformed directions the vertices at each end are displaced perpendicular to the axis of the cylinder as viewed by the user. Additionally, the vertices at one end of the cylinder are displaced along the axis to account for the curving out of the cylinder at that end. This is shown below:
As with the spheres, the normal at each fragment on the cylinder is calculated for use in determining illumination, but the calculations here aren't as simple as those for the spheres. Many values are calculated in the vertex shader for points on the center axis, then adjusted in the fragment shader as a function of the distance from the axis.
Problem: no gl_FragDepth in OpenGL ES
One of the significant challenges with using 2-D impostors to represent 3-D objects lies in how to deal with overlapping objects. In a space-filling model, you have spheres that intersect one another, and in a ball-and-stick visualization spheres and cylinders intersect. You can't rely on the GPU to handle these objects correctly using standard clipping, because the spheres are on flat squares that never cross and the cylinders use rectangles that would run right to the center of an intersecting sphere.
How, then, do you draw the curved boundaries of these objects and hide the appropriate areas on them? In the original implementation by Tarini, et al., they used the capability in OpenGL to write out a custom depth value for each fragment in their fragment shader to the variable gl_FragDepth. The GPU could then figure out which fragments of which object were in front of others. Unfortunately, this variable is missing in OpenGL ES, probably due to the specific optimizations used in mobile GPUs.
To work around this, I implemented my own custom depth buffer using a frame buffer object that was bound to a texture the size of the screen. For each frame, I first do a rendering pass where the only value that is output is a color value corresponding to the depth at that point. In order to handle multiple overlapping objects that might write to the same fragment, I enable color blending and use the GL_MIN_EXT blending equation. This means that the color components used for that fragment (R, G, and B) are the minimum of all the components that objects have tried to write to that fragment (in my coordinate system, a depth of 0.0 is near the viewer, and 1.0 is far away). In order to increase the precision of depth values written to this texture, I encode depth to color in such a way that as depth values increase, red fills up first, then green, and finally blue. This gives me 768 depth levels, which works reasonably well.
The following demonstrates a rendered model and its generated depth map:
Once generated for a frame, this depth map is then passed into the color rendering pass, where each procedural sphere or cylinder impostor calculates the depth of each fragment it's trying to write out. If the depth of that fragment is greater than the value at that point in the depth texture, that fragment is made transparent so that it doesn't appear onscreen. That way, only the parts of objects that are nearest the user (the only ones that would be seen by the eye) are displayed.
Unfortunately, given the limited precision of this color encoding scheme, errors sometimes seep into these depth checks, causing ugly rendering artifacts. I've done my best to work around the cases I've seen, but there may be a way to make this a little cleaner.
Bringing out details using ambient occlusion lighting
Using raytraced impostors for the atoms and bonds adds a level of sharpness to the renderings that was missing under OpenGL ES 1.1, but Tarini, et al. suggest a technique for bringing out even more detail in molecular structures using ambient occlusion lighting. Ambient occlusion lighting is a rendering process where the intensity of the light at a point on a model is adjusted based on how ambient light hitting that point would be reduced by other nearby objects blocking that light. It produces rendered scenes that look more realistic to us than standard shading, because it's closer to the way illumination works in the world around us. For more on this technique, see Chapter 17 by Matt Pharr and Simon Green in the GPU Gems book (available for free to read online).
How can ambient occlusion lighting help us to get a better feel for the 3-D structure of a molecule? The following illustration provides a good example of this, where the image on the left is a molecule rendered using all of the processes described thus far, and the one on the right has ambient occlusion lighting added in:
As you can see, the use of ambient occlusion lighting not only makes the molecule look more real, it also exposes structural features hidden in a normally lit model. The folds and pockets within the surface of the molecule are clearly visible now, giving you a much better idea of the true shape of this molecule.
The first step in enabling ambient occlusion lighting for a molecule is to determine how much light hits each point on the surface. To do this, I rotate the molecule, generate the depth texture for that rotation, and use another shader to determine which points on the surface of the molecule would be lit in that orientation. This shader writes out a lit or unlit value to a texture that maps this illumination to the surface of each sphere and cylinder. An additive blending mode is used to increase the brightness of a portion of the texture every time that part of the surface is exposed.
The resulting texture storing these ambient occlusion intensities looks something like this:
This particular image is from just the portion of the texture encoding the sphere surfaces for a ball-and-stick visualization mode. You can see the dark areas that correspond to where the cylinders from bonds intersect the spheres, and consequently block all light from hitting those points on the spheres.
A mapping function is used to associate the values from this ambient occlusion texture with the fragments generated in the sphere and cylinder impostors, since there isn't really a physical surface to those virtual objects.
The ambient occlusion mapping occurs just once when the molecule is first loaded. QuteMol uses 128 sampling points for their ambient occlusion calculations, but I just use 44 in the interest of speed on the mobile devices. The resulting ambient occlusion texture is then used for each subsequent rendered frame.
The fragment shader that does the color calculation for the spheres and cylinders loads the appropriate ambient occlusion value from this texture and scales the ambient and specular lighting intensities based on it, leading to a realistic representation of the lighting of a molecule.
Performance tuning
The most significant downside to this gorgeous new rendering engine is that it's noticeably slower than the flat-shaded, simple-geometry approach of the original version of Molecules. It's a tradeoff I'm more than willing to make, but optimizations are always welcome. I've only just started trying to improve the performance of this new approach, but I'll describe a few of the tweaks I've made so far.
The first thing I always do when confronted with a performance problem is to profile the application as best I can. Apple gives you a great set of tools for doing this within Instruments, so that's the first place to look. In particular, the new OpenGL ES Analyzer gives you a great view into your OpenGL ES rendering pipeline.
In my case, it made clear that the bottleneck is the fragment shaders for my various rendering steps. This is also obvious when you zoom in on a model and the rendering slows down due to the greater number of pixels that need to be drawn.
Another tool in my shader tuning arsenal is the PVRUniSCo Editor put out by Imagination Technologies as part of their free PowerVR SDK. The PowerVR SGX series of GPUs powers the current iOS devices, and the PVRUniSCo Editor lets you load in your shaders and get a line-by-line readout of the number of GPU cycles required to execute each part of your shader code. It also provides more accurate whole-shader estimates of the best- and worst-case number of cycles used when running your shader. This pointed out several inefficient areas in my shaders (vector arithmetic being done in the wrong order, etc.) and highlighted the most expensive calculations I was performing.
As a result, one of the first optimizations I undertook was to precalculate many of the lighting values for my sphere impostors. This was inspired by Aras Pranckevičius' article on iOS shader tuning, where he found that storing commonly calculated values in a texture and looking them up was far faster than recalculating those same values over and over again. I was able to boost my sphere rendering performance by about 20% in this manner.
Another significant improvement came from using branching in my shaders to avoid performing expensive calculations in cases where a fragment would be ignored. I was under the impression that all forms of branching in shaders were terrible, but by applying this early-bailout testing here to avoid these later calculations, I was able to improve rendering performance by 40%.
One last interesting optimization was the use of Grand Central Dispatch to manage my rendering operations. I needed a solid, performant way to ensure that all rendering operations using the same OpenGL ES context would never execute at the same time on multiple threads. In the previous version of Molecules, I had done this by making every OpenGL call run on the main thread. This was not a good solution, and it became even worse with the longer frame rendering times of this new engine.
Instead, I created a GCD dispatch queue that would only execute one item at a time, but would run these items on a background thread, and wrapped every operation that touched the OpenGL ES context in a block to be performed on this dispatch queue. By moving these operations from the main thread, performance increased by 10-20% on most devices. On the iPad 2, however, performance jumped by nearly 50%. I'm guessing that fully utilizing that second processor core on the iPad 2 really helped here.
As I said, I'm only just getting started with the performance tuning of this new renderer. Because this is a lot more complex than the old fixed-function renderer, there are many more ways of optimizing this.
Again, I highly recommend reading the paper by Tarini, et al. to see the original description of this rendering technique, as well as some of the math that I may have skipped over here. Hopefully, by going between this overview, their math, and the published Molecules code, you should be able to see how this application operates.
I'm incredibly happy that I was finally able to make this rendering model work. It took a few months to pull together once I understood the basic concepts, and I think it was definitely worth the effort.