Improving OpenGL ES performance using CATransform3D

January 13, 2009

Author: Brad Larson

I've been doing a lot of work with Core Animation lately during development of my next iPhone application, and I came across an interesting way to optimize OpenGL ES rendering that I thought I'd share. This small improvement yielded a 14-25% increase in the triangles per second I was able to push in Molecules.

Core Animation, Apple's framework for making hardware-accelerated animation easy to use on the Mac and the iPhone, is built on top of OpenGL. I recently realized that this layering means some of Core Animation's functions can be used for more than just 2-D view animations. It started with a little proof-of-principle application in which I translated the 3-D rotation mechanic in Molecules to Core Animation (based on work done by Bill Dudney).

It turns out that the CATransform3D struct that Core Animation uses to manipulate CALayers is a 4x4 matrix laid out exactly like the model view matrix in OpenGL (ES). Apple also provides a number of functions for manipulating a CATransform3D, such as CATransform3DScale, CATransform3DRotate, and CATransform3DTranslate, that correspond to glScale, glRotate, and glTranslate calls.
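To make the layout claim concrete: CATransform3D stores sixteen values, m11 through m44, row by row, and treats points as row vectors, while OpenGL stores its matrix column by column and treats points as column vectors. The two conventions are transposes of each other, so the same sixteen numbers end up in the same memory order. A minimal C sketch, assuming a hypothetical `Mat4` struct that mirrors CATransform3D's field names with `float` fields; `mat4_make_translation` and `mat4_to_gl_array` are illustrative helpers, not Apple or OpenGL API:

```c
/* Hypothetical mirror of CATransform3D's field order, with float fields
   (CGFloat was float on the 32-bit iPhone; it is double on 64-bit platforms). */
typedef struct {
    float m11, m12, m13, m14;
    float m21, m22, m23, m24;
    float m31, m32, m33, m34;
    float m41, m42, m43, m44;
} Mat4;

/* Builds a translation laid out the way CATransform3DMakeTranslation does it:
   the offsets sit in the fourth row (m41, m42, m43). */
static Mat4 mat4_make_translation(float tx, float ty, float tz)
{
    Mat4 t = {1.0f, 0.0f, 0.0f, 0.0f,
              0.0f, 1.0f, 0.0f, 0.0f,
              0.0f, 0.0f, 1.0f, 0.0f,
              tx,   ty,   tz,   1.0f};
    return t;
}

/* Writes the fields in declaration order into a 16-element array. In OpenGL's
   column-major convention, elements 12-14 of such an array hold the translation. */
static void mat4_to_gl_array(const Mat4 *m, float out[16])
{
    const float fields[16] = {
        m->m11, m->m12, m->m13, m->m14,
        m->m21, m->m22, m->m23, m->m24,
        m->m31, m->m32, m->m33, m->m34,
        m->m41, m->m42, m->m43, m->m44};
    for (int i = 0; i < 16; i++)
        out[i] = fields[i];
}
```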

This solves the biggest problem I knew of in Molecules' rendering routines: to rotate and translate the model relative to the user's touch input, I had to know the state of the model view matrix at all times. In the current version of the program, that means a few glGetFixedv calls to read back the model view matrix. Each of these calls stalls the rendering pipeline, so I needed a way to get rid of them.

CATransform3D and its supporting functions provide the means to do that. I could keep track of the current model view matrix in a CATransform3D, perform manipulations on that structure, and just replace the OpenGL model view matrix with it every time there was a change.

At the first load of a molecule, I now copy the starting value of the model view matrix into a CATransform3D instance variable (that persists for as long as the molecule is represented in 3-D):

GLfixed currentModelViewMatrix[16]  = {45146,47441,2485,0,-25149,26775,-54274,0,-40303,36435,36650,0,0,0,0,65536};
[self convertFixedPointMatrix:currentModelViewMatrix to3DTransform:&currentCalculatedMatrix];

I make all my OpenGL ES calls with fixed-point values because I thought they would be faster, so I have a simple pair of methods for converting between a fixed-point matrix and a CATransform3D struct:

- (void)convertFixedPointMatrix:(GLfixed *)fixedPointMatrix to3DTransform:(CATransform3D *)transform3D;
{
	transform3D->m11 = (CGFloat)fixedPointMatrix[0] / 65536.0f;
	transform3D->m12 = (CGFloat)fixedPointMatrix[1] / 65536.0f;
	transform3D->m13 = (CGFloat)fixedPointMatrix[2] / 65536.0f;
	transform3D->m14 = (CGFloat)fixedPointMatrix[3] / 65536.0f;
	transform3D->m21 = (CGFloat)fixedPointMatrix[4] / 65536.0f;
	transform3D->m22 = (CGFloat)fixedPointMatrix[5] / 65536.0f;
	transform3D->m23 = (CGFloat)fixedPointMatrix[6] / 65536.0f;
	transform3D->m24 = (CGFloat)fixedPointMatrix[7] / 65536.0f;
	transform3D->m31 = (CGFloat)fixedPointMatrix[8] / 65536.0f;
	transform3D->m32 = (CGFloat)fixedPointMatrix[9] / 65536.0f;
	transform3D->m33 = (CGFloat)fixedPointMatrix[10] / 65536.0f;
	transform3D->m34 = (CGFloat)fixedPointMatrix[11] / 65536.0f;
	transform3D->m41 = (CGFloat)fixedPointMatrix[12] / 65536.0f;
	transform3D->m42 = (CGFloat)fixedPointMatrix[13] / 65536.0f;
	transform3D->m43 = (CGFloat)fixedPointMatrix[14] / 65536.0f;
	transform3D->m44 = (CGFloat)fixedPointMatrix[15] / 65536.0f;
}
 
- (void)convert3DTransform:(CATransform3D *)transform3D toFixedPointMatrix:(GLfixed *)fixedPointMatrix;
{
	fixedPointMatrix[0] = (GLfixed)(transform3D->m11 * 65536.0f);
	fixedPointMatrix[1] = (GLfixed)(transform3D->m12 * 65536.0f);
	fixedPointMatrix[2] = (GLfixed)(transform3D->m13 * 65536.0f);
	fixedPointMatrix[3] = (GLfixed)(transform3D->m14 * 65536.0f);
	fixedPointMatrix[4] = (GLfixed)(transform3D->m21 * 65536.0f);
	fixedPointMatrix[5] = (GLfixed)(transform3D->m22 * 65536.0f);
	fixedPointMatrix[6] = (GLfixed)(transform3D->m23 * 65536.0f);
	fixedPointMatrix[7] = (GLfixed)(transform3D->m24 * 65536.0f);
	fixedPointMatrix[8] = (GLfixed)(transform3D->m31 * 65536.0f);
	fixedPointMatrix[9] = (GLfixed)(transform3D->m32 * 65536.0f);
	fixedPointMatrix[10] = (GLfixed)(transform3D->m33 * 65536.0f);
	fixedPointMatrix[11] = (GLfixed)(transform3D->m34 * 65536.0f);
	fixedPointMatrix[12] = (GLfixed)(transform3D->m41 * 65536.0f);
	fixedPointMatrix[13] = (GLfixed)(transform3D->m42 * 65536.0f);
	fixedPointMatrix[14] = (GLfixed)(transform3D->m43 * 65536.0f);
	fixedPointMatrix[15] = (GLfixed)(transform3D->m44 * 65536.0f);
}
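GLfixed is a signed 32-bit 16.16 fixed-point type, so 65536 stands for 1.0, which is why the starting matrix ends in 65536. Because both methods visit the sixteen fields in the same order, the conversions can also be written as loops. A C sketch, assuming a plain `float[16]` holding m11 through m44 in declaration order as a stand-in for CATransform3D and an `int32_t` typedef in place of GLfixed; the function names are hypothetical. Like the `(GLfixed)` casts above, it truncates, so a round trip is off by less than 1/65536 per entry.

```c
#include <stdint.h>

typedef int32_t fixed16_16;   /* stand-in for GLfixed: 16 integer bits, 16 fraction bits */
#define FIXED_ONE 65536.0f    /* 1.0 in 16.16 fixed point */

/* m[] holds m11, m12, ..., m44 in order, standing in for a CATransform3D. */
static void transform_to_fixed(const float m[16], fixed16_16 out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (fixed16_16)(m[i] * FIXED_ONE);   /* truncates toward zero */
}

static void fixed_to_transform(const fixed16_16 in[16], float m[16])
{
    for (int i = 0; i < 16; i++)
        m[i] = (float)in[i] / FIXED_ONE;
}
```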

The first step in the rendering is to scale the molecule based on pinch gestures. The OpenGL call

glScalex(fixedPointScaleFactor, fixedPointScaleFactor, fixedPointScaleFactor);

is replaced by

currentCalculatedMatrix = CATransform3DScale(currentCalculatedMatrix, scaleFactor, scaleFactor, scaleFactor);
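In CATransform3D's row-vector convention, CATransform3DScale puts the scale in front of the existing transform. That amounts to multiplying the first three rows (m11-m14, m21-m24, m31-m34) by sx, sy, and sz and leaving the fourth row, which holds the translation, untouched; given the shared memory layout, it is the same change glScale makes to the model view matrix. A C sketch of that arithmetic, with a hypothetical `Mat4` stand-in and an illustrative `mat4_scale` function (not Apple's implementation):

```c
/* Hypothetical stand-in with CATransform3D's field names. */
typedef struct {
    float m11, m12, m13, m14;
    float m21, m22, m23, m24;
    float m31, m32, m33, m34;
    float m41, m42, m43, m44;
} Mat4;

/* Scales the first three rows; the fourth (translation) row is left alone,
   so the model is scaled about its own origin. */
static Mat4 mat4_scale(Mat4 t, float sx, float sy, float sz)
{
    t.m11 *= sx; t.m12 *= sx; t.m13 *= sx; t.m14 *= sx;
    t.m21 *= sy; t.m22 *= sy; t.m23 *= sy; t.m24 *= sy;
    t.m31 *= sz; t.m32 *= sz; t.m33 *= sz; t.m34 *= sz;
    return t;
}
```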

Note that I'm not issuing any OpenGL commands until the very end, once all transformations have been calculated. After scaling comes rotation. This is one of the places where I would normally need a glGet call to read the model view matrix in order to rotate the model relative to the user's touch:

glGetFixedv(GL_MODELVIEW_MATRIX, currentModelViewMatrix);	
GLfloat totalRotation = sqrt(xRotation*xRotation + yRotation*yRotation);
 
glRotatex([moleculeToDisplay floatToFixed:totalRotation],
		  (GLfixed)((xRotation/totalRotation) * (GLfloat)currentModelViewMatrix[1] + (yRotation/totalRotation) * (GLfloat)currentModelViewMatrix[0]),
		  (GLfixed)((xRotation/totalRotation) * (GLfloat)currentModelViewMatrix[5] + (yRotation/totalRotation) * (GLfloat)currentModelViewMatrix[4]),
		  (GLfixed)((xRotation/totalRotation) * (GLfloat)currentModelViewMatrix[9] + (yRotation/totalRotation) * (GLfloat)currentModelViewMatrix[8])
		  );

The above code is replaced with

GLfloat totalRotation = sqrt(xRotation*xRotation + yRotation*yRotation);
 
CATransform3D temporaryMatrix = CATransform3DRotate(currentCalculatedMatrix, totalRotation * M_PI / 180.0, 
	 ((xRotation/totalRotation) * currentCalculatedMatrix.m12 + (yRotation/totalRotation) * currentCalculatedMatrix.m11),
	 ((xRotation/totalRotation) * currentCalculatedMatrix.m22 + (yRotation/totalRotation) * currentCalculatedMatrix.m21),
	 ((xRotation/totalRotation) * currentCalculatedMatrix.m32 + (yRotation/totalRotation) * currentCalculatedMatrix.m31));

I use a temporary matrix for the transform in case something goes wrong with the rotation. If there is no rotation, totalRotation is zero and the axis components become 0/0, or NaN, and CATransform3DRotate's behavior with a zero-length rotation vector is undefined. If the temporary matrix is not full of NaNs, I copy it over currentCalculatedMatrix.

The last operation is translation, for which the code

glGetFixedv(GL_MODELVIEW_MATRIX, currentModelViewMatrix);
 
float currentScaleFactor = sqrt(pow((GLfloat)currentModelViewMatrix[0] / 65536.0f, 2.0f) + pow((GLfloat)currentModelViewMatrix[1] / 65536.0f, 2.0f) + pow((GLfloat)currentModelViewMatrix[2] / 65536.0f, 2.0f));
 
xTranslation = xTranslation / (currentScaleFactor * currentScaleFactor);
yTranslation = yTranslation / (currentScaleFactor * currentScaleFactor);
glTranslatef(xTranslation * (GLfloat)currentModelViewMatrix[0] / 65536.0f, xTranslation * (GLfloat)currentModelViewMatrix[4] / 65536.0f, xTranslation * (GLfloat)currentModelViewMatrix[8] / 65536.0f);
glTranslatef(yTranslation * (GLfloat)currentModelViewMatrix[1] / 65536.0f, yTranslation * (GLfloat)currentModelViewMatrix[5] / 65536.0f, yTranslation * (GLfloat)currentModelViewMatrix[9] / 65536.0f);

is replaced by

float currentScaleFactor = sqrt(pow(currentCalculatedMatrix.m11, 2.0f) + pow(currentCalculatedMatrix.m12, 2.0f) + pow(currentCalculatedMatrix.m13, 2.0f));	
 
xTranslation = xTranslation / (currentScaleFactor * currentScaleFactor);
yTranslation = yTranslation / (currentScaleFactor * currentScaleFactor);
temporaryMatrix = CATransform3DTranslate(currentCalculatedMatrix, xTranslation * currentCalculatedMatrix.m11, xTranslation * currentCalculatedMatrix.m21, xTranslation * currentCalculatedMatrix.m31);
temporaryMatrix = CATransform3DTranslate(temporaryMatrix, yTranslation * currentCalculatedMatrix.m12, yTranslation * currentCalculatedMatrix.m22, yTranslation * currentCalculatedMatrix.m32);
currentCalculatedMatrix = temporaryMatrix; // Commit the translated matrix, as with the rotation step

Because all of the scaling, rotation, and translation operations to this point have only been applied to a CATransform3D struct, I need to update the model view matrix to match that transform.

[self convert3DTransform:&currentCalculatedMatrix toFixedPointMatrix:currentModelViewMatrix];
 
glLoadMatrixx(currentModelViewMatrix);

As I stated at the beginning, this quick set of optimizations increases the number of triangles per second that Molecules can push through the iPhone's GPU by 14-25%, depending on the specific model. On the iPhone 2.2 firmware, I'm now getting a maximum of 310,000 triangles per second. That is still short of the 470,000 triangles per second that the iPhone has been reported elsewhere to handle under these conditions, so there's still some room to improve, but I'm getting there.

I know that I could have implemented my own matrix math functions or found ones written by someone else to do what I've described above, but these are structures and functions provided by Apple as part of Core Animation, which is on every iPhone out there (and every Mac running Leopard). Also, I'm terrible at matrix math and this prevents me from having to even think about it. Frankly, I think it's cool that you can do this kind of back-and-forth with Core Animation and OpenGL.

As soon as I finish the new Protein Data Bank searching methods, I'll post version 1.3.2 of Molecules containing this new and improved rendering code.