GPU-accelerated video processing on Mac and iOS

I've been invited to give a talk at the SecondConf developer conference in Chicago, and I'm writing this to accompany it. I'll be talking about using the GPU to accelerate processing of video on Mac and iOS. The slides for this talk are available here. The source code samples used in this talk will be linked throughout this article. Additionally, I'll refer to the course I teach on advanced iPhone development, which can be found on iTunes U here.

UPDATE (7/12/2011): My talk from SecondConf on this topic can now be downloaded from the conference videos page here.

UPDATE (2/13/2012): Based on this example, I've now created the open source GPUImage framework which encapsulates these GPU-accelerated image processing tasks. Read more about the framework in my announcement, and grab its code on GitHub. This framework makes it a lot easier to incorporate these custom filter effects in an iOS application.

Processing video on the GPU

As of iOS 4.0, we now have access to the raw video data coming in from the built-in camera(s). The Mac has had this capability for a long while, but the mobility of iOS devices opens up significant opportunities to process and understand the world around us.

However, dealing with images coming at an application at up to 60 frames per second can strain even the most powerful processor. This is particularly the case with resource-constrained mobile devices. Fortunately, many of these devices contain processors ideally suited to this task in their GPUs.

GPUs are built to perform parallel processing tasks on large sets of data. For a while now, desktop computers have had GPUs that can be programmed using simple C-like shaders to produce fascinating effects. These same shaders also let you run arbitrary calculations against data fed to the GPU. With more and more OpenGL ES 2.0-compatible GPUs being bundled into mobile devices, this kind of GPU-based computation is now possible on handhelds.

On the Mac, we have three technologies for performing work on the GPU: OpenGL, Core Image, and OpenCL. Core Image, as its name indicates, is geared toward processing images to create unique effects in a hardware-accelerated manner. OpenCL is a new technology in Snow Leopard that takes the use of GPUs as computational devices to its logical extreme, defining a way of writing C-like programs that can process many types of data on the GPU, the CPU, or both.

On iOS, we have only OpenGL ES 2.0, which makes this kind of calculation more difficult than with OpenCL or Core Image, but it is still possible through programmable shaders.

Why would we want to go through the effort of building custom code to upload, process, and extract data from the GPU? I ran some benchmarks on my iPhone 4 in which I performed a simple calculation (identifying a particular color, within a certain threshold) a varying number of times in a loop against every pixel of a 480 x 320 image. This was done using a simple C implementation that ran on the CPU and a programmable shader that ran on the GPU. The results are as follows:

Calculation          GPU FPS   CPU FPS   Speedup
Thresholding x 100   1.45      0.05      28.7X
Thresholding x 2     33.63     2.36      14.3X
Thresholding x 1     60.00     4.21      14.3X

As you can see, the GPU handily beats the CPU in every one of these benchmarks. The benchmarks were run multiple times, with the average of those results presented here (there was reasonably little noise in the data). On the iPhone 4, you can expect an approximate 14X - 29X speedup by running a simple calculation like this across a lot of data. This makes it practical to implement object recognition and other video processing tasks on mobile devices.

Using Quartz Composer for rapid prototyping

It can be difficult to design applications that use the GPU, because a lot of code is generally required just to set up the interaction with Core Image, OpenCL, or OpenGL, not to mention any video sources you have coming into the application. While much of this code may be boilerplate, it is still a hassle to build a full application just to test the practicality of an idea or to experiment with what is possible on the GPU.

That's where Quartz Composer can make your life a lot easier. Quartz Composer is a development tool that doesn't get much attention, despite the power that resides within the application. The one group it seems to have traction with is VJs, due to the very cool effects you can generate quickly from audio and video sources.

However, I think that Quartz Composer is an ideal tool for rapid prototyping of Core Image filters, GLSL programmable shaders, and OpenCL kernels. To test one of these GPU-based elements, all you need to do is drag in the appropriate inputs and outputs, drop your kernel code into an appropriate patch, and the results will be displayed onscreen. You can then edit the code for your filter, kernel, or shader and see what the results look like in realtime. There's no compile-run-debug cycle in any of this.
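As a small illustration, here is roughly what you might paste into a GLSL Shader patch to try out a simple luminance (grayscale) conversion. This is just a sketch: the inputImage sampler name is arbitrary (Quartz Composer should expose a uniform like this as an input port on the patch, so you can wire an image or video source straight into it), and the weights are standard Rec. 709 luminance coefficients.

The vertex shader just passes the geometry and texture coordinates through:

varying vec2 textureCoordinate;

void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    textureCoordinate = gl_MultiTexCoord0.xy;
}

The fragment shader samples the input image and writes out a grayscale version of each pixel:

uniform sampler2D inputImage;
varying vec2 textureCoordinate;

void main()
{
    vec4 color = texture2D(inputImage, textureCoordinate);
    // Weighted sum of the color channels gives the perceived luminance.
    float luminance = dot(color.rgb, vec3(0.2126, 0.7152, 0.0722));
    gl_FragColor = vec4(vec3(luminance), color.a);
}

Tweak the math in the fragment shader and you can watch the rendered result change, with no application to recompile.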

This can save you a tremendous amount of time by letting you focus on the core technical element you're developing and not the supporting code that surrounds it. Even though it is a Mac-based tool, you can just as easily use it to speed up development of iPhone- and iPad-based OpenGL ES applications by testing out your custom shaders there in 2-D or 3-D.

For more on Quartz Composer, I recommend reading Apple's "Working with Quartz Composer" tutorial and their Quartz Composer User Guide, as well as visiting the great community resources at Kineme.net and QuartzComposer.com.

Video denoising

As a case study for how useful Quartz Composer can be in development, I'd like to describe some work we've been doing lately at SonoPlot. We build robotic systems there that act like pen plotters on the microscale and are useful for printing microelectronics, novel materials, and biological molecules like DNA and proteins. Our systems have integrated CCD cameras that peer through high-magnification lenses to track the printing process.

These cameras sometimes need to be run at very high gains, which introduces CCD speckle noise. This noise reduces the clarity of the incoming video and can make recorded MPEG-4 videos look terrible. Therefore, we wanted to try some kind of filtering on the input video to reduce this noise. We just didn't know which approach would work best, or even whether this was worth doing.

We used Quartz Composer to rig up an input source from our industrial-grade IIDC-compliant FireWire cameras through multiple Core Image filters, then output the results to the screen to test filter approaches. You can download that composition from here to try out for yourself, if you have a compatible camera.

The denoise filtering Quartz Composition requires you to have two custom plugins installed on your system, both provided by Kineme.net: VideoTools and Structure Maker. VideoTools lets you work with FireWire IIDC cameras using the libdc1394 library, as well as set parameters like gain, exposure, etc. If you just wish to work with the built-in iSight cameras, you can delete the Kineme Video Input patch in these compositions and drop the standard Video Input patch in its place.

From this, we were able to determine that a Core Image low-pass filter with a specific filter strength was the best solution for quickly removing much of this speckle, and we can now proceed to implementing that within our application. By testing this out first in Quartz Composer, not only did we save ourselves a lot of time, but we now know that this approach will work and we have confidence in what we're building.

Color-based object tracking

A while ago, I watched the presentation given by Ralph Brunner at WWDC 2007 ("Create Stunning Effects with Core Image") and was impressed by his demonstration of tracking an object in a video based on the object's color. That demo turned into the CIColorTracking sample application, as well as a chapter in the GPU Gems 3 book, which is available for free online.

When I saw that iOS 4.0 gave us the ability to handle the raw video frames being returned from the iPhone's (and now iPod touch's) camera, I wondered if it would be possible to do this same sort of object tracking on the iPhone. It turns out that the iPhone 4 can do this at 60 frames per second using OpenGL ES 2.0, but I didn't know that at the time, so I wanted to test the idea out.

Quartz Composer color tracking

The first thing I did was create a Quartz Composer composition that used the Core Image filters from Apple's example. I wanted to understand how the filters interacted and play with them to see how they worked. That composition can be downloaded from here.

The way this example works is that you first supply a color that you'd like to track, and a threshold for the sensitivity of detecting that color. You want to adjust that sensitivity so that you're picking up the various shades of that color in your object, but not unrelated colors in the environment.

The first Core Image filter takes your source image and goes through each pixel, determining whether that pixel is close enough to the target color, within the threshold specified. Both the pixel and target colors are normalized (the red, green, and blue channels are added together and divided by three, then the whole color is divided by that amount) to attempt to remove the effect of varying lighting on colors. If a pixel is within the threshold of the target color, it is replaced with an RGBA value of (1.0, 1.0, 1.0, 1.0). Otherwise, it is replaced with (0.0, 0.0, 0.0, 0.0).
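Ported to a GLSL ES fragment shader for the iPhone, this first pass boils down to something like the following sketch (the uniform names are illustrative, and the shader in the actual sample application differs in minor details):

varying highp vec2 textureCoordinate;

uniform sampler2D videoFrame;
uniform highp vec3 inputColor;   // the color being tracked
uniform highp float threshold;   // the sensitivity

void main()
{
    highp vec3 pixelColor = texture2D(videoFrame, textureCoordinate).rgb;

    // Normalize both colors by their average channel value to reduce the
    // influence of overall lighting (a pure black pixel would divide by
    // zero here, so real code should guard against that).
    highp vec3 normalizedPixel = pixelColor / ((pixelColor.r + pixelColor.g + pixelColor.b) / 3.0);
    highp vec3 normalizedTarget = inputColor / ((inputColor.r + inputColor.g + inputColor.b) / 3.0);

    // step() returns 1.0 when the distance is within the threshold and 0.0 otherwise,
    // so matching pixels become (1.0, 1.0, 1.0, 1.0) and the rest (0.0, 0.0, 0.0, 0.0).
    highp float matches = step(distance(normalizedPixel, normalizedTarget), threshold);
    gl_FragColor = vec4(matches);
}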

The second filter takes the thresholded pixels and multiplies their values by the normalized X, Y coordinates (0.0 - 1.0), storing the coordinate of each detected pixel in that pixel's color.
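In GLSL ES, that second pass is only a couple of lines; again, this is a simplified sketch with illustrative names:

varying highp vec2 textureCoordinate;

uniform sampler2D thresholdedFrame;

void main()
{
    highp vec4 thresholded = texture2D(thresholdedFrame, textureCoordinate);

    // Matching pixels are (1.0, 1.0, 1.0, 1.0), so this leaves them holding
    // (x, y, 1.0, 1.0); non-matching pixels stay (0.0, 0.0, 0.0, 0.0).
    gl_FragColor = vec4(textureCoordinate, 1.0, 1.0) * thresholded;
}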

Finally, the colors are averaged across all pixels, then adjusted by the percentage of pixels that are transparent (the ones that failed the threshold test). A single pixel is returned from this, with the centroid of the thresholded area stored in its red and green components and the relative size of that area in its blue and alpha components.

The iPhone lacks Core Image, but it does have the technology originally used as the underpinning for Core Image, OpenGL. Specifically, all newer iOS devices have support for OpenGL ES 2.0 and programmable shaders. As I described earlier, shaders are written using the OpenGL Shading Language (GLSL) and they instruct the GPU to process 2-D and 3-D data to render onscreen or in a framebuffer.

Therefore, I needed to port the Core Image kernel code from Apple's example to GLSL shaders. Again, I decided to use Quartz Composer for the design portion of this. The resulting composition can be downloaded from here.

Porting the filters over turned out to be pretty straightforward. Core Image and GLSL have very similar syntaxes, with only a few helper functions and language features that differ.

Object tracking on iPhone

The final step was taking this GLSL-based implementation and building an application around it. The source code for that application can be downloaded from here. Note that this application needs to be built for the device, because the Simulator in the iOS 4.1 SDK seems to be lacking some of the AVFoundation video capture symbols.

iPhone color tracking

The application has a control at the bottom that lets you switch between four different views: the raw camera feed, a view that overlays the thresholded pixels on the camera feed, a view of the thresholded pixels with position data stored in them, and then a view that shows a tracking dot which follows the color area.

Touching a color in the image selects that color as the one to track. Dragging your finger across the display adjusts the threshold sensitivity. You can see this most easily in the overlaid threshold view.

The one shortcoming of this application is that I still use a CPU-bound routine to average out the pixels of the final image, so the performance is not what it should be. I'm working on finishing up this part of the application so that it can be GPU-accelerated like the rest of the processing pipeline.

Potential for augmented reality

One potential application for this technology that I'll mention in my talk is in the case of augmented reality. Even though it is not a GPU-accelerated application, I want to point out a great example of this in the VRToolKit sample application by Benjamin Loulier.

This application leverages the open source ARToolKitPlus library developed in the Christian Doppler Laboratory at the Graz University of Technology. ARToolKitPlus looks for 2-D BCH codes in images from a video stream, recognizes those codes, and passes back information about their rotation, size, and position in 3-D space.

VRToolKit example

The VRToolKit application provides a great demonstration of how this works by overlaying a 3-D model on any of these codes it detects in a video scene, then rotating and scaling these objects in response to movement of these codes. I highly recommend downloading the source code and trying this out yourself. BCH code images that you can print out and use with this application can be found here, here, here, here, and here.

As I mentioned, this library is not hardware accelerated, so you can imagine what is possible with the extra horsepower the GPU could bring to this kind of processing.

Also, this library has a significant downside in that it uses the GPL license, so VRToolKit and all derivative applications built from it also must be released under the GPL.

Overall, I think there are many exciting applications for video processing that GPU acceleration can make practical. I'm anxious to see what people will build with this kind of technology.

Comments

Wow! Very interesting. I need to learn to program shaders, clearly! Thank you for this explanation. Eye opening (!).

Same as Dad !!

Working with OpenCV on IOS but now it could (and should) get a boost !

strange99 wrote:
Same as Dad !!

Working with OpenCV on IOS but now it could (and should) get a boost !

How do you do it in OpenCV with a boost?
Can you share it with me, please...

Thanks in advance

The idea is that something like OpenCV could be modified to have better performance if it used the GPU instead of the CPU. A modification to the library that does this currently doesn't exist, but you'd be more than welcome to work on one.

Thanks for a great tutorial. Wanted to let you know that the Quartz Composer file downloads URLs are switched (Core Image refers to the GLSL shader download, and vice versa). I was trying to write the output of the threshold shader to a file. I understand assetwriter, but the pixelbuffer is not modified directly, so writing it to a file just results in the original video. What should I read out to an assetwriter in order to see the shaders' output? Thanks!

I LOVE IT! Thank you so much for this tutorial. It is incredible!!

Thanks a lot for kind sharing

Hi,

How can I do GPGPU work using OpenGL ES 2.0 without using Quartz Composer?

I'm confused regarding the projection stuff that needs to be set up as described in
http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html

But since OpenGL ES 2.0 is much more restricted, I have no idea how to go about this. I'm looking at developing a simple image blurring sample.

Please share a sample code snippet that can do this. Thanks

The base concepts on the linked page are sound, but the specific instructions are for desktop OpenGL. Many of the things you see there (immediate mode, GLUT, Cg, etc.) don't exist in OpenGL ES, so you can't use the code they provide.

In my case, Quartz Composer was merely used as a rapid prototyping tool. It's not necessary at all for making a working application in OpenGL ES 2.0.

I'd suggest looking at what I did in my iPhone color tracking example instead, because that's a working OpenGL ES 2.0 sample that performs realtime image processing. This sample could easily be modified to perform a box blur or something similar.

Hi,
I only need a grayscale image. Is there a difference between taking the Y channel of:

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange

or

performing the grayscale conversion via OpenGL on the GPU?

I need to know what's faster. Maybe you can help me.

Thanks.

Best.
Florian

Hi Brad,

Do you have any thoughts on making the centroid calculation perform better? I am trying to use your example (with great success so far) in a robotics project, but the centroid calculation drops the frame rate by two thirds. I'd hoped you might have some ideas on how to improve it for object tracking.

That's a question that I haven't had the chance to answer myself, but I was going to ask the Apple engineers at WWDC. I had implemented a similar whole-image color averaging using the Accelerate framework in our control software, which is faster than the straight C implementation I have in the sample, but is still not the optimal way to handle this.

Core Image has a filter that does this on-GPU, using a shader that minimizes the image over several passes, until you get a single pixel at the end (and then adjusts it if the image was not square or an even power of two). Adding a few render passes that do this might be possible, but would take a little code to pull off.

Thank you so much for your answer and of course also for this example (although it has low framerate, it works great so far). I hope you keep us up-to-date after WWDC.

Long time console programmer, moving into the ios world...and first task is an AR project...found your site, good stuff!

I can't get the CIColorTracking example to work with the 10.6 SDK...wondered if you had encountered the same and could give me a push in the fix direction?

Thanks

Try this version, which I was able to get to build under 10.6. I didn't try running it against any test videos, though.

Honestly, I didn't spend that much time with this version of their sample, instead working more in Quartz Composer when working on the Mac.

yah in retrospect, i decided to quit fighting with the mac, as the idevice code works ... heh...dumb programmer dumb ;)

thanks,

Phil

and that version loads up with no errors, but still doesn't play the movie, or I'm missing something...I'm stubborn enough to probably keep trying to get it working...thanks!

Phil

For everyone who doesn't need precise object recognition, I made some adjustments to speed up the centroid calculation. It's not a replacement, but it might point some people in the right (or a different) direction in terms of speed and precision.

http://codingblock.com/?p=40

Hi everyone.
Thank you for this article and explanation. I wonder if someone could help me with how to encode BGRA video frames to H.264.
Thank you.

One of the sweetest articles I've read in recent times.
Thank you for the demo code, sir. It really shows us many possibilities.

Hi, I am so excited to take a look at your post. The techniques you mention help me a lot. But I have another question: could OpenGL ES change a sector-shaped image into a rectangular image?

Could you send me one email?

Thanks.

I don't understand what you're asking. What's a sector and what's a rectangle in your description?

Perhaps if you asked on Stack Overflow you'd get a better response.

Interesting tutorial, and I wanted to add something that might be of use to anyone wanting to go further with marker tracking for augmented reality; ARToolKitPlus was derived from ARToolKit, and ARToolKit also has a commercially licensed version, for iOS too. If you're interested, please contact ARToolworks for pricing and licensing conditions.

Thanks for this article! Speeding up OpenCV/Canny on the iPhone & iPad with NEON assembly does not give good enough results. I haven't touched OpenGL/GLSL before, but will definitely look at that route too.

hey! great stuff
I am following it, but on another embedded platform that supports OpenGL ES 2.0.

Everything went smoothly, but when you do the averaging to get the centroid I got lost. Why do you do this in the code?
Why do you use the R value as the Y coordinate? Shouldn't it be X?
And why do you subtract from 1?
(1.0f - (currentXTotal / currentPixelTotal), currentYTotal / currentPixelTotal);

What I am getting is a weird coordinate that doesn't seem to be true to where the centroid should be...

would you please help me with that?

thank a lot!!

I might have swapped the R and G elements (X and Y components) in my calculations to account for the rotated coordinate system of the image when displayed in portrait. Similarly, I invert one of the coordinates (the subtraction from 1) in order to account for the iPhone's (0, 0) coordinate origin being in the upper left of the portrait orientation.

It should be fairly easy to figure out the manipulations you'll need to do to fit your device's coordinate system. I arrived at these values by trial and error.

As far as why I do this on the CPU, I just didn't get a working implementation of a shader-based filter to perform this calculation in time for my talk and haven't managed to revisit this since. I believe that a shader which downsamples the image by 4X on each pass over multiple passes should be what's needed here.
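To sketch the idea, each pass would render into a framebuffer one quarter the size of its input (half the width, half the height) with a fragment shader along these lines, repeated until you are down to a single pixel holding the whole-image average, which you can then read back and adjust by the alpha just as in the Core Image version. The uniform names and the texel offset handling here are only illustrative; this isn't in the sample application yet:

varying highp vec2 textureCoordinate;

uniform sampler2D inputTexture;
uniform highp vec2 texelSize;   // (1.0 / input width, 1.0 / input height)

void main()
{
    // Average a 2 x 2 block of input pixels into one output pixel,
    // shrinking the image by 4X per pass.
    highp vec4 upperLeft  = texture2D(inputTexture, textureCoordinate);
    highp vec4 upperRight = texture2D(inputTexture, textureCoordinate + vec2(texelSize.x, 0.0));
    highp vec4 lowerLeft  = texture2D(inputTexture, textureCoordinate + vec2(0.0, texelSize.y));
    highp vec4 lowerRight = texture2D(inputTexture, textureCoordinate + texelSize);

    gl_FragColor = (upperLeft + upperRight + lowerLeft + lowerRight) * 0.25;
}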

This is impressive, and I think it must be fun to use color tracking as a game controller.
But here's the problem:
1. We use OpenGL to do the color tracking, so we need to render the video frame with a customized shader.
2. We need to use OpenGL to render the game scenes.
Is it possible to do both at the same time?
If we render the video outside the viewport, will the shader be executed?

Thanks!

If you look at the code for the case where the dot is displayed on the screen, you'll see that the color tracking filter is being applied to an offscreen framebuffer that is never displayed. The video feed image is displayed using OpenGL ES independently of the color filtering. Shaders are executed when rendering a scene, whether the scene is sent to the screen or to an offscreen render buffer.

So, yes, you can do the offscreen color filtering at the same time as displaying traditional OpenGL ES elements to the screen. However, every bit of offscreen processing you do will shave off a little GPU power, depending on its complexity, so your visual rendering may slow down a bit.

Thanks!

Great Post ! Thank you a lot for sharing it.

I'm in trouble because I can't implement a Kuwahara filter in GLSL.
I've found a version online, but it changes the orientation / dimensions of the image (it is very fast, but I don't understand how it works or why it changes the dimensions, orientation, etc., also in Quartz Composer).

http://code.google.com/p/gpuakf/source/browse/trunk/glsl/kuwahara.glsl?s...

any help?

Great article.

I am curious why your framebuffer, as defined by FBO_WIDTH and FBO_HEIGHT, is set to half the 640 x 480 resolution of the camera. Is this to improve performance?

I was tweaking the code and noticed that if I use framebuffers at the same resolution, the aspect ratio is all messed up. Any idea why?

Thanks

Mike

If I remember correctly, the reason I limited the size of the FBO there was because I still had to read the pixels back onto the CPU to do the color averaging. I hadn't completed a shader-based implementation of that color averaging, so I needed to reduce the number of pixels so that the on-CPU averaging didn't slow things down too much. With a shader-based averaging implementation, I bet that size restriction could be raised (even the iPhone 4S offsets its 1080p video capture with a massive step up in shader performance).

I remember screwing around with different settings to make sure that the images were properly rotated and scaled for display, so without trying it out I can't tell you right away what could be the source of the odd aspect ratio in your tests. You might just have to try tinkering with parameters to see what effect they have on the final image.

Hi Brad,

thanks for your response. It turns out the aspect ratio issue was my own bug, where I had transposed the horizontal and vertical resolutions in one of the steps.

How were you rotating your video streams?
I want to support different orientations of the iPhone; however, this rotates the video stream, which is not what I want (I essentially just want to show the video in the same orientation as the handset). Hence, I have been using the transform property of the UIView that the stream is in. Unfortunately, this has some side effects when you zoom into it. Were you using any different methods? Ideally, I would prefer to just lock the orientation of the UIView with the video stream and allow the rest of the app to rotate.

Thanks

Mike

Hello, I am very surprised to read your blog. I am now studying GPU programming on the iPhone, but I have no idea how it works; it has confused me for a long time. I want to send out the data, but how does it work? Can you give me more resources?

Wow! It's wonderful, I love this post! Please keep posting this kind of advanced and new technology. It rocks!
I want to capture a 30-second fixed-length video in my app. Can you please help me? I asked many people and also tried Google, but I didn't find a solution. I believe that you are the right person to solve my problem.
I will be waiting for your reply.
Thanks in advance.

Thanks for sharing this, it is really helpful to get started quickly.

You can make the centroid calculation 5 times faster by summing the values as integers, and only converting to float after the loop is finished. (The original was ~60 ms; the integer version is ~13 ms per calculation.)

Using just a little ARM NEON tuning, you can get to 8.5 ms. I think that if you found the optimal way to sum the image using NEON instructions, this number could go down to 4 ms or lower.

(measured on an iPhone 4)

Yes, I had a much better Accelerate framework version of something like this in another application and it was significantly faster. I just haven't bothered to update this with anything more performant.

The best route would still be a multipass color averaging using a box blur shader that reduced the image iteratively by 4X each time. This is how Core Image gets these color average values for an image, and would be even faster than an optimized NEON version. I'm working on something like this for a project I hope to announce soon.

As an update to this, I finally worked out a GPU-based means of whole-image color averaging, which is present in my GPUImage framework. See the ColorObjectTracking example, which is the above sample application rebuilt for GPUImage and which now uses this new GPU-accelerated averaging routine. It's much, much faster and can now track objects at a solid 30 FPS on an iPhone 4.

Hi Brad,

I looked at your ColorTracking sample application and, since I am a beginner in OpenGL, could you please tell me how I could write new shaders for an iPhone application? Do you know a good tutorial or book about OpenGL and GLSL so I could learn to write my own shaders? I am developing an iPhone application that would record video and apply live filters to the video, such as Black & White, Sepia, and so on.

Your help would be greatly appreciated.

Thank you,
Levi

On iTunes U, you can find video for a class I did on OpenGL ES 2.0 as part of my Advanced iPhone Development course. I explain the basics of shaders and GLSL there. My course notes for that class session also include links to many other resources that I found to be of value in this area.

Hello again,

Thank you very much for your help; I managed to implement what I wanted thanks to the videos and resources that you provided. I have another question regarding video recording. I have to record and store videos with these filters applied, and I don't know how to do this, because video recording cannot be achieved with the AVCaptureVideoDataOutput class. Please point me in the right direction: how could I implement this while using OpenGL to apply filters to the video?

Thank you again for your help.

Best regards,
Levente Sujanszky

In the meantime I managed to record video using the AVAssetWriter class, but it recorded it without the applied filters (Black & White, Negative, Sepia). I saw that in processNewCameraFrame you pass the imageBuffer and create the new texture using the shaders. How could I record the video with the filters applied to it?

Thank you for your much appreciated help.

Levente Sujanszky

Hi Levi Sujanszky,
I am building a video application like the one you asked about in this blog. Please help me; I am stuck with this problem...

Master of image processing, thanks for the really good articles.

Can I use the GPUImage framework to play a raw video file with a .h264 extension on iOS?

What I need is a hardware-accelerated GPU decoder to decode the frames from the file and play them in a player. Can you please give me a heads-up on how to do it?

Thanks,

Hi Brad,

Your example application was very useful to me. But I have a problem: I'm pretty new at this, and I want to change the white color to red. The white color I'm referring to is the one you used in the threshold method to replace the selected color. How can I change this to red? Thanks in advance.

Regards

Awesome, great

Really nice framework. Do you know if somebody has ported it to work with Android and the NDK? How hard might it be to port it to Android? It would be nice to be able to target both platforms simultaneously.

hi,

Your tutorial is very helpful to me. It's a great tutorial.
But now I want to add new filters like rainbow, fade-in & out, snowfall, fast motion, slow motion, etc. How can I do that? Please help me!
