Performance analysis: memory, CPU usage, startup time

Relevant resources

Documentation

Web

"How To Make Your iPhone App Launch Faster" by James Thomson
"Cranking Floating Point Performance To 11 On The iPhone" by Noel Llopis
"Tuning Cocoa Applications Using DTrace: Writing Scripts" by me
"Tuning Cocoa Applications Using DTrace: Custom Static Probes and Instruments" by me

Video

"Session: Secrets of iPhone performance optimization" by Alberto Araoz
WWDC 2009 Session 413: Performance Tuning with Shark on Mac and iPhone
WWDC 2009 Session 414: iPhone Performance Optimization with Instruments
WWDC 2009 Session 415: Optimizing Performance on iPhone
iPhone Tech Talk 2009: Maximizing iPhone App Performance

Code

Performance optimization is a vital part of writing applications for a mobile device. While these devices are now more powerful than what many of us had sitting on our desks years ago, and are getting faster every few months, they are still resource-constrained. Even more importantly, applications on a mobile device need to be fast simply because they will be used in shorter sessions, and sometimes waiting even a few seconds is unacceptable when someone is walking down the street.

General performance tips

Do not use the Simulator for any kind of performance tests on your application. The Simulator is running on a Mac, a system that has a very fast, probably multicore processor, a fast GPU, and loads of RAM backed by virtual memory. The Simulator is also layered on top of native Cocoa and Core Foundation frameworks, which may not have the same performance characteristics or even capabilities as the ones present on the iPhone OS devices. I will show one case where you need to perform tests only in the Simulator, but that is a rare exception.

Do not make any assumptions. Test everything. I once thought a slowdown in my application was due to SQLite loading a large quantity of data, so I spent a week optimizing SQLite performance to no gain. I then ran the application for ten minutes in Instruments and found that my slowdown was in an unrelated string-handling routine I could fix in a half hour. Making assumptions can cause you to waste time and/or ignore even larger problems in your application.

Fix crashes before anything else. While this is more of a debugging-related tip, crashes can also be caused by out-of-memory conditions. If your application runs out of memory, it will be killed by the system, which appears as a crash to the user. Fix this before you do anything else. Crashes leave a terrible impression with the user, can cause lost data, and can lead to them not trusting your application.

Optimize actions that users notice first. Don't spend a lot of time at first optimizing something that will only get you a few milliseconds of speedup, or something that runs infrequently. Find the areas that detract from the user experience, such as startup time or common button presses that cause slow actions, and work on reducing their running time first. Instruments and Shark, which we'll cover later, will help you identify where to focus first.

Provide progress indicators and animation to make things feel faster, even if they aren't. Most times, what matters most in your application is not the wall-clock time it takes to perform an action, but how fast your application feels to the end user. If you have an operation that takes a long time to perform, add a progress bar to give feedback on the process and your users will think your application has become much faster. If you need to go from one state to another over a period of time, animate it to hide loading times. I've done this in my applications, and users have written me to thank me for making my application faster. I did not improve the actual speed at which the application did something, but my users thought I did, and that's what matters. For another example, think about how this proposed redesign of stoplights would make those lights seem to take less time than they currently do.

Memory management

Back when we talked about understanding Cocoa, we discussed the memory management model of Cocoa and how it was applied to the iPhone. It is critical that you familiarize yourself with the memory management conventions we discussed, because almost all of the memory problems encountered on the iPhone are due to simple programming errors where these conventions were not followed.

On the original iPhone models, your application could use about 24 MB of RAM before it started getting memory warnings, with it being hard-killed at 30 MB of RAM usage. The newer models have more memory to work with, but if you want to be able to target these older devices for the greatest market size, you will need to account for these limits. If at all possible, you need to test your application on one of these older devices before release to verify that it will still function. Anecdotal reports seem to show that applications have been crashing for memory-related issues more frequently after the launch of the iPhone 3G S because some developers are only testing on these newer devices.

If your application is sent memory warnings, the -didReceiveMemoryWarning method will be triggered in your UIViewControllers. The default implementation will deallocate views that are not visible, which may cause problems if your view controllers are within a UITabBarController. This method gives you a chance to clear out caches or any unnecessary data, which may relieve some of the memory pressure in your application.

Memory leaks are a significant concern on any platform, particularly the iPhone. A memory leak is an object (or other structure) that has been allocated in memory and never deallocated, even though the application no longer holds any pointer to it. For example, if a string is allocated and assigned to a variable, and then another string is assigned to that same variable, the first string will be leaked because the application no longer has any way to access it, yet it is still in memory. Memory leaks, no matter how small, can cause serious problems if they are allowed to build up over time as the application runs.
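The string example above can be reproduced in plain C, which compiles unchanged inside an Objective-C file. This is only a minimal sketch: tracked_strdup and tracked_free are hypothetical helpers invented here to make the leak countable, and are not part of any Apple API.

```c
#include <stdlib.h>
#include <string.h>

/* Number of heap strings currently allocated and not yet freed. */
static int live_strings = 0;

/* Hypothetical helper: copies a string onto the heap and counts it. */
static char *tracked_strdup(const char *text)
{
    size_t length = strlen(text) + 1;
    char *copy = malloc(length);
    if (copy != NULL) {
        memcpy(copy, text, length);
        live_strings++;
    }
    return copy;
}

/* Hypothetical helper: frees a string and updates the count. */
static void tracked_free(char *string)
{
    if (string != NULL) {
        free(string);
        live_strings--;
    }
}

/* Leaky: the second assignment overwrites the only pointer to the first
   string, so that string can never be freed. Returns the number of
   strings this function left behind on the heap. */
static int leaky_reassignment(void)
{
    int before = live_strings;
    char *title = tracked_strdup("first");
    title = tracked_strdup("second");   /* "first" is now unreachable */
    tracked_free(title);
    return live_strings - before;
}

/* Fixed: release the old value before the pointer is reused. */
static int careful_reassignment(void)
{
    int before = live_strings;
    char *title = tracked_strdup("first");
    tracked_free(title);
    title = tracked_strdup("second");
    tracked_free(title);
    return live_strings - before;
}
```

Calling leaky_reassignment() leaves one string stranded on every call, while careful_reassignment() leaves none. The same reasoning applies to a retained Objective-C object whose only reference is overwritten before it is released.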

We previously described the Clang Static Analyzer, which is now integrated with Xcode 3.2+, and how it can be used to identify instances where crashes may occur in your application due to overreleased objects. It can also point out memory leaks, where objects have not been released enough. The analyzer is not perfect, in that it tends to compartmentalize methods (it doesn't see outside of a method or how it interacts with the rest of the application) and it doesn't track retain counts on instance variables (which can be a significant source of hard-to-track-down leaks). Still, it can provide a build-time check of the cleanliness of your code.

If you can, avoid using autoreleased objects in your application. Autoreleased objects can make memory management more convenient, but they can lead to problems. By default, autoreleased objects will only be deallocated at the end of the runloop, when the application-wide autorelease pool is drained. If you are autoreleasing large objects, or many small ones within a loop, your application's memory usage can spike as these autoreleased objects build up in RAM. If you try to create your own autorelease pools and drain them, you add overhead and can cause your application to stutter as the pool is being drained. By avoiding autoreleased objects, you can guarantee when objects will be deallocated and avoid the overhead of secondary autorelease pools.
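To see why deferred releases raise peak memory, here is a plain-C model of the effect. It is not how Cocoa implements autorelease pools: DeferredPool, pool_add, and pool_drain are hypothetical names, and the pool simply remembers pointers and frees them all when drained, the way the application-wide pool releases its objects at the end of a runloop pass.

```c
#include <stdlib.h>

#define POOL_CAPACITY 1024

/* Hypothetical stand-in for an autorelease pool. */
typedef struct {
    void  *pending[POOL_CAPACITY];
    size_t count;
} DeferredPool;

static size_t live_blocks = 0;  /* blocks allocated and not yet freed */
static size_t peak_blocks = 0;  /* high-water mark of live_blocks */

/* Allocates one small temporary and updates the counters. */
static void *make_temporary(void)
{
    void *block = malloc(64);
    if (block != NULL) {
        live_blocks++;
        if (live_blocks > peak_blocks) peak_blocks = live_blocks;
    }
    return block;
}

/* Frees a temporary right away, like an explicit -release. */
static void release_now(void *block)
{
    if (block != NULL) {
        free(block);
        live_blocks--;
    }
}

/* Like -autorelease: remember the block and free it at drain time.
   If the model pool is full, fall back to freeing immediately. */
static void pool_add(DeferredPool *pool, void *block)
{
    if (pool->count < POOL_CAPACITY) pool->pending[pool->count++] = block;
    else release_now(block);
}

/* Frees everything the pool has collected. */
static void pool_drain(DeferredPool *pool)
{
    for (size_t i = 0; i < pool->count; i++) release_now(pool->pending[i]);
    pool->count = 0;
}

/* Peak number of live temporaries when every release waits for the drain. */
static size_t peak_with_deferred_release(size_t iterations)
{
    DeferredPool pool = { {0}, 0 };
    live_blocks = peak_blocks = 0;
    for (size_t i = 0; i < iterations; i++) pool_add(&pool, make_temporary());
    pool_drain(&pool);
    return peak_blocks;
}

/* Peak number of live temporaries when each one is freed inside the loop. */
static size_t peak_with_immediate_release(size_t iterations)
{
    live_blocks = peak_blocks = 0;
    for (size_t i = 0; i < iterations; i++) release_now(make_temporary());
    return peak_blocks;
}
```

For a loop of 500 iterations, the deferred version holds all 500 temporaries at once before the drain, while the immediate version never holds more than one. Both end with nothing left allocated; the difference is only in the peak.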

As we mentioned when talking about OpenGL ES, textures and other geometry data sent to the GPU can also take up memory. On the older devices, you have a limit of 24 MB of memory that can be accessed contiguously, but it is possible to use more memory than this. Because the GPU and the OS share RAM, it is possible to exhaust system memory this way and lead to application termination. Follow the techniques we described in that previous class to alleviate this, such as texture compression and reducing geometry size.

Memory issues are your own responsibility. Any application that tells the user to reboot their device before running their application to provide maximum performance is doing it wrong.

Reducing startup time

One critical section of your application to tune is what happens on startup. Because iPhone applications will be started repeatedly, trimming even a second off of the time users have to wait on startup could significantly increase the usability of your application.

The order of execution of your application is as follows:

Your Default.png image is loaded as a representation of the initial state of the application (sometimes this is a splash screen instead). This image is animated onto the screen when the application icon is touched.
The code in main.m is executed.
If your application uses a main Nib file, it is loaded into memory, which triggers your application delegate to be loaded as well.
If your application does not use a Nib file, your application delegate is initialized. It must complete the construction of the user interface.
Either -applicationDidFinishLaunching: or -application:didFinishLaunchingWithOptions: will be called within your application delegate to complete your application startup.

Your Default.png will disappear from the screen and be replaced with your application interface, and your application will become responsive to user input, only when -applicationDidFinishLaunching: or -application:didFinishLaunchingWithOptions: has completed. Therefore, your goal should be to make your application hit the ends of these methods as soon as it can.

The single biggest improvement you can make in this process is to defer loading or initializing anything not critical to the immediate display of the interface. As pointed out by James Thomson in his article "How To Make Your iPhone App Launch Faster", you can take this to an extreme by only throwing up a minimal UI or placeholder image, displaying a loading indicator on top of it, and performing the remainder of your application setup in a method triggered by -performSelector:withObject:afterDelay: with a delay of 0.001 seconds. Your application delegate will complete its startup, the loading indicator will animate, and the remainder of your startup processing will take place on the next pass through the runloop.
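The ordering that this deferral relies on can be shown with a small plain-C model. This is a sketch of the idea only, not of UIKit: perform_later stands in for -performSelector:withObject:afterDelay:, run_loop_pass stands in for one pass through the runloop, and every name here is hypothetical.

```c
#include <stddef.h>

#define MAX_DEFERRED 8

typedef void (*DeferredTask)(void);

/* Hypothetical stand-in for the runloop's queue of delayed work. */
static DeferredTask deferred_tasks[MAX_DEFERRED];
static size_t deferred_count = 0;

static int interface_visible = 0;  /* placeholder UI is on screen */
static int heavy_setup_done = 0;   /* remaining startup work has run */

/* Queue a task for the next runloop pass. Returns 0 if the queue is full. */
static int perform_later(DeferredTask task)
{
    if (deferred_count == MAX_DEFERRED) return 0;
    deferred_tasks[deferred_count++] = task;
    return 1;
}

/* One pass through the model runloop: run everything queued before the
   pass started, and keep anything queued while it was running. */
static void run_loop_pass(void)
{
    size_t pending = deferred_count;
    for (size_t i = 0; i < pending; i++) {
        deferred_tasks[i]();
    }
    size_t remaining = deferred_count - pending;
    for (size_t i = 0; i < remaining; i++) {
        deferred_tasks[i] = deferred_tasks[pending + i];
    }
    deferred_count = remaining;
}

/* Everything that is not needed to draw the first screen. */
static void load_remaining_data(void)
{
    heavy_setup_done = 1;
}

/* Model of the launch method: show a minimal interface, queue the rest,
   and return immediately. */
static void application_did_finish_launching(void)
{
    interface_visible = 1;
    perform_later(load_remaining_data);
}
```

After application_did_finish_launching() returns, interface_visible is set but heavy_setup_done is still 0. The expensive work only runs once run_loop_pass() is called, which is the point at which the real application would already be showing its loading indicator.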

Loading data lazily is one reason why using Core Data and its NSFetchedResultsController can save you a lot of time on startup, because only the data that need to be on the screen right now are loaded from disk. Apple showed an 80% reduction in startup time using a large database on the iPhone when migrating from SQLite to Core Data.

To time the launch process, you will want to declare a global variable to hold the start time

CFAbsoluteTime applicationStartTime;

and grab the start timestamp in main.m:

int main(int argc, char *argv[])
{
    applicationStartTime = CFAbsoluteTimeGetCurrent();
    NSAutoreleasePool *pool = [NSAutoreleasePool new];
    UIApplicationMain(argc, argv, nil, @"MyAppDelegate");
    [pool release];
    return 0;
}

At the very end of -applicationDidFinishLaunching: or -application:didFinishLaunchingWithOptions:, you will want to grab the current time and compare the two to determine the total launch time. Because the variable is defined in main.m, it needs an extern declaration (extern CFAbsoluteTime applicationStartTime;) in the application delegate's source file:

CFAbsoluteTime elapsedTime = CFAbsoluteTimeGetCurrent() - applicationStartTime;
NSLog(@"Elapsed time for application launch: %f", elapsedTime);

The elapsed time is reported in seconds, as a floating-point value.

However, to get an accurate reading for the startup time on the device, you will want to run your application while it is not attached to the debugger through Xcode. When your application is started using Build | Build and Run, it is slowed down as the debugger is attached to it. In fact, Xcode disables the normal watchdog timer that kills your application after 20 seconds of being unresponsive. This is described in Technical Q&A QA1592: "Application does not crash when launched from debugger but crashes when launched by user", along with the amusing fact that the error code for such a timeout is 0x8badf00d.

To get an accurate reading, run the application outside of Xcode a few times, with the startup time being logged to the console as in the above. Then, go to the Organizer, connect to the device, and find the console logs under the Console tab. You should see the logged startup times there, as the application is actually used on the device.

We'll look at a DTrace script later that will let you get even more precise information about the order in which your application executes tasks during startup, as well as the relative times spent in each task.

Build settings

There are a few compiler settings to be aware of, as they can affect your application's performance. The first is whether to use the Thumb instruction set. The iPhone supports two instruction sets: the native ARM set, and the more compact Thumb. By default, your applications will be built to generate Thumb instructions, because this can reduce your application size by up to 35%, speeding up application launch and reducing the memory used by your application. It also means more cache hits for instructions, because of their smaller size.

However, the Thumb instruction set on the original iPhone OS devices could not handle floating point operations, so it had to kick back into ARM instructions. This added overhead and slowed these operations down. Therefore, if you have a floating-point-intensive application, like an OpenGL ES game, you may want to switch to building using the ARM instruction set. This is as simple as unchecking the Compile for Thumb build option.

One thing to note is that the newer devices, such as the iPhone 3G S, third generation iPod touch, and iPad, all support Thumb-2 instructions. Thumb-2 instructions share all the size advantages of Thumb instructions, but do not have the same floating point slowdowns as Thumb instructions, so Apple recommends that you always leave Compile for Thumb on for these newer machines.

For floating-point-intensive applications, you can make the Thumb build setting conditional so that it is off for older devices and on for newer ones. To do this, go to your build settings and select the Compile for Thumb option. Go to the menu at the bottom-left of the screen and choose the Add Build Setting Condition option. In the new build setting condition, choose ARMv6 for the architecture, turn off Thumb for it, add another condition, choose ARMv7 for its architecture, and enable Thumb for it. These conditions can also be used elsewhere to tweak optimizations for specific platforms.

The original iPhone OS devices perform single-precision floating point calculations slightly faster than double-precision (the default when specifying floating-point values), so I tend to use floats instead of doubles where I can, and I tack on an f at the end of floating point constants. The iPhone 3G S and later models have a NEON SIMD unit that makes single-precision calculations much faster, as they can be performed in parallel. For more on the topic, see the question "Double vs float on the iPhone" on Stack Overflow.
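The reason for the f suffix is a C language rule rather than anything iPhone-specific: an unsuffixed constant such as 0.5 has type double, so multiplying a float by it promotes the whole expression to double precision. The following sketch makes that visible using sizeof on the expression type; the function names are made up for illustration.

```c
#include <stddef.h>

/* 0.5 is a double constant, so the float operand is promoted and the
   product has type double. sizeof inspects the type without evaluating. */
static size_t product_size_with_double_constant(float value)
{
    return sizeof(value * 0.5);
}

/* 0.5f is a float constant, so the product stays in single precision. */
static size_t product_size_with_float_constant(float value)
{
    return sizeof(value * 0.5f);
}

/* Computed in double precision, then converted back to float on return. */
static float halve_in_double(float value)
{
    return value * 0.5;
}

/* Computed entirely in single precision. */
static float halve_in_float(float value)
{
    return value * 0.5f;
}
```

Both halving functions return the same value for ordinary inputs. The difference is that the first one asks the processor for a double-precision multiply plus two conversions, which is the work the suffix avoids.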

The two different classes of iPhone OS devices out there support different ARM instruction sets. The older devices only support ARMv6 instructions, whereas the newer ones also support ARMv7. In order to create a "fat" binary that contains code for both types of devices, make sure that your Architectures build option is set to Optimized (armv6 armv7). This will make an application that uses the optimal instruction set for whatever device it runs on.

Instruments

Apple has provided a number of tools for examining the performance of your application while it is running on the actual device hardware. The most important of these is Instruments, which provides a graphical environment for recording performance data and analyzing it. Various aspects of your running application can be examined using separate tools called "instruments". Each instrument focuses on one aspect of your application, and they can be combined to perform multifaceted analyses of your application as you use it in the Simulator or device. These instruments include:

Sampler - This samples all running threads in your application at a regular interval, providing a statistical representation of the heaviest areas of execution. The call tree for messages and functions can be inverted to instantly gauge the hotspots in your code. The samples can be filtered to only display Objective-C methods or hide system libraries. If there is a system library function that is a hotspot, but it is outside of your code, you can right-click on the symbol and choose Charge Symbol to Caller or Charge Library to Callers. These will take the time or number of samples accrued by the selected functions and add them to the amount recorded for any functions or methods that called them. This can help you identify where your code calls into expensive libraries.
CPU Monitor (device only) - This monitors the use of the CPU by various applications on the system, including your own.
Time Profiler (Simulator only) - New in Snow Leopard, this instrument brings over some of the functionality of Shark (which we'll talk about later). It is very similar to the Sampler instrument, in that it halts the application periodically to measure what is running at that instant. However, it gathers data from kernel space, not user space, so it is more efficient than Sampler, and it can also gather data from all running processes.
File Activity - This tracks files that your application loads from, saves to, or otherwise touches. It can be useful to see not only what your application is accessing, but when.
Directory I/O - Like File Activity, this instrument monitors directory interactions.
Reads / Writes - This tracks all reading and writing activity involving the filesystem, letting you see where your application may be having expensive interactions with the disk.
ObjectAlloc - This tracks the objects that have been allocated by your application, both those currently in memory and the ones that have been deallocated. Memory allocation is one of the most expensive operations your application can undertake, so it is useful to know where and how much of this is occurring. One thing to note: the memory usage reported in ObjectAlloc is not the total memory usage of your application. For that statistic, you will need to go to Memory Monitor.
Memory Monitor - This provides a system-level view of memory consumption on the device, letting you view the memory usage for each running process, including your own application. If you select the option Track inspection head, you can observe the actual size of your application in memory at any point in its execution.
Leaks - This instrument does regular passes through your application's memory and identifies inaccessible objects or structures. These are considered memory leaks, and are reported back. You can trace a leak using the detail view on the right of Instruments. Note that this will not catch all leaks, so it is best to use this with other tools, like the Clang Static Analyzer and the Memory Monitor instrument.
Core Animation (device only) - This instrument gives you a sampled framerate for your application, letting you test the performance of Core Animation and other drawing operations. More importantly, this instrument gives you the ability to highlight areas of your interface so that you can optimize compositing or track down alignment errors. When the Core Animation instrument is selected, several options appear on the left side of Instruments. These include Color Blended Layers, which will color opaque layers green and transparent layers red; Color Misaligned Images, which will color any view or layer that is not pixel aligned; Color OpenGL Fast Path Blue, which will show you if your OpenGL rendering area is optimized; and Flash Updated Regions, which will flash in yellow any area that is redrawn on the screen. Because these are done in realtime, you can debug and optimize your interface as you use it.
OpenGL ES (device only) - This provides detailed benchmarking of OpenGL ES in your application, including many GPU-specific statistics. To alter which statistics are gathered, click on the information button on the right side of the instrument label. For example, the Tiler Utilization statistic generally indicates whether you are being limited by the size of your geometry in the GPU. If that is maxed out, you need to reduce your geometry size to increase your framerate.
Core Data Saves, Fetches, Faults, and Cache Misses (Simulator only) - There are a number of Core Data instruments which can probe the performance of your Core Data store at runtime, but unfortunately these instruments only work in the Simulator. This is because they require DTrace to run, and DTrace is not yet present on the iPhone itself.
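The Charge Symbol to Caller option described for Sampler above can be easier to follow with a toy model. The sketch below is not how Instruments is implemented: it assumes a made-up set of five stack samples and counts how many samples each function is responsible for, before and after a library function is charged to whatever called it. All names and data are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* One sample: the call stack at the instant the profiler fired,
   outermost caller first. Unused slots are NULL. */
typedef struct {
    const char *frames[MAX_DEPTH];
} StackSample;

/* Made-up samples for illustration only. */
static const StackSample demo_samples[] = {
    { { "main", "parseFeed", "strlen" } },
    { { "main", "parseFeed", "strlen" } },
    { { "main", "parseFeed" } },
    { { "main", "drawChart", "strlen" } },
    { { "main", "drawChart" } },
};
static const size_t demo_sample_count =
    sizeof(demo_samples) / sizeof(demo_samples[0]);

/* Counts the samples attributed to `symbol`. A sample belongs to its
   innermost frame, except that frames named `charged` are skipped so
   their samples fall through to the caller. Pass NULL for `charged`
   to get the plain, uncharged counts. */
static size_t samples_charged_to(const StackSample *samples,
                                 size_t sample_count,
                                 const char *symbol,
                                 const char *charged)
{
    size_t hits = 0;
    for (size_t i = 0; i < sample_count; i++) {
        size_t depth = 0;
        while (depth < MAX_DEPTH && samples[i].frames[depth] != NULL) {
            depth++;
        }
        /* Walk outward past any innermost frames that are being charged. */
        while (depth > 0 && charged != NULL &&
               strcmp(samples[i].frames[depth - 1], charged) == 0) {
            depth--;
        }
        if (depth > 0 && strcmp(samples[i].frames[depth - 1], symbol) == 0) {
            hits++;
        }
    }
    return hits;
}
```

With nothing charged, strlen owns three of the five samples and looks like the hotspot. After charging strlen to its callers, parseFeed owns three samples and drawChart owns two, which points at the code you can actually change.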

You can bring up an extended detail view on the right of the application by either clicking on the button at the bottom of the screen or by selecting the menu option View | Extended Detail. This extended detail view displays instrument-specific information, such as stack traces or detailed statistics.

You can start your application with Instruments from within Xcode using the menu option Run | Run with Performance Tool and then selecting one of the listed instruments. These are only a subset of the instruments you can run against your application, so you can also go to Instruments manually, drag instruments to the list on the left and either start or connect to your application. To start your application, use the Default Target pulldown and choose either your computer or your attached device. From there, you can either select Attach to Process or Launch Executable, then pick an application. To start recording, click the red record button. To stop, either exit the application or click the record button again.

Multiple runs can be performed within the same Instruments document, allowing you to compare before-and-after performance for optimizations you might be testing out. To see previous runs, click on the disclosure arrow to the left of the instrument name in the top panel.

Results can be filtered in a number of ways. If you'd like to only see results from a certain range of time in your application, move the scrubber to the beginning of this block of time and click the left button above the Inspection Range group in the toolbar. Move the scrubber to the end of the range you want to profile and click the right button in that group.

You can also filter symbols by name using the search bar at the bottom of the window.

If you double-click on one of the methods in your code within an instrument like Sampler, you will be taken to your code, where the hot lines within that method will have percentage execution times listed to their right.

New in iPhone OS 3.1 is the ability to use Instruments over WiFi. This is only really useful when you want to test out an application that interacts with a dock connector accessory through the External Accessory framework, because the accessory occupies the connector that the USB cable would otherwise use.

DTrace

In Leopard, Apple incorporated a technology from Sun Microsystems called DTrace. DTrace is an extremely powerful profiling framework that lets you examine in detail the inner workings of your applications, others on the system, or even the kernel itself. It operates through probes that have been placed throughout the system, which effectively turn into no-ops when you are not running a DTrace script. When you are, you can obtain realtime data on the execution of specific methods, functions, or many other types of actions.

DTrace can even be used to construct custom instruments for Instruments.

Unfortunately, DTrace is not yet incorporated into iPhone OS on the devices, so you cannot run any scripts or custom instruments against applications running on a device. This is why we can't use the Core Data instruments on the device. However, DTrace scripts can still be used to answer valuable questions about the performance of your application, even when it is just running in the Simulator.

If you are curious about the topic, I highly recommend reading my two-part article for MacResearch "Tuning Cocoa Applications Using DTrace: Writing Scripts" and "Tuning Cocoa Applications Using DTrace: Custom Static Probes and Instruments".

Briefly, you can write scripts using the D language to answer specific questions about the functioning of your application or the system overall. These scripts look something like the following:

#pragma D option quiet

objc$target::-drawRect?:entry
{
    start[probemod] = timestamp;
}

objc$target::-drawRect?:return
{
    printf("%30s %10s Execution time: %u us\n", probemod, probefunc, (timestamp - start[probemod]) / 1000);
}

This particular example times the execution of -drawRect: in any class that implements it, printing the class name, the method name, and the elapsed time in microseconds to the console.

To start a script against a running application on your Mac, you can use the following command-line:

sudo dtrace -s dtracedrawrecttiming.d -p[pid]

To start an application and run a script against it, use the following command:

sudo dtrace -s dtracedrawrecttiming.d -c [Path to executable]

DTrace is particularly useful for determining what happens during your application's startup phase. It can be used to log all Objective-C messages in order received in your application, and the times for each method execution. I've provided a custom instrument in the DTraceStartup example that does this, logging the methods and their time of execution into Instruments, along with a stack trace so that you can identify how that method was called.

Shark

An older tool than Instruments, Shark doesn't have the polish of that application, yet it is still a very valuable tool for tracking down performance bottlenecks. I tend to use Shark along with Instruments to provide a second opinion or a different perspective on what might be slowing my application down.

The name for the tool suggests its purpose: to sniff out and hunt down any hotspots in your code. It is a very minimal, focused application.

You can run Shark against your application in the Simulator or on the device. Running against the device is a little more involved. For that, you need to start Shark, and select the Sampling | Network / iPhone Profiling menu option. This should cause the main interface to show a list of connected devices. Select one to target by clicking on the checkbox to its left.

You can perform a couple of different measurements. These include:

Time Profile - Performs a statistical sampling of a running application or all running processes on the system.
Time Profile (All Thread States) - This is the same as the Time Profile, only it also takes into account time spent idling by threads.
System Trace - This performs a system-wide trace of all running processes and kernel events. This can be really useful for viewing the states of threads, as well as context switches and locks.

To perform a Time Profile sampling on a connected iPhone, first start the application you would like to target on the device. Then, choose Time Profile from the list of options in the Config column at the bottom of the Shark main window. Choose the running application from the list under Target. Finally, click the Start button. Run for a little while and click the Stop button.

Shark will analyze the samples and present them to you in a bottom-up tree view. This shows the heaviest called methods (where the most time was spent in your application). You can invert this and show the actual call tree by selecting the Tree (Top-Down) menu option from the pull-down menu in the bottom right hand corner of the window.

Similar to Instruments, you can data mine this tree view by right-clicking on an element and choosing to charge a symbol or library to its callers. If you simply want to ignore a particular element, you can choose to remove callstacks with that symbol in them.

Double-clicking on a method from your own code will bring up that code in Shark. You might see little exclamation points that indicate some helpful hints that Shark may have about ways to optimize your application. To drop down to an even lower level, you can view the assembly code for the corresponding Objective-C by choosing the Assembly or Both button in the upper-right of the code window. If you're curious about a particular assembly instruction, you can click on the Asm Help button in the lower right, which will take you to the ARM assembly code manual page that describes the selected code.

You can expose many useful settings by bringing up the Advanced Settings view by selecting the Window | Show Advanced Settings menu option.