http://blog.csdn.net/innost/article/details/9008691
Published with the author's authorization. Written by a talented young engineer at Tieto.
An Introduction to Android System Performance Tuning Tools
In software development, many readers have presumably run into system performance problems. The major steps for solving a system performance problem are:
- Evaluation: run a large number of targeted tests on the system to obtain the relevant test data.
- Bottleneck analysis: analyze the test data to find the hotspots (i.e. the bottlenecks).
- Optimization: optimize the code related to the hotspots.
As these steps show, the target of performance optimization is the hotspot. If the hotspot we find is not actually hot, the optimization effort will yield little or nothing. So, as the first part of this Android performance tuning series, this article introduces three important performance measurement tools on the Android platform that help developers find real hotspots.
1 TraceView
1.1 TraceView Overview
TraceView is an Android-specific data collection and analysis tool, used mainly to find hotspots in Android applications. TraceView itself only analyzes data; the data must be collected either with the Debug class in the Android SDK or with the DDMS tool. The two approaches are used as follows:
- The developer calls Debug.startMethodTracing before a critical code section begins and Debug.stopMethodTracing when it ends. In between, the execution of every function in the application (note: Java threads only) is recorded, and the collected data is saved to a file under /mnt/sdcard/. The developer then uses the TraceView tool in the SDK to analyze that file.
- Use the DDMS tool in the Android SDK. DDMS can collect function call information from any running process in the system, which is useful when the developer does not have the source code of the target application. Figure 1-1 shows how to use TraceView from DDMS.
Figure 1-1 Using TraceView in DDMS
Clicking the button shown in Figure 1-1 starts collecting data from the target process. When collection is stopped, DDMS automatically launches the TraceView tool to analyze the collected data.
Below, we walk through the use of the Debug class and TraceView with a sample program.
1.2 TraceView Sample Analysis
The sample program's interface is shown in Figure 1-2:
Figure 1-2 Sample Interface diagram
In Figure 1-2:
- SystraceDemoStringAAA is the string the ListView initially displays when the TraceViewDemo program starts.
- When the user taps an item in the ListView, the program computes the MD5 of the string currently displayed by that item 40 times in a row, then replaces the item's text with the resulting MD5 string. The effect is indicated by the "MD5 value" arrow in Figure 1-2.
The key code of the sample is shown in Figure 1-3:
Figure 1-3 Sample Code
Figure 1-3 shows the following (a hedged sketch of this code appears after the list):
- On the left, Debug.startMethodTracing and Debug.stopMethodTracing are called in MainActivity's constructor and onDestroy method, respectively.
- In onCreate we plant the first hotspot, the getStringsToShow function. It parses an XML file and stores the parsed strings in mListItem for the ListView to display.
- On the right, when the user taps an item in the ListView, the program computes the MD5 value 40 times in onListItemClick and then shows the result as the new text of the tapped item. The MD5 generation function called there (generateMD5 in the figure) is the second hotspot of this example.
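Since Figure 1-3 is only a screenshot, here is a minimal, hypothetical sketch of the structure it describes. It is not the author's exact code: the trace file name, the getStringsToShow stub, and the md5 helper are illustrative stand-ins for what the text above describes.

```java
import android.app.ListActivity;
import android.os.Bundle;
import android.os.Debug;
import android.view.View;
import android.widget.ArrayAdapter;
import android.widget.ListView;

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class MainActivity extends ListActivity {
    private final List<String> mListItems = new ArrayList<String>();
    private ArrayAdapter<String> mAdapter;

    public MainActivity() {
        // Start method tracing; the .trace file is written under /mnt/sdcard/.
        Debug.startMethodTracing("traceviewdemo");   // file name is illustrative
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        mListItems.addAll(getStringsToShow());       // hotspot 1: expensive, called once
        mAdapter = new ArrayAdapter<String>(this,
                android.R.layout.simple_list_item_1, mListItems);
        setListAdapter(mAdapter);
    }

    @Override
    protected void onListItemClick(ListView l, View v, int position, long id) {
        String s = mListItems.get(position);
        for (int i = 0; i < 40; i++) {               // hotspot 2: cheap, called very often
            s = md5(s);
        }
        mListItems.set(position, s);
        mAdapter.notifyDataSetChanged();
    }

    @Override
    protected void onDestroy() {
        Debug.stopMethodTracing();                   // stop sampling and flush the trace data
        super.onDestroy();
    }

    // Stand-in for the sample's getStringsToShow(); the real code parses an
    // XML file, which is what makes it expensive.
    private List<String> getStringsToShow() {
        List<String> items = new ArrayList<String>();
        for (int i = 0; i < 20; i++) {
            items.add("SystraceDemoStringAAA");
        }
        return items;
    }

    private static String md5(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(input.getBytes("UTF-8"));
            return new BigInteger(1, hash).toString(16);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```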
Now, let's use the TraceView tool to open the resulting trace file, traceviewdemo.trace.
TraceView's interface is fairly complex. Its UI is divided into an upper and a lower panel: the timeline panel and the profile panel. Figure 1-4 shows the timeline panel:
Figure 1-4 TraceView timeline panel
The timeline panel in Figure 1-4 is itself divided into a left and a right pane:
- The left pane lists the threads found in the captured data. In Figure 1-4 the capture covers the main thread, two Binder threads, and other system worker threads (such as the GC thread).
- The right pane shows the timeline, i.e. the function calls made by each thread during the test period, including the function names and their execution times. As Figure 1-4 shows, the row for the main thread is very busy, while the other threads do far less work during this period.
- In addition, the developer can drag the vertical axis in the timeline pane; the top of the axis shows which function the thread is executing at the selected point in time.
Now let's look at TraceView's profile panel, shown in Figure 1-5:
Figure 1-5 TraceView profile panel
The profile panel is the core of TraceView and contains a wealth of information. It shows the details of every function call made by a thread (first select the thread in the timeline panel), including the CPU time used, the number of calls, and so on. This information is the key to finding hotspots, so developers should make sure they understand what each column in the profile panel means. Table 1-1 summarizes the most important columns:
Table 1-1 Profile Panel column function description
Column name | Description
--- | ---
Name | Name of the function called while this thread was running
Incl Cpu Time | CPU time consumed by the function, including the CPU time of the other functions it calls
Excl Cpu Time | CPU time consumed by the function itself, excluding the CPU time of the other functions it calls
Incl Real Time | Wall-clock time of the function in milliseconds, including the time spent in the other functions it calls
Excl Real Time | Wall-clock time of the function in milliseconds, excluding the time spent in the other functions it calls
Calls+Recur Calls/Total | Number of times the function was called, and the percentage of recursive calls in the total number of calls
Cpu Time/Call | Ratio of the function's CPU time to its number of calls, i.e. the average CPU time per call
Real Time/Call | Same as Cpu Time/Call, but measured in wall-clock (real) time
In addition, each time column has a corresponding percentage column (for example, Incl Cpu Time has a companion column named Incl Cpu Time %, which shows the function's inclusive CPU time as a percentage of the total CPU time). As a quick example of inclusive versus exclusive time: if foo() uses 10 ms of CPU time in total and 6 ms of that is spent inside functions it calls, its Incl Cpu Time is 10 ms and its Excl Cpu Time is 4 ms.
Now that we have covered TraceView's UI, let's see how to use it to find hotspots.
In general, hotspots fall into two types of functions:
- Functions that are called only a few times but take a long time on each call. In the sample code, this is hotspot 1.
- Functions whose single execution is short but which are called very frequently. In the sample code, this is hotspot 2.
First, let's look for hotspot 1.
In the profile panel, sort the rows by Cpu Time/Call in descending order (from the most to the least time-consuming). The result is shown in Figure 1-6:
Figure 1-6 Data in descending order by CPU Time/call
In Figure 1-6:
- MainActivity.onCreate is one of the application's own functions, and it takes 4618.684 in CPU time. Clicking the MainActivity.onCreate row opens the detail view indicated by the arrow.
- In the detail view, the Parents row shows the caller of MainActivity.onCreate; here it is performCreate, which belongs to the framework. The Children rows show the child functions called by MainActivity.onCreate.
- Among the children of MainActivity.onCreate, getStringsToShow accounts for 63.3% in the Incl Cpu Time % column, which makes it the most expensive child of onCreate, and the Calls+Recur Calls/Total column shows that it was called only once. Since this function is implemented by the application itself, it is very likely a hotspot.
- Also, since we already know that getStringsToShow is implemented by the sample itself, we could have identified it as the hotspot directly from the main table in Figure 1-6, where MainActivity.getStringsToShow consumes 2921.913 of CPU time.
Type 1 hotspots are relatively easy to find: sort one of the time columns in descending order (time percentage, real time, or CPU time all work) and look for the most expensive functions. In general, check the functions implemented by the application itself first. Framework functions can also be hotspots, but the root cause is usually still in the application (for example, an overly complex layout that makes XML inflation very slow).
Now, let's see how to find a hotspot of type 2.
Click the Calls+Recur Calls/Total column header to sort by call count in descending order. Here we focus on functions that are called frequently and are relatively expensive. Figure 1-7 shows the sorted result.
Figure 1-7 Type 2 hotspot search, step one
As Figure 1-7 shows, among the most frequently called functions we find several suspicious entries, marked by arrows 1 and 2 in the figure (a short sketch of the issue behind arrow 1 follows this list):
- Checking the code, the function pointed to by arrow 1 does not actually exist in the source. It is generated automatically by the Java compiler because the code accesses a private member of an inner class directly. Since this synthetic function is called a great many times, we can change the member's access modifier (for example to public) to avoid it. That said, the function's Incl Cpu Time share is low, only 3.2%.
- Similarly, the functions at arrow 2 are called many times as well, more than 5,888 in total, but they account for only 0.9% of the time.
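As a hedged illustration of the arrow-1 case (not the sample's actual code; the behavior applies to older javac versions, before Java 11's nest-based access control): accessing a private member of a nested class from its outer class makes the compiler emit a synthetic access$NNN() method, and every such access becomes an extra method call.

```java
public class Outer {
    private static class Item {
        private String text;   // private: the outer class can only reach it through
                                // a compiler-generated access$NNN() method
        // String text;         // package-private: no synthetic accessor is needed
    }

    String describe(Item item) {
        // With 'text' private, older javac compiles this to a call like
        // Item.access$000(item) -- the "function that does not exist in the
        // source" that shows up in TraceView.
        return "item: " + item.text;
    }
}
```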
After ruling out the suspects from the first pass, we continue browsing the data and get the result shown in Figure 1-8.
Figure 1-8 Type 2 hotspot search, step two
In Figure 1-8:
- The two overloaded MyMD5.getHashString functions inside the red box are each called 368 times and account for 31.8% and 53.2% of the CPU time respectively. Clearly these two calls have room for optimization; they are the suspected hotspot 2.
Once a hotspot is found, the developer needs to go back to the code and optimize it. For Java code optimization, readers can refer to: http://developer.android.com/training/articles/perf-tips.html
In general, hotspot hunting is meticulous work that requires the developer to be familiar with both the code of the target program and the TraceView tool itself.
1.3 TraceView Summary
TraceView is a powerful tool for analyzing the performance of Android applications. In my opinion, however, its UI is somewhat complicated, and it does not feel very smooth to use.
Google's official introduction to TraceView is available at the link below, although it has not been updated for a long time: http://developer.android.com/tools/debugging/debugging-tracing.html
2 Systrace
2.1 Systrace Overview
Systrace is a new performance data sampling and analysis tool introduced in Android 4.1. It helps developers analyze system bottlenecks and improve performance by collecting runtime information from key Android subsystems such as SurfaceFlinger and WindowManagerService and from other important framework modules and services.
Systrace can trace system I/O operations, kernel work queues, CPU load, and the state of the various Android subsystems. On the Android platform it consists of three parts:
- Kernel part: Systrace is built on the ftrace functionality of the Linux kernel, so the kernel's ftrace-related options must be enabled in order to use it.
- Data acquisition part: Android defines a Trace class that applications can use to write statistics into ftrace. Android also provides an atrace program that reads the statistics out of ftrace and hands them to the data analysis tool.
- Data analysis tool: Android provides systrace.py (a Python script located in <Android SDK>/tools/systrace, which internally invokes atrace). It configures data collection (for example the trace tags and the output file name), gathers the ftrace statistics, and generates an HTML result page for the user to inspect.
Essentially, Systrace is a wrapper around the Linux kernel's ftrace, and application processes feed it data through the Trace class provided by Android. Android 4.1 already adds Systrace instrumentation to several key processes and modules in the system. Taking the important HWComposer module as an example, Figure 2-1 shows how its code uses Systrace:
Figure 2-1 Systrace usage in the HWComposer module
As Figure 2-1 shows, native code needs only three macros to participate in Systrace (a hedged sketch follows this list):
- The ATRACE_TAG definition: HWComposer uses ATRACE_TAG_GRAPHICS, indicating that its trace output belongs to the graphics category.
- ATRACE_INT: traces the value of an integer variable. The result of tracing "VSYNC" in this code is shown further below.
- ATRACE_CALL: traces the duration of a function call.
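The following is a minimal native-side sketch of these macros, not the actual HWComposer code: the function names and the vsync toggling logic are illustrative, and it assumes the standard ATRACE macros from the Trace.h header mentioned below.

```cpp
// Must be defined before including the trace header so the events are
// attributed to the graphics category.
#define ATRACE_TAG ATRACE_TAG_GRAPHICS
#include <utils/Trace.h>

// Hypothetical vsync callback: toggles a counter between 0 and 1 so the
// "VSYNC" row in the trace shows the tick-tock pattern described later.
static void onVsyncReceived() {
    static int sVsyncToggle = 0;
    sVsyncToggle = 1 - sVsyncToggle;
    ATRACE_INT("VSYNC", sVsyncToggle);
}

// Hypothetical composition entry point: ATRACE_CALL() records how long this
// function runs, which appears as a block on the process's trace row.
static void composeFrame() {
    ATRACE_CALL();
    // ... composition work ...
}
```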
For reasons of space, readers who want more detail on Trace should read frameworks/native/include/utils/Trace.h or the android.os.Trace class. Below, we demonstrate Systrace with an example.
2.2 A Systrace Example
First, run the command shown in Figure 2-2 on the PC to start Systrace:
Figure 2-2 Systrace Operation steps
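The exact command is in Figure 2-2 and is not reproduced here; a representative invocation might look like the following (the option letters are taken from the systrace documentation of that era, so treat them as an assumption):

```
# Run from <android-sdk>/tools/systrace/ on the PC, with the device attached over adb.
# -f, -i and -l request CPU frequency, CPU idle and CPU load information respectively;
# the report is written to trace.html by default (use -o <file> for a different name).
$ python systrace.py -f -i -l
```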
After the command finishes, you get a file named trace.html (trace.html is the default name; a different name can be given on the command line). Open this file in a browser; the result is shown in Figure 2-3:
Figure 2-3 trace.html content schematic
The trace.html page shown in Figure 2-3 is quite similar to TraceView's timeline panel. It contains the following:
- Because the -f, -l, and -i options were passed to systrace.py, Systrace generates CPU frequency, load, and idle-state information, shown in the first red box in Figure 2-3. The author's phone has a dual-core CPU, so the figure contains rows for CPU 0 and CPU 1; for brevity, "CPU N" below refers to one CPU core.
- The line shown in "CPU N" corresponds to the process information that runs on a core during the entire test time.
- "CPU N c-state" shows the behavior of a certain CPU state during the entire test time. C-state values are shown in table 2-1.
- The line shown in "CPU N Clock Frequency" shows how often a CPU is running. You can see how often CPU N runs at a point in time by clicking on the color block of this line.
- "Cpufreq": This line shows the work of the CPU interactive frequency adjuster (Interactive Governor). The interactive CPU Tuner driver adds a trace of CPU frequency throttling events. Interested readers may wish to read the Include/trace/events/cpufreq_interactive.h file in kernel for more information.
In Figure 2-3, the rows below the CPU information come from the statistics added through the macros in Trace.h:
- VSYNC: this row comes from the ATRACE_INT macro used in Figure 2-1. In the HWComposer code, ATRACE_INT records the tick-tock of VSYNC[1] (that is, a value that alternates 0,1,0,1,...). The VSYNC row shows that each tick or tock lasts about 16 ms.
- Because the framework's display code also uses ATRACE_INT, the com.example.systracedemo/com.example.systracedemo.MainActivity row shows the tick-tock of the application occupying the display buffer. If this takes longer than 16 ms, the UI will appear to lag.
- SurfaceFlinger uses the ATRACE_CALL macro, so the SurfaceFlinger row shows the CPU time of its function calls (arrow 1 points to the execution of SurfaceFlinger's onMessageReceived function).
- The box at the bottom of Figure 2-3 shows the details of the current selection in the timeline (here, onMessageReceived in SurfaceFlinger).
The CPU status value information is shown in table 2-1:
Table 2-1 CPU Status
C-state | Description
--- | ---
C-0 | Running.
C-1 | Standby: idle but ready to start running immediately.
C-2 | Dormant (sleep): there is a wake-up delay before the core can run again.
C-3 | Shutdown (off): a long delay is needed to return to the running state, but power consumption is lowest.
2.3 Systrace Summary
Overall, Systrace covers more ground than TraceView: it can sample performance data from the CPU, native processes, and even kernel threads, helping developers analyze the performance of the entire system in detail. It is, however, more complex to use than TraceView, and the kernel may need some configuration changes.
Android's official introduction to Systrace is available at:
http://developer.android.com/tools/debugging/systrace.html
3 Oprofile
3.1 Oprofile Overview
Oprofile is another, even more powerful, performance data collection and analysis tool. It works as follows:
- It uses hardware performance counters (or a timer, on kernels that do not support performance counters) and gathers statistics by continuous sampling, which can then be used to profile both the kernel and user-space processes.
- Taking performance counters as an example: the corresponding counter is incremented each time its event occurs, and an interrupt is raised when the counter reaches a preset value. The Oprofile driver uses this interrupt to take a sample: it records the PC value at the time of the interrupt together with task information kept by the kernel, and converts them into data useful for profiling.
- Oprofile consists of a kernel driver and user-space tools:
- The kernel driver implements a virtual file system called oprofilefs, mounted at /dev/oprofile, which is used to report data to user space and to receive settings from user space; it is the bridge between the user-space daemon and the kernel. The driver also contains architecture-specific and generic code that programs the performance-counter registers, collects samples, and reports them to user space. The user-space daemon receives the data from the kernel and saves it to disk for later analysis.
- User space provides two main tools: oprofiled (a daemon that runs in the background, communicates with the driver through /dev/oprofile, and stores the data the driver collects) and opcontrol (the control tool operated by the user, which adjusts sampling settings by reading and writing oprofilefs).
Android supports Oprofile by default. The support consists of:
- Code: located in external/oprofile. However, it can only be used on builds whose type is not user (for example userdebug or eng).
- Four main tools: opcontrol, oprofiled, opreport, and opimport. Developers normally only need to use opcontrol and opreport directly.
- Readers should be familiar with the roles of the opcontrol and opreport tools; their usage is summarized below:
- opcontrol: controls the sampling process, for example starting and stopping sampling and selecting the event type and sampling frequency. Internally it works by reading and writing oprofilefs. Its common options are shown in Table 3-1:
Table 3-1 Opcontrol Common options
opcontrol option | Function
--- | ---
--list-events | List the events supported by the current CPU
--setup | Set up profiling, e.g. shut down any old daemon and mount oprofilefs
--vmlinux=<file> | Set the Android kernel image file to be analyzed
--callgraph=<depth> | Set the depth of the traced call graph
--kernel-range=start,end | Virtual addresses of the start and end of the kernel image
--start / --stop | Start / stop sampling
--event=name:count:unitmask:kernel:user | Configure sampling for one event. name: event name; count: take one sample every count occurrences of the event; unitmask: event mask (see the Oprofile documentation for the events and masks each CPU supports); kernel: whether to sample kernel-mode events; user: whether to sample user-mode events
- opreport: generates reports from the sampled data and can produce different reports depending on the user's needs. The general usage is "opreport [options] [image]", where image names the program to report on (an executable, a shared library, or the kernel). The image argument is optional; when it is omitted, opreport prints results for all profiled images. Common options are shown in Table 3-2:
Table 3-2 opreport Common options
opreport option | Function
--- | ---
-l | Show the symbol names of the functions
-g | Show debug information for each symbol, i.e. the source file and line number of the function
-c | Show the function call graph
-o | Write the report to the specified file
In addition, Android provides a dedicated script, opimport_pull. It pulls the sample data from the phone to the PC and does some simple processing so that opreport can consume it. So, on the Android platform, developers simply use opimport_pull for this step.
Now, let's take a look at the usage examples of oprofile.
3.2 An Oprofile Example
The use of oprofile can be broadly divided into the following three steps:
- The kernel loads the Oprofile driver (you can skip this step if the driver is statically compiled into the kernel).
- Configure the sampling event, and then sample.
- Get reports, analyze them, and improve on the results.
Let's look at each of these three steps.
3.2.1 Oprofile Kernel Configuration
An example kernel configuration is shown in Figure 3-1:
Figure 3-1 oprofile kernel configuration schematic
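The configuration in Figure 3-1 is not reproduced here; for a typical kernel the relevant options are roughly the following (an assumption based on standard oprofile kernel support, so the exact set may differ by kernel version and SoC):

```
# Kernel .config fragment enabling oprofile support (illustrative).
CONFIG_PROFILING=y        # general profiling infrastructure
CONFIG_OPROFILE=y         # build oprofile in; use =m to build it as a loadable module
```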
Running Oprofile requires root access, so it is best to run a userdebug or eng build of Android on the target device.
3.2.2 Oprofile User space configuration
The user-space configuration of Oprofile is shown in Figure 3-2. It assumes the current directory is the root of the Android source tree and that the build environment has been initialized (build/envsetup.sh has been sourced and lunch has been run).
Figure 3-2 Oprofile User space configuration schematic
The user-space configuration is done mainly by running the opcontrol command, which internally passes the corresponding control parameters to oprofilefs. For example, the "opcontrol --callgraph=16" command in Figure 3-2 could equally be expressed as "echo 16 > /dev/oprofile/backtrace_depth".
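The exact commands are in Figure 3-2; a representative sequence on the device might look like the following, using only the options listed in Table 3-1 (the event name, count, and paths are illustrative, not the author's actual settings):

```
# In a root shell on the device.
opcontrol --setup                           # stop any old daemon, mount oprofilefs
opcontrol --vmlinux=/path/to/vmlinux        # kernel image with symbols (optional)
opcontrol --callgraph=16                    # same as: echo 16 > /dev/oprofile/backtrace_depth
opcontrol --event=CPU_CYCLES:100000:0:1:1   # sample every 100000 cycles, kernel and user space
opcontrol --start                           # begin sampling
# ... exercise the scenario being profiled ...
opcontrol --stop                            # stop sampling
```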
3.2.3 Results Analysis
The previous step produced the sampling data. We can now use it to generate a report, as shown in Figure 3-3:
Figure 3-3 Oprofile Generating Sampling report method schematic
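Again, the actual commands are in Figure 3-3; on the host the flow is roughly as follows (the sample directory and the exact arguments of opimport_pull are assumptions, since its interface may vary between Android versions):

```
# On the host, with the build environment initialized (envsetup.sh + lunch).
# Pull the raw samples from the device and convert them for the host tools.
opimport_pull /tmp/oprofile_session
# Print a summary report over all profiled images in the pulled session.
opreport --session-dir=/tmp/oprofile_session
```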
Figure 3-4 shows part of the report:
Figure 3-4 Oprofile Assessment Report summary
In Figure 3-4 we can see that libc.so accounts for 117,299 samples, ranking fourth. So which function in libc.so is called most frequently? Developers can get more detail about libc.so with the command shown in Figure 3-5:
Figure 3-5 Opreport Use Example
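The command in Figure 3-5 is not reproduced here; it is presumably something along the lines of the per-image, per-symbol report below, following the "opreport [options] [image]" usage described earlier:

```
# Per-symbol report for libc.so; -l lists the symbol names with their sample counts.
opreport -l libc.so
```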
After running this command, we get the result shown in Figure 3-6:
Figure 3-6 Oprofile detailed results for libc
As Figure 3-6 shows, the memcpy() function consumes the most CPU resources, so memcpy() is a good candidate for optimization.
Here, the author optimized memcpy in ARM assembly for a dual-core Cortex-A9 SMP processor. The optimized result is shown in Figure 3-7. Comparing Figure 3-6 with Figure 3-7, the optimized memcpy's share of the samples clearly drops by 2.7 percentage points.
Figure 3-7 Test results after optimizing memcpy ()
3.3 Oprofile Summary
For performance analysis, Oprofile is without doubt one of the most widely used and most powerful tools. For Android platform developers, it can collect and analyze runtime information for the entire system, which is extremely valuable for finding system bottlenecks and optimizing the system.
4 Summary
Performance tuning has always seemed like somewhat mysterious work, but the tools described above are exactly what lifts that veil. For anyone interested in doing this kind of work, the first step is to understand what each tool does and what its strengths and weaknesses are.
Besides the three tools covered in this article, Android supports a number of other, more specialized test tools, such as lmbench and LTTng for evaluating overall system behavior, Bootchart for measuring system start-up performance, and IOzone for testing file-system performance. For reasons of space, they are not introduced here.
[1] For more information on VSync, readers can refer to the article "Android Project Butter Analysis" at http://blog.csdn.net/innost/article/details/8272867.