Optimization criteria:
1. The 80/20 rule: in any group of things, the most important part is only a small part, about 20%; the remaining 80%, although the majority, is secondary. In optimization practice, we focus on the 20% of the code that consumes the most time, and the overall performance improves significantly. An example makes this easy to understand. Function A contains a lot of code but is called only once in a normal run. Function B contains much less code than A but is called 1000 times. Obviously, we should pay more attention to optimizing B.
2. Optimize after coding. Writing every line with the best possible performance in mind is not always a good thing: when we insist on the fastest way to write everything, code readability and development efficiency may suffer.
Tools:
1. gprof
To do a good job, you must first sharpen your tools. We use gprof to optimize C++ on Linux. gprof is the GNU profiling tool; it runs on Linux, AIX, Sun, and other operating systems and analyzes the performance of C, C++, Pascal, and Fortran programs, helping to locate and resolve program bottlenecks. From the "flat profile" collected while the application runs, you can obtain the number of calls to each function and the CPU time it consumed (only CPU time is counted, so I/O bottlenecks cannot be detected). You can also obtain the "call graph", which shows the hierarchy of function calls and how much time each function call takes.
2. gprof usage steps
1) When compiling the program with gcc, g++, or xlc, add the -pg option, for example: g++ -pg -o test.exe test.cpp. The compiler automatically inserts profiling code into the object code. While the program runs, this code collects and records the call relationships and call counts of functions, along with the execution time of each function and of the functions it calls.
2) Run the compiled executable, for example ./test.exe. It runs slightly slower than a normally compiled executable. When the program exits normally, a file named gmon.out is written to the current working directory (usually the directory the program was started from). This data file records the program's run-time performance, call relationships, call counts, and related information.
3) Use the gprof command to analyze the gmon.out file, for example: gprof test.exe gmon.out. The statistics and analysis of the function calls are printed to the screen. The output can also be redirected to a text file for later analysis: gprof test.exe gmon.out > gprofresult.txt.
The above is just a brief introduction to gprof. For detailed gprof examples, see Appendix 1.
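The three steps above can be tried on a small program. The following is only a sketch: the file name test.cpp comes from the steps above, while buildTable, sumTable, and runWorkload are hypothetical names chosen to mirror the "function A called once, function B called 1000 times" example.

```cpp
// test.cpp: a hypothetical workload for trying the gprof steps.
// Build and profile with the commands from the steps above:
//   g++ -pg -o test.exe test.cpp
//   ./test.exe
//   gprof test.exe gmon.out > gprofresult.txt
#include <cstddef>
#include <vector>

// Called once: plays the role of function A.
std::vector<int> buildTable(std::size_t n) {
    std::vector<int> table(n);
    for (std::size_t i = 0; i < n; ++i) {
        table[i] = static_cast<int>(i % 7);
    }
    return table;
}

// Called 1000 times: plays the role of function B.
long sumTable(const std::vector<int>& table) {
    long total = 0;
    for (std::size_t i = 0; i < table.size(); ++i) {
        total += table[i];
    }
    return total;
}

// Runs A once and B 1000 times and returns the accumulated sum.
long runWorkload() {
    const std::vector<int> table = buildTable(100000);
    long total = 0;
    for (int call = 0; call < 1000; ++call) {
        total += sumTable(table);
    }
    return total;
}

// To profile it, call runWorkload() from main, for example:
//   int main() { return runWorkload() > 0 ? 0 : 1; }
```

In the flat profile, sumTable should appear with 1000 calls and buildTable with 1 call, provided the compiler has not inlined them (compiling without optimization flags keeps them visible).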
Practice
Our program ran into a performance bottleneck. Before resorting to an architecture overhaul or switching to an in-memory database, we decided to start with code-level optimization. Using gprof, we found the following two most prominent problems:
1. Time consumed for initializing large objects
Analysis report: 307 6.5% vobj1::vobj1 @ 240038 Vobj1
The constructor is called 307 times over the whole run, and object initialization accounts for 6.5% of the run time.
This object is large, contains many attributes, and is one of our basic data structures.
Before execution enters the constructor body, the base class subobject and all member objects of the class have already been constructed. Assigning to them again inside the constructor body is wasted work. If the constructor already knows how to initialize the members, pass that information through the constructor's initialization list instead of assigning in the constructor body, because the members have already been initialized once before the body runs.
In C++ programs, creating and destroying objects is a prominent performance cost. First, if an object is allocated on the global heap, a dynamic memory allocation must be performed. As we all know, dynamic allocation and deallocation have always been time-consuming in C/C++ programs, because the allocator has to find a memory block of a suitable size, possibly split it, and then update the linked list that tracks heap usage.
Solution: we moved most of the initialization into the initialization list, which reduced the constructor's share of the run time to 1.8%.
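The difference between assigning in the constructor body and using the initialization list can be made visible with a member type that counts its own constructions. This is a minimal sketch; Tracked, BodyAssigned, and ListInitialized are hypothetical types written for illustration, not the classes from our program.

```cpp
#include <string>

// Hypothetical member type that counts how it is constructed and assigned.
struct Tracked {
    static int default_ctor;
    static int value_ctor;
    static int assigns;
    std::string value;

    Tracked() { ++default_ctor; }
    explicit Tracked(const std::string& v) : value(v) { ++value_ctor; }
    Tracked& operator=(const std::string& v) {
        value = v;
        ++assigns;
        return *this;
    }
};
int Tracked::default_ctor = 0;
int Tracked::value_ctor = 0;
int Tracked::assigns = 0;

// Assignment in the body: m_name is default-constructed first, then assigned.
struct BodyAssigned {
    Tracked m_name;
    explicit BodyAssigned(const std::string& name) { m_name = name; }
};

// Initialization list: m_name is constructed once, directly from name.
struct ListInitialized {
    Tracked m_name;
    explicit ListInitialized(const std::string& name) : m_name(name) {}
};
```

Constructing a BodyAssigned costs one default construction plus one assignment per member, while a ListInitialized costs a single construction. For a large object with many members, that extra work is paid on every construction.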
2. Improper map use
Analysis report: 89 6.8% recordset::getfield
recordset::getfield is called 89 times and accounts for 6.8% of the run time.
recordset is our wrapper at the database layer; it corresponds to the record set returned by a query (anyone who has used ADO will find it familiar). Because we use the low-level C++ database interface, we wrapped the raw database API so that developers do not operate on it directly. The advantage of this wrapper is that code no longer has to deal with the underlying database directly, so it is much easier to write and quite readable. The drawback is the performance loss.
Analysis: (2 reasons)
1) The getfield function uses map["A"] to look up data. If "A" is not found, the map automatically inserts the key "A" with its value set to 0. m.find("A") does not insert such a pair and is more efficient. Original logic:
string recordset::getfield(const string &strname)
{
    int nindex;
    if (hasindex == false)
    {
        nindex = m_npos;
    }
    else
    {
        nindex = m_vsort[m_npos].m_iorder;
    }
    // operator[] inserts strname with value 0 when it is missing
    if (m_fields[strname] == 0)
    {
        log_err("recordset::getfield:" << strname << " not find !!");
    }
    // second lookup of the same key
    return m_records[nindex].getvalue(m_fields[strname] - 1);
}
Logic after transformation:
string recordset::getfield(const string &strname)
{
    unordered_map<...>::iterator iter = m_fields.find(strname);
    if (iter == m_fields.end())
    {
        log_err("[recordset::getfield] " << strname << ...
    ... iter->second - 1);
}
After the change, recordset::getfield takes about half as long as before, and it is also easier to use.
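The side effect of operator[] described above is easy to reproduce in isolation. The sketch below uses a plain std::map as a stand-in for m_fields; lookupWithBracket and lookupWithFind are hypothetical helper names, not functions from our code.

```cpp
#include <map>
#include <string>

// Lookup through operator[]: a missing key is inserted with the value 0,
// so the map must be non-const and grows on every failed lookup.
int lookupWithBracket(std::map<std::string, int>& fields,
                      const std::string& name) {
    return fields[name];
}

// Lookup through find(): the map is left untouched, and the iterator
// gives access to the value without a second search.
int lookupWithFind(const std::map<std::string, int>& fields,
                   const std::string& name) {
    std::map<std::string, int>::const_iterator it = fields.find(name);
    return it == fields.end() ? 0 : it->second;
}
```

Both helpers return 0 for an unknown field, but only lookupWithBracket changes the size of the map. Note also that the original getfield indexes m_fields twice per call, whereas the find() version searches once and reuses the iterator.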
2) In recordset, the fields are stored in the map m_fields. In g++, the standard library implements map on top of a red-black tree.
From the document in Appendix 2, we found that there are faster structures: in terms of lookup efficiency, unordered_map is better than hash_map, and hash_map is better than the red-black tree map. If the ordering of the map is not needed, unordered_map is the better choice.
Solution: replace the map with unordered_map, which reduced getfield's share of the run time to 1.4%.
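A field table built on unordered_map might look like the sketch below. FieldIndex, buildFieldIndex, and fieldPosition are hypothetical names, and the 1-based position with 0 meaning "not found" is an assumption taken from the way getfield treats m_fields. std::unordered_map needs a C++11 compiler mode (for example g++ -std=c++11).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for m_fields: field name -> 1-based column position.
typedef std::unordered_map<std::string, int> FieldIndex;

// Builds the index from the column names of a result set.
FieldIndex buildFieldIndex(const std::vector<std::string>& names) {
    FieldIndex index;
    index.reserve(names.size());  // avoid rehashing while filling
    for (std::size_t i = 0; i < names.size(); ++i) {
        index[names[i]] = static_cast<int>(i) + 1;
    }
    return index;
}

// Returns the 1-based position of a field, or 0 when the field is unknown.
int fieldPosition(const FieldIndex& index, const std::string& name) {
    FieldIndex::const_iterator it = index.find(name);
    return it == index.end() ? 0 : it->second;
}
```

Lookups in unordered_map take constant time on average, compared with logarithmic time in the tree-based map. The price is that iteration order is no longer sorted by key.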
Summary
We modified fewer than 30 lines of code and improved overall performance by about 10%, a significant result. The key to performance optimization is identifying what to optimize; once that is known, the rest follows naturally.
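The figure of about 10% is consistent with the two hotspot measurements reported above. The sketch below does the arithmetic under a simplifying assumption: all four percentages are treated as shares of the same original run time, although the "after" values were presumably measured against the shorter run. savedPct and totalSavedPct are hypothetical helper names.

```cpp
#include <cassert>

// Share of the run time saved when a hotspot's share drops from
// beforePct to afterPct (both in percent of the total run time).
double savedPct(double beforePct, double afterPct) {
    assert(beforePct >= afterPct);
    return beforePct - afterPct;
}

// Combined saving for the two hotspots reported in this post.
double totalSavedPct() {
    return savedPct(6.5, 1.8)   // large object constructor
         + savedPct(6.8, 1.4);  // recordset::getfield
}
```

The two savings are 4.7 and 5.4 percentage points, so together they come to roughly 10 points of the original run time.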
Appendix:
Appendix 1: gprof tool introduction and practice
Appendix 2: map, hash_map, and unordered_map performance test
Posted by: Large CC | 05jun, 2013
Blog:Blog.me115.com