C++ Performance Optimization Practices

Contents:

  • 1. gprof
  • 2. gprof usage steps
  • 1. Time consumed initializing large objects
  • 2. Improper map use

Optimization criteria:

1. The 80/20 rule: in any group of things, the most important part is only a small share, about 20%, while the remaining 80%, although the majority, is secondary. In optimization practice, we focus on the 20% of the code that consumes the most time, and overall performance improves significantly. This is easy to understand: function A may contain a large amount of code, but it is called only once in a normal run. Another function, B, has far less code than A but is called 1000 times. Obviously, we should pay more attention to optimizing B.
2. Optimize after coding. Aiming for the best possible performance at every step while writing code is not necessarily a good thing: when we insist on the fastest coding style, code readability and development efficiency may suffer.

Tool: 1. gprof

To do a good job, you must first sharpen your tools. We use gprof to optimize C++ on Linux. gprof is the GNU profiler. It runs on Linux, AIX, Sun, and other operating systems and analyzes the performance of C, C++, Pascal, and Fortran programs, helping to locate and resolve performance bottlenecks. From the "flat profile" generated when the application runs, you can obtain the number of calls to each function and the CPU time it consumed (only CPU time is counted, so I/O bottlenecks cannot be diagnosed this way). You can also obtain the "call graph", which shows the hierarchy of function calls and how much time each function call takes.

2. gprof usage steps

1) When compiling the program with gcc, g++, or xlC, add the -pg option, for example: g++ -pg -o test.exe test.cpp. The compiler then automatically inserts profiling code into the object code. At run time this code collects and records the call relationships and call counts of the functions, along with the execution time of each function and of the functions it calls.
2) Run the compiled executable, for example: ./test.exe. The program runs slightly slower in this step than a normally compiled build would. When the program exits normally, a file named gmon.out is written to the current working directory. This data file records the program's run-time performance, call relationships, call counts, and related information.
3) Use the gprof command to analyze the gmon.out file, for example: gprof test.exe gmon.out. The statistics and analysis of the function calls are printed to the screen. The output can also be redirected to a text file for later analysis: gprof test.exe gmon.out > gprofresult.txt.

The above is only a brief introduction to gprof. For a worked gprof example, see Appendix 1.

Practice

Our program hit a performance bottleneck. Before resorting to an architecture change and switching to an in-memory database, we decided to try code-level optimization first. Analysis with gprof revealed the following two most prominent problems:

1. Time consumed for initializing large objects

Analysis report: 307 6.5% VOBJ1::VOBJ1 @ 240038VOBJ1
The constructor is called 307 times over the whole run, and initializing the object accounts for 6.5% of the time.
This object is large, contains many attributes, and is one of our basic data structures.
Before execution enters the constructor body, the base-class part of the object and all of its member objects have already been constructed. Assigning values to them in the constructor body is therefore wasted work, because each member has already been initialized once by the time the body runs. If the constructor already knows how to initialize the members, it should pass that information through the constructor's initialization list instead of assigning in the constructor body.
In C++ programs, creating and destroying objects has a marked effect on performance. First, if an object is allocated on the global heap, a dynamic memory allocation must be performed. Dynamic allocation and deallocation have always been time-consuming in C/C++ programs, because the allocator has to find a memory block of a suitable size, possibly split it, and then update the linked list that tracks global heap usage.
Solution: we moved most of the initialization into the initialization list, reducing the cost to 1.8%.

2. Improper Map use

Analysis report: 89 6.8% Recordset::GetField
Recordset::GetField is called 89 times and accounts for 6.8% of the time.
Recordset is our wrapper at the database layer and corresponds to the set of records returned by a query (anyone who has used ADO will find it familiar). We use a low-level C++ database interface and wrap the raw database API so that developers do not operate on it directly. The advantage of this wrapper is that developers never interact with the underlying database directly, so code is much easier to write and quite readable. The drawback is the performance loss.

Analysis (two causes):
1) The GetField function uses map["a"] to look up data. If "a" is not found, the map automatically inserts the key "a" with the value 0, whereas map.find("a") inserts nothing and is more efficient. Original logic:

    string Recordset::GetField(const string &strName)
    {
        int nIndex;
        if (hasIndex==false)
        {
            nIndex = m_nPos;
        }
        else
        {
            nIndex = m_vSort[m_nPos].m_iorder;
        }
        // operator[] inserts strName with value 0 when it is missing
        if (m_fields[strName]==0)
        {
            LOG_ERR("Recordset::GetField:"<<strName<<" Not Find!!");
        }
        // second lookup of the same key
        return m_records[nIndex].GetValue(m_fields[strName] - 1) ;
    }

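Logic after the change:

    string Recordset::GetField(const string &strName)
    {
        // Template arguments were lost in the source; <string, int> is assumed.
        unordered_map<string, int>::iterator iter = m_fields.find(strName);
        if (iter == m_fields.end())
        {
            // The source is truncated here; the message tail and the
            // early return are assumed.
            LOG_ERR("[Recordset::GetField] "<< strName << " Not Find!!");
            return "";
        }
        // Assumed to be the same index selection as in the original logic.
        int nIndex = hasIndex ? m_vSort[m_nPos].m_iorder : m_nPos;
        return m_records[nIndex].GetValue(iter->second - 1) ;
    }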

After the change, Recordset::GetField takes about half as long as before, and the function is also easier to use.

2) Recordset stores its fields in the map m_fields. In g++, the standard library implements map as a red-black tree.
From Appendix 2 we learned that faster structures exist. In terms of efficiency, unordered_map is better than hash_map, and hash_map is better than the red-black-tree map. If the ordering of the keys is not required, unordered_map is the better choice.
Solution: replace map with unordered_map, reducing the cost to 1.4%.

Summary

We modified fewer than 30 lines of code and improved overall performance by about 10%, a remarkable result. The key to performance optimization is identifying the points that need optimizing; once they are found, the rest follows naturally.

Appendix:

Appendix 1: gprof tool introduction and practices
Appendix 2: map, hash_map, and unordered_map performance test


Posted by: Large CC | 05JUN, 2013

Blog: blog.me115.com

Weibo: Sina Weibo

Category: C ++ Programming
