C++ Application Performance Optimization


1. Introduction

The geometric modeling kernel OpenCASCADE contains a large number of numerical algorithms: matrix computations, calculus, Newton iteration for solving equations, and nonlinear optimization algorithms such as BFGS, FRPR, and PSO for finding extrema of multivariate functions. The performance of these numerical algorithms directly affects the performance of the whole system. Since performance optimization is an important part of computer software development, it is worth learning how to optimize the performance of C++ applications.

Searching the Internet turns up little material on the subject. Eventually I found the book C++ Application Performance Optimization, published by Electronics Industry Press, from which one can learn IBM's approach to performance optimization.

This article combines the C++ performance optimization methods from that book with a code example to illustrate the effect of memory optimization on program performance. After reading the book, I found that C++ performance optimization mainly relies on fundamental computer science knowledge: operating systems, data structures, algorithms, and so on.

2. Memory Optimization

The storage space of a C++ program can be divided into static/global storage, the stack, and the heap. The sizes of the static/global area and the stack are generally determined at compile time, while the heap changes dynamically as the program runs and may behave differently on every run. This dynamic memory management has a significant impact on both the memory footprint and the performance of a program.
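The three storage areas can be seen in a few lines of code (the names here are illustrative, not from any particular library):

```cpp
static int g_global = 40;            // static/global storage: allocated for the whole run

// Sum a stack variable, a global, and a heap-allocated value, freeing the heap block.
int sumFromThreeAreas()
{
    int aStack = 2;                  // stack: created and destroyed automatically
    int* aHeap = new int(7);         // heap: allocated dynamically at run time
    int aSum = g_global + aStack + *aHeap;
    delete aHeap;                    // the heap block must be freed explicitly
    return aSum;
}
```

Only the heap allocation here involves the memory manager at run time; the global and the stack variable cost essentially nothing.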

Because the static/global area is fixed at compile time and the stack is managed automatically, programmers do not need to worry much about them. The heap is the stage left free for programmers to play on as they wish. To understand heap storage, three questions need answering: first, how is heap data created (where does it come from)? Second, how is heap data destroyed (where does it go)? Third, how is heap data accessed? As specified by the C++ standard, an implementation provides dynamic memory allocation and management through the global new and delete operators, and heap data is accessed through pointers. When you use new/delete to manipulate heap storage, the system has to manage the heap's storage areas, and this management work affects application performance.

To address memory leaks, OpenCASCADE defines the Handle smart pointer (as a macro), so that memory allocated with new no longer needs an explicit delete when it is no longer used. Handle is also very simple to use: just write Handle(Class).
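The mechanism behind such a handle can be illustrated with a simplified intrusive reference-counted smart pointer. This is a sketch only, not OCCT's actual Handle implementation, and all names here are hypothetical:

```cpp
// Illustration only: a simplified intrusive reference-counted handle.
// OCCT's real Handle is more elaborate; this just shows why no explicit
// delete is needed once an object is managed by handles.
int g_live = 0;  // number of managed objects currently alive (for demonstration)

class Counted
{
public:
    Counted() : myRefCount(0) { ++g_live; }
    virtual ~Counted() { --g_live; }
    void IncRef() { ++myRefCount; }
    void DecRef() { if (--myRefCount == 0) delete this; }
private:
    int myRefCount;
};

template <typename T>
class SimpleHandle
{
public:
    SimpleHandle(T* thePtr) : myPtr(thePtr) { if (myPtr) myPtr->IncRef(); }
    SimpleHandle(const SimpleHandle& theOther) : myPtr(theOther.myPtr) { if (myPtr) myPtr->IncRef(); }
    ~SimpleHandle() { if (myPtr) myPtr->DecRef(); }  // last handle destroys the object
    T* operator->() const { return myPtr; }
private:
    SimpleHandle& operator=(const SimpleHandle&);    // assignment omitted for brevity
    T* myPtr;
};

// Returns the number of live objects after all handles leave scope (expected: 0).
int demoHandleScope()
{
    {
        SimpleHandle<Counted> aHandle(new Counted());
        SimpleHandle<Counted> aCopy(aHandle);        // reference count becomes 2
    }                                                // both handles gone -> object deleted
    return g_live;
}
```

When the last handle to an object goes out of scope, the reference count drops to zero and the object deletes itself, which is why user code never calls delete.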

Allocating and freeing heap memory through the default new/delete carries some extra overhead. When the system receives a request for a block of a certain size, it first searches its internally maintained free-block list and selects a suitable block according to some policy (for example, the first block not smaller than the requested size, or the block whose size best fits the request). If the chosen free block is too large, it must also be split into the allocated portion and a smaller free block, after which the system updates the free-block list to complete the allocation. Similarly, when memory is freed, the system puts the released block back onto the free-block list and, where possible, merges adjacent free blocks into larger ones.
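That bookkeeping (search, split, push back onto the free list) can be sketched with a minimal first-fit free-list allocator over a fixed arena. This is an illustration under simplifying assumptions, not a production allocator: coalescing of neighbors and thread safety are omitted, and all names are hypothetical:

```cpp
#include <cstddef>

namespace firstfit
{
    struct Block
    {
        std::size_t size;   // payload size of this free block
        Block*      next;   // next block in the free list
    };

    const std::size_t ARENA_SIZE = 4096;
    alignas(16) static unsigned char ourArena[ARENA_SIZE];
    static Block* ourFreeList = nullptr;

    // Start with one big free block covering the whole arena.
    void init()
    {
        ourFreeList = reinterpret_cast<Block*>(ourArena);
        ourFreeList->size = ARENA_SIZE - sizeof(Block);
        ourFreeList->next = nullptr;
    }

    void* allocate(std::size_t theSize)
    {
        theSize = (theSize + 15) & ~std::size_t(15);         // round up for alignment
        Block** aLink = &ourFreeList;
        for (Block* aBlock = ourFreeList; aBlock != nullptr;
             aLink = &aBlock->next, aBlock = aBlock->next)
        {
            if (aBlock->size < theSize) continue;            // too small, keep searching
            if (aBlock->size >= theSize + sizeof(Block) + 16)
            {
                // Split: carve the request off the front, keep the tail free.
                Block* aTail = reinterpret_cast<Block*>(
                    reinterpret_cast<unsigned char*>(aBlock + 1) + theSize);
                aTail->size = aBlock->size - theSize - sizeof(Block);
                aTail->next = aBlock->next;
                aBlock->size = theSize;
                *aLink = aTail;
            }
            else
            {
                *aLink = aBlock->next;                       // use the whole block
            }
            return aBlock + 1;                               // payload follows the header
        }
        return nullptr;                                      // no block large enough
    }

    void release(void* thePtr)
    {
        Block* aBlock = static_cast<Block*>(thePtr) - 1;
        aBlock->next = ourFreeList;                          // push back onto the free list
        ourFreeList = aBlock;                                // (merging neighbors omitted)
    }
}
```

Even in this toy version, every allocation walks a list and may split a block, and every release touches the list again; that is the overhead the paragraph above describes.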

The default memory management functions must also support multithreaded applications, so each allocation and deallocation requires locking, which adds further overhead. Clearly, if an application frequently allocates and frees small blocks on the heap, performance suffers. It also produces a large amount of memory fragmentation, reducing memory utilization.

So behind a simple new and delete, the system silently does a great deal of work for us, and that work takes time! Although the default memory management algorithms do take performance into account, they are designed for the general case and must do extra work to cope with broader, more complex situations. For a specific application, a tailored memory manager can achieve better performance. OpenCASCADE therefore introduces its own memory management mechanism, similar to a memory pool. OCCT's memory manager can be configured to use its optimization techniques or to bypass them and use the system's malloc and free directly. The configuration is done through environment variables: the switch for memory optimization is MMGT_OPT, which defaults to 0, meaning no optimization; 1 enables OCCT's own optimization; 2 uses Intel TBB's memory allocator (assuming that third-party library is properly configured; otherwise malloc and free are used):

// paralleling with Intel TBB
#ifdef HAVE_TBB
  #include <tbb/scalable_allocator.h>
  using namespace tbb;
#else
  #define scalable_malloc  malloc
  #define scalable_calloc  calloc
  #define scalable_realloc realloc
  #define scalable_free    free
#endif
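The memory-pool idea that OCCT's optimized manager builds on can be sketched with a minimal fixed-size pool. This is a conceptual illustration, not OCCT's Standard_MMgrOpt, and the class name is hypothetical. One large block is taken from the system up front; Allocate and Free are then O(1) pointer pushes and pops, with no searching, splitting, or locking:

```cpp
#include <cstddef>
#include <vector>

class FixedPool
{
public:
    // theChunkSize must be at least sizeof(void*) so a chunk can hold a link.
    FixedPool(std::size_t theChunkSize, std::size_t theChunkCount)
        : myStorage(theChunkSize * theChunkCount), myFreeHead(nullptr)
    {
        // Thread every chunk onto an intrusive free list.
        for (std::size_t i = 0; i < theChunkCount; ++i)
        {
            void* aChunk = &myStorage[i * theChunkSize];
            *static_cast<void**>(aChunk) = myFreeHead;
            myFreeHead = aChunk;
        }
    }

    void* Allocate()                  // pop the free-list head
    {
        if (myFreeHead == nullptr) return nullptr;   // pool exhausted
        void* aChunk = myFreeHead;
        myFreeHead = *static_cast<void**>(aChunk);
        return aChunk;
    }

    void Free(void* theChunk)         // push the chunk back onto the free list
    {
        *static_cast<void**>(theChunk) = myFreeHead;
        myFreeHead = theChunk;
    }

private:
    std::vector<unsigned char> myStorage;   // one big block from the system
    void*                      myFreeHead;  // head of the intrusive free list
};
```

Because every chunk has the same size, there is no free-block search and no fragmentation within the pool, which is where the speedup over general-purpose malloc/free comes from.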

Let us look at some code to see what these memory management techniques do to performance.

3. Code Example

Here is a program that shows the performance impact of these techniques:

/*
* Copyright (c) Shing Liu. All rights reserved.
*
* file: main.cpp
* author: Shing Liu ([email protected])
* date: 2016-07-31 11:54
* version: OpenCASCADE 7.0.0
*
* description: test OCCT memory manager and Handle (smart pointer).
*/
#include <iostream>

#include <OSD_Timer.hxx>
#include <Poly.hxx>
#include <Poly_Triangulation.hxx>

#pragma comment(lib, "TKernel.lib")
#pragma comment(lib, "TKMath.lib")

/** @brief Test memory without Handle (smart pointer) management. */
void testMemory(Standard_Integer theCount)
{
    OSD_Timer aTimer;
    aTimer.Start();

    for (Standard_Integer i = 0; i < theCount; i++)
    {
        Poly_Triangulation* aTriangulation = new Poly_Triangulation(10, 5, Standard_False);
        delete aTriangulation;
    }

    aTimer.Stop();
    aTimer.Show();
}

/** @brief Test memory with Handle (smart pointer) management. */
void testHandle(Standard_Integer theCount)
{
    OSD_Timer aTimer;
    aTimer.Start();

    for (Standard_Integer i = 0; i < theCount; i++)
    {
        Handle(Poly_Triangulation) aTriangulation = new Poly_Triangulation(10, 5, Standard_False);
    }

    aTimer.Stop();
    aTimer.Show();
}

/**
* @brief Set environment variable MMGT_OPT=0 to use malloc/free directly;
*        set MMGT_OPT=1 to use OCCT's memory optimization technique;
*        set MMGT_OPT=2 to use paralleling with Intel TBB.
*/
int main(int argc, char* argv[])
{
    int aCount = 100000;

    std::cout << "\nTest pointer without Handle" << std::endl;
    testMemory(aCount);

    std::cout << "\nTest pointer with Handle" << std::endl;
    testHandle(aCount);

    return 0;
}

Compiling and running the code above directly gives the following results:

As the figure shows, using Handle (smart pointer) has little effect on the running time; that is, the performance impact of Handle is negligible, while it brings many benefits, chiefly that no explicit delete is needed. Profiling with Visual Studio's diagnostic tools gives similar results:

The time overhead is concentrated in memory allocation:

Notice that allocate() above belongs to the class Standard_MMgrRaw; that is, the system's malloc and free are used directly to manage memory. Next, set the environment variable MMGT_OPT=1 to use OCCT's memory-optimized class and see how performance changes.

The program now runs as follows:

Compared with the roughly 0.1 s taken without memory optimization, the memory-optimized run is about 40% faster.

As before, the performance hotspot is concentrated in memory allocation:

Note that memory allocation now goes through the Allocate() function of the class Standard_MMgrOpt:

In general, setting the environment variable MMGT_OPT to 1 to use OCCT's memory optimization algorithm yields a very noticeable performance improvement.

4. Conclusion

As a program grows larger and its algorithms more complex, finding performance bottlenecks becomes harder. The first step of performance optimization is measurement, which requires finding the right measuring tool. The book C++ Application Performance Optimization introduces IBM Rational Quantify; searching the Internet also turns up Intel VTune Amplifier and other powerful tools. For developers on Windows, the performance analysis capabilities built into Visual Studio are even easier to use.
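When a full profiler is not at hand, a coarse first measurement can be taken with standard C++ facilities. A minimal sketch (the helper name is my own, not from any tool mentioned above):

```cpp
#include <chrono>

// Measure the wall-clock time of a callable, in milliseconds.
template <typename Func>
double timeMs(Func theFunc)
{
    auto aStart = std::chrono::steady_clock::now();
    theFunc();
    auto aStop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(aStop - aStart).count();
}
```

For example, `timeMs([]{ testMemory(100000); })` would time one run of the earlier test. A real profiler is still needed to see the per-function breakdown.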

Once a performance bottleneck is found, analyze its cause and then modify the program to improve performance. This methodology applies broadly to C++ application performance optimization: analyze from the perspectives of data structures, program startup, memory management, and so on. The book includes a flowchart of the program performance optimization process.

5. References

1. Feng Honghua, et al. C++ Application Performance Optimization. Electronics Industry Press.

2. Scott Meyers. Effective C++ (Commentary Edition). Electronics Industry Press, 2011.

3. OpenCASCADE Foundation Classes Documentation, Version 7.0.0. 2016.
