Program Performance Optimization

Source: Internet
Author: User

1. Story
Background: in an online stream computing service, a new version of Mario, a key module of a major business line, was upgraded and launched, bringing double the input data.
Note: a typical paradigm of stream computing is that event streams with uncertain data rates flow into the system, so the system's processing capability must match the event traffic.
The story unfolds in three stages.
1) After going online, an alarm fired: Mario was building up a data backlog (its processing capability could not keep up with the current online traffic).
Investigation: the data processed by Mario had to be inserted into a remote database, and each processing thread performed the insert synchronously, which sharply reduced the threads' throughput.
Solution: write the data to local disk, and let a separate program load it into the database.
2) After the first problem was solved, performance problems appeared again.
Solution: use tcmalloc (reference: http://blog.csdn.net/yfkiss/article/details/6902269).
3) After switching to tcmalloc, online CPU jitter was found to be very high, and the program would occasionally hang.
Investigation: an algorithm used to count distinct elements built and destroyed a dictionary so frequently that the system was constantly allocating and freeing memory, which caused the CPU jitter.
Solution: use an O(N^2) algorithm for small datasets and an O(N) algorithm for large ones (http://blog.csdn.net/yfkiss/article/details/6754786).

2. Principles
Program performance optimization can be done at three levels.
1) Design
2) Algorithms and data structures
3) Code
Of course, these three levels are just the optimizations an ordinary programmer can make; beyond them there are also the architecture, operating system, and hardware levels.
Design: in my view the most important level. It covers questions such as: how is the data processed? Multithreaded or single-threaded? How do the threads synchronize? What is the lock granularity? Is a memory pool used? Synchronous or asynchronous?
Algorithms and data structures: algorithm optimization can improve a program's performance by orders of magnitude.
Code optimization: in a typical running program, 20% of the code accounts for 80% of the running time, so optimization should focus on that 20% of the code.
Back to the story: the first problem was clearly a design problem. When network interaction is required, consider an asynchronous scheme.
In the second stage, tcmalloc was used. In essence this optimizes memory allocation at the design, algorithm, and code levels all at once; the optimization just happens to have been done by someone else.
The third stage was algorithm optimization. The original algorithm was asymptotically fast but incurred heavy memory-management overhead. In our application, 99% of the datasets are very small (the average dataset size is 2), so the O(N^2) algorithm is used for small datasets and the O(N) algorithm for large ones, which proved very effective. So there is no best algorithm, only the most suitable one.

3. How to Find the Hotspot Code
1) Read through the program and reason about where the execution hotspots are. Crude, but very effective.
2) Use an auxiliary tool: Google CPU Profiler.
Method 1 relies mostly on experience, so here is a brief introduction to the auxiliary tool, Google CPU Profiler.
Google CPU Profiler is part of google-perftools (which also includes tcmalloc, a heap checker, and a heap profiler).
Its usage is very simple:
link against the profiler library (e.g. g++ test.cpp -o test -lprofiler), or preload it with LD_PRELOAD as shown below, and set the CPUPROFILE environment variable.

4. An Example of Performance Analysis with Google CPU Profiler (LD_PRELOAD, the lazy method: no recompilation needed)
Code:

#include <iostream>
#include <cstdlib>
#include <ctime>

using namespace std;

const int MAX_OPERATION = 2;
enum TYPE { MINUS = 0, PLUS };

int random(unsigned int n)
{
        if (n != 0)
        {
                return rand() % n;
        }
        return 0;
}

// Generate one exercise of the form "a+b=" or "a-b=".
void make_expression(int n)
{
        int left = random(n);
        int operation = random(MAX_OPERATION);
        // For subtraction, bound the right operand so the result is non-negative.
        int right = (MINUS == operation ? random(left) : random(n));
        cout << left << (MINUS == operation ? "-" : "+") << right << "=";
}

void make(int n, int max)
{
        for (int i = 1; i <= n; i++)
        {
                make_expression(max);
                if (0 != i % 3)
                {
                        cout << "\t\t";
                }
                else
                {
                        cout << endl;
                }
        }
}

int main(int argc, char** argv)
{
        if (argc != 3)
        {
                cout << "usage: " << argv[0] << " <count> <max>" << endl;
                return 1;
        }
        srand((unsigned)time(0));
        make(atoi(argv[1]), atoi(argv[2]));
        cout << endl;
        return 0;
}

Set the environment variables LD_PRELOAD and CPUPROFILE:
$ export LD_PRELOAD="/home/work/zhouxm/google-perf_1.8.3/lib/libprofiler.so"
$ export CPUPROFILE="/home/work/zhouxm/google-perf_1.8.3/bin/myprofiler"
Note: LD_PRELOAD specifies a dynamic library to be loaded before all others when the program starts. It is mainly used to selectively override functions that also exist in other dynamic libraries: with this environment variable we can interpose our own library between the main program and its dynamic libraries, and even override normal library functions. This environment variable is quite dangerous and should be used with caution.
CPUPROFILE specifies the path and file name of the profile output file.
Run:
$ ./test 10000000 10000 1>/dev/null
PROFILE: interrupts/evictions/bytes = 508/228/12704

Analysis:
1) Text analysis (columns: samples in this function, percentage, cumulative percentage, samples in this function plus its callees, their percentage, symbol name):
$ pprof --text ./test ./myprofiler
Using local file ./test.
Using local file ./myprofiler.
Removing killpg from all stack traces.
Total: 508 samples
     149  29.3%  29.3%      149  29.3% __write_nocancel
      47   9.3%  38.6%       47   9.3% fwrite
      41   8.1%  46.7%       41   8.1% _IO_file_xsputn@GLIBC_2.2.5
      41   8.1%  54.7%       41   8.1% random
      33   6.5%  61.2%       33   6.5% std::operator<<
      32   6.3%  67.5%       32   6.3% std::basic_ostream::operator<<
      29   5.7%  73.2%       29   5.7% std::has_facet
      26   5.1%  78.3%       26   5.1% std::num_put::_M_insert_int
      15   3.0%  81.3%       15   3.0% std::basic_ostream::sentry
      14   2.8%  84.1%       97  19.1% make_expression
      13   2.6%  86.6%       73  14.4% std::num_put::do_put
      11   2.2%  88.8%       11   2.2% random_r
       9   1.8%  90.6%        9   1.8% strlen
       7   1.4%  91.9%        7   1.4% CXXABI_1.3
       7   1.4%  93.3%        7   1.4% std::basic_ostream::put
       6   1.2%  94.5%      135  26.6% make
       4   0.8%  95.3%        4   0.8% _IO_do_write@GLIBC_2.2.5
       4   0.8%  96.1%        4   0.8% _init
       4   0.8%  96.9%        4   0.8% std::time_put::put
       3   0.6%  97.4%        3   0.6% _IO_file_write@GLIBC_2.2.5
       3   0.6%  98.0%        3   0.6% fflush
       3   0.6%  98.6%        3   0.6% std::__numpunct_cache::_M_cache
       2   0.4%  99.0%        2   0.4% __gnu_cxx::stdio_sync_filebuf::file
       2   0.4%  99.4%        2   0.4% std::basic_ios::widen
       2   0.4%  99.8%        2   0.4% std::endl
       1   0.2% 100.0%        1   0.2% rand
       0   0.0% 100.0%        1   0.2% _DYNAMIC
       0   0.0% 100.0%        8   1.6% __bss_start
       0   0.0% 100.0%      143  28.1% __libc_start_main
       0   0.0% 100.0%      143  28.1% main
2) Graphical analysis:
$ pprof --dot ./test ./myprofiler > test.dot
Then render the dot file with Graphviz (e.g. $ dot -Tpng test.dot -o test.png).
