Summary of the optimization process for a Redis data import tool


Background of the Redis data import tool optimization

We developed a Redis data import tool in C++ that imports all table data from Oracle into Redis.
This is not a plain copy: every original Oracle record must be processed by business logic, and indexes (Redis sets) must be added.
Once the tool was finished, performance turned out to be the bottleneck.

Optimization results

Two sample data sets were used for testing:
Table a of the sample data contains 8763 records;
Table b contains 940279 records.

Before optimization, importing table a took 11.417 s;
After optimization, importing table a takes 1.883 s.

Tools used

gprof, pstrace, time

Use the time tool to check how long each run takes, split into user time and system time;
Use pstrace to print, in real time, the main system calls made by the running process and find where the time goes;
Use gprof to collect a profile that summarizes where the program spends its time, so that optimization can focus on the most expensive parts.
Brief introduction to using gprof:
1. Add -pg to all compile and link options of g++ (on the first day no profiling report could be generated, because -pg had not been added to the link options);
2. After the program has run, a gmon.out file is generated in the current directory;
3. Run gprof redistool gmon.out > report to generate a readable report file, then open the report and concentrate on the most time-consuming functions.
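
As a small illustration of the three steps above, the toy translation unit below can be profiled the same way. It is only a sketch: the file name hot.cpp and the functions HotLoop and CheapStep are made up for this example and are not part of redistool, and a main() that calls them is assumed.

```cpp
// Build and profile (the same -pg flag at compile time AND link time):
//   g++ -pg -c hot.cpp -o hot.o
//   g++ -pg hot.o -o hot
//   ./hot                           (writes gmon.out on normal exit)
//   gprof ./hot gmon.out > report
#include <cstdint>

// Deliberately expensive: with a large n this should dominate the report.
std::uint64_t HotLoop(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sum += i % 7;
    }
    return sum;
}

// Cheap by comparison: should barely show up in the report.
std::uint64_t CheapStep(std::uint64_t n) {
    return n * 2;
}
```

If -pg is missing from the link step, no gmon.out is produced, which matches the problem described in step 1.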

Optimization process

Before optimization: 11.417 s

time ./redistool im a.csv
real    0m11.417s
user    0m6.035s
sys     0m4.782s

(The system call time is clearly too long.)

File memory mapping

The system time is too long, and it comes mainly from file reading and writing. The first suspicion was that the read API was being called too frequently while reading the file.
The sample file was read line by line with fgets. Once the file is mapped into memory with mmap, the whole file can be accessed directly and quickly through a pointer.

Checking the log switch earlier

After improving the file reading, the gain turned out to be fairly limited (about 2 seconds). fgets is a C library function for reading files and, unlike the read() system call, it has its own buffer, so it should not be that slow. (Some tests found online report that file memory mapping can be an order of magnitude faster than fgets(); those scenarios are presumably rather special.)

pstrace later showed that log.dat was being opened far too many times. It turned out that the debug log switch was checked too late in the logging code, so every debug log call still opened the log file with open("log.dat").
The switch check was moved earlier; after this improvement the run takes 3.53 s.

time ./redistool im a a.csv
real    0m3.530s
user    0m2.890s
sys     0m0.212s
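
The difference between checking the switch late and checking it early can be sketched as follows. This is an illustration only: g_debugEnabled, g_openCalls, DebugLogLate and DebugLogEarly are hypothetical names, since the tool's actual logging code is not shown here.

```cpp
#include <cstdio>

// Hypothetical logger state, for illustration only.
static bool g_debugEnabled = false;
static int  g_openCalls = 0;          // number of attempts to open log.dat

// Slow pattern: the file is opened first and the switch is checked afterwards,
// so every call pays for an open/close even when debugging is off.
void DebugLogLate(const char* msg) {
    std::FILE* fp = std::fopen("log.dat", "a");
    ++g_openCalls;
    if (fp == nullptr) return;
    if (g_debugEnabled) std::fprintf(fp, "%s\n", msg);
    std::fclose(fp);
}

// Faster pattern: the switch is checked first, so a disabled debug log
// returns before any file is opened.
void DebugLogEarly(const char* msg) {
    if (!g_debugEnabled) return;
    std::FILE* fp = std::fopen("log.dat", "a");
    ++g_openCalls;
    if (fp == nullptr) return;
    std::fprintf(fp, "%s\n", msg);
    std::fclose(fp);
}
```

With debugging disabled, DebugLogLate still opens the file once per call, while DebugLogEarly makes no system call at all.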

Vector space pre-allocation

The gprof analysis showed that the vector in one function had its memory allocated many times, with a lot of copying as a result.
The following line of code was improved:

vector<string> vSegment;

It was changed to a static vector variable with pre-allocated memory:

static vector<string> vSegment;   // reused across calls
vSegment.clear();                 // drops the elements, keeps the capacity
static int nCount = 0;
if (0 == nCount)
{
    vSegment.reserve(64);         // reserve once, on the first call only
}
++nCount;

After this optimization, the run time dropped to 2.286 s:

real    0m2.286s
user    0m1.601s
sys     0m0.222s

Similarly, a vector member of another class also gets pre-allocated space (in the constructor):

m_vtPipecmd.reserve(256);

After this optimization, the run time dropped to 2.166 s:

real    0m2.166s
user    0m1.396s
sys     0m0.204s
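
To see why reserving space helps, the sketch below counts how often a vector's capacity changes while elements are appended. CountReallocations is a hypothetical helper written for this illustration; it is not taken from the tool.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts how many times the vector's capacity changes (each change means a
// reallocation plus moving or copying the existing elements) while n strings
// are appended. A reserveHint of 0 means no space is reserved up front.
std::size_t CountReallocations(std::size_t n, std::size_t reserveHint) {
    std::vector<std::string> v;
    if (reserveHint > 0) v.reserve(reserveHint);

    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back("field");
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

CountReallocations(64, 64) returns 0, because the reserved capacity is never exceeded. Without a reservation the vector has to grow several times, and the exact count depends on the standard library's growth policy.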

Function rewrite & inline

Continuing to profile the program showed that the SqToolStrSplitByCh() function was consuming too much time. The entire function logic was rewritten, and the rewritten function was made inline.
After this optimization, the run time dropped to 1.937 s:

real    0m1.937s
user    0m1.301s
sys     0m0.186s
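
The rewritten function itself is not shown in this article. As a rough idea of what a lean, inlinable split-by-character routine can look like, here is a sketch. SplitByChar and its signature are assumptions for illustration, not the real SqToolStrSplitByCh().

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a split-by-character helper. The caller passes
// in the output vector so its capacity can be reused between calls.
inline void SplitByChar(const std::string& line, char sep,
                        std::vector<std::string>& out) {
    out.clear();                                   // keeps existing capacity
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            out.emplace_back(line, start);         // last field (may be empty)
            break;
        }
        out.emplace_back(line, start, pos - start); // field before the separator
        start = pos + 1;
    }
}
```

Splitting "a,b,,c" on ',' yields the four fields "a", "b", "" and "c"; empty fields are kept so that column positions stay aligned.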

Removing debug and profiling symbols

Finally, after removing the debug symbols and the -pg profiling option from the build, the final time is 1.883 s:

real    0m1.883s
user    0m1.239s
sys     0m0.191s

Meeting production requirements

The last few steps above appear to gain only milliseconds on the sample data, but once scaled up to the full table data the effect is obvious.
After optimization, table a in production, with 152 million records, is imported in about 326 s (~5.5 minutes);
Table b, with 420 million records, is imported in about 1103 s (~18 minutes).

Posted by: Large CC | 28JUN, 2015
Blog: blog.me115.com
Github: Large CC
