Redis Data Import Tool Optimization Process Summary
Background
A Redis data import tool was developed in C++;
It imports all table data from Oracle into Redis;
Rather than a simple copy, each original Oracle record must be processed by business logic, and indexes (Redis sets) must be added;
After the tool was completed, performance turned out to be the bottleneck.
Optimization results
Two sample data sets were used for testing:
Table A of the sample data contains 8,763 records;
Table B contains 940,279 records;
Before optimization, importing Table A took 11.417 s;
After optimization, importing Table A takes 1.883 s.
Tools used
gprof, strace, time
Use the time tool to view the time consumed by each run, including user time and system (kernel) time;
Use strace to print system calls as the process runs, identify the main system calls it makes, and find where the time goes;
Use gprof to collect per-function time statistics, then concentrate optimization effort on the functions that consume the most time.
A brief introduction to gprof usage:
1. The -pg option must be added to both the compile and the link options of g++ (on the first day, no statistical report could be generated because -pg was missing from the link options);
2. After the program runs, a gmon.out file is generated in the current directory;
3. Run gprof redistool gmon.out > report to generate a readable report file, then start with the most time-consuming functions listed in the report.
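
For example, assuming the tool is built from a single source file (an assumption; the post does not show the actual build commands):

g++ -pg -c redistool.cpp -o redistool.o     # compile with -pg
g++ -pg -o redistool redistool.o            # link with -pg as well
./redistool im a a.csv                      # run once; produces gmon.out
gprof redistool gmon.out > report           # generate the readable report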
Optimization process
Before optimization, 11.417 s (note the long system-call time):

time ./redistool im a a.csv
real    0m11.417s
user    0m6.035s
sys     0m4.782s
File memory mapping
The system-call time was too long, mainly in file reads and writes; the initial suspicion was that API calls were too frequent while reading the file;
The original code read the sample file line by line with fgets(); after mapping the file into memory with mmap(), the entire file can be traversed quickly through a pointer;
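
A minimal sketch of the mmap-based approach (the per-record hook handleRecord() is a hypothetical stand-in; the tool's actual parsing code is not shown in the post):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Map the whole file into memory and walk it with a pointer,
// instead of issuing one fgets()/read() per line.
void processFile(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return; }

    void* mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       // the mapping remains valid after close()
    if (mem == MAP_FAILED) return;

    const char* p   = static_cast<const char*>(mem);
    const char* end = p + st.st_size;
    while (p < end) {
        const char* nl = static_cast<const char*>(
            std::memchr(p, '\n', end - p));
        size_t len = nl ? static_cast<size_t>(nl - p)
                        : static_cast<size_t>(end - p);
        std::string record(p, len);  // one line of the input file
        // handleRecord(record);     // hypothetical business-logic hook
        p += len + 1;
    }
    munmap(mem, st.st_size);
}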
Moving the log switch forward
After improving the file reads and writes, the gain turned out to be limited (about 2 seconds); fgets() is a buffered C library function on top of the read() system call, so it should not be that slow (some benchmarks online show memory mapping beating fgets() by an order of magnitude, but those scenarios seem rather special);
strace later revealed that log.dat was being opened an enormous number of times; it turned out the debug log switch was checked too late, so every debug log call still executed open("log.dat") to open the log file;
After moving the log switch check forward, the time improved to 3.53 s:
time ./redistool im a a.csv
real    0m3.530s
user    0m2.890s
sys     0m0.212s
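
A sketch of the pattern (the names g_logLevel and DebugLog* are assumptions for illustration; the post does not show the tool's logging code):

#include <cstdio>

static int g_logLevel = 0;  // hypothetical global switch; > 0 enables debug logs

// Before: log.dat was opened on every call, even with debug logging
// disabled, so each debug log cost an open/close pair of system calls.
void DebugLogBefore(const char* msg)
{
    FILE* fp = fopen("log.dat", "a");
    if (fp == NULL) return;
    if (g_logLevel > 0)              // switch checked after the open
        fprintf(fp, "%s\n", msg);
    fclose(fp);
}

// After: check the switch first; when debug logging is off, the call
// returns immediately without making any system calls at all.
void DebugLogAfter(const char* msg)
{
    if (g_logLevel <= 0)             // switch checked up front
        return;
    FILE* fp = fopen("log.dat", "a");
    if (fp == NULL) return;
    fprintf(fp, "%s\n", msg);
    fclose(fp);
}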
Pre-allocating vector space
gprof analysis showed that the vector in one function was allocated many times, with many element copies;
The following line of code was improved:

vector<string> vSegment;

by using a static vector variable and pre-allocating its memory:

static vector<string> vSegment;   // static: constructed only once per process
vSegment.clear();                 // reuse the same storage on every call
static int nCount = 0;
if (0 == nCount)                  // reserve capacity on the first call only
{
    vSegment.reserve(64);
}
++nCount;
After this optimization, the time improved to 2.286 s:
real    0m2.286s
user    0m1.601s
sys     0m0.222s
Similarly, a vector member in another class also gets its space pre-allocated (in the constructor):

m_vtPipecmd.reserve(256);
After this optimization, the time improved to 2.166 s:
real    0m2.166s
user    0m1.396s
sys     0m0.204s
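
For illustration (the enclosing class name is a hypothetical stand-in; only the member m_vtPipecmd appears in the post):

#include <string>
#include <vector>

class PipeCmdBuffer              // hypothetical class name
{
public:
    PipeCmdBuffer()
    {
        // Reserve once at construction so that later push_back calls
        // never trigger reallocation and element copying.
        m_vtPipecmd.reserve(256);
    }
private:
    std::vector<std::string> m_vtPipecmd;
};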
Function rewriting & inlining
Running the program again showed that the SqToolStrSplitByCh() function consumed too much time. The function's entire logic was rewritten, and the rewritten function was inlined:
After this optimization, the time improved to 1.937 s:
real    0m1.937s
user    0m1.301s
sys     0m0.186s
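
A sketch of what an inlined split-by-character function might look like (the real SqToolStrSplitByCh() signature and logic are not shown in the post, so this shape is an assumption):

#include <string>
#include <vector>

// Split sLine on the character ch, appending segments to vSegment.
// Scanning with find() and reusing the caller's pre-allocated vector
// avoids temporary allocations; inline removes the call overhead.
inline void SqToolStrSplitByCh(const std::string& sLine, char ch,
                               std::vector<std::string>& vSegment)
{
    std::string::size_type begin = 0;
    std::string::size_type pos;
    while ((pos = sLine.find(ch, begin)) != std::string::npos) {
        vSegment.push_back(sLine.substr(begin, pos - begin));
        begin = pos + 1;
    }
    vSegment.push_back(sLine.substr(begin));   // last segment
}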
Removing debug and profiling symbols
Finally, after removing the -g debug symbols and the -pg profiling instrumentation, the final result is 1.883 s:
real    0m1.883s
user    0m1.239s
sys     0m0.191s
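
For reference, the production build simply drops those flags (again assuming a single source file):

g++ -o redistool redistool.cpp              # -g and -pg removed for production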
Meeting production requirements
The last few steps above may look like sub-second improvements, but scaled up to the full table data the effect is obvious;
After optimization, Table A in production has about 1.52 million records, and the import takes about 326 s (~6 minutes);
Table B has about 4.2 million records, and the import takes about 1103 s (~18 minutes).
Posted by: Large CC | 28 Jun 2015
Blog: blog.me115.com
Github: Large CC