C#: How to efficiently read and write massive data volumes to MySQL



Background

This is work-driven: I often need to process massive data and do data crawling, and single tables running to tens of gigabytes are routine. The main development language is C#, and the database is MySQL.

The most common workflow is: SELECT the data, process it in C#, and INSERT the results back into the database. In short, select -> process -> insert, three steps. With a small data set (millions of rows, a few hundred megabytes) the job finishes in an hour at most. But with tens of millions of rows it can run for days, or even longer. So the question is: how do we optimize it??


Step 1: Solving the read problem

There are many ways to talk to a database in C#. Let me list them:

1. [Heavy weapons - tank cannon] Use a full ORM framework, such as EF or NHibernate.
2. [Light weapons - AK47] Use a single-file micro-ORM such as Dapper or PetaPoco. Flexible, efficient, and easy to use (I prefer PetaPoco :))
3. [Cold weapons? A dagger?] Use the native Connection and Command classes and write the raw SQL statements yourself.

Analysis:

[Heavy weapons] Pass on these straight away. Full ORMs belong in large projects, not bulk data jobs.

[Light weapons] If you read the source code of Dapper or PetaPoco, you will find they rely on reflection. Even with IL generation and caching, that still costs read performance. Pass.

So the dagger it is. Write raw SQL, read with a DataReader for efficiency, and retrieve columns by ordinal index (faster) instead of by column name.
The approximate code is as follows:

```csharp
using (var conn = new MySqlConnection("Connection String..."))
{
    conn.Open();

    // Raise the session timeouts here, otherwise large reads easily time out
    var c = new MySqlCommand(
        "SET net_write_timeout = 99999; SET net_read_timeout = 99999;", conn);
    c.ExecuteNonQuery();

    var rcmd = new MySqlCommand();
    rcmd.Connection = conn;
    rcmd.CommandText = @"SELECT `f1`, `f2` FROM `table1`";
    // Set the command execution timeout
    rcmd.CommandTimeout = 99999;

    using (var myData = rcmd.ExecuteReader())
    {
        while (myData.Read())
        {
            var f1 = myData.GetInt32(0);   // read by ordinal, not by column name
            var f2 = myData.GetString(1);
            // process the data here...
        }
    }
}
```

Haha, what's the catch? The code is very primitive, and reading by index is error-prone — but that's the price we pay.
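One way to keep ordinal access while avoiding hard-coded magic numbers (a sketch, assuming the MySql.Data connector; the table and column names come from the example above) is to resolve the names to ordinals once with GetOrdinal, outside the hot loop:

```csharp
using MySql.Data.MySqlClient;

// Resolve column names to ordinals once, then read by index in the loop.
using (var conn = new MySqlConnection("Connection String..."))
{
    conn.Open();
    var cmd = new MySqlCommand("SELECT `f1`, `f2` FROM `table1`", conn);
    using (var reader = cmd.ExecuteReader())
    {
        int f1Ord = reader.GetOrdinal("f1");  // name lookup happens once here...
        int f2Ord = reader.GetOrdinal("f2");
        while (reader.Read())
        {
            var f1 = reader.GetInt32(f1Ord);  // ...and the loop stays index-based
            var f2 = reader.GetString(f2Ord);
            // process the data here...
        }
    }
}
```

You keep the speed of index-based reads, and a change in the SELECT column order no longer silently corrupts your data.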

Step 2: Data processing

The code here depends entirely on your business needs, but it boils down to string handling and type conversion — this is where your C# fundamentals get tested, along with how efficiently you can write regular expressions...

No concrete code for this step. Go read CLR via C# and then come discuss it with me. O(∩_∩)O Hahaha~ Skipping ahead...

Step 3: Data insertion

How can batch inserts be made as efficient as possible? Some will say: use a transaction — BeginTransaction, then Commit. That does improve insert throughput. But there is an even more effective technique: merging insert statements.
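For reference, the transaction variant looks roughly like this (a sketch assuming the MySql.Data connector; `rows` and its `F1`/`F2` properties are placeholders for your processed data). The point is one commit for many rows instead of one implicit commit per row:

```csharp
using MySql.Data.MySqlClient;

// Wrap many single-row inserts in one transaction: the server flushes
// once at Commit instead of once per INSERT.
using (var conn = new MySqlConnection("Connection String..."))
{
    conn.Open();
    using (MySqlTransaction tx = conn.BeginTransaction())
    {
        var cmd = new MySqlCommand(
            "INSERT INTO `table1` (`f1`, `f2`) VALUES (@f1, @f2)", conn, tx);
        cmd.Parameters.Add("@f1", MySqlDbType.Int32);
        cmd.Parameters.Add("@f2", MySqlDbType.VarChar);

        foreach (var row in rows)   // rows: your processed data (placeholder)
        {
            cmd.Parameters["@f1"].Value = row.F1;
            cmd.Parameters["@f2"].Value = row.F2;
            cmd.ExecuteNonQuery();
        }
        tx.Commit();                // everything is flushed once, here
    }
}
```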

So how to merge it?

```sql
INSERT INTO table1 (f1, f2) VALUES (1,'sss'),(2,'bbbb'),(3,'cccc');
```

That is, join all the value tuples with commas and execute them in a single statement.

Of course, you cannot submit an arbitrarily large SQL statement in one go: the MySQL server limits the size of each packet. Check the max_allowed_packet variable on your server; the default is 1 MB.
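You can inspect the limit on the server, and raise it if you have the privileges (the 16 MB value below is just an example, not a recommendation):

```sql
-- Check the current limit (in bytes)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for new connections (needs SUPER/admin privileges)
SET GLOBAL max_allowed_packet = 16 * 1024 * 1024;
```

Existing connections keep the old value, so reconnect after changing it.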

Let's look at the pseudo code.

```csharp
// Use a StringBuilder to splice the statement efficiently
var sqlBuilder = new StringBuilder();
// The INSERT statement header
string sqlHeader = "INSERT INTO `table1` (`f1`, `f2`) VALUES ";
sqlBuilder.Append(sqlHeader);

// Note: while a DataReader is open its connection cannot run other
// commands, so the inserts go through a second connection.
using (var conn = new MySqlConnection("Connection String..."))
using (var connW = new MySqlConnection("Connection String..."))
{
    conn.Open();
    connW.Open();

    // Raise the session timeouts here, otherwise large reads easily time out
    var c = new MySqlCommand(
        "SET net_write_timeout = 99999; SET net_read_timeout = 99999;", conn);
    c.ExecuteNonQuery();

    var rcmd = new MySqlCommand();
    rcmd.Connection = conn;
    rcmd.CommandText = @"SELECT `f1`, `f2` FROM `table1`";
    // Set the command execution timeout
    rcmd.CommandTimeout = 99999;

    var insertCmd = new MySqlCommand { Connection = connW, CommandTimeout = 99999 };

    using (var myData = rcmd.ExecuteReader())
    {
        while (myData.Read())
        {
            var f1 = myData.GetInt32(0);
            var f2 = myData.GetString(1);
            // process the data here...

            // AddSlash escapes quotes and backslashes in the string value
            // (MySqlHelper.EscapeString does the same job)
            sqlBuilder.AppendFormat("({0},'{1}'),", f1, AddSlash(f2));

            // Of course 1 MB of statement text is not exactly a 1 MB packet,
            // I know -- leave some headroom below max_allowed_packet
            if (sqlBuilder.Length >= 1024 * 1024)
            {
                // Remove the trailing comma and execute
                insertCmd.CommandText =
                    sqlBuilder.Remove(sqlBuilder.Length - 1, 1).ToString();
                insertCmd.ExecuteNonQuery();
                sqlBuilder.Clear();            // clear the buffer
                sqlBuilder.Append(sqlHeader);  // re-add the insert header
            }
        }
    }

    // Flush whatever is left over after the loop
    if (sqlBuilder.Length > sqlHeader.Length)
    {
        insertCmd.CommandText =
            sqlBuilder.Remove(sqlBuilder.Length - 1, 1).ToString();
        insertCmd.ExecuteNonQuery();
    }
}
```

With that, the optimized read and insert are done.

Conclusion

To sum up, there are only two key techniques here — DataReader and merged SQL — and both are old ones.

Honestly, the code above can be called efficient, but hardly elegant... ugly, even...

So what's next? Refactoring: abstract a reusable class so callers never have to worry about splicing strings and the rest of that mess, and support multi-threaded merged writes to saturate write IO. Let's talk about it in the next article.
