How to read and write massive MySQL data efficiently in C#


Premise

For work I often have to deal with massive data sets, mostly from web crawlers: tens of millions of rows are routine, and single tables of dozens of GB are common. The main development language is C#, and the database is MySQL.

The most common operation is to select data, process it in C#, and then insert the result back into the database. In short, three steps: select -> process -> insert. For small amounts of data (millions of rows, or a few hundred MB) this takes an hour at most. But for tens of millions of rows it may take days or even longer. So the question is: how do we optimize this?

(Figure: a list of the databases, as proof of the data volumes involved.)

Step 1: Reading the data

There are many ways to talk to a database, so let me list them:

1. "Heavy weapons (tank cannons)": use a heavyweight ORM framework such as EF or NHibernate.
2. "Light weapons (AK-47)": use a micro-ORM such as Dapper or PetaPoco, which comes as a single .cs file. Flexible, efficient and easy to use; a must-have for every toolbox (I prefer PetaPoco :))
3. "Cold weapons (dagger)": use the native Connection and Command objects and write raw SQL statements...

Analysis:

The "heavy weapons" are ruled out immediately; they belong in large projects.

The "light weapons", Dapper and PetaPoco: read the source code and you will find that they rely on reflection. Although they use IL generation and caching, this still affects read efficiency, so they are ruled out too.
That leaves the dagger: go with raw SQL, use a DataReader for efficient reading, and fetch columns by index (faster) rather than by column name.
The code is roughly as follows:

using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection("Connection String..."))
{
  conn.Open();
  // Raise the network timeouts for this session; otherwise reads over massive data easily time out
  var c = new MySqlCommand("set net_write_timeout=9999999; set net_read_timeout=9999999", conn);
  c.ExecuteNonQuery();

  MySqlCommand rcmd = new MySqlCommand();
  rcmd.Connection = conn;
  rcmd.CommandText = @"SELECT `f1`, `f2` FROM `table1`";
  // Set the execution timeout for the command
  rcmd.CommandTimeout = 99999999;
  var myData = rcmd.ExecuteReader();

  while (myData.Read())
  {
    // Fetch by column index, in the order of the SELECT list
    var f1 = myData.GetInt32(0);
    var f2 = myData.GetString(1);
    // Do the data processing here...
  }
}


Haha, how about that? The code is very primitive, and fetching columns by index makes it easy to get things wrong. But of course, it is all for the sake of performance.

Step 2: Processing the data

This step depends on your business needs, so the code will certainly differ from case to case. It comes down to string handling and type conversions, which is a test of your C# fundamentals, and to knowing how to write efficient regular expressions...

I can't write the specific code for you. Read CLR via C# first and then come and discuss it with me, O(∩_∩)O hahaha~ Skipping ahead...

Step 3: Inserting the data

How do you make bulk inserts as efficient as possible? Some people will say: use a transaction, BeginTransaction and then Commit. That does improve insert efficiency. But there is an even more efficient way: merging INSERT statements.

So how do we merge?

INSERT INTO table1 (f1, f2) VALUES (1, 'sss'), (2, 'bbbb'), (3, 'cccc');


That is, write the VALUES keyword once, append all the value tuples after it separated by commas, and execute the whole thing as one statement.

Of course, you cannot submit 100 MB of SQL in one go: the MySQL server limits the length of each command it accepts. The limit is the server's max_allowed_packet setting, which you can view with SHOW VARIABLES LIKE 'max_allowed_packet'. The default is 1 MB on older versions; newer MySQL versions ship with a larger default.

Let's take a look at the pseudocode.
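The size limit can be handled by splitting the rows into several merged statements. Below is a minimal sketch of that idea, not the author's code: the names InsertBatcher, Build and maxBytes are made up for illustration, and the value tuples are assumed to be already formatted and escaped. It measures size in UTF-8 bytes, since the packet limit is in bytes rather than characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups pre-formatted value tuples such as "(1,'sss')"
// into multi-row INSERT statements that stay within a byte budget.
static class InsertBatcher
{
    // header is expected to end with "VALUES " (including the trailing space).
    // Note: a single tuple larger than maxBytes is still emitted on its own.
    public static IEnumerable<string> Build(string header, IEnumerable<string> tuples, int maxBytes)
    {
        int headerBytes = Encoding.UTF8.GetByteCount(header);
        var sb = new StringBuilder(header);
        int bytes = headerBytes;
        bool hasRows = false;

        foreach (var tuple in tuples)
        {
            int tupleBytes = Encoding.UTF8.GetByteCount(tuple);

            // Flush the current statement if adding ",<tuple>" would exceed the budget.
            if (hasRows && bytes + 1 + tupleBytes > maxBytes)
            {
                yield return sb.ToString();
                sb.Clear().Append(header);
                bytes = headerBytes;
                hasRows = false;
            }

            if (hasRows)
            {
                sb.Append(',');
                bytes += 1;
            }
            sb.Append(tuple);
            bytes += tupleBytes;
            hasRows = true;
        }

        // Emit the final, partially filled statement.
        if (hasRows)
        {
            yield return sb.ToString();
        }
    }
}
```

Each string it yields is a complete statement that can be passed to a command's CommandText. In practice it is probably wise to set maxBytes somewhat below max_allowed_packet to leave headroom.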

 using System.Text;
 using MySql.Data.MySqlClient;

 // Use a StringBuilder for efficient string concatenation
 var sqlBuilder = new StringBuilder();
 // Header of the INSERT statement
 string sqlHeader = "insert into table1 (`f1`, `f2`) values";
 sqlBuilder.Append(sqlHeader);

 using (var conn = new MySqlConnection("Connection String..."))
 {
   conn.Open();
   // Raise the network timeouts for this session; otherwise reads over massive data easily time out
   var c = new MySqlCommand("set net_write_timeout=9999999; set net_read_timeout=9999999", conn);
   c.ExecuteNonQuery();

   MySqlCommand rcmd = new MySqlCommand();
   rcmd.Connection = conn;
   rcmd.CommandText = @"SELECT `f1`, `f2` FROM `table1`";
   // Set the execution timeout for the command
   rcmd.CommandTimeout = 99999999;
   var myData = rcmd.ExecuteReader();

   // Pseudocode: insertCmd and AddSlash are not defined in this article.
   // insertCmd must run on a second connection, because the open reader keeps conn busy.
   while (myData.Read())
   {
     var f1 = myData.GetInt32(0);
     var f2 = myData.GetString(1);
     // Do the data processing here...
     sqlBuilder.AppendFormat("({0},'{1}'),", f1, AddSlash(f2));
     if (sqlBuilder.Length >= 1024 * 1024) // Of course a 1 MB string is not the same as a 1 MB packet... I know :)
     {
       insertCmd.Execute(sqlBuilder.Remove(sqlBuilder.Length - 1, 1).ToString()); // Remove the trailing comma, then execute
       sqlBuilder.Clear(); // Empty the builder
       sqlBuilder.Append(sqlHeader); // Add the insert header again
     }
   }

   // Flush the rows left over after the last full batch
   if (sqlBuilder.Length > sqlHeader.Length)
   {
     insertCmd.Execute(sqlBuilder.Remove(sqlBuilder.Length - 1, 1).ToString());
   }
 }
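The pseudocode calls an AddSlash helper that the article never defines. One possible shape for it is sketched below, as an assumption rather than the author's implementation: it backslash-escapes the characters that are special inside a single-quoted MySQL string literal, and it presumes the server is not running with the NO_BACKSLASH_ESCAPES SQL mode and that the connection uses a UTF-8 character set. The class name SqlEscape is made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the undefined AddSlash helper in the pseudocode.
static class SqlEscape
{
    // Escapes a value for use between single quotes in a MySQL statement.
    // Assumes backslash escapes are enabled on the server (the default).
    public static string AddSlash(string value)
    {
        var sb = new StringBuilder(value.Length + 8);
        foreach (char ch in value)
        {
            switch (ch)
            {
                case '\0': sb.Append("\\0"); break;      // NUL byte
                case '\n': sb.Append("\\n"); break;      // newline
                case '\r': sb.Append("\\r"); break;      // carriage return
                case '\u001a': sb.Append("\\Z"); break;  // Ctrl+Z
                case '\\': sb.Append("\\\\"); break;     // backslash
                case '\'': sb.Append("\\'"); break;      // single quote
                case '"': sb.Append("\\\""); break;      // double quote
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }
}
```

Hand-rolled escaping is easy to get wrong, so where the merged-statement trick is not needed, parameterized commands are the safer choice. Connector/NET also provides MySqlHelper.EscapeString, which may be preferable to a custom helper like this one.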

 

All right, that completes the optimized reading and insertion.

Conclusion

To sum up, there are just two key techniques: DataReader and merged SQL statements. Both are old techniques.

In fact, the code above can only be called efficient. It is far from elegant... even ugly...

So what is the problem, and how do we refactor it? The goal is to abstract a reusable class so that callers do not have to care about messy details like string concatenation, and to support multithreaded merged writes that max out write I/O. We'll talk about that in the next article.
