How to handle writing millions of rows of data to the database

Source: Internet
Author: User
Tags: php, cli, array
A text file stores 1 million records, one record per line.

I need to write each line of data that matches the criteria into the database.

My previous practice was to read the file's data into an array, then foreach over the array and process it line by line (writing the eligible lines to the database).

But with millions of rows, continuing to do it this way looks like suicide. This is my first time dealing with data at this scale, so I have no experience.

I found suggestions online to solve this with PHP processes/threads, but I'm a complete novice with processes and threads. Could the experts come in and share their experience: how do you handle this much data, and if processes/threads are the way to do it, how do you implement that?

Reply content:


Millions... that's not a big number...

The bottleneck in this kind of data processing is basically I/O, so you can simply read and write single-threaded (especially since you have to do database inserts; add any index at all and the inserts themselves become the bottleneck).

But why do you want to put it into an array, classmate?! Are you planning to cram the whole file into memory? Read a line, check it once, save it to the database if it qualifies and throw it away if it doesn't. It's that simple; what is there to keep in an array...

And are you really sure you want to do the data processing with PHP...
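
A minimal sketch of that line-by-line style in PHP (the connection details, table and column names, and the filter condition are assumptions for illustration, not part of the original answer):

    <?php
    // Sketch: stream the file one line at a time; nothing is held in an array.
    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=test;charset=utf8', 'user', 'pwd');
    $stmt = $pdo->prepare('INSERT INTO my_table (col1, col2) VALUES (?, ?)');

    $fh = fopen('/path/to/data.txt', 'r');
    while (($line = fgets($fh)) !== false) {
        $fields = explode("\t", rtrim($line, "\n"));
        if (count($fields) < 2 || $fields[0] === '') {
            continue;                                  // does not match the criteria: throw it away
        }
        $stmt->execute([$fields[0], $fields[1]]);      // matches: write it to the database
    }
    fclose($fh);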

Working from 10 million rows of data, assuming PHP is what you're most familiar with and develop fastest in, and assuming you're writing into MySQL:

  1. Use the shell to split the 10-million-row file into 100 files, so each file has 100,000 lines; see man split for how to do it.

  2. Write a PHP script whose job is to read one file and output the valid data. Mind the data format: write the fields in the same order as the columns in the table, separate fields with half-width semicolons, and separate rows with \n. The exact parameters are configurable; see the parameters of MySQL's LOAD DATA command. Note that this runs under the PHP CLI; do not run it under Apache or any other web server. If you don't know how to process the file a row at a time, you can simply use PHP's file() function, and write the resulting output to the file with error_log($sql, 3, "/path/to/dataa"). You can also echo some debugging information for later inspection. (A sketch of such a generator script appears after this list.)

  3. Write a shell script that invokes PHP to process the logs. The script can be written like this:

    /path/to/php/bin/php -f genmysqldata.php source=loga out=dataa > /errora.log &
    /path/to/php/bin/php -f genmysqldata.php source=logb out=datab > /errorb.log &
    /path/to/php/bin/php -f genmysqldata.php source=logc out=datac > /errorc.log &
    .....

    Repeat this for 100 lines. If the machine's specs are low, write it in batches of, say, 10 lines at a time. The script's content is very regular, so it can itself be generated with PHP, which saves even more time. Running this shell script actually launches multiple PHP processes on the machine to generate the data; if your hardware is beefy enough, you can start 100 PHP processes to process the data. It's fast.

  4. Continue with a shell script that calls MySQL and uses LOAD DATA:

    mysql -h127.0.0.1 -uuser -ppwd -P3306 -D dbname -e "LOAD DATA INFILE '/path/to/dataa' INTO TABLE TableName (field1, field2, field3);"

    Here field1, field2, field3 must correspond to the order in which the data was generated. The command can be executed directly, or you can put N of these lines into a shell script and run that.
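
A rough sketch of the step-2 generator script referenced above (the argument parsing, field layout, and validity check here are assumptions; it only illustrates the file()/error_log() idea):

    <?php
    // genmysqldata.php: sketch of the step-2 generator. Read one chunk file and
    // append LOAD DATA-ready rows (fields joined by ';', one row per line) to $out.
    parse_str(implode('&', array_slice($argv, 1)), $args);   // source=loga out=dataa
    $source = $args['source'];
    $out    = $args['out'];

    foreach (file($source) as $i => $line) {
        $fields = explode("\t", rtrim($line, "\n"));
        if (count($fields) < 3) {
            echo "skipping line $i\n";                        // debugging info for later inspection
            continue;
        }
        // Fields must be written in the same order as the table's columns.
        error_log(implode(';', array_slice($fields, 0, 3)) . "\n", 3, $out);
    }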

PS: Mind the character encoding.

If this is a one-time import of the 1 million rows into MySQL, MySQL's LOAD DATA alone will do it. I used LOAD DATA to import the previously leaked CSDN account/password dump, about 6.5 million rows, and it took a little over 2 minutes...
If the program will be reused many times and you want to make a script of it, read 100,000 rows at a time and open an explicit transaction outside the foreach (PS: looped inserts must open an explicit transaction for good performance: write everything, then commit once; that speeds things up at least a hundred and even a thousand times, and disk I/O stays low). Remember to unset() variables once you are done with them. Inserting 1 million rows this way is a small case.
You can also stitch rows together as INSERT INTO ... VALUES (...), (...), ...; that is the fastest of all. Note, however, that SQL statements have a length limit, so insert about 1,000 rows per statement. Foreach inserts without an explicit transaction are the worst: the slowest, with the heaviest I/O pressure, because every single insert triggers an expensive fsync() system call. Looping 1 million times means calling fsync 1 million times. With an explicit transaction committed once every 100,000 rows, 1 million rows calls fsync only 10 times.
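
A minimal sketch of the explicit-transaction idea with PDO (the batch size and the table/column names are assumptions):

    <?php
    // Sketch: explicit transaction around the looped inserts, committing every
    // 100,000 rows so fsync() runs once per batch instead of once per row.
    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=test;charset=utf8', 'user', 'pwd');
    $stmt = $pdo->prepare('INSERT INTO my_table (col1, col2) VALUES (?, ?)');

    $fh = fopen('/path/to/data.txt', 'r');
    $n  = 0;
    $pdo->beginTransaction();
    while (($line = fgets($fh)) !== false) {
        $fields = explode("\t", rtrim($line, "\n"));
        $stmt->execute([$fields[0], $fields[1]]);
        unset($fields);                                // free variables you are done with
        if (++$n % 100000 === 0) {                     // commit in batches, not per row
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();                                    // commit the final partial batch
    fclose($fh);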

In this scenario the bottleneck is no longer PHP; inserting 1 million rows takes very little time.
You don't need to read everything into memory at once: read one line at a time, build the SQL for the lines that match the conditions, and don't rush to push each one into MySQL.
Wait until you have stitched a few hundred rows together and insert them in one go; that is a lot faster. Just make sure the final stitched SQL does not exceed max_allowed_packet.
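
A sketch of that stitching approach (the 500-row batch size and the table/column names are assumptions; values are quoted here with PDO::quote()):

    <?php
    // Sketch: stitch a few hundred rows into one INSERT ... VALUES (...),(...)
    // statement, keeping each statement well under max_allowed_packet.
    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=test;charset=utf8', 'user', 'pwd');

    $rows = [];
    $fh   = fopen('/path/to/data.txt', 'r');
    while (($line = fgets($fh)) !== false) {
        $f = explode("\t", rtrim($line, "\n"));
        if (count($f) < 2) {
            continue;                                  // skip lines that don't match the conditions
        }
        $rows[] = '(' . $pdo->quote($f[0]) . ',' . $pdo->quote($f[1]) . ')';
        if (count($rows) >= 500) {                     // flush every few hundred rows
            $pdo->exec('INSERT INTO my_table (col1, col2) VALUES ' . implode(',', $rows));
            $rows = [];
        }
    }
    if ($rows) {                                       // flush the remainder
        $pdo->exec('INSERT INTO my_table (col1, col2) VALUES ' . implode(',', $rows));
    }
    fclose($fh);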

Using Python: first read the text with Python to generate the SQL, then import it with source. Done!

# Generate one INSERT statement per tab-separated input line; import sql.txt afterwards with `source`.
f = open("/data/data.txt")
t = open("/data/sql.txt", "w+")
for line in f:
    a = line.rstrip("\n").split("\t")
    t.write("INSERT INTO `table`(`id`, `add_time`) VALUES ('%s', '%s');\n" % (a[0], a[2]))
f.close()
t.close()

One more thing: remember to drop the indexes first, then LOAD DATA INFILE, and only add the indexes back at the end. It is much faster that way.
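
A rough sketch of that sequence (the index, table and file names are assumptions; LOAD DATA LOCAL also has to be enabled on both the client and the server):

    <?php
    // Sketch: drop the secondary index, bulk-load the file, then rebuild the index once.
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=test;charset=utf8', 'user', 'pwd',
                   [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

    $pdo->exec('ALTER TABLE my_table DROP INDEX idx_add_time');
    $pdo->exec("LOAD DATA LOCAL INFILE '/path/to/dataa' INTO TABLE my_table (id, add_time)");
    $pdo->exec('ALTER TABLE my_table ADD INDEX idx_add_time (add_time)');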

Actually, why not try writing a thread to handle the data import? Before importing I suggest opening a transaction; I generally prefer to open the transaction in the program.

Databases generally provide a bulk insert feature ~

LOAD DATA INFILE http://dev.mysql.com/doc/refman/5.7/en/load-data.html

You shouldn't filter the data from the file and then insert it into the database; instead, load the entire file into the database and then filter inside the database, which is much simpler.

Isn't that exactly what the database is for?
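
A minimal sketch of that load-then-filter idea (the staging/target table names and the WHERE condition are assumptions):

    <?php
    // Sketch: bulk-load the raw file into a staging table, then filter inside MySQL.
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=test;charset=utf8', 'user', 'pwd',
                   [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

    $pdo->exec("LOAD DATA LOCAL INFILE '/data/data.txt' INTO TABLE staging (id, col1, add_time)");
    // Filter in SQL instead of in PHP: copy only the qualifying rows into the real table.
    $pdo->exec('INSERT INTO my_table (id, add_time)
                SELECT id, add_time FROM staging WHERE col1 IS NOT NULL');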
