20 million records needed to be inserted into MySQL, but plain INSERT statements topped out at a few hundred records per second. The same speed showed up when inserting from the Hadoop cluster, so the bottleneck is probably on the database side; reports online suggest that the insert speed of SQL Server and Oracle is not much faster either. Two simple optimizations:

1. Insert multiple records in one INSERT statement

[SQL] INSERT INTO tablename (field0, field1, ...) VALUES (value0, value1, ...), (value0, value1, ...), ..., (value0, value1, ...);

This improves the insert speed several times over, but it is still not enough: at 1000-2000 records per second, 20 million records take far too long.

2. Import the data from a text file

MySQL can import records directly from a text file, provided each record sits on its own line, every field is separated by the same character, and every line ends with the same character. After writing a program to convert the data into this format, the file can be imported from the mysql client with:

[SQL] mysql> LOAD DATA LOCAL INFILE 'filename' INTO TABLE tablename FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

Here '\t' and '\n' are the field and line separators; they may be different in other setups.
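The two techniques above can be sketched in Python. This is a minimal illustration, not the author's original program: `batch_insert_sql`, `write_tsv`, the table name `tablename`, and the sample rows are all hypothetical, and the value quoting via `repr` is a simplification (real code should use parameterized queries or proper MySQL escaping).

```python
import itertools


def batch_insert_sql(table, fields, rows, batch_size=1000):
    """Technique 1: yield multi-row INSERT statements, batch_size rows each.

    Values are assumed to be numbers or plain strings; repr() is a crude
    stand-in for proper SQL escaping.
    """
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({', '.join(fields)}) VALUES {values};"


def write_tsv(path, rows):
    """Technique 2: write rows as tab-separated lines for LOAD DATA INFILE
    (fields terminated by '\\t', lines terminated by '\\n')."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(str(v) for v in row) + "\n")


rows = [(1, "alice"), (2, "bob"), (3, "carol")]
stmts = list(batch_insert_sql("tablename", ["id", "name"], rows, batch_size=2))
write_tsv("records.tsv", rows)
```

With `batch_size=2` the three sample rows become two INSERT statements, and `records.tsv` is ready to be loaded with the LOAD DATA statement shown above.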
With this approach, the import speed seems to depend mainly on the file size rather than on the number of records (though perhaps 20 million records is not enough to tell the difference). On a single machine (i5-2400 CPU, 8 GB of memory, hard disk read speed around 90 MB/s), preprocessing the text file for the 20 million records took about 3 minutes, and importing it into the database took another 7 minutes. There were still another 11 GB of text files, which will probably be processed on the cluster.
Author bhq2010