MySQL Mass Data Insertion

Source: Internet
Author: User
Tags: types of tables

Recently I ran some MySQL bulk-insert tests; here is a brief summary of the problems I encountered:

At first I simply wrote a stored procedure (SP) in MySQL that inserts the data in a loop, as follows:
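The original procedure is not reproduced in the source; a minimal sketch of such a row-by-row loop insert, assuming a hypothetical test table t1 (id INT, c VARCHAR(100)), might look like this:

```sql
-- Hypothetical reconstruction: insert rows one at a time in a loop.
-- The table t1 and its columns are assumptions, not the author's schema.
DELIMITER //
CREATE PROCEDURE sp_insert_loop(IN total INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < total DO
    INSERT INTO t1 (id, c) VALUES (i, CONCAT('row-', i));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

-- Example: insert 1,000,000 rows (the 100W test below).
-- CALL sp_insert_loop(1000000);
```

Each INSERT here is committed individually by autocommit, which is exactly why this version is slow.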


This is the procedure that inserts 1,000,000 rows (100W) and its result: it took 55 min 20 s, about 3,320 seconds (roughly 300 rows/s). Seeing that, I could only despair, so I went looking online for ways to speed things up:

0. The fastest way, of course, is to copy the database table's data files directly (the versions and platforms should preferably be identical or at least similar);
1. Set innodb_flush_log_at_trx_commit = 0; compared with innodb_flush_log_at_trx_commit = 1, this can significantly improve import speed;
2. Using LOAD DATA LOCAL INFILE gives an obvious speed-up (see the sketch after this list);
3. Increase bulk_insert_buffer_size to enlarge the buffer used for bulk inserts;
4. Combine multiple inserts into one statement: INSERT INTO t VALUES (a,b,c),(d,e,f),...;
5. Use transactions manually;
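As a sketch of items 2 and 4, again assuming the hypothetical table t1 from above (the file path and column list are placeholders):

```sql
-- Item 2: bulk-load from a local CSV file.
LOAD DATA LOCAL INFILE '/tmp/t1.csv'
INTO TABLE t1
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(id, c);

-- Item 4: combine many rows into a single INSERT statement.
INSERT INTO t1 (id, c) VALUES
  (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
```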

Importing data into a MySQL database is often time-consuming when the data volume is large, say millions or even tens of millions of records. The following methods can usually speed the process up:

First, for MyISAM tables, you can quickly import large amounts of data in the following way: run ALTER TABLE tblname DISABLE KEYS; load the data, then run ALTER TABLE tblname ENABLE KEYS;. These two commands turn updates of the table's non-unique indexes off and back on. When importing a large amount of data into a non-empty MyISAM table, wrapping the load between them improves import efficiency. When importing a large amount of data into an empty MyISAM table, the default is to load the data first and build the indexes afterwards, so nothing needs to be set.
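As a sketch, the full sequence for loading into a non-empty MyISAM table (the table name and file path are placeholders; the load step could equally be a batch of INSERTs):

```sql
-- Suspend maintenance of non-unique indexes, bulk-load, then rebuild them.
ALTER TABLE tblname DISABLE KEYS;
LOAD DATA LOCAL INFILE '/tmp/tblname.csv' INTO TABLE tblname;
ALTER TABLE tblname ENABLE KEYS;
```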

Second, for InnoDB tables, there are several ways to improve import efficiency: ① InnoDB tables are stored in primary-key order, so importing the data in primary-key order effectively improves import speed. If an InnoDB table has no primary key, an internal column is created as the primary key by default, so if you can define a primary key for the table, you can exploit this to improve import efficiency.

② Run SET unique_checks=0 before importing to turn off uniqueness checks, and run SET unique_checks=1 after the import finishes to restore them; this improves import efficiency.

③ If the application uses autocommit, it is recommended to run SET autocommit=0 before the import to turn autocommit off, and SET autocommit=1 after the import to turn it back on; this also improves import efficiency.


I then created an InnoDB table, split into 128 partitions, followed the methods above and set the following:
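The exact settings are not shown in the source; based on the tips above, the session setup presumably looked roughly like this sketch (the specific values are assumptions):

```sql
-- Speed-up settings before the bulk InnoDB insert (restored afterwards).
SET GLOBAL innodb_flush_log_at_trx_commit = 0;  -- relax redo-log flushing
SET unique_checks = 0;                          -- skip uniqueness checks
SET autocommit = 0;                             -- commit manually in batches

-- ... run the bulk insert here ...

SET autocommit = 1;
SET unique_checks = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```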


The SP that inserts millions of rows is as follows:
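The procedure itself is missing from the source; a minimal sketch of a batched version, reusing the hypothetical table t1 and committing every 10,000 rows, could be:

```sql
-- Hypothetical reconstruction: loop insert with manual transactions,
-- committing every 10,000 rows instead of once per row.
DELIMITER //
CREATE PROCEDURE sp_insert_batch(IN total INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET autocommit = 0;
  WHILE i < total DO
    INSERT INTO t1 (id, c) VALUES (i, CONCAT('row-', i));
    SET i = i + 1;
    IF i % 10000 = 0 THEN
      COMMIT;
    END IF;
  END WHILE;
  COMMIT;
  SET autocommit = 1;
END //
DELIMITER ;

-- CALL sp_insert_batch(1000000);
```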


You can see that inserting a million rows now takes around 100 s, roughly a 33-fold speed-up.

With the speed much improved, I increased the amount of data by a factor of 10 and inserted ten million rows; the SP is as follows:


You can see that the time is about 1,200 s; the field is longer in this test, which may affect the insert speed.

To verify this specifically, I inserted ten million rows with a field length of 1,000 bytes to see the result; the SP and the results are as follows:
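The procedure for this test is also not shown; a sketch of the wide-field table and of how a 1,000-byte value can be generated (the table name and the latin1 charset are assumptions):

```sql
-- Hypothetical wide-field table: with latin1, VARCHAR(1000) is 1000 bytes.
CREATE TABLE t_wide (id INT, c VARCHAR(1000)) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Inside the insert loop, a 1000-byte value can be produced with REPEAT:
INSERT INTO t_wide (id, c) VALUES (1, REPEAT('a', 1000));
```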



You can see the time is 33 min 51 s (about 2,031 seconds), i.e. roughly 4,900 rows/s. The speed dropped considerably; the field length clearly has an effect.


varchar field

1. Restriction rules
A varchar field is subject to the following rules when it is defined:
a) Storage limit
A varchar field stores its actual content separately from the clustered index record, and the content is prefixed with 1 or 2 bytes giving the actual length (2 bytes when the length exceeds 255), so the maximum length cannot exceed 65,535 bytes.
b) Encoding length limit
If the character set is GBK, each character takes at most 2 bytes, so the maximum length cannot exceed 32,766.
If the character set is UTF-8, each character takes at most 3 bytes, so the maximum length cannot exceed 21,845.
For a forum with mostly English content, GBK occupies 2 bytes per character, while UTF-8 takes only 1 byte for English characters.
If a definition exceeds the limits above, the varchar column is silently converted to a TEXT type and a warning is generated.
c) Row length limit
In practice, it is the row-definition length that limits the varchar length. MySQL requires that the total defined length of a row not exceed 65,535 bytes. If a table definition exceeds this value, MySQL reports:
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs.
2. Calculation examples
Two examples illustrate how the actual maximum length is calculated.
a) If a table has only one varchar column, defined as
CREATE TABLE t4 (c VARCHAR(N)) CHARSET=GBK;
then the maximum value of N is (65535-1-2)/2 = 32766:
minus 1 because the actual row storage starts from the second byte;
minus 2 because the 2-byte varchar header records the length;
divided by 2 because the GBK character set takes 2 bytes per character.

b) If a table is defined as
CREATE TABLE t4 (c INT, c2 CHAR(30), c3 VARCHAR(N)) CHARSET=utf8;
then the maximum value of N is (65535-1-2-4-30*3)/3 = 21812:
minus 1 and minus 2 for the same reasons as in the previous example;
minus 4 because the INT column c takes 4 bytes;
minus 30*3 because CHAR(30) occupies 90 bytes under the utf8 character set.
If a varchar exceeds rule b) above and is forcibly converted to a TEXT type, each such field then counts only 11 bytes towards the defined row length; of course, it is no longer really a varchar at that point.
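A sketch that exercises the two calculations above; the commented-out statements are expected to hit ERROR 1118 (or be converted to TEXT), and the exact boundary can shift by a few characters depending on version, nullability and sql_mode. The table names are placeholders:

```sql
-- Example a): single GBK varchar at the calculated limit.
CREATE TABLE t4_gbk (c VARCHAR(32766)) CHARSET=GBK;
-- CREATE TABLE t4_gbk_x (c VARCHAR(32767)) CHARSET=GBK;                        -- expected to fail

-- Example b): utf8 table with extra columns at the calculated limit.
CREATE TABLE t4_utf8 (c INT, c2 CHAR(30), c3 VARCHAR(21812)) CHARSET=utf8;
-- CREATE TABLE t4_utf8_x (c INT, c2 CHAR(30), c3 VARCHAR(21813)) CHARSET=utf8; -- expected to fail
```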

You can check the character-set settings in MySQL with SHOW VARIABLES LIKE '%char%':
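The output screenshot is not included in the source; the statement itself is simply:

```sql
SHOW VARIABLES LIKE '%char%';
-- Lists character_set_client, character_set_connection, character_set_database,
-- character_set_filesystem, character_set_results, character_set_server,
-- character_set_system and character_sets_dir; the values depend on the server.
```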


Next I increased the amount of data by another factor of 10, inserting 100 million rows, again with a 1,000-byte field, and looked at the insert time and the memory consumed; the SP and the results are as follows:




As can be seen, inserting 100 million rows took 5 h 20 min 56 s = 19,256 s, an average of about 5,193 rows/s. The disk space consumed, 98 GB, is also roughly in line with the 9 GB (26 GB - 17 GB) used by the previous ten-million-row insert, so the relationship is still linear. With 500 GB of disk and rows of about 1 KB, the machine could ideally store about 500 million rows; conservatively, 400 to 450 million is more appropriate, leaving room for other applications, database undo logs and index space.


Finally, look at the query time: after each round of inserts, I timed how long it takes to query the row count.


Because an index exists, counting a few million rows takes only about 1 second. Once the data grows to the tens of millions (about 10.5 million rows), the count takes 3 min 30 s; after the 100-million-row insert, counting the rows takes 27 min 43 s. Clearly the relationship is not linear; the time grows geometrically.
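The count being timed is presumably a plain COUNT(*); a sketch, with the table name as a placeholder:

```sql
-- Row-count query whose elapsed time is compared above; InnoDB has to scan
-- an index to answer it, which is why the time grows with the table size.
SELECT COUNT(*) FROM t1;
```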


=====================================================================================================

Now for the tests in the cluster environment.

Cluster: 32 GB of memory, 500 GB of hard disk; 3 virtual machines acting as 3 nodes: 188.188.2.181 (management node, data node and SQL node), 188.188.2.182 (data node and SQL node) and 188.188.2.183 (data node and SQL node). The /root partition has 200 GB of disk space (the default was originally 50 GB). About 80 million rows (8000W) were inserted; the space they occupy is shown below.
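For rows to be stored across the data nodes, the test table has to use the NDB storage engine; a minimal sketch, with the table name and columns carried over from the earlier hypothetical tests:

```sql
-- Hypothetical cluster test table; only the engine choice matters here.
CREATE TABLE t_cluster (
  id INT NOT NULL,
  c  VARCHAR(1000),
  PRIMARY KEY (id)
) ENGINE=NDBCLUSTER;
```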

Memory before insertion:


Memory after insertion:


Data memory used: (910051 - 5) × 32 KB ≈ 27.77 GB, i.e. roughly 3 million rows (300W) per GB.

Index memory used: 54980 × 8 KB ≈ 430 MB.
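These page counts match what MySQL Cluster reports for DataMemory (32 KB pages) and IndexMemory (8 KB pages); they can be obtained with a query like the following sketch, or with ndb_mgm -e "all report memoryusage":

```sql
-- Per-node data/index memory usage from the ndbinfo schema.
SELECT node_id, memory_type, used_pages, total_pages
FROM ndbinfo.memoryusage;
```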

Disk space before insertion:



Disk space after insertion:


Disk space used: 200 GB × (34% - 5%) = 58 GB, i.e. roughly 1.43 million rows (143W) per GB.

Row count before insertion:


Row count after insertion:


Rows inserted: 82,551,267

