How to migrate massive data in MySQL

Source: Internet
Author: User
Tags import database

The company's data center plans to migrate a massive amount of data and add a certain time field (originally datatime type, now added a date type). The data volume in a single table reaches more than 0.6 billion records, the data is based on the time (month). Due to the busy schedule, the summary has not been made, so I cannot remember the details. Here I will simply summarize the situation at that time and forget it.

Chaos

When a task is initially received, there is no clear start point. It is directly select * from db limit 10000, which dynamically modifies the number of pages to view the time consumption through the console, which is slow.

Copy codeThe Code is as follows:
SELECT IR_SID, IR_HKEY, IR_GROUPNAME, IR_SITENAME, IR_CHANNEL, IR_MID, IR_URLNAME, IR_STATUS_CONTENT, IR_CREATED_AT, date_format (IR_CREATED_AT, '% Y. % m. % D'), IR_LASTTIME, IR_VIA, random, IR_RTTCOUNT, IR_COMMTCOUNT, IR_UID, IR_SCREEN_NAME, IR_RETWEETED_UID, random, IR_RETWEETED_MID, IR_RETWEETED_URL, IR_STATUS_BODY FROM TB_SINA_STATUS WHERE IR_SID> 40000 AND IR_SID <50001 into outfile '/home/mysql/data/data_outfile.txt '; load data infile '/home/mysql/data/data_outfile.txt' into table NEW_TB_SINA_STATUS;

Can I read data based on partition?

Since the database is partitioned by partition, can I read data by partition? What syntax can I use to read data? As long as the data is read by month, mysql will automatically read data based on partition. You can use the command: explain partition to see which partition is based on.
Reading big data is very time-consuming. We may be very curious about the status of the data. You can use the command: show status to view the status. I think it is mainly sending data, writting to net.
What is the performance of the innodb engine compared with that of the myIsam engine?
Export and import data for one month in the import database Experiment (8 GB of text, million records), less than 4 hours in the myisam engine (PC testing environment ), however, it takes 32 hours for the innodb engine to improve indexes and the like. It also takes 28 h, with eight times worse performance.

I found the difference between innodb and myisam on the internet, saying that innodb_buffer_pool_size and innodb_flush_log_at_trx_commit need to be modified.

It can be guaranteed that there is no big difference, and there is no obvious improvement in the attempt. It can be done on the local machine. Why ??? It took me a long time

Innodb_flush_log_at_trx_commit

Is it because Innodb is 1000 times slower than MyISAM? Maybe you forgot to modify this parameter. The default value is 1, which means that each commit of the update transaction (or a statement other than each transaction) will be refreshed to the disk, which is quite resource-consuming, especially when there is no battery backup cache. Many applications, especially those transformed from MyISAM, set the value to 2, that is, do not refresh the log to the disk, instead, it is only refreshed to the operating system cache. Logs are still refreshed to the disk every second, so the consumption of 1-2 updates per second is usually not lost. If it is set to 0, it will be much faster, but it is relatively insecure-some transactions will be lost when the MySQL server crashes. If set to 2, only the transaction that is refreshed to the operating system cache will be lost.

Innodb_buffer_pool_size

Innodb is similar to snail bait in the default innodb_buffer_pool_size setting. Because Innodb caches data and indexes, there is no need to leave too much memory for the operating system. Therefore, if you only need Innodb, you can set it to up to 70-80% of available memory.

In the end, two other parameters were found.

Innodb_log_file_size

It is important to write data at a high load, especially for large datasets. The larger the value, the higher the performance, but note that the recovery time may increase. I often set it to 64-512 MB, which is different from the server size.

Innodb_log_buffer_size
By default, the server performance is acceptable when the write load is moderate and the transaction is short. If there is a peak update operation or a large load, you should consider increasing the value. If its value is set too high, memory may be wasted-it will refresh once every second, so you do not need to set the memory space more than 1 second. Usually 8-16 MB is enough. The smaller the system, the smaller its value.


In the end, the performance of myisam and innodb's data import is basically the same. 2500 of the data needs about 3.5 h, and 2 h for a single database to read data, this is just a schematic value for reference only (testing on pc). The testing results on the formal service are more obvious.

Performance Tuning statement reference

Copy codeThe Code is as follows:
Set profiling = 1;
Show profiles \ G
SHOW profile CPU, block io for query 1;
Show status
Show Processlist
Explain

Will parallel reading be faster?

If the data is imported Based on partition, but it still cannot reach the set goal, I will write a shell script to allow multiple processes to import data in parallel based on partition, that is, Start Multiple mysql-uroot-p db <exp201. SQL, mysql-uroot-p db <exp201202. SQL, and read and write each SQL statement by day (partition is performed by month in the event environment)

Copy codeThe Code is as follows:
SELECT IR_SID, IR_HKEY, IR_GROUPNAME, IR_SITENAME, IR_CHANNEL, IR_MID, IR_URLNAME, IR_STATUS_CONTENT, IR_CREATED_AT, date_format (IR_CREATED_AT, '% Y. % m. % D'), IR_LASTTIME, IR_VIA, kernel, IR_RTTCOUNT, IR_COMMTCOUNT, IR_UID, IR_SCREEN_NAME, kernel, kernel, IR_RETWEETED_MID, IR_RETWEETED_URL, IR_STATUS_BODY
Into outfile '/home/mysql/data/sinawb20120724/111101.txt'
Fields terminated by ', 'optionally enclosed by' "'escaped '\\'
Lines terminated by '\ N'
FROM TB_SINA_STATUS
WHERE ir_created_at> = '2017-11-01 00:00:00 'and ir_created_at <'2017-11-01 23:59:59'

Load data local infile '/home/mysql/data/sinawb20120724/111101.txt'
Ignore into table 'new _ TB_SINA_STATUS'
Character set UTF8
Fields terminated by ', 'optionally enclosed by' "'escaped '\\'
Lines terminated by '\ N'

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.