Mysql Sea Volume data migration step sharing _mysql

Source: Internet
Author: User

The company's data center plans to migrate massive amounts of data, while adding a Time field (originally a datatime type, now adds a date type), a single table data volume of more than 600 million records, data is based on the time (month) to do partition due to busy, has not summed up, So the details of the place can not remember clearly, here is simply summed up the situation at that time, the memo

Disorderly Bump

Initially received the task, there is no clear starting point, directly is select * from DB limit 10000, dynamically modify the number of pages, through the console to see the time consuming, slow

Copy Code code as follows:

SELECT Ir_sid,ir_hkey,ir_groupname,ir_sitename,ir_channel,ir_mid,ir_urlname,ir_status_content,ir_created_at, Date_format (ir_created_at, '%y.%m.%d '), Ir_lasttime,ir_via,ir_thumbnail_pic,ir_rttcount,ir_commtcount,ir_uid,ir_ Screen_name,ir_retweeted_uid,ir_retweeted_screen_name,ir_retweeted_mid,ir_retweeted_url,ir_status_body from TB_ Sina_status WHERE ir_sid>40000 and ir_sid<50001 into outfile '/home/mysql/data/data_outfile.txt '; LOAD DATA INFILE '/home/mysql/data/data_outfile.txt ' into TABLE new_tb_sina_status;

Can I read data based on partition?

Since the database is to do partition by partition, can you press partition to read the data, if you can use what kind of grammar to read? Time as long as the monthly reading data, MySQL will automatically based on partition read, can use the command: Explain partition can see based on which partition
Reading large data can be time-consuming, and as to what the data is going to be, we may be very interested to know that you can use the command: Show Status View, my impression is mainly sending data,writting to net and so on.
What is the performance of the InnoDB engine better than the MyISAM engine?
In the pilot Library experiment, export and import one months of data (8G text volume, 2500w Records), under the MyISAM engine requires less than 4h (test environment for PC), but under the InnoDB engine, it takes 32 hours, improve the index and so on, also need 28h, performance is 8 times times the difference.

Found high man on the internet about InnoDB and MyISAM difference, said need to modify innodb_buffer_pool_size, Innodb_flush_log_at_trx_commit

Can guarantee that there is not much difference, try to have no obvious improvement, in this machine is OK, why??? This has been killing me for a long time.

Innodb_flush_log_at_trx_commit

is the InnoDB 1000 times times slower than MyISAM and the head is big? It seems that you have forgotten to revise this parameter. The default value is 1, which means that each submitted update transaction (or statements outside of each transaction) is flushed to disk, which is quite resource-intensive, especially when there is no battery-standby cache. Many applications, especially those from MyISAM, set the value to 2, which means that the log is not flushed to disk, but only to the operating system's cache. Logs are still flushed to disk per second, so there is usually no loss of 1-2 updates per second. If set to 0 is much faster, but also relatively insecure-mysql server crashes will lose some transactions. Setting to 2 will only lose the part of the transaction that is flushed to the operating system cache.

Innodb_buffer_pool_size

The InnoDB is like a snail in the default Innodb_buffer_pool_size setting. Because InnoDB caches data and indexes without leaving the operating system with too much memory, you can set it up to 70-80% usable memory if you only need to use InnoDB.

The last arduous, the journey, found two other parameters

Innodb_log_file_size

is important in the case of high write loads, especially large data sets. The larger the value, the higher the performance, but note that recovery time may be increased. I often set it to 64-512MB, which varies according to the size of the server.

Innodb_log_buffer_size
The default settings are also available for medium-intensity write load and shorter transactions. If there is a peak update operation or a large load, you should consider increasing its value. If its value is set too high, it may waste memory-it refreshes every second, so there is no need to set the required memory space for more than 1 seconds. Usually the 8-16MB is enough. The smaller the system, the smaller its value.


Finally, MyISAM and InnoDB data performance is basically consistent, 2500 of the data about 3.5h, library read data need 2h, this is only a schematic value for reference (PC test), formal service on the test results more obvious

Performance Tuning Statement Reference

Copy Code code as follows:

Set profiling = 1;
Show Profiles\g
Show Profiles Cpu,block io io for query 1;
Show status
Show Processlist
Explain

Will parallel reads be faster?

If based on partition data, or can not achieve the target, I finally by writing shell footsteps, multi-process parallel based on partition data, that is, start multiple mysql-uroot-p db < Exp201201.sql, mysql- Uroot-p DB < Exp201202.sql, read/write under each SQL Day (event environment is monthly partition)

Copy Code code as follows:

SELECT Ir_sid,ir_hkey,ir_groupname,ir_sitename,ir_channel,ir_mid,ir_urlname,ir_status_content,ir_created_at, Date_format (ir_created_at, '%y.%m.%d '), Ir_lasttime,ir_via,ir_thumbnail_pic,ir_rttcount,ir_commtcount,ir_uid,ir_ Screen_name,ir_retweeted_uid,ir_retweeted_screen_name,ir_retweeted_mid,ir_retweeted_url,ir_status_body
into outfile '/home/mysql/data/sinawb20120724/111101.txt '
FIELDS terminated by ', ' optionally enclosed by ' ' escaped by ' \
LINES terminated by ' \ n '
From Tb_sina_status
WHERE ir_created_at >= ' 2011-11-01 00:00:00 ' and Ir_created_at < ' 2011-11-01 23:59:59 '

LOAD DATA local INFILE '/home/mysql/data/sinawb20120724/111101.txt '
IGNORE into TABLE ' New_tb_sina_status '
CHARACTER SET UTF8
FIELDS terminated by ', ' optionally enclosed by ' ' escaped by ' \
LINES terminated by ' \ n '

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.