Summary of data migration experience: heterogeneous migration of tables with hundreds of millions of rows


Because of a system revamp, we have spent the last three months on data migration. Given the special nature of our business, we hit practically every pitfall of data migration, so I decided to write a summary.

Migration type: historical data where the table structures of the new and old systems differ significantly.

I. Core Issues

1. The structures of the new and old tables differ greatly. The new table is centered on deliver, and the migration also involves one table in the warehouse system and four tables in the order system, all migrated according to the new logical mapping relationships.
2. Incremental migration. New data keeps arriving during the full migration, and it must be migrated in real time.
3. Performance and efficiency at the scale of hundreds of millions of rows. Order services are critical, and the qps generated by migration puts heavy pressure on the database, so we had to test and iterate continuously to find a suitable balance between migration speed and load.
4. Old data format problems. Some of the data dates back to 2012 and 2013, so there may be missing or incomplete records; the program must tolerate all of these problems.
5. Data verification is hard. Because each new table draws from multiple old tables with large data volumes, verification is difficult. Currently we write our own scripts, check the data week by week against certain logic, and supplement this with visual comparison.


II. Data Migration Process

Full synchronization: a Java program.

Incremental synchronization: otter plus an extended Java program.

Historical data must be synchronized to both the backend and the front-end cobar databases.


1. Historical data is exported into multiple text files by year (each file is about 2 GB).

2. Multiple threads read the text data.

3. The main processing types are insert, batch_insert, and update. Depending on the type, each thread calls a different logical method to process the data and write it into the new table (a dispatch sketch follows).
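For illustration, here is a minimal sketch of that dispatch step, assuming hypothetical handler methods (handleInsert, handleBatchInsert, handleUpdate) that are not part of the original project:

```java
// Minimal sketch of dispatching parsed rows by operation type.
// The record format and the handler method names are assumptions for illustration.
import java.util.List;
import java.util.Map;

public class MigrationDispatcher {

    public void dispatch(String type, List<Map<String, Object>> rows) {
        switch (type) {
            case "insert":
                for (Map<String, Object> row : rows) {
                    handleInsert(row);
                }
                break;
            case "batch_insert":
                handleBatchInsert(rows);          // one batched write per batch of rows
                break;
            case "update":
                for (Map<String, Object> row : rows) {
                    handleUpdate(row);
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    private void handleInsert(Map<String, Object> row) { /* map old fields to the new table */ }

    private void handleBatchInsert(List<Map<String, Object>> rows) { /* batched write */ }

    private void handleUpdate(Map<String, Object> row) { /* update an existing row in the new table */ }
}
```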


III. Discussion

The following expands on the issues above and summarizes the technical problems.

1. Project architecture level

The migration project is developed in Java and serves only as a migration task, so it is positioned as lightweight. Backend db access uses jdbcTemplate, with Spring managing the javabeans, and the code is split into three layers: a manager layer that handles the data flow, a business-logic layer that dispatches to different logic based on data type, and a dao layer that handles the mapping relationships.
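A rough sketch of that three-layer split; the class, table, and column names here are illustrative assumptions, not the project's actual code:

```java
import org.springframework.jdbc.core.JdbcTemplate;

// dao layer: owns the SQL and the old-to-new field mapping for one target table
class DeliverDao {
    private final JdbcTemplate jdbcTemplate;

    DeliverDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    void insertDeliver(long orderId, String status) {
        jdbcTemplate.update(
                "INSERT INTO deliver (order_id, status) VALUES (?, ?)",
                orderId, status);
    }
}

// business-logic layer: one class per data type, converts an old-system row
class DeliverLogic {
    private final DeliverDao dao;

    DeliverLogic(DeliverDao dao) {
        this.dao = dao;
    }

    void migrate(long orderId, String oldStatus) {
        // convert the old status code into the new table's representation
        String newStatus = "OLD_DONE".equals(oldStatus) ? "FINISHED" : oldStatus;
        dao.insertDeliver(orderId, newStatus);
    }
}

// manager layer: receives the data stream and routes it to the right logic
public class MigrationManager {
    private final DeliverLogic deliverLogic;

    public MigrationManager(DeliverLogic deliverLogic) {
        this.deliverLogic = deliverLogic;
    }

    public void onRecord(String type, long orderId, String status) {
        if ("deliver".equals(type)) {
            deliverLogic.migrate(orderId, status);
        }
        // other data types dispatch to their own logic classes
    }
}
```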

2. db Problems

The backend db is a one-master, multi-slave setup: the master receives write requests and the slaves serve read requests. Slave performance is far lower than the master's.

2.1 Data retrieval by calling interfaces

At the beginning, to keep the project very light, old data was fetched by calling the order interface and the wms interface. However, the interfaces turned out to be fragile: according to our statistics, the http interface can only withstand a limited number of requests per second, so with large data volumes it quickly becomes the bottleneck. I tested it once: pulling 100 million rows this way would take two weeks. So we had to stop calling interfaces and query the database directly.

2.2 Cold/hot database problems

Most of the historical data sits in the cold database. For a db query, you need to check the hot database first and then fall back to the cold database, following the corresponding logic.
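A minimal sketch of the hot-then-cold lookup, assuming two JdbcTemplate instances and an illustrative orders table:

```java
// Query the hot database first and fall back to the cold (archive) database on a miss.
// The two JdbcTemplate instances and the table/column names are assumptions.
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class OrderReader {

    private final JdbcTemplate hotDb;
    private final JdbcTemplate coldDb;

    public OrderReader(JdbcTemplate hotDb, JdbcTemplate coldDb) {
        this.hotDb = hotDb;
        this.coldDb = coldDb;
    }

    public Map<String, Object> findOrder(long orderId) {
        String sql = "SELECT * FROM orders WHERE order_id = ?";
        List<Map<String, Object>> rows = hotDb.queryForList(sql, orderId);
        if (!rows.isEmpty()) {
            return rows.get(0);                       // found in the hot database
        }
        rows = coldDb.queryForList(sql, orderId);     // fall back to the cold database
        return rows.isEmpty() ? null : rows.get(0);
    }
}
```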

2.3 Batch insert and master/slave latency

At first, data was processed and inserted row by row, so 100 million rows produced 100 million insert statements. Writing to the master database was not a problem, but the slaves were: their performance is far below the master's, and binlog synchronization introduced huge latency. 100 million inserts means 100 million commits, 100 million binlog entries to ship over the network, and 100 million insert operations to replay on each slave. That latency then caused further problems: because the master also receives other business writes, the slaves fell behind, and customer service could not find orders.

Later, this was changed to a single insert statement with multiple values: each batch of data is processed and then committed once. This improves insert performance and solves the latency problem.

In code, the jdbcTemplate call was changed to the batchUpdate() method, but the driver still sends the SQL as one insert per row. To make JDBC actually rewrite the batch into a multi-value insert, rewriteBatchedStatements=true must be added to the JDBC connection URL: jdbc:mysql://...&rewriteBatchedStatements=true.
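A minimal sketch of the batched path, assuming an illustrative deliver table and placeholder connection settings; the key points are batchUpdate() and rewriteBatchedStatements=true on the URL:

```java
// Batched insert through JdbcTemplate.batchUpdate() with rewriteBatchedStatements=true,
// which lets the MySQL driver rewrite the batch into a multi-value INSERT.
// Table, column names, host, and credentials are placeholders.
import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

public class BatchInsertExample {

    public static void main(String[] args) {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        // without rewriteBatchedStatements=true the driver still sends one INSERT per row
        ds.setUrl("jdbc:mysql://localhost:3306/neworder"
                + "?rewriteBatchedStatements=true&useSSL=false");
        ds.setUsername("migrator");
        ds.setPassword("secret");

        JdbcTemplate jdbcTemplate = new JdbcTemplate(ds);

        List<Object[]> batch = new ArrayList<>();
        for (long orderId = 1; orderId <= 2000; orderId++) {
            batch.add(new Object[]{orderId, "FINISHED"});
        }

        // one round trip and one commit per batch instead of per row
        jdbcTemplate.batchUpdate(
                "INSERT INTO deliver (order_id, status) VALUES (?, ?)", batch);
    }
}
```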

2.4 Dead SQL connections

When multiple threads were reading data, several threads got stuck. A jstack showed them blocked in the MySQL driver's socket read (socketAvailable), which typically happens on connections through the front-end cobar. The cause was that no timeouts were set on the JDBC connection. connectTimeout and socketTimeout must be added; they are the network timeouts for your application's connection to the database. If they are not set, the application will wait indefinitely.

Set the parameters: &failOverReadOnly=false&connectTimeout=60000&socketTimeout=60000. failOverReadOnly is needed because, by default, a JDBC connection that reconnects after a timeout becomes read-only, so failOverReadOnly=false must be added as well.
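For reference, a connection URL with these parameters (host, port, and database are placeholders):

```java
public class JdbcUrlExample {
    public static void main(String[] args) {
        String url = "jdbc:mysql://cobar-host:8066/neworder"
                + "?failOverReadOnly=false"   // do not drop to read-only after a reconnect
                + "&connectTimeout=60000"     // ms allowed to establish the connection
                + "&socketTimeout=60000";     // ms to wait for a response before failing
        System.out.println(url);
    }
}
```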

3. Program-level optimization

3.1 Multi-threading for full data

Migrating 100 million rows in a single thread takes about one week, which is far too slow. Switching to multithreading is actually easy: the historical data is in text files, so multiple threads simply read the files and process them concurrently. A CountDownLatch is used to ensure all threads have finished before the main program exits.
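A minimal sketch of that fan-out, assuming illustrative file names and a placeholder processLine() method:

```java
// One worker thread per yearly text file, with a CountDownLatch so the main thread
// only exits after every worker has finished. File handling and row processing are
// simplified placeholders.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FullMigration {

    public static void main(String[] args) throws InterruptedException {
        List<Path> files = List.of(
                Paths.get("orders-2012.txt"),
                Paths.get("orders-2013.txt"));

        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        CountDownLatch latch = new CountDownLatch(files.size());

        for (Path file : files) {
            pool.submit(() -> {
                try (BufferedReader reader =
                             Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        processLine(line);    // convert and write to the new table
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    latch.countDown();        // mark this file as done even on failure
                }
            });
        }

        latch.await();                        // wait for all files before exiting
        pool.shutdown();
    }

    private static void processLine(String line) {
        // parse the exported row and insert/update the new table here
    }
}
```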

3.2 Multi-threading for incremental data

Otter consumes binlogs sequentially, and consumption itself is very fast; in single-threaded mode the bottleneck is my write program, so that program had to be multithreaded. Otter hands over up to 2000 binlog records at a time; once the main process receives a batch, it starts multiple threads to process the records concurrently. One issue needs attention: a record may be changed several times within the same batch, and the last change must win. So I set the number of buckets to the maximum number of threads, assigned each record to a bucket by order_id modulo the bucket count, and stored each bucket as a map of (order_id, row object). This guarantees that each batch processes only the latest version of every record.
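A minimal sketch of the bucketing scheme, with a simplified RowChange type standing in for otter's parsed binlog record:

```java
// Records in one binlog batch are assigned to buckets by order_id % bucketCount, and
// inside each bucket a map keyed by order_id keeps only the latest change, so the
// last write wins. Each bucket is then handed to one worker thread.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IncrementalBuckets {

    static class RowChange {
        final long orderId;
        final String payload;   // simplified stand-in for the changed fields

        RowChange(long orderId, String payload) {
            this.orderId = orderId;
            this.payload = payload;
        }
    }

    public static List<Map<Long, RowChange>> bucket(List<RowChange> batch, int bucketCount) {
        List<Map<Long, RowChange>> buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedHashMap<>());
        }
        for (RowChange change : batch) {
            int idx = (int) (change.orderId % bucketCount);
            // later changes to the same order_id overwrite earlier ones
            buckets.get(idx).put(change.orderId, change);
        }
        return buckets;
    }
}
```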

3.3 otter Extension

I will analyze otter in a separate blog post; here is just a brief introduction. Otter is an Alibaba open-source project that mainly solves real-time data replication across remote data centers. The principle: it disguises itself as a slave, reads the master's binlog, parses it into an eventData object, and writes it into a new table. By default it only supports one-to-one replication with simple field mapping logic. However, the project is powerful: it provides an extension point that lets users synchronize data with their own logic by inheriting from AbstractEventProcessor. That class receives the parsed eventData object (one piece of binlog content), which contains all the field values and the SQL type of the table you subscribed to. In your subclass's method you hook in your own synchronization program, so the data otter subscribes to flows into your own code.
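A hedged sketch of such an extension; the package names and the process(EventData) signature are assumptions drawn from common otter extension examples, so check the otter source for the exact interface before relying on this:

```java
// Assumed import locations and hook method; verify against the otter code base.
import com.alibaba.otter.node.extend.processor.AbstractEventProcessor;   // assumed location
import com.alibaba.otter.shared.etl.model.EventData;                     // assumed location

public class OrderMigrationProcessor extends AbstractEventProcessor {

    @Override
    public boolean process(EventData eventData) {   // assumed hook signature
        // eventData carries the subscribed table's field values and the SQL type
        // (insert/update/delete); hand it off to the migration program here.
        // Returning true lets otter continue with its own processing chain.
        return true;
    }
}
```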


4. Data remediation policies

There are still some problems after data migration:

1. During the incremental phase, binlog reading can slow down because of network and other issues. canal retries, but it seems that data at that point in time can still be lost.

2. Errors at certain points during the migration led to data loss or anomalies.

Because the new and old table structures differ greatly and the data volume is huge, you cannot simply run one big SQL statement; that would drag down the database. Instead I wrote a program that checks the data one week at a time. It mainly checks three things: rows that exist in the new table but not in the old tables (these need to be deleted), rows that exist in the old tables but are missing from the new table (these need to be added), and rows whose status in the new table is inconsistent with the old tables (these are updated based on the old data).
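A minimal sketch of such a weekly check, with illustrative table names and SQL:

```java
// Compare the set of order ids in the new table against the old table for one
// week-sized window and report what must be deleted or added. Status comparison
// would follow the same pattern. SQL and table names are assumptions.
import java.util.HashSet;
import java.util.Set;
import org.springframework.jdbc.core.JdbcTemplate;

public class WeeklyCheck {

    private final JdbcTemplate oldDb;
    private final JdbcTemplate newDb;

    public WeeklyCheck(JdbcTemplate oldDb, JdbcTemplate newDb) {
        this.oldDb = oldDb;
        this.newDb = newDb;
    }

    public void check(String weekStart, String weekEnd) {
        Set<Long> oldIds = new HashSet<>(oldDb.queryForList(
                "SELECT order_id FROM old_order WHERE created >= ? AND created < ?",
                Long.class, weekStart, weekEnd));
        Set<Long> newIds = new HashSet<>(newDb.queryForList(
                "SELECT order_id FROM deliver WHERE created >= ? AND created < ?",
                Long.class, weekStart, weekEnd));

        Set<Long> toDelete = new HashSet<>(newIds);
        toDelete.removeAll(oldIds);   // in the new table but not the old: delete
        Set<Long> toAdd = new HashSet<>(oldIds);
        toAdd.removeAll(newIds);      // in the old table but missing from the new: add

        System.out.println("delete: " + toDelete + ", add: " + toAdd);
    }
}
```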


Summary:

1. Performance. Before the migration I did not know enough about MySQL performance or the data size; I first fetched data through interfaces and only later switched to multi-threaded database reads, wasting more than half a month. Before a migration, you must size up the data volume, the migration time, the table structures, and MySQL performance.

2. Code writing. This program is too tightly coupled: when an operation type was added later, the original code needed a lot of extra branching. Next time, code against interfaces and use inheritance plus factory methods (see the seckill project). jdbcTemplate is very lightweight, but once individual SQL statements grow large, the volume of code balloons; next time, assess this in advance and consider mybatis. There were also too many validation and execution scripts and too much logging; next time, let log4j write to multiple files, and keep the scripts more elegant and concise.

3. Data verification. Data migration is dirty, exhausting work; most of the time in this migration went into checking and reconciling data, constantly writing SQL and reading and verifying rows. For similar migrations in the future, plan the verification in advance, for example by selecting several key fields and comparing their hashes.
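A minimal sketch of that idea, hashing a few assumed key fields per row so the old and new sides can be compared by digest instead of field by field:

```java
// Build one digest per row from selected columns; the chosen columns are assumptions.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowHash {

    public static String hashKeyFields(long orderId, String status, String updatedAt) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            String joined = orderId + "|" + status + "|" + updatedAt;
            byte[] digest = md.digest(joined.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();   // compare this value between the old and new rows
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```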



References:

1. Methods, steps, and experiences of data migration: http://m.blog.csdn.net/blog/baoqiangwang_11109/5492910

2. The rewriteBatchedStatements parameter of the JDBC driver: http://www.cnblogs.com/chenjianjx/archive/2012/08/14/2637914.html

3. In-depth understanding of JDBC timeout settings: http://www.importnew.com/2466.html

4. otter: https://github.com/alibaba/otter

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
