Online Data Migration Experience: How to Change Engines on a Flying Airplane


Online data migration means moving the data behind a live service from one place to another, with no downtime and no impact on the service during the entire process. By data layer, migrations can be divided into cache migrations and storage migrations; by whether the data's organization changes, they can be divided into translations and transfers.

A translation keeps the data organized the same way before and after the migration: for example, MySQL expanding from 1 instance to 4, Redis from 4 ports to 16, or HBase from 20 machines to 30. If the initial design already provided for future expansion and contraction, the migration work is much simpler. For example, if MySQL was sharded into multiple databases and tables from the start, expanding the number of instances only requires setting up a few more replicas, switching access over, and finally deleting the now-redundant databases and tables. Systems such as HBase that fully automate data migration are simpler still: add machines, modify the configuration by hand (or let the auto-discovery system do it), then brew a cup of coffee and wait for the system to finish.

A transfer changes how the data is organized. Years ago, a social platform ran a data migration to upgrade its IDs, replacing the original auto-increment scheme with a carefully designed UUID-style algorithm. The biggest challenge of that migration was that it modified the data's primary key. The primary key is the unique identity of a record: once it changes, the original record effectively no longer exists and new data appears out of thin air, creating huge compatibility problems for every business process, peripheral system, and upstream and downstream team. Most data migration projects, however, modify neither the primary key nor the data itself, only the way the data is organized. For example, to save storage space a social platform's counters were stored in Redis hashes, and later migrated to a plain KV layout to improve batch-query performance; likewise, its repost lists and fan lists were initially stored in MySQL and later migrated to HBase for better scalability and lower cost.

The biggest challenge of online data migration is keeping the service unaffected while the migration runs. People like to compare it to "changing the engine mid-flight" or "changing the tires on a moving car", but in practice it is not that hard: an engineer with a year or two of experience, following some practical guidance, can pull it off. Below I share some personal experience as a starting point.

Online data migration is generally divided into four steps:

1. Online double write: write every new update to both the old and the new store.
2. Historical data move: copy the existing stock data offline from the old system to the new one.
3. Read switch: route read requests to the new system.
4. Cleanup: remove the old code paths, retire the old supporting systems, then distill the lessons learned and turn any ad-hoc migration scripts into general-purpose tools for next time.

Note that in some cases steps one and two are reversed: the historical data is moved first and double writing starts afterwards. In that case the new data generated during the move must be handled carefully, usually by buffering writes in a queue and replaying them, an approach known as "chasing data".
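As a rough illustration of the chasing-data idea (an in-memory queue stands in for the durable message queue a real system would use), the sketch below buffers writes that arrive during the bulk move and replays them afterwards:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of "chasing data": buffer writes that arrive while the
// bulk move is running, then replay them into the new store afterwards.
// A real system would use a durable message queue, not an in-memory one.
public class ChasingDataBuffer {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private volatile boolean bulkMoveRunning = true;

    // Called on the online write path while the bulk move is in progress.
    public void submitWrite(Runnable writeToNewStore) {
        if (bulkMoveRunning) {
            pending.offer(writeToNewStore);   // defer until the move finishes
        } else {
            writeToNewStore.run();            // normal double-write path
        }
    }

    // Called once the historical bulk move completes: drain the backlog.
    public void finishBulkMoveAndReplay() {
        bulkMoveRunning = false;
        Runnable w;
        while ((w = pending.poll()) != null) {
            w.run();                          // "chase" the buffered writes
        }
    }
}
```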

Figure 1: Online data migration steps

The following uses the migration of a social platform's fan list from MySQL to HBase as an example to walk through the concrete implementation of each step, the problems that can arise, and the countermeasures.

Before the migration, a fairly detailed workflow was drawn up based on previous experience:

Figure 2: Workflow for migrating the fan list to HBase

   Online Double Write

Before writing the double-write logic, first determine the HBase table structure and primary key design based on the business rules and performance requirements.

HBase has two typical patterns for list-class data. One is the high-table pattern, which closely resembles the traditional MySQL model: each item in the list is stored in its own row, and every row has a fixed set of attribute columns. The other is the wide-table pattern: an entire list is stored in one row, each item in the list occupies its own column, and the item's attributes are packed into that column's value.

Figure 3: The fan list stored in HBase high-table mode and in wide-table mode

The advantage of the high-table pattern is that, as with MySQL, business logic is implemented in a familiar way, so the cost of understanding and porting the code is low; however, because of how HBase partitions data, a single list may be spread across several different regions, so query performance is poor. The pros and cons of the wide table are exactly the opposite. In a high-concurrency, high-traffic system many aspects of a technical design can be compromised, but performance is never one of them, so we chose the wide-table pattern.
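For concreteness, here is a minimal sketch of a wide-table write using the standard HBase Java client; the table name `fans`, the column family `cf`, and the packed value format are illustrative assumptions, not the platform's actual schema:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Wide-table pattern: one row per user, one column per fan.
// Row key = user id; column qualifier = fan id; value = packed attributes.
public class WideTableWriter {
    public static void addFan(Connection conn, long userId, long fanId,
                              byte[] packedAttributes) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("fans"))) {
            Put put = new Put(Bytes.toBytes(userId));
            put.addColumn(Bytes.toBytes("cf"),      // single column family
                          Bytes.toBytes(fanId),     // qualifier = fan id
                          packedAttributes);        // attrs packed in value
            table.put(put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            addFan(conn, 12345L, 67890L, Bytes.toBytes("ts=1700000000"));
        }
    }
}
```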

Many high-concurrency systems use upstream asynchrony to smooth traffic peaks and improve availability: operations are turned into messages, written to a message queue, and processed asynchronously in the background. Most message queues allow a single message to be consumed repeatedly by multiple business modules, in series or in parallel. So we put the HBase-writing logic into a separate module and configured it to run in series or in parallel with the existing module that writes to MySQL.

To support retries in asynchronous message processing, business modules should be designed to be idempotent: processing the same message multiple times must not change the final result. Some modules, such as counters and reminders, are inherently not retry-safe; for these, short-window retry support can be provided by a duplicate-message detection module. In MySQL, idempotence is usually achieved through the primary key or a unique index; correspondingly, HBase's high-table pattern gets it from the row key, and the wide-table pattern from the column qualifier. During the fan-list migration, the column qualifier alone could not guarantee idempotence for every operation, which broke data consistency; this was finally resolved by adding a separate duplicate-message detection module.
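A duplicate-message detection module can be as simple as remembering recently processed message ids for a short window. The sketch below keeps them in local memory for brevity; a real deployment would use a shared cache such as memcached or Redis, and the TTL here is an assumption:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Duplicate-message detection: remember recently processed message ids for a
// short window so a retried message is recognized and skipped. In production
// the map would be a shared cache (memcached/Redis), not local memory.
public class DuplicateMessageDetector {
    private final Map<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public DuplicateMessageDetector(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns true exactly once per message id within the TTL window.
    public boolean firstTimeSeen(String messageId) {
        long now = System.currentTimeMillis();
        seen.values().removeIf(expireAt -> expireAt < now);   // lazy cleanup
        return seen.putIfAbsent(messageId, now + ttlMillis) == null;
    }
}

// Usage in a consumer of a non-idempotent module such as a counter:
//   if (detector.firstTimeSeen(msg.id)) { incrementCounter(msg); }
//   else { /* retry of an already-applied message: skip */ }
```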

In addition, HBase currently provides none of MySQL's advanced query features such as secondary indexes, covering indexes, JOIN, or ORDER BY, so before migrating you must evaluate whether the new design can support every business feature. For example, a fan list is usually queried as "the latest 5,000 fans", but also supporting "the first 100 fans" takes noticeably more work.
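In the wide-table layout, "a page of the fan list" becomes column pagination within a single row. One way to express that is the HBase client's `ColumnPaginationFilter`, sketched below; the table and family names are assumptions, and "latest first" ordering only holds if the qualifiers are encoded to sort that way:

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.ArrayList;
import java.util.List;

// Page through the columns (fans) of a single wide-table row.
// Columns come back in qualifier byte order, so "latest first" requires
// qualifiers encoded to sort that way (e.g. Long.MAX_VALUE - timestamp).
public class FanPageReader {
    public static List<Long> readPage(Connection conn, long userId,
                                      int pageSize, int offset) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("fans"))) {
            Get get = new Get(Bytes.toBytes(userId));
            get.setFilter(new ColumnPaginationFilter(pageSize, offset));
            Result result = table.get(get);
            List<Long> fanIds = new ArrayList<>();
            for (Cell cell : result.rawCells()) {
                fanIds.add(Bytes.toLong(CellUtil.cloneQualifier(cell)));
            }
            return fanIds;
        }
    }
}
```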

Once online double writing is running, the double-written data must be checked for consistency, from two dimensions: the storage dimension and the business dimension. The storage dimension compares the data read directly from MySQL and from HBase; the business dimension verifies what the end user sees, i.e. whether the fan-list page renders the same results as before. For a large system, the recommended pass line for the consistency check is six nines (99.9999%): no more than one discrepancy per million records.
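The pass-line arithmetic itself is simple. Here is a minimal sketch of a sampled storage-dimension check, with the two read functions left as hypothetical placeholders:

```java
import java.util.List;
import java.util.function.Function;

// Sampled consistency check: compare the fan list for each sampled uid as
// read from MySQL and from HBase, and require a 99.9999% (six nines) match
// rate, i.e. at most 1 mismatch per 1,000,000 comparisons.
// Assumes sampledUids is non-empty.
public class ConsistencyChecker {
    public static boolean check(List<Long> sampledUids,
                                Function<Long, List<Long>> readFromMysql,
                                Function<Long, List<Long>> readFromHbase) {
        long mismatches = 0;
        for (long uid : sampledUids) {
            if (!readFromMysql.apply(uid).equals(readFromHbase.apply(uid))) {
                mismatches++;
            }
        }
        double matchRate = 1.0 - (double) mismatches / sampledUids.size();
        return matchRate >= 0.999999;   // six-nines pass line
    }
}
```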

   Historical Data Relocation

After the online double write has been verified and the consistency check passes, you can start moving the historical data.

The biggest difficulty in moving historical data is making sure the move does not interfere with online writes. For list-class features the classic interference scenario is: the move program reads a list from MySQL, and the online business modifies the list before the move program inserts it into HBase. If an element was added, HBase's idempotence means the final result is still correct; but if one or more elements were deleted, the outcome looks as if the delete in HBase never took effect, because the move program's insert happened after the online delete. At root this is because transactions are unaffordable at this scale: introducing them would solve the problem, but would also stretch the move from a few days to weeks or months. Instead, the problem can be solved by introducing a lightweight memcache lock that simulates serializable transaction isolation for each list.
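Such a lock can be built on memcached's atomic add operation: only one caller can create the key, and the TTL bounds how long a crashed holder can block others. A sketch using the spymemcached client, with the key naming and TTL as assumptions:

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// Lightweight mutual exclusion via memcached's atomic "add": the key can be
// created by exactly one caller; the TTL releases the lock if a holder dies.
// Both the online delete path and the move program take the lock per uid.
public class MemcacheLock {
    private final MemcachedClient client;

    public MemcacheLock(String host, int port) throws Exception {
        client = new MemcachedClient(new InetSocketAddress(host, port));
    }

    public boolean tryLock(long uid, int ttlSeconds) throws Exception {
        // add() succeeds only if the key does not exist yet -> atomic lock.
        return client.add("migrate-lock:" + uid, ttlSeconds, "1").get();
    }

    public void unlock(long uid) {
        client.delete("migrate-lock:" + uid);
    }
}

// Move program: if (lock.tryLock(uid, 30)) { moveList(uid); lock.unlock(uid); }
```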

A consistency check is also required after the historical move completes. In fact, it is recommended to move a small portion of the data first and run a consistency check on it; only after that check passes should the full data set be moved. This greatly reduces total migration time and the risk of schedule slips caused by flaws in the move process or the code.

   Switching Reads

After the full data set has been moved and verified, read requests can be switched over. The usual approach is to embed a switch in the code and control it through a config service or similar mechanism. The switching sequence is: tcpcopy environment, then the online environment with a UID whitelist (internal engineers), then the online environment with percentage grayscale at 0.01%, 1%, 10%, and finally full online traffic. The tcpcopy environment verifies that the code is ready for release, the UID whitelist verifies that the feature behaves correctly, and the percentage grayscale verifies that performance and resource pressure are acceptable; only after all verification passes is the full switch made. Typically this process lasts one to two weeks.
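A minimal sketch of such a switch; the whitelist and grayscale values would come from a config service and be hot-reloadable, and all names here are illustrative:

```java
import java.util.Set;

// Read-path switch: tcpcopy -> uid whitelist -> percentage grayscale -> full.
// In a real system both fields would be updated at runtime by a config
// service; here they are plain volatile fields for brevity.
public class ReadSwitch {
    private volatile Set<Long> whitelist;        // internal engineers' uids
    private volatile int grayscaleBasisPoints;   // 1 = 0.01%, 10000 = 100%

    public ReadSwitch(Set<Long> whitelist, int grayscaleBasisPoints) {
        this.whitelist = whitelist;
        this.grayscaleBasisPoints = grayscaleBasisPoints;
    }

    // Decide per request whether to read from HBase or fall back to MySQL.
    public boolean readFromHbase(long uid) {
        if (whitelist.contains(uid)) return true;   // whitelist stage
        return Math.floorMod(Long.hashCode(uid), 10000) < grayscaleBasisPoints;
    }
}

// Rollout: set basis points to 1 (0.01%), then 100 (1%), then 1000 (10%),
// and finally 10000 for the full switch, verifying at each stage.
```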

   Cleanup and Consolidation

Once the read switch is complete, the data migration itself can be considered done. But the project is not finished: the old code paths must be cleaned up, the old supporting systems taken offline, the old resources reclaimed, and, most importantly, the lessons learned must be summarized and shared, the process improved, and the one-off tools generalized.

Online data migration is not a job that demands advanced technology. It is more about command of the business logic, understanding of the operational workflow, mastery of the old and new systems' characteristics, and respect for the details.
