[Production database in practice] Using triggers for real-time synchronization of a billion-row table in a production database

Source: Internet
Author: User

Trigger-based migration works on the same principle as a materialized view (MV): create a trigger on the source table that records the table's DML operations in a log, then replay the affected rows through a stored procedure so that the target table stays consistent with the source. This method is a supplement to the prebuilt-MV approach. Materialized views and online redefinition have a weakness when handling large tables under heavy DML, mainly in the full initialization step. For example, the initial full scan should use direct path reads, which are very efficient, but blocks already cached in memory degrade parts of the scan to single-block reads; or, because DML is frequent, consistent-read versions of data blocks must be reconstructed from undo, which also shows up as single-block reads in the wait events. The visible symptom is that the initial synchronization starts out fast and gets slower and slower, until it either cannot complete in a reasonable time or fails with the classic "snapshot too old" error. Of course, creating a trigger on a DML-heavy table affects the DML performance of the source table, so the impact needs to be evaluated in advance. For synchronizing small tables, prebuilt MV is still recommended, simply because it is easy and convenient. The same idea also applies to other databases.
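The wait events described above can be observed while the full initialization runs. A minimal diagnostic query against the standard v$session view (the bind variable for the session SID is an assumption you would fill in):

```sql
-- Check whether the initializing session is doing efficient multi-block
-- direct path reads or has degraded to single-block reads.
SELECT sid, event, state, wait_time_micro
  FROM v$session
 WHERE sid = :init_session_sid;   -- SID of the session running the full sync

-- 'direct path read'        -> efficient full-scan I/O
-- 'db file sequential read' -> single-block reads (cached blocks or undo-driven)
```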


The goal of this example:

migrate the source-side table user01.user_table to the target-side table user02.user_table, keeping the target-side data consistent with the source side in real time. For the business cutover, the impact on the business can be kept within about 1 minute; by choosing the right point in time to switch, the downtime is essentially just the time needed to restart the application. In other words, the impact on the business depends on the application restart time.

Source database: Business Database 1

Target database: Business Database 2

Source-side table user01.user_table:
Table size: 60 GB+; number of records: 480 million+; daily DML volume: about 2 million rows. DML volume is low from 22:00 until 08:00 the next morning, so this idle window can be used for the full initialization.
Hash partitioned on user_id into 16 partitions; primary key: (user_id, serviceid); unique index: (uin, serviceid).

Target-side table user02.user_table:
Hash partitioned on uin into 64 partitions; primary key: (uin, serviceid); unique index: (user_id, serviceid) (according to the current business usage in the production database, queries do not use this index).
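The two table layouts above could be declared roughly as follows. The key columns come from the article; all data types, the constraint/index names, and the placeholder for the remaining columns are assumptions for illustration:

```sql
-- Source side: hash partitioned on user_id, 16 partitions.
CREATE TABLE user01.user_table (
    user_id    NUMBER NOT NULL,
    serviceid  NUMBER NOT NULL,
    uin        NUMBER NOT NULL,
    -- ... remaining business columns ...
    CONSTRAINT pk_user_table PRIMARY KEY (user_id, serviceid)
)
PARTITION BY HASH (user_id) PARTITIONS 16;

CREATE UNIQUE INDEX user01.uk_user_table
    ON user01.user_table (uin, serviceid);

-- Target side: hash partitioned on uin, 64 partitions, keys swapped.
CREATE TABLE user02.user_table (
    user_id    NUMBER NOT NULL,
    serviceid  NUMBER NOT NULL,
    uin        NUMBER NOT NULL,
    -- ... remaining business columns ...
    CONSTRAINT pk_user_table2 PRIMARY KEY (uin, serviceid)
)
PARTITION BY HASH (uin) PARTITIONS 64;

CREATE UNIQUE INDEX user02.uk_user_table2
    ON user02.user_table (user_id, serviceid);
```

Note that the target's unique index (user_id, serviceid) does not contain the partition key uin, so Oracle builds it as a global index by default.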

Limitations:

1. The source table must have a primary key or a non-null unique constraint;
2. DDL on the source table is not supported during synchronization (note: the code can be improved to support DDL);
3. Tables with very frequent DML are not a good fit, because the incremental synchronization may not keep up (note: there is no fixed threshold; it depends on database performance).
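Limitation 1 can be checked in advance against the data dictionary; a minimal sketch:

```sql
-- Verify that the source table has a primary key (constraint_type 'P')
-- or a unique constraint ('U'); for a 'U' constraint, its key columns
-- must additionally be declared NOT NULL.
SELECT c.constraint_name, c.constraint_type
  FROM dba_constraints c
 WHERE c.owner           = 'USER01'
   AND c.table_name      = 'USER_TABLE'
   AND c.constraint_type IN ('P', 'U');
```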

Brief description of the implementation steps:

Note: in the original article, target-side operations were marked in red and source-side operations in green; the colors do not survive here, so the side can be told from the schema names (user01 = source, user02 = target) in each step.

4. Create a synchronization log table (user01.user_table_rep_sync_log) on the source side, which records the DML operation information of the source table;
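The article does not show the log table's definition. A minimal sketch, assuming it only needs to capture the source row's primary-key values, the operation type, and a timestamp (column names and the sequence are assumptions):

```sql
-- Sequence to order the log entries for replay.
CREATE SEQUENCE user01.seq_user_table_rep_log;

-- Hypothetical layout: one row per DML operation on the source table.
CREATE TABLE user01.user_table_rep_sync_log (
    log_id     NUMBER       NOT NULL,
    user_id    NUMBER       NOT NULL,  -- primary-key columns of the source row
    serviceid  NUMBER       NOT NULL,
    dml_type   VARCHAR2(1)  NOT NULL,  -- 'I' insert / 'U' update / 'D' delete
    log_time   DATE         DEFAULT SYSDATE NOT NULL,
    CONSTRAINT pk_user_table_rep_log PRIMARY KEY (log_id)
);
```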
5. Create a trigger (user01.trigger_user_table_rep) on the source side that inserts the captured information into the synchronization log table (note: create the trigger during the business idle window, which reduces the volume of DML log records produced and the impact on the business, and be sure to create it before the full synchronization);
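A minimal sketch of such a trigger, matching the hypothetical log-table layout above. It assumes the primary-key columns are never updated, so :OLD values are logged for deletes and :NEW values otherwise:

```sql
CREATE OR REPLACE TRIGGER user01.trigger_user_table_rep
AFTER INSERT OR UPDATE OR DELETE ON user01.user_table
FOR EACH ROW
DECLARE
    v_type VARCHAR2(1);
BEGIN
    IF INSERTING THEN
        v_type := 'I';
    ELSIF UPDATING THEN
        v_type := 'U';
    ELSE
        v_type := 'D';
    END IF;

    INSERT INTO user01.user_table_rep_sync_log (log_id, user_id, serviceid, dml_type)
    VALUES (user01.seq_user_table_rep_log.NEXTVAL,
            NVL(:NEW.user_id,   :OLD.user_id),    -- :OLD for DELETE, :NEW otherwise
            NVL(:NEW.serviceid, :OLD.serviceid),
            v_type);
END;
/
```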
6. Create a database link (to_xxxxx.localdomain) on the target side for connecting from the target database to the source database, and synchronize the table user01.…
…
11. Monitor the synchronization log table (user01.user_table_rep_sync_log) and observe the amount of data still to be synchronized.
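The database link and the monitoring query from steps 6 and 11 might look like the following. The link name comes from the article; the credentials and connect string are placeholders:

```sql
-- Target side: link to the source database.
CREATE DATABASE LINK to_xxxxx.localdomain
    CONNECT TO user01 IDENTIFIED BY "********"
    USING 'source_db_tns_alias';

-- Step 11: how much backlog still has to be replayed to the target?
SELECT COUNT(*) AS pending_rows
  FROM user01.user_table_rep_sync_log@to_xxxxx.localdomain;
```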

When the business system is cut over, coordinate with the business side as follows: monitor the refresh intervals and pick the moment when the amount of data still to be synchronized is at its smallest, stop the scheduled refresh job, shut down the old application, manually run the synchronization refresh in the target database until no new records appear in the synchronization log table, and then start the new application. This keeps the impact on the business system to a minimum.
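The final manual flush could be driven by a small PL/SQL loop on the target side. Here sync_user_table stands in for the article's replay procedure (which is not shown) and is purely an assumption:

```sql
-- Target side, during cutover: keep replaying until the log is drained.
-- Safe to exit on zero because the old application has already been stopped,
-- so no new log rows can appear.
DECLARE
    v_pending NUMBER;
BEGIN
    LOOP
        user02.sync_user_table;   -- hypothetical replay procedure over the dblink

        SELECT COUNT(*) INTO v_pending
          FROM user01.user_table_rep_sync_log@to_xxxxx.localdomain;

        EXIT WHEN v_pending = 0;
        DBMS_SESSION.SLEEP(5);    -- 18c+; on older releases use DBMS_LOCK.SLEEP
    END LOOP;
END;
/
```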



