MySQL Database Master-slave replication architecture


The previous article, "MySQL Database Transaction and Replication", analyzed how MySQL keeps the binlog consistent with transactional data during replication. This article goes a step further: once a slave is introduced, what must be considered to keep master and slave data consistent?

Native Replication Architecture

MySQL's native replication architecture works as follows. The slave's I/O thread continuously reads the master's binlog file and writes it to a local relay log, which acts as a temporary buffer. The slave's SQL thread continuously reads the relay log and replays its events into storage. The whole process looks simple and clear, but a few points have a decisive impact on master-slave data consistency; we analyze each of them below.

With master-slave replication, data can become inconsistent in two ways:

    1. Data loss
    2. Data duplication

The slave's I/O thread reads the master's binlog over the network, so a network failure can cause data loss. To avoid this, when the slave reconnects after the network recovers, it must know from which position in the master's binlog to resume. The slave therefore has to remember the binlog position at the moment of interruption and reread from that breakpoint; we call this the slave's retransmission checkpoint. A reliable retransmission checkpoint may only be established after the slave has read the data and durably written it to the local relay log; otherwise data loss is still possible.
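The ordering constraint above can be sketched in a few lines. This is an illustrative model, not MySQL internals: the file names and function names are invented, but the key point matches the text, the checkpoint is persisted only after the relay-log write is durable.

```python
import json
import os
import tempfile

# Hypothetical sketch of the slave-side retransmission checkpoint: the
# checkpoint (binlog file + offset) is saved only AFTER the event is
# durably written to the local relay log. All names are illustrative.

CHECKPOINT_FILE = "relay.checkpoint"

def write_relay_event(relay_log_path, event_bytes):
    # Append the event and fsync so it survives a crash.
    with open(relay_log_path, "ab") as f:
        f.write(event_bytes)
        f.flush()
        os.fsync(f.fileno())

def save_checkpoint(binlog_file, offset):
    # Write atomically (temp file + rename) so the checkpoint is never torn.
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump({"file": binlog_file, "offset": offset}, f)
    os.replace(tmp, CHECKPOINT_FILE)

def load_checkpoint():
    # Resume point after a reconnect; 0 means "start from the beginning".
    if not os.path.exists(CHECKPOINT_FILE):
        return 0
    with open(CHECKPOINT_FILE) as f:
        return json.load(f)["offset"]

def apply_from_master(relay_log_path, binlog_file, events):
    # events: iterable of (binlog_offset, event_bytes) pairs.
    offset = load_checkpoint()
    for event_offset, event_bytes in events:
        if event_offset < offset:
            continue                              # already durable locally
        write_relay_event(relay_log_path, event_bytes)            # 1) persist
        save_checkpoint(binlog_file, event_offset + len(event_bytes))  # 2) then checkpoint
```

If the two steps were reversed, a crash between them would advance the checkpoint past data that never reached the relay log, which is exactly the loss scenario the text warns about.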

Because master-slave replication is a distributed process, it must also be idempotent: replaying the same piece of data must not ultimately produce duplicate rows. A deduplication strategy is therefore necessary. For tables designed to normal-form conventions, the primary key prevents duplicates; for tables without a primary key, a unique index over the combination of all fields can serve the same purpose. With a deduplication policy in place, the replication process can be rewound and replayed without worrying about generating duplicate data on the slave.
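A minimal sketch of why a primary key makes replay idempotent, using a plain dict to stand in for a table with a primary-key index (the event shape is invented for illustration):

```python
# Illustrative sketch, not MySQL code: replaying a row event is made
# idempotent by upserting on the primary key, so re-applying the same
# event after a rewind overwrites rather than duplicates.

def apply_row_event(table, row):
    # table: dict keyed by primary key, standing in for an indexed table.
    table[row["id"]] = row   # upsert: insert or overwrite, never duplicate

table = {}
event = {"id": 42, "amount": 100}
apply_row_event(table, event)
apply_row_event(table, event)   # same event replayed after a rewind
assert len(table) == 1          # still exactly one row
```

In actual SQL the same effect is achieved with statements such as `INSERT ... ON DUPLICATE KEY UPDATE` or `REPLACE INTO`, which rely on the primary key or a unique index just as the text describes.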

To keep master and slave data consistent, the replication process must guarantee not only no loss and no duplication, but also a consistent order of operations. The binlog event stream reflects the order in which the master's concurrent transactions executed, and that order must be reproduced as-is on the slave. To achieve this, the native replication architecture uses a serialized, single-threaded model. This is hard to avoid: at the database level there is no awareness of causal or data dependencies between different rows, so events cannot safely be applied in parallel.

The native replication architecture achieves no loss, no duplication, and order consistency, which makes it adequate in ordinary scenarios, but it has some shortcomings:

    • Visibility and manageability are relatively weak.
    • It cannot handle heterogeneous data.
    • The single-threaded model can become a performance bottleneck, making replication latency too high.
    • In a one-to-many topology, replication puts heavy load on the master, affecting its availability.

For some special replication and distribution scenarios, the native replication architecture is not necessarily appropriate. For example:

    • Large-scale databases with high data and write volumes that must replicate across regions and data centers while remaining sensitive to replication delay, such as the order and trade databases of a large e-commerce platform.

That is why a custom replication architecture is worth considering for such scenarios. Let's look at a conceptual schematic of one.

Custom Replication Architecture

As shown above, the custom replication architecture follows the native model: it presents itself to the master as a MySQL slave and consists of three main roles:

    1. A pull worker, acting like the native I/O thread.
    2. A buffer plus persistent storage, acting like the native relay log.
    3. A load worker, acting like the native SQL thread.

Because this is a custom program, it can provide additional functionality without modifying MySQL, and it remains transparent to both the application and MySQL. Where the native architecture falls short, a custom replication architecture can provide better monitoring and management of the replication process, support heterogeneous data transformation, and more. For large databases that replicate across regions and data centers and are latency-sensitive, the replication process can be accelerated through appropriate policies.
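The three roles can be sketched as a producer-consumer pipeline. This is a conceptual model with invented names: a pull worker feeds a bounded in-memory queue (standing in for the buffer plus persistent store) and a load worker drains it into storage.

```python
import queue
import threading

# Conceptual sketch of the custom replica's three roles; names are
# hypothetical, and a real implementation would persist the buffer.

def pull_worker(source_events, buffer):
    for ev in source_events:     # stands in for reading the master's binlog
        buffer.put(ev)           # enqueue, like writing the relay log
    buffer.put(None)             # end-of-stream marker

def load_worker(buffer, storage):
    while True:
        ev = buffer.get()
        if ev is None:
            break
        storage.append(ev)       # stands in for replaying into storage

buffer = queue.Queue(maxsize=1024)   # bounded: applies back-pressure
storage = []
events = list(range(5))
t1 = threading.Thread(target=pull_worker, args=(events, buffer))
t2 = threading.Thread(target=load_worker, args=(buffer, storage))
t1.start(); t2.start()
t1.join(); t2.join()
assert storage == events             # order preserved end to end
```

Because the two roles run independently, either side can be instrumented, throttled, or extended (e.g. with a data transformation step) without touching MySQL itself, which is the manageability advantage the text describes.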

Take the large-scale e-commerce order and trade databases mentioned above, which are generally sharded into multiple databases and tables. After sharding, the data in different tables is completely independent at the business level and can be written in parallel. That is why the schematic shows two load workers: tables that are independent in business terms can be loaded in parallel. The replication delay for a single piece of data is:

Total duration T = P + N + L, where P is the pull worker's processing time, N is the network transmission time, and L is the load worker's processing time.
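The formula can be made concrete with hypothetical numbers (illustrative values, not measurements):

```python
# Worked example of T = P + N + L, in milliseconds; all values invented.
P = 5    # pull worker processing time
N = 20   # cross-region network transmission time
L = 40   # load worker processing time

T = P + N + L
assert T == 65

# Only L is readily compressible: with 4 load workers on independent
# tables, the load stage ideally shrinks by 4x.
T_parallel = P + N + L / 4
assert T_parallel == 35.0
```

P and N are largely fixed costs, so parallelizing the load stage is where a custom architecture can actually buy back latency, as the next paragraph explains.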

The binlog of a single database is sequential and cannot easily be pulled in parallel, and the network transmission time is likewise fixed; the only stage that can be accelerated is loading into storage. By splitting tables along business lines, multi-threaded parallel loading can shorten the overall duration of L.
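A sketch of that parallel-load idea, with hypothetical names: events are partitioned by table so that business-independent tables replay concurrently, while the event order within each table is preserved by keeping each table on a single worker.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch: parallelize L across business-independent tables
# while preserving per-table event order. Names are invented.

def replay_table(apply_fn, table, rows):
    for row in rows:                     # strict arrival order per table
        apply_fn(table, row)

def parallel_load(events, apply_fn, workers=4):
    per_table = defaultdict(list)
    for table, row in events:            # group, keeping arrival order
        per_table[table].append(row)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(replay_table, apply_fn, t, rows)
                   for t, rows in per_table.items()]
        for f in futures:
            f.result()                   # surface any worker exception

# Demo: two independent sharded tables replayed in parallel.
applied = {"orders": [], "trades": []}
events = [("orders", 1), ("trades", 10), ("orders", 2), ("trades", 11)]
parallel_load(events, lambda t, r: applied[t].append(r))
assert applied["orders"] == [1, 2]
assert applied["trades"] == [10, 11]
```

Note the safety condition from the earlier discussion of ordering: this only works because sharded tables have no cross-table dependencies; within one table, events still apply serially.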

Summary

This article analyzed the principles of MySQL's binlog-based master-slave replication and, from the standpoint of data consistency, the key elements of master-slave architecture design in a distributed network environment. Building on the analysis of MySQL's native replication architecture, it sketched a high-level design for a flexible, controllable custom replication architecture. Having understood how the master-slave architecture guarantees data consistency, we can go on to consider how to do bidirectional replication and ensure eventual consistency between dual masters. The next article in this series will analyze that question.



