MySQL 5.7 Multi-source master and slave replication uniqueness

Last Update:2017-01-13 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

About MySQL master-slave replication

Replication technology as its name suggests, is through the replication of the database to a data-oriented, copied into another storage, the source of the data to do the main library, storing replicated data called from the library. MySQL has a number of replication schemes, such as master-slave replication, semi-synchronous replication, multiple owners and master replication. Basically, by writing the operation of the main library into the binary log, the binary log is routed to the library and repeats the actions recorded in the log to follow the main library state in order to achieve the effect of synchronizing from the library data.

Among them, master-slave replication can transform, expand a lot of combination methods, such as multiple source replication (multiple master to send data to 1 database), 1 main from the server or from the servers and then extended from the server.

The following is a list of database master-slave replication schemas:

Note: The main library is master (1,2,.., N), slave from the library (1,2,..., N).

Master-slave replication has the following advantages:

Share load: read and write separation of the business, reduce the main library I/O load, share some of the pressure from the library, shorten the customer query response time.

Increase Robustness: When there is a problem in the main library, you can set up the main library from the library through a variety of scenarios, replace the primary library to support the business, and shorten the downtime window.

advantageous backups: backups from the library, which do not affect the primary library's transactions, and do not affect the primary library performance and disk space.

Query Analysis: from the library can be used as statistics, reports and other data analysis work using the OLAP library.

Offsite Backup: will be placed offsite from the library to be used as a synchronous backup of offsite data.

Starting with the 5.7 version of MySQL to support multi-source master-slave replication (multi-source Replication) is to send multiple database (Master) datasets to 1 from the library (Slave), which also has the advantages of master-slave replication as mentioned earlier, In addition to these, its uniqueness is also:

aggregation data: especially in some scenarios where the database is divided into tables, the statistical analysis operations in the dataset can be implemented on 1 sets from the library server.

Save cost: Data centralized storage can avoid the waste of hardware and software resources such as servers, 5.7 before 1 main 1 from or 1 master multiple from the program needs for each host a standby unit 5.7 After the introduction of Multi-Source replication, multiple from the library can be merged, as to whether the merger is stored at the high-end or low-end servers, depending on the analysis, statistics and other services in the overall business priorities, the level of busyness and other factors.

Centralized Backup: the convenience of backing up all received database data on a single server.

Offsite disaster preparedness: will be located from the library far away, can be used for off-site backup projects.

Basic 1 main 1 from the replication implementation process

Let's start with a step-by-step understanding of the basic 1 main 1 from (1 master,1 slave) replication implementation process:

In the figure, Master is the host name of the main library, slave from the library hostname. The database being synchronized is called music. The binary log from the library receiving main library (Binlog dump thread) writes the log from the library's I/O thread (I/Os threads) from the library's relay log, and then applies the contents of the log to the library from the SQL thread (SQL threads).

From the library, you can see 2 prerequisite processes (I/O thread and SQL thread) are on standby, and the thread status is as follows:

The function of a thread is primarily confirmed by the State field:

I/O thread:

Waiting for master to send event

SQL (Coordinator) Thread:

Slave has read all relay log; Waiting for more updates

After the concurrency is turned on, the following threads are also available:

Worker thread:

waiting for a event from coordinator

The implementation of multiple-source replication is similar to the 1 master 1 from sending binary logs again, but a slight difference on SQL threads (SQL thread) provides a set of SQL and IO threads for each main library instance:

How to configure a multiple-source replication action method

The configuration of multiple-source replication is relatively simple:

Stop slave;

SET GLOBAL master_info_repository = ' TABLE ';

SET GLOBAL relay_log_info_repository = ' TABLE ';

Change Master to master_host= ' 192.168.5.160 ', master_user= ' slave1 ', master_password= ' Gaoqiang ' for Channel ' Master1 ';

Change Master to master_host= ' 192.168.5.163 ', master_user= ' slave1 ', master_password= ' Gaoqiang ' for Channel ' Master2 ';

Start slave;

The diagram uses 2 main libraries: music and habit, assuming music holds singers, names, and other information, habit the user's idol, favorite songs, listening songs, playing peak time and location information, Converge to from the library can be realized that the economic benefits from the library to merge and achieve a unified user behavior analysis, but also with a mysqldump command plus a--all-databases parameter all exported to do backup.

Considering the multiple source replication operation, a data from the library to accept multi-party, compared to the pressure will be more than a single library replication, so you need to optimize the configuration from the library, so as to enhance the efficiency from the library execution.

Performance improvement Tool--parallel replication

the next thing to say is: performance-enhancing tools-parallel replication. in order to improve the execution efficiency of SQL threads (SQL thread), reduce the delay between the main library and the library, MySQL provides the characteristics of parallel replication, which can be used to replay the transaction from the library, so as to achieve the effect of accelerating the synchronization speed.

It should be noted that the use of the process or some problems need a little attention, if set improperly, but may be superfluous.

Take performance, it is not concurrent open the higher the better, and too high and too low, can bring negative performance effects, such as causing coordinator judgment, distribution and other processing costs increased, the use of sysbench to carry out the pressure test process, the cost of elevated symptoms reflected in the high CPU consumption. May be in different environments and business scenarios will have a corresponding response, need tailored, local conditions.

The following figure is 1 main 1 copying the playback process in parallel from the SQL thread:

In Figure 2, SQL threads (SQL thread) from the original 1, split into 1 coordinator and n worker. Coordinator is mainly used to distribute work to different worker, and to repeat itself if necessary. The figure is divided into 18 parallel, that is, 18 worker work in parallel. They are responsible for parallel playback of transactions that can be divided into a group in the log.

When the transaction is applied to the library, the value of last_committed in binary log (which requires setting the parallel type to logical clock) is used to determine whether a group can be placed in parallel playback and, if the value is the same, executed in parallel.

Log information:

As you can see from the log, the database has turned on the Gtid (Global transaction Identity) feature, which appears in version 5.6 to ensure that each committed transaction will have a globally unique number, and gtid information can be seen in the log.

Multi-source replication opens a concurrent schema diagram:

The way to turn on parallel replication operations is the same for 1 and multiple sources:

Stop slave;

SET GLOBAL slave_parallel_type= ' Logical_clock ';

SET GLOBAL slave_parallel_workers=30;

Start slave;

Verify that the configuration is in effect:

The show processlist command will see multiple coordinator threads and 30 worker threads allocated by each co thread, totaling more than 60 threads. (because the worker lineup is too large to be occupied, it is not shown here)

Why choose the degree of parallelism is 30, because from 1 Master 1 from the test found that the degree of parallelism in this test environment disk resource utilization is slightly higher than other scenarios, CPU consumption is relatively low, memory consumption difference, the overall efficiency of the highest, the shortest execution time.

1 Master 1 Slave The CPU performance state characteristics of the database from the time of replication:

CPU Performance Statistics:

As you can see from the diagram, in this test environment, the best CPU state is the concurrency degree of 18 and 30, the concurrency is 300 when the CPU consumption is obvious, the concurrency of 2 CPU consumption, and processing time is more time-consuming, concurrency is 0, actually equals not concurrency, CPU utilization is not high, Time consuming is also long; it is worth noting that when the concurrency is set to 1, even if there are only 1 workers, but it is a concurrency pattern, it is also consuming coordinator resources, and at this time Coordinator also participate in the replay, the equivalent of 2 threads to repeat, So it's close to the concurrency setting of 2, so be sure to set it to 0 instead of 1 if you want to close concurrency.

Next, stress testing and performance monitoring for multiple source replication:

The stress test for Multi-Source replication uses Sysbench to pressurize 2 main libraries (Master and Master1) while performing performance monitoring from the library slave. (3 databases are in exactly the same server configuration)

Test statement:

Parameter description:

OLTP scenario

InnoDB engine

Operation on 5 sheets

10,000 operation requests per database (altogether 20,000 operation requests)

Mixed Read and write

100 concurrency per database

200,000 data per table

Considering that the 30 concurrency resource utilization is relatively full and the execution efficiency is relatively high, the test results are tested after 30 SQL threads are opened from the library, and the visual graph is analyzed from the monitoring data of the library.

The graphs generated from the following 3 tests show that for the 20,000 requests sent from the main library, the parallel execution time (the horizontal axis is time) is shorter than the single thread, the resource utilization is higher, and the execution efficiency is higher.

CPU Usage:

As you can see from the diagram, the CPU utilization is higher than before, the load is OK, up to 36%, using more than single-threaded, the operation completes time earlier (the curve first restores down to 0).

Disk Write kb/s

Hard disk usage from the figure, parallel SQL thread relay process is relatively smooth, there is no obvious jitter, and 30 concurrent curve first to zero, the end of the operation.

Free Memory remaining memory:

Memory usage is not much worse.

It can be seen that, after reasonable opening and parallel, multi-source replication helps to improve replication efficiency and shorten data latency.

Summary

In general, MySQL's multi-source replication provides a more economical, convenient, and secure database environment. If you are interested in friends, welcome to exchange, analog business scenarios to test, put forward test recommendations, correct errors and joint research are very welcome, and hope to cooperate with everyone!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More