MySQL Parallel copy feature

Source: Internet
Author: User
Tags crc32

mysql5.7 version relative to mysql5.6 the largest change in parallel replication, 5.6 supports schema only, that is, library-based, and 5.7 supports parallel replication based on group transactions.

There are several important parameters

master_info_repository

When the MTS feature is turned on, be sure to set the parameter master_info_repostitory to table so that the performance can be 50%~80% promoted. This is because after parallel replication is turned on, the update to the meta-master.info file will be greatly improved and the competition for resources will become larger. In previous versions of Innosql, parameters were added to control how often the Master.info file was refreshed, even if the file could not be refreshed. Because it is not necessary to refresh this file, that is, according to master-info.log this file recovery itself is unreliable. In MySQL 5.7, Inside recommends setting master_info_repository to table to reduce the overhead of this section.

slave_parallel_workers

If the slave_parallel_workers is set to 0, MySQL 5.7 is degraded to the original single-threaded copy, but the slave_parallel_workers is set to 1, the SQL thread function is converted to coordinator thread, However, only 1 worker threads are played back and are single-threaded. However, there are some differences between the two performance, because more than once coordinator thread forwarding, so Slave_parallel_workers=1 performance is worse than 0, in inside June's test also has about 20% performance degradation, as shown:

This introduces another problem, if the host on a small load, then the efficiency of the group submission is not high, it is likely that the number of transactions per group commits only 1, then in the slave playback, although the parallel replication is turned on, but the performance is worse than the original single-threaded phenomenon, that is, the delay increased.

enhanced multi-threaded slave configuration

Said so much, to open enhanced multi-threaded slave is actually very simple, just according to the following settings:

123456 # slaveslave-parallel-type=LOGICAL_CLOCKslave-parallel-workers=16master_info_repository=TABLErelay_log_info_repository=TABLErelay_log_recovery=ON

mysql5.6 's parallel replication capabilities:

Admittedly, MySQL version 5.6 also supports so-called parallel replication, but its parallelism is based on schema, which is library-based. If there are multiple schemas in a user's MySQL db instance, the speed of copying from the slave can indeed be a big help. The schema for MySQL 5.6 parallel replication is as follows:

The red box is the key to implementing parallel replication. Prior to MySQL version 5.6, there were two thread I/O threads and SQL threads on the slave server. The I/O thread is responsible for receiving the binary log (more accurately the event of the binary log), and the SQL thread is playing back the binary log. If the parallel copy feature is turned on in MySQL version 5.6, then the SQL thread becomes the coordinator thread, and the coordinator thread is primarily responsible for the contents of the previous two parts:

    • If the judgment can be executed in parallel, then select the worker thread to execute the binary log of the transaction
    • If the judgment cannot be executed in parallel, if the operation is DDL, or if the transaction spans the schema, wait for all worker threads to complete before executing the current log

This means that the coordinator thread does not just send the log to the worker thread, but it can also replay the log, but all the operations that can be performed in parallel are delivered by the worker thread. Coordinator threading and worker is a typical producer and consumer model.

The above-mentioned mechanism realizes the two problems of parallel replication based on schema, first of all, the crash safe function is not good, because it is possible to perform the transaction after the execution of the parallel replication relationship, then when the crash occurs, this part of the processing logic is more complex. From the code point of view, 5.6 here introduced the Low-water-mark tag to solve the problem, from the design point of view (wl#5569), it is hoped to use the power of the log to solve the problem, but 5.6 of the binary log playback can not achieve idempotent. Another key issue is that the parallel replication effect is not high, and if the user instance has only one library, then parallel playback is not possible, even if the performance is worse than the original single thread. Single-Library multi-table is a more common situation than multi-Library multi-table.

Parallel replication of mysql5.7:

MySQL 5.7 is called true parallel replication, the main reason is that the playback of the slave server is consistent with the host, that is, how to perform parallel playback on slave on the master server. There is no longer a parallel replication limit for libraries and there is no special requirement for binary log formats (library-based parallel replication is not required).

From the MySQL official point of view, its original plan of parallel replication is to support the table-level parallel replication and row-level parallel replication, row-level parallel replication by parsing the row format of the binary log to complete the way, wl#4648. But the final show to the small partner is indeed in the development plan called: mts:prepared transactions slave parallel applier, visible: wl#6314. The idea of parallel replication was first proposed by MARIADB's Kristain, and has appeared in Mariadb 10, believing that many of the features chosen by MARIADB are one of the most important functions of parallel replication.

The idea of MySQL 5.7 parallel replication is straightforward, Word: A group commits a transaction that can be replayed in parallel because these transactions have entered the prepare phase of the transaction, which means there is no conflict between the transactions (otherwise it is impossible to commit).

To be compatible with MySQL 5.6 Library-based parallel replication, 5.7 introduces a new variable slave-parallel-type, which can be configured with the following values:

    • Database: Default value, library-based parallel replication
    • Logical_clock: Parallel replication method based on group submission
Gtid support for parallel replication

How to know if a transaction is in a group is another problem because the original MySQL does not provide such information. In MySQL version 5.7, it was designed to store the information submitted by the group in Gtid. So if the user does not turn on the Gtid function, will the parameter Gtid_mode set to OFF? MySQL 5.7 also introduces a binary log event type called Anonymous_gtid, such as:

1234567891011 mysql> SHOW BINLOG EVENTS in ‘mysql-bin.000006‘;+------------------+-----+----------------+-----------+-------------+-----------------------------------------------+| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |+------------------+-----+----------------+-----------+-------------+-----------------------------------------------+| mysql-bin.000006 | 4 | Format_desc | 88 | 123 | Server ver: 5.7.7-rc-debug-log, Binlog ver: 4 || mysql-bin.000006 | 123 | Previous_gtids | 88 | 194 | f11232f7-ff07-11e4-8fbb-00ff55e152c6:1-2 || mysql-bin.000006 | 194 | Anonymous_Gtid | 88 | 259 | SET @@SESSION.GTID_NEXT= ‘ANONYMOUS‘ || mysql-bin.000006 | 259 | Query | 88 | 330 | BEGIN || mysql-bin.000006 | 330 | Table_map | 88 | 373 | table_id: 108 (aaa.t) || mysql-bin.000006 | 373 | Write_rows | 88 | 413 | table_id: 108 flags: STMT_END_F |......

This means that even if Gtid is not turned on in MySQL version 5.7, there will be a anonymous_gtid before each transaction starts, and there is a group submission in this Gtid.

Logical_clock

However, through the above show BINLOG EVENTS, we did not find any information about the group submissions. However, with the Mysqlbinlog tool, users can discover the internal information submitted by the group:

123456789101112131415 [email protected]:~# mysqlbinlog mysql-bin.0000006 | grep last_committed#150520 14:23:11 server id 88 end_log_pos 259 CRC32 0x4ead9ad6 GTID last_committed=0 sequence_number=1#150520 14:23:11 server id 88 end_log_pos 1483 CRC32 0xdf94bc85 GTID last_committed=0 sequence_number=2#150520 14:23:11 server id 88 end_log_pos 2708 CRC32 0x0914697b GTID last_committed=0 sequence_number=3#150520 14:23:11 server id 88 end_log_pos 3934 CRC32 0xd9cb4a43 GTID last_committed=0 sequence_number=4#150520 14:23:11 server id 88 end_log_pos 5159 CRC32 0x06a6f531 GTID last_committed=0 sequence_number=5#150520 14:23:11 server id 88 end_log_pos 6386 CRC32 0xd6cae930 GTID last_committed=0 sequence_number=6#150520 14:23:11 server id 88 end_log_pos 7610 CRC32 0xa1ea531c GTID last_committed=6 sequence_number=7#150520 14:23:11 server id 88 end_log_pos 8834 CRC32 0x96864e6b GTID last_committed=6 sequence_number=8#150520 14:23:11 server id 88 end_log_pos 10057 CRC32 0x2de1ae55 GTID last_committed=6 sequence_number=9#150520 14:23:11 server id 88 end_log_pos 11280 CRC32 0x5eb13091 GTID last_committed=6 sequence_number=10#150520 14:23:11 server id 88 end_log_pos 12504 CRC32 0x16721011 GTID last_committed=6 sequence_number=11#150520 14:23:11 server id 88 end_log_pos 13727 CRC32 0xe2210ab6 GTID last_committed=6 sequence_number=12#150520 14:23:11 server id 88 end_log_pos 14952 CRC32 0xf41181d3 GTID last_committed=12 sequence_number=13...

It can be found that more than the original binary log content last_committed and sequence_number,last_committed represent the time the transaction was committed, the last transaction committed the number, if the transaction has the same last_committed, Indicates that these transactions are within a group and can be played back in parallel. For example, there are 6 transactions with the above last_committed of 0, which indicates that 6 transactions are committed when the group commits, and that the 6 transactions can be played back in parallel in the slave machine.

The above-mentioned last_committed and sequence_number represent the so-called Logical_clock. First look at the source code for the definition of Logical_clock:

12345678910111213 class Logical_clock{  private:  int64 state;  /*  Offset is subtracted from the actual "absolute time" value at  logging a replication event. That is the event holds logical  timestamps in the "relative" format. They are meaningful only in  the context of the current binlog.  The member is updated (incremented) per binary log rotation.  */  int64 offset;  ......

State is a self-increment value, offset is updated every time the binary log occurs rotate, and the state value when rotate occurs is recorded. In fact, state and offset record the global count value, while there is only the relative value of the current file in the binary log. The scenario for using Logical_clock is as follows:

123456789 class mysql_bin_log:  public tc_log {    ...    public :    /* Committed transactions timestamp */    logical_clock max_committed_transaction;    /* "Prepared" Transactions timestamp */    logical_clock Transaction_counter;   

You can see that there are two Logical_clock variables defined in class Mysql_bin_log:

    • Max_c ommitted_transaction: Records the logical_clock of the last group submission, representing the last_committed in the mysqlbinlog above
    • Transaction_counter: Records the Logcial_clock of each transaction in the current group submission, representing the sequence_number in the mysqlbinlog above
Parallel Replication Test

Shows the QPS of the slave server after MTS is turned on. The testing tool is Sysbench's single-table full-update test, which shows the best performance under 16 threads, the QPS of the slave can reach more than 25000, and the further increase in parallel execution of threads to 32 does not bring a higher elevation. The original single-line Cheng of the QPS only around 4000, see MySQL 5.7 mts brought performance improvement, and because the test is a single table, so MySQL 5.6 MTS mechanism is completely powerless.

MySQL Parallel copy feature

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.