The Principle of MySQL Semi-Synchronous Replication and a Problem Encountered

Source: Internet
Author: User

As the earlier architecture diagram shows, MySQL semi-synchronous replication differs from asynchronous replication as follows. With asynchronous replication, the MySQL master sends its binary log through the replication thread and returns the result to the client regardless of whether the slave has received the binary log. With semi-synchronous replication, the master makes sure that a slave has received the binary log before it returns the result to the client. Comparing the two: asynchronous replication gives the user a fast response but does not guarantee that the binary log actually reaches the slave; semi-synchronous replication responds slightly more slowly to client requests, but it guarantees that the binary log has reached a slave.

1. Background to the problem

By default, MySQL replication in production is asynchronous, so in extreme cases a primary/standby switchover leaves some probability that the standby holds less data than the primary. After such a switchover we therefore use a tool to roll back and make sure no data is lost. Semi-synchronous replication requires that every transaction executed on the primary be successfully received by at least one standby before it is considered complete, which keeps the primary and standby strongly consistent. To keep primary and standby data strongly consistent and to reduce data loss, we tried to enable MySQL's semi-synchronous (semi-sync) replication in production. In practice, semi-sync ran normally on most instances, but on a small number of instances it could never be enabled (they could only run in ordinary replication mode). Stranger still, of two instances on the same host, one could enable it and the other could not. The root cause turned out to be simple, but tracking it down took some effort. The rest of this article describes the troubleshooting process.

2. Semi-synchronous replication principle

MySQL keeps the primary and standby consistent through the binlog. The primary executes a transaction locally and returns to the user once the binlog has been flushed to disk; the standby synchronizes by pulling the primary's binlog. By default the primary and standby are not strictly synchronized, so there is some probability that their data differ. Semi-synchronous replication was introduced to keep the data consistent at all times. In contrast to asynchronous replication, it requires that every transaction be received successfully by at least one standby before the result is returned to the user. The implementation principle is simple: after executing locally, the primary waits for the standby's response message (which contains the latest binlog (file, pos) the standby has received), and only after receiving it does the primary return to the user, at which point the transaction is truly complete. On the primary instance, a dedicated thread (ack_receiver) receives the standby's response messages and notifies the waiting worker thread that the standby has received the log, so that it can continue. For details of the semi-sync implementation, see another article, "MySQL semi-synchronous (Semi-sync) source code implementation".

3. Problem analysis

With the principle of semi-synchronous replication covered, let us look at the actual problem. On the problem instance, after the semi-sync switch is turned on on the primary and the standby, the status variable "rpl_semi_sync_master_status" always stays OFF, which means replication is running in ordinary (asynchronous) mode.

(1) Modify the rpl_semi_sync_master_timeout parameter

Among the semi-synchronous replication parameters, rpl_semi_sync_master_timeout controls how long the primary waits for the standby's response message. If this time is exceeded and the standby still has not acknowledged (the standby may be down, or it may be running very slowly and lagging far behind the primary), replication switches to ordinary mode so that transactions on the primary are not blocked for a long time. In production this value was 50 ms. Our first guess was simply that the value was too small, so we changed it to 10 s, but the problem persisted.
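The wait-then-fall-back behavior described above can be sketched with a condition variable. This is only an illustration under stated assumptions, not MySQL's actual code: the names ack_state, report_ack and wait_for_ack are hypothetical, and a binlog position is reduced to a single number instead of a (file, pos) pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical shared state between commit threads and the ack receiver. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  acked;
    uint64_t        acked_pos;     /* highest position acknowledged so far */
    int             semi_sync_on;  /* 1 = semi-sync, 0 = fell back to async */
} ack_state;

void ack_state_init(ack_state *s) {
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->acked, NULL);
    s->acked_pos = 0;
    s->semi_sync_on = 1;
}

/* Called by the receiving side when a standby reports a position. */
void report_ack(ack_state *s, uint64_t pos) {
    pthread_mutex_lock(&s->lock);
    if (pos > s->acked_pos)
        s->acked_pos = pos;
    pthread_cond_broadcast(&s->acked);
    pthread_mutex_unlock(&s->lock);
}

/* Called by a committing thread. Returns 1 if pos was acknowledged in time,
 * 0 if the wait timed out (semi-sync is then switched off) or was already off. */
int wait_for_ack(ack_state *s, uint64_t pos, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    int rc = 0;
    /* Loop to tolerate spurious wakeups; stop on timeout (rc != 0). */
    while (s->semi_sync_on && s->acked_pos < pos && rc == 0)
        rc = pthread_cond_timedwait(&s->acked, &s->lock, &deadline);
    int ok = s->semi_sync_on && s->acked_pos >= pos;
    if (!ok)
        s->semi_sync_on = 0;  /* timeout: continue in asynchronous mode */
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

In this sketch timeout_ms plays the role of rpl_semi_sync_master_timeout: raising it from 50 to 10000 only lengthens the wait and does not help if the acknowledgement is never seen by the receiving side.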

(2) Print logs

The simplest and crudest way to troubleshoot a problem is to print logs and see which step goes wrong. The primary and the standby have the parameters rpl_semi_sync_master_trace_level and rpl_semi_sync_slave_trace_level, respectively, to control what semi-synchronous replication logs. We set both parameters to 80 (64+16), which records detailed log information as well as function entry and exit.

Master

2016-01-04 18:00:30 13212 [Note] ReplSemiSyncMaster::updateSyncHeader: server(-1721062019), (mysql-bin.000006, 500717950) sync(1), repl(1)
2016-01-04 18:00:40 13212 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000006, pos: 500717950), semi-sync up to file , position 0.
2016-01-04 18:00:40 13212 [Note] Semi-sync replication switched OFF.

Slave

2016-01-04 18:00:30 38932 [Note] ---> ReplSemiSyncSlave::slaveReply enter
2016-01-04 18:00:30 38932 [Note] ReplSemiSyncSlave::slaveReply: reply (mysql-bin.000006, 500717950)
2016-01-04 18:00:30 38932 [Note] <--- ReplSemiSyncSlave::slaveReply exit (0)

The master log shows that at 2016-01-04 18:00:30 the primary set the semi-sync flag and started waiting for the standby's response. After waiting 10 s without receiving one, it treated the wait as timed out, switched semi-sync off and fell back to ordinary replication. The slave log, however, shows that at 2016-01-04 18:00:30 the standby had already sent (mysql-bin.000006, 500717950) to the primary, meaning it had received the log. In other words, the master marked the event as semi-sync, the slave received it and replied, and the master waited the full 10 s yet never saw the reply, so it switched to ordinary replication. The question now becomes: why did the master not receive the reply?

(3) The select function

As mentioned earlier, a thread on the primary instance (ack_receiver) is dedicated to receiving response packets. It listens on the sockets with the select function, and when it detects a response from a slave it reads the message and notifies the worker thread that it can continue. So is the problem in the select function? Since select is a system call, there seemed little reason to doubt it, but having traced the problem this far, it had to be examined. Several important macro definitions are associated with select. They are mainly implemented in three files: /usr/include/bits/typesizes.h, /usr/include/bits/select.h and /usr/include/sys/select.h.
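To make the listening step concrete, here is a minimal sketch of waiting for a reply on one descriptor with select. It is not the MySQL ack_receiver code; the function name select_recv_reply is hypothetical, and the range check on fd is an addition of this sketch, since passing a descriptor that does not fit into an fd_set to FD_SET is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read one reply.
 * Returns the number of bytes read (> 0), 0 on timeout, and -1 on error,
 * on a closed peer, or when fd cannot be represented in an fd_set. */
ssize_t select_recv_reply(int fd, char *buf, size_t cap, int timeout_ms) {
    assert(buf != NULL && cap > 0);
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;                    /* fd does not fit into an fd_set */

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);            /* mark the one fd we care about */

    struct timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest fd in any set, plus one. */
    int ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;                    /* includes EINTR in this sketch */
    if (ready == 0 || !FD_ISSET(fd, &readable))
        return 0;                     /* nothing arrived in time */

    ssize_t n = read(fd, buf, cap);
    return n > 0 ? n : -1;
}
```

A real receiver would put every slave socket into the set and loop; a single descriptor is used here only to keep the example short.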

FD_ZERO(fd_set *fdset): clears all file handles from fdset.
FD_SET(int fd, fd_set *fdset): adds the file handle fd to fdset.
FD_CLR(int fd, fd_set *fdset): removes the file handle fd from fdset.
FD_ISSET(int fd, fd_set *fdset): checks whether the file handle fd is set in fdset; after select returns, a nonzero result means fd is ready (readable or writable, depending on which set was passed).

    typedef long int __fd_mask;                       // 8 bytes
    #define __FD_SETSIZE 1024
    #define __NFDBITS (8 * (int) sizeof (__fd_mask))  // 64 bits
    typedef struct {
      __fd_mask __fds_bits[__FD_SETSIZE / __NFDBITS]; // 1024/64 = 16 (long int)
    } fd_set;
    #define __FDMASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS))  // fd % 64 = n, the nth bit is set to 1
    #define __FDELT(d) ((d) / __NFDBITS)              // which long int the bit is in
    #define __FDS_BITS(set) ((set)->__fds_bits)
    #define __FD_SET(d, set) (__FDS_BITS (set)[__FDELT (d)] |= __FDMASK (d))
    #define __FD_CLR(d, set) (__FDS_BITS (set)[__FDELT (d)] &= ~__FDMASK (d))
    #define __FD_ISSET(d, set) \
      ((__FDS_BITS (set)[__FDELT (d)] & __FDMASK (d)) != 0)

FD_SET registers the handles we want to listen on. The handle information is stored in the bit array inside fd_set, whose number of elements is __FD_SETSIZE/64; with __FD_SETSIZE=1024 the whole array is only 16 long ints. Each handle occupies one bit, so there are 1024 bits in total and handles 0 through 1023 can be represented. Suppose the handle value is 138: then 138/64=2 and 138%64=10, so the handle is marked by setting bit 10 of the long int at index 2 to 1. What happens if the handle value is 1024 or greater? Does it overflow? I read through the code carefully and found no bounds check at all, so a handle value of 1024 or greater is guaranteed to write outside the array. Since select only walks the bits of the array to decide whether each handle is readable or writable, a handle of 1024 or greater is never examined, and the primary therefore never learns that the standby has sent its response packet.
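The index arithmetic above can be checked with a few lines of C. This is a sketch under the article's assumption of an 8-byte long int; the helper names are hypothetical and WORD_BITS is hard-coded to 64 for that reason, while FD_SETSIZE comes from the system header.

```c
#include <assert.h>
#include <sys/select.h>

#define WORD_BITS 64  /* assumption: 8-byte long int, as in the article */

/* Index of the long int that holds the bit for fd. */
int fd_word_index(int fd) {
    assert(fd >= 0);
    return fd / WORD_BITS;
}

/* Position of the bit for fd inside that long int. */
int fd_bit_index(int fd) {
    assert(fd >= 0);
    return fd % WORD_BITS;
}

/* Number of long ints in an fd_set (16 when FD_SETSIZE is 1024). */
int fd_set_words(void) {
    return FD_SETSIZE / WORD_BITS;
}

/* 1 if fd has a slot inside the array, 0 if FD_SET would write past it. */
int fd_fits_in_fd_set(int fd) {
    return fd >= 0 && fd_word_index(fd) < fd_set_words();
}
```

For fd 138 the helpers give word 2, bit 10, inside the 16-word array. For fd 2344 they give word 36, bit 40, which is far beyond the last valid index of 15.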

(4) Verification

The above is only theoretical analysis. If the socket handle on the running problem instance really is 1024 or greater, then the cause is confirmed.

1. Get the MySQL process ID (mysql-pid)

ps -aux | grep mysqld | grep port

2. Attach GDB to the process

gdb -p mysql-pid

3. Find the ack_receiver thread and switch to it

info threads
thread thread_id

4. Print the value of the socket; here the fd value was 2344.

(5) Solution

Because of the definition of __FD_SETSIZE, which is typically 1024, select can listen on at most 1024 handles, and every handle value must be less than 1024. The first option is to increase this value, but it is a compile-time constant in the system headers, so changing it means rebuilding against modified headers. Moreover, because of how select works, every bit has to be traversed on each call to determine whether a message has arrived on a handle, so a large setting makes it very inefficient. select is an older I/O multiplexing mechanism; the more advanced poll and epoll offer similar functionality, are more powerful, and do not have select's limits on the total number of handles or on the maximum handle value. Details on select, poll, epoll and related mechanisms are easy to find online and are not discussed further here.
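As an illustration of the poll alternative mentioned above, the following sketch waits for a reply on one descriptor without going through an fd_set, so the descriptor's numeric value does not matter. It is not the fix that was applied to MySQL; the function name poll_recv_reply is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read one reply.
 * Returns the number of bytes read (> 0), 0 on timeout, and -1 on error
 * or on a closed peer. There is no FD_SETSIZE bound on fd. */
ssize_t poll_recv_reply(int fd, char *buf, size_t cap, int timeout_ms) {
    assert(buf != NULL && cap > 0);
    if (fd < 0)
        return -1;                  /* poll silently ignores negative fds */

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;            /* interested in readability only */
    pfd.revents = 0;

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;                  /* includes EINTR in this sketch */
    if (ready == 0)
        return 0;                   /* nothing arrived in time */
    if (!(pfd.revents & POLLIN))
        return -1;                  /* POLLERR, POLLHUP or POLLNVAL */

    ssize_t n = read(fd, buf, cap);
    return n > 0 ? n : -1;
}
```

poll takes an array of descriptors rather than a fixed-size bitmap, so a socket with fd 2344 is handled the same way as one with fd 138.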

(6) Official version

Looking at the latest official Oracle MySQL 5.7 source code on Git, this part is also implemented with select, so it has the same problem. Because handle numbers are reused, fd values of 1024 or greater rarely appear when an instance has few connections or few long-lived connections, so the bug is not easy to trigger. The problem itself, however, is general.

(7) Problem extension

After the problem was fixed, another question bothered me for half a day. There are three places in the MySQL kernel that listen on sockets: (1) the select on the listening port, (2) the epoll used by the thread pool, and (3) the select used by semi-sync. The binlog dump thread that serves the slave is an ordinary worker thread, and worker-thread sockets are monitored by epoll, so the binlog dump socket would seem to be monitored by both the semi-sync select and the thread pool epoll at the same time. Would that not cause confusion? After reading the code carefully, I found that the thread pool's epoll uses EPOLLONESHOT mode: after each message is received the handle is disarmed and must be registered again, so the same handle is never watched by both mechanisms at once.

That concludes the troubleshooting process. The conclusion is simple, but locating the problem did take some effort. Because select is a widely used I/O multiplexing mechanism, anyone who uses select should be aware of its limitations.
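The one-shot behavior can be demonstrated in isolation. The sketch below is not the thread pool's code; the helper names are hypothetical. It registers a descriptor with EPOLLIN | EPOLLONESHOT, and after one event is delivered, further data produces no event until the descriptor is re-armed with EPOLL_CTL_MOD.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Register or re-register fd for a single readability notification. */
static int oneshot_ctl(int epfd, int op, int fd) {
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLONESHOT;  /* disarmed after one event */
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

int oneshot_add(int epfd, int fd) {
    return oneshot_ctl(epfd, EPOLL_CTL_ADD, fd);
}

int oneshot_rearm(int epfd, int fd) {
    return oneshot_ctl(epfd, EPOLL_CTL_MOD, fd);
}

/* Wait up to timeout_ms for one event and read from the ready fd.
 * Returns bytes read (> 0), 0 on timeout, -1 on error or closed peer. */
ssize_t oneshot_wait_and_read(int epfd, char *buf, size_t cap, int timeout_ms) {
    assert(buf != NULL && cap > 0);
    struct epoll_event ev;
    int ready = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                        /* no armed fd became readable */
    ssize_t n = read(ev.data.fd, buf, cap);
    return n > 0 ? n : -1;
}
```

Between the delivered event and the next oneshot_rearm call, epoll reports nothing for that descriptor, which is the window in which another mechanism can watch the socket without the two interfering.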
