switch logic that was previously designed
1 query Slve delay, more than n seconds delay wait or return failure to ensure that the business impact time is the shortest
2 Login proxy node, disable current hproxy, make subsequent business connections through proxy fail
3 Login proxy node, shutdown currently connected through proxy session (if SQL can be quickly completed, this step can be used to do not do, you could do a time threshold detection, when n seconds after the business layer connection is Kill)
4 Record the Binlog and POS points of the current main library
5 waiting for the data to be identical between the two-master, that is, after the POS point recorded from the library to step 4 can be implemented using the Master_pos_wait (Log_name,log_pos[,timeout]) built-in function
6 Perform a switch, that is, an alternate node in the Enable M-m
7 Return switch Successful
Failure Phenomenon
After switching, show slave status appears a large number of primary key conflict error
cause Analysis
Through the analysis, found that through the Haproxy to kill the session is not really reliable, when the connection is established, even if through Haproxy kill the session or kill the MySQL client in the process, actually no effect, will continue to execute successfully, If the main library inserts a 10 self ID that is not complete
When the new main library is connected, the slow SQL has not run out, that is, the record with ID 10 is not synchronized.
The new main library inserts a record again, and the resulting ID is still 10.
Slow SQL at this time in the main library execution completed, upload to the new main database key conflict, error scene simulation
1 login MySQL via proxy, execute the following command
Mysql> select * from Test_ha_switch;
Empty Set (0.00 sec)
mysql> INSERT INTO Test_ha_switch (DD) VALUES (sleep (100));
SQL is in operation at this time
2 login proxy, kill session by Socat
echo "Shutdown Sessions Server ha-proxy/10.9.188.208" | Socat Stdio/opt/udb/instance/haproxy/7378229c-7ada-4f90-b1ec-96d704979426/stats
3 At this point go back to the session in step 1 and find that it is already in error 2013 (HY000): Lost connection to MySQL server during query
mysql> INSERT INTO Test_ha_switch (DD) VALUES (sleep (100));
ERROR 2013 (HY000): Lost connection to MySQL server during query
4 Query the table again, found that the record is still written in the
Mysql> select * from Test_ha_switch;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect ...
Connection id:1536352
Current Database:jiang
+----+------+
| ID | DD |
+----+------+
| 1 | 0 |
+----+------+
1 row in Set (0.01 sec)
In addition, if there is no proxy, it is also well simulated by killing the process where the client is located
# mysql-s/opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock-uucloudbackup jiang-e ' show Processlist; "
+---------+--------------+-------------------+-------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+---------+--------------+-------------------+-------+---------+------+-------+------------------+
| 1536517 | Ucloudbackup | 10.9.125.20:62374 | Jiang | Sleep | 7 | | NULL |
| 1536566 | Ucloudbackup | localhost | Jiang | Query | 0 | init | Show Processlist |
+---------+--------------+-------------------+-------+---------+------+-------+------------------+
# nohup Mysql-s/opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock-uucloudbackup jiang-e "INSERT into Test_ha_switch (DD) select DD from Test_ha_switch;" &
[1] 28752
# nohup:ignoring input and appending output to ' nohup.out '
# Ps-ef|grep Test_ha_switch
Root 28752 17668 0 18:10 pts/0 00:00:00 mysql-s/opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/ mysqld.sock-uucloudbackup-px xxxxxxxx jiang-e INSERT INTO Test_ha_switch (DD) select DD from Test_ha_switch;
Root 29117 17668 0 18:10 pts/0 00:00:00 grep test_ha_switch
# kill-9 28752
[1]+ killed Nohup mysql-s/opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup jiang-e "INSERT into Test_ha_switch (DD) select DD from Test_ha_switch;"
# mysql-s/OPT/UDB/INSTANCE/MYSQL-5.6/5770E236-FEF6-4D61-BC61-BEB6D69F7DBE/MYSQLD.SOCK-UUCLOUDBACKUP-PJFKS9FFXVK Jiang-e "Show processlist;"
+---------+--------------+-------------------+-------+---------+------+--------------+------------------------- -------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+---------+--------------+-------------------+-------+---------+------+--------------+------------------------- -------------------------------------+
| 1536517 | Ucloudbackup | 10.9.125.20:62374 | Jiang | Sleep | 29 | | NULL |
| 1536570 | Ucloudbackup | localhost | Jiang | Query | 16 | Sending Data | INSERT into Test_ha_switch (DD) SELECT DD from Test_ha_switch |
| 1536575 | Ucloudbackup | localhost | Jiang | Query | 0 | init | Show Processlist |
+---------+--------------+-------------------+-------+---------+------+--------------+------------------------- -------------------------------------+
Optimization Method
The optimization method is also relatively simple, that is, in the switching logic between 3 and 4, by direct access to the main node of the way (at this time has been unable to login DB by proxy), if there is a business connection, then kill the current business connection through the MySQL kill syntax;
It is worth mentioning that kill the thread to ensure that the normal exit, this may be the case, there are two connections, when you kill the first connection has been disconnected, then you go to kill the thread may be an error exit, so that the second thread actually did not kill