Failover with the MySQL Utilities - Part 1: mysqlrpladmin

The MySQL Utilities are a set of tools provided by Oracle to perform various kinds of administrative tasks. When GTID-replication is enabled, 2 tools can be used for slave promotion: mysqlrpladmin and mysqlfailover. We will review mysqlrpladmin (version 1.4.3) in this post.

Summary
  • mysqlrpladmin can perform manual failover/switchover when GTID-replication is enabled.
  • You need to have your servers configured with --master-info-repository=TABLE or to add the --rpl-user option for the tool to work properly.
  • The check for errant transactions is failing in the current GA version (1.4.3), so be extra careful when using it, or watch bug #73110 to see when a fix is committed.
  • There are some limitations, for instance the inability to pre-configure the list of slaves in a configuration file, or the inability to check that the tool will work well without actually doing a failover or switchover.
Failover vs switchover

mysqlrpladmin can help you promote a slave to be the new master when the master goes down, and it then automates the replication reconfiguration after this slave promotion. There are 2 separate scenarios: unplanned promotion (failover) and planned promotion (switchover). Beyond the words, this distinction has implications on the way you have to execute the tool.

Setup for this test

To test the tool, our setup will be a master with 2 slaves, all using GTID replication. mysqlrpladmin can show us the current replication topology with the health command:

$ mysqlrpladmin --master=root@localhost:13001 --discover-slaves-login=root health
# Discovering slaves for master at localhost:13001
# Discovering slave at localhost:13002
# Found slave: localhost:13002
# Discovering slave at localhost:13003
# Found slave: localhost:13003
# Checking privileges.
#
# Replication Topology Health:
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13001  | MASTER  | UP     | ON         | OK      |
| localhost  | 13002  | SLAVE   | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+
# ...done.

As you can see, we have to specify how to connect to the master (no surprise) but instead of listing all the slaves, we can let the tool discover them.

Simple failover scenario

What will the tool do when performing a failover? Essentially we will give it the list of slaves and the list of candidates, and it will (a rough manual equivalent is sketched after this list):

  • Run a few sanity checks
  • Elect a candidate to be the new master
  • Make the candidate as up-to-date as possible by making it a slave of all the other slaves
  • Configure replication on all the other slaves to make them replicate from the new master
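
To make this concrete, here is a minimal sketch of what the last two steps amount to when done by hand, assuming GTID auto-positioning is in use and that localhost:13002 has been elected as the new master. These commands are illustrative only, not what the tool executes verbatim:

# Point the remaining slave at the new master using GTID auto-positioning:
$ mysql -uroot -h127.0.0.1 -P13003 -e "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=13002, MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_AUTO_POSITION=1; START SLAVE;"
# The elected candidate stops replicating and becomes the writable master:
$ mysql -uroot -h127.0.0.1 -P13002 -e "STOP SLAVE; RESET SLAVE ALL;"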

After killing the master with kill -9, let's try a failover:

$ mysqlrpladmin --slaves=root:@localhost:13002,root:@localhost:13003 --candidates=root@localhost:13002 failover

This time, the master is down so the tool has no way to automatically discover the slaves. Thus we have to specify them with the --slaves option.

However we get an error:

# Checking privileges.
# Checking privileges on candidates.
ERROR: You must specify either the --rpl-user or set all slaves to use --master-info-repository=TABLE.

The error message is clear, but it would have been nice to have such details when running the health command (maybe as a warning instead of an error). That would allow you to check beforehand that the tool can run smoothly, rather than discover in the middle of an emergency that you have to look at the documentation to find which option is missing.
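
In the meantime, you can run such a pre-check yourself. A minimal sketch, assuming root access over TCP to the two slaves of this test setup (ports 13002 and 13003): if the query does not return TABLE on every slave, plan on passing --rpl-user to the tool.

$ for port in 13002 13003; do mysql -uroot -h127.0.0.1 -P$port -NBe "SELECT @@GLOBAL.master_info_repository"; done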

Let's choose to specify the replication user:

$ mysqlrpladmin --slaves=root:@localhost:13002,root:@localhost:13003 --candidates=root@localhost:13002 --rpl-user=repl:repl failover
# Checking privileges.
# Checking privileges on candidates.
# Performing failover.
# Candidate slave localhost:13002 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
#
# Replication Topology Health:
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13002  | MASTER  | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+
# ...done.

Simple switchover scenario

Let's now restart the old master and configure it as a slave of the new master (by the way, this can be done with mysqlreplicate, another tool from the MySQL Utilities; a sketch follows below). If we want to promote the old master back, we can run:

$ mysqlrpladmin --master=root@localhost:13002 --new-master=root@localhost:13001 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet switchover
# Discovering slave at localhost:13001
# Found slave: localhost:13001
# Discovering slave at localhost:13003
# Found slave: localhost:13003
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13001  | MASTER  | UP     | ON         | OK      |
| localhost  | 13002  | SLAVE   | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+

Notice that the master is available in this case, so we can use the --discover-slaves-login option. Also notice that we can tune the verbosity of the tool by using --quiet or --verbose, or even log the output to a file with --log.

We also used --demote-master to make the old master a slave of the new master. Without this option, the old master would be isolated from the other nodes.
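
As mentioned above, reattaching the old master as a slave before the switchover can be done with mysqlreplicate. A minimal sketch, assuming the repl:repl replication user from this post and that the old master on port 13001 has been restarted (check the exact options against the MySQL Utilities documentation for your version):

$ mysqlreplicate --master=root@localhost:13002 --slave=root@localhost:13001 --rpl-user=repl:repl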

Extension points

In general, doing a switchover/failover at the database level is one thing, but informing the other components of the application that something has changed is most often necessary for the application to keep working correctly.

This is where the extension points are handy: you can execute a script before the switchover/failover with --exec-before and after it with --exec-after.

For instance with these simple scripts:

# cat /usr/local/bin/check_before
#!/bin/bash
/usr/local/mysql5619/bin/mysql -uroot -S /tmp/node1.sock -Ee 'SHOW SLAVE STATUS' > /tmp/before

# cat /usr/local/bin/check_after
#!/bin/bash
/usr/local/mysql5619/bin/mysql -uroot -S /tmp/node1.sock -Ee 'SHOW SLAVE STATUS' > /tmp/after

We can execute:

$ mysqlrpladmin --master=root@localhost:13001 --new-master=root@localhost:13002 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet --exec-before=/usr/local/bin/check_before --exec-after=/usr/local/bin/check_after switchover

And looking at /tmp/before and /tmp/after, we can see that our scripts have been executed:

# cat /tmp/before
# cat /tmp/after
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
   Master_Host: localhost
   Master_User: repl
   Master_Port: 13002
[...]

If the external script does not seem to work, using --verbose can be useful to diagnose the issue.

What about errant transactions?

We already mentioned that errant transactions can create lots of issues when a new master is promoted in a cluster running GTIDs. So the question is: how does mysqlrpladmin behave when there is an errant transaction?

Let's create an errant transaction:

# On localhost:13003
mysql> CREATE DATABASE test2;
mysql> FLUSH LOGS;
mysql> SHOW BINARY LOGS;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 |     69309 |
| mysql-bin.000002 |   1237667 |
| mysql-bin.000003 |       617 |
| mysql-bin.000004 |       231 |
+------------------+-----------+
mysql> PURGE BINARY LOGS TO 'mysql-bin.000004';

And let's try to promote localhost:13003 as the new master:

$ mysqlrpladmin --master=root@localhost:13001 --new-master=root@localhost:13003 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --quiet switchover
[...]
+------------+--------+---------+--------+------------+-----------------------------------------------+
| host       | port   | role    | state  | gtid_mode  | health                                        |
+------------+--------+---------+--------+------------+-----------------------------------------------+
| localhost  | 13003  | MASTER  | UP     | ON         | OK                                            |
| localhost  | 13001  | SLAVE   | UP     | ON         | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. |
| localhost  | 13002  | SLAVE   | UP     | ON         | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. |
+------------+--------+---------+--------+------------+-----------------------------------------------+

Oops! Although the documentation says it should, the tool does not check for errant transactions. This is a major issue, as you cannot run failover/switchover reliably with GTID replication if errant transactions are not correctly detected.

The documentation suggests errant transactions should be checked, and a quick look at the code confirms that, but it does not work! So it has been reported.
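
Until the fix is released, you can check for errant transactions by hand before promoting a slave. A minimal sketch, assuming GTIDs are enabled everywhere: compare each slave's executed GTID set with the master's using GTID_SUBTRACT(); a non-empty result means that slave has executed transactions the master never saw (here, the CREATE DATABASE test2 on localhost:13003).

-- On the current master (localhost:13001), note the executed GTID set:
mysql> SELECT @@GLOBAL.gtid_executed;
-- On each slave, subtract the master's set from the local one (paste the
-- master's set in place of the placeholder); a non-empty result is errant:
mysql> SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed, '<master_gtid_executed_here>');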

Some limitations

Apart from the missing errant transaction check, I also noticed a few limitations:

  • You cannot use a configuration file listing all the slaves. This becomes tedious once you have a large number of slaves. In such a case, you should write a wrapper script around mysqlrpladmin to generate the right command for you (a sketch is shown after this list).
  • The slave election process is either automatic or it relies on the order of the servers given in the --candidates option. This is not very sophisticated.
  • It would be useful to have a --dry-run mode which would validate that everything is configured correctly, but without actually failing/switching over. This is something MHA does, for instance.
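
As an illustration of the first point, here is a minimal wrapper sketch, assuming a hypothetical /etc/rpladmin.slaves file that lists one host:port per line and the credentials used throughout this post:

#!/bin/bash
# Build the --slaves list from a file of host:port entries (one per line),
# then run a failover with mysqlrpladmin.
SLAVES=$(awk '{ printf "%sroot:@%s", sep, $1; sep="," }' /etc/rpladmin.slaves)
mysqlrpladmin --slaves="$SLAVES" --rpl-user=repl:repl failover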
Conclusion

mysqlrpladmin is a very good tool to help you perform manual failover/switchover in a cluster using GTID replication. The main caveat at this point is the failing check for errant transactions, which requires a lot of care before executing the tool.
