1. Brief introduction of MHA
MHA (Master high availability) is currently a relatively mature solution for MySQL's highly available, and it is developed by the Japanese Dena company Youshimaton (now a Facebook company) is an excellent set of highly available software for failover and master-slave upgrade in MySQL high availability environments. During the MySQL failover process, MHA can automatically complete the failover operation of the database within 0-30 seconds, and in the process of failover, MHA can ensure the consistency of the data to the maximum extent, in order to achieve a true sense of high availability
managed to GitHub with the corresponding address as follows
manager地址,https://github.com/yoshinorim/mha4mysql-managernode地址,https://github.com/yoshinorim/mha4mysql-node截止本文档编写时间为止,最新版本为0.58
Advantages
Refer to the official documentation for more information about Https://github.com/yoshinorim/mha4mysql-manager/wiki/Advantages
It is summarized as follows:
1、主从切换非常迅速,通常10-30s2、最大程度上解决数据一致性的问题3、不需要修改当前已有的MySQL架构和配置4、不需要另外多余的服务器5、没有性能损耗6、没有存储引擎限制
??
2. Overview of MHA Construction
Database schema: one master two from
master:192.168.142.48slave1:192.168.142.49slave2:192.168.142.50
MHA Architecture:
manager:192.168.142.49node:192.168.142.48,192.168.142.49,192.168.142.50
Keepalived Architecture
VIP:192.168.142.235节点:192.168.142.48,192.168.142.49
Build the Environment version
数据库版本:5.7.18,5.7.18-log MySQL Community Server (GPL)MHA版本:v0.58,mha4mysql-manager-0.58 + mha4mysql-node-0.58Keepalived:keepalived-1.4.3Linux主机:centos7.2
List of host and installation Services
Host IP |
Installation Services |
192.168.142.48 |
MySQL master; MHA node; keepalived |
192.168.142.49 |
mysql slave; MHA node && MHA Manager; keepalived |
192.168.142.50 |
mysql slave; MHA node |
Build a general step
1、keepalived的搭建2、linux机器之间配置互信3、mysql主从环境的搭建4、MHA的安装和配置5、配置文件测试 测试ssh连通性 测试集群中的主从复制6、启动MHA管理节点
?
3. Construction of Keepalived
This is installed using the RPM method, respectively, installing keepalived on 192.168.142.48 and 192.168.142.49, or compiling with the source code.
1、安装yum install keepalived -y2、配置文件修改vim /etc/keepalived/keepalived.conf! Configuration File for keepalivedglobal_defs { notification_email { [email protected]} notification_email_from [email protected] smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id LVS_DEVEL}vrrp_instance VI_1 { state MASTER interface eth0 virtual_router_id 55 priority 100 advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 192.168.142.235 }}note:其中主MASTER优先级高(100),BACKUP优先级(90)低一些3、启动、关闭、查看systemctl start keepalivedsystemctl stop keepalivedsystemctl status keepalived
?
4. Configure trust between Linux machines
There are many ways to build SSH trust between servers, here is a more convenient way to create a private
1、创建秘钥(回车连续三次即可)[[email protected]]# ssh-keygen -t rsa2、进入.ssh目录查看文件[[email protected]]# cd /root/.ssh/ && lsid_rsaid_rsa.pub3、修改id_rsa.pub名称mv id_rsa.pub authorized_keys4、将/root/.ssh文件夹拷贝到192.168.142.49和192.168.142.50即可建立三台机器之间的互信(如果端口不是默认22,需要额外指定参数 -P port)scp -P 8822 -r /root/.ssh 192.168.142.49:/root/scp -P 8822 -r /root/.ssh 192.168.142.50:/root/
??
5. mysql master and slave environment construction
1, manual construction or automation scripts in three host configuration MySQL instance, here can refer to MySQL Single instance construction method, not repeat
2. Configure replication user L and monitor users on each host DB instance
1、复制用户GRANT SUPER, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ‘repl‘@‘192.168.142.%‘ IDENTIFIED BY ‘123456‘;FLUSH PRIVILEGES;2、监控用户GRANT ALL PRIVILEGES ON *.* TO ‘root‘@‘192.168.142.%‘ IDENTIFIED BY ‘123456‘;FLUSH PRIVILEGES;
3, configuration from the synchronization
1、master上执行show master status;,记录对应File名称和Position列的数字,这里为 mysql-bin.000001 1542、slave1和slave2上分别执行CHANGE MASTER TO MASTER_HOST=‘192.168.142.48‘,MASTER_USER=‘repl‘,MASTER_PASSWORD=‘123456‘,MASTER_PORT=5700,MASTER_LOG_FILE=‘mysql-bin.000001‘,MASTER_LOG_POS=154;
??
6. Installation and configuration of the MHA
Refer to the official documentation for more information about Https://github.com/yoshinorim/mha4mysql-manager/wiki/Installation
?
1, each host installed node node, where the use of RPM installation, you can also use the source code compiled
1、下载https://github.com/yoshinorim/mha4mysql-node/releases 选择对应的rpm包下载2、安装依赖yum install perl-DBD-MySQL -y3、安装node节点rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
?
2, one of the host (192.168.142.49) Installation Manager node, where the use of RPM installation, you can also use the source code compiled
1、下载https://github.com/yoshinorim/mha4mysql-manager/releases 选择对应的rpm包下载2、安装依赖yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y3、安装manager节点rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
??
3. Create a MHA configuration file
APP1.CNF, the full version of the personal installation configuration instance has been placed on GitHub and can be viewed as a link
Https://github.com/dwwang1992/configs-of-MHA/blob/master/app1.cnf
??
4, master_ip_failover_script corresponding script
Master_ip_failover, the full version of the personal installation configuration instance has been placed on GitHub and can be viewed as a link
Https://github.com/dwwang1992/configs-of-MHA/blob/master/master_ip_failover
??
5, master_ip_online_change_script corresponding script
Master_ip_online_change, the full version of the personal installation configuration instance has been placed on GitHub and can be viewed as a link
Https://github.com/dwwang1992/configs-of-MHA/blob/master/master_ip_online_change
??
6, report_script corresponding script
Send_report, the full version of the personal installation configuration instance has been placed on GitHub and can be viewed as a link
Https://github.com/dwwang1992/configs-of-MHA/blob/master/send_report
?
7. configuration file test test SSH connectivity
[[email protected] scripts]# masterha_check_ssh--conf=/data/mha/app1.cnf Fri may 14:24:34 2018-[WARNING] Globa L configuration file/etc/masterha_default.cnf not found. Skipping.fri 14:24:34 2018-[INFO] Reading application default configuration from/data/mha/app1.cnf. Fri 14:24:34 2018-[INFO] Reading server configuration from/data/mha/app1.cnf. Fri may 14:24:34 2018-[INFO] Starting SSH connection tests. Fri may 14:24:37 2018-[debug] Fri, 14:24:34 2018-[debug] connecting via SSH from [email protected] (192. 168.142.48:8822) to [email protected] (192.168.142.49:8822): Fri may 14:24:35 2018-[debug] OK. Fri 14:24:35 2018-[debug] connecting via SSH from [email protected] (192.168.142.48:8822) to [Email pro Tected] (192.168.142.50:8822): Fri may 14:24:36 2018-[debug] OK. Fri may 14:24:38 2018-[debug] Fri, 14:24:35 2018-[debug] connecting via SSH from [email protected] (192. 168.142.49:8822)to [email protected] (192.168.142.48:8822): Fri may 14:24:35 2018-[debug] OK. Fri 14:24:35 2018-[debug] connecting via SSH from [email protected] (192.168.142.49:8822) to [Email pro Tected] (192.168.142.50:8822): Fri may 14:24:37 2018-[debug] OK. Fri may 14:24:39 2018-[debug] Fri, 14:24:35 2018-[debug] connecting via SSH from [email protected] (192. 168.142.50:8822) to [email protected] (192.168.142.48:8822): Fri may 14:24:37 2018-[debug] OK. Fri 14:24:37 2018-[debug] connecting via SSH from [email protected] (192.168.142.50:8822) to [Email pro Tected] (192.168.142.49:8822): Fri may 14:24:39 2018-[debug] OK. Fri 14:24:39 2018-[INFO] All SSH connection tests passed successfully.
Test master-slave replication in a cluster
[[email protected] scripts]# masterha_check_repl--conf=/data/mha/app1.cnf ...... 192.168.142.48 (192.168.142.48:5700) (current master) +--192.168.142.49 (192.168.142.49:5700) +--192.168.142.50 ( 192.168.142.50:5700) Fri 14:25:57 2018-[INFO] Checking replication Health on 192.168.142.49..Fri 25 14:25:57 2 018-[INFO] OK. Fri 14:25:57 2018-[INFO] Checking replication Health on 192.168.142.50..Fri may 14:25:57 2018-[info] OK. Fri may 14:25:57 2018-[INFO] Checking master_ip_failover_script STATUS:FRI * * 14:25:57 2018-[INFO]/data/mha/ Scripts/master_ip_failover--command=status--ssh_user=root--orig_master_host=192.168.142.48--orig_master_ip= 192.168.142.48--orig_master_port=5700--orig_master_ssh_port=8822unknown option:orig_master_ssh_portin SCRIPT TEST ====systemctl Stop keepalived==systemctl start keepalived===checking The Status of the script. OK Fri may 14:25:57 2018-[info] OK. Fri may 14:25:57 2018-[WARNING] Shutdown_scriptis not defined. Fri 14:25:57 2018-[INFO] Got exit code 0 (not master dead). MySQL Replication Health is OK.
?
8. Start the MHA management node
Start command
nohup masterha_manager --conf=/data/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &参数含义具体可参照官网说明 https://github.com/yoshinorim/mha4mysql-manager/wiki/masterha_manager,此处简单说明--remove_dead_master_conf,发生failover后,MHA会自动从配置文件里移除dead master的相关信息--ignore_last_failover,默认情况下,之前如果存在failover,那么再次启动MHA是不成功的,必须删除对应目录下的failover error文件, (manager_workdir)/(app_name).failover.error;设置此参数,就可以忽略上次是否错误而可以继续进行failover
Close command
masterha_stop --conf=/data/mha/app1.cnf
View status
masterha_check_status --conf=/data/mha/app1.cnf
?
9. Online manual Switching process
1, first need to ensure that MHA manager is turned off
masterha_stop --conf=/data/mha/app1.cnf
2. Manually Switch Master Master
masterha_master_switch --conf=/data/mha/app1.cnf --master_state=alive --new_master_host=192.168.142.48 --new_master_port=5700 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0参数解释:new_master_host:指定哪台成为新的主库new_master_port:指定对应的数据库端口orig_master_is_new_slave:将原来的主库变为从库running_updates_limit:指定复制延迟在10000s内的都可切换interactive:表示不需要人工干预,自动执行
3. Switching process
1, check the current configuration information and the master-slave server information, including read MHA configuration file and check the current slave health status 2, prevent the current master update mainly by the following steps: 1> wait 1.5s ( $time _until_kill_threads*100ms), waits for the current connection to disconnect 2> execution Read_only=1, prevents the new DML operation 3> waiting for 0.5s, waits for the current DML operation to complete 4> kill off all connections 5> flush No_write_to_binlog TABLES 6> flush TABLES with READ LOCK3, waiting for new Master executes all of the relay log and logs the corresponding log-point waiting to execute all relay logs on 192.168.142.49 (192.168.142.49:5700) after execution. . Getting new master ' s binlog name and position. mysql-bin.000001:24884, set the read_only of the new master to off and add VIP5, slave switch to the new Master 1, wait for slave to apply the relay log generated from the copy, and then execute Chan The GE Master operation switches to the new master on the waiting to execute all relay logs on 192.168.142.50 (192.168.142.50:5700): resetting slave 192.168.142.50 (192.168.142.50:5700) and starting replication from the new master 192.168.142.49 ( 192.168.142.49:5700): 2, Release the lock on the original master unlocking all tables on thE orig master:executing UNLOCK TABLES. 3, because the Masterha_master_switch command line with the--orig_master_is_new_slave parameter, the original master is also switched to the new master from starting orig master as a Slave. resetting slave 192.168.142.48 (192.168.142.48:5700) and starting replication from the new master 192.168.142.49 ( 192.168.142.49:5700).. 6, clean up the new master of the relevant information 192.168.142.49:resetting slave info succeeded
??
10. Automatic failover process
1, the first need to ensure that the MHA manager in the running state
nohup masterha_manager --conf=/data/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &
2. When the master master is down, the new master master is automatically detected and selected
3, the process of switching
When the Master_manager monitoring to the main library mysqld service is stopped, first SSH login check (save_binary_logs--command=test) for the main library, and then the MYSQLD service is checked (PING (SELECT) Check every 3 seconds, lasts 3 times), parameter secondary_check_script can be used for double check, finally make Master is down! judgment, master failover start 1, Based on the configuration file to detect which servers are in the current replication environment, MHA also checks such as replication exceptions and whether there are some different main libraries from the library, starting failover (excluding the last failover failure or failover time interval is too short)? 2. Quarantine Master Server, disable the VIP of the failed main library (provided you need to specify the relevant script, for example: if there is a master_ip_failover_script will call the script to stop the VIP, if there is Shutdown_ Script script to call scripts to close master to avoid brain crack, specifically in the configuration file app1.cnf)?? 3, elect the new main library and try to fill in the new main library data 1, get the sync location from the top of the library: compare all the Master_log_file and Read_master_log_pos locations from the library to find out the location of the latest and oldest fault main library corresponding to the Binlog location 2. Save Dead Master's Binlog: Execute save_binary_logs command on fault main library to get slave difference between lastest Binlog sync location and master (using 3.1 steps to find the most forward from library Binlog location of synchronization If the failure of the main library system is not hanging) and SCP to MHA Manager server on SCP from [Email protected]:/data/mha/mha/tmp/saved_master_binlog_ From_192.168.142.48_5700_20180525155119.binlog to Local:/data/mha/mha/app1/saved_master_binlog_from_ 192.168.142.48_5700_20180525155119.binlog succeeded. 3. Determine and decide the new main libraryDetermine the new Main library: Use the command first apply_diff_relay_logs--command= Find the synchronization position identified in the previous 3.1 steps of the most forward and the corresponding main library of the Binlog position as a parameter, in the synchronization location of the most forward from the library to execute this command in the secondary log to find the Binlog between the location of the relay Log and generate files are used to recover other from the library (this is to check the top of the synchronization from the library to see if there is a trunk log from the oldest location, which is why the MHA environment has performed the trunk log can not be deleted, otherwise this comparison is more troublesome) and then find and decide the new main library, based on the configuration to choose how to New Main Library (check if you have set candidate_master=1 and no_master=1, if you have set candidate Main library, then the candidate main library is not necessarily the slave with the latest data, so we need to compare it with other libraries. Of course, if the candidate main library happens to be the top-most slave library in the synchronization location, it does not need to be compared with other relay logs from the library, if the candidate main library is not set, then the synchronization position is the most forward from the library. MHA Manager server will also replicate the previously replicated diff Binlog to the new Main Library 4, the new Main Library application log (if any errors occur from this stage and need to be manually restored) the new main library needs to be compared first Master_log_fi Le=relay_master_log_file,read_master_log_pos=exec_master_log_pos confirm that you have performed the replication, and if the new main library is not the one from the top of the synchronization location, you need to use Apply_ The Diff_relay_logs--command=generate_and_send command compares the relay log between itself and the top of the sync location from the library to see if there is a difference, and if so, generates a difference relay Log (if the new main library is the one from the top of the sync location, you only need to perform the MHA Manager server's diff log) and then use the two differential logs to recover the data (apply_diff_relay_logs--command= Apply command). After the restore is complete, get the Binlog location and generate the Change Master statement ready for the other from the library change master to the new Main library, and set the read_only=0. Then bind the VIP to the new Main library. To this step the new main library switch is complete? 4, other recovery from the library: make the other from the library data as full as possible (allParallel execution from the library uses the Apply_diff_relay_logs--command=generate_and_send command to determine the relay log differences between the relay log position and the top of the synchronization location from the library, And the difference file from the synchronization location of the most forward from the library sent to the corresponding each from the library parallel using two differential logs for recovery: MHA Manager Server Binlog differential copy to the individual from the library, and then each from the library through the Master_log_file=relay_ The Master_log_file,read_master_log_pos=exec_master_log_pos first confirms that it has performed the replication and then applies two diff log recovery data. Finally, perform reset slave and re-chang master to the new Main library?? 5, clean up the information of the new master, here to switch the main library to the new main library complete resetting slave info on the new master.
?
11. After the failure of the host new to join the resource group
The first of these methods
1、vim /data/mha/app1.cnf 添加server1主机信息[server1]candidate_master=1client_bindir=/usr/local/mysql-5.7.18/bin/client_libdir=/usr/local/mysql-5.7.18/lib/hostname=192.168.142.48port=57002、将server1指向当前的master充当从的角色CHANGE MASTER TO MASTER_HOST=‘192.168.142.49‘,MASTER_USER=‘repl‘,MASTER_PASSWORD=‘123456‘,MASTER_PORT=5700,MASTER_LOG_FILE=‘mysql-bin.000001‘,MASTER_LOG_POS=2488;3、启动mha managernohup masterha_manager --conf=/data/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &
?
The second method of
1、通过masterha_conf_host将server1主机信息添加进资源组masterha_conf_host --command=add --conf=/data/mha/app1.cnf --block=server1 --hostname=192.168.142.48 --params="candidate_master=1;client_bindir=/usr/local/mysql-5.7.18/bin/;client_libdir=/usr/local/mysql-5.7.18/lib/;port=5700"参数解释:command:添加或者删除一个主机信息到配置文件conf:配置文件的路径hostname:主机信息ipblock:新添加的块名params:额外参数列表,(key1=value1;key2=value2;...)2、将server1指向当前的master充当从的角色CHANGE MASTER TO MASTER_HOST=‘192.168.142.49‘,MASTER_USER=‘repl‘,MASTER_PASSWORD=‘123456‘,MASTER_PORT=5700,MASTER_LOG_FILE=‘mysql-bin.000001‘,MASTER_LOG_POS=2488;3、启动mha managernohup masterha_manager --conf=/data/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/mha/app1/manager.log 2>&1 &
?
MHA Cleanup Relay log (purge_relay_logs)
1. Description:
MySQL数据库主从复制在缺省情况下从库的relay logs会在SQL线程执行完毕后被自动删除但是对于MHA场景下,对于某些滞后从库的恢复依赖于其他从库的relay log,因此采取禁用自动删除功能以及定期清理的办法对于清理过多过大的relay log需要注意引起的复制延迟资源开销等。MHA可通过purge_relay_logs脚本及配合cronjob来完成此项任务
2, the function of Purge_relay_logs
a、为relay日志创建硬链接(最小化批量删除大文件导致的性能问题)b、SET GLOBAL relay_log_purge=1; FLUSH LOGS; SET GLOBAL relay_log_purge=0;c、删除relay log(rm –f /path/to/archive_dir/*)
3, the use of purge_relay_logs and related parameters
###用法# purge_relay_logs --helpUsage: purge_relay_logs --user=root --password=rootpass --host=127.0.0.1###参数描述--user:mysql用户名,缺省为root--password:mysql密码--port:端口号--host:主机名,缺省为127.0.0.1--workdir:指定创建relay log的硬链接的位置,默认是/var/tmp,成功执行脚本后,硬链接的中继日志文件被删除由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,建议指定为relay log相同的分区--disable_relay_log_purge:默认情况下,参数relay_log_purge=1,脚本不做任何处理,自动退出,设定该参数,脚本会将relay_log_purge设置为0,当清理relay log之后,最后将参数设置为OFF(0)
4, custom cleaning relay log cronjob (each host is set)
purge_relay_logs脚本在不阻塞SQL线程的情况下自动清理relay log。对于不断产生的relay log直接将该脚本部署到crontab以实现按天或按小时定期清理1、编辑脚本,vim /data/mha/mha/purge_relay_logs.sh#!/bin/bashuser=rootpasswd=123456port=5700host=127.0.0.1work_dir=‘/data/mha/mha/data‘purge=‘/bin/purge_relay_logs‘log_dir=‘/data/mha/mha/log‘if [ ! -d $log_dir ]then mkdir $log_dir -pfiif [ ! -d $work_dir ]then mkdir $work_dir -pfi$purge --user=$user --password=$passwd --port=$port --host=$host --workdir=$work_dir --disable_relay_log_purge >> $log_dir/purge_relay_logs.log 2>&12、最后添加到计划任务:##每2天凌晨1点清空relay log00 01 * * * /bin/bash /data/mha/mha/purge_relay_logs.sh
Mha+mysql for MySQL high availability