How to handle a keepalived shortcoming
Keepalived high-availability monitoring script for MySQL (or other services)
Script requirements: keepalived decides whether a backup should take over the master role based on whether the virtual IP (VIP) on the master is still alive. When keepalived provides high availability for MySQL, one case needs attention: the machine's NIC stays up, but the MySQL service stops because it is unstable or because of human error. keepalived then does not switch, because only the service has stopped while the VIP still exists on the master node. If the switch to the backup does not happen in time, the consequences are easy to imagine. Here is a script for monitoring the service on the keepalived master.
Implemented function: when the service on the master (the machine that currently holds the VIP) goes down, the script stops keepalived on that machine so that the VIP moves to the backup and the service stays available. Because the script only checks a listening port number, it works for any service made highly available with keepalived, not just MySQL. Try to write general-purpose scripts like this whenever you can; it is good practice for your thinking. If anything here is wrong, corrections are welcome.
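When the listening port is what gets checked, it helps to match the port exactly, because a plain substring match on 3306 would also match 33060. Below is a minimal sketch of a stricter check, assuming `netstat -lnt` or `ss -lnt` style output. The function name `count_listeners` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): count LISTEN sockets
# whose local address ends in exactly ":PORT".
# It reads netstat/ss-style lines on stdin, so the parsing can be tried
# without a live service.
count_listeners() {
    local port="$1"
    awk -v p="$port" '
        /LISTEN/ {
            # Look for an address field that ends in ":PORT".
            for (i = 1; i <= NF; i++) {
                if ($i ~ (":" p "$")) { n++; break }
            }
        }
        END { print n + 0 }'
}

# Usage sketch:
#   netstat -lnt | count_listeners 3306
```

A result of 0 means nothing is listening on that port, which is the condition the monitor treats as "service down".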
#!/bin/bash
# author feifei
# date 20161212
# email eeeee@qq.com
# version 1.0
# function guard mysql service
. /etc/init.d/functions

# usage
if [ $# -ne 1 ]; then
    echo "usage: sh $0 {tcp_port}"
    exit 1
fi

# define var
# netstat is the basis for deciding whether the service is alive. You can match
# on the service name or the port number; the port number is recommended
# because it is unique.
check=$(netstat -lnutp | grep "$1" | wc -l)

# Check whether an identical monitor is already running. This is a daemon-style
# script, so a duplicate instance would only waste resources.
count=$(ps -ef | grep "$0 $1" | grep -v "grep" | wc -l)

if [ "$count" -gt 2 ]; then
    echo -e "\nERROR: an identical monitor is already running!"
    exit 1
fi

# dmail: record that the service is down and send a notification mail
function dmail() {
    echo "$1-$(hostname): down" > /var/log/$1.log
    mail -s "$1-$(hostname): down" 00000000@qq.com < /var/log/$1.log
}

# umail: record that the switch succeeded and send a notification mail
function umail() {
    echo "$1-switch-success" > /var/log/$1.log
    mail -s "$1-switch-success" 00000000@qq.com < /var/log/$1.log
}

# nmap can also check whether a port on a host is open; it serves the same
# purpose here as netstat and ss:
# check=$(nmap -p $1 192.168.1.21 | grep open | wc -l)
if [ "$check" -eq 0 ]; then
    echo "$1 is not listening! Please input again!"
    exit 1
else
    while true
    do
        # The port is no longer listening: treat the service as down.
        if [ "$(netstat -lnutp | grep "$1" | wc -l)" -eq 0 ]; then
            echo 0
            dmail "$1"
            # Stop keepalived so that the VIP moves to the other node.
            /etc/init.d/keepalived stop
            sleep 5
            ping -c 2 -W 2 192.168.1.22 &>/dev/null
            if [ $? -eq 0 ]; then
                echo 1
                umail "$1"
                break
            else
                echo "$1-switch-failed" > /var/log/$1.log
                mail -s "$1-switch-failed" 00000000@qq.com < /var/log/$1.log
                break
            fi
        fi
    done
fi
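The duplicate-monitor check above counts matching lines in `ps` output, which depends on how the script was invoked. An alternative, shown here only as a sketch and not as the author's method, is a lock directory created with `mkdir`, which is atomic. The function names and the lock path are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).

# acquire_lock LOCKDIR: succeeds for exactly one caller, because mkdir
# fails if the directory already exists.
acquire_lock() {
    local lockdir="$1"
    if mkdir "$lockdir" 2>/dev/null; then
        echo "$$" > "$lockdir/pid"   # remember who holds the lock
        return 0
    fi
    return 1
}

# release_lock LOCKDIR: remove the lock so that a later run can start.
release_lock() {
    rm -rf "$1"
}

# Usage sketch inside a monitor script (the path is an assumption):
#   lock="/tmp/port-guard-$1.lock"
#   acquire_lock "$lock" || { echo "already monitoring port $1"; exit 1; }
#   trap 'release_lock "$lock"' EXIT
```

One trade-off: if the monitor is killed with SIGKILL, the trap does not run and the stale lock directory has to be removed by hand.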
Brief notes on the script:
1. Run condition: the monitored port must already be listening when the script starts. If it is not, the script prints a message and exits, so start the service first.
2. How it works: the monitor is a while loop with an if test inside it that rechecks the port on every pass.
3. Failure handling: if you want to go further, add recovery steps for when the service goes down. For example, try restarting the service; if the restart succeeds, keep monitoring, and if it fails, stop keepalived.
4. Event recording: the script writes the success or failure of each step to a log file and, if needed, sends an email notification. Good operations work means keeping the necessary fault records, which makes later review easier.
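The restart-before-failover idea in point 3, together with the event recording in point 4, could be sketched roughly as follows. This is an illustration under stated assumptions rather than part of the original script: the function names, the `RESTART_WAIT` variable, and the commands passed in are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original script).

# log_event LOGFILE MESSAGE...: append a timestamped line to the log file.
log_event() {
    local logfile="$1"; shift
    echo "$(date '+%F %T') $(hostname): $*" >> "$logfile"
}

# handle_failure CHECK_CMD RESTART_CMD FAILOVER_CMD LOGFILE
# Returns 0 if the restart brought the service back (keep monitoring),
# 1 if the restart failed and the failover command was run.
handle_failure() {
    local check_cmd="$1" restart_cmd="$2" failover_cmd="$3" logfile="$4"
    log_event "$logfile" "service down, trying restart"
    $restart_cmd >/dev/null 2>&1
    sleep "${RESTART_WAIT:-3}"       # give the service time to come up
    if $check_cmd >/dev/null 2>&1; then
        log_event "$logfile" "restart succeeded, monitoring continues"
        return 0
    fi
    log_event "$logfile" "restart failed, stopping keepalived"
    $failover_cmd >/dev/null 2>&1
    return 1
}

# Usage sketch (the mysqld init script path is an assumption):
#   handle_failure "nc -z 127.0.0.1 3306" "/etc/init.d/mysqld restart" \
#       "/etc/init.d/keepalived stop" /var/log/3306.log || break
```

Passing the commands in as arguments keeps the function independent of any particular service, in line with the general-purpose goal stated earlier.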
P.S. This is not a particularly difficult script; the point is the thought process behind it. If anything is wrong, corrections are welcome.