Failover handling after the db0201 master database went down in the online 48-node group
A phone call came in reporting that db0201 was down, with the following errors:
(1) The application pages return HTTP 500, 503, and 504 errors.
(2) An email alert arrives: db0201 is down now!
1. Initial diagnosis
Pinging 20.222.21.173 reports that the host is unreachable. Call the system administrator and the hardware engineers so that they can log on to the physical host and check the fault.
Meanwhile, check the situation on the mmm control server:
[nova@db0203 ~]$ sudo -u mmmd mmm_control show
# Warning: agent on host db1 is not reachable
db1(20.222.21.173) master/HARD_OFFLINE. Roles: reader(20.222.22.57), writer(20.222.22.56)
db2(20.222.22.145) master/ONLINE. Roles: reader(20.222.22.58)
db1 is in state master/HARD_OFFLINE, which is probably caused by a hardware fault.
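The status check above can be turned into a small filter. This is only a sketch: the function name mmm_not_online is made up for illustration, and it assumes the output layout of mmm_control show is the same as in the transcript above (host name first, then mode/STATE).

```shell
# Hypothetical helper: reads `mmm_control show` output on stdin and prints
# "host STATE" for every host whose state is not ONLINE.
mmm_not_online() {
  awk '
    # Only host lines contain a mode/STATE token such as master/HARD_OFFLINE.
    match($0, /(master|slave)\/[A-Z_]+/) {
      state = substr($0, RSTART, RLENGTH)
      sub(/^[a-z]+\//, "", state)      # drop the "master/" or "slave/" prefix
      host = $1
      sub(/\(.*/, "", host)            # drop "(ip)" if it is glued to the name
      if (state != "ONLINE") print host, state
    }'
}

# Assumed usage on the control server:
#   sudo -u mmmd mmm_control show | mmm_not_online
```

An empty result means every host that mmm knows about is ONLINE; warning lines starting with # are ignored because they carry no mode/STATE token.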
2. Urgent failover to restore the application
Because the application pages are reporting errors and db0201 is down, the failover has to be performed immediately so that traffic moves to db0202 as soon as possible. The manual switch looks like this:
[nova@db0203 ~]$ sudo -u mmmd /usr/sbin/mmm_control move_role writer db2
OK: Role 'writer' has been moved from 'db1' to 'db2'. Now you can wait some time and check new roles info!
[nova@db0203 ~]$ sudo -u mmmd mmm_control show
# Warning: agent on host db1 is not reachable
db1(20.222.21.173) master/HARD_OFFLINE. Roles: reader(20.222.22.57)
db2(20.222.22.145) master/ONLINE. Roles: reader(20.222.22.58), writer(20.222.22.56)
The switch succeeded: the writer role now points to db0202, and the pages no longer report errors. Logging on to db0202 and running show full processlist; shows more than 500 client connections, which confirms that the application has switched to db0202.
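The "did the writer really move" check can also be scripted instead of read by eye. A minimal sketch, assuming the same mmm_control show layout as above; mmm_writer_is is a hypothetical name, not part of mmm.

```shell
# Hypothetical helper: succeeds (exit 0) when the writer role is held by the
# host named in $1. Reads `mmm_control show` output on stdin.
mmm_writer_is() {
  mmm_expected=$1
  mmm_actual=$(awk '
    # The host line whose role list contains writer(...) owns the writer role.
    /Roles:/ && /writer[ ]*\(/ {
      host = $1
      sub(/\(.*/, "", host)            # strip "(ip)" glued to the host name
      print host
    }')
  [ "$mmm_actual" = "$mmm_expected" ]
}

# Assumed usage after move_role:
#   sudo -u mmmd mmm_control show | mmm_writer_is db2 && echo "writer is on db2"
```

This only confirms what mmm believes about the role. Checking the client connections on the new writer, as done above with show full processlist, is still the evidence that the application followed.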
3. Doubts about performing the failover
What should be done before a failover? Is it necessary to wait, or can the failover simply be executed? This was a live production operation and I had no precedent to refer to, so I executed the failover directly.
Execution time: 18:45
Command executed: sudo -u mmmd /usr/sbin/mmm_control move_role writer db2
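As for the question of what to check first: one commonly used precaution, which is not what was done here, is to confirm that the standby has applied everything it already received from the failed master before the writer role is moved. The sketch below assumes the standard field names of SHOW SLAVE STATUS\G; the function name relay_log_applied is hypothetical.

```shell
# Hypothetical pre-failover check: reads `SHOW SLAVE STATUS\G` output on stdin
# and succeeds only when the SQL thread has executed up to the last position
# the I/O thread read from the old master.
relay_log_applied() {
  awk '
    $1 == "Master_Log_File:"       { read_file = $2 }
    $1 == "Read_Master_Log_Pos:"   { read_pos  = $2 }
    $1 == "Relay_Master_Log_File:" { exec_file = $2 }
    $1 == "Exec_Master_Log_Pos:"   { exec_pos  = $2 }
    END {
      ok = (read_file != "" && read_file == exec_file && read_pos == exec_pos)
      exit (ok ? 0 : 1)
    }'
}

# Assumed usage on the standby (connection options omitted):
#   mysql -e 'SHOW SLAVE STATUS\G' | relay_log_applied && echo "relay log fully applied"
```

It cannot tell whether the dead master held transactions that never reached the standby; it only shows that nothing already received is still waiting to be applied.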
About an hour later, the SA and the hardware engineer finished checking the physical host: it had run out of memory, and by default the virtual machine using the most memory, which was the mysql one, was killed. They adjusted parameter settings and added protective measures (I do not fully understand the details).
4. Set db1 online
After the db0201 server has started, replication has to be enabled manually: run start slave; and replication resumes synchronizing data normally. Then check the mmm status again.
[nova@db0203 ~]$ sudo -u mmmd mmm_control show
db1(20.222.21.173) master/AWAITING_RECOVERY. Roles: reader(20.222.22.57)
db2(20.222.22.145) master/ONLINE. Roles: reader(20.222.22.58), writer(20.222.22.56)
Do not panic at AWAITING_RECOVERY. Because the outage was a hardware-level fault, mmm can monitor db1 again but does not set it online by itself. We have to decide whether db1 is healthy, and if it is, set it online ourselves; this is where mmm is deliberately cautious. After checking db1 and finding that its replication is normal, db1 can be set online.
Command executed: sudo -u mmmd mmm_control set_online db1
db1(20.222.21.173) master/ONLINE. Roles: reader(20.222.22.57)
OK, db1 is online.
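"Replication of db1 is normal" can be made concrete before set_online is run. A minimal sketch, assuming the standard SHOW SLAVE STATUS\G field names; the function name replication_healthy and the lag threshold argument are illustrative choices, not something mmm provides.

```shell
# Hypothetical health check: reads `SHOW SLAVE STATUS\G` output on stdin.
# $1 is the maximum tolerated Seconds_Behind_Master (defaults to 0).
replication_healthy() {
  awk -v max_lag="${1:-0}" '
    $1 == "Slave_IO_Running:"      { io  = $2 }
    $1 == "Slave_SQL_Running:"     { sql = $2 }
    $1 == "Seconds_Behind_Master:" { lag = $2 }
    END {
      # Both threads must run, and the lag must be a number (not NULL)
      # that does not exceed the threshold.
      ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= max_lag + 0)
      exit (ok ? 0 : 1)
    }'
}

# Assumed usage, run against db1 (connection options omitted):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_healthy 0 \
#     && sudo -u mmmd mmm_control set_online db1
```

A threshold of 0 is strict; right after a long outage db1 may need some time to catch up before the check passes.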
5. Change writer from db2 to db1
Let db1 and db2 run as dual masters for a while and confirm that they are healthy. After about 20 minutes the writer can be switched back; after all, db1 is on SSD and db2 is on ordinary disks.
[nova@db0203 ~]$ date
Thu Sep 5 12:11:02 GMT 2013
[nova@db0203 ~]$ sudo -u mmmd /usr/sbin/mmm_control move_role writer db1
OK: Role 'writer' has been moved from 'db2' to 'db1'. Now you can wait some time and check new roles info!
[nova@db0203 ~]$ sudo -u mmmd mmm_control show
db1(20.222.21.173) master/ONLINE. Roles: reader(20.222.22.57), writer(20.222.22.56)
db2(20.222.22.145) master/ONLINE. Roles: reader(20.222.22.58)
db1 now holds the writer role again.