While experimenting with a high-availability MySQL cluster built on Corosync + DRBD, I unexpectedly found that the two nodes could no longer see each other. Both reported a connection state of StandAlone, which means the primary and secondary nodes were not communicating, as shown below:
[root@node1 ~]# cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder17.centos.org, 2013-03-27 16:04:08
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:36 dr:285 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:24

[root@node2 ~]# cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder17.centos.org, 2013-03-27 16:04:08
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8
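The connection state is the cs: field on each device line of the status output above. As a minimal sketch, assuming the DRBD 8.3 /proc/drbd layout shown here, a small helper can pull out just that field. The function name drbd_conn_states and its optional file argument are made up for illustration and are not part of DRBD.

```shell
# Print "<minor> <connection-state>" for every device line in
# /proc/drbd-style status text (DRBD 8.3 layout assumed).
# The optional argument is a hypothetical convenience so the helper
# can also be run against a saved copy of the status output.
drbd_conn_states() {
    # Device lines look like: " 0: cs:StandAlone ro:Primary/Unknown ..."
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p' "${1:-/proc/drbd}"
}
```

Called without an argument on either node in the state shown above, it would print "0 StandAlone".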
The system log on node1 shows that a split brain was detected:
[root@node1 ~]# tail -n 12 /var/log/messages
Sep 21 11:19:53 lab4 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Sep 21 11:19:53 lab4 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Sep 21 11:19:53 lab4 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Sep 21 11:19:53 lab4 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Sep 21 11:19:53 lab4 kernel: block drbd0: error receiving ReportState, l: 4!
Sep 21 11:19:53 lab4 kernel: block drbd0: meta connection shut down by peer.
Sep 21 11:19:53 lab4 kernel: block drbd0: asender terminated
Sep 21 11:19:53 lab4 kernel: block drbd0: Terminating drbd0_asender
Sep 21 11:19:53 lab4 kernel: block drbd0: Connection closed
Sep 21 11:19:53 lab4 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Sep 21 11:19:53 lab4 kernel: block drbd0: receiver terminated
Sep 21 11:19:53 lab4 kernel: block drbd0: Terminating drbd0_receiver
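The line to look for in the log above is "Split-Brain detected". Instead of reading the tail by eye, the log can be searched for it. This is only a sketch: the function name split_brain_events is made up, and /var/log/messages is assumed as the default because that is where the kernel messages ended up on this system.

```shell
# Print every kernel log line that reports a DRBD split brain.
# Exit status is 0 if at least one such line exists, non-zero otherwise.
# The log path defaults to /var/log/messages (an assumption based on the
# system in this article); pass another file as the first argument.
split_brain_events() {
    grep 'Split-Brain detected' "${1:-/var/log/messages}"
}
```

Because the exit status follows grep, it can be used directly in a condition, for example: if split_brain_events >/dev/null; then echo "split brain logged"; fi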
The manual recovery process for split-brain in DRBD is as follows:
Set node1 as the primary node and mount the device to test it:
[root@node1 ~]# drbdadm primary mydrbd
[root@node1 ~]# mount /dev/drbd0 /mydata
[root@node1 ~]# ll /mydata/
total 20
-rw-r--r-- 1 root root  1666 Sep 20 18:18 inittab
drwx------ 2 root root 16384 Sep 20 18:15 lost+found
Set node2 as the secondary node and discard its local changes to the resource:
[root@node2 ~]# drbdadm secondary mydrbd
[root@node2 ~]# drbdadm -- --discard-my-data connect mydrbd
Manually reconnect the resource on node1, the primary node:
[root@node1 ~]# drbdadm connect mydrbd
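The steps above can be collected into two small helpers, one for the node whose changes are thrown away (node2 here) and one for the node whose data is kept (node1 here). This is a sketch rather than a ready-made tool: the function names and the DRY_RUN switch are invented for illustration, and only the drbdadm invocations themselves come from the steps above. Note that the first helper discards data on the node where it is run, so it is worth printing the commands first.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
# DRY_RUN is a hypothetical switch added for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run on the node whose changes are to be DISCARDED (node2 in this article).
recover_victim() {
    res=$1
    run drbdadm secondary "$res" &&
    run drbdadm -- --discard-my-data connect "$res"
}

# Run on the node whose data is kept (node1 in this article).
# It was also StandAlone here, so it needs its own connect.
recover_survivor() {
    res=$1
    run drbdadm connect "$res"
}
```

With DRY_RUN=1 set, recover_victim mydrbd only prints the two drbdadm commands for node2, which makes it easy to double-check the resource name before anything is discarded.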
Finally, check the status on each node. The connection is back to normal:
[root@node1 ~]# cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder17.centos.org, 2013-03-27 16:04:08
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:36 nr:0 dw:24 dr:185 al:0 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@node2 ~]# cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder17.centos.org, 2013-03-27 16:04:08
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:36 dw:36 dr:0 al:0 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
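A recovered resource shows cs:Connected and ds:UpToDate/UpToDate on its device line, as in the output above. The check can be scripted as below. Again this is a sketch assuming the DRBD 8.3 /proc/drbd layout; the function name drbd_is_healthy and its optional file argument are made up for illustration.

```shell
# Succeed (exit 0) only if there is at least one device line and every
# device line reports cs:Connected and ds:UpToDate/UpToDate.
# The optional argument is a hypothetical convenience for testing against
# a saved copy of the status output; the default is /proc/drbd.
drbd_is_healthy() {
    awk '
        /^ *[0-9]+: cs:/ {
            devices++
            if ($0 ~ /cs:Connected / && $0 ~ /ds:UpToDate\/UpToDate /) healthy++
        }
        END { exit !(devices > 0 && devices == healthy) }
    ' "${1:-/proc/drbd}"
}
```

For example: drbd_is_healthy && echo "DRBD resources are connected and in sync"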
Reference from the online documentation (Manual split brain recovery):
http://www.drbd.org/users-guide/s-resolve-split-brain.html
This article is from the "Don't dead birds a Hui" blog. Please keep this source when reposting: http://phenixikki.blog.51cto.com/7572938/1305253