0. Introduction
In an HA solution, Corosync serves as the membership layer: it handles cluster member management and the communication mode (unicast, broadcast, or multicast), while Pacemaker serves as the CRM (cluster resource manager) layer. While using Corosync + Pacemaker in practice, I ran into the split-brain problem. What is split-brain? In an HA cluster, the nodes communicate with each other over a heartbeat link. Once the heartbeat network fails, the members can no longer see each other, each node considers itself the DC (designated coordinator) of the cluster, and the resources are started on both the primary and the standby node at the same time. Is split-brain caused by Corosync or by Pacemaker? At first I thought it was Corosync, because the heartbeat failure prevents Corosync from communicating properly. Later I found that the Pacemaker official website offers a solution for split-brain. As the CRM, Pacemaker is mainly responsible for managing resources, and it also plays a role in electing the leader.
1. Solution
One workaround is given in [1] (http://drbd.linbit.com/users-guide-emb/s-configure-split-brain-behavior.html). The other approach, discussed in this article, is to configure a preemption resource for Pacemaker. The principle is that Pacemaker can define the order in which resources are started. If an exclusive resource is placed first, the start-up of every subsequent resource depends on it, so the whole group succeeds or fails with the exclusive resource. When the heartbeat network fails, whichever node grabs the exclusive resource first takes over the service resources and provides the service. This approach has to solve two problems: defining a preemption resource, and writing a custom Pacemaker RA (resource agent) to grab that resource.
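The ordering principle can be illustrated with a small, self-contained Python sketch. It is only a model of the idea, not Pacemaker configuration: the `ExclusiveResource` class, the `start_resources` function, and the service names are hypothetical and introduced purely for illustration.

```python
class ExclusiveResource:
    """Hypothetical stand-in for the exclusive (preemption) resource."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node):
        # The first node to ask becomes the owner; later nodes are refused.
        if self.owner is None:
            self.owner = node
        return self.owner == node


def start_resources(node, exclusive, services):
    """Start the exclusive resource first; the rest start only if it succeeds."""
    if not exclusive.try_acquire(node):
        return []              # lost the race: start nothing on this node
    return list(services)      # won the race: start every dependent service


# Both nodes lose the heartbeat and each tries to start the services.
shared = ExclusiveResource()
services = ["vip", "filesystem", "database"]  # hypothetical service names
started_a = start_resources("node-a", shared, services)
started_b = start_resources("node-b", shared, services)
print(started_a)  # ['vip', 'filesystem', 'database']
print(started_b)  # []
```

Because every service depends on the exclusive resource, only one node ends up running them, even though both nodes believe they should take over.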
2. Defining the preemption resource
This article uses a mutex to implement the exclusive resource. Specifically, a simple web service written in Python provides the lock, unlock, and updatelock operations.
    __author__ = 'ZHANGTIANJIONG629'

    # The original listing targets Python 2 (BaseHTTPServer); this version
    # uses the Python 3 equivalent, http.server.
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    LOCK_TIMEOUT_SECONDS = 8
    lock = threading.Lock()
    lock_client_ip = ""
    lock_time = 0


    class LockService(BaseHTTPRequestHandler):

        def do_GET(self):
            """Define the URL routes.

            The original leaves this method empty; the paths below are an
            assumed mapping onto the lock, unlock and updatelock services.
            """
            routes = {'/lock': self.lock,
                      '/unlock': self.unlock,
                      '/updatelock': self.update_lock}
            handler = routes.get(self.path)
            if handler is None:
                self.reply(404, 'unknown path')
            else:
                handler(self.client_address[0])

        def reply(self, code, message):
            # Send the status line, finish the headers, close the connection.
            self.send_response(code, message)
            self.end_headers()
            self.close_connection = True

        def lock(self, client_ip):
            global lock_client_ip, lock_time
            # Lock is free: take it (non-blocking acquire).
            if lock.acquire(False):
                lock_client_ip = client_ip
                lock_time = time.time()
                self.reply(200, 'OK')
            # Current client already holds the lock: update the lock time.
            elif lock_client_ip == client_ip:
                lock_time = time.time()
                self.reply(200, 'OK,update')
            # Lock timed out: grab it from the previous holder.
            elif time.time() - lock_time > LOCK_TIMEOUT_SECONDS:
                lock_client_ip = client_ip
                lock_time = time.time()
                self.reply(200, 'OK,grab lock')
            else:
                self.reply(403, 'lock is held by other')

        def update_lock(self, client_ip):
            global lock_time
            if lock_client_ip == client_ip:
                lock_time = time.time()
                self.reply(200, 'OK,update')
            else:
                self.reply(403, 'lock is held by other')

        def unlock(self, client_ip):
            global lock_client_ip, lock_time
            # Lock is already free: nothing to release.
            if lock.acquire(False):
                lock.release()
                self.reply(200, 'OK,unlock')
            # Only the holder may release the lock.
            elif lock_client_ip == client_ip:
                lock.release()
                lock_time = 0
                lock_client_ip = ''
                self.reply(200, 'OK,unlock')
            else:
                self.reply(403, 'lock is held by other')


    if __name__ == '__main__':
        # The original passes the port as the string '88888', which is not a
        # valid TCP port (maximum 65535); 8888 is assumed here.
        http_server = HTTPServer(('127.0.0.1', 8888), LockService)
        http_server.serve_forever()
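To make the timeout behaviour of the lock operation easier to follow, here is a minimal in-memory sketch of the same state machine with the clock passed in explicitly. The `LeaseLock` class and the node names are hypothetical; only the 8-second timeout is taken from the listing above.

```python
LOCK_TIMEOUT_SECONDS = 8  # same timeout as the listing above


class LeaseLock:
    """Hypothetical in-memory model of the lock service's state."""

    def __init__(self, timeout=LOCK_TIMEOUT_SECONDS):
        self.holder = None
        self.lock_time = 0
        self.timeout = timeout

    def lock(self, client, now):
        """Return True if `client` holds the lock after this call."""
        if self.holder is None or self.holder == client:
            granted = True  # free lock, or the holder refreshing it
        else:
            # Another client holds it: grant only if the lock is stale.
            granted = now - self.lock_time > self.timeout
        if granted:
            self.holder = client
            self.lock_time = now
        return granted


lease = LeaseLock()
timeline = [
    ("node-a", 0),    # lock is free, node-a takes it
    ("node-b", 3),    # refused, node-a holds a fresh lock
    ("node-a", 5),    # holder refreshes the lock time
    ("node-b", 10),   # refused, only 5 s since the refresh
    ("node-b", 14),   # granted, 9 s since the refresh exceeds 8 s
]
results = [lease.lock(client, now) for client, now in timeline]
print(results)  # [True, False, True, False, True]
```

The holder therefore has to refresh the lock more often than once per timeout period; if it stops doing so, for example because the node has died, the other node can take the lock over.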
The next article describes a custom RA script.
Corosync+Pacemaker two-node split-brain problem