Corosync+Pacemaker two-node split-brain problem

Source: Internet
Author: User

0. Introduction

In an HA stack, Corosync serves as the membership layer: it handles cluster membership and inter-node communication (unicast, broadcast, or multicast), while Pacemaker serves as the CRM (cluster resource manager) layer. While running Corosync + Pacemaker in practice, I ran into the split-brain problem. What is split-brain? In an HA cluster, the nodes communicate with each other over a heartbeat link. Once the heartbeat network fails, the members lose contact with each other, each node considers itself the DC of the cluster, and the resources end up started on both the primary and the standby node at the same time. Is split-brain caused by Corosync or by Pacemaker? At first I thought it was Corosync, because the heartbeat failure is what stops Corosync from communicating properly. Later I found that the Pacemaker official website describes a solution for split-brain. As the CRM, Pacemaker is mainly responsible for managing resources; it also plays a role in electing the leader.

1. The solution

One workaround is given in [1] (http://drbd.linbit.com/users-guide-emb/s-configure-split-brain-behavior.html). The other approach, discussed in this article, is to configure a preemption resource for Pacemaker. The principle is that Pacemaker lets you define the order in which resources are started. If an exclusive resource is placed first and every subsequent resource depends on it to start, then everything hinges on that exclusive resource: only the node that holds it can start the rest. When the heartbeat network fails, whichever node grabs the preemption resource first takes over the service resources and provides the service. This solution has to solve two problems: defining the preemption resource, and writing a custom Pacemaker RA (resource agent) that grabs it.
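The ordering principle can be illustrated with a small simulation. This is only a sketch under assumptions, not Pacemaker code: the `Arbiter` and `Node` classes and the resource names are hypothetical, and a real cluster would express the same dependency through Pacemaker ordering constraints and a custom RA.

```python
import threading


class Arbiter:
    """Hypothetical stand-in for the exclusive (preemption) resource."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.owner = None

    def try_acquire(self, node_name):
        # Non-blocking: only the first caller gets the resource.
        if self._mutex.acquire(False):
            self.owner = node_name
            return True
        return False


class Node:
    """Hypothetical cluster node that starts resources in a fixed order."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = resources
        self.started = []

    def on_heartbeat_lost(self, arbiter):
        # The preemption resource comes first; everything else depends on it.
        if not arbiter.try_acquire(self.name):
            return False
        for resource in self.resources:
            self.started.append(resource)
        return True


arbiter = Arbiter()
node_a = Node("node-a", ["vip", "filesystem", "database"])
node_b = Node("node-b", ["vip", "filesystem", "database"])

# Both nodes lose the heartbeat and each one tries to take over.
a_won = node_a.on_heartbeat_lost(arbiter)
b_won = node_b.on_heartbeat_lost(arbiter)
print(arbiter.owner, node_a.started, node_b.started)
```

Because the dependent resources are started only after the exclusive resource has been obtained, the node that loses the race starts nothing, which is the behaviour the approach relies on.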

2. Defining the preemption resource

This article uses a mutex to implement the exclusive resource. Specifically, a simple web service written in Python provides lock, unlock, and updatelock operations.

__author__ = 'ZHANGTIANJIONG629'

try:
    import BaseHTTPServer                   # Python 2
except ImportError:
    import http.server as BaseHTTPServer    # Python 3 equivalent

import threading
import time

lock_timeout_seconds = 8
lock = threading.Lock()
lock_client_ip = ""
lock_time = 0


class LockService(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_GET(self):
        '''define url route'''
        pass

    def lock(self, client_ip):
        global lock_client_ip
        global lock_time
        # if lock is free (non-blocking, so a held lock falls through)
        if lock.acquire(False):
            lock_client_ip = client_ip
            lock_time = time.time()
            self.send_response(200, 'ok')
            self.end_headers()
            return
        # if current client holds lock, update lock time
        elif lock_client_ip == client_ip:
            lock_time = time.time()
            self.send_response(200, 'ok,update')
            self.end_headers()
            return
        else:
            # lock timeout, grab lock
            if time.time() - lock_time > lock_timeout_seconds:
                lock_client_ip = client_ip
                lock_time = time.time()
                self.send_response(200, 'ok,grab lock')
                self.end_headers()
                return
            else:
                self.send_response(403, 'lock is held by other')
                self.end_headers()

    def update_lock(self, client_ip):
        global lock_client_ip
        global lock_time
        # only the current holder may refresh the lock time
        if lock_client_ip == client_ip:
            lock_time = time.time()
            self.send_response(200, 'ok,update')
            self.end_headers()
            return
        else:
            self.send_response(403, 'lock is held by other')
            self.end_headers()
            return

    def unlock(self, client_ip):
        global lock_client_ip
        global lock_time
        # lock is already free: nothing to undo
        if lock.acquire(False):
            lock.release()
            self.send_response(200, 'ok,unlock')
            self.end_headers()
            return
        # current holder releases the lock
        elif lock_client_ip == client_ip:
            lock.release()
            lock_time = 0
            lock_client_ip = ''
            self.send_response(200, 'ok,unlock')
            self.end_headers()
            return
        else:
            self.send_response(403, 'lock is held by other')
            self.end_headers()
            return


if __name__ == '__main__':
    # the port must be an integer no larger than 65535
    http_server = BaseHTTPServer.HTTPServer(('127.0.0.1', 8888), LockService)
    http_server.serve_forever()
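The listing leaves the URL routing in the GET handler undefined. One possible way to wire it up is sketched below for Python 3. The paths `/lock`, `/updatelock`, and `/unlock`, the `handle` and `serve` helpers, and the default host and port are all assumptions made for illustration; the article does not specify them. The lock state is kept as a holder plus a timestamp, with the same 8-second timeout as above.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

LOCK_TIMEOUT_SECONDS = 8
_state_guard = threading.Lock()          # protects the _state dictionary
_state = {"holder": "", "renewed_at": 0.0}


def handle(path, client_ip, now=None):
    """Map a request path to (status, message). The paths are hypothetical."""
    now = time.time() if now is None else now
    with _state_guard:
        holder = _state["holder"]
        expired = now - _state["renewed_at"] > LOCK_TIMEOUT_SECONDS
        if path == "/lock":
            # Free, already ours, or abandoned by a holder that stopped updating.
            if holder in ("", client_ip) or expired:
                _state.update(holder=client_ip, renewed_at=now)
                return 200, "OK"
            return 403, "lock is held by other"
        if path == "/updatelock":
            # Only the current holder may refresh the timestamp.
            if holder == client_ip:
                _state["renewed_at"] = now
                return 200, "OK, update"
            return 403, "lock is held by other"
        if path == "/unlock":
            # Releasing a free lock is treated as success.
            if holder in ("", client_ip):
                _state.update(holder="", renewed_at=0.0)
                return 200, "OK, unlock"
            return 403, "lock is held by other"
        return 404, "unknown path"


class LockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The caller is identified by the source address of the connection.
        status, message = handle(self.path, self.client_address[0])
        body = message.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8888):
    """Run the service until interrupted (host and port are assumptions)."""
    HTTPServer((host, port), LockHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping the decision logic in a plain function makes it possible to exercise the lock rules without opening a socket. In a real deployment the service would presumably have to listen on an address that both cluster nodes can reach, not only on the loopback interface.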

The next article describes a custom RA script.
