Distributed master election: using MySQL ACID and a lease protocol to achieve master election and high availability


In actual production and development, when multiple nodes coexist, you need to elect a master node and implement automatic failover for high availability. I have thought through one way to implement this and share it here.

 

  1. Lease protocol and MySQL ACID
  2. High-availability master election design
  3. Applicable scenarios
  4. Java implementation description
  5. Further optimization

Many scenarios in real systems resemble a master-slave architecture: the master server (master) provides services to the outside, while the hot-standby slave server (slave) does not serve requests but is always alive. If the master node goes down or has network problems, a slave can take over the master's external services and be promoted from slave to master (the new master). This is the typical case of multiple coexisting nodes in which only one master may exist at any moment, and the state of all nodes must be maintained consistently.

The first thing that comes to mind is the famous Paxos algorithm (http://baike.baidu.com/view/8438269.htm). Put simply, Paxos decides on a value through voting among the nodes: when more than 1/2 of the nodes approve, Paxos produces a resolution with a unique result, and every node is notified and maintains that information. For master election, a proposal that some node should become master is put to a vote, each node gives its feedback, and finally the Paxos cluster maintains a single, unique master. ZooKeeper is an implementation of Paxos, and in this scenario ZooKeeper would be the most natural choice for master election. However, ZooKeeper has an obvious drawback: it cannot work when fewer than 1/2 of the cluster's nodes survive. For example, if a ZooKeeper cluster has 10 nodes, more than 5 of them must remain available.

In a real environment, if the requirements on the master node are not that strict, some trade-offs and simplifications can achieve the goal. For example, the master may be briefly unreachable (on the order of seconds), and there may be some conflicts during master election, but the master can simply be elected again. I designed a simple workaround using MySQL's consistency guarantees and a simple lease.

MySQL's ACID properties guarantee the consistency and integrity of a data record, so concurrent reads and writes from multiple processes do not break its consistency or unique correctness. The lease protocol (the details can be found with a web search) grants the master a lease (a lease period): the master acts in the master role for the duration of the lease and must apply to renew the lease before it expires. If the lease expires and the master cannot renew it, for example because of network problems, the master proactively steps down so that other nodes can run for master. For example, suppose three nodes A, B, and C hold an election and A becomes master with a 10-second lease; if the current time is 00:00:00, A may act as master until 00:00:10. When 00:00:10 is reached, A, B, and C re-elect the master, and any node may win (in practice the scheme is arranged so that A is the most likely to continue as master). If A's network is cut off at that point and it cannot reach B and C, A automatically goes offline and does not compete, so no split-brain occurs.
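To make the timing concrete, here is a minimal sketch of the master-side loop under the assumptions above (a 10-second lease renewed at the half-way point); renewLease is a stub standing in for the MySQL heartbeat update described later, not the actual implementation:

    public class LeaseSketch {
        static final long LEASE_MILLIS = 10_000L; // e.g. node A is granted a 10-second lease

        public static void main(String[] args) throws InterruptedException {
            long leaseExpireAt = System.currentTimeMillis() + LEASE_MILLIS;
            boolean master = true;
            while (master) {
                Thread.sleep(LEASE_MILLIS / 2);    // try to renew at half of the lease period
                try {
                    leaseExpireAt = renewLease();  // stub for the MySQL heartbeat update
                } catch (Exception networkProblem) {
                    if (System.currentTimeMillis() >= leaseExpireAt) {
                        master = false;            // lease ran out and could not be renewed:
                    }                              // step down voluntarily, so no split-brain
                }
            }
        }

        // Stub: in the real scheme this would update the heartbeat record in MySQL
        // and return the new expiry time.
        static long renewLease() {
            return System.currentTimeMillis() + LEASE_MILLIS;
        }
    }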

---------------------------------------------- Gorgeous split line ----------------------------------------------

The design is as follows (a "server" here is one machine in the cluster, or equivalently one process; all servers are peers):

 

  1. Use an NTP server to keep the servers' clocks synchronized (to within seconds).
  2. Each server holds a unique ID (IP + process ID) that uniquely identifies a server instance.
  3. Each server defines a lease period, in seconds.
  4. A single unique record in a MySQL table maintains the global master information; ACID guarantees its consistency.
  5. The master server updates this unique record in MySQL every half lease period, refreshing the heartbeat to maintain its master status.
  6. Every slave server reads the master record from MySQL every half lease period. If the master's lease in the database has expired relative to the current time (heartbeat_time + lease < current_time), the server applies to become master (a sketch of this check follows the list).
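A minimal sketch of the check in step 6, assuming the heartbeat timestamp and lease are the values read from the MySQL record (the method and parameter names here are illustrative):

    // Step 6 sketch: a slave polls every lease/2 and only challenges when the lease has expired.
    // heartbeatSeconds and leaseSeconds stand for the values read from the MySQL record.
    boolean leaseExpired(long heartbeatSeconds, long leaseSeconds) {
        long nowSeconds = System.currentTimeMillis() / 1000L;
        return heartbeatSeconds + leaseSeconds < nowSeconds; // heartbeat_time + lease < current_time
    }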

 

The most difficult problems are:

1. Because of database access latency and the sleep interval (half the lease period), there is inherent delay; MySQL exceptions and network exceptions must also be handled.

2. Several servers may try to preempt the master role at the same time, so a verification mechanism is needed to ensure that a server which fails to grab the master role automatically falls back to being a slave.
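One way to implement this verification, consistent with the CHALLENGE_MASTER / CHALLENGE_COMPLETE states in the Java code later in this article, is to write the challenge and then re-read the record: only the node whose unique ID actually ends up in the row becomes master, and every other challenger falls back to slave. A rough sketch, reusing the mapper and bean names from the code below:

    // Challenge, wait out the old lease, then verify by re-reading the single master record.
    dbService.challengeAliveNode(new KeepAlive().setUniqueID(uniqueID).setLease(lease));
    Thread.sleep(activeNodeLease * 1000L);             // wait roughly one lease period (lease is kept in seconds)
    KeepAlive winner = dbService.accquireAliveNode();  // re-read the record
    if (winner != null && uniqueID.equals(winner.getUniqueID())) {
        online = true;                                 // our ID is in the row: we are the new master
    } else {
        online = false;                                // someone else won: remain a slave
    }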

The figures below illustrate an example (10.0.0.1 is the master):

10.0.0.1 crashes. The master information for 10.0.0.1 maintained in MySQL has expired, so the record can be preempted by other nodes.

Each node then reads the database again to check whether its own preemption succeeded:

Then 10.0.0.3 acts as the master. If 10.0.0.1 restarts, it can rejoin as a slave instance. If 10.0.0.1 cannot maintain its heartbeat because of a network partition or other network exception, it automatically stops serving once its own lease expires, so the "dual master" phenomenon does not occur.

Each server follows the process below:

Database Design:
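The original schema figure is not reproduced here. Judging from the fields used by the Java code below (unique ID, lease, heartbeat timestamp, status), the bean mapped onto the single-record table would look roughly like this; the exact column names and types are assumptions:

    // Assumed shape of the single master record, inferred from the code below;
    // the real table's column names and types may differ.
    public class KeepAlive {
        private String uniqueID;  // IP + process ID of the current master
        private long lease;       // lease period, in seconds
        private long timestamp;   // last heartbeat time, in seconds since the epoch
        private String status;    // e.g. normal heartbeat vs. STAT_CHALLENGE while a challenge is in progress

        public String getUniqueID()             { return uniqueID; }
        public long getLease()                  { return lease; }
        public long getTimestamp()              { return timestamp; }
        public String getStatus()               { return status; }
        public KeepAlive setUniqueID(String id) { this.uniqueID = id; return this; }
        public KeepAlive setLease(long lease)   { this.lease = lease; return this; }
    }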

Master information in the database at a certain point in time:

 

 

Current Time: 45 minutes and 15 seconds

Current master lease: 6 seconds

Current master lease valid until: 45 minutes and 21 seconds

---------------------------------------------- Gorgeous split line ----------------------------------------------

3. Applicable scenarios

1. MySQL is available throughout the system's life cycle, and time is synchronized between the servers.

2. A unique master needs to be elected in the cluster to provide external services, while the other nodes stand by as slaves; when the master's lease expires, the nodes compete for the master role.

3. Unlike ZooKeeper, this scheme still works when more than half of the cluster is down; for example, it works even when only one master node and one slave node remain.

4. The system can tolerate second-level gaps during master election: while a master is being elected, there may be a window of about lease/2 seconds during which the service is unavailable.

 

5. A brief dual-master overlap within the second-level lease granularity is acceptable; the probability of it happening is very small.

---------------------------------------------- Gorgeous split line ----------------------------------------------

4. Java implementation description

 

Configuration information and the time variables related to the lease period and sleep intervals:
        final long interval = lease / intervalDivisor;
        long waitForLeaseChallenging = 0L;
        lease = lease / 1000L;
        long challengeFailTimes = 0L;
        long takeRest = 0L;
        long dbExceptionTimes = 0L;
        long offlineTime = 0L;
        Random rand = new Random();
        Status stateMechine = Status.START;
        long activeNodeLease = 0L;
        long activeNodeTimeStamp = 0L;
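The excerpts also reference several fields that are presumably defined elsewhere in the class and are not shown in the original; something along the following lines is assumed (the values are purely illustrative):

    // Assumed class fields referenced by the excerpts; values are illustrative only.
    private final String uniqueID = "10.0.0.1-12345"; // IP + process ID of this instance
    private volatile boolean online = false;          // whether this node currently acts as master
    private long lease = 6000L;                       // lease period in ms (converted to seconds above)
    private long intervalDivisor = 2L;                // poll every lease/2, as in the design
    private double awaitFactor = 1.5;                 // extra grace while another node is challenging
    private long minChallenge = 2L, maxChallenge = 5L;   // bounds on consecutive challenge attempts
    private long maxChallengeAwaitInterval = 60 * 1000L; // long rest after repeated failed challenges
    private long maxOfflineFrozen = 30 * 1000L;          // how long to stay offline after DB errors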

Database exception handling:

 

 

            KeepAlive keepaliveNode = null;
            try {
                /* first of all get it from mysql */
                keepaliveNode = dbService.accquireAliveNode();
                if (stateMechine != Status.START && keepaliveNode == null)
                    throw new Exception();
                // recount , avoid network shake
                dbExceptionTimes = 0L;
            } catch (Exception e) {
                log.fatal("[Scanner] Database Exception with times : " + dbExceptionTimes++);
                if (stateMechine == Status.OFFLINE) {
                    log.warn("[Scanner] Database Exception , OFFLINE ");
                } else if (dbExceptionTimes >= 3) {
                    log.fatal("[Scanner] Database Exception , Node Offline Mode Active , uniqueid : " + uniqueID);
                    stateMechine = Status.OFFLINE;
                    dbExceptionTimes = 0L;
                    offlineTime = System.currentTimeMillis();
                    online = false;
                } else
                    continue;
            }

Overall cycle and state machine changes:

 

 

        while (true) {
            SqlSession session = dbConnecction.openSession();
            ActionScanMapper dbService = session.getMapper(ActionScanMapper.class);
            KeepAlive keepaliveNode = null;
            try {
                /* first of all get it from mysql */
                keepaliveNode = dbService.accquireAliveNode();
                if (stateMechine != Status.START && keepaliveNode == null)
                    throw new Exception();
                // recount , avoid network shake
                dbExceptionTimes = 0L;
            } catch (Exception e) {
                log.fatal("[Scanner] Database Exception with times : " + dbExceptionTimes++);
                if (stateMechine == Status.OFFLINE) {
                    log.warn("[Scanner] Database Exception , OFFLINE ");
                } else if (dbExceptionTimes >= 3) {
                    log.fatal("[Scanner] Database Exception , Node Offline Mode Active , uniqueid : " + uniqueID);
                    stateMechine = Status.OFFLINE;
                    dbExceptionTimes = 0L;
                    offlineTime = System.currentTimeMillis();
                    online = false;
                } else
                    continue;
            }
            try {
                activeNodeLease = keepaliveNode != null ? keepaliveNode.getLease() : activeNodeLease;
                activeNodeTimeStamp = keepaliveNode != null ? keepaliveNode.getTimestamp() : activeNodeTimeStamp;
                takeRest = interval;
                switch (stateMechine) {
                    case START:
                        if (keepaliveNode == null) {
                            log.fatal("[START] Accquire node is null , ignore ");
                            // if no node is registered here , we challenge it
                            stateMechine = Status.CHALLENGE_REGISTER;
                            takeRest = 0;
                        } else {
                            // check the lease , whether myself or others
                            if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
                                log.warn("[START] Lease Timeout scanner for uniqueid : " + uniqueID + ", timeout : "
                                        + timestampGap(activeNodeTimeStamp));
                                if (keepaliveNode.getStatus().equals(STAT_CHALLENGE))
                                    stateMechine = Status.HEARTBEAT;
                                else {
                                    stateMechine = Status.CHALLENGE_MASTER;
                                    takeRest = 0;
                                }
                            } else if (keepaliveNode.getUniqueID().equals(uniqueID)) {
                                // I am restarting
                                log.info("[START] Restart Scanner for uniqueid : " + uniqueID
                                        + ", timeout : " + timestampGap(activeNodeTimeStamp));
                                stateMechine = Status.HEARTBEAT;
                            } else {
                                log.info("[START] Already Exist Keepalive Node with uniqueid : " + uniqueID);
                                stateMechine = Status.HEARTBEAT;
                            }
                        }
                        break;
                    case HEARTBEAT:
                        /* uniqueID == keepaliveNode.uniqueID */
                        if (keepaliveNode.getUniqueID().equals(uniqueID)) {
                            if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
                                // we should challenge now , no need to check Status[CHALLENGE]
                                log.warn("[HEARTBEAT] HEART BEAT Lease is timeout for uniqueid : " + uniqueID
                                        + ", time : " + timestampGap(activeNodeTimeStamp));
                                stateMechine = Status.CHALLENGE_MASTER;
                                takeRest = 0;
                                break;
                            } else {
                                // lease ok , just update mysql keepalive status
                                dbService.updateAliveNode(keepaliveNode.setLease(lease));
                                online = true;
                                log.info("[HEARTBEAT] update equaled keepalive node , uniqueid : " + uniqueID
                                        + ", lease : " + lease + "s, remain_usable : " +
                                        ((activeNodeTimeStamp * 1000L + lease * 1000L) - System.currentTimeMillis()) + " ms");
                            }
                        } else {
                            /* It's others , let's check the lease */
                            if (activeNodeLease < timestampGap(activeNodeTimeStamp)) {
                                if (keepaliveNode.getStatus().equals(STAT_CHALLENGE)) {
                                    waitForLeaseChallenging = (long) (activeNodeLease * awaitFactor);
                                    if ((waitForLeaseChallenging) < timestampGap(activeNodeTimeStamp)) {
                                        log.info("[HEARTBEAT] Lease Expired , Diff[" + timestampGap(activeNodeTimeStamp) + "] , Lease[" + activeNodeLease + "]");
                                        stateMechine = Status.CHALLENGE_MASTER;
                                        takeRest = 0;
                                    } else {
                                        log.info("[HEARTBEAT] Other Node Challenging , We wait for a moment ...");
                                    }
                                } else {
                                    log.info("[HEARTBEAT] Lease Expired , Diff[" + timestampGap(activeNodeTimeStamp) + "] , lease[" + activeNodeLease + "]");
                                    stateMechine = Status.CHALLENGE_MASTER;
                                    takeRest = 0;
                                }
                            } else {
                                online = false;
                                log.info("[HEARTBEAT] Exist Active Node On The Way with uniqueid : "
                                        + keepaliveNode.getUniqueID() + ", lease : " + keepaliveNode.getLease());
                            }
                        }
                        break;
                    case CHALLENGE_MASTER:
                        dbService.challengeAliveNode(new KeepAlive().setUniqueID(uniqueID).setLease(lease));
                        online = false;
                        // wait for the expired node to go offline automatically
                        // and others also have a chance to challenge
                        takeRest = activeNodeLease;
                        stateMechine = Status.CHALLENGE_COMPLETE;
                        log.info("[CHALLENGE_MASTER] Other Node is timeout["
                                + timestampGap(activeNodeTimeStamp) + "s] , I challenge with uniqueid : " + uniqueID
                                + ", lease : " + lease + ", wait : " + lease);
                        break;
                    case CHALLENGE_REGISTER:
                        dbService.registerNewNode(new KeepAlive().setUniqueID(uniqueID).setLease(lease));
                        online = false;
                        // wait for the expired node to go offline automatically
                        // and others also have a chance to challenge
                        takeRest = activeNodeLease;
                        stateMechine = Status.CHALLENGE_COMPLETE;
                        log.info("[CHALLENGE_REGISTER] Regiter Keepalive uniqueid : " + uniqueID + ", lease : " + lease);
                        break;
                    case CHALLENGE_COMPLETE:
                        if (keepaliveNode.getUniqueID().equals(uniqueID)) {
                            dbService.updateAliveNode(keepaliveNode.setLease(lease));
                            online = true;
                            log.info("[CHALLENGE_COMPLETE] I Will be the Master uniqueid : " + uniqueID);
                            // make the uptime correct
                            stateMechine = Status.HEARTBEAT;
                        } else {
                            online = false;
                            log.warn("[CHALLENGE_COMPLETE] So unlucky , Challenge Failed By Other Node with uniqueid : " + keepaliveNode.getUniqueID());
                            if (challengeFailTimes++ >= (rand.nextLong() % maxChallenge) + minChallenge) {
                                // needn't challenge anymore for a long time
                                takeRest = maxChallengeAwaitInterval;
                                stateMechine = Status.HEARTBEAT;
                                challengeFailTimes = 0L;
                                log.info("[CHALLENGE_COMPLETE] Challenge Try Times Used Up , let's take a long rest !");
                            } else {
                                stateMechine = Status.HEARTBEAT;
                                log.info("[CHALLENGE_COMPLETE] Challenge Times : " + challengeFailTimes + ", Never Give Up , to[" + stateMechine + "]");
                            }
                        }
                        break;
                    case OFFLINE:
                        log.fatal("[Scanner] Offline Mode Node with uniqueid : " + uniqueID);
                        if (System.currentTimeMillis() - offlineTime >= maxOfflineFrozen) {
                            // revive forcibly after the frozen period
                            log.info("[Scanner] I am relive to activie node  , uniqueid : " + uniqueID);
                            stateMechine = Status.HEARTBEAT;
                            offlineTime = 0L;
                        } else if (keepaliveNode != null) {
                            // db is reconnected
                            stateMechine = Status.HEARTBEAT;
                            offlineTime = 0L;
                            log.info("[Scanner] I am relive to activie node  , uniqueid : " + uniqueID);
                        }
                        break;
                    default:
                        System.exit(0);
                }
                session.commit();
                session.close();
                if (takeRest != 0)
                    Thread.sleep(takeRest);
                log.info("[Scanner] State Stage [" + stateMechine + "]");
            } catch (InterruptedException e) {
                log.fatal("[System] Thread InterruptedException : " + e.getMessage());
            } finally {
                log.info("[Scanner] UniqueID : " + uniqueID + ", Mode : " + (online ? "online" : "offline"));
            }
        }
    }

    enum Status {
        START, HEARTBEAT, CHALLENGE_MASTER, CHALLENGE_REGISTER, CHALLENGE_COMPLETE, OFFLINE
    }
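The helper timestampGap is not shown in the excerpt. Given that the heartbeat timestamp is stored in seconds and compared directly against the lease (also in seconds), it presumably returns the number of seconds elapsed since the stored heartbeat, roughly:

    // Assumed helper: seconds elapsed since the stored heartbeat timestamp.
    long timestampGap(long heartbeatSeconds) {
        return System.currentTimeMillis() / 1000L - heartbeatSeconds;
    }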

 

 

5. Further Optimization

1. When many nodes compete for the master at the same time, the probability of conflicts can be high. You can add a status field to the database record to indicate that another node is already challenging for the master; if so, a node can pause for a moment and try again later. Once a node successfully grabs the master, this greatly reduces the probability of many-node conflicts.
2. In extreme cases, because the time for competing for the master and the lease period are both fixed, "timeline resonance" can occur: the typical symptom is that nodes keep competing for the master, keep failing, and keep retrying, with all servers doing the same thing at exactly the same moments. Adding randomness to the timing solves this; for example, after several consecutive failed attempts to seize the master, generate a random number and sleep for that long to break the resonance (see the sketch below).
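A rough sketch of that randomized back-off (not part of the original code), reusing the rand, challengeFailTimes, lease and takeRest variables defined earlier; it would take the place of the plain Thread.sleep(takeRest) call in the main loop:

    // After several consecutive failed challenges, add random jitter to the sleep
    // so that the nodes stop retrying in lockstep.
    long jitter = challengeFailTimes >= 3 ? rand.nextInt((int) (lease * 1000L)) : 0L;
    Thread.sleep(takeRest + jitter);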

 

 
