Automatic failover for MongoDB replica sets (2): election

Source: Internet
Author: User

I. Introduction to replica set high availability
A replica set achieves high availability through automatic failover. When the primary (master node) fails, a secondary (slave node) can be elected as the new primary. In most cases this happens automatically, with no manual intervention. In some cases, automatic failover also requires a data rollback.
How a replica set is deployed (the number of members, and physical factors such as bandwidth and the geographic location of the members) can affect how efficiently automatic failover works. To improve it, keep the majority of the members in one core data center and keep several secondaries in the replica set. Then, when the primary fails, a secondary is available to take over, and a network fault is less likely to cut most members off from one another.
Automatic failover involves two main processes: election and rollback.
Elections play a very important role in replica set operation. An election takes time, and while it runs the replica set has no primary and cannot accept write requests from clients, so MongoDB tries its best to avoid elections. When is an election held? When the primary becomes unavailable, and when the replica set is initialized.
II. Factors affecting elections
1. Heartbeats: Replica set members send each other a heartbeat request every two seconds. If a member receives no heartbeat reply from another member within 10 seconds, it treats that member as unreachable.
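The heartbeat timing described above can be sketched as a small check. This is only an illustration, not MongoDB's implementation: the constant names and both helper functions are hypothetical, and only the 2-second interval and the 10-second timeout come from the text.

```python
HEARTBEAT_INTERVAL_S = 2   # from the text: a heartbeat request every 2 seconds
HEARTBEAT_TIMEOUT_S = 10   # from the text: no reply within 10 seconds means unreachable


def is_reachable(last_reply_at: float, now: float) -> bool:
    """Return True if the last heartbeat reply arrived within the timeout (hypothetical helper)."""
    return (now - last_reply_at) <= HEARTBEAT_TIMEOUT_S


def missed_heartbeats(last_reply_at: float, now: float) -> int:
    """Number of whole heartbeat intervals that have passed since the last reply."""
    return int((now - last_reply_at) // HEARTBEAT_INTERVAL_S)
```

With these assumptions, a member that last replied 11 seconds ago has missed five heartbeat intervals and is treated as unreachable.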

2. Priority: Member priority affects the election. Members prefer to vote for the member with the highest priority. A member with priority 0 can never become primary: it does not stand for election and receives no votes. As long as the current primary has the highest priority and its operation log (oplog) is up to date, no election is held. If a member with a higher priority joins, and the latest operation time in its oplog is within 10 seconds of the primary's, an election is held to give the higher-priority member a chance to become primary.
3. Optime: The optime is the time of the last operation a member applied from the oplog. A member whose optime is the most recent among the members can be elected primary.
4. Connections: A member can become primary only if it can connect to a majority of the members. A primary that loses this majority automatically steps down to secondary.
For example, in a replica set with three voting members, an election can be held as long as two of them can connect to each other. If two members become unavailable and the remaining one is a secondary, it stays a secondary because it cannot reach a majority. Likewise, if the remaining member is the primary, it automatically steps down to secondary, because it can no longer reach a majority and so loses the right to be primary. The replica set then becomes read-only.
5. Network partitions: If the primary becomes unavailable and the remaining secondaries cannot connect to each other because of a network partition, the replica set cannot elect a primary. So place the majority of the replica set members in one data center and a minority in another.
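The priority rule above can be expressed as two small predicates. This is a sketch under stated assumptions: the `Member` class and both function names are invented for illustration, and optimes are simplified to plain seconds.

```python
from dataclasses import dataclass

MAX_OPTIME_LAG_S = 10  # from the text: within 10 seconds of the current primary


@dataclass
class Member:
    name: str
    priority: int
    optime: float  # time of the last oplog operation, simplified to seconds


def is_electable(member: Member) -> bool:
    """Priority 0 members never stand for election (hypothetical helper)."""
    return member.priority > 0


def should_trigger_takeover(primary: Member, joiner: Member) -> bool:
    """True if a newly joined member should cause an election against the primary."""
    higher_priority = joiner.priority > primary.priority
    fresh_enough = abs(primary.optime - joiner.optime) < MAX_OPTIME_LAG_S
    return is_electable(joiner) and higher_priority and fresh_enough
```

Here a higher-priority member that lags the primary by 5 seconds triggers an election, while one that lags by 20 seconds does not.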
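The majority requirement behind the connection and network partition items can be checked with simple arithmetic. The sketch below assumes every member has one vote; the function names and the data center labels are hypothetical.

```python
from typing import Dict, Set


def has_majority(total_voting_members: int, reachable_including_self: int) -> bool:
    """A strict majority means more than half of all voting members."""
    return reachable_including_self > total_voting_members // 2


def partition_can_elect(members_per_site: Dict[str, int], surviving_sites: Set[str]) -> bool:
    """True if the members in the surviving sites still form a majority."""
    total = sum(members_per_site.values())
    reachable = sum(n for site, n in members_per_site.items() if site in surviving_sites)
    return has_majority(total, reachable)
```

For a five-member set split three and two across two data centers, the side with three members can still elect a primary after a partition, while the side with two cannot. This is why the text recommends keeping the majority in one data center.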
III. Election process
1. Heartbeat detection

Suppose a replica set has three nodes: X, Y, and Z. Each node sends a heartbeat request to the other two every 2 seconds. For example, when X sends a heartbeat request to Y and Z, each normally returns a reply containing its own information: its role (primary or secondary), whether it is eligible to become primary, and its current timestamp. On receiving a reply, X updates its status mapping table. The updates include whether nodes have been added or have gone down, and the network round-trip time of the request. When X's mapping table changes, X applies the following logic: if X is the primary and another node has failed, X checks whether it can still communicate with a majority of the nodes in the cluster. If it cannot, it downgrades itself from primary to secondary. (Note: in a replica set, the primary must be able to communicate with a majority of the nodes. Otherwise a network split could leave different groups of nodes each with its own primary, which would break data consistency.)
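The heartbeat handling in this step can be sketched as a node that keeps a status table and re-evaluates its role whenever the table changes. All class and field names here are hypothetical, and a missing reply is modeled simply as `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class HeartbeatReply:
    role: str          # "primary" or "secondary"
    electable: bool    # whether the sender may become primary
    timestamp: float   # the sender's current timestamp


@dataclass
class Node:
    name: str
    role: str
    peers: List[str]
    status: Dict[str, HeartbeatReply] = field(default_factory=dict)
    down: Set[str] = field(default_factory=set)

    def record(self, peer: str, reply: Optional[HeartbeatReply]) -> None:
        """Update the status table; reply=None means the peer did not answer."""
        if reply is None:
            self.down.add(peer)
            self.status.pop(peer, None)
        else:
            self.down.discard(peer)
            self.status[peer] = reply
        self._reassess()

    def _reassess(self) -> None:
        """A primary that cannot reach a majority downgrades itself."""
        if self.role != "primary":
            return
        total = 1 + len(self.peers)
        reachable = total - len(self.down)
        if reachable <= total // 2:
            self.role = "secondary"
```

In a three-node set, X stays primary when only Z stops answering (two of three reachable) and downgrades itself once Y stops answering as well.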
2. About downgrade

Downgrading a node from primary to secondary can cause problems. By default (in older MongoDB drivers), writes are performed in fire-and-forget mode: the client does not wait to learn whether a write succeeded and treats the request as successful once it has been sent. If the primary is downgraded at that moment, the client does not know, and it may keep sending writes to this node. The newly downgraded secondary could reply "I am not a primary", but as mentioned above, the client ignores replies, so it never learns that the writes failed. You might say, "If we always use safe writes, it will be fine" (a safe write means the client treats the write as successful only after the server returns a success result), but relying on every client to do this is clearly unreliable. Therefore, after a primary is downgraded to secondary, it closes all its existing connections. The client then gets a socket error on its next write. On seeing this error, it fetches the new primary's address from the cluster and sends subsequent writes to the new server.
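The reconnect behavior described above can be simulated without a real driver. `FakeNode` and `FakeClient` below are stand-ins invented for this sketch and do not correspond to any MongoDB driver API. They only show how a closed connection forces the client to look up the new primary.

```python
class FakeNode:
    """Stand-in for a server; not a real MongoDB driver object."""

    def __init__(self, name: str, is_primary: bool):
        self.name = name
        self.is_primary = is_primary
        self.connection_open = True
        self.data = []

    def step_down(self) -> None:
        # Closing connections on downgrade makes the client's next write fail loudly.
        self.is_primary = False
        self.connection_open = False

    def write(self, doc: dict) -> None:
        if not self.connection_open:
            raise ConnectionError(f"socket closed by {self.name}")
        self.data.append(doc)


class FakeClient:
    """Stand-in for a client that rediscovers the primary after a socket error."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.current = self._find_primary()

    def _find_primary(self) -> FakeNode:
        return next(n for n in self.nodes if n.is_primary)

    def write(self, doc: dict) -> None:
        try:
            self.current.write(doc)
        except ConnectionError:
            # Socket error: ask the cluster for the new primary and retry once.
            self.current = self._find_primary()
            self.current.write(doc)
```

A short usage example: the first write lands on A, A steps down and B is promoted, and the next write is redirected to B after the socket error.

```python
a, b = FakeNode("A", True), FakeNode("B", False)
client = FakeClient([a, b])
client.write({"n": 1})
a.step_down()
b.is_primary = True
client.write({"n": 2})
```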
3. Election

Back to heartbeat handling: if X is a secondary, it periodically checks whether it should stand for election as primary, even when its status mapping table has not changed. It checks three things: Does any other node in the cluster consider itself primary? Is X itself already primary? Is X eligible to become primary? X starts an election and nominates itself only if no other node is primary, X is not already primary, and X is eligible. Otherwise it does not try to become primary.
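The three checks can be combined into one predicate. The function name and the shape of `peer_roles` (a mapping from node name to the role it last reported) are assumptions made for this sketch.

```python
from typing import Dict


def should_stand_for_election(
    self_is_primary: bool,
    self_is_electable: bool,
    peer_roles: Dict[str, str],
) -> bool:
    """True only if no peer is primary, this node is not primary, and it is eligible."""
    another_primary_exists = any(role == "primary" for role in peer_roles.values())
    return (not self_is_primary) and self_is_electable and (not another_primary_exists)
```

For example, an eligible secondary X whose peers Y and Z both report "secondary" would stand for election, but it would not if Y reported "primary".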

When X finds that a primary is needed and that it can fill the role itself, it starts a round of election. X sends a request to Y and Z that says, in effect, "I think I can take over as primary. What do you think?" When Y and Z receive the request, each checks the following: Does it know of an existing primary in the cluster? Is its own data newer than X's? Does any other node have newer data than X? If any of these is true, the node considers X unqualified and replies "Stop the election!". If none is true, that is, the node believes there is currently no primary and X's data is the newest, it replies "No problem".
If X receives "Stop the election!", it stops the election immediately and stays in the secondary state. If X receives "No problem" from all other nodes, it moves on to the second phase of the election.
In phase 2, X sends the other nodes a message saying, "I declare myself primary." Y and Z then make a final confirmation that all the conditions checked above still allow X to be primary. If so, they vote in favor of X in this round. After casting a vote in favor, a node makes no other voting decision for 30 seconds. That is the case where the final confirmation succeeds. If it fails, the node casts a veto against X. A single veto makes the round fail, and X remains a secondary. Suppose Y votes in favor of X and Z vetoes X. Having voted, Y cannot vote again for 30 seconds. If Z now starts an election to make itself primary, Y cannot vote, so Z must win X's vote to obtain a majority. The voting rule is therefore: if no one vetoes and more than half of the votes are in favor, the node that started the round becomes primary.
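The first phase can be sketched as the check each voter runs and the rule the candidate applies to the replies. The function names and the "ok"/"stop" strings are hypothetical, and data freshness is simplified to comparing numeric optimes.

```python
from typing import List


def phase_one_reply(
    voter_knows_primary: bool,
    voter_optime: float,
    candidate_optime: float,
    other_optimes: List[float],
) -> str:
    """Return "stop" if any objection applies, otherwise "ok"."""
    if voter_knows_primary:
        return "stop"  # a primary already exists
    if voter_optime > candidate_optime:
        return "stop"  # the voter's own data is newer than the candidate's
    if any(o > candidate_optime for o in other_optimes):
        return "stop"  # some other node has newer data than the candidate
    return "ok"


def candidate_may_proceed(replies: List[str]) -> bool:
    """The candidate enters phase 2 only if every reply is "ok"."""
    return all(reply == "ok" for reply in replies)
```

A single "stop" reply is enough to end the attempt, which matches the description that the candidate stays secondary as soon as any node objects.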
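The phase 2 rules (a single veto fails the round, a vote in favor locks the voter for 30 seconds, and a majority is required) can be simulated as follows. This sketch makes two assumptions that the text does not state: the candidate's own vote counts as one vote in favor, and casting a veto does not start the 30-second lockout. All names are hypothetical.

```python
from typing import List, Optional

VOTE_LOCKOUT_S = 30  # from the text: no further voting decision for 30 seconds


class Voter:
    def __init__(self, name: str):
        self.name = name
        self.last_vote_at: Optional[float] = None

    def vote(self, approves: bool, now: float) -> Optional[str]:
        """Return "yes", "veto", or None if this voter is still locked out."""
        if self.last_vote_at is not None and now - self.last_vote_at < VOTE_LOCKOUT_S:
            return None
        if not approves:
            return "veto"  # assumption: a veto does not start the lockout
        self.last_vote_at = now
        return "yes"


def election_succeeds(total_members: int, votes: List[Optional[str]]) -> bool:
    """No veto, and votes in favor (plus the candidate's own) exceed half."""
    if "veto" in votes:
        return False
    yes_votes = 1 + votes.count("yes")  # assumption: the candidate votes for itself
    return yes_votes > total_members // 2
```

Replaying the example from the text: X's round fails because Z vetoes it. When Z then stands a few seconds later, Y is still locked out, so Z needs X's vote to reach two votes out of three.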

Note:
1. Events that trigger an election: the replica set is initialized; a secondary loses contact with the primary and calls an election; the primary steps down.
2. The primary steps down when: it receives the replSetStepDown command; a secondary that is eligible to become primary has a higher priority than the primary; the primary cannot connect to a majority of the members.
3. When the primary steps down, all client connections are closed automatically, so that data stays consistent between the client and the database.
4. Priority 0 members do not trigger an election even if they cannot connect to the primary.
