Principles of MongoDB Replica Sets (I)

References

Official Website: www.mongodb.org

Chinese community: www.mongoing.com

Online Tutorial: https://university.mongodb.com/

MongoDB supports the traditional master-slave architecture, but that architecture has no automatic failover: you must designate the master and the slaves yourself. We recommend the replica set architecture instead, which is easier to maintain and more capable than master-slave replication.

I. Basic Concepts

A replica set is composed of a group of mongod instances. One of the nodes is the primary node, and all write requests are completed on it. The other nodes are secondary nodes, which receive and apply the operations replicated from the primary so that their data sets stay consistent with the primary's.

Primary node: Receives all write operations from clients. A replica set can have only one primary. Because only one member can accept writes, the replica set provides strict consistency for all reads from the primary. The primary records every change to its data sets in the oplog to support replication.

Secondary node: Copies the oplog from the primary and applies those operations to its own data set so that it stays consistent with the primary. If the primary becomes unavailable, the replica set elects a secondary as the new primary. Clients read from the primary by default, but a client can send reads to secondaries by specifying a replica set read preference. Note that data read from a secondary may lag behind the current value on the primary.

Voting node (arbiter): An additional mongod instance can be added to the replica set as a voting node. It holds no data set and exists only to vote in elections. When the replica set has an even number of members, adding a voting node prevents a tie, because a new primary must be elected by a majority of votes. Since it only votes, a voting node does not need a dedicated physical machine.

A voting node only ever votes and never changes role. By contrast, a primary may step down to become a secondary, and a secondary may be elected primary during an election.

II. Replica set architecture

The most basic replica set architecture has three members, which gives the replica set redundancy and failover. Design the replica set architecture around application requirements and avoid unnecessary complexity.

The replica set should contain an odd number of members.

An odd number of members ensures that a primary can be elected from the replica set. If the replica set has an even number of members, add a voting node to make the number of members odd.
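
The arithmetic behind the odd-member recommendation can be sketched in a few lines of Python. The helper names are illustrative and not part of MongoDB, and the sketch assumes every member holds exactly one vote.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can fail while a majority still remains."""
    return voting_members - majority(voting_members)


for n in range(3, 8):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n), "failure(s)")
```

Going from three members to four raises the majority from 2 to 3 but still tolerates only one failure, so an even-sized set gains nothing until one more voting member makes it odd.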

Use hidden nodes and delayed nodes for special needs.

Add hidden nodes or delayed nodes to serve special needs such as backups or reporting.

Load balancing in read-heavy architectures

If the business generates a large number of read requests, read/write splitting can increase the read capacity of the replica set. As the business grows, adding secondary nodes in other data centers increases redundancy and availability.

Determine the distribution and roles of members in the replica set.

Distribute nodes across physical locations

Keeping at least one replica set node in another data center protects the data if the primary data center fails. Set the priority of such nodes to 0 to prevent them from being elected primary.

Ensure that one data center holds a majority of nodes

When a replica set has nodes in multiple data centers, a network partition can cut the data centers off from one another. Nodes must be able to communicate with each other to replicate data.

In an election, a node must be able to reach a majority of the voting nodes in order to be elected. To make sure the replica set can keep a majority and elect a primary even when data centers are partitioned, keep a majority of the replica set's nodes in one data center.

III. Failover

The replica set recovers from an unavailable primary through an election.

Replica set election

Once the current primary becomes unavailable, the replica set holds an election and chooses a new primary.

Rollback during failover

A rollback reverts write operations on a former primary when it rejoins the replica set after a failover. A rollback is needed only if the primary accepted writes that the secondaries had not replicated before the primary stepped down. When the former primary rejoins the replica set as a secondary, it rolls back those writes so that its data is consistent with the other members of the replica set.

MongoDB tries to avoid rollbacks. When one does occur, it is usually caused by a network partition. Secondaries that cannot keep up with the write throughput on the primary make the impact of a rollback larger.

A rollback does not occur if the writes were replicated to a secondary before the primary stepped down and that secondary remains available and can communicate with a majority of the nodes.

Handling rollback data

When a rollback occurs, the administrator must decide whether to apply or ignore the rolled-back data. MongoDB writes the rolled-back data as BSON files to the rollback/ directory under the database's dbPath. Rollback files are named as follows:

<database>.<collection>.<timestamp>.bson

For example: records.accounts.2011-05-09T18-10-04.0.bson
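
To decide which rollback files matter, it can help to split a file name back into its parts. The sketch below is a hypothetical helper, not a MongoDB tool; its regular expression is inferred only from the single example name above and assumes the database name contains no dots.

```python
import re

# Pattern inferred from the one example file name; treat it as an assumption.
ROLLBACK_NAME = re.compile(
    r"^(?P<database>[^.]+)\."
    r"(?P<collection>.+)\."
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}\.\d+)\.bson$"
)


def parse_rollback_name(filename):
    """Return the database, collection and timestamp parts, or None if no match."""
    match = ROLLBACK_NAME.match(filename)
    return match.groupdict() if match else None


parts = parse_rollback_name("records.accounts.2011-05-09T18-10-04.0.bson")
print(parts)
```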

To avoid rollback, set the replica set's write concern so that a write is acknowledged only after it has been applied across the replica set.
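
One common way to request such a write concern is through the connection string. The sketch below only builds the URI with the standard library; the host names and the replica set name rs0 are placeholders, and the 5-second wtimeoutMS is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode

# Hypothetical hosts and replica set name, used only for illustration.
hosts = ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"]

options = {
    "replicaSet": "rs0",   # name of the replica set to connect to
    "w": "majority",       # acknowledge writes only after a majority has them
    "wtimeoutMS": 5000,    # give up waiting for acknowledgement after 5 seconds
}

uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

A write that a majority of members has acknowledged will not be rolled back by a later failover, at the cost of higher write latency.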

Factors affecting elections

1. Heartbeat detection

Each replica set member sends a heartbeat to the other members every two seconds. If a node does not reply within 10 seconds, it is marked as unavailable.
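
The timing rule above can be modelled as a small bookkeeping class. This is a simplified sketch, not MongoDB's implementation: the class name is made up, and time is passed in as plain numbers of seconds rather than read from a clock.

```python
HEARTBEAT_INTERVAL_SECS = 2    # how often a heartbeat is sent
UNAVAILABLE_AFTER_SECS = 10    # silence longer than this marks a peer as down


class PeerHealth:
    """Remembers when each peer last answered a heartbeat."""

    def __init__(self, peers, now):
        self.last_reply = {peer: now for peer in peers}

    def record_reply(self, peer, now):
        self.last_reply[peer] = now

    def unavailable(self, now):
        """Peers that have been silent for at least the timeout."""
        return sorted(
            peer for peer, seen in self.last_reply.items()
            if now - seen >= UNAVAILABLE_AFTER_SECS
        )


health = PeerHealth(["B", "C"], now=0)
for tick in range(0, 13, HEARTBEAT_INTERVAL_SECS):
    health.record_reply("B", now=tick)   # only B keeps answering

print(health.unavailable(now=12))        # ['C']
```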

2. Connection

If a node cannot connect to a majority of the other nodes in the replica set, it cannot become primary. In an election, majority means a majority of votes, not a majority of nodes. If the replica set has three nodes and all three can vote, it can elect a new primary as long as two of them can communicate with each other. If two nodes are unavailable, the remaining node stays a secondary because it cannot communicate with a majority of the replica set. If the two unavailable nodes are the secondaries, the remaining primary steps down to secondary.

3. Network isolation

Network isolation changes how a majority of votes can be formed in an election. If the primary is unavailable and no isolated network segment holds a majority of votes, the replica set cannot elect a new primary and becomes read-only. To avoid this, place a majority of the nodes in the primary data center and a minority in the other data centers.
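
A short sketch shows why the placement matters. The function name and the data center labels are illustrative; the input is simply the number of votes that can still reach each other inside each isolated segment.

```python
def partition_with_majority(partitions):
    """Return the name of the segment holding a majority of all votes, or None.

    partitions maps a segment name to the number of votes reachable inside it.
    """
    total_votes = sum(partitions.values())
    needed = total_votes // 2 + 1
    for name, votes in partitions.items():
        if votes >= needed:
            return name
    return None


print(partition_with_majority({"dc1": 3, "dc2": 2}))  # dc1
print(partition_with_majority({"dc1": 2, "dc2": 2}))  # None
```

With a 3/2 split the larger data center can still elect a primary; with a 2/2 split neither side can, and the whole replica set stays read-only until the network heals.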


Triggering the election

An election is triggered whenever the replica set has no available primary, for example:

1. A new replica set is initialized.

2. A secondary cannot connect to the primary. When the secondaries cannot communicate with the primary, an election is triggered.

3. The primary steps down.

The primary steps down in the following situations:

1. It receives the replSetStepDown command.

2. An existing secondary is eligible for election and has a higher priority.

3. It cannot communicate with a majority of the nodes in the replica set.

4. In some cases, changing certain replica set configuration settings triggers an election and causes the primary to step down.

Note: after the primary steps down, it closes all established client connections so that clients do not attempt to write to a secondary. This forces clients to rediscover the replica set topology and helps prevent rollback.

Let's take a look at the election process.

Heartbeat Detection

Suppose we have a replica set with three nodes: A, B, and C. Each node sends a heartbeat request to the other two every 2 seconds. For example, node A sends heartbeat requests to nodes B and C. Normally, B and C each return a reply packet containing information about themselves, chiefly: their role (primary or secondary), whether they can become primary if necessary, and their current clock time.

After receiving the replies, node A updates its status mapping table with this information. The updates include whether nodes have been added or have gone down, and the network transmission time of the request.

When node A's status mapping table changes, A applies the following logic: if A is the primary and another node has failed, A checks whether it can still communicate with a majority of the nodes in the cluster. If it cannot, it steps down from primary to secondary. (In a replica set, the primary must be able to communicate with a majority of the nodes. Otherwise, after a network split, two or more isolated groups of nodes could each keep a primary, which would break data consistency.)
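
The step-down decision described above can be sketched as follows. This is a toy model with made-up names, assuming one vote per member; it only tracks whether each member answered its last heartbeat.

```python
class StatusTable:
    """One node's view of the replica set, refreshed by heartbeat replies."""

    def __init__(self, owner, members):
        self.owner = owner
        self.reachable = {member: True for member in members}

    def record_heartbeat(self, member, replied):
        self.reachable[member] = replied

    def sees_majority(self):
        # The owner always counts itself as reachable.
        up = sum(1 for m, ok in self.reachable.items() if ok or m == self.owner)
        return up > len(self.reachable) / 2


def next_role(role, table):
    """A primary that can no longer see a majority becomes a secondary."""
    if role == "primary" and not table.sees_majority():
        return "secondary"
    return role


table = StatusTable("A", ["A", "B", "C"])
table.record_heartbeat("B", replied=False)
print(next_role("primary", table))   # primary: A and C are still 2 of 3
table.record_heartbeat("C", replied=False)
print(next_role("primary", table))   # secondary: A alone is 1 of 3
```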

About stepping down

In MongoDB, write operations are fire-and-forget by default. That is, the client does not wait to learn whether a write succeeded; once the request is sent, the client assumes success. If the primary steps down at that moment, the client does not know it has become a secondary and may keep sending writes to it. The newly demoted secondary can reply with a packet saying "I am not a primary", but as noted above, the client ignores replies. The client therefore never learns that the write failed.

MongoDB's developers anticipated this problem. When a primary steps down to secondary, it closes all of its existing connections, so the client gets a socket error on its next write. On seeing that error, the client asks the cluster for the address of the new primary and sends subsequent writes there.
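
The client-side behaviour can be illustrated with an in-memory model. Nothing here talks to a real server: FakeNode and FakeClient are hypothetical stand-ins, and a Python ConnectionError plays the part of the socket error.

```python
class FakeNode:
    """Stand-in for a mongod; keeps written documents in a list."""

    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.connection_open = True
        self.data = []

    def step_down(self):
        self.role = "secondary"
        self.connection_open = False   # closing connections forces clients to notice

    def write(self, doc):
        if not self.connection_open:
            raise ConnectionError("socket closed by " + self.name)
        self.data.append(doc)


class FakeClient:
    """Stand-in for a driver that rediscovers the primary after a socket error."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.primary = self.find_primary()

    def find_primary(self):
        return next(n for n in self.nodes if n.role == "primary")

    def write(self, doc):
        try:
            self.primary.write(doc)
        except ConnectionError:
            self.primary = self.find_primary()   # ask the cluster who is primary now
            self.primary.write(doc)
        return self.primary.name


a, b = FakeNode("A", "primary"), FakeNode("B", "secondary")
client = FakeClient([a, b])
print(client.write({"n": 1}))   # A
a.step_down()
b.role = "primary"              # pretend the election picked B
print(client.write({"n": 2}))   # B
```

Without the closed connection, the second write would have gone to A and been silently lost under fire-and-forget semantics.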

Election

Now back to the heartbeat checks. If A is a secondary, it periodically checks whether it should stand for election as primary. It asks:

1. Does any other node in the cluster consider itself primary?

2. Is node A itself already primary?

3. Is node A ineligible to become primary?

If the answer to any of these three questions is yes, node A does not try to become primary. (That is, A initiates an election and nominates itself only when it is a secondary that is eligible to be primary and no other node is primary.)
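
The self-check can be written as one boolean function: node A stands for election only when no peer claims to be primary, A is not already primary, and A is eligible. The NodeView type and its field names are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    name: str
    role: str               # "primary" or "secondary"
    electable: bool = True  # False for nodes that may never become primary


def should_stand_for_election(me, peers):
    """True only if none of the three blocking conditions holds."""
    another_primary = any(p.role == "primary" for p in peers)
    already_primary = me.role == "primary"
    ineligible = not me.electable
    return not (another_primary or already_primary or ineligible)


a = NodeView("A", "secondary")
print(should_stand_for_election(a, [NodeView("B", "secondary"), NodeView("C", "secondary")]))  # True
print(should_stand_for_election(a, [NodeView("B", "primary"), NodeView("C", "secondary")]))    # False
```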

When A finds that a new primary is needed, it starts a round of election: A sends a request packet to nodes B and C saying, "I think I can take over the role of primary. What do you think?"

When B and C receive this request, they check the following:

1. Do they already know of a primary in the cluster?

2. Is their own data newer than node A's?

3. Does any other node have data newer than node A's?

If any of these conditions holds, they consider A unqualified to become primary and reply "Stop the election!". If none holds, meaning they agree that the cluster has no primary and that A's data is up to date, they reply "No problem".

If A receives "Stop the election!", it stops the election immediately and remains a secondary.

If A receives "No problem" from all the other nodes, it enters the second phase of the election.

In the second phase, A sends the other nodes a packet saying, "I am declaring myself primary." B and C then make a final confirmation: do all the conditions checked earlier still indicate that A can be primary? If so, they vote for A in this round. After casting that vote, a node makes no other voting decision for 30 seconds.

That covers the case where the final confirmation succeeds. If it fails, the node votes against A as primary. If any vote against is cast, the round fails and A remains a secondary.

Suppose B votes for A and C votes against. Because B voted in favor, it cannot vote again for 30 seconds. If C now starts an election to make itself primary, B cannot vote, so C must win A's vote in order to obtain a majority.

The voting rule is therefore: if nobody votes against and more than half of the votes are in favor, the candidate becomes primary for this round.
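
The two phases and the 30-second vote lock can be simulated in a few dozen lines. This is a toy model of the protocol as described here, not MongoDB's code. It assumes one vote per node, that the candidate counts its own vote, and that "newer data" can be compared through a single number (data_time); all names are made up.

```python
VOTE_LOCK_SECS = 30


class Voter:
    def __init__(self, name, data_time, knows_primary=False):
        self.name = name
        self.data_time = data_time        # time of the newest operation applied
        self.knows_primary = knows_primary
        self.locked_until = 0

    def objects_to(self, candidate, nodes):
        """The three checks: a primary exists, or someone has newer data."""
        newest = max(n.data_time for n in nodes)
        return self.knows_primary or candidate.data_time < newest

    def vote(self, candidate, nodes, now):
        """Return 1 (in favor), -1 (against) or 0 (locked, cannot vote)."""
        if now < self.locked_until:
            return 0
        if self.objects_to(candidate, nodes):
            return -1
        self.locked_until = now + VOTE_LOCK_SECS
        return 1


def run_election(candidate, nodes, now):
    others = [n for n in nodes if n is not candidate]
    # Phase 1: any "Stop the election!" ends the attempt.
    if any(n.objects_to(candidate, nodes) for n in others):
        return False
    # Phase 2: collect votes; a single vote against fails the round.
    votes = [n.vote(candidate, nodes, now) for n in others]
    if -1 in votes:
        return False
    in_favor = 1 + votes.count(1)         # candidate's own vote (assumption)
    return in_favor > len(nodes) / 2


a, b, c = Voter("A", 100), Voter("B", 100), Voter("C", 100)
nodes = [a, b, c]
b.vote(a, nodes, now=0)                   # B votes for A and is locked for 30 seconds
print(run_election(c, nodes, now=5))      # True: C's own vote plus A's is 2 of 3

stale = Voter("D", 50)
print(run_election(stale, nodes + [stale], now=100))  # False: D's data is behind
```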


IV. Reading and writing in replica sets

By default, an application reads directly from the primary of the replica set. Reading from the primary guarantees that the data returned is the latest. However, when consistency requirements are less strict, distributing some or all reads to the secondaries can improve read performance and reduce application latency.

To keep reads from secondaries consistent, configure the client's write concern so that a write is reported successful only after it has been applied to all nodes in the replica set.

MongoDB drivers support five replica set read preference modes.

Read preference mode   Description
primary                The default. All reads are performed on the primary of the replica set.
primaryPreferred       Reads are normally performed on the primary, but if the primary is unavailable, they are sent to a secondary.
secondary              All reads are performed on secondaries of the replica set.
secondaryPreferred     Reads are normally performed on secondaries, but if no secondary is available, they are sent to the primary.
nearest                Reads are performed on the node with the lowest network latency, regardless of node type.
Combine these options to find the architecture that suits your business.
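
How the five modes differ can be shown with a small selection function. This is a simplified sketch, not driver code: it always picks the lowest-latency candidate, whereas real drivers apply additional rules, and the node dictionaries and latencies are invented for the example.

```python
def fastest(group):
    """Name of the lowest-latency node in the group, or None if it is empty."""
    return min(group, key=lambda n: n["latency_ms"])["name"] if group else None


def choose_node(mode, nodes):
    primaries = [n for n in nodes if n["role"] == "primary"]
    secondaries = [n for n in nodes if n["role"] == "secondary"]
    if mode == "primary":
        return fastest(primaries)
    if mode == "primaryPreferred":
        return fastest(primaries) or fastest(secondaries)
    if mode == "secondary":
        return fastest(secondaries)
    if mode == "secondaryPreferred":
        return fastest(secondaries) or fastest(primaries)
    if mode == "nearest":
        return fastest(nodes)
    raise ValueError("unknown read preference mode: " + mode)


nodes = [
    {"name": "A", "role": "primary", "latency_ms": 20},
    {"name": "B", "role": "secondary", "latency_ms": 5},
    {"name": "C", "role": "secondary", "latency_ms": 12},
]
for mode in ("primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"):
    print(mode, "->", choose_node(mode, nodes))
```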
