MongoDB replica set (3) Internal Data Synchronization

Last Update:2018-06-01 Source: Internet

Author: User


1. Data Synchronization

While running, a healthy secondary selects the node closest to itself whose data is newer than its own, and synchronizes from it. Once the node is selected, the secondary pulls oplog entries from it. The process is as follows:
A. Execute the op.
B. Write the op to its own oplog (local.oplog.rs).
C. Request the next op.
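
The three steps can be sketched as a small in-memory loop. This is only an illustration, not MongoDB's implementation: the oplog is modeled as a plain array, and the names nextOp, syncOnce and the apply callback are hypothetical.

```javascript
// Hypothetical in-memory model: an oplog is an array of { ts, apply } entries
// sorted by ts. None of these names come from MongoDB itself.
function nextOp(sourceOplog, lastTs) {
  // Step C: ask the source for the first entry newer than what we already have.
  return sourceOplog.find((entry) => entry.ts > lastTs) || null;
}

function syncOnce(secondary, sourceOplog) {
  const last = secondary.oplog[secondary.oplog.length - 1];
  const lastTs = last ? last.ts : 0;
  const entry = nextOp(sourceOplog, lastTs);
  if (entry === null) return false; // nothing new to pull
  entry.apply(secondary.data);      // Step A: execute the op
  secondary.oplog.push(entry);      // Step B: record it in the local oplog
  return true;                      // the caller loops to request the next op
}

const sourceOplog = [
  { ts: 1, apply: (data) => { data.counter = 1; } },
  { ts: 2, apply: (data) => { data.counter = 2; } },
];
const secondary = { data: {}, oplog: [] };
while (syncOnce(secondary, sourceOplog)) { /* keep pulling */ }
console.log(secondary.data.counter, secondary.oplog.length);
```

Because the secondary derives its position from the last entry of its own oplog, an op that was executed but never recorded is fetched again on the next pass.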
If the secondary goes down between step A and step B, then on resuming synchronization it checks the latest entry in its own oplog. Because the op was never written there, the secondary executes the op it had just applied a second time. Is it a problem to perform the same write twice? MongoDB took this into account when designing the oplog: every oplog entry can be applied repeatedly. For example, suppose counter is 1 and you execute {$inc: {counter: 1}}. After the increment counter is 2, and what the oplog records is not {$inc: {counter: 1}} but {$set: {counter: 2}}. So no matter how many times the entry is applied, the result is the same.
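
The replay argument can be checked with a minimal sketch. It assumes a document is a plain object, and toOplogEntry and applyEntry are hypothetical helpers written for this illustration; the real oplog format carries more fields.

```javascript
// Hypothetical helper: turn a client update into the form stored in the oplog.
function toOplogEntry(doc, update) {
  if (update.$inc) {
    const set = {};
    for (const [field, delta] of Object.entries(update.$inc)) {
      set[field] = (doc[field] || 0) + delta; // record the resulting value
    }
    return { $set: set };
  }
  return update;
}

// Hypothetical helper: apply a $set entry to a document.
function applyEntry(doc, entry) {
  Object.assign(doc, entry.$set);
}

const primaryDoc = { counter: 1 };
const entry = toOplogEntry(primaryDoc, { $inc: { counter: 1 } });
applyEntry(primaryDoc, entry);

const secondaryDoc = { counter: 1 };
applyEntry(secondaryDoc, entry); // applied, then a crash before the oplog write
applyEntry(secondaryDoc, entry); // replayed after the restart
console.log(JSON.stringify(entry), secondaryDoc.counter);
```

Replaying the raw $inc instead would have left the secondary at 3, which is why the resulting value is what gets logged.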
Note: A secondary does not have to pull the oplog from the primary. It can sync from whichever node is closest to it (based on ping time), provided that node's oplog is newer than its own.

2. Synchronization Process

By default, a write operation in MongoDB returns success right away. You can also set the w parameter to specify how many nodes the write must be synchronized to before it returns. For example:

db.foo.runCommand({getLastError:1, w:2})

The example above runs the getLastError command, which returns only after the previous write operation has been synchronized to two nodes. Client drivers may spell this differently, but they should all offer the feature. For important data, consider using it to improve data safety at the cost of some write performance.

How is this implemented? How does the primary know how many nodes the data has been synchronized to? When the command above is called, MongoDB goes through the following steps:
A. The write operation completes on the primary.
B. The primary records an oplog entry. The entry contains a ts field whose value is the time the write was executed; call it t.
C. The client calls {getLastError: 1, w: 2} and waits for the primary to return the result.
D. The secondary pulls the oplog from the primary and obtains the entry for that write.
E. The secondary performs the write based on the entry.
F. When it is done, the secondary requests newer entries from the primary with the condition {ts: {$gt: t}}.
G. The primary receives this request and learns that the secondary has successfully applied the entries up to and including t.
H. getLastError now sees that both the primary and the secondary have completed the write, so w: 2 is satisfied and success is returned to the client.
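
Steps A to H can be modeled roughly as below. This is a sketch under stated assumptions, not MongoDB's code: the Primary class, its logical clock, and the fetch and isReplicated methods are all hypothetical names chosen for the illustration.

```javascript
// Hypothetical model of the primary's bookkeeping; names are illustrative.
class Primary {
  constructor() {
    this.oplog = [];
    this.clock = 0;
    this.syncedThrough = new Map(); // secondary name -> highest ts known applied
  }

  // Steps A and B: perform the write and record an oplog entry with a ts.
  write(op) {
    const ts = ++this.clock;
    this.oplog.push({ ts, op });
    return ts;
  }

  // Steps D to G: a secondary asks for entries with ts > afterTs, which tells
  // the primary that it has already applied everything up to afterTs.
  fetch(secondaryName, afterTs) {
    this.syncedThrough.set(secondaryName, afterTs);
    return this.oplog.filter((e) => e.ts > afterTs);
  }

  // Step H: count the primary itself plus every secondary at or past ts.
  isReplicated(ts, w) {
    let nodes = 1;
    for (const applied of this.syncedThrough.values()) {
      if (applied >= ts) nodes++;
    }
    return nodes >= w;
  }
}

const primary = new Primary();
const t = primary.write({ insert: "foo" });
const before = primary.isReplicated(t, 2);    // no secondary has reported yet
const batch = primary.fetch("secondary1", 0); // the secondary pulls the entry
const lastApplied = batch[batch.length - 1].ts;
primary.fetch("secondary1", lastApplied);     // next request: ts greater than t
const after = primary.isReplicated(t, 2);
console.log(before, after);
```

The design point is that the primary needs no separate acknowledgement message: the position in the secondary's next fetch request doubles as the confirmation.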

Note:
1. If w is greater than the number of nodes currently available in the replica set, the write blocks until the number of nodes holding the write reaches w.
2. If w is set to the total number of nodes in the replica set, the replica set provides strong consistency, but the write performance of the whole replica set drops.

Start Initialization

When a new node starts and joins the current replica set, it looks at its own oplog to find its most recent write operation, a value tracked as lastOpTimeWritten. You can also inspect it from the command line:

> rs.debug.getLastOpWritten()

This command returns an oplog record, where the ts field is the time of the last write operation.

If the node is brand new and has no data, its oplog is empty. In that case the node performs a full synchronization, which this article does not cover for now.

Select synchronization source node

Nodes in a replica set are always synchronizing data, but not in the traditional one-master, many-slaves pattern. Instead, MongoDB's policy is for each node to pick a suitable node as its data source.

First, the secondary uses ping time to gauge how far away each of the other nodes is: the longer the ping time, the farther the node. It then determines its source node as follows:

for each member that is healthy:
    if member[state] == PRIMARY
        add to set of possible sync targets
    if member[lastOpTimeWritten] > our[lastOpTimeWritten]
        add to set of possible sync targets
sync target = member with the min ping time from the possible sync targets
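
The pseudocode above translates fairly directly into JavaScript. In this sketch the member objects and their fields (healthy, state, lastOpTimeWritten, pingMs) are assumptions made for the illustration, and optimes are simplified to plain numbers.

```javascript
// Pick a sync target: every healthy member that is PRIMARY or ahead of us is a
// candidate, and the candidate with the smallest ping time wins.
function chooseSyncTarget(members, ourLastOpTimeWritten) {
  const candidates = members.filter(
    (m) =>
      m.healthy &&
      (m.state === "PRIMARY" || m.lastOpTimeWritten > ourLastOpTimeWritten)
  );
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.pingMs < best.pingMs ? m : best));
}

// Hypothetical members, as seen from a secondary whose last optime is 100.
const members = [
  { name: "a:27017", healthy: true, state: "PRIMARY", lastOpTimeWritten: 120, pingMs: 40 },
  { name: "b:27017", healthy: true, state: "SECONDARY", lastOpTimeWritten: 118, pingMs: 5 },
  { name: "c:27017", healthy: true, state: "SECONDARY", lastOpTimeWritten: 90, pingMs: 1 },
  { name: "d:27017", healthy: false, state: "SECONDARY", lastOpTimeWritten: 119, pingMs: 2 },
];
const target = chooseSyncTarget(members, 100);
console.log(target.name);
```

Here c is the closest node but is behind us, and d is unhealthy, so the nearby secondary b is chosen over the more distant primary a.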

Versions differ in how they decide whether a node is healthy, but the goal is always to find a healthy node. In version 2.0, the decision also takes slave delay into account.

Run db.adminCommand({replSetGetStatus: 1}) to view the current node's status. When you run it on a secondary, the output contains the field syncingTo, whose value is that secondary's synchronization source. (The name should really be syncingFrom, but the incorrect name is kept for version compatibility.)
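
As a small illustration, the field can be read from the command's result like this. The sampleStatus object is hand-written for the example and contains only the fields needed here; real output has many more fields, and whether syncingTo is present depends on the server version. The helper name syncSourceOf is hypothetical.

```javascript
// Return the sync source reported in a replSetGetStatus-style result,
// or null when the field is absent (for example on a primary).
function syncSourceOf(status) {
  return status.syncingTo || null;
}

// Hand-written sample, not captured from a real server.
const sampleStatus = {
  set: "rs0",
  myState: 2,
  syncingTo: "otherHost:27017",
};
console.log(syncSourceOf(sampleStatus));
```

In the shell, the same helper could be given the result of db.adminCommand({replSetGetStatus: 1}) instead of the sample object.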

Chain synchronization Structure

C <==> B <====> A
         <---->
There are two channels between B and A: the double line is a real synchronization connection, and the single line is a virtual connection.
Note: MongoDB's chain synchronization structure resembles the pipelined replication of data blocks in Hadoop's HDFS. It can greatly reduce the load on the primary and speed up data synchronization.

3. New Functions

The above describes how replica set synchronization currently works internally; new features will follow. Version 2.2 provides the replSetSyncFrom command, which lets you manually set a secondary's synchronization source. Usage:

> db.adminCommand({replSetSyncFrom:"otherHost:27017"})

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
