How the MongoDB replica set works

Last Update:2014-12-10 Source: Internet

Author: User

Tags mongodb documentation


In a MongoDB replica set, the primary node handles read and write requests from clients, and the secondary (backup) nodes are responsible for mirroring the primary node's data.

Roughly speaking, a secondary node periodically polls the primary node for its data operations and then replays those operations on its own copy of the data, which keeps it in sync with the primary.

Every operation that changes database state on the primary node is stored in a special system collection, and the secondary nodes update their own data based on those records.

Oplog

The record of these state-changing operations is called the oplog (operation log, the primary node's record of operations). The oplog is stored in the "oplog.rs" collection of the local database. Secondary nodes in the replica set asynchronously copy the oplog from the primary node and then re-execute the operations it records, which is how data synchronization is achieved.
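The copy-and-replay cycle can be sketched in plain JavaScript. This is a toy in-memory model, not MongoDB's implementation: the applyOp and syncFrom helpers, the numeric ts values, and the simplified entry shape are all assumptions made for illustration.

```javascript
// Toy model: a secondary replays oplog entries it has not applied yet.
// Entries use a simplified shape { ts, op, o, o2 } with a numeric ts.
function applyOp(data, entry) {
  if (entry.op === "i") {
    data.set(entry.o._id, { ...entry.o });          // insert the document
  } else if (entry.op === "u") {
    const doc = data.get(entry.o2._id);             // o2 holds the match condition
    if (doc) Object.assign(doc, entry.o.$set);      // o holds the fields to set
  } else if (entry.op === "d") {
    data.delete(entry.o._id);                       // delete the document
  }
  // any other op type (such as "n") changes nothing in this model
}

function syncFrom(primaryOplog, secondary) {
  for (const entry of primaryOplog) {
    if (entry.ts > secondary.lastApplied) {
      applyOp(secondary.data, entry);
      secondary.lastApplied = entry.ts;
    }
  }
}

const primaryOplog = [
  { ts: 1, op: "i", o: { _id: 1, name: "Will0" } },
  { ts: 2, op: "i", o: { _id: 2, name: "Will1" } },
  { ts: 3, op: "u", o2: { _id: 1 }, o: { $set: { name: "Will" } } },
  { ts: 4, op: "d", o: { _id: 2 } },
];

const secondary = { data: new Map(), lastApplied: 0 };
syncFrom(primaryOplog, secondary);
console.log([...secondary.data.values()]);
```

Calling syncFrom a second time changes nothing, because the secondary remembers the last ts it applied and skips everything up to it.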

There are a few points to note about the oplog:

The oplog only records operations that change the state of the database
The operations stored in the oplog are not exactly the same as those executed on the primary node; for example, "$inc" operations are converted to "$set" operations
The oplog is stored in a capped collection (fixed-size collection); when the oplog grows beyond oplogSize, new operations overwrite the oldest ones
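One benefit of logging "$set" instead of "$inc" is that replaying the same entry more than once gives the same result. The sketch below illustrates this with two hypothetical helpers, toOplogUpdate and applyUpdate; it is a simplified model and does not show how the server actually builds oplog entries.

```javascript
// Hypothetical helper: turn { $inc: { field: n } } into { $set: { field: value } }
// by computing the resulting value against the primary's current document.
function toOplogUpdate(doc, update) {
  if (!update.$inc) return update;
  const set = {};
  for (const [field, delta] of Object.entries(update.$inc)) {
    set[field] = (doc[field] || 0) + delta;
  }
  return { $set: set };
}

// Apply a $set or $inc update to a document in place and return it.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + delta;
  }
  return doc;
}

const original = { _id: 1, age: 20 };
const incUpdate = { $inc: { age: 1 } };
const logged = toOplogUpdate(original, incUpdate); // { $set: { age: 21 } }

// Replaying the logged $set twice is harmless; replaying $inc twice is not.
const viaSet = applyUpdate(applyUpdate({ ...original }, logged), logged);
const viaInc = applyUpdate(applyUpdate({ ...original }, incUpdate), incUpdate);
console.log(viaSet.age, viaInc.age); // 21 22
```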

Let's look at what the oplog actually contains. First delete the previous node1 to node3 folders and rebuild the replica set, this time limiting the oplog size to 5 MB.

mongod.exe --dbpath="c:\mongodb\db\node1" --port=11111 --replSet myreplset --oplogSize=5
mongod.exe --dbpath="c:\mongodb\db\node2" --port=22222 --replSet myreplset --oplogSize=5
mongod.exe --dbpath="c:\mongodb\db\node3" --port=33333 --replSet myreplset --oplogSize=5

Then insert some data through the MongoDB shell (connected to the primary node):

use test
db.person.insert({"name":"Will0","gender":"Female","age": ...})
db.person.insert({"name":"Will1","gender":"Female","age": ...})

The following commands show the primary node's oplog. It contains the two inserts made above, and the secondary nodes update their own data sets based on these two records.

use local
show collections
db.oplog.rs.find()

Checking the status of the oplog collection shows that it currently holds 3 records, that it is a capped collection (fixed-size collection), and that its size is 5242880 B = 5 MB.

Oplog data structure

To analyze the meaning of the fields in an oplog entry, fetch a single entry with the following command:

db.oplog.rs.find().skip(1).limit(1).toArray()

ts: an 8-byte timestamp, made up of a 4-byte UNIX timestamp plus a 4-byte auto-incrementing counter. This value is very important: when a new primary is elected (for example, after the primary goes down), the secondary with the largest ts is chosen as the new primary
op: a 1-byte operation type
- "i": insert
- "u": update
- "d": delete
- "c": db cmd
- "db": declares the current database (in which case ns is set to the database name + '.')
- "n": no-op, an empty operation that is executed periodically to ensure timeliness
ns: the namespace the operation applies to
o: the document that corresponds to the operation, that is, the content of the operation (for an update, the fields and values to be updated)
o2: the where condition of an update operation; this field is present only for updates
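Because ts combines a seconds value with a counter, two entries are ordered first by time and then by counter. Here is a minimal sketch of that comparison. The { t, i } object shape, the helper names, and the sample values are assumptions for illustration, and a real election takes more than ts into account.

```javascript
// Compare two timestamps of the form { t: seconds, i: counter }.
// Returns a negative number, zero, or a positive number, like a sort comparator.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// Pick the member whose last applied entry has the largest ts.
function mostUpToDate(members) {
  return members.reduce((best, m) => (compareTs(m.ts, best.ts) > 0 ? m : best));
}

// Illustrative values only; the ports match the example replica set.
const members = [
  { host: "localhost:22222", ts: { t: 1418200000, i: 2 } },
  { host: "localhost:33333", ts: { t: 1418200000, i: 5 } },
];
console.log(mostUpToDate(members).host); // localhost:33333
```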

Size of the Oplog

A capped collection is a fixed-size collection in MongoDB that provides high-performance insert, read, and delete operations. When the collection fills up, newly inserted documents overwrite the oldest documents.
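The overwrite behavior can be modeled in a few lines. The CappedList class below is a hypothetical toy, not MongoDB code, and it caps the collection by document count where a real capped collection is limited by size in bytes.

```javascript
// Toy capped collection: fixed capacity, oldest document dropped first.
class CappedList {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // the new document pushes out the oldest one
    }
  }

  find() {
    return this.docs.slice(); // documents come back in insertion order
  }
}

const capped = new CappedList(3);
for (let n = 1; n <= 5; n++) {
  capped.insert({ n });
}
console.log(capped.find().map((d) => d.n)); // [ 3, 4, 5 ]
```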

Using a capped collection for the oplog therefore makes sense, because the oplog cannot be allowed to grow without limit. MongoDB picks a default oplog size when the replica set is initialized:

On 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the free disk space to the oplog, or 1 GB if that 5% is less than 1 GB
On 64-bit OS X systems, 183 MB is allocated
On 32-bit systems, only 48 MB is allocated
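The three default-size rules listed here can be written as a small function. This is only a restatement of those rules; the platform labels and the defaultOplogSize name are made up for the sketch and are not part of any MongoDB API.

```javascript
const MB = 1024 * 1024;
const GB = 1024 * MB;

// Default oplog size in bytes, following the three rules listed in the article.
// `platform` is an illustrative label: "32-bit", "osx-64", or anything else
// for the 64-bit Linux/Solaris/FreeBSD/Windows case.
function defaultOplogSize(platform, freeDiskBytes) {
  if (platform === "32-bit") return 48 * MB;
  if (platform === "osx-64") return 183 * MB;
  return Math.max(freeDiskBytes / 20, 1 * GB); // 5% of free space, at least 1 GB
}

console.log(defaultOplogSize("linux-64", 100 * GB) / GB); // 5
console.log(defaultOplogSize("linux-64", 10 * GB) / GB);  // 1
```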

The oplog size needs careful consideration. If it is too large, storage space is wasted; if it is too small, old oplog records are overwritten quickly, and a node that has been down is then likely to be unable to synchronize its data.

For example, building on the example above, stop one secondary node (port=33333), insert the following records through the primary node, and then look at the oplog: the earlier oplog entries have been overwritten.
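A rough way to reason about this trade-off is to estimate how long the oplog lasts before it wraps around at a steady write rate. The function below is a back-of-the-envelope sketch: the name and the 1 KB/s rate are assumptions, and real workloads are rarely this uniform.

```javascript
// Rough estimate: seconds until the oplog wraps, given a steady rate of
// oplog bytes written per second on the primary.
function estimateWindowSeconds(oplogSizeBytes, oplogBytesPerSecond) {
  return oplogSizeBytes / oplogBytesPerSecond;
}

const fiveMB = 5 * 1024 * 1024;
console.log(estimateWindowSeconds(fiveMB, 1024)); // 5120 seconds at 1 KB/s
```

A node that stays down longer than this window will find that the entries it needs have already been overwritten.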

for (var i = 0; i < 10000; i++) {
    var randAge = parseInt(5 * Math.random()) + ...;
    var gender = (randAge % 2) ? "Male" : "Female";
    db.school.students.insert({"name": "Will" + i, "gender": gender, "age": randAge});
}

Then restart the secondary node that was stopped (port=33333). The server output shows that the primary's oplog is now too new for this node: the operations it missed have already been overwritten, so the secondary cannot synchronize.

Connecting to this node through the MongoDB shell shows that it stays in the RECOVERING state.
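The condition behind this failure can be stated as a simple comparison. The sketch below uses a simplified rule and plain numeric timestamps, and canCatchUp is a hypothetical helper, not a MongoDB function.

```javascript
// Simplified rule: a secondary can catch up only if the last operation it
// applied is not older than the oldest entry still held in the primary's oplog.
function canCatchUp(primaryOplog, lastAppliedTs) {
  if (primaryOplog.length === 0) return false;
  const oldestTs = primaryOplog[0].ts; // entries are kept in insertion order
  return lastAppliedTs >= oldestTs;
}

// The primary's small oplog has wrapped: only entries 9998 to 10000 remain.
const wrappedOplog = [{ ts: 9998 }, { ts: 9999 }, { ts: 10000 }];

console.log(canCatchUp(wrappedOplog, 3));    // false: the node needs a resync
console.log(canCatchUp(wrappedOplog, 9999)); // true: replication can resume
```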

Data synchronization

In a replica set, data is synchronized in two ways:

Initial sync: this process occurs when a new database is created in the replica set, when a node has just recovered from an outage, or when a new member is added to the replica set. By default, a node copies the oplog from the member closest to it in order to synchronize its data; that member can be the primary or a secondary with an up-to-date copy of the oplog.
- This operation typically re-initializes the secondary node and has a large overhead
Replication: this operation runs continuously after initial sync to keep the data on each secondary node synchronized.

Initial sync

When a node cannot sync, as in the example above, the only remedy is an initial sync, which can be done in the following two ways:

The first way is to stop the node, delete the files in its data directory, and restart it. The node then performs an initial sync
- Note: with this approach the sync time depends on the amount of data; if there is a lot of data, the sync will take a very long time
- It also generates a lot of network traffic, which may affect the work of other nodes
The second way is to stop the node, delete the files in its data directory, find a node with more recent data, and copy the files from that node's data directory into this node's directory.

You can restore the "port=33333" node with either of these two methods. The steps are not shown here.

Replica set management

View oplog information

The db.printReplicationInfo() command lets you view oplog information.

Field Description:

configured oplog size: the size of the oplog
log length start to end: the time span covered by the oplog
oplog first event time: the time the first oplog entry was generated
oplog last event time: the time the last oplog entry was generated
now: the current time
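The log length is simply the difference between the last and first event times. A minimal sketch of that arithmetic, where the oplogWindow name and the sample dates are illustrative only:

```javascript
// Time span covered by the oplog, from its first and last event times.
function oplogWindow(firstEvent, lastEvent) {
  const secs = Math.round((lastEvent.getTime() - firstEvent.getTime()) / 1000);
  return { secs: secs, hours: Number((secs / 3600).toFixed(2)) };
}

const firstEvent = new Date("2014-12-10T08:00:00Z");
const lastEvent = new Date("2014-12-10T09:30:00Z");
console.log(oplogWindow(firstEvent, lastEvent)); // { secs: 5400, hours: 1.5 }
```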

View slave status

"db.printSlaveReplicationInfo()" lets you view the synchronization status of the slaves (secondary nodes).

If we insert a new record and then check the slave status again, we find that the sync time has been updated.
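The sync time can be turned into a lag figure by comparing it with the time of the primary's latest operation. The helper below is a hypothetical sketch with made-up sample times; it does not reproduce the output of the shell command.

```javascript
// Replication lag in whole seconds between the primary's latest operation
// and the time a secondary has synced to. Both arguments are Date objects.
function lagSeconds(primaryLastOp, syncedTo) {
  const diffMs = primaryLastOp.getTime() - syncedTo.getTime();
  return Math.max(0, Math.round(diffMs / 1000));
}

const primaryLastOp = new Date("2014-12-10T09:30:10Z");
const lagBefore = lagSeconds(primaryLastOp, new Date("2014-12-10T09:30:00Z"));
const lagAfter = lagSeconds(primaryLastOp, new Date("2014-12-10T09:30:10Z"));
console.log(lagBefore, lagAfter); // 10 0
```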

Summary

This article described how replica sets work, using the oplog and data synchronization to give a deeper understanding of replica sets.

In practice you will inevitably need to change the oplog size at some point. That is not covered in this article; please follow the steps in the MongoDB documentation on changing the oplog size.

PS: All the commands used in the examples can be found at the following link:

http://files.cnblogs.com/wilber2013/oplog.js


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
