MongoDB Oplog in Detail

Source: Internet
Author: User

1: Oplog Introduction

The oplog is a capped (fixed-size) collection in the local database, and secondaries replicate by reading the primary's oplog. Each node keeps its own oplog, which records the operations replicated from the primary, so that any member can serve as a synchronization source for other nodes.

The oplog can be described as the link that holds MongoDB replication together.

2: The process of replica set data synchronization

The detailed procedure for data synchronization in a replica set is as follows. The primary node writes the data. A secondary reads the primary's oplog, applies the operations to its own data, and then writes the applied operations to its own oplog. A secondary stops replicating from its current sync source if an operation fails (which can happen only if the data on the sync source is corrupted or inconsistent with the primary node).

If a secondary goes down for some reason, it automatically resumes synchronization from the last operation in its own oplog when it restarts. Because replication applies an operation to the data first and writes it to the oplog afterwards, the same operation may be applied twice. MongoDB was designed with this in mind: applying the same oplog operation multiple times has the same effect as applying it once.
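
The idempotency described above can be illustrated with a small in-memory sketch. It is not MongoDB code: applyOp and the Map-based "collection" are hypothetical helpers, and the entries only mimic the shape of oplog documents. The sketch assumes that an increment issued by a client is logged as the resulting value (a $set), which is what makes a replay harmless.

```javascript
// Hypothetical in-memory "collection": a Map from _id to document.
// applyOp is an illustrative helper, not a MongoDB API.
function applyOp(collection, entry) {
  switch (entry.op) {
    case "i": // insert acts like an upsert here, so a repeat is harmless
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update carries the resulting field values, not a delta
      const current = collection.get(entry.o2._id) || { _id: entry.o2._id };
      collection.set(entry.o2._id, { ...current, ...entry.o.$set });
      break;
    }
    case "d": // deleting a document that is already gone does nothing
      collection.delete(entry.o._id);
      break;
  }
}

const sampleOplog = [
  { op: "i", o: { _id: 1, views: 0 } },
  // Assumption: a client-side {$inc: {views: 1}} is logged as its result.
  { op: "u", o2: { _id: 1 }, o: { $set: { views: 1 } } },
];

const appliedOnce = new Map();
sampleOplog.forEach((entry) => applyOp(appliedOnce, entry));

const appliedTwice = new Map();
sampleOplog.forEach((entry) => applyOp(appliedTwice, entry));
sampleOplog.forEach((entry) => applyOp(appliedTwice, entry)); // replay after a restart

console.log(appliedOnce.get(1), appliedTwice.get(1));
```

Both maps end up holding the same document, which is the property a restarted secondary relies on when it re-applies an operation it had already applied.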

    • Role:

When the primary handles a write, the write is recorded in the primary's oplog. Secondaries then copy these entries to their local oplog and apply them.
Because the oplog records the write operations performed on the primary, it can also be used for data recovery.
It can loosely be regarded as the counterpart of the binlog in MySQL.

3: The growth rate of the oplog

The oplog has a fixed size, so it can hold only a limited number of operation log entries. Its space usage generally grows at the same rate as the system handles write requests: if the primary node processes 1 KB of writes per minute, the oplog also grows by about 1 KB per minute. If a single operation affects multiple documents (such as deleting or updating multiple documents), the oplog records one entry per affected document. For example, if db.testcoll.remove() deletes 1 million documents, 1 million entries are written to the oplog. A large volume of such operations can fill the oplog quickly.
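
The fan-out from one multi-document operation to many oplog entries can be sketched as follows. This is a simulation in plain JavaScript, not MongoDB internals: removeMany, the array standing in for a collection, and the "test.testcoll" namespace are all assumptions made for illustration.

```javascript
// Hypothetical helper: delete every matching document and return the
// oplog-style entries such a delete would generate, one "d" entry per
// removed document.
function removeMany(docs, predicate, ns) {
  const entries = [];
  // Iterate backwards so splice() does not shift unvisited elements.
  for (let i = docs.length - 1; i >= 0; i--) {
    if (predicate(docs[i])) {
      entries.push({ op: "d", ns, o: { _id: docs[i]._id } });
      docs.splice(i, 1);
    }
  }
  return entries;
}

const collectionDocs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  even: i % 2 === 0,
}));

// One logical delete statement...
const deleteEntries = removeMany(collectionDocs, (d) => d.even, "test.testcoll");

// ...produces as many oplog entries as documents it removed.
console.log(deleteEntries.length, collectionDocs.length);
```

Scaled up to the 1 million documents in the example above, a single statement on the primary becomes 1 million entries that every secondary has to fetch and apply.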

    • Size:

The oplog is a capped collection.
On 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB defaults to 5% of the available disk space (with a default minimum of 1 GB and a maximum of 50 GB). Alternatively, you can set oplogSize (replication.oplogSizeMB in the YAML configuration format) in the mongod configuration file to the value you need before the MongoDB replica set instance is initialized.

local.oplog.rs is a capped collection. You can also set its size with the --oplogSize option on the command line.
Choose the size with care, however, because the oplog underpins the normal operation of replication as well as the safety and disaster tolerance of the data.

4: Oplog Precautions:

local.oplog.rs is a special collection used to record the operations of the primary node.

To improve the efficiency of replication, all nodes in the replica set send heartbeats (pings) to one another. Each node can fetch the oplog from other nodes.

An operation in the oplog has the same effect no matter how many times it is applied.

5: Size of the oplog

The first time you start a node in a replica set, MongoDB creates the oplog with a default size that depends on the machine's operating system.

rs.printReplicationInfo() shows the status of the oplog. Its output includes the oplog size and the time of the first recorded operation.

db.getReplicationInfo() can be used to view the status, size, and time range of the oplog.

Size of the Oplog

A capped collection is a fixed-size collection in MongoDB that provides high-performance insert and read operations. When the collection is full, newly inserted documents overwrite the oldest documents.
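
A minimal sketch of this overwrite behavior, assuming a simple byte budget: CappedLog is a hypothetical class written for this article, and the length of the JSON string is only a crude stand-in for the real BSON document size.

```javascript
// Sketch of capped-collection behavior: a fixed byte budget, insertion
// order preserved, oldest documents evicted when the budget is exceeded.
class CappedLog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = []; // oldest first
  }

  insert(doc) {
    const size = JSON.stringify(doc).length; // stand-in for BSON size
    if (size > this.maxBytes) {
      throw new Error("document larger than the cap");
    }
    // Evict from the old end until the new document fits.
    while (this.usedBytes + size > this.maxBytes) {
      const evicted = this.docs.shift();
      this.usedBytes -= evicted.size;
    }
    this.docs.push({ doc, size });
    this.usedBytes += size;
  }

  oldest() {
    return this.docs.length ? this.docs[0].doc : null;
  }

  newest() {
    return this.docs.length ? this.docs[this.docs.length - 1].doc : null;
  }
}

const cappedLog = new CappedLog(60);
for (let i = 1; i <= 10; i++) {
  cappedLog.insert({ ts: i, op: "i" });
}

console.log(cappedLog.oldest(), cappedLog.newest(), cappedLog.usedBytes);
```

After ten inserts only the most recent few entries remain, and the earliest ones are gone for good. A secondary that still needs one of the evicted entries can no longer find it on this node.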

Using a capped collection for the oplog is therefore reasonable, because the oplog cannot be allowed to grow without limit. MongoDB picks a default oplog size when the replica set is initialized:

    • On 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the disk's free space to the oplog, or 1 GB if that amount is less than 1 GB
    • 183 MB is allocated on 64-bit OS X systems
    • Only 48 MB is allocated on 32-bit systems
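
The 5% rule for 64-bit Linux, Solaris, FreeBSD, and Windows can be written as a small function. This is a sketch of the rule as stated in this article (5% of free space, at least 1 GB, at most 50 GB), not code taken from MongoDB. The function name is hypothetical, and the exact limits may differ between MongoDB versions and storage engines.

```javascript
const GB = 1024 ** 3;

// Hypothetical helper: default oplog size under the "5% of free disk,
// clamped to [1 GB, 50 GB]" rule described in the text.
function defaultOplogSizeBytes(freeDiskBytes) {
  const fivePercent = Math.floor(freeDiskBytes / 20); // 5% without float error
  return Math.min(Math.max(fivePercent, 1 * GB), 50 * GB);
}

console.log(defaultOplogSizeBytes(10 * GB) / GB);   // small disk: floor applies
console.log(defaultOplogSizeBytes(200 * GB) / GB);  // mid-size disk: plain 5%
console.log(defaultOplogSizeBytes(2000 * GB) / GB); // large disk: ceiling applies
```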

The oplog size deserves careful thought. If the oplog is too large, it wastes storage space. If it is too small, old oplog entries are overwritten quickly, and a node that has been down for a while may find itself unable to catch up.
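
One way to reason about this trade-off is to estimate how many hours of writes the oplog can hold and compare that with the longest outage a node should survive. The sketch below is back-of-the-envelope arithmetic under the assumption of a steady write rate. Both function names and the sample figures are made up for illustration.

```javascript
// Hypothetical helper: hours of history an oplog of the given size can
// hold at a steady rate of oplog growth.
function estimateOplogWindowHours(oplogSizeBytes, oplogBytesPerHour) {
  if (oplogBytesPerHour <= 0) return Infinity; // no writes, nothing is overwritten
  return oplogSizeBytes / oplogBytesPerHour;
}

// A node can catch up only if it comes back before the entries it needs
// have been overwritten.
function survivesOutage(oplogSizeBytes, oplogBytesPerHour, outageHours) {
  return estimateOplogWindowHours(oplogSizeBytes, oplogBytesPerHour) > outageHours;
}

const MB = 1024 ** 2;
const oplogSize = 1024 * MB;  // 1 GB oplog (sample value)
const writeRate = 100 * MB;   // 100 MB of oplog per hour (sample value)

console.log(estimateOplogWindowHours(oplogSize, writeRate)); // about 10.24 hours
console.log(survivesOutage(oplogSize, writeRate, 8));
console.log(survivesOutage(oplogSize, writeRate, 12));
```

Real write rates are rarely steady, so a window measured at peak load is the safer number to plan with.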

For example, we stopped a secondary node (port=33333), inserted records through the primary node, and then looked at the oplog. The earlier oplog entries had been overwritten.

Connecting to the stopped node through the MongoDB shell after restarting it showed that it remained in the RECOVERING state.
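
The reason the node cannot leave this state can be sketched as a comparison between the last operation the secondary applied and the oldest operation its sync source still holds. This is a simplified model, not MongoDB's actual implementation: timestamps are represented as {t, i} pairs (seconds and counter), and both function names are hypothetical.

```javascript
// Compare two oplog timestamps modeled as {t: seconds, i: counter}.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// Hypothetical decision helper: the secondary can resume normal
// replication only if the source's oplog still contains the point
// where the secondary left off.
function syncDecision(secondaryLastApplied, sourceOldestEntry) {
  return compareTs(secondaryLastApplied, sourceOldestEntry) >= 0
    ? "resume"
    : "initial sync required";
}

// The source still holds everything after the secondary's last entry.
console.log(syncDecision({ t: 100, i: 1 }, { t: 90, i: 5 }));

// The source has already overwritten the entries the secondary needs.
console.log(syncDecision({ t: 80, i: 9 }, { t: 90, i: 5 }));
```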

Workaround:

Data synchronization

In a replica set, there are two ways of synchronizing data:

    • Initial sync: This process occurs when a new database is created in the replica set, when a node has just recovered from an outage, or when a new member is added to the replica set. By default, a node in the replica set copies the oplog from the member closest to it in order to synchronize its data. This closest member can be the primary or a secondary with an up-to-date copy of the oplog.
      • This operation typically re-initializes the secondary node and carries a large overhead
    • Replication: This operation runs continuously after initial sync to keep the data on every secondary node in sync.

Initial sync

When a node cannot sync, as in the example above, an initial sync can be triggered in only the following two ways:

    • The first way is to stop the node, delete the files in its data directory, and restart the node. The node then performs an initial sync
      • Note: With this method, the sync time depends on the amount of data. If the data set is very large, the sync takes a very long time
      • It also generates a lot of network traffic, which may affect the work of other nodes
    • The second way is to stop the node, delete the files in its data directory, find a node with more recent data, and copy the files from that node's data directory into this node's data directory.

Either method restores the "port=33333" node and clears the condition that kept it in the RECOVERING state.

6: Oplog data structure

The following explains the meaning of the fields in an oplog entry. Run this command against the local database to fetch one entry:

db.oplog.rs.find().skip(1).limit(1).toArray()
    • ts: an 8-byte timestamp, made up of a 4-byte UNIX timestamp plus a 4-byte incrementing counter. This value is very important: when a new primary is elected (for example, after the primary goes down), the secondary with the largest ts is chosen as the new primary
    • op: the 1-byte operation type
      • "i": insert
      • "u": update
      • "d": delete
      • "c": db cmd
      • "db": declares the current database (ns is set to the database name + '.')
      • "n": no-op, an empty operation that is written periodically to ensure timeliness
    • ns: the namespace of the operation
    • o: the document that corresponds to the operation, that is, the content of the operation (for an update, the fields and values to be updated)
    • o2: the query condition (the "where" clause) of an update; present only for update operations
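
The field layout above can be made concrete with a short sketch that splits an 8-byte ts into its two halves and turns the op code into a readable name. It assumes the seconds occupy the high 32 bits and the counter the low 32 bits. The sample entry, its values, and the helper names are hypothetical and only imitate the shape of a real oplog document.

```javascript
// Operation codes as listed in the field description.
const OP_NAMES = {
  i: "insert",
  u: "update",
  d: "delete",
  c: "command",
  db: "declare database",
  n: "no-op",
};

// Split a 64-bit timestamp into its seconds and counter parts
// (assumption: seconds in the high 32 bits, counter in the low 32 bits).
function splitTimestamp(ts64) {
  const value = BigInt(ts64);
  return {
    seconds: Number(value >> 32n),
    counter: Number(value & 0xffffffffn),
  };
}

// Produce a one-line summary of an oplog-style entry.
function describeEntry(entry) {
  const { seconds, counter } = splitTimestamp(entry.ts);
  const name = OP_NAMES[entry.op] || "unknown";
  return `${name} on ${entry.ns} at ${seconds}:${counter}`;
}

// Made-up sample entry for an update of one field.
const sampleEntry = {
  ts: (1700000000n << 32n) | 3n,
  op: "u",
  ns: "test.users",
  o: { $set: { age: 31 } },
  o2: { _id: 1 },
};

console.log(describeEntry(sampleEntry));
```

Comparing two ts values as plain 64-bit integers orders entries first by second and then by counter, which is why the combined value works as a replication position.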

View the oplog's information

The db.printReplicationInfo() command shows information about the oplog.

Field Description:

    • configured oplog size: the size of the oplog
    • log length start to end: the time span covered by the oplog
    • oplog first event time: when the first operation log entry was generated
    • oplog last event time: when the last operation log entry was generated
    • now: the current time
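
The same figures can be checked programmatically. The sketch below works on a plain object shaped like the result of db.getReplicationInfo(). The field names used (logSizeMB, usedMB, and timeDiff in seconds) are an assumption about that result and should be verified against your MongoDB version. The sample values and the summarizeOplog helper are invented for illustration.

```javascript
// Hypothetical helper: condense replication info into the numbers that
// matter when judging whether the oplog is large enough.
function summarizeOplog(info, minWindowHours) {
  const windowHours = info.timeDiff / 3600; // timeDiff assumed to be in seconds
  return {
    usedPercent: Math.round((info.usedMB / info.logSizeMB) * 100),
    windowHours,
    windowTooShort: windowHours < minWindowHours,
  };
}

// Made-up sample shaped like replication info output.
const sampleInfo = { logSizeMB: 1024, usedMB: 512, timeDiff: 7200 };

console.log(summarizeOplog(sampleInfo, 24));
```

In this sample the oplog spans only two hours of operations, far less than the 24 hours asked for, so the oplog would be a candidate for resizing.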

View slave status

db.printSlaveReplicationInfo() (named db.printSecondaryReplicationInfo() in newer MongoDB releases) shows the synchronization status of the slaves.

Run db.printSlaveReplicationInfo() on a replica node to view its synchronization status:

    • source: the IP address and port of the slave
    • syncedTo: the point up to which the slave has synced, along with how far it lags behind
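
The delay reported next to syncedTo is the gap between the primary's latest operation and the point the slave has reached. A minimal sketch of that arithmetic, using hypothetical dates and a hypothetical helper name:

```javascript
// Hypothetical helper: replication lag in seconds, given the time of the
// primary's last operation and the time the secondary has synced to.
function replicationLagSeconds(primaryLastOpTime, secondarySyncedTo) {
  const diffMs = primaryLastOpTime.getTime() - secondarySyncedTo.getTime();
  return Math.max(0, diffMs / 1000); // never report a negative lag
}

const primaryLastOp = new Date("2024-01-01T00:00:10Z"); // sample value
const syncedTo = new Date("2024-01-01T00:00:04Z");      // sample value

console.log(replicationLagSeconds(primaryLastOp, syncedTo)); // 6
```

A lag that keeps growing while the primary is under steady load is an early sign that the slave may eventually fall outside the oplog window.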

When we insert a new document and then check the slave status again, we find that the syncedTo time has been updated.
