MongoDB Data Migration and Synchronization

MongoDB data replication

MongoDB replication requires at least two instances. One is the master node, which handles client requests; the rest are slave nodes, which replicate data from the master.

Master write processing:
The master receives write requests and processes them as follows:

If journaling is turned on, each write request is first recorded in the journal, the journaled writes are then executed in batches, and each operation is recorded in the oplog.
If journaling is turned off, each write request is executed individually and then written to the oplog.

Note: Oplog entries are idempotent. An increment operation ($inc) is recorded in the oplog as a set operation ($set) carrying the resulting value, so the result is the same no matter how many times the entry is replayed.
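The idea behind the note can be sketched in a few lines of JavaScript. This is a toy model, not MongoDB's real oplog format: the entry shape {op, field, value} and the function names applyOnMaster and applyOplogEntry are assumptions made for illustration.

```javascript
// Toy model: a document is a plain object, an oplog entry is {op, field, value}.
// These shapes are illustrative only and do not match MongoDB's oplog format.

// Apply a client write on the master and return the oplog entry to record.
function applyOnMaster(doc, write) {
  if (write.op === "inc") {
    doc[write.field] = (doc[write.field] || 0) + write.value;
    // Record the resulting value, not the increment, so replay is idempotent.
    return { op: "set", field: write.field, value: doc[write.field] };
  }
  doc[write.field] = write.value;
  return { op: "set", field: write.field, value: write.value };
}

// Apply an oplog entry on a slave. Only "set" entries ever reach the oplog here.
function applyOplogEntry(doc, entry) {
  if (entry.op !== "set") throw new Error("unexpected op " + entry.op);
  doc[entry.field] = entry.value;
}

const masterDoc = { _id: 1, views: 10 };
const entry = applyOnMaster(masterDoc, { op: "inc", field: "views", value: 5 });

const slaveDoc = { _id: 1, views: 10 };
applyOplogEntry(slaveDoc, entry);
applyOplogEntry(slaveDoc, entry); // replayed a second time: the value does not change

console.log(masterDoc.views, slaveDoc.views); // 15 15
```

Had the increment itself been recorded, the second replay would have left the slave at 20 instead of 15.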

Synchronization on a slave node:

A new slave node first copies the database files from the master and records the time at which the copy started. When the copy is complete, the slave applies the oplog entries written after that start time to catch up with the master.
After this initial synchronization, the slave periodically fetches the latest operations from the master's oplog and applies them to its own copy of the data, so that the slave's data is eventually consistent with the master's.

Note: The oplog has a fixed size. If a slave cannot keep up with the master's update rate, new operations overwrite old oplog records before the slave has read them. The slave can then no longer guarantee that it has every operation, so it starts a full resynchronization from the beginning.
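The decision described in the note can be modeled roughly as follows. This is a simplified sketch: the Oplog class, the integer timestamps, and the planSync function are hypothetical names for this example, and the capacity is counted in entries rather than bytes.

```javascript
// Fixed-capacity oplog: once full, the oldest entry is dropped for each new one.
class Oplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = []; // each entry: { ts, op }
    this.nextTs = 1;
  }
  append(op) {
    this.entries.push({ ts: this.nextTs++, op });
    if (this.entries.length > this.capacity) this.entries.shift();
  }
  oldestTs() {
    return this.entries.length ? this.entries[0].ts : null;
  }
  entriesAfter(ts) {
    return this.entries.filter((e) => e.ts > ts);
  }
}

// Decide how a slave that has applied everything up to lastAppliedTs catches up.
function planSync(oplog, lastAppliedTs) {
  const oldest = oplog.oldestTs();
  // The slave can continue only if the entry right after its position still exists.
  if (oldest !== null && oldest > lastAppliedTs + 1) {
    return { action: "full-resync" };
  }
  return { action: "incremental", entries: oplog.entriesAfter(lastAppliedTs) };
}

const oplog = new Oplog(3);
for (let i = 1; i <= 5; i++) oplog.append("write-" + i); // only ts 3, 4, 5 remain

console.log(planSync(oplog, 4).action); // incremental
console.log(planSync(oplog, 1).action); // full-resync
```

A slave that stopped at timestamp 1 needs entry 2, which has already been overwritten, so it has to start over.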

MongoDB master-slave synchronization principle

Master-slave replication is the traditional form of MongoDB replication (it was removed in MongoDB 4.0 in favor of replica sets) and is flexible enough to be used for backup, failure recovery, and read scaling. You start one node as the master and the others as slaves, and each slave must be given the master's address at startup. This mode has no automatic failover.

By default, a slave accepts neither writes nor reads, but it can be configured as readable to separate reads from writes and improve system performance.

Operation

Replication can be set up in two ways: on the command line or in the configuration file.

Command line

Master start command

# mongod --config /etc/mongodb.conf --master

Slave start command

# mongod --config /etc/mongodb.conf --slave --autoresync --slavedelay=10 --source master-ip:master-port

Configuration file

Master configuration:
master = true

Slave configuration:
slave = true
source = master-ip:master-port
autoresync = true
slavedelay = 10

Options

--only
Replicates only the specified database on the slave node (by default, all databases are replicated).
--slavedelay
Used on a slave node: the delay (in seconds) to add before applying the master's operations. This makes it easy to set up a delayed slave, which protects against mistakes such as a user accidentally deleting important documents or inserting garbage data. Such bad operations are replicated to every slave, but a delayed slave applies them late, which leaves a window of time for recovery.
--fastsync
Starts a slave node from a data snapshot of the master node. If the slave's data directory begins as a snapshot of the master's data, starting the slave with this option is much faster than a full synchronization.
--autoresync
Automatically resynchronizes the slave node if it falls out of sync with the master node.
--oplogSize
The size of the master node's oplog (in megabytes). The default oplog size is 5% of the free disk space.
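To make the effect of --slavedelay concrete, here is a small sketch of how a delayed slave might decide which operations it is allowed to apply. The entry shape and the function name entriesDueForApply are assumptions for this example, not MongoDB internals.

```javascript
// Return the oplog entries a delayed slave may apply at time nowSec.
// Each entry carries the master-side time (in seconds) at which it was written.
function entriesDueForApply(entries, nowSec, slaveDelaySec) {
  return entries.filter((e) => e.writtenAtSec + slaveDelaySec <= nowSec);
}

const pending = [
  { writtenAtSec: 100, op: "insert good document" },
  { writtenAtSec: 108, op: "accidental delete" },
];

// With a 10-second delay, at t=112 only the first entry is old enough to apply.
const due = entriesDueForApply(pending, 112, 10);
console.log(due.map((e) => e.op)); // [ 'insert good document' ]
```

Until t=118 the accidental delete has not reached the delayed slave, which is the recovery window the option description refers to.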

MongoDB replica set principle

A replica set is a master-slave cluster with automatic failover. The most obvious difference from a master-slave cluster is that a replica set has no fixed master node: the cluster elects one, and when that node stops working the role moves to another node. A replica set always has one active node (primary) and one or more backup nodes (secondary).

When the active node has a problem, the replica set switches to a new active node automatically.

Node types

At any time, the cluster has exactly one active node, and all the others are backup nodes. The active node is simply whichever server currently holds that role, and that can change over time.

A replica set can contain several types of nodes:

Standard node
A regular node. It stores a complete copy of the data, votes in elections, and may become the active node.
Passive node
Stores a complete copy of the data and votes in elections, but cannot become the active node.
Arbiter
Only votes in elections. It does not receive replicated data and cannot become the active node.

Priority

Every node except arbiters has a priority, and nodes with a higher priority are preferred in elections. The default priority is 1, and the value can range from 0 to 1000 (inclusive).
Set the priority key in a node's configuration to make it a standard node or a passive node (a priority of 0 makes the node passive).

> members.push({"_id": 3, "host": "127.0.0.1:10002", "priority": 40})

The "arbiterOnly" key marks a node as an arbiter:

> members.push({"_id": 4, "host": "127.0.0.1:10003", "arbiterOnly": true})

A backup node fetches the oplog from the active node and applies the operations, just like the backup server in an active-backup system. The active node also writes each operation to its own local oplog. Every oplog entry carries a strictly increasing sequence number, which is used to determine how up to date a node's data is.
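Putting the priority and arbiterOnly keys together, a complete configuration document for the set might look like the sketch below. The members with _id 1 and 2 and the helper names buildReplicaSetConfig and roleOf are assumptions added for this example; members 3 and 4 follow the snippets in this section. The code only builds and inspects a plain object, so it runs without a MongoDB server.

```javascript
// Build a replica set configuration document as a plain object.
function buildReplicaSetConfig(name) {
  const members = [];
  members.push({ _id: 1, host: "127.0.0.1:10000" });              // standard, default priority 1
  members.push({ _id: 2, host: "127.0.0.1:10001", priority: 0 }); // passive
  members.push({ _id: 3, host: "127.0.0.1:10002", priority: 40 }); // standard, preferred in elections
  members.push({ _id: 4, host: "127.0.0.1:10003", arbiterOnly: true }); // arbiter
  return { _id: name, members };
}

// Classify a member using the rules from the node type and priority descriptions.
function roleOf(member) {
  if (member.arbiterOnly) return "arbiter";
  const priority = member.priority === undefined ? 1 : member.priority;
  return priority === 0 ? "passive" : "standard";
}

const config = buildReplicaSetConfig("refactor");
console.log(config.members.map((m) => m.host + " -> " + roleOf(m)));

// In the mongo shell, a document of this shape is what rs.initiate(config) expects.
```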

Election strategy

If the active node fails, the remaining nodes elect a new one. Any backup node can initiate the election, and the new active node must be chosen by a majority of the replica set. Arbiters also vote, which helps avoid a deadlock. The new active node is the node with the highest priority; among nodes with the same priority, the one with the more recent data wins.

Whenever the active node changes, the data on the new active node is assumed to be the most recent data in the system. Operations that exist only on other nodes (for example, on the former active node) are rolled back, even if the former active node comes back online. To complete the rollback, every node resynchronizes after connecting to the new active node: it examines its own oplog, finds the operations that the active node does not have, and requests from the active node the most recent copy of the documents affected by those operations. A node that is resynchronizing is considered to be recovering and cannot be a candidate for active node until the process finishes.
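The selection rule described above (majority required, highest priority first, freshest data as the tie-breaker) can be sketched as a single function. This is a deliberate simplification of the real election protocol, and the member fields (up, arbiter, lastOpSeq) and the name electActiveNode are hypothetical.

```javascript
// members: [{ host, priority, lastOpSeq, arbiter, up }]
// Returns the host of the new active node, or null if no election can succeed.
function electActiveNode(members) {
  const voters = members.filter((m) => m.up);
  // A majority of the whole set must be reachable, not just of the survivors.
  if (voters.length * 2 <= members.length) return null;

  // Arbiters and passive nodes (priority 0) vote but cannot be elected.
  const candidates = voters.filter((m) => !m.arbiter && m.priority > 0);
  if (candidates.length === 0) return null;

  // Highest priority first; on a tie, the higher oplog sequence number wins.
  candidates.sort((a, b) => b.priority - a.priority || b.lastOpSeq - a.lastOpSeq);
  return candidates[0].host;
}

const members = [
  { host: "127.0.0.1:10000", priority: 1, lastOpSeq: 120, arbiter: false, up: false }, // failed active node
  { host: "127.0.0.1:10001", priority: 1, lastOpSeq: 118, arbiter: false, up: true },
  { host: "127.0.0.1:10002", priority: 1, lastOpSeq: 119, arbiter: false, up: true },
  { host: "127.0.0.1:10003", priority: 0, lastOpSeq: 0, arbiter: true, up: true },
];

console.log(electActiveNode(members)); // 127.0.0.1:10002
```

Without the arbiter's vote, only two of the four members would be reachable and no majority could form, which is the deadlock the arbiter is there to avoid.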

Operation

Command-line initialization

Setting up a replica set is a little more complicated than setting up a master-slave cluster.

First, give the replica set a name. The name distinguishes it from other replica sets and lets the cluster be referred to as a whole. Here it is named refactor.

Then start the server with the --replSet option, which tells the server that it belongs to the "refactor" replica set and that another member is located at 127.0.0.1:10001:

# mongod --dbpath /data1/mongodb --port 10000 --replSet refactor/127.0.0.1:10001 --logpath /data1/log/mongodb/mongodb.log --rest

Start another in the same way. Instances that run on the same host each need their own data directory and log file, so adjust --dbpath and --logpath in the commands below accordingly:

# mongod --dbpath /data1/mongodb --port 10001 --replSet refactor/127.0.0.1:10000 --logpath /data1/log/mongodb/mongodb.log --rest

To add a third member, point it at either of the existing members (127.0.0.1:10000 or 127.0.0.1:10001), for example:

# mongod --dbpath /data1/mongodb --port 10002 --replSet refactor/127.0.0.1:10000 --logpath /data1/log/mongodb/mongodb.log --rest

Note: Replica sets discover their members automatically. After you specify a single server, MongoDB searches for and connects to the remaining nodes.

Initialization with mongo shell commands

MongoDB data migration: mongodump and mongorestore

MongoDB data backup

In MongoDB, the mongodump command backs up data. It can export all data to a specified directory.
Parameters to mongodump specify the server from which the data is exported.
Syntax
The mongodump command syntax is as follows:

# mongodump -h dbhost -d dbname -o dbdirectory

-h:
Address of the MongoDB server, for example 127.0.0.1. A port number can also be given: 127.0.0.1:27017
-d:
The database to back up, for example: test
-o:
Location in which to store the backup, for example /data/mongodb/backup. Create this directory in advance. When the backup completes, mongodump creates a test directory (named after the database) under the output directory, and that directory holds the database's backup data.

MongoDB data recovery

MongoDB uses the mongorestore command to restore backed-up data.
Syntax
The mongorestore command syntax is as follows:

# mongorestore -h dbhost -d dbname --directoryperdb dbdirectory

-h:
Address of the MongoDB server
-d:
The database to restore into, for example: test. The name may differ from the one used at backup time, such as test2
--directoryperdb:
Location of the backup data, for example: C:\data\dump\test. The path ends in test rather than dump because mongodump stores each database in a subdirectory named after it.
--drop:
Deletes the current data before restoring the backup. Any data added or modified after the backup was taken is therefore lost, so use this option with caution!
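The relationship between the backup output directory and the restore source directory is easy to get wrong, so here is a small sketch that assembles both argument lists. Nothing is executed. The helper names mongodumpArgs and mongorestoreArgs are hypothetical, the example restores into test2 as mentioned above, and the sketch passes the dump directory as a plain positional argument and leaves out --directoryperdb, which is an assumption about the tool version in use.

```javascript
// Build argument lists for the two commands; nothing is run here.
function mongodumpArgs({ host, db, outDir }) {
  return ["-h", host, "-d", db, "-o", outDir];
}

function mongorestoreArgs({ host, db, dumpDir, drop = false }) {
  const args = ["-h", host, "-d", db];
  if (drop) args.push("--drop"); // deletes existing data in the target before restoring
  args.push(dumpDir);
  return args;
}

const outDir = "/data/mongodb/backup";
const dump = mongodumpArgs({ host: "127.0.0.1:27017", db: "test", outDir });

// mongodump writes the database into <outDir>/<db>, so restore from that subdirectory.
const restore = mongorestoreArgs({
  host: "127.0.0.1:27017",
  db: "test2",
  dumpDir: outDir + "/test",
  drop: true,
});

console.log("mongodump " + dump.join(" "));
console.log("mongorestore " + restore.join(" "));
```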

Logs

System log

The system log is important in MongoDB. It records MongoDB start and stop operations and any exceptions that occur while the server is running.

To configure the log path:

# mongod --logpath='/data/db/log/server.log' --logappend

Journal log

The journal adds extra reliability to MongoDB by writing a redo log ahead of the data. When this feature is turned on, data updates are first written to the journal, which is committed at regular intervals (currently every 100 ms), and only then applied to the actual data files.

To turn it on:

# mongod --journal

Oplog

MongoDB's high-availability replication strategy is called replica sets. In a replica set, one server acts as the primary and one or more act as secondaries. The primary writes each update to a local collection, which records the update operations that occur on the primary, and those operations are then distributed to the secondaries. This log is a capped collection.

Note: A capped collection is a fixed-size collection with excellent performance. It preserves insertion order and, once full, ages out the oldest documents in insertion order to make room for new ones. You specify the size when you create the collection.
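The aging-out behavior of a capped collection can be illustrated with a toy implementation. The class name CappedCollection and the use of JSON text length as the document size are assumptions for this sketch; MongoDB measures real storage size.

```javascript
// Toy capped collection: a fixed byte budget, oldest documents evicted first.
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = []; // kept in insertion order
  }

  insert(doc) {
    const size = JSON.stringify(doc).length; // stand-in for the stored size
    this.docs.push({ doc, size });
    this.usedBytes += size;
    // Age out from the front (oldest insert) until the collection fits again.
    while (this.usedBytes > this.maxBytes && this.docs.length > 1) {
      this.usedBytes -= this.docs.shift().size;
    }
  }

  find() {
    return this.docs.map((entry) => entry.doc);
  }
}

const log = new CappedCollection(40);
for (let n = 1; n <= 6; n++) log.insert({ n }); // each document is 7 bytes as JSON

console.log(log.find().map((d) => d.n)); // [ 2, 3, 4, 5, 6 ]
```

The sixth insert pushes the total to 42 bytes, so the first document is dropped and the remaining five stay in insertion order.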

# mongod --oplogSize=1024   # size in MB

Slow query log

The slow query log records operations whose execution time exceeds a configured threshold. It is useful for finding statements with performance problems, so it is recommended to turn this feature on and analyze the log regularly.

To configure this feature, set the profile parameter when mongod starts. For example, to record operations that take longer than 5 s, start mongod with --profile=1 and --slowms=5000.

This article is an English version of an article originally published in Chinese on aliyun.com.
