MongoDB replica set creation and management

The concept of a replica set

A replica set is a group of servers: one primary, which handles client requests, and multiple secondaries (backup servers), which hold replicas of the primary's data. If the primary crashes, the members automatically promote one of the secondaries to be the new primary. On a primary, the output of rs.isMaster() looks like this:

bj1-farm1:PRIMARY> rs.isMaster()
{
    "setName" : "bj1-farm1",
    "setVersion" : 4,
    "ismaster" : true,
    "secondary" : false,
    "hosts" : [
        "172.16.0.150:27017",
        "172.16.0.152:27017",
        "172.16.0.151:27017"
    ],
    "primary" : "172.16.0.150:27017",
    "me" : "172.16.0.150:27017",
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2014-12-01T08:20:34.014Z"),
    "maxWireVersion" : 2,
    "minWireVersion" : 0,
    "ok" : 1
}

Creating a test replica set

1. Start the mongod service on each machine with the --replSet spock option, or put the setting in the configuration file; here spock is the replica set's name.

2. Create a configuration document:

config = {
    "_id" : "spock",
    "members" : [
        {"_id" : 0, "host" : "10.0.11.243:27017"},
        {"_id" : 1, "host" : "10.0.11.244:27017"}
    ]
}

3. Connect to the database and initialize the replica set:

db = (new Mongo("10.0.11.243:27017")).getDB("test")
rs.initiate(config)

4. Insert data on the primary to test:

for (i = 0; i < 1000; i++) { db.coll.insert({count: i}) }
db.coll.count()

5. Test on a backup node; first make the backup node readable:

db.setSlaveOk()
db.coll.count()

To stop the replica set:

rs.stopSet()

rs helper functions

rs is a global variable containing replication-related helper functions; run rs.help() to list them. Most of them are just wrappers around database commands. For example, db.adminCommand({"replSetInitiate": config}) is equivalent to rs.initiate(config).

Network considerations

Make sure every member of the replica set can reach every other member, and do not use localhost as a hostname.

Modifying a replica set

Add a new member:

rs.add("server-3:27017")

Remove a member:

rs.remove("server-3:27017")

When removing a member, it is normal for the shell to complain that it cannot connect to the database; this actually indicates that the configuration change succeeded. As the last step of reconfiguring a replica set, the primary closes all connections, so the shell disconnects briefly and then automatically reconnects.

To check whether a configuration change succeeded:

spock:PRIMARY> rs.config()
{
    "_id" : "spock",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "10.0.11.243:27017"
        },
        {
            "_id" : 1,
            "host" : "10.0.11.244:27017"
        }
    ]
}

version is incremented on every successful modification; its initial value is 1.

You can also build a new configuration document and then use rs.reconfig() to reconfigure the replica set:

> var config = rs.config()
> config.members[0].host = "server-0:27017"
> rs.reconfig(config)

Designing replica sets

A very important concept for replica sets is "majority": electing a primary requires the votes of a majority, that is, more than half of the members of the set. If a replica set has only two members and one goes down, or the network partitions them, neither side can satisfy the majority requirement, so such a configuration is generally not recommended. Two recommended configurations:

1. Put a majority of the members in one data center.

2. Put an equal number of members in each of two data centers, plus one tie-breaking member in a third location.
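The "more than half" rule can be sketched as a tiny calculation (a hypothetical helper for illustration, not part of MongoDB):

```javascript
// Hypothetical helper: votes needed to elect a primary in a set of n voting members.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

console.log(majority(2)); // 2 -> a two-member set cannot elect a primary if either member is down
console.log(majority(3)); // 2 -> one member can fail and a primary can still be elected
console.log(majority(5)); // 3 -> two members can fail
```

This is why a two-member set gains nothing from replication's failover: losing either member leaves the survivor unable to reach a majority.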

Arbiter nodes

An arbiter's only function is to take part in elections; it holds no data and serves no clients. It exists solely to help satisfy the "majority" requirement.

Two ways to add an arbiter:

> rs.addArb("server-5:27017")
> rs.add({"_id": 4, "host": "server-5:27017", "arbiterOnly": true})

Disadvantages of arbiters

Suppose a replica set has three members, one of which is an arbiter. When one data node dies, the other data node becomes the primary, and a new backup node must be added to keep the data safe. But because the arbiter holds no data, the new node can copy data only from the current primary, which must then both serve application requests and replicate data to the backup node, putting enormous pressure on that server.

So where possible, configure an odd number of data-bearing members rather than using an arbiter.

Priority

Priority expresses how strongly a member aspires to become the primary. The value ranges from 0 to 100, with a default of 1; a priority of 0 means the member can never become primary (such members are called passive).

Higher-priority members are preferred in elections (as long as they can get majority support and their data is up to date).
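As a sketch (the host name is made up), a member can be added with an explicit priority, or an existing member's priority can be changed through reconfig:

```javascript
// Add a member that is preferred in elections (hypothetical host name).
rs.add({"_id": 2, "host": "server-4:27017", "priority": 1.5})

// Or raise the priority of an existing member.
var config = rs.config()
config.members[0].priority = 2
rs.reconfig(config)
```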

Hide Members

A client does not send a request to a hidden member, nor does a hidden member become a replication source (unless other replication sources are not available), so that a less powerful server or backup server can be hidden, and only members with a priority of 0 can be hidden by setting the hidden:true in the configuration. Hiding Ismater () will not see hidden members, and hidden members can be seen using Rs.status and Rs.config (). Because the client is connected to the replica set, the call is IsMaster () to view the available members
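A member can be hidden through reconfig; a sketch (the member index is made up):

```javascript
// Hide the third member; a hidden member must also have priority 0.
var config = rs.config()
config.members[2].hidden = true
config.members[2].priority = 0
rs.reconfig(config)
```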

Delayed backup nodes

For disaster protection, similar to delayed replication in MySQL, use slaveDelay to set up a delayed backup node. This requires the member's priority to be 0.
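A sketch of configuring a delayed member (the member index and delay are made up):

```javascript
// Keep the second member one hour behind the primary (slaveDelay is in seconds).
var config = rs.config()
config.members[1].slaveDelay = 3600
config.members[1].priority = 0
rs.reconfig(config)
```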

Building indexes

Sometimes a backup node does not need the same indexes as the primary, or even any indexes at all, especially when it is used only for data backups or offline batch jobs. In that case you can specify buildIndexes: false in the configuration. This requires the member's priority to be 0, and the setting is permanent: unless the node is removed from the replica set and re-added for a full resync, it will never build indexes.
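A sketch of adding such a member (the host name is made up):

```javascript
// Add an index-free member for backups or offline batch jobs (hypothetical host).
rs.add({"_id": 3, "host": "server-6:27017", "buildIndexes": false, "priority": 0})
```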

Syncing

MongoDB's replication is implemented through the oplog. The oplog contains every write operation on the primary and is a capped collection in the primary's local database; backup nodes learn which operations to replicate by querying this collection. Each backup node has its own oplog, so any member can serve as a synchronization source for other members. A backup node fetches the operations to perform from its current sync source, applies them to its own data set, and finally writes them to its own oplog.

If a backup node goes down, when it restarts it automatically resumes syncing from the last operation in its own oplog. Because replication applies the data first and then writes to the oplog, a backup node may re-apply operations it has already replicated. MongoDB handles this by making oplog operations idempotent: applying the same operation multiple times has the same effect as applying it once.

Because the oplog is fixed in size, the rate at which it fills is roughly the rate at which the system handles write requests. There are exceptions, however: a single operation that affects multiple documents produces one oplog entry per affected document. For example, if db.coll.remove() deletes one million documents, the oplog gets one million operation entries and fills up very quickly.
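A toy illustration of why idempotent oplog entries make re-application safe: MongoDB logs operations in an already-idempotent form (for example, an $inc is recorded as a $set of the resulting value), so replaying an entry is harmless. The helper below is purely illustrative, not MongoDB code:

```javascript
// Apply a "set"-style operation: overwrite the given fields with fixed values.
function applyOp(doc, setFields) {
  return Object.assign({}, doc, setFields);
}

var doc = { _id: 1, count: 4 };
var op = { count: 5 };               // idempotent form of {$inc: {count: 1}} applied to count 4

var once = applyOp(doc, op);
var twice = applyOp(once, op);       // a backup node re-applies the same entry

console.log(once.count, twice.count); // 5 5 -> same result, so replay is safe
```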

Initial sync

When a replica set member starts, it checks its own state to see whether it can sync from an existing member. If not, it attempts to copy the full data set from another member of the set; this is initial sync, which typically follows these steps:

1. Choose a member as the sync source, create an identifier for itself in local.me, drop all existing databases, and start syncing from a completely clean state.

2. Cloning: copy all records from the sync source locally.

3. Oplog sync: apply the operations that occurred during cloning. If a document was moved during cloning it may have been missed and must be cloned again.

4. A second oplog sync: record the operations that occurred during the first oplog sync. Only when nothing remains to be cloned does this step differ from the first.

5. Build indexes.

6. Sync the operations that occurred during index builds.

7. Finish initialization and switch to normal syncing.

Problems caused by initial sync, and workarounds

If you want to track the progress of an initial sync, the best approach is to query the server logs.

Initial sync is simple to operate, but it is too slow compared with restoring from a backup.

Cloning can also ruin the sync source's working set. A real deployment has a frequently used subset of data in memory; an initial sync forces the sync source to page all of its data into memory, evicting that frequently used subset, so many requests slow down. For smaller data sets and well-provisioned servers, though, initial sync is a simple and convenient option.

If an initial sync takes too long, the new member falls too far behind the sync source, and the sync source may overwrite some of the oplog entries the new member still needs to replicate. There is no good fix for this other than performing initial syncs when the system is not too busy, or ensuring the primary keeps enough operation history by using a larger oplog.

Heartbeats

Every member sends a heartbeat request to every other member every two seconds, to check each member's state and to verify whether it can still reach a majority.

Rolling back

If the primary executes a write and then goes down before any backup node has replicated that operation, the newly elected primary will be missing the write. When the old primary rejoins, it performs a rollback of the unreplicated operations.

If there is too much to roll back, MongoDB cannot handle it: if the data to roll back exceeds 300 MB, or covers more than 30 minutes of operations, the rollback fails and a full resync is required.

Connecting applications to a replica set

Connecting to a replica set is similar to connecting to a single server; a typical connection string looks like this:

"mongodb://server-1:27017,server-2:27017"

When the primary goes down, the driver automatically finds the new primary as soon as possible. During the election the set may be temporarily unavailable, and no requests (reads or writes) are processed during that time; however, you can optionally route read requests to backup nodes.

Waiting for replication of writes

Use the getLastError command to check whether a write succeeded, or to ensure the write has been replicated to backup nodes. The "w" parameter forces getLastError to wait until the given number of members have completed the last write; the value can also be the keyword "majority". wtimeout is the timeout in milliseconds:

spock:PRIMARY> db.runCommand({"getLastError": 1, "w": "majority", "wtimeout": 1000})
{
    "lastOp" : Timestamp(0, 0),
    "connectionId" : 4776,
    "n" : 0,
    "syncMillis" : 0,
    "writtenTo" : null,
    "err" : null,
    "ok" : 1
}

Usually "w" is used to throttle writes. MongoDB can accept writes faster than the backup nodes can replicate them, leaving the backup nodes ever further behind; by calling getLastError regularly with a "w" value greater than 1, you force the writes on that connection to wait until replication succeeds, at the cost of blocking writes on that connection.
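A sketch of this throttling pattern (the collection name, batch size, and timeout are made up):

```javascript
// Insert in bulk, but every 1000 documents block until at least two members
// acknowledge the last write, so backup nodes are not left hopelessly behind.
for (var i = 0; i < 100000; i++) {
  db.coll.insert({count: i});
  if (i % 1000 === 0) {
    db.runCommand({"getLastError": 1, "w": 2, "wtimeout": 10000});
  }
}
```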

Custom replication guarantee rules

1. Guarantee replication to at least one server in each data center.

2. Guarantee that writes are replicated to a majority of the visible (non-hidden) nodes.

3. Create additional rules as needed.
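Rule 1 can be expressed by tagging members and defining a getLastErrorModes entry in the replica set settings; a sketch in which the tag values are made up:

```javascript
// Tag each member with its data center, then define a mode "eachDC" that is
// satisfied only when a write has reached members spanning 2 distinct "dc" values.
var config = rs.config()
config.members[0].tags = {"dc": "us-east"}
config.members[1].tags = {"dc": "us-west"}
config.settings = config.settings || {}
config.settings.getLastErrorModes = {"eachDC": {"dc": 2}}
rs.reconfig(config)

// A write can then wait on the custom rule:
db.runCommand({"getLastError": 1, "w": "eachDC", "wtimeout": 1000})
```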

Sending read requests to backup nodes

Reasons to do this:

1. Consistency considerations.

2. Load considerations.

Applicable scenarios:

1. The primary is down but the application still needs to read data (primaryPreferred).

2. Getting low-latency reads: the nearest preference routes reads based on the driver's ping time to replica set members. If your application needs the lowest-latency reads of the same documents from multiple data centers, this is the only way. If documents are strongly tied to locations, use sharding instead; and if you need low-latency reads and writes, sharding is the only option.

3. If you can tolerate arbitrarily stale data, use secondary; it always sends read requests to backup nodes.

4. secondaryPreferred: prefer an available backup node, falling back to the primary if none is available.

5. If reads generally need to be current but the requirement is not strict, use primaryPreferred.
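In the mongo shell a read preference can be set on the connection; a sketch (host names are made up, and exact constructor arguments vary by shell version):

```javascript
// Connect to the replica set and prefer the primary, falling back to a
// backup node when no primary is available.
conn = new Mongo("mongodb://server-1:27017,server-2:27017")
conn.setReadPref("primaryPreferred")
db = conn.getDB("test")
db.coll.find()
```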

Viewing the server configuration online

spock:PRIMARY> db.serverCmdLineOpts()
{
    "argv" : [
        "mongod",
        "-f",
        "/etc/mongod.conf"
    ],
    "parsed" : {
        "config" : "/etc/mongod.conf",
        "net" : {
            "bindIp" : "10.0.11.244"
        },
        "processManagement" : {
            "fork" : true,
            "pidFilePath" : "/var/run/mongodb/mongod.pid"
        },
        "replication" : {
            "replSet" : "spock"
        },
        "storage" : {
            "dbPath" : "/var/lib/mongo"
        },
        "systemLog" : {
            "destination" : "file",
            "logAppend" : true,
            "path" : "/var/log/mongodb/mongod.log"
        }
    },
    "ok" : 1
}

Turning the primary into a backup node

rs.stepDown(60)  // the primary steps down and will not stand for election for 60 seconds; if no other member has been promoted by then, it may seek election again

rs.freeze(1000)  // run on each backup node to prevent it from becoming primary for 1000 seconds, e.g. to keep other nodes from usurping the role while you do maintenance on the primary; rs.freeze(0) releases the freeze when maintenance is done

The easiest way to monitor replication is to look at the logs.

To monitor replication lag, run the following on a backup node:

spock:SECONDARY> db.printSlaveReplicationInfo()
source: 10.0.11.243:27017
    syncedTo: Mon Dec 2014 18:12:32 GMT+0800 (CST)
    0 secs (0 hrs) behind the primary

Resizing the oplog

1. If the current server is the primary, make it step down so the other members can catch up to it.

2. Shut down the current server.

3. Restart the current server in standalone mode.

4. Temporarily save the last insert in the oplog to another collection.

5. Drop the current oplog: db.oplog.rs.drop()

6. Create a new oplog of the desired size.

7. Write the saved last operation back into the new oplog.

8. Restart the current server as a replica set member.
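Steps 4 through 7 can be sketched in the shell on the standalone server (the temporary collection name and new size are made up):

```javascript
use local

// 4. Save the newest insert from the oplog into a temporary collection.
var lastInsert = db.oplog.rs.find({"op": "i"}).sort({"$natural": -1}).limit(1).next()
db.tempLastOp.save(lastInsert)

// 5. Drop the old oplog.
db.oplog.rs.drop()

// 6. Recreate it as a capped collection of the desired size (in bytes).
db.createCollection("oplog.rs", {"capped": true, "size": 10 * 1024 * 1024 * 1024})

// 7. Write the saved operation back so syncing can resume from it.
db.oplog.rs.insert(db.tempLastOp.findOne())
```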

Restoring from a delayed backup node

Method 1 (may overload the delayed member, since everyone syncs from it):

1. Shut down all other members.

2. Delete all data in the other members' data directories.

3. Restart all members; they perform an initial sync from the delayed node.

Method 2 (gives every server an oplog of the same size):

1. Shut down all other members.

2. Delete all data in the other members' data directories.

3. Copy the delayed backup node's data files to the other servers.

4. Restart all members.
