The concept of a replica set
A replica set is a set of servers: one primary, which handles client requests, and multiple secondaries, which hold replicas of the primary's data. If the primary crashes, the replica set automatically promotes one of the secondaries to be the new primary.
bj1-farm1:PRIMARY> rs.isMaster()
{
    "setName" : "bj1-farm1",
    "setVersion" : 4,
    "ismaster" : true,
    "secondary" : false,
    "hosts" : [
        "172.16.0.150:27017",
        "172.16.0.152:27017",
        "172.16.0.151:27017"
    ],
    "primary" : "172.16.0.150:27017",
    "me" : "172.16.0.150:27017",
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2014-12-01T08:20:34.014Z"),
    "maxWireVersion" : 2,
    "minWireVersion" : 0,
    "ok" : 1
}
To create a test replica set:
1 Add the --replSet spock option on each machine when starting the mongod service, or put the equivalent setting in the configuration file, where spock is the replica set name
2 Create a configuration document:
config = {
    "_id" : "spock",
    "members" : [
        {"_id" : 0, "host" : "10.0.11.243:27017"},
        {"_id" : 1, "host" : "10.0.11.244:27017"}
    ]
}
3 Connect to the database and initialize the replica set:
db = (new Mongo("10.0.11.243:27017")).getDB("test")
rs.initiate(config)
4 Insert test data on the primary:
for (i = 0; i < 1000; i++) { db.coll.insert({count: i}) }
db.coll.count()
5 Test on a secondary; first make the secondary readable:
db.setSlaveOk()
db.coll.count()
Stop the replica set:
rs.stopSet()
The rs helper functions
rs is a global variable that contains replication-related helper functions; run rs.help() to see which are available. Most of them are just wrappers around database commands. For example,
db.adminCommand({"replSetInitiate": config}) is equivalent to the rs.initiate(config) call used above.
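Another example of the same pattern, shown here as a quick sketch: the status helper simply runs the replSetGetStatus command under the hood.
> rs.status()                                   // helper
> db.adminCommand({"replSetGetStatus": 1})      // the underlying command it wraps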
Network Considerations
Every member of the replica set must be able to reach every other member, and you should not use localhost as the host name.
Modifying a replica set
Add a new member:
rs.add("server-3:27017")
Remove a member:
rs.remove("server-3:27017")
When removing a member, it is normal for the shell to complain that it cannot connect to the database; this actually indicates that the configuration change succeeded. When the replica set is reconfigured, the last step is for the primary to close all connections, so the shell disconnects briefly and then automatically reconnects.
To check whether a configuration change succeeded:
spock:PRIMARY> rs.config()
{
    "_id" : "spock",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "10.0.11.243:27017"
        },
        {
            "_id" : 1,
            "host" : "10.0.11.244:27017"
        }
    ]
}
Each time the configuration is modified, version is incremented automatically; its initial value is 1.
You can also build a new configuration document and then use rs.reconfig() to reconfigure the replica set:
> var config = rs.config()
> config.members[0].host = "server-0:27017"
> rs.reconfig(config)
Designing Replica Sets
A very important concept for replica sets is the "majority": electing a primary requires a majority, that is, more than half of the members of the replica set. So if a replica set has only two members and one goes down, neither side of the partition can reach a majority, and such a configuration is generally not recommended. Two recommended configurations:
1 Put the majority of the members in one data center
2 Put an equal number of members in each of two data centers, and place a tie-breaking member in a third location (a sketch of this layout follows)
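A minimal sketch of the second layout, assuming hypothetical host names dc1-server-1 through dc3-server-1:
config = {
    "_id" : "spock",
    "members" : [
        {"_id" : 0, "host" : "dc1-server-1:27017"},
        {"_id" : 1, "host" : "dc1-server-2:27017"},
        {"_id" : 2, "host" : "dc2-server-1:27017"},
        {"_id" : 3, "host" : "dc2-server-2:27017"},
        {"_id" : 4, "host" : "dc3-server-1:27017"}    // tie-breaker in the third location
    ]
}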
Arbiters
An arbiter's only job is to vote in elections; it stores no data and serves no clients. It exists purely to help satisfy the "majority" requirement.
Two ways to add an arbiter:
> rs.addArb("server-5:27017")
> rs.add({"_id": 4, "host": "server-5:27017", "arbiterOnly": true})
Disadvantages of arbiters
Suppose a replica set has three members, one of which is an arbiter. When one data-bearing node goes down, the other data node becomes the primary, and a new secondary must be added to keep the data safe. Because the arbiter holds no data, the new member can copy its data only from the current primary, which then has to serve application requests and replicate data to the new secondary at the same time, putting enormous pressure on that server.
So whenever possible, use an odd number of data-bearing members rather than an arbiter.
Priority
Priority indicates how strongly a member wants to become the primary. The range is 0 to 100 and the default is 1; a priority of 0 means the member can never become primary (a passive member).
Higher-priority members are favored when electing a primary (as long as the member can get a majority of the votes and its data is up to date).
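A minimal sketch of adding a member with a higher priority (the host name server-4 is hypothetical):
> rs.add({"_id": 4, "host": "server-4:27017", "priority": 1.5})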
Hidden members
Clients do not send requests to hidden members, and a hidden member is never used as a replication source (unless no other source is available), so less powerful servers or dedicated backup servers can be hidden. Only members with priority 0 can be hidden; set hidden: true in the member's configuration. isMaster() does not show hidden members, but rs.status() and rs.config() do. Because clients connect to a replica set by calling isMaster() to discover the available members, hidden members never receive client requests.
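A minimal sketch of hiding an existing member through reconfiguration (member index 2 is assumed):
> var config = rs.config()
> config.members[2].hidden = true      // hide the member from isMaster()
> config.members[2].priority = 0       // hidden members must have priority 0
> rs.reconfig(config)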
Delayed secondaries
For protection against disasters, similar to MySQL's delayed replication, use slaveDelay to set up a delayed secondary; the member's priority must be 0.
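A minimal sketch of configuring a one-hour delay on an existing member (member index 1 is assumed; hiding the member as well is optional):
> var config = rs.config()
> config.members[1].priority = 0        // a delayed member must have priority 0
> config.members[1].slaveDelay = 3600   // stay one hour behind the primary (in seconds)
> rs.reconfig(config)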
Create an index
Sometimes a secondary does not need the same indexes as the primary, or even any indexes at all, especially when it is used only for backups or offline batch jobs. In that case you can specify buildIndexes: false in the member's configuration. This requires the member's priority to be 0, and it is permanent: the only way to get indexes back is to remove the member from the replica set and add it again so that it resyncs from scratch.
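A minimal sketch of adding such a member (the host name server-6 is hypothetical); buildIndexes is normally set when the member is first added:
> rs.add({"_id": 5, "host": "server-6:27017", "priority": 0, "buildIndexes": false})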
Sync
MongoDB replication is implemented with the oplog, which contains every write operation performed on the primary. The oplog is a capped collection in the primary's local database, and secondaries find out which operations they need to replicate by querying it. Each secondary maintains its own oplog as well, so every member can serve as a sync source for other members. A secondary fetches the operations to apply from its current sync source, applies them to its own data set, and then writes them to its own oplog.
If a secondary goes down, when it restarts it automatically resumes syncing from the last operation in its own oplog. Because replication applies an operation to the data before writing it to the oplog, a secondary may end up replaying operations it has already applied. MongoDB is designed for this: applying the same oplog operation multiple times has the same effect as applying it once.
Because the oplog is a fixed size, it fills at roughly the same rate as the system processes write requests. There is one exception: an operation that affects multiple documents produces one oplog entry per affected document. For example, if db.coll.remove() deletes 1,000,000 documents, the oplog records 1,000,000 operations, and it fills up very quickly.
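To see how large the oplog is and how much time it currently spans, you can run the replication-info helper on a member; a quick sketch:
> db.printReplicationInfo()    // prints the configured oplog size and the time range it covers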
Initial sync
When a replica set member starts, it checks its own state to see whether it can sync from an existing member. If it cannot, it tries to copy the full data set from another member of the set; this is the initial sync, which generally follows these steps:
1 Choose a member as the sync source, create an identifier for itself in local.me, drop all existing databases, and start syncing from a completely clean state
2 Cloning: copy all records from the sync source to the local member
3 First oplog application: apply the operations that occurred during the clone. If a document was moved while it was being cloned, it may have been missed and has to be cloned again
4 Second oplog application: apply the operations that came in during the first oplog application; this pass only differs from the first if something had to be re-cloned
5 Build indexes
6 Apply the operations that came in while the indexes were being built
7 Initial sync is complete; the member switches to normal syncing
Problems caused by initial sync, and how to deal with them
If you want to track the progress of an initial sync, the best place to look is the server log.
Initial sync is easy to use, but it is slow; restoring from a backup is usually much faster.
Cloning can also ruin the sync source's working set. In a real deployment there is a frequently used subset of data kept in memory; an initial sync forces the sync source to page all of its data into memory, evicting that hot data, so requests that used to be served from memory suddenly become slow. For small data sets and servers with spare capacity, though, initial sync is a simple and convenient option.
If the initial sync takes too long, the new member can fall off the back of the sync source's oplog: the new member syncs more slowly than new writes arrive, and the sync source's oplog may overwrite operations the new member still needs. There is no good way around this other than running the initial sync when the system is not too busy, or, importantly, making sure the primary keeps enough history by giving it a large enough oplog.
Heartbeat
Every member sends a heartbeat request to every other member every two seconds, so each member knows the state of the others and whether it can still reach a majority.
Rollback
If the primary performs a write and goes down before any secondary has replicated it, the newly elected primary will not have that write. When the old primary rejoins the set and sees writes that the new primary does not have, it runs the rollback process to undo them.
If there is too much to roll back, MongoDB cannot cope: if the data to be rolled back is more than 300 MB or covers more than 30 minutes of operations, the rollback fails and the member must be resynced from scratch.
Connecting an application to a replica set
Connecting to a replica set is similar to connecting to a single server; a typical connection string looks like this:
"mongodb://server-1:27017,server-2:27017"
When the primary goes down, the driver automatically finds the new primary as quickly as possible. While an election is in progress the set may be temporarily without a primary, and no requests (read or write) can be served during that window, although you can optionally route read requests to secondaries.
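Many drivers also let you name the replica set in the connection URI so the client can verify it is talking to the right set; a sketch of such a string, reusing the set name spock from the earlier examples:
"mongodb://server-1:27017,server-2:27017/?replicaSet=spock"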
Waiting for replication of writes
Use the getLastError command to check whether a write succeeded, and to make sure the write has been replicated to secondaries. The "w" parameter forces getLastError to wait until the given number of members have the latest write; the value can also be the keyword "majority", and wtimeout is the timeout in milliseconds.
spock:PRIMARY> db.runCommand({"getLastError": 1, "w": "majority", "wtimeout": 1000})
{
    "lastOp" : Timestamp(0, 0),
    "connectionId" : 4776,
    "n" : 0,
    "syncMillis" : 0,
    "writtenTo" : null,
    "err" : null,
    "ok" : 1
}
Usually "W" is used to control write speed, MONGO write too fast, the main node after the write operation, the backup node is too late to follow, by regular call GetLastError, set w value greater than 1 can force the write operation on this connection to wait until the copy succeeds, but this will block the write operation
Custom replication guarantee rules
1 Guarantee that a write is replicated to at least one server in each data center (a sketch of this rule follows the list)
2 Guarantee that writes are replicated to a majority of the nonhidden (visible) members
3 Create other custom rules
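A minimal sketch of rule 1, using member tags and the getLastErrorModes replica set setting (the tag values us-east and us-west are hypothetical):
> var config = rs.config()
> config.members[0].tags = {"dc": "us-east"}
> config.members[1].tags = {"dc": "us-west"}
> config.settings = config.settings || {}
> config.settings.getLastErrorModes = {"eachDC": {"dc": 2}}   // require acks from 2 distinct "dc" tag values
> rs.reconfig(config)
> db.runCommand({"getLastError": 1, "w": "eachDC", "wtimeout": 1000})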
Sending read requests to secondaries
Two things to weigh before doing this:
1 Consistency considerations (secondaries may lag behind the primary)
2 Load considerations (spreading reads off the primary)
Applicable scenarios
1 The primary is down and the application must still be able to read data (primaryPreferred)
2 Low-latency reads: the nearest read preference routes reads to the member with the lowest ping time from the driver. If your application needs to read the same documents with low latency from multiple data centers, this is the only way. If documents are more strongly tied to a location, use sharding instead; if you need low-latency reads and writes, sharding is the only option.
3 If you can tolerate arbitrarily stale data, use secondary: it always sends read requests to a secondary
4 secondaryPreferred: reads go to an available secondary if there is one, otherwise to the primary
5 In general, use primary when reads must be up to date and primaryPreferred when the real-time requirement is not strict (an example of setting a read preference follows this list)
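A minimal sketch of setting a read preference on a single query from the mongo shell (the collection name coll is reused from the earlier test):
> db.coll.find().readPref("secondaryPreferred")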
Viewing the server configuration at runtime
spock:PRIMARY> db.serverCmdLineOpts()
{
    "argv" : [
        "mongod",
        "-f",
        "/etc/mongod.conf"
    ],
    "parsed" : {
        "config" : "/etc/mongod.conf",
        "net" : {
            "bindIp" : "10.0.11.244"
        },
        "processManagement" : {
            "fork" : true,
            "pidFilePath" : "/var/run/mongodb/mongod.pid"
        },
        "replication" : {
            "replSet" : "spock"
        },
        "storage" : {
            "dbPath" : "/var/lib/mongo"
        },
        "systemLog" : {
            "destination" : "file",
            "logAppend" : true,
            "path" : "/var/log/mongodb/mongod.log"
        }
    },
    "ok" : 1
}
Turning the primary into a secondary
rs.stepDown(60)  // the primary steps down and remains a secondary for 60 seconds; if no other member has been elected primary in that time, it may stand for election again
rs.freeze(1000)  // run on each secondary to keep it from becoming primary for 1000 seconds, e.g. so no other member takes over while you maintain the primary; rs.freeze(0) releases the freeze once maintenance is done
The easiest way to monitor replication is to look at the logs.
To monitor replication lag, run the following command on a secondary:
spock:SECONDARY> db.printSlaveReplicationInfo()
source: 10.0.11.243:27017
    syncedTo: Mon Dec 2014 18:12:32 GMT+0800 (CST)
    0 secs (0 hrs) behind the primary
Adjusting the oplog size
1 If the current server is the primary, have it step down so that the other members can catch up to it as quickly as possible
2 Shut down the current server
3 Restart the current server in standalone mode
4 Temporarily save the last insert in the oplog to another collection (a sketch of steps 4 through 7 follows this list)
5 Drop the current oplog: db.oplog.rs.drop()
6 Create a new oplog
7 Write the saved last operation back into the new oplog
8 Restart the current server as a replica set member
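A minimal sketch of steps 4 through 7 in the shell (the temporary collection name tempLastOp and the 10 GB size are assumptions):
> use local
> var cursor = db.oplog.rs.find().sort({$natural: -1})   // step 4: grab the newest oplog entry
> var lastOp = cursor.next()
> db.tempLastOp.save(lastOp)
> db.tempLastOp.findOne()                                 // verify the entry was saved before dropping the oplog
> db.oplog.rs.drop()                                      // step 5: drop the old oplog
> db.createCollection("oplog.rs", {"capped": true, "size": 10000000000})   // step 6: new 10 GB capped oplog
> db.oplog.rs.save(db.tempLastOp.findOne())               // step 7: write the last op back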
Recovering from a delayed secondary
Method 1: may overload the delayed member
1 Shut down all other members
2 Delete all data in the other members' data directories
3 Restart all members
Method 2: results in every server having the same oplog size
1 Shut down all other members
2 Delete all data in the other members' data directories
3 Copy the delayed member's data files to the other servers
4 Restart all members