MongoDB Replica set
I. Introduction
In a nutshell, a replica set is a master-slave cluster with automatic failover. The most obvious difference between a traditional master-slave cluster and a replica set is that a replica set has no fixed "master node": the cluster elects a "master node" and switches to another node when the current one stops working. Otherwise they look very similar: a replica set always has one active node (primary) and one or more backup nodes (secondary).
The most wonderful thing about a replica set is that everything is automated. First, it does a lot of the management work for you, automatically promoting a backup node to active node; second, it is very easy for developers to use: just point the driver at the replica set and it will find the servers automatically and handle failover when the current active node crashes.
When the active node goes down, a backup node automatically becomes the active node.
When the old active node comes back up, it becomes a backup node, because the cluster already has an active node.

II. Nodes in a Replica Set
1. Standard: a regular node. It stores a complete copy of the data, votes in elections, and can become the active node; its priority is greater than 0.
2. Passive: stores a complete copy of the data and votes in elections, but cannot become the active node; its priority is 0.
3. Arbiter: an arbitrator that only votes in elections; it holds no copy of the data and can never become the active node.
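As a sketch of how the three member types can be declared together (the set name, hostnames, and ports are made up for illustration), a replica set configuration might look like this in the mongo shell:

db.runCommand({
    "replSetInitiate": {
        "_id": "myset",
        "members": [
            // standard member: data-bearing, votes, priority > 0 so it can become primary
            { "_id": 0, "host": "node1.example.com:10001", "priority": 1 },
            // passive member: data-bearing and votes, but priority 0 keeps it from becoming primary
            { "_id": 1, "host": "node2.example.com:10002", "priority": 0 },
            // arbiter: votes only, stores no data
            { "_id": 2, "host": "node3.example.com:10003", "arbiterOnly": true }
        ]
    }
});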
If the active node fails, the other nodes elect a new active node. The election can be initiated by any non-active node, and the new active node is chosen by a majority of the replica set. Arbiter nodes also vote, which helps avoid deadlock. The new active node is the eligible node with the highest priority; among nodes of equal priority, the one with the newer data wins.
The active node uses heartbeats to track how many nodes in the cluster it can see. If it can no longer see a majority, the active node automatically demotes itself to a backup node. This prevents an active node that has been cut off from the rest of the cluster from continuing to accept writes.
Whenever the active node changes, the data on the new active node is assumed to be the most up-to-date in the system. Conflicting operations on the other nodes are rolled back, even if the previous active node comes back online.

Elections:
A replica set determines its primary node through an "election" in the following scenarios:
· When initializing a replica set for the first time;
· When the primary steps down. This can happen because the replSetStepDown command was run (see the sketch after this list), because another node in the cluster is a better candidate for primary, or because the primary cannot communicate with a majority of the other nodes. When the primary steps down, it closes all client connections;
· When a secondary node in the cluster cannot establish a connection to the primary node, which can also trigger an election;
· When a failover occurs;
· When the replica set is reconfigured with rs.reconfig().
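A step-down can be triggered by hand from the mongo shell; a minimal sketch (the 60 is an illustrative value, the number of seconds the node will wait before it is eligible to be elected again):

// run on the current primary
rs.stepDown(60)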
In an election, nodes that are hidden, arbiters, and even nodes in the recovering state all have a vote. By default all participating nodes have equal standing, but in some specific cases you will want certain secondaries to be preferred as primary; for example, a node in a distant remote data center should not become the primary. That election weight is adjusted by setting a member's priority, which is 1 by default; how to modify the value was described earlier in the simple replica-set setup.
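A minimal sketch of lowering a member's weight (assuming the remote member happens to be members[2]; the index is made up for illustration):

cfg = rs.conf()               // fetch the current replica set configuration
cfg.members[2].priority = 0.5 // lower weight: other members are preferred as primary
rs.reconfig(cfg)              // push the new configuration to the set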
Any node in the cluster can veto an election, even a non-voting member:
· If the node seeking election is not eligible to become primary (a priority 0 member);
· If the data of the node that initiated the election lags too far behind;
· If the priority of the node that initiated the election is lower than that of another node in the cluster;
· If the current primary node has data as new as or newer than that of the node that initiated the election (that is, its "optime" value is equal or greater), the current primary will veto.
The member that first receives the most votes (in fact, more than half of them) becomes the primary node. This also explains why, in a two-node cluster, the remaining node can only stay a secondary when the primary goes down: once the primary is down, only one secondary is left in the replica set; it has only 1 vote, which is not more than half of the total number of nodes, so it cannot elect itself primary.

Read scaling:
Read preferences:
The application driver uses a read preference to control how reads against the replica set are routed. By default, all client-driven read operations go directly to the primary node, which guarantees strict data consistency.
But sometimes, to relieve the pressure on the primary node, we may need to read directly from a secondary node, accepting eventual consistency instead.
Since MongoDB 2.0, five readPreference modes are supported:
primary: the default; read operations are performed only on the primary node;
primaryPreferred: in most cases data is read from the primary node, and from a secondary node only when the primary is unavailable, for example during a failover window of 10 seconds or longer;
Warning: MongoDB before version 2.2 does not fully support readPreference; if the client driver uses primaryPreferred, read operations will actually be routed to a secondary node.
secondary: reads only from secondary nodes; the catch is that a secondary's data may be "older" than the primary's;
secondaryPreferred: read operations prefer a secondary node, falling back to the primary when no secondary is available;
nearest: reads may come from the primary or from a secondary node; the member is chosen through a process called member selection.
MongoDB allows these modes to be specified at different granularities: per connection, per database, per collection, or even per operation. How far each granularity is supported varies between the drivers for different languages.
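A quick sketch of setting a read preference per connection in the mongo shell (version 2.2 or later; the collection name is made up for illustration):

db.getMongo().setReadPref("secondaryPreferred")  // reads on this connection prefer secondaries
db.test.find()                                   // this query may now be served by a secondary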
Synchronization:
When a node first starts, it performs a full synchronization from the primary, copying every document, which is resource intensive. Once the synchronization is complete, the slave node starts querying the master node's oplog and replaying those operations to keep its data up to date.
If a slave node falls too far behind the master, it goes out of sync: it can no longer catch up, because all of the operations remaining in the master's oplog are too new; the entries it would need have already rolled off. This can happen when a node has been down for a while or is overloaded with reads, and it can also happen right after a full synchronization: if the synchronization takes too long, the oplog may have rolled over by the time it finishes.
When a slave node goes out of sync, replication stops, and the node needs to redo a full synchronization. You can trigger a new synchronization manually with the {"resync": 1} command, or start the slave with the --autoresync option so it resynchronizes automatically. Resynchronization is expensive and should be avoided as much as possible by configuring an oplog large enough to hold operations for a considerable window of time. A large oplog takes up more disk space, so you need to weigh the tradeoff. The default oplog size is 5% of the free disk space.
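The mongo shell has helpers for checking how large the oplog is and how far replication lags:

db.printReplicationInfo()       // on the master: oplog size and the time window it covers
db.printSlaveReplicationInfo()  // how far each slave is behind the master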
III. Example
Preparation:
Two servers:
192.168.229.80 as node 1, with the following directory:
/export/servers/dbs/node1/
192.168.192.75 as node 2, with the following directory:
/export/servers/dbs/node2/
Start the node1 node:
-bash-3.2# ./bin/mongod --dbpath ./data/node1 --logpath ./data/node1/node1.log --port 10001 --replSet test/192.168.192.75:10002
Start the node2 node:
[root@localhost mongodb-linux-x86_64-2.0.4]# ./bin/mongod --dbpath ./data/node2/ --port 10002 --replSet test/192.168.229.80:10001
Connect to one of the servers from 192.168.192.68:
./bin/mongo 192.168.229.80:10001/admin
Initialize the replica set:
db.runCommand({
    "replSetInitiate": {
        "_id": "test",
        "members": [
            { "_id": 1, "host": "192.168.229.80:10001", "priority": 3 },
            { "_id": 2, "host": "192.168.192.75:10002", "priority": 2 }
        ]
    }
});
The result should report "ok" : 1 on success. (It failed a few times here with a disk-full error; after clearing space it returned OK.)
To view replica set status:
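A sketch of checking the status from the shell:

rs.status()   // lists every member with its state (PRIMARY, SECONDARY, ...) and health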
To check whether a node is the master, look for "ismaster" : true; switching to the 192.168.192.75 node shows "ismaster" : false.
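The flag comes from the isMaster command:

db.isMaster()   // on the primary this returns { ..., "ismaster" : true, ... }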
Insert data on the primary:
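A minimal sketch (the collection name and document are made up for illustration):

// connected to the primary, 192.168.229.80:10001
db.test.insert({ "msg" : "hello replica set" })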
Query the data from the secondary:
No data comes back; this is normal, because secondaries do not allow reads by default. In applications where reads far outnumber writes, a replica set can be used for read/write separation: by specifying slaveOk on the connection, or by directing queries to the secondaries, the read pressure is shared among the secondaries and the primary only handles writes.
Secondary nodes in a replica set are unreadable by default; to read from one, set slaveOk on the connection to the secondary, as sketched below.
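In the mongo shell, connected to the secondary:

db.getMongo().setSlaveOk()   // allow reads on this secondary connection
db.test.find()               // the query now returns the replicated data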