MongoDB Replica Set

First, Introduction:

In simple terms, a replica set is a master-slave cluster with automatic failover. The most obvious difference between a traditional master-slave cluster and a replica set is that a replica set has no fixed "master": the cluster elects a "master node" and switches to another node when that one stops working. The two do look similar, though: a replica set always has one active node (primary) and one or more backup nodes (secondaries).

The most wonderful thing about a replica set is that everything is automatic. First, it does a lot of the administration for you, automatically promoting a backup node to active. Second, it is very easy for developers to use: just point the driver at the replica set, and it will discover the servers on its own and handle recovery automatically when the active node crashes.
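For example, most drivers accept a standard connection string that lists one or more seed members plus the replica-set name; a sketch, using the hosts and set name from the example later in this article:

mongodb://192.168.229.80:10001,192.168.192.75:10002/?replicaSet=test

The seed list is used only for discovery: the driver locates the current primary itself and transparently re-routes operations after a failover.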

When the active node dies, a backup node automatically becomes the active node.

When the failed node recovers, it automatically becomes a backup node again, because the cluster already has an active node.

Nodes in a replica set:

1. Standard: a regular node. It stores a full copy of the data, votes in elections, and can become the active node; its priority is not 0.

2. Passive: stores a full copy of the data and votes in elections, but cannot become the active node. Its priority is 0.

3. Arbiter: arbiters participate only in voting; they hold no copy of the data and cannot become the active node. (A configuration sketch covering all three types follows this list.)
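A sketch of how the three node types map onto member settings in a replica-set configuration (the hostnames here are placeholders):

db.runCommand({
    "replSetInitiate": {
        "_id": "test",
        "members": [
            { "_id": 0, "host": "node0.example.com:10001", "priority": 1 },      // standard: can become primary
            { "_id": 1, "host": "node1.example.com:10002", "priority": 0 },      // passive: full data copy, never primary
            { "_id": 2, "host": "node2.example.com:10003", "arbiterOnly": true } // arbiter: votes only, holds no data
        ]
    }
});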

If the active node breaks, the remaining nodes elect a new active node. The election can be initiated by any non-active node, and the new active node must be chosen by a majority of the replica set. Arbiter nodes also vote, which helps avoid deadlocks. The new active node is the node with the highest priority; among nodes of equal priority, the one with the freshest data wins.

The active node uses heartbeats to track how many nodes in the cluster it can see. If that is not a majority, the active node automatically demotes itself to a backup node. This prevents the cluster from ever having more than one active node at a time.

Whenever the active node changes, the data on the new active node is assumed to be the most recent data in the system. Conflicting operations on the other nodes are rolled back, even if the previous active node later resumes work.

Elections:

The replica set determines its primary node through an "election" when any of the following scenarios occurs:

· When a replica set is initialized for the first time;

· When the primary steps down. This can happen because the replSetStepDown command was executed (see the sketch after this list), because some other node in the cluster is better suited to be primary, or because the primary cannot communicate with a majority of the other nodes in the cluster. When the primary steps down, it closes all client connections.

· An election occurs when a secondary node in the cluster cannot establish a connection to the primary node.

· When a failover occurs.

· When the rs.reconfig() command is executed.
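As a sketch, an administrator can trigger the step-down scenario deliberately from the mongo shell; the argument is the number of seconds the node remains ineligible for re-election:

rs.stepDown(60)
// equivalent to: db.runCommand({ "replSetStepDown": 60 })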

In an election, every node, including hidden nodes, arbiters, and even nodes in a recovering state, has the right to vote. In the default configuration all participating nodes have equal weight. In some specific cases, however, certain secondaries should be explicitly prevented from becoming primary, for example a node in a distant remote data center. Election weight is adjusted through the priority setting, whose default value is 1; the earlier simple replica-set walkthrough described how to modify this value.
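A minimal sketch of adjusting that weight with rs.reconfig() (the member index 1 is hypothetical):

cfg = rs.conf()
cfg.members[1].priority = 0   // this member keeps its vote but can no longer become primary
rs.reconfig(cfg)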

Any node in the cluster can veto an election, even a non-voting member:

· If the node initiating the election does not have the right to vote (a priority-0 member);

· If the node initiating the election is too far behind in its data;

· If the priority of the node initiating the election is lower than that of some other node in the cluster;

· If the current primary has data newer than or equal to that of the node initiating the election (that is, its "optime" value is greater than or equal), the primary will veto.

The first node to receive the most votes (in fact, more than half) becomes the primary. This also explains why, in a cluster with two nodes, the survivor can only remain a secondary when the primary goes down: the replica set is then left with a single secondary holding just 1 vote, which is not more than half the total number of nodes, so it cannot elect itself primary.
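This is why a two-member set is normally deployed together with an arbiter; a sketch, with a hypothetical arbiter address:

rs.addArb("192.168.192.68:10003")   // data-less voting member; the surviving data node can now win 2 of 3 votes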

Read scaling:

Read preferences:

The application configures, through a read preference, how the driver reads from the replica set. By default, the client driver sends all read operations directly to the primary node, which guarantees strict consistency of the data.


But sometimes, to relieve pressure on the primary node, we may want to read directly from a secondary; then eventual consistency is all we need.


MongoDB 2.0 supports five readPreference modes:

primary: the default; read only from the primary node;

primaryPreferred: in most cases read from the primary node; read from a secondary only while the primary is unavailable, for example for the 10 seconds or more of a failover.

Warning: prior to version 2.2, MongoDB's support for readPreference was incomplete; if the client driver used primaryPreferred, read operations would in practice be routed to a secondary node.

secondary: read only from secondary nodes; the drawback is that secondary data can be "older" than the primary's.

secondaryPreferred: read from a secondary node first, falling back to the primary when no secondary is available;

nearest: reads may go to the primary or to a secondary; the decision is made by a process called member selection.

MongoDB allows these modes to be specified at several granularities: connection, database, collection, and even a single operation. Drivers for the various languages generally support all of these granularities.
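A sketch of two of these granularities in the mongo shell (readPref() on a cursor requires a 2.2-or-later shell; the collection name is hypothetical):

db.getMongo().setSlaveOk()                      // per connection: allow reads from secondaries
db.users.find().readPref("secondaryPreferred")  // per operation: prefer a secondary for this query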

Synchronization:

When a node first starts, it performs a full initial sync from the primary, replicating every document, which is resource-intensive. Once the initial sync completes, the node begins querying the primary's oplog and applying those operations to keep its data up to date.

If a node falls too far behind the primary, it goes out of sync: it can no longer catch up, because every operation left in the primary's oplog is too new for it. This happens when a node has been down for a while, or is struggling under read load, or has just finished a full sync that took so long that the oplog rolled over in the meantime.

Once a node is out of sync, replication stops and the node must redo the full synchronization. You can trigger a resync manually with the {"resync": 1} command, or start the node with the --autoresync option so it resynchronizes automatically. Resyncing is expensive and should be avoided by configuring a large enough oplog, one that can hold a substantial window of operations. A large oplog takes more disk space, so you need to weigh the trade-off. The default oplog size is 5% of the free disk space.
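As a sketch, the shell's replication helpers show how large the oplog is and how much time it covers, which is useful when sizing it with mongod's --oplogSize option (in megabytes):

db.printReplicationInfo()        // on the primary: oplog size and the time span it holds
rs.printSlaveReplicationInfo()   // how far behind each secondary is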

Third, Examples:

Preparation:

Two servers:

The 192.168.229.80 server is node 1, with the directory below:

/export/servers/dbs/node1/


The 192.168.192.75 server is node 2, with the directory below:

/export/servers/dbs/node2/

Start node 1:

-bash-3.2# ./bin/mongod --dbpath /data/node1 --logpath /data/node1/node1.log --port 10001 --replSet test/192.168.192.75:10002

Start node 2:

[root@localhost mongodb-linux-x86_64-2.0.4]# ./bin/mongod --dbpath ./data/node2/ --port 10002 --replSet test/192.168.229.80:10001

Connect from a client machine (192.168.192.68) to node 1:

./bin/mongo 192.168.229.80:10001/admin

Initialize the replica set:

db.runCommand({
    "replSetInitiate": {
        "_id": "test",
        "members": [
            {
                "_id": 1,
                "host": "192.168.229.80:10001",
                "priority": 3
            },
            {
                "_id": 2,
                "host": "192.168.192.75:10002",
                "priority": 2
            }
        ]
    }
});

The result should report "ok" : 1, which means the initialization succeeded. (Several errors encountered at this step turned out to be caused by a full disk; after clearing space, it worked.)

To view the replica-set status:

To check which node is the master: on node 1, isMaster reports "ismaster" : true; switching to the 75 node, it reports "ismaster" : false.
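A sketch of the shell commands behind those checks:

rs.status()     // state and health of every member of the set
db.isMaster()   // "ismaster" : true on the primary, false on a secondary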

Insert data on the primary, then try to query it from the secondary: the data is not found. This is normal, because a secondary does not allow reads or writes by default. In applications with many reads and few writes, a replica set can separate reads from writes: by setting slaveOk on the connection, the secondaries share the read load while the primary handles only the writes.

Because the secondary nodes in a replica set are not readable by default, setSlaveOk must be enabled on the connection to the secondary before querying it.
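A minimal sketch of the whole round trip in the mongo shell (the foo collection is hypothetical):

// on the primary, 192.168.229.80:10001
db.foo.insert({ "x": 1 })

// on the secondary, 192.168.192.75:10002
db.foo.find()                // error: not master and slaveok=false
db.getMongo().setSlaveOk()   // allow reads from this secondary on this connection
db.foo.find()                // returns { "x" : 1 } once the insert has replicated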

Reference:

http://www.cnblogs.com/refactor/archive/2012/08/13/2600140.html
