MongoDB Replica Set Failover Clusters in Practice


If you are not familiar with replica set theory, please read the author's previous blog post first.

Since replica sets are already an advanced MongoDB feature, basic MongoDB knowledge is not covered here; please refer to the MongoDB Manual.

The following walks through several scenarios for building a replica set.

Standalone to Replica Set

This is the simplest case. If you have just started using MongoDB in a production environment, this scenario likely applies to you.

Turning a standalone MongoDB instance into the first member of a replica set is easy and requires only two steps:

1. Add --replSet <set name> to the startup parameters (note that with --fork, mongod also requires --logpath; the log path below is illustrative). For example:

mongod --dbpath /var/lib/mongo/ --replSet rs0 --logpath /var/log/mongodb.log --fork
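Incidentally, the ps listings later in this article start mongod with --config instead of bare flags. For reference, a minimal sketch of an equivalent 2.4-era ini-style configuration file might look like this (file name and paths are illustrative assumptions, not from the original setup):

# mongodb1.conf -- one possible equivalent of the command line above
dbpath = /var/lib/mongo/
logpath = /var/log/mongodb.log
fork = true
replSet = rs0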

If authentication is enabled, an additional step is required: configure a key for each instance, which the instances will use to authenticate one another.

The principle of the key is very simple: as long as the different instances hold the same key, authentication succeeds. There is no troublesome public/private key exchange involved.

Generating a key is also simple; almost any file containing a suitable string can serve as a key. Just make sure that all instances use the same key file.

To add a key, use the parameter --keyFile <file path>. Please refer to the official documentation for key generation.
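Following the official documentation, a minimal sketch of generating and protecting a key file (the path is an illustrative assumption; on Unix, mongod refuses key files with permissive permissions):

$ openssl rand -base64 741 > /etc/mongodb-keyfile   # generate a random key
$ chmod 600 /etc/mongodb-keyfile                    # restrict permissions, required by mongod

Each instance is then started with --keyFile /etc/mongodb-keyfile in addition to its other parameters.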

2. Initialize from the MongoDB shell, as follows:

$ mongo localhost:27011
MongoDB shell version: 2.4.9
connecting to: localhost:27011/test
> rs.initiate()
{
    "info2" : "no configuration explicitly specified -- making one",
    "me" : "yx-arch:27011",
    "info" : "Config now saved locally.  Should come online in about a minute.",
    "ok" : 1
}

Wait a moment for initialization to complete; press Enter and the prompt changes from ">" to "rs0:PRIMARY>", indicating that this instance has become the primary node. At this point there is only one node in the cluster and no voting is involved, so the node stays primary no matter what. But be careful with the next step.

rs0:PRIMARY> rs.add("yx-arch:27012")
{ "ok" : 1 }

The second node, yx-arch:27012, has now joined the cluster. You can view the cluster status with the following command:

rs0:PRIMARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2014-01-24T07:21:01Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "yx-arch:27011",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 353,
            "optime" : Timestamp(1390548012, 1),
            "optimeDate" : ISODate("2014-01-24T07:20:12Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "yx-arch:27012",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : …,
            "optime" : Timestamp(1390548012, 1),
            "optimeDate" : ISODate("2014-01-24T07:20:12Z"),
            "lastHeartbeat" : ISODate("2014-01-24T07:21:00Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-24T07:21:00Z"),
            "pingMs" : 0,
            "syncingTo" : "yx-arch:27011"
        }
    ],
    "ok" : 1
}

As you can see, 27011 is currently the primary and 27012 a secondary. To change the role of a node, you need to modify the cluster configuration. First, pull up the current configuration:

rs0:PRIMARY> conf = rs.conf()
{
    "_id" : "rs0",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "yx-arch:27011"
        },
        {
            "_id" : 1,
            "host" : "yx-arch:27012"
        }
    ]
}

As the previous article explained, every node has a priority attribute, which defaults to 1. To change a node's role, simply give the node you want promoted a priority higher than that of the current primary.

Note that because only the primary is writable, reconfiguring the cluster can only be done on the primary node.

rs0:PRIMARY> conf.members[1].priority = 2
2
rs0:PRIMARY> rs.reconfig(conf)
Fri Jan 24 15:25:36.069 DBClientCursor::init call() failed
Fri Jan 24 15:25:36.070 trying reconnect to localhost:27011
Fri Jan 24 15:25:36.070 reconnect localhost:27011 ok
reconnected to server after rs command (which is normal)
rs0:SECONDARY>

After this brief interruption the shell automatically reconnects to the same instance, but as the prompt shows, the instance has become a secondary. Let's take a look around the secondary's contents (test is a database I created myself):

rs0:SECONDARY> show dbs
test    0.203125GB
local   2.0771484375GB
rs0:SECONDARY> use test
switched to db test
rs0:SECONDARY> show collections
Fri Jan 24 15:28:32.697 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128

Readers familiar with master/slave clusters may have noticed a difference here: while a slave node is readable, a secondary node is not readable by default. Solving this takes one simple step:

rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> show collections
system.indexes
test

At this point a two-instance MongoDB replica set is complete. Now let's try out the legendary failover feature.

Suppose the primary node stops working (we kill it manually):

$ ps awx | grep mongo
12546 ?        Sl     0:08 mongod --config mongodb1.conf
14201 ?        Sl     0:07 mongod --config mongodb2.conf   <-- the primary is this one
20258 pts/2    Sl+    0:00 mongo localhost:27012
24186 pts/0    S+     0:00 grep --color=auto mongo
$ kill 14201

In our beautiful fantasy, the secondary should immediately be promoted to the new primary, right?

What? Still a secondary? Where did the failover go?

Maybe we did it wrong? Let's restart both instances, and this time try killing the secondary instead.

$ ps awx | grep mongo
12546 ?        Sl     0:10 mongod --config mongodb1.conf   <-- the secondary is this one
20258 pts/2    Sl+    0:00 mongo localhost:27012
27024 ?        Sl     0:00 mongod --config mongodb2.conf
27413 pts/0    S+     0:00 grep --color=auto mongo
$ kill 12546

What? The primary demoted itself to a secondary? Readers who skipped the previous article are probably cursing inwardly by now: "What kind of failover cluster is this? Kill either of the two nodes and the whole thing stops working!"

Now resurrect the secondary we just killed, and you will find the primary comes back. As the previous article said, promotion to primary follows a very strict process: once two primaries exist at the same time, the cluster is ruined, so MongoDB would rather demote a healthy node by mistake than take that risk.

What actually happens is this: when one of the only two instances in the cluster goes down, the survivor cannot prove that it is the one still standing, because the same symptom could equally be caused by its own network link failing while the other instance is alive and well. With no way to tell, it plays it safe and temporarily demotes itself to secondary.
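If you want to see this from the surviving node, here is a small helper expression of my own (not from the original session; the output line is illustrative). Unreachable members are reported with a "(not reachable/healthy)" state string:

rs0:SECONDARY> rs.status().members.map(function (m) { return m.name + " : " + m.stateStr; })
[ "yx-arch:27011 : (not reachable/healthy)", "yx-arch:27012 : SECONDARY" ]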

Once the other instance recovers, the two can again confirm each other's survival, and the one with higher priority resumes the primary role.

To avoid this kind of tragedy, an arbiter becomes necessary.

Naturally, you could instead add another secondary to the cluster so that the three instances vouch for one another; this solves the problem and enables automatic failover as well. But a machine acting as a secondary has real performance requirements, which in practice may be too high a price. An arbiter, by contrast, exists only to vote and stores no data, so a resource-limited virtual machine is enough. If you run arbiters for several different clusters, you can even put them all on the same virtual machine to conserve resources.

rs0:PRIMARY> rs.addArb("yx-arch:27013")
{ "ok" : 1 }

Incidentally, all of our operations so far have used my machine name "yx-arch" rather than localhost. localhost must not be used here, for the following reason.

Viewing the cluster information at the command line:

rs0:PRIMARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2014-01-24T08:08:32Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "yx-arch:27011",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1010,
            "optime" : Timestamp(1390550842, 1),
            "optimeDate" : ISODate("2014-01-24T08:07:22Z"),
            "lastHeartbeat" : ISODate("2014-01-24T08:08:30Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-24T08:08:32Z"),
            "pingMs" : 0,
            "syncingTo" : "yx-arch:27012"
        },
        {
            "_id" : 1,
            "name" : "yx-arch:27012",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1758,
            "optime" : Timestamp(1390550842, 1),
            "optimeDate" : ISODate("2014-01-24T08:07:22Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "yx-arch:27013",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 70,
            "lastHeartbeat" : ISODate("2014-01-24T08:08:31Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-24T08:08:32Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

We can see that there are three instances in the cluster, namely:

yx-arch:27011
yx-arch:27012
yx-arch:27013

When connecting to MongoDB from a language such as C#, the list of available servers does not come from the connection string alone; the driver fetches information like the above to determine which servers are available. As you can see, if a node had been added to the cluster as localhost:27011, the driver would likewise treat localhost:27011 as an available instance. But in most deployments the application and the database run on separate machines, so the application would search in vain for a MongoDB service on port 27011 of its own machine, which obviously cannot succeed. I wrote an earlier post about how drivers discover available instances; see that post for details.
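For illustration, a standard replica set connection string therefore lists its seed hosts under their cluster-visible names and names the set explicitly (the hosts here are the ones from this article; substitute your own):

mongodb://yx-arch:27011,yx-arch:27012/test?replicaSet=rs0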

With that, a complete cluster has been built; I hope everyone enjoys using it. But as an IT person who has knocked around for years, I would never dare run a system I only half understand in production without planning a way back. Considering the retreat before advancing is common sense, so let's look at how to return from a replica set to the standalone scenario should the need arise.

Replica Set to Standalone

Consider the three-instance replica set above. Although it keeps running normally if any one instance goes down, what if two instances go down at the same time? Surely things can't get that bad? The experience of our predecessors tells us that in the world of IT, anything that can happen eventually will.

So how do you avoid being caught empty-handed and beaten black and blue when disaster strikes? Here is how it's done.

Whether the node is a primary or a secondary, removing --replSet and restarting immediately returns it to the standalone state. To return to standalone completely, you should also delete the local.* files under the database directory (note that this deletion must be done while the MongoDB service is stopped).

If you do not delete these files, MongoDB will still work correctly, except that TTL collections will stop functioning. Beyond that, the author has found no other problems.
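By TTL collections I mean collections carrying an expiring index, for example (2.4-era syntax; the collection and field names are made up for illustration):

> db.sessions.ensureIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })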

In reality, in most cases we would not want to change from a replica set back to standalone. If you do, there is probably only one reason: you stepped on one of the mines above and the cluster became completely read-only. You expect to revert temporarily to standalone so daily operations in production are not affected, and then, once the other instances recover, add the --replSet parameter back and return to replica set state. Unfortunately, if that is your plan, you are about to step on a second mine.

We know that a replica set works by shipping the oplog to the other members and replaying it there, and that the oplog is generated only in master/slave or replica set mode. So nothing you do while running in standalone mode is written to the oplog, which means that once you return to replica set mode, that data is simply missing from replication.
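You can see the mechanism in the shell: in replica set mode every write lands in the local.oplog.rs capped collection, while in standalone mode it does not. A quick look at the most recent entry (the document shown is abridged and illustrative):

rs0:PRIMARY> use local
switched to db local
rs0:PRIMARY> db.oplog.rs.find().sort({ $natural: -1 }).limit(1)
{ "ts" : Timestamp(…), "op" : "i", "ns" : "test.test", "o" : { … } }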

Even more irritating: MongoDB will not stop you from doing any of this; it will happily let you return to replica set state, primary and secondary alike. You can even modify a secondary's contents while it is standalone and then send it back into the replica set, leaving its contents inconsistent with every other instance!

Now that we know about the pit, here is the simple, crude way around it:

1. Stop the MongoDB instance

$ sudo systemctl stop mongodb

2. Go to the database folder and delete local.*

$ cd /var/lib/mongodb/
$ rm local.* -rf

3. Add the --replSet parameter back to the configuration file
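In the ini-style configuration file sketched earlier, this means restoring the line (rs0 being our set name):

replSet = rs0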

4. Start MongoDB again

$ sudo systemctl start mongodb

(If instead you stop after step 2 and start without --replSet, MongoDB is back to being a normal, non-replica-set instance; steps 3 to 5 are what bring it back into a cluster.)

5. Reinitialize the replica set and set up the cluster as described above.

Although this is a bit more involved, it is the only safe way; do not take shortcuts.

Master/slave to Replica Set

Converting from master/slave to a replica set is no different from converting standalone to a replica set; all the steps are identical. One thing deserves attention, though: once the master becomes a primary, the slave will no longer synchronize data from it. In other words, from the moment you execute

> rs.initiate()

the slave becomes a frozen snapshot of the database and is no longer updated. So when applying this operation in a production environment, be aware of this data synchronization hazard and schedule downtime for maintenance when necessary. When you face this problem, there are several options:

Scenario one: switch the applications that used the slave over to the master. The advantage is uninterrupted service; the downside is that the master's performance must be good enough to carry the extra load.

Scenario two: if you like scenario one but the master's performance is not good enough, first bring in a new, sufficiently powerful machine as a slave; then take all systems down for maintenance, promote that slave to master, and point every service at the new master before beginning the master-to-replica-set conversion. The advantage is relatively short downtime. The disadvantage... service is still interrupted. Also, since every system must be re-pointed to the new master, the larger the deployment, the more changes are needed and the easier it is to overlook one.

Scenario three: a complete shutdown for the whole master/slave-to-replica-set conversion, starting services again only after every secondary has finished replicating. This is of course the most relaxed option, and naturally has the longest downtime.

Postscript

MongoDB, as a young non-relational database, has developed rapidly in recent years, and new features abound. The most reliable way to understand every aspect of it is to read its manual. That is laborious, but there is no way around it; as shown above, trying to take shortcuts usually ends in a pit.

I hope the author's hard-won lessons can serve as a reference for fellow developers. If anything in this article is incorrect, corrections are welcome.
