MongoDB cluster node failure and recovery scenarios

Source: Internet
Author: User
Tags: mongodb, sharding

A properly configured MongoDB sharded cluster has no single point of failure.
This article describes several potential node failure scenarios in a sharded cluster and how MongoDB handles each of them.
1. A mongos node goes down
One mongos process should run on each application server. That server should have the mongos process to itself and use it for all communication with the sharded cluster.
A mongos process holds no persistent state. Instead, it loads all the configuration information it needs from the config servers at startup.
This means that the failure of any single application server does not affect the sharded cluster as a whole, and all other application servers continue to work normally.
Recovery in this case is simple: start a new application server with a new mongos process.
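The stateless behavior described above can be illustrated with a small sketch. This is a toy model in plain Python, not MongoDB code: `CONFIG_METADATA`, `Router`, the shard names, and the chunk ranges are all hypothetical. It only shows that a router holds nothing that cannot be rebuilt from the config servers.

```python
import bisect

# Hypothetical chunk map held by the config servers:
# each entry is (exclusive upper bound of the shard key, shard name).
CONFIG_METADATA = [(100, "shard-a"), (200, "shard-b"), (float("inf"), "shard-c")]


class Router:
    """Toy stand-in for a mongos process: all of its state is loaded at startup."""

    def __init__(self, config_metadata):
        # Build the routing table from the config metadata; nothing is persisted locally.
        self.bounds = [upper for upper, _ in config_metadata]
        self.shards = [shard for _, shard in config_metadata]

    def route(self, key):
        # Pick the first chunk whose upper bound is greater than the key.
        return self.shards[bisect.bisect_right(self.bounds, key)]


old_router = Router(CONFIG_METADATA)
del old_router  # the application server and its router are lost

new_router = Router(CONFIG_METADATA)  # a replacement rebuilds the same table
print(new_router.route(150))
```

Because the replacement router is built from the same metadata, it makes the same routing decisions as the one that was lost.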
2. A mongod node in a shard goes down
Each shard consists of n servers configured as a replica set. If any one node in the replica set goes down, reads and writes on the shard are still allowed.
More importantly, no data on the failed server is lost, because replication has an option that forces a write to be copied to another node of the shard before the operation returns. This is similar to setting W=2 in Dynamo.
Replica sets are available as of MongoDB v1.6.
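The effect of requiring a second copy before acknowledging a write can be sketched as follows. This is a simplified model, not the MongoDB replication protocol: `ReplicaSet`, its methods, and the node names are hypothetical, replication here is synchronous, and the "election" simply promotes the first surviving node by name.

```python
class ReplicaSet:
    """Toy replica set: one primary plus secondaries, each with its own data."""

    def __init__(self, node_names):
        self.data = {name: {} for name in node_names}
        self.up = set(node_names)
        self.primary = node_names[0]

    def write(self, key, value, w=1):
        # Apply the write on the primary, then copy it to live secondaries
        # until w nodes in total hold it. Acknowledge only if w was reached.
        self.data[self.primary][key] = value
        acked = 1
        for name in self.data:
            if acked >= w:
                break
            if name != self.primary and name in self.up:
                self.data[name][key] = value
                acked += 1
        return acked >= w

    def fail(self, name):
        # Mark a node as down; if it was the primary, promote a surviving node.
        self.up.discard(name)
        if name == self.primary and self.up:
            self.primary = sorted(self.up)[0]

    def read(self, key):
        return self.data[self.primary].get(key)


rs = ReplicaSet(["node1", "node2", "node3"])
acknowledged = rs.write("order:1", "paid", w=2)
rs.fail("node1")  # the primary that accepted the write goes down
print(acknowledged, rs.read("order:1"))
```

With `w=2`, the write is acknowledged only after a second node holds it, so the value is still readable from the new primary after the original primary fails.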
3. All mongod nodes in a shard go down
If all nodes (replicas) in a shard are down, the data in that shard cannot be accessed. However, operations that can be served by the other shards continue to work. In practice this scenario is unlikely, for the following reason.
If a shard is configured as a replica set, at least one member should be placed in another data center. With that setup, losing the entire shard is almost impossible. We recommend this configuration for greater redundancy.
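The partial availability described above can be sketched with a toy lookup. Everything here is hypothetical (`SHARD_STATUS`, `SHARD_DATA`, `shard_for`, `find`, and the range split by first letter); it only illustrates that a request fails when it targets the unavailable shard and succeeds otherwise.

```python
class ShardUnavailable(Exception):
    """Raised when every replica of the targeted shard is down."""


# Hypothetical cluster state: all replicas of shard-b are down.
SHARD_STATUS = {"shard-a": True, "shard-b": False}
SHARD_DATA = {"shard-a": {"alice": 1}, "shard-b": {"nina": 2}}


def shard_for(name):
    # Assumed range split on the shard key: a-m on shard-a, n-z on shard-b.
    return "shard-a" if name[0].lower() <= "m" else "shard-b"


def find(name):
    shard = shard_for(name)
    if not SHARD_STATUS[shard]:
        raise ShardUnavailable(shard)
    return SHARD_DATA[shard].get(name)


print(find("alice"))  # served by shard-a, which is still up
try:
    find("nina")  # targets shard-b, which is completely down
except ShardUnavailable as exc:
    print("unavailable:", exc)
```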
4. A config server goes down
A production sharded cluster requires three config server processes, each running on a separate machine. Writes to the cluster metadata on the config servers use a two-phase commit to ensure an atomic, replicated transaction.
When any config server fails, the metadata of the MongoDB cluster becomes read-only. The cluster continues to run, but chunks cannot be split within a shard or migrated between shards. For most use cases this causes no problems, because chunk metadata changes infrequently.
Even so, it is important to bring the failed config server back within a reasonable time (about one day), so that the lack of migrations does not lead to load imbalance between shards. For most production scenarios, a short period without migrations is not a serious problem.
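The read-only behavior can be sketched as below. This is a toy model under the assumptions stated in this section (three config servers, metadata writes requiring all of them); `ConfigCluster`, `MetadataReadOnly`, and the chunk table are hypothetical and do not correspond to any MongoDB API.

```python
class MetadataReadOnly(Exception):
    """Raised when a metadata change is attempted without all config servers."""


class ConfigCluster:
    def __init__(self, server_names):
        self.up = {name: True for name in server_names}
        # Hypothetical chunk table: chunk id -> owning shard.
        self.chunks = {"chunk-1": "shard-a", "chunk-2": "shard-b"}

    def writable(self):
        # In this model a two-phase commit needs every config server to take part.
        return all(self.up.values())

    def lookup(self, chunk_id):
        # Reads only need one surviving copy of the metadata.
        if not any(self.up.values()):
            raise RuntimeError("no config server reachable")
        return self.chunks[chunk_id]

    def migrate(self, chunk_id, target_shard):
        if not self.writable():
            raise MetadataReadOnly("chunk migration refused")
        self.chunks[chunk_id] = target_shard


cfg = ConfigCluster(["cfg1", "cfg2", "cfg3"])
cfg.migrate("chunk-1", "shard-b")  # all three servers up: the change is applied
cfg.up["cfg2"] = False             # one config server fails

print(cfg.lookup("chunk-1"))       # routing lookups still work
try:
    cfg.migrate("chunk-2", "shard-a")
except MetadataReadOnly as exc:
    print("read-only:", exc)
```

Routing keeps working from the surviving copies of the metadata, while splits and migrations wait until the failed server is restored.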
