[MongoDB translation] Sharding and Failover

Source: Internet
Author: User
Tags: failover, mongodb, sharding

A properly configured MongoDB sharded cluster has no single point of failure.

This section describes the server failures that can occur in a cluster and how to deal with each one.

1. A mongos routing process fails

One mongos process runs on each application server, and that server communicates with the cluster only through its own mongos process. mongos processes are not persistent: they keep no state of their own and fetch the configuration they need from the configuration servers at startup.
This means that the failure of any one application server does not affect the cluster as a whole, and the other application servers keep serving requests as usual. Recovery is also simple: start a new application server and a new mongos process.
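
The following is a minimal Python sketch of why losing one mongos is harmless. It is a toy model, not MongoDB code: the classes ConfigServer and Mongos, the key ranges, and the shard names are all made up for illustration. Each router only holds a copy of the metadata it fetched at startup, so a replacement router is as good as the one that failed.

```python
class ConfigServer:
    """Hypothetical stand-in for the configuration servers: owns the metadata."""

    def __init__(self, chunk_map):
        # Maps a (low, high) key range to the name of the shard that holds it.
        self.chunk_map = dict(chunk_map)


class Mongos:
    """Toy router: it has no state of its own, only a copy fetched at startup."""

    def __init__(self, config_server):
        self.routes = dict(config_server.chunk_map)

    def route(self, key):
        # Find the shard whose key range contains the key.
        for (low, high), shard in self.routes.items():
            if low <= key < high:
                return shard
        raise KeyError(key)


config = ConfigServer({(0, 100): "shard-a", (100, 200): "shard-b"})

# One router per application server.
routers = {"app1": Mongos(config), "app2": Mongos(config)}

# app1 (and its router) fails; app2 is unaffected.
del routers["app1"]

# Recovery: start a new application server with a fresh router.
routers["app3"] = Mongos(config)

print(routers["app2"].route(150))
print(routers["app3"].route(42))
```

Nothing has to be restored from the failed router, because the metadata lives with the configuration servers.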

2. One mongod server in a shard fails

Each shard is a replica set made up of N servers. If any one server in a replica set fails, the shard still accepts reads and writes. Furthermore, a single server failure does not cause data loss, because the replica set offers an option to require that a write be replicated to a secondary (slave) node before the write returns. This is similar to setting W to 2 in Amazon's Dynamo.
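
The effect of waiting for a second copy can be sketched with a small Python model. This is an illustration under simplifying assumptions, not MongoDB's replication code: the class ReplicaSet and its methods are hypothetical, the first live member is treated as the primary (elections are not modelled), and copies beyond the first w are treated as not yet replicated when the failure happens.

```python
class ReplicaSet:
    """Toy replica set (replication group) with a configurable acknowledgement count."""

    def __init__(self, members):
        self.members = list(members)
        self.store = {m: {} for m in self.members}
        self.alive = set(self.members)

    def write(self, key, value, w=1):
        """Return only after w live members hold the write."""
        targets = [m for m in self.members if m in self.alive]
        if len(targets) < w:
            raise RuntimeError("not enough live members to acknowledge the write")
        # The first live member acts as primary; the next w - 1 are secondaries.
        for member in targets[:w]:
            self.store[member][key] = value
        return w

    def fail(self, member):
        self.alive.discard(member)

    def survivors_with(self, key):
        """Live members that hold the given key."""
        return [m for m in self.members
                if m in self.alive and key in self.store[m]]


rs = ReplicaSet(["n1", "n2", "n3"])
rs.write("a", 1, w=1)   # acknowledged by the primary only
rs.write("b", 2, w=2)   # acknowledged by the primary and one secondary

rs.fail("n1")           # the primary fails before any further replication

print(rs.survivors_with("a"))  # no surviving copy of the w=1 write
print(rs.survivors_with("b"))  # the w=2 write survives on a secondary
```

In this model the write acknowledged by two members is still present after the primary fails, while the write acknowledged by the primary alone is not.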

3. All mongod servers in a shard fail

If all the servers in a shard's replica set fail, the data on that shard becomes inaccessible. Operations that involve only the other shards continue to work normally.

If a shard is configured as a replica set with at least one node placed in another data center, the probability of the entire shard failing is very low. This setup is also the recommended way to maximize redundancy.

4. One configuration server fails

A production cluster should have three configuration servers, each running on a separate machine. Writes to the configuration servers use a two-phase commit so that changes to the sharded cluster's metadata are atomic and replicated.

If any configuration server fails, the metadata of the entire system becomes read-only. The system continues to serve requests, but data blocks (chunks) can no longer be split within a shard or migrated to another shard. For most applications this causes few problems, because chunk metadata changes rarely.

That said, it is still important to bring the failed configuration server back up within a reasonable period of time, so that the shards do not become unbalanced for lack of data migration (again, in most production scenarios this is not very urgent).
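
The read-only behaviour described in this section can be sketched in Python as follows. This is a toy model of the rule stated above (metadata is writable only while every configuration server is up), not MongoDB's implementation; the class ConfigCluster, its methods, and the server and shard names are hypothetical.

```python
class ConfigCluster:
    """Toy model: metadata can be changed only while all config servers are up."""

    def __init__(self, servers, chunks):
        self.servers = set(servers)
        self.up = set(servers)
        # Maps a (low, high) key range to the shard that holds it.
        self.chunks = dict(chunks)

    def writable(self):
        return self.up == self.servers

    def lookup(self, key):
        # Reads keep working even when the metadata is read-only.
        for (low, high), shard in self.chunks.items():
            if low <= key < high:
                return shard
        raise KeyError(key)

    def migrate(self, chunk_range, to_shard):
        # A migration changes metadata, so it needs every config server.
        if not self.writable():
            raise RuntimeError("metadata is read-only")
        self.chunks[chunk_range] = to_shard


cluster = ConfigCluster(
    ["cfg1", "cfg2", "cfg3"],
    {(0, 100): "shard-a", (100, 200): "shard-a"},
)

cluster.up.discard("cfg2")        # one configuration server fails
print(cluster.lookup(150))        # routing still works

try:
    cluster.migrate((100, 200), "shard-b")
except RuntimeError as err:
    print(err)                    # balancing is blocked

cluster.up.add("cfg2")            # the failed server is restarted
cluster.migrate((100, 200), "shard-b")
print(cluster.lookup(150))        # the chunk has moved
```

Until the failed server returns, both chunks stay on the same shard, which is the imbalance the paragraph above warns about.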
