MongoDB distributed read/write operations tutorial

Last Update:2017-01-13 Source: Internet

Author: User

Tags mongodb


1. Distributed read operations

This section describes how sharded clusters and replica sets affect read operation performance.

1.1 sharded cluster read operations

A sharded cluster partitions a data set across multiple mongod instances in the cluster, in a way that is almost transparent to applications.

In a sharded cluster, applications send their operations to one of the mongos instances associated with the cluster.

Read operations on a sharded cluster are most efficient when they are directed to a specific shard.

Queries against a sharded collection should include the collection's shard key. When a query contains the shard key, mongos uses the cluster metadata from the config servers to route the query to the shard that holds the data.

That is, when the query conditions include the shard key, the query router (mongos) targets the corresponding shard directly.

If the query does not contain the shard key, mongos broadcasts the query to all shards in the cluster. These scatter-gather queries are inefficient, and on large clusters they can be too expensive for routine operations.
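The difference between a targeted query and a broadcast query can be sketched in a few lines of Python. This is a toy model, not how mongos is implemented: the chunk map, the shard names and the `user_id` shard key are all hypothetical, key values are assumed to be non-negative integers, and only equality matches on the shard key are handled.

```python
from bisect import bisect_right

# Hypothetical chunk map (an assumption for illustration): each chunk covers
# [lower bound, next lower bound) of the shard key and is owned by one shard.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_OWNERS = ["shardA", "shardB", "shardC", "shardA"]
SHARD_KEY = "user_id"


def route_query(query):
    """Return the sorted list of shards that a query must be sent to."""
    if SHARD_KEY in query:
        # Targeted query: the shard key value falls into exactly one chunk.
        index = bisect_right(CHUNK_LOWER_BOUNDS, query[SHARD_KEY]) - 1
        return [CHUNK_OWNERS[index]]
    # No shard key in the query: it has to be broadcast to every shard.
    return sorted(set(CHUNK_OWNERS))


print(route_query({"user_id": 1500}))  # routed to a single shard
print(route_query({"name": "alice"}))  # broadcast to all shards
```

In this model the cost of a query without the shard key grows with the number of shards, while a query with the shard key always touches one shard.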

1.2 replica set read operations

A replica set uses the read preference to determine where and how to route read operations to members of the replica set.

By default, application read operations are performed on the primary of the replica set. Because replica sets replicate asynchronously, reading from the primary guarantees that a read returns the latest version of a document. By distributing some or all read operations to the secondary nodes of the replica set, you can increase read throughput or reduce latency for applications that do not need fully up-to-date data.

You can change the read preference mode to change this read behavior.

Read operations can be performed in the following modes:

Read mode	Description
primary	Default mode. All read operations go to the primary node.
primaryPreferred	In most cases, reads go to the primary node; if the primary node is unavailable, reads go to secondary nodes.
secondary	All read operations go to secondary nodes.
secondaryPreferred	In most cases, reads go to secondary nodes; if no secondary node is available, reads go to the primary node.
nearest	Reads go to the replica set member with the lowest network latency, regardless of the member type.

Select a read preference mode based on application requirements.

Maximum consistency: to avoid reading stale data under any circumstances, use primary mode. If there is no primary, for example during an election or when a majority of the members are unavailable, read operations in this mode cannot be served.

Maximum availability: to keep read operations available as much as possible, use primaryPreferred mode. When there is a primary, reads are consistent. When there is not, queries can still be served by a secondary node.

Minimum latency: to always read from a low-latency member, use nearest. The driver or mongos reads from the member with the lowest latency. nearest does not guarantee consistency: if a member is affected by replication lag, the query may return stale data. nearest reflects only network latency, not I/O or CPU load.

A read from a secondary node may not reflect the current state of the primary. Because replication is asynchronous, a secondary node may lag behind the primary node for some time. In general, many applications do not require such strict consistency and care more about high availability.
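The mode descriptions above can be turned into a small member-selection sketch. It is a simplified model rather than a driver's actual algorithm: the `select_member` function and the member dictionaries are hypothetical, and within each group the lowest-latency member is simply picked, whereas real drivers apply more elaborate selection rules.

```python
def select_member(members, mode):
    """Pick the member a read would go to, or None if no member is eligible.

    members: list of dicts with "name", "role" ("primary" or "secondary"),
    "up" (bool) and "latency_ms". This layout is an assumption for the sketch.
    """
    available = [m for m in members if m["up"]]
    primaries = [m for m in available if m["role"] == "primary"]
    secondaries = [m for m in available if m["role"] == "secondary"]

    # For each mode, the groups of candidates in order of preference.
    candidates_by_mode = {
        "primary": [primaries],
        "primaryPreferred": [primaries, secondaries],
        "secondary": [secondaries],
        "secondaryPreferred": [secondaries, primaries],
        "nearest": [available],
    }
    for group in candidates_by_mode[mode]:
        if group:
            # Simplification: take the lowest-latency member of the group.
            return min(group, key=lambda m: m["latency_ms"])["name"]
    return None


# Example: the primary is down, for instance during an election.
members = [
    {"name": "node1", "role": "primary", "up": False, "latency_ms": 5},
    {"name": "node2", "role": "secondary", "up": True, "latency_ms": 20},
    {"name": "node3", "role": "secondary", "up": True, "latency_ms": 8},
]
for mode in ("primary", "primaryPreferred", "secondaryPreferred", "nearest"):
    print(mode, "->", select_member(members, mode))
```

With the primary down, only primary mode finds no eligible member, which matches the trade-off between consistency and availability described above.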

2. Distributed write operations

2.1 sharded cluster write operations

For a sharded collection in a sharded cluster, mongos directs write operations from applications to the shards that are responsible for the data. mongos uses the cluster metadata from the config servers to route write operations to the appropriate shards.

MongoDB partitions the data in a sharded collection into chunks based on the shard key, and then distributes these chunks across the shards. The shard key determines how the chunks are distributed, which can affect the write performance of the cluster.

Note:

Update operations that affect a single document must include the shard key or the _id field.

Update operations that affect multiple documents are in some cases more efficient when they include the shard key, but they can also be broadcast to all shards.

If the value of the shard key increases or decreases with every insert operation, all insert operations target a single shard. As a result, the capacity of a single shard limits the insert capacity of the whole sharded cluster.
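The effect of a shard key that grows with every insert can be illustrated with a toy model of ranged chunks. The shard names, chunk boundaries and key values below are hypothetical, and the model ignores chunk splitting and balancing, so it only shows where new inserts land at a given moment.

```python
from bisect import bisect_right
from collections import Counter

SHARDS = ["shardA", "shardB", "shardC"]
# Hypothetical ranged chunks on an integer shard key (assumption for the sketch):
# [0, 1000) -> shardA, [1000, 2000) -> shardB, [2000, +inf) -> shardC
CHUNK_LOWER_BOUNDS = [0, 1000, 2000]


def target_shard(key):
    """Return the shard that owns the chunk containing the key."""
    return SHARDS[bisect_right(CHUNK_LOWER_BOUNDS, key) - 1]


def insert_distribution(keys):
    """Count how many inserts each shard receives for the given key values."""
    return Counter(target_shard(k) for k in keys)


# 300 inserts whose key increases every time, such as a counter or timestamp.
increasing_keys = range(5000, 5300)
# 300 inserts whose keys are scattered over the key space.
scattered_keys = [(i * 1919) % 3000 for i in range(300)]

print("increasing:", dict(insert_distribution(increasing_keys)))
print("scattered: ", dict(insert_distribution(scattered_keys)))
```

In this model every insert with an increasing key lands on the shard that owns the last chunk, while the scattered keys reach all three shards.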

2.2 replica set write operations

In a replica set, all write operations are performed on the primary node, and all of them are recorded in the primary's oplog. The oplog is a reproducible sequence of operations on the data set. The secondary nodes continuously copy and apply the oplog. This process is asynchronous.

A large volume of write operations, especially bulk operations, can cause a problem: the secondary nodes may be unable to copy and apply the oplog from the primary node fast enough, and so they fall behind the primary node.

Secondary nodes that lag significantly behind the primary are a problem for the normal operation of a replica set, especially during failover, when data may need to be rolled back to restore consistency.

To help avoid this problem, you can customize the write concern so that a write is acknowledged only after it has also been written to another node. This gives the secondary nodes an opportunity to catch up with the primary. Such a write concern can slow down the overall progress of write operations, but it keeps the state of at least some secondary nodes roughly in line with that of the primary node.

Set the write concern level to w: 2 to ensure that the data is written to the primary node and at least one secondary node.
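How a write concern of w: 2 holds back a fast writer can be sketched with a toy model of the oplog. The `ReplicaSetModel` class and the `bulk_load` helper are hypothetical, and waiting for an acknowledgement is simulated by letting a secondary catch up before the write returns, so this shows the effect rather than the actual replication protocol.

```python
class ReplicaSetModel:
    """Toy model: one primary oplog plus per-secondary replication positions."""

    def __init__(self, secondary_names):
        self.oplog = []  # operations recorded on the primary, in order
        self.applied = {name: 0 for name in secondary_names}  # entries applied

    def replicate(self, name, max_entries):
        """Let one secondary copy and apply up to max_entries oplog entries."""
        self.applied[name] = min(len(self.oplog), self.applied[name] + max_entries)

    def write(self, operation, w=1):
        """Record a write on the primary and wait until w members have it.

        w counts the primary itself, so w=2 means primary plus one secondary.
        """
        if w > 1 + len(self.applied):
            raise ValueError("w exceeds the number of members")
        self.oplog.append(operation)
        # Simulated wait: the most up-to-date secondaries catch up fully
        # before the write is acknowledged.
        most_current = sorted(self.applied, key=self.applied.get, reverse=True)
        for name in most_current[: w - 1]:
            self.replicate(name, len(self.oplog))

    def smallest_lag(self):
        """Entries by which the most up-to-date secondary trails the primary."""
        return len(self.oplog) - max(self.applied.values())


def bulk_load(w, writes=100):
    """Run a burst of writes; return the smallest secondary lag afterwards."""
    rs = ReplicaSetModel(["s1", "s2"])
    for i in range(writes):
        rs.write({"_id": i}, w=w)
        if i % 2 == 0:
            # Background replication that keeps up with only half the writes.
            for name in rs.applied:
                rs.replicate(name, 1)
    return rs.smallest_lag()


print("lag with w=1:", bulk_load(1))
print("lag with w=2:", bulk_load(2))
```

In the model, writes acknowledged by the primary alone let both secondaries drift behind, while w=2 keeps one secondary level with the primary at the price of making every write wait.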

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
