MongoDB distributed read/write operations tutorial

Source: Internet
Author: User
Tags mongodb

1. Distributed read operations

This section describes how sharded clusters and replica sets affect read operation performance.

1.1 Sharded cluster read operations

A sharded cluster partitions a data set across multiple mongod instances in the cluster, in a way that is nearly transparent to applications.

In a sharded cluster, applications send their operations to one of the mongos instances associated with the cluster.

Read operations on a sharded cluster are most efficient when they are directed to a specific shard.

Queries against a sharded collection should include the collection's shard key. When a query contains the shard key, mongos uses the cluster metadata from the config server to route the query to the shard that holds the matching data. Because the query criteria include the shard key, the query router (mongos) can locate the corresponding shard directly.

If a query does not contain the shard key, mongos must broadcast the query to every shard in the cluster. Such scatter-gather queries can be inefficient, and on large clusters they are impractical for routine operations.
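
The difference between a targeted and a broadcast query can be illustrated with a small standalone sketch. It does not use a MongoDB driver; the chunk table, the shard names, and the `user_id` shard key are all hypothetical stand-ins for the metadata that mongos would obtain from the config server, and only equality matches on the shard key are modelled.

```python
from bisect import bisect_right

# Hypothetical chunk table standing in for the config server metadata.
# Each entry is (inclusive lower bound of a shard key range, owning shard).
CHUNKS = [(0, "shardA"), (1000, "shardB"), (2000, "shardC")]
LOWER_BOUNDS = [low for low, _ in CHUNKS]
ALL_SHARDS = sorted({shard for _, shard in CHUNKS})


def target_shards(query, shard_key="user_id"):
    """Return the shards a router would contact for an equality query."""
    if shard_key not in query:
        # No shard key in the filter: every shard has to be asked.
        return ALL_SHARDS
    # Shard key present: find the chunk whose range contains the value.
    index = bisect_right(LOWER_BOUNDS, query[shard_key]) - 1
    if index < 0:
        raise ValueError("shard key value is below the lowest chunk bound")
    return [CHUNKS[index][1]]


print(target_shards({"user_id": 1500}))  # one shard is contacted
print(target_shards({"name": "alice"}))  # all shards are contacted
```

The cost of the second query grows with the number of shards, which is why filters that include the shard key matter more as the cluster gets larger.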

1.2 Replica set read operations

A replica set uses read preference to determine where and how read operations are routed to its members.

By default, applications direct read operations to the primary of the replica set. Reading from the primary guarantees that a read returns the latest version of a document, because replication to the secondaries is asynchronous. However, by distributing some or all reads to secondary members, you can increase read throughput or reduce latency for applications that do not require fully up-to-date data.

You can change this behavior by setting the read preference mode.

Read operations can be performed in the following modes:

Read mode            Description
primary              Default mode. All read operations go to the primary.
primaryPreferred     Reads normally go to the primary; if the primary is unavailable, they go to secondary members.
secondary            All read operations go to secondary members.
secondaryPreferred   Reads normally go to secondary members; if no secondary is available, they go to the primary.
nearest              Reads go to the member with the lowest network latency, regardless of member type.

Choose a read preference mode based on the application's requirements:

Maximum consistency: to avoid reading stale data under any circumstances, use primary mode. If no primary is available, for example during an election or when a majority of the members is unreachable, read operations in this mode cannot be served.

Maximum availability: to let reads succeed whenever possible, use primaryPreferred mode. While a primary exists, reads are consistent; when there is none, reads are still served by secondary members.

Minimum latency: to always read from a low-latency member, use nearest mode. The driver or mongos reads from the member with the lowest latency. The nearest mode does not guarantee consistency: if members are lagging in replication, a query may return stale data. It also reflects network latency only, not I/O or CPU load.
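
The selection rules behind the five modes can be sketched as a small function. This is a simplified model, not driver code: the `Member` class and `select_member` function are hypothetical, and picking the single lowest-latency candidate is an assumption made to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    is_primary: bool
    latency_ms: float      # round-trip time as seen by the client
    available: bool = True


def select_member(members, mode):
    """Return the member a read would be sent to, or None if none qualifies."""
    up = [m for m in members if m.available]
    primaries = [m for m in up if m.is_primary]
    secondaries = [m for m in up if not m.is_primary]

    def fastest(group):
        # Lowest-latency member of the group, or None for an empty group.
        return min(group, key=lambda m: m.latency_ms, default=None)

    if mode == "primary":
        return fastest(primaries)
    if mode == "primaryPreferred":
        return fastest(primaries) or fastest(secondaries)
    if mode == "secondary":
        return fastest(secondaries)
    if mode == "secondaryPreferred":
        return fastest(secondaries) or fastest(primaries)
    if mode == "nearest":
        return fastest(up)
    raise ValueError(f"unknown read preference mode: {mode}")


members = [
    Member("m1", is_primary=True, latency_ms=40.0),
    Member("m2", is_primary=False, latency_ms=5.0),
    Member("m3", is_primary=False, latency_ms=12.0),
]
print(select_member(members, "primary").name)
print(select_member(members, "nearest").name)

members[0].available = False               # simulate losing the primary
print(select_member(members, "primary"))   # no member can serve the read
print(select_member(members, "primaryPreferred").name)
```

With the primary marked unavailable, primary mode has no member to read from while primaryPreferred falls back to a secondary, which is the consistency versus availability trade-off described above.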

Reads from a secondary may not reflect the current state of the primary, because with asynchronous replication a secondary can lag behind the primary for some time. Many applications do not need such strict consistency and place more weight on high availability.

2. Distributed write operations

2.1 Sharded cluster write operations

For sharded collections in a sharded cluster, mongos directs write operations from applications to the shards responsible for the relevant portion of the data set. mongos uses the cluster metadata from the config server to route write operations to the appropriate shards.

MongoDB partitions the data in a sharded collection into ranges based on the values of the shard key, and then distributes these chunks across the shards. The shard key determines how chunks are distributed among the shards, which can affect the write performance of the cluster.

Note:

Update operations that affect a single document must include the shard key or the _id field.

Updates that affect multiple documents are more efficient in some situations if they include the shard key, but they can also be broadcast to all shards.

If the shard key value increases or decreases with every insert, all insert operations target a single shard. As a result, the capacity of that one shard limits the insert capacity of the whole sharded cluster.
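
The effect of a steadily increasing shard key can be seen in a short simulation. The range boundaries and shard names below are hypothetical, and the sketch ignores chunk splitting and balancing; it only counts which shard each insert would land on under range-based partitioning.

```python
from bisect import bisect_right
from collections import Counter
import random

# Hypothetical range boundaries: each shard owns one contiguous key range.
LOWER_BOUNDS = [0, 1000, 2000, 3000]
SHARDS = ["shardA", "shardB", "shardC", "shardD"]


def shard_for(key):
    """Map a shard key value to the shard that owns its range."""
    index = bisect_right(LOWER_BOUNDS, key) - 1
    if index < 0:
        raise ValueError("key is below the lowest range bound")
    return SHARDS[index]


def inserts_per_shard(keys):
    """Count how many inserts each shard would receive."""
    return Counter(shard_for(k) for k in keys)


# Increasing keys (for example a counter or a timestamp): every new key
# falls into the last range, so a single shard takes all the inserts.
increasing = range(3000, 3500)
print(inserts_per_shard(increasing))

# Keys spread over the whole key space are divided among the shards.
rng = random.Random(0)
scattered = [rng.randrange(0, 4000) for _ in range(500)]
print(inserts_per_shard(scattered))
```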

2.2 Replica set write operations

In a replica set, all write operations go to the primary, which records them in its oplog. The oplog is a reproducible sequence of operations on the data set. Secondary members continuously copy and apply the oplog; this process is asynchronous.
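
As a rough mental model, the oplog can be pictured as an ordered list of operations that each secondary replays at its own pace. The `Primary` and `Secondary` classes below are hypothetical simplifications written for this illustration; they are not how mongod stores or ships its oplog.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (operation, key, value)

    def write(self, key, value):
        # Apply the write locally and record it for the secondaries.
        self.data[key] = value
        self.oplog.append(("set", key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def replicate(self, primary, max_entries):
        """Apply up to max_entries pending oplog entries, in order."""
        pending = primary.oplog[self.applied:self.applied + max_entries]
        for op, key, value in pending:
            if op == "set":
                self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary):
        """Number of oplog entries this secondary has not applied yet."""
        return len(primary.oplog) - self.applied


primary = Primary()
secondary = Secondary()
for i in range(5):
    primary.write(f"doc{i}", i)

secondary.replicate(primary, max_entries=3)
print(secondary.lag(primary))            # entries still to apply
secondary.replicate(primary, max_entries=3)
print(secondary.data == primary.data)    # secondary has caught up
```

Because the secondary applies entries in the same order in which the primary recorded them, it ends up with the same data once it has worked through the backlog.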

Large volumes of write operations, particularly bulk operations, can create a problem: the secondaries may be unable to copy and apply the oplog from the primary fast enough, and so fall behind the primary.

A secondary that lags significantly behind the primary is a problem for normal operation of the replica set, particularly on failover, when data may have to be rolled back to restore consistency.

To help avoid this problem, you can customize the write concern so that a write is acknowledged only after it has been confirmed by another member. This gives the secondaries an opportunity to catch up with the primary. Write concern can slow the overall progress of write operations, but it ensures that the state of the secondaries stays close to that of the primary.

For example, setting the write concern to w: 2 ensures that the data has been written to the primary and to at least one secondary.
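
The acknowledgement rule behind a numeric write concern can be sketched in a few lines. The function name, the member names, and the idea of tracking an integer oplog position per member are assumptions made for this illustration; the sketch models only the counting logic, not a real driver or server.

```python
def is_acknowledged(write_position, applied_positions, w):
    """Return True once at least w members have applied the write.

    applied_positions maps a member name to the last oplog position it
    has applied; the primary is counted like any other member.
    """
    confirmed = sum(
        1 for position in applied_positions.values()
        if position >= write_position
    )
    return confirmed >= w


positions = {"primary": 10, "secondary1": 8, "secondary2": 7}
print(is_acknowledged(10, positions, w=1))  # the primary alone is enough
print(is_acknowledged(10, positions, w=2))  # no secondary has caught up yet

positions["secondary1"] = 10                # secondary1 applies the write
print(is_acknowledged(10, positions, w=2))
```

With w set to 2 the writer has to wait for a secondary, which is the source of both the extra latency and the bound on how far that secondary can fall behind.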
