MongoDB sharding principles: learning and trial (3)

Source: Internet
Author: User
Tags mongodb sharding

1. When should sharding be enabled?

A: Although sharding is powerful, it also requires more hardware and a more complex configuration. The guiding attitude should be to avoid sharding if you can. Even so, prepare in advance: do not wait until it is too late to start thinking about it, because by then there is little time left to plan. If the application is likely to need sharding eventually, first decide which collections will be sharded, and design their shard keys.

Consider sharding when any one of the following three conditions is met:

(1) The dataset size is close to the storage capacity of a single node.

(2) The active data volume is close to the node's maximum memory capacity.

(3) The write throughput of a single node cannot meet the requirements. (If it is the read throughput that falls short, read/write splitting or a replica set can be used instead.)
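The three conditions above can be turned into a simple checklist. The sketch below is only illustrative: the `NodeStats` fields, the function name, and the 0.8 cutoff used for "close to" are all assumptions, not values from MongoDB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    """Hypothetical snapshot of a single node; every field name is illustrative."""
    data_size_gb: float                # total dataset size
    disk_capacity_gb: float            # usable storage on the node
    active_data_gb: float              # data that is actively read and written
    ram_gb: float                      # memory available on the node
    required_write_ops_per_sec: float  # write rate the application needs
    max_write_ops_per_sec: float       # write rate the node can sustain


def sharding_reasons(stats: NodeStats, near: float = 0.8) -> List[str]:
    """Return which of the three conditions currently hold.

    `near` is an assumed cutoff for "close to"; pick your own safety margin.
    """
    reasons = []
    if stats.data_size_gb >= near * stats.disk_capacity_gb:
        reasons.append("dataset size close to storage capacity")
    if stats.active_data_gb >= near * stats.ram_gb:
        reasons.append("active data close to memory capacity")
    if stats.required_write_ops_per_sec > stats.max_write_ops_per_sec:
        reasons.append("write rate exceeds what the node can sustain")
    return reasons


if __name__ == "__main__":
    node = NodeStats(900, 1000, 10, 64, 500, 2000)
    print(sharding_reasons(node))
```

An empty result means none of the conditions holds yet, which is the time to plan shard keys rather than to enable sharding.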

2. Standard Configuration:

Three config server instances, a number of shards that are each a replica set, and at least one routing instance (mongos). Routing instances consume few resources, so they can be deployed on the same machine as a data-bearing node.

3. The default chunk size for sharded data is 64 MB.

4. When is data migrated? (Only data volume is considered here.)

A: (1) Before version 2.2, migration takes place when the shard with the most chunks holds eight more chunks than the shard with the fewest chunks.

(2) Version 2.2 introduced the migration threshold. The threshold still describes the difference in chunk counts between shards, but it grows as the total number of chunks grows.

When the total number of chunks (the sum over all shards) is fewer than 20, the migration threshold is 2. That is, data migration occurs when the difference between the largest and the smallest per-shard chunk count reaches two.

When the total number of chunks is between 20 and 79, the migration threshold is 4, applied in the same way.

When the total number of chunks is 80 or more, the migration threshold is 8, applied in the same way.

(3) Once data migration starts, it does not stop until the difference in chunk counts between any two shards is less than two (or a chunk migration fails).
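The threshold rules for version 2.2 and later can be sketched as a small simulation. This is not MongoDB's balancer code: the function names are made up, the boundaries (fewer than 20, 20 to 79, 80 or more) follow the MongoDB documentation for that release, and the stop condition is assumed to be a difference of fewer than two chunks.

```python
from typing import Dict, Tuple


def migration_threshold(total_chunks: int) -> int:
    """Threshold as a function of the total chunk count across all shards."""
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8


def needs_migration(chunks_per_shard: Dict[str, int]) -> bool:
    """True when the largest and smallest shard differ by at least the threshold."""
    counts = list(chunks_per_shard.values())
    return max(counts) - min(counts) >= migration_threshold(sum(counts))


def balance(chunks_per_shard: Dict[str, int], stop_diff: int = 2) -> Tuple[Dict[str, int], int]:
    """Move one chunk at a time from the fullest to the emptiest shard.

    Returns the final layout and the number of chunks moved. `stop_diff` is the
    assumed stop condition: the round ends once the spread is below this value.
    """
    counts = dict(chunks_per_shard)
    moves = 0
    if not needs_migration(counts):
        return counts, moves
    while max(counts.values()) - min(counts.values()) >= stop_diff:
        source = max(counts, key=counts.get)
        target = min(counts, key=counts.get)
        counts[source] -= 1
        counts[target] += 1
        moves += 1
    return counts, moves


if __name__ == "__main__":
    print(balance({"shard0": 10, "shard1": 4}))   # threshold 2, migration runs
    print(balance({"shard0": 30, "shard1": 27}))  # threshold 4, nothing moves
```

Note that the threshold only decides whether a round starts; once it has started, the round keeps moving chunks until the shards are nearly even.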

5. Shard key

(1) The shard key is a field of the collection. Every document must contain this field; a document without the shard key field cannot be assigned to a chunk.

(2) The shard key can be a single field or a compound of multiple fields.

(3) Data is distributed by ranges of the shard key. If a timestamp field is used as the shard key, inserts always go to the shard that holds the last (highest) range.

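(4) In the versions discussed here, the shard key cannot be changed once selected. MongoDB does not pick a shard key by default: the key must be specified when the collection is sharded, and the _id field (an ObjectID by default) is one possible choice.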
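Range-based distribution can be illustrated with a few lines of Python. The split points, shard names, and sample keys below are hypothetical; the sketch only shows why an ever-increasing key such as a timestamp sends every insert to the shard holding the last range, while an evenly spread key does not.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: three split points give four chunks, one per shard.
# Chunk i covers keys from SPLIT_POINTS[i - 1] (inclusive) to SPLIT_POINTS[i] (exclusive).
SPLIT_POINTS = [1000, 2000, 3000]
CHUNK_TO_SHARD = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: int) -> str:
    """Find the shard whose chunk range contains the given shard key value."""
    return CHUNK_TO_SHARD[bisect_right(SPLIT_POINTS, key)]


def distribution(keys) -> Counter:
    """Count how many inserts each shard would receive."""
    return Counter(shard_for(k) for k in keys)


if __name__ == "__main__":
    # New timestamps are always larger than every existing split point.
    print(distribution(range(3500, 3600)))
    # Keys spread over the whole key space reach every shard.
    print(distribution(range(0, 4000, 40)))
```

In a real cluster the last chunk would eventually be split and some chunks migrated away, but new inserts would still arrive at whichever shard owns the highest range.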

6. How to select a shard key

(1) The shard key should be conducive to splitting data into chunks. Each chunk must cover a different range of shard key values. Suppose the shard key has only a few possible values: the number of chunks can then be at most the number of distinct values. As a result, a chunk cannot be split even when its data volume exceeds the default chunk size of 64 MB, which hurts read speed.

(2) The shard key should spread write requests across shards, so that the key's distribution does not force all writes onto a single node. Suppose ObjectID is used as the shard key: because ObjectID values increase over time, each newly inserted document has the largest ObjectID so far and is therefore assigned to the same shard. All inserts land on that one node, which may become a bottleneck. If most write operations are updates, the impact on performance is smaller.

(3) The shard key should facilitate queries. If the query condition contains the shard key, the router can forward the query only to the shards whose key ranges match, and the query runs there. If the query condition does not contain the shard key, the router must forward the request to all shards.
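Two of the criteria above lend themselves to a short sketch: the limit that key cardinality places on the number of chunks, and the difference between a targeted query and one that must be broadcast. The shard key name `user_id`, the split points, and the shard names are assumptions made for illustration, and only equality conditions are modelled.

```python
from bisect import bisect_right
from typing import Iterable, List

SHARD_KEY = "user_id"           # hypothetical shard key field
SPLIT_POINTS = [1000, 2000]     # hypothetical chunk boundaries
SHARDS = ["shard0", "shard1", "shard2"]


def max_chunks(key_values: Iterable) -> int:
    """Upper bound on the number of chunks: one per distinct shard key value."""
    return len(set(key_values))


def target_shards(query: dict) -> List[str]:
    """Shards the router must contact for an equality query.

    With the shard key in the query, one shard is enough; without it,
    the request goes to every shard.
    """
    if SHARD_KEY in query:
        return [SHARDS[bisect_right(SPLIT_POINTS, query[SHARD_KEY])]]
    return list(SHARDS)


if __name__ == "__main__":
    # A field with two possible values can never yield more than two chunks.
    print(max_chunks(["M", "F", "M", "F", "M"]))
    print(target_shards({"user_id": 1500}))   # targeted at one shard
    print(target_shards({"name": "alice"}))   # broadcast to all shards
```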

 
