MongoDB Data distribution

Source: Internet
Author: User

In MongoDB (version 3.2.9), data distribution means splitting a collection's data into chunks and spreading those chunks across different shards. There are two main ways to distribute data: balanced distribution based on chunks, and directed distribution based on shard key ranges. MongoDB's built-in balancer is used to split and move chunks, so that chunks are spread evenly across the shards automatically. The balancer only guarantees that each shard holds roughly the same number of chunks; it does not guarantee that each shard holds roughly the same number of documents. The default chunk size is 64 MB.
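The difference between balancing chunks and balancing documents can be seen in a toy model. The sketch below is plain JavaScript, not MongoDB code: the per-chunk document counts are made-up values, and `dealChunks` is a hypothetical helper that simply deals chunks out in turn. It is not the balancer's actual algorithm.

```javascript
// Toy model: each entry is the number of documents held by one chunk.
// These counts are invented purely for illustration.
const chunkDocCounts = [1000, 20, 5, 800, 10, 15];

// Deal the chunks out to the shards one at a time, so every shard ends up
// with the same number of chunks. (Hypothetical helper, not the balancer.)
function dealChunks(docCounts, shardNames) {
  const shards = shardNames.map((name) => ({ name, chunks: 0, docs: 0 }));
  docCounts.forEach((docs, i) => {
    const shard = shards[i % shards.length];
    shard.chunks += 1;
    shard.docs += docs;
  });
  return shards;
}

const layout = dealChunks(chunkDocCounts, ["shard1", "shard2", "shard3"]);
for (const s of layout) {
  console.log(`${s.name}: ${s.chunks} chunks, ${s.docs} docs`);
}
```

Every shard ends up with two chunks, yet the document totals differ widely, which is exactly the situation the balancer does not correct.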

First, balanced distribution by chunk

Balanced distribution is carried out automatically by MongoDB. This makes the sharded architecture transparent to applications and simplifies system administration, so shards can easily be added to or removed from the cluster. It is implemented by the built-in balancer, which balances the data according to an indexed field of the collection called the shard key. There are generally three kinds of shard key: ascending, random, and group-based.

For example, suppose a MongoDB sharded cluster has three shards: shard1, shard2, and shard3. The minimum shard key value is $MinKey and the maximum is $MaxKey. The chunk bounded by $MinKey is the minimum chunk, and the chunk bounded by $MaxKey is the maximum chunk.

1, Ascending shard key

An ascending shard key is a field such as a date or _id, whose value grows steadily over time. Suppose the shard key is _id, the collection foo contains 10 documents, and each shard holds one chunk: chunk1: $MinKey to 3, chunk2: 4 to 8, chunk3: 9 to $MaxKey.

The disadvantage of an ascending shard key is that every new document is inserted into the maximum chunk. All write requests are therefore routed to the same shard; the maximum chunk keeps growing, is split repeatedly, and the resulting chunks are moved to other shards, so the write load is unbalanced. Chunk migration also adds extra disk writes. The advantage of an ascending shard key is that range reads on the shard key perform well.
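The write hotspot can be reproduced with a small simulation. This is a sketch in plain JavaScript rather than MongoDB code: `findChunk` is a hypothetical helper, the chunks are written as half-open ranges, and `-Infinity` / `Infinity` stand in for $MinKey / $MaxKey.

```javascript
// Chunk layout from the example above, as half-open ranges [min, max).
// -Infinity and Infinity are stand-ins for $MinKey and $MaxKey.
const chunks = [
  { name: "chunk1", shard: "shard1", min: -Infinity, max: 4 },
  { name: "chunk2", shard: "shard2", min: 4, max: 9 },
  { name: "chunk3", shard: "shard3", min: 9, max: Infinity },
];

// Return the chunk whose range contains the given shard key value.
function findChunk(chunkList, key) {
  return chunkList.find((c) => key >= c.min && key < c.max);
}

// Insert 100 new documents whose _id keeps growing past the existing 10.
const writesPerShard = { shard1: 0, shard2: 0, shard3: 0 };
for (let id = 11; id <= 110; id++) {
  writesPerShard[findChunk(chunks, id).shard] += 1;
}
console.log(writesPerShard);
```

All 100 writes land on the shard that owns the maximum chunk, while the other two shards receive none.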

2, Random shard key

A random shard key is one whose values do not grow steadily but are irregular. Because writes are distributed randomly, the shards grow at roughly the same rate, which reduces the number of chunk migrations. The disadvantage of a random shard key is that range queries are slow.
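A rough sketch of the effect, again in plain JavaScript rather than MongoDB code. To keep the result repeatable, the hypothetical `scatter` function below uses a fixed multiplier to scramble an insertion counter instead of a true random source; the key space of 0 to 2999 and the three chunk ranges are also assumptions made for the example.

```javascript
// Three chunks covering an assumed key space [0, 3000), one per shard.
const randomKeyChunks = [
  { shard: "shard1", min: 0, max: 1000 },
  { shard: "shard2", min: 1000, max: 2000 },
  { shard: "shard3", min: 2000, max: 3000 },
];

// Deterministic stand-in for a random key: scramble the insertion counter.
// 7919 shares no factor with 3000, so counters 0..2999 map to distinct keys.
function scatter(counter) {
  return (counter * 7919) % 3000;
}

const randomWrites = { shard1: 0, shard2: 0, shard3: 0 };
for (let i = 0; i < 3000; i++) {
  const key = scatter(i);
  const chunk = randomKeyChunks.find((c) => key >= c.min && key < c.max);
  randomWrites[chunk.shard] += 1;
}
console.log(randomWrites);
```

Consecutive inserts jump between shards, so write load is spread out. The same scattering is why a range query on such a key has to touch every shard.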

3, Group-based shard key

A group-based shard key is a compound shard key on two fields. The first field is used for grouping and should preferably have low cardinality, where cardinality is the number of distinct values the field takes. The second field should preferably be an ascending (auto-incrementing) field. This shard key strategy is the best of the three, because it supports multiple read and write hotspots.

A single mongod is most efficient when it handles ascending writes, because data only needs to be appended to the end of the collection. With a group-based shard key, the groups are spread across the shards in the cluster and each shard holds only a small number of them, so writes are distributed over every shard while, on any single shard, data is read and written in ascending order. If a shard holds too many groups, its writes become effectively random, which is undesirable.
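The following plain JavaScript sketch illustrates the idea under simplifying assumptions: the field names `group` and `seq`, the three groups, and the one-group-per-shard mapping are all hypothetical, and `compareKeys` only mimics how a compound key is ordered (first field, then second).

```javascript
// Order two compound shard keys {group, seq}: group first, then seq.
function compareKeys(a, b) {
  if (a.group !== b.group) return a.group < b.group ? -1 : 1;
  return a.seq - b.seq;
}

// Assumed layout: each group's key range lives on its own shard.
const groupToShard = { a: "shard1", b: "shard2", c: "shard3" };

// Writes arrive interleaved across groups; seq grows within each group.
const keysPerShard = { shard1: [], shard2: [], shard3: [] };
const nextSeq = { a: 0, b: 0, c: 0 };
for (const group of ["a", "b", "c", "b", "a", "c", "a", "b", "c"]) {
  nextSeq[group] += 1;
  keysPerShard[groupToShard[group]].push({ group, seq: nextSeq[group] });
}

// True when every key is strictly greater than the one before it.
function isAscending(keys) {
  return keys.every((k, i) => i === 0 || compareKeys(keys[i - 1], k) < 0);
}

for (const [shard, keys] of Object.entries(keysPerShard)) {
  console.log(shard, keys.length, "writes, ascending:", isAscending(keys));
}
```

Each shard receives a share of the writes, and the keys arriving at any one shard are still in ascending order.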

Second, directed distribution by shard key range

If you want chunks in a specific range to be placed on a particular shard, add a tag to the shard and then assign a shard key range to that tag. Any document whose shard key falls within the tag's range is then directed to a shard carrying that tag.

1, Specify tags for shards

sh.addShardTag("shard1", "shard_tag1");
sh.addShardTag("shard2", "shard_tag2");
sh.addShardTag("shard3", "shard_tag2");

2, Specify the shard key range for a tag

sh.addTagRange("db_name.collection_name", {field: "min_value"}, {field: "max_value"}, "shard_tag")

Each shard can have any number of tags. When it moves chunks, the MongoDB balancer places chunks in a particular shard key range on the shards that carry the matching tag.
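As a rough illustration of how tags narrow down where a document may live, here is a plain JavaScript sketch (not MongoDB code). The shard names are written as shard1, shard2, and shard3, and the tag assignments follow the step above. The numeric ranges and the `eligibleShards` helper are hypothetical, and ranges are treated as including the minimum and excluding the maximum.

```javascript
// Tags carried by each shard, following the tag assignments above.
const shardTags = {
  shard1: ["shard_tag1"],
  shard2: ["shard_tag2"],
  shard3: ["shard_tag2"],
};

// Hypothetical tag ranges on a numeric shard key: min inclusive, max exclusive.
const tagRanges = [
  { tag: "shard_tag1", min: 0, max: 100 },
  { tag: "shard_tag2", min: 100, max: 200 },
];

// List the shards that may hold a document with the given shard key value.
function eligibleShards(key) {
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  const allShards = Object.keys(shardTags);
  if (!range) return allShards; // no tag range covers the key: any shard
  return allShards.filter((s) => shardTags[s].includes(range.tag));
}

console.log(eligibleShards(50));
console.log(eligibleShards(150));
console.log(eligibleShards(500));
```

A key inside a tagged range is restricted to the shards with that tag, while a key outside every range can be placed on any shard.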

Third, manual distribution of data

MongoDB's built-in balancer splits and moves chunks automatically. Sometimes, however, you may want to turn the balancer off and move chunks manually with the moveChunk command.

1, Disable the balancer

Connect to a mongos and update the config.settings collection:

use config
db.settings.update({"_id": "balancer"}, {$set: {"stopped": true}}, {upsert: true})

Alternatively, use the shell helper:

sh.setBalancerState(false);

2, Split a chunk

Splitting a chunk means choosing a new boundary point and dividing one chunk into two at that point. MongoDB orders shard key values from smallest to largest, and the boundary value belongs to the chunk on the right.

sh.splitAt("db_name.collection_name", {sharded_field: "new_boundary_value"})
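The boundary rule can be modelled in a few lines of plain JavaScript. This is only a sketch of the range arithmetic, not what the server does internally: `splitChunkAt` is a hypothetical helper, chunks are half-open ranges, and `-Infinity` / `Infinity` stand in for $MinKey / $MaxKey.

```javascript
// Split the chunk containing `boundary` into two half-open ranges.
// The boundary value becomes the lower bound of the right-hand chunk.
function splitChunkAt(chunkList, boundary) {
  const i = chunkList.findIndex((c) => boundary > c.min && boundary < c.max);
  if (i === -1) return chunkList.slice(); // already a chunk edge: nothing to do
  const c = chunkList[i];
  return [
    ...chunkList.slice(0, i),
    { min: c.min, max: boundary },
    { min: boundary, max: c.max },
    ...chunkList.slice(i + 1),
  ];
}

// Start with a single chunk covering the whole key space, then split at 5.
const before = [{ min: -Infinity, max: Infinity }];
const after = splitChunkAt(before, 5);
console.log(after);

// The document with key 5 now falls in the right-hand chunk.
const owner = after.find((c) => 5 >= c.min && 5 < c.max);
console.log(owner.min === 5);
```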

3, Move a chunk

MongoDB moves the chunk that contains the specified document to the specified shard. You must identify the chunk by a shard key value.

sh.moveChunk("db_name.collection_name", {sharded_field: "value_in_chunk"}, "new_shard_name")

4, Enable the balancer

sh.setBalancerState(true)
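Steps 1 to 4 can be wrapped so that the balancer is switched back on even if the split or the move fails. The sketch below is one possible arrangement, not a prescribed procedure: `moveChunkManually` is a hypothetical wrapper, and the `sh` object is passed in as a parameter so the example also runs outside the mongo shell. Inside the shell you would pass the global `sh`. The `stubSh` object is only a recording stand-in, and the key value 100 and target shard are example values.

```javascript
// Split at a key and move the chunk that now starts there, with the
// balancer disabled for the duration and always re-enabled afterwards.
function moveChunkManually(sh, ns, keyInChunk, toShard) {
  sh.setBalancerState(false);               // step 1: disable the balancer
  try {
    sh.splitAt(ns, keyInChunk);             // step 2: split the chunk
    sh.moveChunk(ns, keyInChunk, toShard);  // step 3: move the chunk
  } finally {
    sh.setBalancerState(true);              // step 4: enable the balancer
  }
}

// Stand-in for `sh` that only records calls (not a real MongoDB object).
const recorded = [];
const stubSh = {
  setBalancerState: (on) => recorded.push(`setBalancerState(${on})`),
  splitAt: (ns, key) => recorded.push(`splitAt(${ns})`),
  moveChunk: (ns, key, shard) => recorded.push(`moveChunk(${ns} -> ${shard})`),
};

moveChunkManually(stubSh, "db_name.collection_name", { sharded_field: 100 }, "shard2");
console.log(recorded.join("\n"));
```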

5, Refresh the mongos cache

Between the application layer and the data store sits the query router, mongos. After it first starts, or after the shard metadata is updated, mongos synchronizes the configuration data from the config servers and caches it. Occasionally mongos does not pick up the latest configuration from the config servers in time, so it cannot route requests to the right chunk and return the correct data. In that case, use the flushRouterConfig command to refresh the mongos cache manually:

db.adminCommand({"flushRouterConfig": 1})

