Part V: Architecture, Chapter 18: MongoDB Sharding Architecture (Balancing)

Source: Internet
Author: User
Tags: mongodb, sharding

1. Balancing Overview

When multiple shards are available, MongoDB migrates data to other shards once the number of chunks is large enough. This migration process is called balancing, and it is performed by a process called the balancer.

2. Balancing Workflow

The advantage of being able to move chunks from one shard to another is that you don't have to worry about keeping data evenly distributed between shards: the balancer does this work for you. It is also a disadvantage, because it is automatic: if you don't like the way the balancer distributes the load, you are out of luck. If you do not want a chunk to live on Shard 3, you can manually move it to Shard 2, but the balancer will probably move it back to Shard 3. Your only options are to re-shard the collection or to turn off balancing.
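As a rough sketch of the manual options just mentioned, the snippet below builds a `moveChunk` admin command and, when run inside a mongo shell connected to a mongos, stops the balancer before moving the chunk. The namespace `test.foo`, the shard key value, and the shard name `shard0002` are hypothetical placeholders, and `buildMoveChunkCommand` is a helper invented for this illustration.

```javascript
// Build the admin command that asks MongoDB to move the chunk containing
// a given shard key value to a named shard.
function buildMoveChunkCommand(namespace, shardKeyQuery, targetShard) {
    return { moveChunk: namespace, find: shardKeyQuery, to: targetShard };
}

// Hypothetical names: replace them with your own collection and shard.
var moveCmd = buildMoveChunkCommand("test.foo", { _id: 0 }, "shard0002");

// `db` and `sh` only exist inside the mongo shell, so the calls are guarded
// and the snippet does nothing when run anywhere else.
if (typeof db !== "undefined" && typeof sh !== "undefined") {
    sh.stopBalancer();                     // otherwise the chunk may be moved back
    printjson(db.adminCommand(moveCmd));   // perform the manual migration
    print(sh.getBalancerState());          // false while balancing is off
}
```

Balancing stays off until it is switched back on with `sh.startBalancer()`, so this is best treated as a deliberate, temporary measure.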

At the time of writing, the balancer's algorithm was not very smart: it simply moved chunks based on the overall size of each shard. It is expected to become more sophisticated in the near future.

The goal of the balancer is not only to keep data evenly distributed but also to minimize the amount of data moved, so a number of conditions must hold before balancing is triggered. To trigger it, a shard must have at least nine more chunks than the shard with the fewest chunks. Chunks are then migrated off the crowded shard until it is in balance with the other shards.
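The trigger rule can be written down as a small check. This is a simplified model of the rule as stated in the text, not MongoDB's actual balancer code; the function name `shouldBalance` and the sample chunk counts are made up for illustration.

```javascript
// Simplified model of the trigger rule: balancing starts only when the
// fullest shard has at least `threshold` more chunks than the emptiest one.
// The value 9 is taken from the text.
var MIGRATION_THRESHOLD = 9;

// chunkCounts maps a shard name to the number of chunks on that shard.
function shouldBalance(chunkCounts, threshold) {
    var counts = Object.keys(chunkCounts).map(function (name) {
        return chunkCounts[name];
    });
    if (counts.length < 2) {
        return false; // nothing to balance against
    }
    var most = Math.max.apply(null, counts);
    var fewest = Math.min.apply(null, counts);
    return most - fewest >= threshold;
}

// A difference of 7 chunks is tolerated; a difference of 9 is not.
console.log(shouldBalance({ shard1: 12, shard2: 5, shard3: 6 }, MIGRATION_THRESHOLD)); // false
console.log(shouldBalance({ shard1: 14, shard2: 5, shard3: 6 }, MIGRATION_THRESHOLD)); // true
```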

The balancer is not very aggressive because MongoDB wants to avoid moving the same data back and forth. If the balancer corrected every small difference, it would likely waste resources. Suppose Shard 1 has two more chunks than Shard 2, so it sends one chunk to Shard 2; then some writes land on Shard 2, so that Shard 2 now has two more chunks than Shard 1, and the same chunk is sent straight back. By waiting for a more serious imbalance, MongoDB minimizes pointless data transfer. Keep in mind that a difference of nine chunks is not really that unbalanced, since it amounts to less than 2GB of data.


Figure: If every slight imbalance is corrected the moment it appears, the result is a large amount of unnecessary data movement.
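The back-and-forth effect can be reproduced with a toy simulation. This is only a sketch under simplifying assumptions (two shards, new chunks arriving one at a time, a hypothetical `countMigrations` helper); it does not reflect how MongoDB schedules migrations internally.

```javascript
// Toy simulation: a balancing round starts once the difference in chunk
// counts between the two shards reaches `threshold`.
function countMigrations(startCounts, arrivals, threshold) {
    var counts = startCounts.slice();
    var migrations = 0;
    arrivals.forEach(function (shardIndex) {
        counts[shardIndex] += 1; // a new chunk appears on this shard
        var big = counts[0] >= counts[1] ? 0 : 1;
        var small = 1 - big;
        if (counts[big] - counts[small] >= threshold) {
            // Move chunks off the fuller shard until the two are even
            // (or differ by a single chunk).
            while (counts[big] - counts[small] > 1) {
                counts[big] -= 1;
                counts[small] += 1;
                migrations += 1;
            }
        }
    });
    return { counts: counts, migrations: migrations };
}

// Writes land on shard 0 twice, then on shard 1 twice, and so on.
var arrivals = [0, 0, 1, 1, 0, 0, 1, 1];
var eager = countMigrations([5, 5], arrivals, 2);
var patient = countMigrations([5, 5], arrivals, 9);
console.log(eager.migrations, eager.counts);     // 4 [ 9, 9 ]
console.log(patient.migrations, patient.counts); // 0 [ 9, 9 ]
```

Both runs end with the same distribution, but the eager balancer performed four migrations to get there and the patient one performed none.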


3. Tips

Everyone wants to prove to themselves that sharding works by watching data move, which creates a problem: the amount of data needed to trigger balancing is much larger than most people expect.

For example, suppose you are trying out sharding, so you write a shell script that inserts 500,000 documents into a sharded collection:

// Insert 500,000 small documents into the sharded collection foo.
for (i = 0; i < 500000; i++) { db.foo.insert({"_id": i, "x": 1, "y": 2}); }
When the insert finishes, you should see some data flying around, right? Wrong. If you look at the state of the database, you are still a long way off. This data is about 40MB, which is not enough for one chunk, not even a quarter of one, given that the default chunk size is 200MB. To actually see data move you need to insert 2GB of data, which is 25 million of these documents, or 50 times what you have inserted so far.
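The arithmetic behind those figures can be checked in a few lines. The numbers come from the text; treating 1MB as 1,000,000 bytes is an assumption made here to keep the results round.

```javascript
// Figures from the text, in decimal units (an assumption).
var MB = 1e6;
var insertedDocs = 500000;
var insertedBytes = 40 * MB;     // size of the 500,000 documents
var chunkSizeBytes = 200 * MB;   // default chunk size quoted in the text
var targetBytes = 2000 * MB;     // 2GB needed before data starts to move

var bytesPerDoc = insertedBytes / insertedDocs;        // 80
var fractionOfChunk = insertedBytes / chunkSizeBytes;  // 0.2
var docsNeeded = targetBytes / bytesPerDoc;            // 25000000
var multiple = docsNeeded / insertedDocs;              // 50

console.log(bytesPerDoc, fractionOfChunk, docsNeeded, multiple);
```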

When you start sharding, you want to see your data moving around; that is human nature. In a production system, however, you do not want many migrations, because a migration is a very expensive operation. So on the one hand there is the desire to see migrations actually happen, and on the other hand there is the fact that sharding does not work well unless it is irritatingly slow to human eyes. To resolve this tension, MongoDB implements two techniques that make sharding more satisfying to watch without harming production systems.

    • Custom chunk size
The first time you start mongos, you can pass the --chunkSize N option, where N is the chunk size you want in MB. If you just want to try out sharding, you can set --chunkSize 1, so that inserting a few megabytes of data is enough to see migrations happen.
    • Ramping up chunk size
Even when you deploy a real application, reaching 2GB can seem to take forever. So for the first dozen or so chunks, MongoDB deliberately and automatically lowers the chunk size from 200MB to 64MB, purely to give users a more satisfying experience. Once there are more chunks, it automatically ramps the chunk size back up to 200MB.
    • Changing chunk size
The chunk size can be changed by specifying the --chunkSize N option at startup, or by modifying the config.settings collection and restarting everything. However, unless you are trying out the 1MB chunk size for fun, do not change the chunk size.
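The second route, editing config.settings, might look like the sketch below. It assumes the chunk size is stored in a document whose `_id` is `"chunksize"` with the size in MB under `value`; `chunkSizeSetting` is a hypothetical helper, and the shell call is guarded so it only runs inside a mongo shell connected to a mongos. Older shells may need `update` or `save` in place of `updateOne`.

```javascript
// Build the document that stores the chunk size (in MB) in config.settings.
function chunkSizeSetting(sizeMB) {
    if (typeof sizeMB !== "number" || Math.floor(sizeMB) !== sizeMB || sizeMB < 1) {
        throw new Error("chunk size must be a positive whole number of MB");
    }
    return { _id: "chunksize", value: sizeMB };
}

var setting = chunkSizeSetting(1); // 1MB: for experiments only

// `db` only exists inside the mongo shell, so the write is guarded.
if (typeof db !== "undefined") {
    db.getSiblingDB("config").settings.updateOne(
        { _id: setting._id },
        { $set: { value: setting.value } },
        { upsert: true }
    );
}
```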
