Part V Architecture, Chapter 17: MongoDB Sharding Architecture (Understanding Chunks)

Source: Internet
Author: User
Tags: mongodb, sharding

1. How chunks are created

The previous section introduced the basic concept of chunks (sometimes translated as "blocks") in the MongoDB sharding architecture. This section looks at how chunks are created. When you decide to distribute data, you must choose a key that defines the chunk ranges (so far we have been using username). This key is called the shard key. The shard key can be any field or combination of fields. Consider the following documents:

{"username": "Paul", "age": 23}
{"username": "Simon", "age": 17}
{"username": "widdly", "age": 16}
{"username": "Grill", "age": 95}
{"username": "Bertango", "age": 55}

If we choose the age field as the shard key and have a chunk with the range [20, 100), that chunk contains the following documents:

{"username": "Paul", "age": 23}
{"username": "Grill", "age": 95}
{"username": "Bertango", "age": 55}

As you can see, every document in this chunk has an age value within the chunk's range. Chunk ranges are half-open: the lower bound is included and the upper bound is excluded.
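The membership rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how MongoDB stores or routes data; the helper name `docs_in_chunk` and the bounds 20 and 100 are assumptions chosen for the example, and the documents are the ones listed above with the shard key field written as `age`.

```python
# Example documents from the text; "age" is the shard key.
DOCS = [
    {"username": "Paul", "age": 23},
    {"username": "Simon", "age": 17},
    {"username": "widdly", "age": 16},
    {"username": "Grill", "age": 95},
    {"username": "Bertango", "age": 55},
]


def docs_in_chunk(docs, shard_key, lower, upper):
    """Return the documents whose shard key value lies in [lower, upper).

    Hypothetical helper: the range is half-open, so `lower` is included
    and `upper` is excluded.
    """
    return [d for d in docs if lower <= d[shard_key] < upper]


if __name__ == "__main__":
    for doc in docs_in_chunk(DOCS, "age", 20, 100):
        print(doc)
```

With a narrower range such as 20 to 26, the same call would return only Paul's document, since 95 and 55 fall outside it.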

2. Sharding a collection
When a collection is first sharded, MongoDB creates only a single chunk, no matter what data the collection holds. The range of this chunk is (negative infinity, positive infinity), where negative infinity is the smallest value MongoDB can represent (also called $minKey) and positive infinity is the largest (also called $maxKey).

Note: If the collection being sharded already contains a large amount of data, MongoDB immediately splits the initial chunk into multiple smaller chunks.

In fact, because the collection in the example above is too small to trigger a split, there is only one chunk, (negative infinity, positive infinity), until more data is inserted. For the purposes of the demonstration, though, we assume that the amount of data is large enough.

MongoDB splits the initial chunk (negative infinity, positive infinity) into two new chunks, generally choosing a split point near the midpoint of the existing data. So if about half of the documents have an age below 20 and the other half an age of 20 or more, MongoDB is likely to choose 20, which gives two chunks: (negative infinity, 20) and [20, positive infinity). If we continue to insert data into the chunk [20, positive infinity), it will be split again, for example into [20, 30) and [30, positive infinity). The collection then has three chunks: (negative infinity, 20), [20, 30) and [30, positive infinity). As more data is inserted, MongoDB keeps splitting existing chunks into new ones.
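The splitting sequence above can be mimicked with a small simulation. This is a simplified sketch, not MongoDB's actual split algorithm: it assumes a split is made at the median of the shard key values currently inside the chunk, uses `math.inf` as a stand-in for $minKey and $maxKey, and the function name `split_chunk` and the sample ages are made up for illustration.

```python
import math

# Stand-ins for MongoDB's $minKey / $maxKey (assumption for this sketch).
NEG_INF, POS_INF = -math.inf, math.inf


def split_chunk(chunk, keys):
    """Split the half-open chunk (lower, upper) at the median key it holds."""
    lower, upper = chunk
    inside = sorted(k for k in keys if lower <= k < upper)
    if len(inside) < 2:
        raise ValueError("not enough data in the chunk to split it")
    split_point = inside[len(inside) // 2]
    if split_point == inside[0]:
        # No key sorts below the median, so one side would hold no data.
        raise ValueError("chunk cannot be split at the median")
    return (lower, split_point), (split_point, upper)


if __name__ == "__main__":
    ages = [12, 15, 18, 19, 20, 22, 25, 28]
    left, right = split_chunk((NEG_INF, POS_INF), ages)
    print(left, right)    # (-inf, 20) (20, inf)

    # More documents arrive, all landing in the upper chunk.
    ages += [30, 33, 41, 47]
    middle, last = split_chunk(right, ages)
    print(left, middle, last)    # (-inf, 20) (20, 30) (30, inf)
```

Each tuple is a half-open range, so (20, 30) here stands for the chunk [20, 30).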

A chunk's range may contain only a single value (for example, only the users named Paul), but every chunk must have a different range: there cannot be two chunks with the range ["a", "f"). Chunks cannot overlap, and each chunk's range must be directly adjacent to the range of the next chunk. So if you split a chunk with the range [4, 8), the result can be [4, 6) and [6, 8), because together they cover the original range. It cannot be [4, 5) and [6, 8), because the collection would lose all the data in [5, 6), and it cannot be [4, 6) and [5, 8), because the chunks would overlap. Every document must belong to one and only one chunk.

[Figure: a chunk being split into two chunks]
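The three rules for a split (no gaps, no overlaps, full coverage of the original range) can be checked mechanically. The following is a minimal sketch under the assumption that ranges are half-open tuples `(lower, upper)`; `is_valid_split` is a hypothetical helper, not a MongoDB API.

```python
def is_valid_split(original, parts):
    """Return True if the half-open ranges in `parts` exactly tile `original`.

    Valid means: the parts start and end where the original does, every
    part is non-empty, and each part begins exactly where the previous
    one ends (so there are no gaps and no overlaps).
    """
    lower, upper = original
    ordered = sorted(parts)
    if not ordered or ordered[0][0] != lower or ordered[-1][1] != upper:
        return False
    if any(lo >= hi for lo, hi in ordered):
        return False
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(is_valid_split((4, 8), [(4, 6), (6, 8)]))  # True
    print(is_valid_split((4, 8), [(4, 5), (6, 8)]))  # False: gap [5, 6)
    print(is_valid_split((4, 8), [(4, 6), (5, 8)]))  # False: overlap
```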


Note:

Since MongoDB does not enforce any kind of schema, you may wonder where a document ends up if it has no value for the shard key.

In fact, in the MongoDB versions this chapter describes, you cannot insert a document without a shard key (although null is allowed as a shard key value), and you cannot modify a document's shard key value (for example with $set). The only way to give a document a new shard key value is to delete the document, change the shard key value on the client, and reinsert the document. (MongoDB 4.2 and later allow a document's shard key value to be updated in place.)

What if some documents use strings for the shard key and others use numbers? This also works, because MongoDB defines a strict ordering between types. If you insert a string (or an array, a Boolean, null, and so on) into the age field, MongoDB sorts it by type, in the following order:

null < numbers < strings < objects < arrays < binary data < ObjectId < Booleans < dates < regular expressions
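To make the cross-type ordering concrete, here is a rough Python sketch that sorts mixed values by the type order listed above. It is an approximation for illustration only: the mapping from Python types to MongoDB types is an assumption, ObjectId is left out because the standard library has no equivalent, and `type_rank` and `shard_key_sort_key` are hypothetical names.

```python
import re
from datetime import datetime


def type_rank(value):
    """Rank a Python value by the type order listed in the text."""
    if value is None:
        return 0
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return 7
    if isinstance(value, (int, float)):
        return 1
    if isinstance(value, str):
        return 2
    if isinstance(value, dict):
        return 3
    if isinstance(value, list):
        return 4
    if isinstance(value, (bytes, bytearray)):
        return 5
    # Rank 6 (ObjectId) is skipped: no standard-library equivalent.
    if isinstance(value, datetime):
        return 8
    if isinstance(value, re.Pattern):
        return 9
    raise TypeError(f"unsupported type: {type(value).__name__}")


def shard_key_sort_key(value):
    """Sort by type rank first, then by value within the same type.

    Only works when values of the same rank are comparable in Python
    (numbers, strings, dates), which is enough for this example.
    """
    return (type_rank(value), value)


if __name__ == "__main__":
    values = ["z", 4, None, True, 2, "a"]
    print(sorted(values, key=shard_key_sort_key))
    # [None, 2, 4, 'a', 'z', True]
```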

Within a single type, values sort the way you would expect: 2 < 4, or "a" < "z".

You might imagine that each chunk is hundreds of gigabytes in size, but in real systems the chunk size defaults to only 200 MB (later releases changed this default, for example to 64 MB). The reason is that moving data is expensive: it takes a long time, consumes system resources, and adds a noticeable amount of network traffic. You can try it yourself: insert 200 MB of data into a collection, then retrieve all 200 MB. Now imagine doing the same thing on a system that maintains multiple indexes and is handling other traffic at the same time. You do not want to watch your application slow to a halt while MongoDB moves data around in the background.

In fact, if a chunk gets too big, MongoDB will not move it at all. Conversely, you do not want chunks to be too small either, because each chunk carries a small amount of administrative overhead, and you do not want to keep track of countless tiny chunks. A size of 200 MB was found to be a good balance between portability and minimal overhead.
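A quick calculation shows how chunk size drives the number of chunks that have to be tracked. This is back-of-the-envelope arithmetic only; the 100 GB data size is an arbitrary example, the 200 MB figure is the default quoted above, and `estimated_chunks` is a hypothetical helper.

```python
MB = 1024 * 1024
GB = 1024 * MB
DEFAULT_CHUNK_SIZE = 200 * MB  # default chunk size quoted in the text


def estimated_chunks(data_bytes, chunk_bytes=DEFAULT_CHUNK_SIZE):
    """Minimum number of full-size chunks needed to hold `data_bytes`."""
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-data_bytes // chunk_bytes))


if __name__ == "__main__":
    print(estimated_chunks(100 * GB))           # 512
    print(estimated_chunks(100 * GB, 1 * MB))   # 102400
    print(estimated_chunks(100 * GB, 50 * GB))  # 2
```

With 1 MB chunks the same data produces two hundred times as many chunks to track, while 50 GB chunks would each be very costly to move.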


Summary:

A chunk is a logical concept, not a physical one. The documents in a chunk are not stored contiguously on disk or grouped together in any other way; they may be scattered throughout the collection. A document belongs to a chunk if and only if its shard key value falls within that chunk's range.
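Because a chunk is just a range, finding the chunk for a document is a lookup on the split points rather than on any physical location. A minimal sketch, assuming the split points 20 and 30 from the splitting example and using `math.inf` in place of $minKey and $maxKey; `chunk_range` is a hypothetical helper.

```python
import math
from bisect import bisect_right

# Split points from the earlier example. They define three chunks:
# (-inf, 20), [20, 30) and [30, +inf).
SPLIT_POINTS = [20, 30]


def chunk_range(split_points, key):
    """Return the half-open (lower, upper) range that contains `key`."""
    # bisect_right places a key equal to a split point in the chunk
    # that starts at that split point, matching half-open ranges.
    i = bisect_right(split_points, key)
    lower = split_points[i - 1] if i > 0 else -math.inf
    upper = split_points[i] if i < len(split_points) else math.inf
    return lower, upper


if __name__ == "__main__":
    for age in (5, 20, 23, 30, 95):
        print(age, chunk_range(SPLIT_POINTS, age))
```

A key equal to a split point, such as 20, belongs to the chunk that starts there, which is why `bisect_right` is used rather than `bisect_left`.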
