MongoDB Research Note: How a Sharded Cluster Works


The previous note (http://www.cnblogs.com/guoyuanwei/p/3565088.html) described the deployment of a default sharded cluster and gave a general picture of how a MongoDB sharded cluster is put together. So far we have not created any other databases on the cluster. MongoDB shards data at the collection (table) level, so to shard a collection you must first enable sharding on the database that contains it. How do you shard a collection? How do you choose a shard key? How does the balancer migrate chunks between shards? What do reads and writes against the shards look like? These questions are discussed next.

Sharding a collection

(1) Connect to the mongos instance of the cluster configured above

> mongo --port 40009

(2) Create the database eshop and the collection users in the cluster

mongos> use eshop
switched to db eshop

mongos> db.users.insert({ userid: 1, username: "lili", city: "Beijing" })

At this point there is only one document in the users collection:

{"_id": ObjectId ("521dcce715ce3967f964c00b"), "userid": 1, "username": "Lili", "City": "Beijing"}

Observing the cluster status information, the databases field has gained one record; the other fields are the same as in the freshly initialized cluster:

mongos> sh.status()

databases:
    { "_id" : "eshop", "partitioned" : false, "primary" : "rs0" }

You can see that the database eshop does not yet support sharding, and that all non-sharded collections in this database are stored on shard rs0. Looking at the data files on disk, the three files eshop.0, eshop.1, and eshop.ns have appeared in the data directory belonging to rs0. The chunks collection in the cluster is still empty, because the users collection has not been sharded.
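This can be confirmed from the mongos shell by inspecting the cluster metadata kept in the config database. A minimal sketch (the exact output fields vary by MongoDB version):

mongos> db.getSiblingDB("config").databases.find()    // eshop is listed with "partitioned" : false
mongos> db.getSiblingDB("config").chunks.count()      // 0: no collection has been sharded yet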

(3) Shard the collection

MongoDB sharding is range-based: every document must fall within some range of the chosen shard key, and once the shard key has been selected, chunks group documents together logically by shard key range. Here the users collection is sharded on the "city" field. Suppose the "city" field takes the values "Beijing", "Guangzhou" and "Changsha", and documents with these values are at first inserted into the cluster in random order. Because no chunk has yet reached the default threshold of 64 MB (or 100,000 documents), there will initially be only one chunk in the cluster. As documents continue to be inserted, any chunk that exceeds the threshold is split into two chunks, and the final distribution of chunks and shard key ranges might look like the table below. The table only sketches the sharding in general terms; the actual result may differ. In the table, $minKey stands for all documents with a key value less than "Beijing", and $maxKey for all documents with a key value greater than "Guangzhou". It is also important to emphasize that a chunk does not physically contain its documents; the containment is purely logical. A chunk only records which range of shard key values its documents fall into, and the shard holding the chunk for a given range can be looked up, so subsequent reads and writes are routed to the corresponding collection on that shard.

Start key value | End key value | Shard
$minKey         | "Beijing"     | rs0
"Beijing"       | "Changsha"    | rs1
"Changsha"      | "Guangzhou"   | rs0
"Guangzhou"     | $maxKey       | rs1
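Chunk boundaries like those in the table can also be created explicitly instead of waiting for automatic splits. A minimal sketch using the standard sh.splitAt() helper, assuming the collection has already been sharded on { city: 1 } (which is done in the next step):

mongos> sh.splitAt("eshop.users", { city: "Beijing" })      // split the chunk containing { city: "Beijing" } at that key value
mongos> sh.splitAt("eshop.users", { city: "Changsha" })
mongos> sh.splitAt("eshop.users", { city: "Guangzhou" })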

The following commands make the users collection sharded. Before a collection can be sharded, sharding must first be enabled on the database that contains it:

mongos> sh.enableSharding("eshop")    // enable sharding for the database

To shard a collection that already contains data, you must first create an index on the chosen shard key; if the collection is empty, MongoDB creates the shard key index automatically.

mongos> db.users.ensureIndex({ city: 1 })    // create an index on the shard key

mongos> sh.shardCollection("eshop.users", { city: 1 })    // shard the collection

After these commands succeed, review the cluster status information again:

mongos> sh.status()

--- Sharding Status ---
sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("521b11e0a663075416070c04")
}
shards:
    { "_id" : "rs0", "host" : "rs0/guo:40000,guo:40001" }
    { "_id" : "rs1", "host" : "rs1/guo:40003,guo:40004" }

databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "eshop", "partitioned" : true, "primary" : "rs0" }    // sharding is now enabled for this database
        eshop.users    // the sharded collection
            shard key: { "city" : 1 }    // the shard key
            chunks:    // chunk information
                rs0 1    // currently only 1 chunk, on shard rs0
            { "city" : { "$minKey" : 1 } } -->> { "city" : { "$maxKey" : 1 } } on : rs0 { "t" : 1, "i" : 0 }    // this chunk covers the key range $minKey to $maxKey and lives on shard rs0; because the collection still holds only one document, the chunk has not been split or migrated

(4) Continue inserting data so the collection is sharded automatically

To observe the collection being split into multiple chunks distributed across multiple shards, insert some more data for analysis.

mongos> for (var i = 1; i < 10000; i++) db.users.insert({ userid: i, username: "lili" + i, city: "Beijing" })

mongos> for (var i = 0; i < 10000; i++) db.users.insert({ userid: i, username: "xiaoming" + i, city: "Changsha" })

mongos> for (var i = 0; i < 10000; i++) db.users.insert({ userid: i, username: "xiaoqiang" + i, city: "Guangzhou" })
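While (or after) these inserts run, the shell's balancer helpers can show whether chunk migrations are enabled and whether one is currently in progress. A minimal sketch (newer versions return a document from sh.isBalancerRunning() instead of a boolean):

mongos> sh.getBalancerState()     // true when the balancer is enabled
true
mongos> sh.isBalancerRunning()    // true only while a migration round is in progress
false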

After the three loops above insert their documents, the first chunk grows past 64 MB and chunk splitting and migration take place. Observing the cluster status information again, the databases field becomes:

databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "eshop", "partitioned" : true, "primary" : "rs0" }
        eshop.users
            shard key: { "city" : 1 }
            chunks:
                rs1 1
                rs0 2
            { "city" : { "$minKey" : 1 } } -->> { "city" : "Beijing" } on : rs1 { "t" : 2, "i" : 0 }    // chunk range
            { "city" : "Beijing" } -->> { "city" : "Guangzhou" } on : rs0 { "t" : 2, "i" : 1 }    // chunk range
            { "city" : "Guangzhou" } -->> { "city" : { "$maxKey" : 1 } } on : rs0 { "t" : 1, "i" : 4 }    // chunk range

Note that there are now three chunks in the cluster: two chunks on shard rs0 and one chunk on shard rs1, each covering a particular range of documents. To see more clearly how these chunks were split and migrated, you can examine the records in the changelog collection.
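Splits and migrations each leave entries in the changelog collection of the config database, and the "what" field of each entry names the event. A minimal sketch of such a query (field names follow the standard config.changelog schema):

mongos> db.getSiblingDB("config").changelog.find({ what: /split|moveChunk/ }, { what: 1, ns: 1, details: 1 }).pretty()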

From the output of db.changelog.find(), the following steps can be seen:

The first step: the chunk that grew beyond 64 MB is split. The original chunk covered the shard key range $minKey to $maxKey; after the split there are two ranges, $minKey to "Beijing" and "Beijing" to $maxKey.

The second step: as documents continue to be inserted, the chunk covering "Beijing" to $maxKey also grows beyond 64 MB, and that range is split into two: "Beijing" to "Guangzhou" and "Guangzhou" to $maxKey.

The third step: after the splits above there are three chunks. In this step the chunk covering $minKey to "Beijing" is migrated from shard rs0 to shard rs1. The end result is that shard rs0 holds the chunks for the ranges "Beijing" to "Guangzhou" and "Guangzhou" to $maxKey, while shard rs1 holds the chunk for $minKey to "Beijing".

The loops above also inserted records whose shard key value is "Changsha". Those records fall in the chunk covering "Beijing" to "Guangzhou"; because that chunk has not yet reached 64 MB, it has not been split. If documents with this shard key keep being inserted, that range may eventually be split into two chunks, "Beijing" to "Changsha" and "Changsha" to "Guangzhou". Continuing in this way, MongoDB achieves distributed storage of massive amounts of data, and because each shard is a replica set, the reliability of the data is also guaranteed.
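To verify the resulting distribution without reading config.chunks directly, the shell also offers a per-collection summary helper. A minimal sketch (the exact statistics printed vary by MongoDB version):

mongos> db.users.getShardDistribution()    // prints per-shard data size, document count and chunk count for eshop.users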
