The previous post (http://www.cnblogs.com/guoyuanwei/p/3565088.html) described deploying a default sharded cluster and gave a general picture of how a MongoDB sharded cluster fits together. So far we have not created any other databases on the cluster. MongoDB shards data at the collection (table) level, so before a collection can be sharded, the database it belongs to must have sharding enabled. How do you shard a collection? How do you choose a shard key? How does the balancer migrate chunks between shards? What do reads and writes against a sharded cluster look like? These questions are discussed next.
Sharding a collection
(1) Connect to a mongos instance in the cluster configured above
> mongo --port 40009
(2) Create the database eshop and the collection users in the cluster
mongos> use eshop
switched to db eshop
mongos> db.users.insert({userid: 1, username: "lili", city: "Beijing"})
At this point, there is only one document in the collection users:
{"_id": ObjectId ("521dcce715ce3967f964c00b"), "userid": 1, "username": "Lili", "City": "Beijing"}
Looking at the cluster status information, the databases field now contains one more record; the other fields are the same as in the freshly initialized cluster:
mongos> sh.status()
  databases:
        { "_id" : "eshop", "partitioned" : false, "primary" : "rs0" }
You can see that the database eshop does not yet support sharding, and all non-sharded collections in this database are stored on the shard rs0. Looking at the data files on disk confirms this: the files eshop.0, eshop.1, and eshop.ns are located in the data directory corresponding to rs0. The chunks collection in the cluster is still empty, because the collection users has not been sharded.
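To double-check this from the shell, you can query the config database through the same mongos connection (a quick sketch; config.databases and config.chunks are standard metadata collections, and the comments describe what one would expect to see at this stage):
mongos> use config
mongos> db.databases.find({"_id": "eshop"})   // should show "partitioned": false and "primary": "rs0"
mongos> db.chunks.count()                     // should be 0, since no collection has been sharded yet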
(3) Shard the collection
MongoDB sharding is range-based: every document falls into some range of the chosen shard key, and once the shard key is selected, chunks group documents together logically according to that key. Here the users collection is sharded on the "city" field. Suppose the "city" field takes the values "Beijing", "Guangzhou", and "Changsha", and documents with these values are inserted into the cluster in random order. At first there is only one chunk in the cluster, because its size has not reached the default threshold of 64 MB or 100,000 documents; as documents keep being inserted, any chunk that exceeds the threshold is split into two chunks, and the final distribution of chunks and shard key ranges might look like the table below. The table only sketches the general situation and the actual result may differ, where -∞ stands for all documents with a key value less than "Beijing", and ∞ for all documents with a key value greater than "Guangzhou". It is also worth emphasizing that a chunk does not physically contain its documents; the containment is logical, meaning only that documents whose shard key falls in that range belong to that chunk. Which shard holds the chunk for a given range can be looked up, and subsequent reads and writes are then routed to the collection on that shard.
Start key value | End key value | Shard
-∞              | Beijing       | rs0
Beijing         | Changsha      | rs1
Changsha        | Guangzhou     | rs0
Guangzhou       | ∞             | rs1
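Chunk splitting and migration normally happen automatically, but the same operations can also be driven by hand with the standard shell helpers sh.splitAt() and sh.moveChunk(), which makes the mechanism behind the table easier to see. A hedged sketch, assuming the collection has already been sharded as in step (3) below; the key values simply mirror the table, and this is an illustration rather than something the original walkthrough runs:
mongos> sh.splitAt("eshop.users", {city: "Beijing"})            // split the chunk at the key value "Beijing"
mongos> sh.splitAt("eshop.users", {city: "Guangzhou"})          // split again at "Guangzhou"
mongos> sh.moveChunk("eshop.users", {city: "Beijing"}, "rs1")   // move the chunk containing "Beijing" to shard rs1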
The following commands shard the collection users. Before a collection can be sharded, its database must first be enabled for sharding:
mongos> sh.enableSharding("eshop")   // enable sharding on the database
To shard a collection that already contains data, an index must first be created on the chosen shard key; if the collection is empty, MongoDB creates this index automatically when the collection is sharded.
mongos> db.users.ensureIndex({city: 1})              // create an index on the shard key
mongos> sh.shardCollection("eshop.users", {city: 1}) // shard the collection
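For reference, the sh.* helpers above are thin wrappers over admin commands, so the same effect can be obtained with db.adminCommand() (a sketch of the equivalent calls, assuming the same mongos connection):
mongos> db.adminCommand({enableSharding: "eshop"})                          // same as sh.enableSharding("eshop")
mongos> db.adminCommand({shardCollection: "eshop.users", key: {city: 1}})   // same as sh.shardCollection(...)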
After successfully executing the above commands, review the cluster status information again:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("521b11e0a663075416070c04")
  }
  shards:
        { "_id" : "rs0", "host" : "rs0/guo:40000,guo:40001" }
        { "_id" : "rs1", "host" : "rs1/guo:40003,guo:40004" }
  databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "eshop", "partitioned" : true, "primary" : "rs0" }   // the database now supports sharding
                eshop.users   // the sharded collection
                        shard key: { "city" : 1 }   // the shard key
                        chunks:   // chunk information
                                rs0  1   // currently only 1 chunk, on shard rs0
                        { "city" : { "$minKey" : 1 } } -->> { "city" : { "$maxKey" : 1 } } on : rs0 { "t" : 1, "i" : 0 }
                        // this chunk covers the key range -∞ to ∞ and sits on shard rs0; because the collection still contains only one document, the chunk has not been split or migrated
(4) Continue inserting data so the collection is split automatically
To observe the collection being split into multiple chunks distributed across multiple shards, insert some more data for analysis.
> for (var i = 1; i < 10000; i++) db.users.insert({userid: i, username: "lili" + i, city: "Beijing"})
> for (var i = 0; i < 10000; i++) db.users.insert({userid: i, username: "xiaoming" + i, city: "Changsha"})
> for (var i = 0; i < 10000; i++) db.users.insert({userid: i, username: "xiaoqiang" + i, city: "Guangzhou"})
After the three loops above have inserted their documents, the initial chunk grows past 64 MB, which triggers chunk splitting and migration. Observing the cluster status again, the databases field now shows:
  databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "eshop", "partitioned" : true, "primary" : "rs0" }
                eshop.users
                        shard key: { "city" : 1 }
                        chunks:
                                rs1  1
                                rs0  2
                        { "city" : { "$minKey" : 1 } } -->> { "city" : "Beijing" } on : rs1 { "t" : 2, "i" : 0 }   // chunk range
                        { "city" : "Beijing" } -->> { "city" : "Guangzhou" } on : rs0 { "t" : 2, "i" : 1 }   // chunk range
                        { "city" : "Guangzhou" } -->> { "city" : { "$maxKey" : 1 } } on : rs0 { "t" : 1, "i" : 4 }   // chunk range
Notice that there are now three chunks in the cluster: two chunks on shard rs0 and one chunk on shard rs1, and each chunk holds the documents of one key range. To see more clearly how these chunks were split and migrated, you can examine the records in the changelog collection.
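The changelog lives in the config database, and each record carries a "what" field naming the event, so the split and migration events can be pulled out with a filter such as the following (a sketch, assuming the same mongos connection):
mongos> use config
mongos> db.changelog.find({what: /split|moveChunk/}).sort({time: 1})   // list split and moveChunk events in chronological order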
The following steps can be seen in the output of db.changelog.find():
Step 1: the chunk that had grown past 64 MB was split. The original chunk covered the shard key range -∞ to ∞; after the split there were two chunks covering -∞ to "Beijing" and "Beijing" to ∞.
Step 2: as documents continued to be inserted, the chunk covering "Beijing" to ∞ also exceeded 64 MB and was split into two chunks covering "Beijing" to "Guangzhou" and "Guangzhou" to ∞.
Step 3: after the splits above there were three chunks. In this step the chunk covering -∞ to "Beijing" was migrated from shard rs0 to shard rs1. The end result is that shard rs0 holds the chunks covering "Beijing" to "Guangzhou" and "Guangzhou" to ∞, while shard rs1 holds the chunk covering -∞ to "Beijing".
The loops above also inserted documents whose shard key value is "Changsha". These documents fall into the chunk covering "Beijing" to "Guangzhou"; that chunk simply has not reached 64 MB yet, so it has not been split. If documents with this shard key value keep being inserted, that range will eventually be split into two chunks, "Beijing" to "Changsha" and "Changsha" to "Guangzhou", and so on. This is how MongoDB achieves distributed storage of massive amounts of data, and because each shard is a replica set, the reliability of the data is guaranteed at the same time.
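Finally, to confirm how the documents ended up being distributed, the shell helper getShardDistribution() prints per-shard document counts and data sizes (a quick check; the exact numbers will of course depend on the data inserted):
mongos> use eshop
mongos> db.users.getShardDistribution()   // shows data size, document count and averages for rs0 and rs1, plus totals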