This article follows the previous four articles on building a MongoDB shard cluster on the Windows platform. Indexes can be created in a shard cluster in the same way they are created in a standalone database, so there is not much to say about that. This article focuses on choosing a shard key.
A shard key is the field (or fields) used to partition a large collection across shards. How efficiently a large data set is partitioned depends heavily on the choice of shard key. If the shard key is chosen poorly, the application cannot take advantage of the many benefits a shard cluster offers; in that case, both query and insert performance can degrade significantly.
1. Inefficient shard keys

1.1 Poor distribution
The BSON ObjectId is the default primary key for every MongoDB document. Its most significant component is a timestamp, which means ObjectId values are ascending, and unfortunately an ascending value makes a poor shard key. Because chunks are range-based, with an ascending shard key all recently inserted documents fall within a small contiguous range at the top of the key space, and therefore onto a single shard. If you want the insert load spread across multiple shards, you cannot use an ascending shard key; you need something more random.
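To make this concrete, here is a minimal mongo-shell sketch of the ascending case. The database and collection names (logs, events) are made up for illustration and are not from the original article:

    // hypothetical names, for illustration only
    sh.enableSharding("logs")
    sh.shardCollection("logs.events", { _id: 1 })
    // ObjectId values grow with time, so every new document falls into the
    // highest-range chunk, and the whole insert load lands on a single shard.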
1.2 Lack of locality

An ascending shard key has a clear direction; a completely random shard key has no direction at all. The former fails to disperse inserts across shards, while the latter disperses them too well, at the cost of locality. Suppose each document in a sharded collection contains an MD5 hash, and the MD5 field is the shard key. Because the MD5 value differs from document to document, this shard key does ensure that inserted documents are spread evenly across the shards of the cluster. But there is a problem: during inserts, almost any virtual-memory page of each shard's MD5 index is likely to be accessed, since consecutive inserts hit unrelated parts of the index. In effect, the entire index and much of the data must stay loaded in memory, which can easily exceed the available physical RAM.
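For comparison, a sketch of this random case. The names (files, objects, and the md5 field) are assumptions for illustration, not from the original:

    // hypothetical names; md5 is assumed to hold a random hash per document
    sh.enableSharding("files")
    sh.shardCollection("files.objects", { md5: 1 })
    // Inserts now spread evenly across chunks, but consecutive inserts touch
    // unrelated ranges of each shard's md5 index, so effectively the whole
    // index has to stay resident in RAM.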
1.3 Chunks that cannot be split

Random shard keys and ascending shard keys are both problematic, so you might try a coarse-grained shard key instead. For example, suppose each user uploads on the order of 100 photos and the shard key is the user ID: user IDs are effectively random across users, so inserts spread out, and the key also improves locality, since one user's photos are stored and referenced together. But there is a problem: when a single user uploads far too many photos, the chunk holding that user grows past the maximum chunk size and needs to be split, yet the system cannot split one user's photos into multiple chunks, because they all share the same shard-key value.
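A sketch of this coarse-grained case, with a hypothetical photos database keyed on userId:

    // hypothetical names, for illustration only
    sh.enableSharding("photos")
    sh.shardCollection("photos.uploads", { userId: 1 })
    // Every document with the same userId has the same shard-key value, so
    // they all live in one chunk; if that chunk grows past the maximum chunk
    // size it still cannot be split, because a chunk boundary can never fall
    // inside a single shard-key value.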
2. The ideal shard key
Based on the above analysis, the ideal shard key should satisfy the following:
1. Distribute inserted data evenly across the shards.
2. Ensure that CRUD operations can take advantage of locality.
3. Provide enough granularity for chunks to be split.
For example, consider building a website analytics system. A good data model stores one document per page per month, maintains that month's daily statistics inside the document, and increments a few counter fields on every visit to the page. The following is a sample analytics document, shown with the shard key in mind:
{
    _id: ObjectId("34535353245eraf32223sdarwe"),
    domain: "org.mongodb",
    url: "downloads",
    period: "2011-12"
}
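To show how the counters in such a document would be maintained, here is a hedged sketch of the per-visit upsert this model implies. The collection name analytics and the field names visits and daily.* are assumptions, not from the original:

    // record one visit on 2011-12-05 (names assumed for illustration)
    db.analytics.update(
        { domain: "org.mongodb", url: "downloads", period: "2011-12" },
        { $inc: { visits: 1, "daily.2011-12-05": 1 } },
        true   // upsert: create the month's document if it does not exist yet
    )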
The simplest suitable shard key is a compound key on the domain that contains each page, followed by the url: {domain: 1, url: 1}. All pages from a given domain will usually fall on a single shard, which preserves locality, but the handful of domains with a very large number of pages can still be split across shards when necessary, because the url field provides the extra granularity.
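A minimal sketch of sharding the collection this way, assuming a database named stats and a collection named analytics:

    // hypothetical database/collection names
    sh.enableSharding("stats")
    sh.shardCollection("stats.analytics", { domain: 1, url: 1 })
    // Documents from one domain cluster together (good locality), while the
    // url component still provides enough granularity for a very large domain
    // to be split across several chunks.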
Note: This article is largely based on MongoDB in Action by Kyle Banker.
"MongoDB" in a shard cluster of MongoDB under Windows Platform (v)