[MongoDB] MongoDB sharded cluster on the Windows platform (5)
Continuing from the previous four articles, this article goes on to describe how to build a MongoDB sharded cluster on the Windows platform. Indexes can also be created in a sharded cluster, and the method is the same as for a standalone database, so it is not covered further here. This article focuses on choosing a shard key.
The shard key (also called the partition key) is the field used to divide a large data set among the shards. How efficiently massive data is partitioned depends largely on this choice. If the shard key is poorly chosen, applications cannot benefit from the advantages a sharded cluster provides, and both query and insert performance can drop significantly.
I. Inefficient shard keys
1.1 Poor distribution
The BSON ObjectId is the default primary key of every MongoDB document. The most significant bytes of every ObjectId are a timestamp, which means ObjectIds are generated in roughly ascending order. Unfortunately, ascending values make a poor shard key because sharding is range-based: with an ascending shard key, all recently inserted documents fall within a small contiguous range, so they all go to the same chunk and therefore the same shard. If you want the insert load to be distributed across multiple shards, you cannot use an ascending shard key. You need something more random.
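The effect of an ascending key on range-based routing can be sketched in plain JavaScript. This is a simplified simulation, not MongoDB's actual routing code: the chunk boundaries, the helper names (chunkFor, objectIdLike, insertDistribution) and the start timestamp are all assumptions made for illustration, and the trailing bytes of the ID are filled from a counter for simplicity.

```javascript
// Hypothetical chunk boundaries over a hex-string key space (illustration only).
// Chunk i holds keys k with boundaries[i] <= k < boundaries[i + 1].
const boundaries = ["00000000", "40000000", "80000000", "c0000000"];

// Return the index of the chunk whose range contains the key.
function chunkFor(key, bounds) {
  let idx = 0;
  for (let i = 0; i < bounds.length; i++) {
    if (key >= bounds[i]) idx = i;
  }
  return idx;
}

// Build an ObjectId-like 24-character hex string: a 4-byte timestamp first,
// followed by 8 bytes that this sketch simply fills from a counter.
function objectIdLike(seconds, counter) {
  const ts = seconds.toString(16).padStart(8, "0");
  const rest = counter.toString(16).padStart(16, "0");
  return ts + rest;
}

// Count how many of n consecutive inserts (one per second) land in each chunk.
function insertDistribution(startSeconds, n, bounds) {
  const counts = new Array(bounds.length).fill(0);
  for (let i = 0; i < n; i++) {
    const id = objectIdLike(startSeconds + i, i);
    counts[chunkFor(id, bounds)] += 1;
  }
  return counts;
}

// 1700000000 is an arbitrary start time chosen for the example.
const ascendingCounts = insertDistribution(1700000000, 1000, boundaries);
console.log(ascendingCounts);
```

Because the timestamp forms the leading part of the key, every one of the simulated inserts falls into the same range, which is the hot-spot behavior described above.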
1.2 Lack of locality
An ascending shard key has a clear direction, while a completely random shard key has no direction at all. The former fails to distribute inserts, while the latter may spread them so widely that locality is lost. Assume that each document in the sharded collection contains an MD5 value and that the MD5 field is the shard key. Because the MD5 differs from document to document, this shard key ensures that inserted documents are evenly distributed across the cluster. But there is a problem. Each insert must also update the index on the MD5 field on its shard, and because the values are random, any virtual memory page in that index may be accessed. In practice this means the whole index has to stay in memory, and once the indexes and data exceed physical memory, page faults degrade performance.
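A small simulation can make the trade-off visible. The sketch below uses Node's built-in crypto module to compute MD5 values; the four equal key ranges, the document names ("doc-0", "doc-1", ...) and the helper names are assumptions for illustration only. It counts how many inserts land in each range and how often two consecutive inserts hit different ranges.

```javascript
const crypto = require("crypto");

// Hypothetical boundaries splitting the MD5 hex space into 4 equal ranges.
const md5Bounds = ["0", "4", "8", "c"];

// MD5 of a string as a 32-character lowercase hex string.
function md5Hex(text) {
  return crypto.createHash("md5").update(text).digest("hex");
}

// Index of the range whose lower bound is the largest one <= key.
function rangeIndex(key, bounds) {
  let idx = 0;
  for (let i = 0; i < bounds.length; i++) {
    if (key >= bounds[i]) idx = i;
  }
  return idx;
}

// Simulate n inserts keyed by MD5 and report the per-range counts plus the
// number of times consecutive inserts touched different ranges.
function md5Distribution(n) {
  const counts = new Array(md5Bounds.length).fill(0);
  let switches = 0;
  let prev = -1;
  for (let i = 0; i < n; i++) {
    const idx = rangeIndex(md5Hex("doc-" + i), md5Bounds);
    counts[idx] += 1;
    if (prev !== -1 && idx !== prev) switches += 1;
    prev = idx;
  }
  return { counts, switches };
}

const md5Result = md5Distribution(4000);
console.log(md5Result);
```

The counts come out close to even, which is the good part. The high number of switches is the bad part: consecutive inserts rarely touch the same region of the key space, so there is little locality for the index to exploit.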
1.3 Unsplittable chunks
If neither a random shard key nor an ascending shard key works, try a coarse-grained shard key. For example, if each user uploads around 100 photos, the shard key could be the user ID. This looks good at first: uploads arrive from many different users, so inserts are spread across the cluster, and each user's photos are stored together, so locality of reference can be used to improve efficiency. The problem appears when a single user uploads so many photos that the chunk holding them needs to be split. The system cannot split one user's photos across multiple chunks, because they all share the same shard key value.
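Why a chunk full of identical key values cannot be split can be shown with a simplified split-point search. This is a sketch of the idea only, not MongoDB's actual splitting algorithm; the function name findSplitKey and the user IDs are hypothetical.

```javascript
// Try to find a split key for a chunk, given its shard key values in sorted
// order. A valid split key must leave at least one document on each side.
// Returns null when every document shares the same key value.
function findSplitKey(sortedKeys) {
  const min = sortedKeys[0];
  const mid = sortedKeys[Math.floor(sortedKeys.length / 2)];
  // Prefer the median, but only if something smaller exists below it.
  if (mid > min) return mid;
  // Otherwise fall back to the first key larger than the minimum, if any.
  const next = sortedKeys.find((k) => k > min);
  return next === undefined ? null : next;
}

// Several users in one chunk: the chunk can be split between users.
console.log(findSplitKey(["u1", "u2", "u3", "u4"]));

// One heavy user fills the whole chunk: no split key exists.
console.log(findSplitKey(["u1", "u1", "u1", "u1"]));
```

When the second case occurs, the chunk can only keep growing, which is why a key this coarse needs a second, finer-grained field to be safe.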
II. The ideal shard key
From the analysis above, an ideal shard key should meet the following requirements:
1. It distributes inserted data evenly across the shards.
2. It lets CRUD operations take advantage of locality.
3. It is granular enough for chunks to be split.
For example, when building a website analytics system, a good data model is to store one document per web page per month, keep the daily data for that month inside that document, and increment some counter fields each time a page is visited. The following is a sample analytics document, showing the fields relevant to the shard key:
{ _id: ObjectId("34535213245eraf32223sdarwe"), domain: "org.mongodb", url: "download", period: "2011-12" }
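As a rough illustration of the one-document-per-page-per-month model, the sketch below builds the filter and update objects that an upsert-style update could use for a single page view. The function name buildPageViewUpdate and the counter fields daily and total are assumptions introduced here. The article only says that some counter fields are incremented and does not name them.

```javascript
// Build the filter and update objects for recording one page view.
// isoDate is a "YYYY-MM-DD" string, handled as text to avoid time zone issues.
function buildPageViewUpdate(domain, url, isoDate) {
  const period = isoDate.slice(0, 7);          // "YYYY-MM", one document per month
  const day = Number(isoDate.slice(8, 10));    // day of month, used as counter name
  return {
    // Identifies the monthly document for this page.
    filter: { domain: domain, url: url, period: period },
    // Increments a per-day counter and a monthly total (hypothetical fields).
    update: { $inc: { ["daily." + day]: 1, total: 1 } },
  };
}

const pageView = buildPageViewUpdate("org.mongodb", "download", "2011-12-05");
console.log(JSON.stringify(pageView));
```

The two objects are the kind of arguments that would be passed to an update issued with the upsert option, so that the monthly document is created on the first visit and incremented afterwards.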
The simplest shard key consists of each page's domain followed by its url: {domain: 1, url: 1}. All pages from a given domain will usually live on a single shard, but domains with an exceptionally large number of pages can still be split across shards if necessary.
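The ordering that a compound key such as {domain: 1, url: 1} produces can be sketched with a plain comparison function. The sample pages, the "com.example" domain and the helper name compareKeys are hypothetical values chosen for illustration.

```javascript
// Compare two compound keys of the form { domain, url }:
// domain is compared first, url only breaks ties within a domain.
function compareKeys(a, b) {
  if (a.domain !== b.domain) return a.domain < b.domain ? -1 : 1;
  if (a.url !== b.url) return a.url < b.url ? -1 : 1;
  return 0;
}

// Hypothetical pages: one small domain and one larger domain.
const pages = [
  { domain: "org.mongodb", url: "/downloads" },
  { domain: "com.example", url: "/a" },
  { domain: "org.mongodb", url: "/about" },
  { domain: "org.mongodb", url: "/docs" },
];

// Sorting by the compound key keeps each domain's pages adjacent.
const ordered = [...pages].sort(compareKeys);

// The median key falls inside the larger domain, so the url component
// provides a valid split point even within a single domain.
const splitPoint = ordered[Math.floor(ordered.length / 2)];
console.log(ordered, splitPoint);
```

Keeping a domain's pages adjacent is what gives locality, and the url component is what gives the granularity that a domain-only key would lack.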
Note: Most of the content in this article comes from MongoDB in Action by Kyle Banker.