MongoDB partitions the documents in a collection by shard key and distributes them across the shards of a cluster.
The shard key is an indexed field, or a set of fields covered by a compound index, that exists in every document.
MongoDB divides the collection's data into ranges of shard key values. The ranges do not overlap, and each range is associated with one chunk.
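The range-to-chunk mapping can be pictured with a small sketch. The split points below are made up for illustration, and the lookup is only a model of the idea, not MongoDB's routing code:

```python
from bisect import bisect_right

# Hypothetical split points; they define four non-overlapping chunks:
# (-inf, 100), [100, 200), [200, 300), [300, +inf)
SPLIT_POINTS = [100, 200, 300]

def chunk_for(shard_key_value, split_points=SPLIT_POINTS):
    """Return the index of the chunk whose range contains the value."""
    # bisect_right counts how many split points are <= the value,
    # which is the index of the half-open range [lower, upper).
    return bisect_right(split_points, shard_key_value)

for value in (5, 100, 250, 999):
    print(value, "-> chunk", chunk_for(value))
```

Because the ranges are half-open and sorted, every value falls into exactly one chunk.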
Choosing a shard key
Choose a shard key that distributes chunks as evenly as possible across the shards of the cluster. Otherwise, cluster performance suffers:
- If all chunks are assigned to a single shard, the capacity of the entire cluster is the capacity of that one shard.
- If chunks are unevenly distributed and concentrated on one shard, that shard becomes a bottleneck, because the total time of an operation depends on the slowest shard.
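A rough way to see the bottleneck effect, assuming shards work in parallel at the same hypothetical rate (the numbers are illustrative, not measurements):

```python
def completion_time(ops_per_shard, ops_per_second_per_shard=1000):
    """Seconds until every shard has finished its share of the work."""
    # The cluster is done only when the busiest shard is done.
    return max(ops_per_shard) / ops_per_second_per_shard

balanced = [3000, 3000, 3000]  # 9000 operations spread evenly
skewed = [8000, 500, 500]      # the same 9000 operations, concentrated

print(completion_time(balanced))  # 3.0
print(completion_time(skewed))    # 8.0
```

The total work is identical in both cases; only the distribution differs.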
To choose a good shard key, you also need to understand the following properties of a shard key:
- Cardinality
- Frequency
- Monotonic change
Cardinality
The cardinality of the shard key determines the maximum number of chunks that the balancer can create.
At any given moment, each unique shard key value can exist in no more than one chunk. If a shard key has a cardinality of 4, the cluster can have at most 4 effective chunks, each storing one unique shard key value, so adding more shards beyond that brings no benefit.
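As a small illustration of that upper bound, assume a hypothetical collection sharded on a `region` field that has only four distinct values:

```python
def max_effective_chunks(shard_key_values):
    """Upper bound on useful chunks: one per distinct shard key value."""
    return len(set(shard_key_values))

# 1000 hypothetical documents, but only 4 distinct shard key values.
regions = ["north", "south", "east", "west"] * 250

print(len(regions))                   # 1000
print(max_effective_chunks(regions))  # 4
```

No matter how many documents are added, the bound stays at 4 as long as no new `region` value appears.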
However, high cardinality alone does not guarantee that data is distributed evenly across the cluster; frequency and monotonicity also matter. All three factors must be taken into account when choosing a shard key.
Frequency
The frequency of a shard key is how often a given shard key value occurs in the documents.
If most documents contain only a small subset of the shard key values, the shard that stores most of those documents becomes a bottleneck for the cluster. If most documents share a single shard key value, the corresponding chunk grows large and cannot be split. Both cases reduce cluster performance.
If your data model requires sharding on a high-frequency field, consider using a compound index that is unique or has low frequency instead.
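The effect of combining fields can be checked on sample data before sharding. The sketch below uses made-up documents with hypothetical `status` and `user_id` fields:

```python
from collections import Counter

# Hypothetical documents: "status" alone is a high-frequency field.
docs = [
    {"status": "active" if i % 10 else "closed", "user_id": i}
    for i in range(1000)
]

single = Counter(d["status"] for d in docs)
compound = Counter((d["status"], d["user_id"]) for d in docs)

print(single.most_common(1))   # [('active', 900)]
print(max(compound.values()))  # 1
```

With `status` alone, 900 documents share one key value and would end up in one unsplittable chunk; with the pair `(status, user_id)`, every key value occurs once.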
Monotonically changing shard keys
A monotonically changing shard key is one whose value always increases or always decreases. With such a key, inserts tend to go to a single shard in the cluster instead of being spread evenly.
This happens because every cluster has two chunks that catch out-of-range shard key values: one captures values above the current maximum shard key, and the other captures values below the current minimum.
If the shard key increases monotonically, then after some time all new inserts go to the [maxKey, positive infinity] chunk. Similarly, inserts with a monotonically decreasing shard key go to the [negative infinity, minKey] chunk. The shard that holds that chunk becomes the bottleneck for write operations.
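A simplified simulation makes the hot chunk visible. It assumes fixed, hypothetical chunk boundaries and ignores the chunk splits and migrations a real cluster would perform over time:

```python
from bisect import bisect_right
from collections import Counter

split_points = [100, 200, 300]  # hypothetical existing chunk boundaries

def chunk_index(value):
    """Index of the range-based chunk that would receive this key."""
    return bisect_right(split_points, value)

# New documents get ever-increasing keys, e.g. a counter or a timestamp.
hits = Counter(chunk_index(key) for key in range(301, 1301))

print(hits)  # Counter({3: 1000})
```

All 1000 inserts land in the last chunk, the one that extends to positive infinity.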
Unique indexes
Uniqueness can be guaranteed across shards only by a unique index on the entire shard key, or by a unique compound index that has the shard key as its prefix.
Hashed sharding
Hashed sharding uses a hashed index of a single field as the shard key to partition the data.
Hashed sharding distributes data more evenly across the cluster at the expense of query isolation. Documents with adjacent shard key values are unlikely to be on the same shard, so mongos is more likely to broadcast a range query to all shards. mongos can still route an equality query to a single shard.
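The trade-off can be sketched with a toy model. MD5 and the modulo step are stand-ins chosen for illustration; MongoDB uses its own hash function and assigns ranges of hash values to chunks rather than taking a modulo:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(value, num_shards=NUM_SHARDS):
    """Map a value to a shard through a hash (toy model only)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shards_for_range(low, high):
    """Shards a range query low <= key < high would have to contact."""
    return {shard_for(v) for v in range(low, high)}

# An equality match always hashes to the same single shard.
print(shard_for(42) == shard_for(42))

# Adjacent values scatter, so a range query touches several shards.
print(sorted(shards_for_range(0, 100)))
```

The same scattering that evens out writes is what forces range queries to fan out.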
Hashed shard key
The field you choose as the hashed shard key should have high cardinality. With low cardinality, data concentrates on a few shards instead of being spread evenly over all of them, and the overloaded shards become bottlenecks.
A monotonically changing field, such as an ObjectId or a timestamp, is an ideal hashed shard key.
Shard key size limit
The size of the Shard key cannot exceed bytes.
Shard key index type
The shard key index can be an ascending index on the shard key, a compound index that starts with the shard key and specifies ascending order for it, or a hashed index.
The shard key index cannot be a multikey index, a text index, or a geospatial index on the shard key fields.
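As a minimal sketch, the allowed key specifications can be checked before building the document for MongoDB's `shardCollection` admin command. The helper name, the `shop.orders` namespace, and the field names are hypothetical; the validation is a simplification that accepts only ascending (`1`) and `"hashed"` entries:

```python
def shard_collection_command(namespace, key):
    """Build the document for the shardCollection admin command."""
    for field, kind in key.items():
        # Simplified check: only ascending (1) or "hashed" is accepted here.
        if kind not in (1, "hashed"):
            raise ValueError(
                f"unsupported shard key spec for {field!r}: {kind!r}"
            )
    return {"shardCollection": namespace, "key": dict(key)}

cmd = shard_collection_command(
    "shop.orders", {"customer_id": 1, "order_id": 1}
)
print(cmd)

# Assuming a pymongo MongoClient connected to a mongos, the command
# could then be sent with: client.admin.command(cmd)
```

A specification such as `{"tags": "text"}` is rejected by the helper with a `ValueError`.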
The shard key cannot be changed
If you must change the shard key:
- Export all data
- Drop the old sharded collection
- Configure sharding with the new shard key
- Pre-split the shard key range to ensure that the initial distribution is even
- Import the data
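For the pre-split step, split points can be computed ahead of the import. The sketch below assumes an integer shard key whose values are spread roughly uniformly over a known range; the function name is hypothetical:

```python
def even_split_points(min_key, max_key, num_chunks):
    """Split points dividing [min_key, max_key) into equal-width ranges."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    step = (max_key - min_key) / num_chunks
    # num_chunks ranges need num_chunks - 1 interior boundaries.
    return [min_key + round(step * i) for i in range(1, num_chunks)]

print(even_split_points(0, 1000, 4))  # [250, 500, 750]
```

If the real key values are skewed, equal-width ranges will not give an even distribution, and the split points should be derived from a sample of the exported data instead.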
Shard key values in documents cannot be changed
You cannot modify the value of the shard key field in a document.
Reference
- https://docs.mongodb.com/manual/core/sharding-shard-key/#shard-key
- https://docs.mongodb.com/manual/reference/limits/#sharded-clusters