1. Sharding
What is sharding? Sharding means storing data across multiple machines. When a data set outgrows a single server, that server's memory and disk I/O become the performance bottleneck. There are two ways out: vertical scaling and horizontal scaling (sharding).
Vertical scaling means adding more CPU and storage capacity to one machine. The price of high-end hardware rises out of proportion to the capacity gained, so scaling up is expensive and has an upper limit.
Horizontal scaling (sharding) distributes the data across multiple servers. Each server is an independent database, and together they form one logical database. Writes and other operations are spread over the servers, which increases both capacity and throughput.
MongoDB documents are schemaless and have no fixed structure, so they can only be partitioned horizontally. When a chunk exceeds the configured size or holds more than the maximum number of documents, MongoDB tries to split it; if the split fails, the chunk is marked as a jumbo chunk so that the split is not attempted over and over. What determines how chunks are split is the shard key. The next section describes the common types of shard key.
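The splitting behaviour can be sketched with a toy model. This is only an illustration, not MongoDB's implementation: the threshold MAX_DOCS_PER_CHUNK, the insertKey function and the chunk objects are all invented here, and a document count stands in for the real chunk size limit.

```javascript
// Toy model of chunk splitting; not MongoDB's real implementation.
// MAX_DOCS_PER_CHUNK is a made-up threshold standing in for the chunk size limit.
const MAX_DOCS_PER_CHUNK = 4;

// Each chunk owns the shard key values in [min, max) and remembers its keys.
function insertKey(chunks, key) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length <= MAX_DOCS_PER_CHUNK) return;

  // Prefer the median as split point; fall back to the first larger key value.
  let splitPoint = chunk.keys[Math.floor(chunk.keys.length / 2)];
  if (splitPoint === chunk.keys[0]) {
    splitPoint = chunk.keys.find(k => k > chunk.keys[0]);
  }
  if (splitPoint === undefined) {
    // Every document has the same key value, so the chunk cannot be split.
    chunk.jumbo = true;
    return;
  }

  const right = {
    min: splitPoint,
    max: chunk.max,
    keys: chunk.keys.filter(k => k >= splitPoint),
    jumbo: false,
  };
  chunk.max = splitPoint;
  chunk.keys = chunk.keys.filter(k => k < splitPoint);
  chunks.splice(chunks.indexOf(chunk) + 1, 0, right);
}

// Start with one chunk covering the whole key space, then insert a few keys.
const chunks = [{ min: -Infinity, max: Infinity, keys: [], jumbo: false }];
for (const key of [10, 20, 30, 40, 50, 60]) insertKey(chunks, key);
console.log(chunks.map(c => `[${c.min}, ${c.max}) -> ${c.keys}`));

// A chunk whose documents all share one key value can never be split.
const stuck = [{ min: -Infinity, max: Infinity, keys: [], jumbo: false }];
for (let i = 0; i < 5; i++) insertKey(stuck, 7);
console.log(stuck[0].jumbo);
```

The second half shows why shard key cardinality matters: when many documents share one key value, no split point exists and the chunk stays oversized.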
2. Shard Key Types
A shard key is a field, or a compound index over several fields, of the documents in a collection. Once set, it cannot be changed. The shard key determines how the data is split across shards, so the choice of shard key directly affects cluster performance.
Note: the shard key is also an index, typically one that queries use often.
MongoDB partitions a collection into chunks by shard key. When a chunk exceeds the configured size (64 MB by default) it is split, and chunks are then migrated to other shards. The common shard key types are as follows:
(1) Ascending shard key
This is a very common kind of shard key: a timestamp, a date, an auto-increment primary key, an ObjectId (_id), and so on. With such a key, all new writes land on a single shard server, so writes are not distributed and that server comes under heavy load. Chunks are relatively easy to split, but the server receiving the writes can become the performance bottleneck.
Creating an ascending shard key: shard the bar collection of the foo database on timestamp.
mongos> use foo
mongos> db.bar.ensureIndex({"timestamp": 1})
mongos> sh.enableSharding("foo")
{ "ok" : 1 }
mongos> sh.shardCollection("foo.bar", {"timestamp": 1})
{ "collectionsharded" : "foo.bar", "ok" : 1 }
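Why an ascending key concentrates writes can be seen in a small simulation. The chunk boundaries and shard names below are invented for illustration, and shardFor is a hypothetical helper, not part of any MongoDB API.

```javascript
// Toy range-sharded cluster: chunk boundaries and shard names are invented.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" }, // chunk holding the highest keys
];

// Find the shard whose chunk range [min, max) contains the key.
function shardFor(key) {
  return chunks.find(c => key >= c.min && key < c.max).shard;
}

// Ascending keys, e.g. timestamps that are already past every split point.
const writes = {};
for (let ts = 2500; ts < 2600; ts++) {
  const shard = shardFor(ts);
  writes[shard] = (writes[shard] || 0) + 1;
}
console.log(writes);
```

Every new key is larger than all existing ones, so each insert falls into the last chunk and therefore onto the same shard.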
(2) Hashed shard key
The advantage of using a hashed index field as the shard key is that data is distributed evenly across the nodes: writes are spread randomly over the shard servers, so the load on each one is balanced. Reads, however, are just as random and may hit more shards. In general, a shard key with random values (a password hash, an MD5 digest, and so on) gives poor query isolation.
Creating a hashed shard key: shard the GridFS chunks collection on a hash of files_id.
mongos> db.fs.chunks.ensureIndex({"files_id": "hashed"})
mongos> sh.enableSharding("foo")
{ "ok" : 1 }
mongos> sh.shardCollection("foo.fs.chunks", {"files_id": "hashed"})
{ "collectionsharded" : "foo.fs.chunks", "ok" : 1 }
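The effect of hashing on write distribution can be sketched as follows. FNV-1a is used here purely as a stand-in hash, since MongoDB's hashed index uses its own function, and the shard names and the equal division of the hash space are assumptions made for the example.

```javascript
// FNV-1a (32-bit) is only a stand-in; MongoDB's hashed index uses its own hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

const SHARDS = ["shard0000", "shard0001", "shard0002"]; // hypothetical names

// Pretend each shard owns an equal slice of the 32-bit hash space.
function shardForHashed(key) {
  const slice = Math.floor(fnv1a(String(key)) / (2 ** 32 / SHARDS.length));
  return SHARDS[slice];
}

// Insert sequential ids, which would all hit one shard under an ascending key.
const counts = {};
for (let id = 0; id < 3000; id++) {
  const shard = shardForHashed(id);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts);
```

Sequential ids no longer stay together: consecutive values hash to unrelated positions, so the writes spread over the shards. The exact counts depend on the hash function chosen. The same scattering is why a range query on the original field can no longer be confined to a few shards.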
(3) Compound shard key
If the collection has no single field that makes a good shard key, or the field you intend to use has too low a cardinality (that is, too few distinct values; a day-of-week field, for example, has only 7), you can combine it with another field to form a compound shard key, or even add a redundant field for the purpose. The usual pattern is a coarse-grained field followed by a fine-grained one.
Creating a compound shard key: shard the GridFS chunks collection on files_id combined with n.
mongos> sh.enableSharding("foo")
{ "ok" : 1 }
mongos> sh.shardCollection("foo.fs.chunks", {"files_id": 1, "n": 1})
{ "collectionsharded" : "foo.fs.chunks", "ok" : 1 }
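A quick way to see what the fine-grained field adds is to count distinct key values. The sample documents and the field names dayOfWeek and userId are made up for this sketch, as is the cardinality helper.

```javascript
// Made-up sample documents; dayOfWeek and userId are hypothetical field names.
const docs = [];
for (let i = 0; i < 700; i++) {
  docs.push({ dayOfWeek: i % 7, userId: i % 100 });
}

// Count how many distinct shard key values a given list of fields produces.
function cardinality(docs, fields) {
  const seen = new Set(docs.map(d => fields.map(f => d[f]).join("|")));
  return seen.size;
}

console.log(cardinality(docs, ["dayOfWeek"]));           // coarse key alone: 7
console.log(cardinality(docs, ["dayOfWeek", "userId"])); // coarse + fine: 700
```

With only 7 distinct values the collection can never be divided into more than 7 splittable ranges, whereas the compound key offers many more possible split points while still grouping documents by the coarse field first.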
(4) Tagged shards
Data can be pinned to specific shard servers. Add a tag to each shard, then assign a range of shard key values to that tag. For example, to keep 10.*.*.* (T) on shard0000 and 11.*.*.* (Q) on shard0001 or shard0002, use tags to tell the balancer where to distribute the chunks.
Creating tagged shards:
mongos> sh.addShardTag("shard0000", "T")
mongos> sh.addShardTag("shard0001", "Q")
mongos> sh.addShardTag("shard0002", "Q")
mongos> sh.addTagRange("foo.ips", {"ip": "010.000.000.000", ..., "ip": "011.000.000.000"}, "T")
mongos> sh.addTagRange("foo.ips", {"ip": "011.000.000.000", ..., "ip": "012.000.000.000"}, "Q")
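How the tags constrain placement can be sketched in plain JavaScript. The tag names, shard names and IP bounds come from the example above; the data structures and the padIp and eligibleShards helpers are invented for illustration and are not how the balancer is actually implemented.

```javascript
// Tag configuration mirroring the example; the structures themselves are invented.
const shardTags = { shard0000: ["T"], shard0001: ["Q"], shard0002: ["Q"] };
const tagRanges = [
  { min: "010.000.000.000", max: "011.000.000.000", tag: "T" },
  { min: "011.000.000.000", max: "012.000.000.000", tag: "Q" },
];

// Zero-pad each octet so that string comparison matches numeric order.
function padIp(ip) {
  return ip.split(".").map(o => o.padStart(3, "0")).join(".");
}

// Shards allowed to hold a document with this ip; any shard if no range matches.
function eligibleShards(ip) {
  const key = padIp(ip);
  const range = tagRanges.find(r => key >= r.min && key < r.max);
  if (!range) return Object.keys(shardTags);
  return Object.keys(shardTags).filter(s => shardTags[s].includes(range.tag));
}

console.log(eligibleShards("10.1.2.3"));   // [ 'shard0000' ]
console.log(eligibleShards("11.200.0.9")); // [ 'shard0001', 'shard0002' ]
```

The zero-padded form matters because the ranges are compared as strings: without padding, "9.0.0.0" would sort after "10.0.0.0".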
3. Shard Key Selection Strategy
With a rough understanding of the shard key types, how do you choose one? There are really only two things to consider: queries and writes. Ideally a query hits as few shards as possible while writes are spread randomly across all shards; the key is how to balance performance against load.
Choosing a shard key mainly comes down to the following questions:
(1) First identify the fields that are queried most often.
(2) Identify the key factors that affect the performance of those operations.
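The query side of the trade-off can be made concrete with a rough sketch that counts how many shards a range query must contact. The chunk layout, shard names and both helper functions are assumptions made for the example; a real cluster has many chunks per shard.

```javascript
// Invented chunk layout for a range-sharded key; shard names are hypothetical.
const rangeChunks = [
  { min: 0, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: 3000, shard: "shard0002" },
];
const allShards = rangeChunks.map(c => c.shard);

// Range key: only shards whose chunks overlap [lo, hi) need to be asked.
function shardsHitByRangeKey(lo, hi) {
  const hit = rangeChunks.filter(c => c.min < hi && c.max > lo).map(c => c.shard);
  return [...new Set(hit)];
}

// Hashed key: neighbouring values are scattered, so a range query goes everywhere.
function shardsHitByHashedKey(lo, hi) {
  return allShards;
}

console.log(shardsHitByRangeKey(1200, 1300).length);  // 1
console.log(shardsHitByHashedKey(1200, 1300).length); // 3
```

Combined with the write behaviour of each key type, this is the balance to strike: a key that keeps frequently queried ranges together while still spreading inserts across shards.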