Shard Key Selection in MongoDB

Source: Internet
Author: User

To shard a collection stored in MongoDB, you must choose a shard key. The choice of shard key directly determines whether data is distributed evenly across the cluster and whether the cluster performs well. So which fields should serve as the shard key? Consider the following factors.

The following log document is used as an example:

{
  server: "ny153.example.com",
  application: "Apache",
  time: "2011-01-02T21:21:56.249Z",
  level: "ERROR",
  msg: "Something is broken"
}

Cardinality

All data of a sharded collection in MongoDB is stored in chunks. A chunk holds the data within one range of shard key values. Choosing a good shard key is very important; with a poor choice, you end up with large chunks that cannot be split.

Take the preceding log document as an example. If {server: 1} is chosen as the shard key, all data for one server falls in the same chunk, and it is easy to imagine the log data for a single server exceeding 200 MB (the default chunk size cited when this was written; the default differs between MongoDB versions). Because every document in that chunk has the same shard key value, the chunk cannot be split. If the shard key is {server: 1, time: 1}, the log data for one server can be split down to the millisecond, so in practice no chunk becomes impossible to split.
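The splitting argument can be illustrated with a small sketch. This is not MongoDB's actual split algorithm; `find_split_point` is a hypothetical helper that only assumes a split point must be a shard key value, so documents that share a key can never be separated.

```python
def find_split_point(keys):
    """Return a key at which a chunk could be split, or None if it cannot.

    `keys` is the sorted list of shard key values held by one chunk.
    Illustrative only: assumes a split must fall between distinct keys.
    """
    if not keys:
        return None
    median = keys[len(keys) // 2]
    if median != keys[0]:
        return median
    # The median equals the smallest key: use the next distinct value, if any.
    for key in keys:
        if key != keys[0]:
            return key
    return None


# Shard key {server: 1}: every log entry of one server has the same key.
by_server = [("ny153.example.com",)] * 6

# Shard key {server: 1, time: 1}: the timestamp makes the keys distinct.
by_server_time = [
    ("ny153.example.com", f"2011-01-02T21:21:56.{ms:03d}Z") for ms in range(6)
]

print(find_split_point(by_server))       # None: the chunk cannot be split
print(find_split_point(by_server_time))  # a (server, time) pair to split at
```

With only the server in the key, the helper finds nothing to split on however large the chunk grows; adding the time field gives it a usable split point.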

It is very important to keep chunks at a reasonable size. Only then can data be distributed evenly while the cost of moving chunks stays low.

Scalable write operations

One of the main reasons for using sharding is to distribute write operations. To achieve this, writes should be spread across as many chunks as possible.

In the log example above, choosing {time: 1} as the shard key sends every write to the newest chunk, which creates a hotspot. If {server: 1, application: 1, time: 1} is chosen as the shard key, the log entries of each server/application pair are written to a different place. With 100 server/application pairs and 10 shards, each shard receives roughly 1/10 of the writes.
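A rough simulation makes the hotspot visible. The sketch below assumes range-based chunks described by a sorted list of split points; the server names, split points, and helper functions are invented for illustration and are not part of MongoDB.

```python
from bisect import bisect_right
from collections import Counter


def chunk_for(key, split_points):
    """Index of the range chunk that holds `key`, given sorted split points."""
    return bisect_right(split_points, key)


def writes_per_chunk(keys, split_points):
    """Count how many of the incoming keys land in each chunk."""
    return Counter(chunk_for(k, split_points) for k in keys)


servers = ["ny151", "ny152", "ny153", "ny154"]
# Incoming log writes: one per server per tick, with time always increasing.
incoming = [(s, "apache", 1000 + t) for t in range(5) for s in servers]

# {time: 1}: chunks were split on past timestamps, so every new write is
# greater than the last split point and lands in the final chunk.
time_splits = [250, 500, 750]
by_time = writes_per_chunk([t for _, _, t in incoming], time_splits)

# {server: 1, application: 1, time: 1}: chunks are split between servers.
compound_splits = [("ny152",), ("ny153",), ("ny154",)]
by_compound = writes_per_chunk(incoming, compound_splits)

print(dict(by_time))      # all 20 writes in chunk 3
print(dict(by_compound))  # 5 writes in each of the 4 chunks
```

The time-only key funnels all 20 writes into one chunk, while the compound key spreads them evenly over four.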

Query isolation

Another consideration is how many shards a query has to touch. Ideally, the mongos process routes a query directly to a single mongod, and that mongod holds all the data the query needs. Therefore, if you know that the most common queries filter on server, starting the shard key with server makes the whole cluster more efficient.

Any query can be executed regardless of the shard key. However, if mongos cannot tell which shard holds the requested data, it sends the query to every shard, then gathers the results and returns them. This increases response time, network traffic, and unnecessary load on the servers.
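The routing decision described above can be sketched as follows. The chunk table, shard names, and `target_shards` function are hypothetical; a real mongos works from cluster metadata and handles ranges and compound keys, which this sketch leaves out.

```python
from bisect import bisect_right

# Hypothetical chunk table: (inclusive lower bound on `server`, owning shard).
CHUNKS = [("", "shard0"), ("ny100", "shard1"), ("ny200", "shard2")]


def target_shards(query, key_field="server", chunks=CHUNKS):
    """Return the shards that must be contacted for an equality query.

    If the query does not constrain the leading shard key field, the router
    cannot narrow the search and falls back to contacting every shard.
    """
    if key_field not in query:
        return sorted({shard for _, shard in chunks})  # scatter-gather
    bounds = [lower for lower, _ in chunks]
    idx = bisect_right(bounds, query[key_field]) - 1
    return [chunks[idx][1]]  # targeted query


print(target_shards({"server": "ny153.example.com"}))  # one shard
print(target_shards({"level": "ERROR"}))               # every shard
```

A query on server resolves to a single shard, whereas a query on level alone has to be broadcast.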

Sort

When a query calls sort() on the leftmost field of the shard key, mongos can query the minimum number of shards and return the results to the caller. This takes the least time and resources.

Conversely, if the field passed to sort() is not the leftmost part of the shard key, mongos has to send the query to every shard in parallel and merge the results from each shard before returning them to the requester. This puts extra load on mongos.
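The merge step in the second case can be sketched with Python's standard `heapq.merge`. This is only an illustration of the extra work, assuming each shard returns its own results already sorted on the requested field; the sample documents and `merge_sorted` are invented.

```python
import heapq
from operator import itemgetter


def merge_sorted(shard_results, field):
    """Merge per-shard result lists, each already sorted on `field`."""
    return list(heapq.merge(*shard_results, key=itemgetter(field)))


# Shard key starts with `server`, but the query sorts on `time`, so every
# shard contributes results and the router has to interleave them.
shard_a = [{"server": "ny151", "time": 1}, {"server": "ny151", "time": 4}]
shard_b = [{"server": "ny153", "time": 2}, {"server": "ny153", "time": 3}]

merged = merge_sorted([shard_a, shard_b], "time")
print([doc["time"] for doc in merged])  # [1, 2, 3, 4]
```

When the sort field is the leading shard key field, the shards' ranges do not interleave, so no such merge across all shards is needed.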

Reliability

A very important factor in choosing a shard key is how much data is affected if a shard becomes completely inaccessible (which can happen even with a seemingly reliable replica set).

Assume a system similar to Twitter, where a comment record looks like this:

{
  _id: ObjectId("4d084f78a4c8707815a601d7"),
  user_id: 42,
  time: "2011-01-02T21:21:56.249Z",
  comment: "I am happily using MongoDB"
}

Because this system is very write-sensitive, writes need to be spread evenly across all servers. In that case, either _id or user_id could serve as the shard key. Using _id gives the finest-grained distribution, but when a shard goes down, almost every user is affected, because each of them loses access to some of their data. With user_id as the shard key, only a fraction of users are affected (about 20% with five shards), but those users can no longer see any of their data.
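The trade-off can be checked with a toy calculation. The sketch below makes several simplifying assumptions that are not in the original: integer ids instead of ObjectId values, 100 users with 10 comments each, and placement by `key % shards` as a stand-in for real chunk distribution.

```python
SHARDS = 5
USERS = 100
COMMENTS_PER_USER = 10

# Integer _id values stand in for ObjectIds to keep the sketch deterministic.
comments = [
    {"_id": u * COMMENTS_PER_USER + c, "user_id": u}
    for u in range(USERS)
    for c in range(COMMENTS_PER_USER)
]


def affected_users(docs, key_field, down_shard=0, shards=SHARDS):
    """Fraction of users with at least one comment on the failed shard."""
    hit = {d["user_id"] for d in docs if d[key_field] % shards == down_shard}
    return len(hit) / USERS


print(affected_users(comments, "_id"))      # 1.0: every user loses something
print(affected_users(comments, "user_id"))  # 0.2: one user in five loses all
```

Under these assumptions, sharding on _id means one failed shard touches every user, while sharding on user_id confines the outage to one fifth of the users.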
