MongoDB: Sharding (Introduction, Auto-Sharding, and Shard Key)

Source: Internet
Author: User

Sharding (adding servers, that is, horizontal scaling) is how MongoDB scales out. With sharding, you can add more machines to cope with growing load and data volume without affecting the application.

Introduction

Sharding is the process of splitting data and distributing it across different machines. In a relational database, when a table grows too large (for example, beyond hundreds of millions of rows), a common remedy is to split it into several smaller tables. A shard here is a similar concept.

Manual sharding: when the database becomes a bottleneck and we are using a relational database, we usually shard manually. That is, the application-layer code maintains connections to several database instances, and each connection is independent. The application code hides the multiple underlying instances and directs each query to a specific instance. The drawback of this approach is that it is troublesome to maintain. Adding or removing nodes in the underlying cluster, or adjusting the data distribution and load patterns, all require changes to this layer of application code, which is no small challenge.
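The maintenance burden described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "connections" are plain in-memory objects standing in for real database connections, and the names makeConnections, hashKey, pickConnection, and countMovedKeys are hypothetical helpers, not part of any driver.

```javascript
// Hypothetical stand-ins for independent database connections.
// In a real application each entry would be a driver connection object.
function makeConnections(count) {
  return Array.from({ length: count }, (_, i) => ({ name: `db${i}`, rows: [] }));
}

// Simple deterministic string hash (illustrative only, not a production hash).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// The application itself decides which instance owns a key.
function pickConnection(connections, key) {
  return connections[hashKey(key) % connections.length];
}

// Count how many keys would map to a different instance after the
// number of instances changes from oldCount to newCount.
function countMovedKeys(keys, oldCount, newCount) {
  return keys.filter((k) => hashKey(k) % oldCount !== hashKey(k) % newCount).length;
}

const connections = makeConnections(3);
const userNames = ["Alice", "Bob", "Jimmy", "Quinn", "Zoe"];
for (const name of userNames) {
  pickConnection(connections, name).rows.push({ name });
}
```

Calling countMovedKeys(userNames, 3, 4) reports how many of the sample keys would have to be moved by hand if a fourth instance were added. With a modulo scheme like this one, that is typically a large share of the data, and the application code has to manage the move itself.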

MongoDB was designed with scale-out in mind and supports automatic sharding. We can easily add machines to, or remove them from, a MongoDB cluster, and the cluster automatically splits the data and balances the load.

"Auto-Sharding"

A "shard" is either a standalone MongoDB server (that is, a single mongod process, typically in a development or test environment) or a replica set (in a production environment). The idea of sharding is to split a large collection into smaller parts and place them on different shards, so that each shard holds only part of the total data. Automatic sharding means that the application layer does not know the data has been split, nor which data lives on which shard. MongoDB provides a routing service, mongos, which must be started before sharding and which knows the mapping between data and shards. The application talks to the routing service, the routing service forwards each request to the relevant shard or shards, collects the responses, and returns them to the application. The following two images show the path of a user request without sharding and with sharding:

Before sharding:

After sharding:
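The request path through the routing service can be sketched with a toy router. This is not how mongos is implemented: the class name ToyRouter, the shard names, and the round-robin placement are all assumptions made for illustration. The point is only that the application calls the router and never addresses an individual shard.

```javascript
// A toy stand-in for the mongos routing service (not the real implementation).
class ToyRouter {
  constructor(shardNames) {
    // Each "shard" is just an in-memory array here.
    this.shards = new Map(shardNames.map((n) => [n, []]));
    this.next = 0;
  }

  // Placement is an internal detail of the router. Round-robin is an
  // arbitrary assumption used only to spread documents around.
  insert(doc) {
    const names = [...this.shards.keys()];
    const target = names[this.next % names.length];
    this.next += 1;
    this.shards.get(target).push(doc);
  }

  // Forward the query to every shard, then merge the partial results.
  find(predicate) {
    const results = [];
    for (const docs of this.shards.values()) {
      results.push(...docs.filter(predicate));
    }
    return results;
  }
}

// The application only ever talks to the router.
const router = new ToyRouter(["shardA", "shardB"]);
router.insert({ name: "Alice" });
router.insert({ name: "Jimmy" });
router.insert({ name: "Quinn" });
const found = router.find((d) => d.name === "Jimmy");
```

Here found contains the single matching document even though the calling code never learns which shard stored it.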

So when should we move from the old, unsharded system to the new, sharded one? Usually when one of the following applies:

1. The machine's disk is no longer large enough for the amount of data.

2. A single mongod can no longer meet the write performance requirements. (As a reminder, if the goal is to improve read performance, a better solution is to set up a master-slave structure and let the slave nodes answer query requests.)

3. You want to keep a large amount of data in memory to improve performance, but the memory of a single machine always has a limit. (This is the difference between vertical scaling and scale-out.)

"Shard Key"

When you set up sharding, you select a key from the collection to use as the basis for splitting the data. This key is called the "shard key". Take a simple example: we want to shard a collection that stores a store's user information, and we choose the person's name, name, as the shard key. The result might be that the first shard holds the documents whose names begin with A-F, the second shard those beginning with G-P, and the third shard those beginning with Q-Z. When the user submits the query db.users.find({"name": "Jimmy"}), the request is routed to the second shard. When the user submits db.users.find({"name": {"$lt": "J"}}), the request is routed to the first and second shards. When a query does not contain the shard key, it is sent to all shards. For an insert, the routing service sends the request to a specific shard based on the value of the name key in the inserted document. This is the role of the shard key.
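The routing rules in the example above can be written down as a small function. This is a minimal sketch, assuming names start with an uppercase ASCII letter and that only equality and $lt conditions on name need to be modeled. The ranges table, the shard names, and targetShards are hypothetical and not part of MongoDB's API.

```javascript
// Hypothetical key ranges from the A-F / G-P / Q-Z example, as [min, max).
const ranges = [
  { shard: "shard1", min: "A", max: "G" },
  { shard: "shard2", min: "G", max: "Q" },
  { shard: "shard3", min: "Q", max: "[" }, // "[" is the ASCII character right after "Z"
];

// Return the names of the shards that must receive the query.
function targetShards(query) {
  const cond = query.name;
  if (cond === undefined) {
    // No shard key in the query: broadcast to every shard.
    return ranges.map((r) => r.shard);
  }
  if (typeof cond === "string") {
    // Equality on the shard key: exactly one range can contain the value.
    return ranges.filter((r) => cond >= r.min && cond < r.max).map((r) => r.shard);
  }
  if (typeof cond.$lt === "string") {
    // Upper bound only: every range that starts below the bound may match.
    return ranges.filter((r) => r.min < cond.$lt).map((r) => r.shard);
  }
  // Operators not modeled in this sketch fall back to a broadcast.
  return ranges.map((r) => r.shard);
}

console.log(targetShards({ name: "Jimmy" }));        // [ 'shard2' ]
console.log(targetShards({ name: { $lt: "J" } }));   // [ 'shard1', 'shard2' ]
console.log(targetShards({ age: 30 }));              // [ 'shard1', 'shard2', 'shard3' ]
```

The three calls mirror the three cases in the text: a query on an exact name goes to one shard, a range query goes to the shards whose ranges overlap it, and a query without the shard key goes everywhere.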

As data is added or removed, one shard may end up heavily loaded while another is lightly loaded. In this case MongoDB automatically rebalances the data and load, so that each shard ends up handling roughly the same amount of traffic.
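The idea of rebalancing can be illustrated with a simplified model. The sketch below assumes data is moved in equal-sized units and that the only goal is to even out the number of units per shard. The balance function, its threshold parameter, and the sample counts are made up for illustration and do not reproduce MongoDB's actual balancer.

```javascript
// Move one unit at a time from the fullest shard to the emptiest one
// until the gap between them is within the allowed threshold.
function balance(unitCounts, threshold = 1) {
  const counts = { ...unitCounts };
  // A gap smaller than 1 is not always reachable, so clamp to avoid looping forever.
  const limit = Math.max(1, threshold);
  let migrations = 0;
  while (true) {
    const names = Object.keys(counts);
    const most = names.reduce((a, b) => (counts[a] >= counts[b] ? a : b));
    const least = names.reduce((a, b) => (counts[a] <= counts[b] ? a : b));
    if (counts[most] - counts[least] <= limit) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations += 1;
  }
  return { counts, migrations };
}

const result = balance({ shard1: 10, shard2: 2, shard3: 3 });
console.log(result.counts);     // { shard1: 5, shard2: 5, shard3: 5 }
console.log(result.migrations); // 5
```

The application is not involved in any of these moves, which is the contrast with manual sharding.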

Which key should be selected as the shard key? One principle is that the shard key should have many distinct values. If the shard key is gender, which has only two values, "male" and "female", the collection can be split into at most two parts; if the collection is very large, such sharding will not solve the efficiency problem. The principle for choosing a shard key is thus similar to the principle for choosing a key when creating an index. In practice, the shard key is usually a key that is already indexed.
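A quick way to see the problem with a key such as gender is to count distinct values. The sketch below assumes that documents sharing the same key value always stay together, so the number of distinct values is an upper bound on how many parts the collection can be split into. The maxParts helper and the sample documents are hypothetical.

```javascript
// Documents with the same shard key value cannot be separated, so the number
// of distinct values caps the number of parts the data can be split into.
function maxParts(docs, keyField) {
  return new Set(docs.map((d) => d[keyField])).size;
}

const sampleUsers = [
  { name: "Alice", gender: "female" },
  { name: "Bob", gender: "male" },
  { name: "Jimmy", gender: "male" },
  { name: "Quinn", gender: "female" },
];

console.log(maxParts(sampleUsers, "gender")); // 2
console.log(maxParts(sampleUsers, "name"));   // 4
```

However many users are added, gender never allows more than two parts, while name keeps offering new split points as the collection grows.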

That concludes the introduction to sharding and how it works. Next, we will build our first sharded cluster.
