What Are the Principles of MongoDB Sharding?

Source: Internet
Author: User
MongoDB sharding
Why do I need a sharded cluster?
MongoDB currently has three core advantages: a flexible schema, high availability, and scalability. The flexible schema comes from JSON documents, high availability is provided by replica sets, and scalability is provided by sharded clusters.

When to use sharding technology

 Storage requirements exceed the capacity of a single machine's disk
 The active data set exceeds the memory of a single machine, so many requests have to read data from disk, which hurts performance
 Write IOPS exceed the write capacity of a single MongoDB node

Sharding distributes the data of a collection across multiple shards, which gives MongoDB the ability to scale horizontally.

Sharded cluster architecture
A sharded cluster is composed of three components: shards, mongos, and config servers.

Mongos is the access entry point of the sharded cluster.
Mongos itself does not persist data. All metadata of the sharded cluster is stored on the config server, and the user data is distributed across the shards. After mongos starts, it loads the metadata from the config server, begins serving requests, and routes each user request to the correct shard.
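
The division of labor described above can be sketched as a toy model in Python. This is not MongoDB code: the class names `Shard`, `ConfigServer`, and `Mongos`, the integer key ranges, and the shard names are all assumptions made for illustration. The point is only that the router keeps an in-memory copy of metadata it loaded at startup, while the documents themselves live on the shards.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Holds the actual user documents, keyed by shard key value."""
    name: str
    docs: dict = field(default_factory=dict)


@dataclass
class ConfigServer:
    """Persists cluster metadata: which key range lives on which shard."""
    # Each entry is (low inclusive, high exclusive, shard name); hypothetical layout.
    chunk_table: list


class Mongos:
    """Router that persists nothing; it only caches the metadata in memory."""

    def __init__(self, config, shards):
        # On startup, load the routing metadata from the config server.
        self.routing_table = list(config.chunk_table)
        self.shards = {s.name: s for s in shards}

    def shard_for(self, key):
        for low, high, shard_name in self.routing_table:
            if low <= key < high:
                return self.shards[shard_name]
        raise KeyError(f"no chunk covers key {key}")

    def insert(self, key, doc):
        self.shard_for(key).docs[key] = doc

    def find(self, key):
        return self.shard_for(key).docs.get(key)


config = ConfigServer(chunk_table=[(0, 100, "shard0"), (100, 200, "shard1")])
shard0, shard1 = Shard("shard0"), Shard("shard1")

router_a = Mongos(config, [shard0, shard1])
router_b = Mongos(config, [shard0, shard1])  # a second, independent entry point

router_a.insert(42, {"x": 42})
router_a.insert(150, {"x": 150})
```

Because both routers load the same metadata, a document written through `router_a` can be read back through `router_b`, which is why several mongos instances can sit in front of one cluster.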
 
Data distribution strategy
Sharding allows the data of a single collection to be distributed across multiple shards. There are currently two main sharding strategies.

Range based sharding
Hash based sharding
Range sharding

As shown in the figure, the collection is sharded on a field, and the data of the collection is stored on different shards according to the range that the field value falls into.
Each shard can store many chunks. The information about which shard holds each chunk is stored on the config server, and the balancer automatically evens out the load according to the number of chunks on each shard.
Range sharding suits range queries. For example, to find documents whose value of X is in [100, 200], mongos can use the metadata stored on the config server to route the query directly to the chunks on the relevant shards.
Disadvantage: if the shard key increases (or decreases) monotonically, most newly inserted documents land in the same chunk, so write capacity cannot scale out.
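
Both the strength and the weakness of range sharding can be seen in a small Python sketch. The chunk boundaries, shard names, and helper names (`chunk_index`, `shards_for_range`) below are hypothetical; the lookup uses a plain binary search over sorted boundaries, which is an assumption about how such a table could be searched rather than a description of MongoDB internals.

```python
import bisect

# Hypothetical chunk layout for a collection sharded on field x.
# Chunk i covers [BOUNDS[i], BOUNDS[i + 1]); OWNERS[i] is the shard holding it.
BOUNDS = [float("-inf"), 0, 100, 200, 300, float("inf")]
OWNERS = ["shard0", "shard1", "shard0", "shard1", "shard0"]


def chunk_index(x):
    """Return the index of the chunk whose range contains x."""
    return bisect.bisect_right(BOUNDS, x) - 1


def shards_for_range(low, high):
    """Return the shards owning chunks that overlap the closed range [low, high]."""
    first, last = chunk_index(low), chunk_index(high)
    return sorted({OWNERS[i] for i in range(first, last + 1)})


# A range query on x in [100, 199] only needs the one chunk covering [100, 200).
targeted = shards_for_range(100, 199)

# A monotonically increasing key sends every new insert to the last chunk.
hot_chunks = {chunk_index(x) for x in range(1000, 1010)}
```

The range query is answered by the chunks that overlap it and no others, while the ten increasing keys all map to the single open-ended chunk at the top of the key space.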

Hash sharding

Hash sharding computes a hash value (a 64-bit integer) from the user's shard key, and then distributes documents to chunks by applying the range sharding strategy to that hash value.

Advantage: hash sharding complements range sharding. It spreads documents across the chunks in an effectively random way, so write capacity scales out fully, which makes up for the weakness of range sharding.
Disadvantage: it cannot serve range queries efficiently. Every range query must be sent to all shards in the backend to find the documents that match.
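
A minimal Python sketch of the idea follows. The hash here is the first 8 bytes of SHA-256, chosen only because it is in the standard library; it is a stand-in and not MongoDB's own hash function. The chunk count and the helper names `hash64` and `chunk_for` are likewise assumptions.

```python
import hashlib

NUM_CHUNKS = 4
SPAN = 2 ** 64 // NUM_CHUNKS  # each chunk covers an equal slice of the hash space


def hash64(key):
    """Map a key to an unsigned 64-bit integer (stand-in hash, not MongoDB's)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def chunk_for(key):
    """Apply range sharding to the hash value instead of the key itself."""
    return hash64(key) // SPAN


# Consecutive keys, which would share one chunk under range sharding,
# are scattered over the hash space.
consecutive_keys = range(1000, 1200)
chunks_hit = {chunk_for(k) for k in consecutive_keys}
```

The same scattering is what hurts range queries: neighboring key values end up in unrelated chunks, so a query for keys 1000 to 1199 has no small set of chunks it can be narrowed to.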

Reasonable choice of shard key

When choosing a shard key, weigh the needs of the business against the advantages and disadvantages of range sharding and hash sharding. The choice should also reflect the actual values stored in the field; otherwise oversized chunks can be produced.
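
One way to see how a poorly chosen field leads to oversized chunks: documents that share the same shard key value cannot be separated by a split, so they always stay in one chunk. The sketch below counts the largest such group for two candidate keys. The order documents, field names, and the helper `largest_unsplittable_group` are made up for illustration.

```python
from collections import Counter


def largest_unsplittable_group(docs, shard_key):
    """Return (value, count) for the most common shard key value.

    Documents sharing one shard key value must live in the same chunk,
    so this count is a lower bound on the size of the biggest chunk.
    """
    counts = Counter(doc[shard_key] for doc in docs)
    return counts.most_common(1)[0]


# Hypothetical order documents: two distinct countries, 1000 distinct order ids.
docs = [{"order_id": i, "country": "CN" if i % 10 else "US"} for i in range(1000)]

by_country = largest_unsplittable_group(docs, "country")
by_order_id = largest_unsplittable_group(docs, "order_id")
```

With `country` as the key, 900 of the 1000 documents are forced into a single chunk, while with `order_id` no group is larger than one document.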

Mongos

Mongos serves as the access entry point of the sharded cluster. All requests are routed and distributed by mongos, which also merges the results. This is transparent to the client driver: the user connects to mongos exactly as they would connect to a mongod.

Query request

If the query does not contain the shard key, it must be sent to all shards, and mongos merges the results before returning them to the client
If the query contains the shard key, mongos works out which chunk to query directly from the shard key and sends the query to the shard that holds it
Write request
A write operation must include the shard key. Mongos uses the shard key to work out which chunk the document belongs to, then sends the write to the shard where that chunk is located.
Update/delete request
The query condition of an update or delete request must include the shard key or _id. If it contains the shard key, the request is routed directly to the relevant chunk. If it contains only _id, the request must be sent to all shards.
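
The routing rules above for queries, writes, and updates/deletes can be sketched as one decision function. This is a simplified model that only handles equality conditions; the shard key name, chunk table, and function names (`owner`, `route`, `merge`) are hypothetical.

```python
SHARD_KEY = "user_id"  # hypothetical shard key
# Hypothetical chunk table: (low inclusive, high exclusive, shard name).
CHUNKS = [(0, 1000, "shard0"), (1000, 2000, "shard1"), (2000, 3000, "shard2")]
ALL_SHARDS = sorted({shard for _, _, shard in CHUNKS})


def owner(key_value):
    """Return the shard holding the chunk that covers key_value."""
    for low, high, shard in CHUNKS:
        if low <= key_value < high:
            return shard
    raise KeyError(f"no chunk covers {key_value}")


def route(op, filter_or_doc):
    """Return the shards that must receive the request."""
    if SHARD_KEY in filter_or_doc:
        # Shard key present: target the single shard that owns the chunk.
        return [owner(filter_or_doc[SHARD_KEY])]
    if op == "insert":
        raise ValueError("an insert must include the shard key")
    if op in ("update", "delete") and "_id" not in filter_or_doc:
        raise ValueError("update/delete needs the shard key or _id")
    # No shard key: broadcast to every shard.
    return ALL_SHARDS


def merge(per_shard_results):
    """Combine the partial results from each shard into one reply."""
    return [doc for result in per_shard_results for doc in result]
```

For example, `route("find", {"user_id": 1500})` targets one shard, while `route("find", {"name": "a"})` and `route("update", {"_id": 7})` are broadcast to all three, and `merge` stands in for the step where the router combines the per-shard answers.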
Other command requests
  
Config Server
config database
The config server stores all metadata of the sharded cluster, and all of it is kept in the config database.
The config server can be deployed as a separate replica set, which greatly simplifies the operation and maintenance of the sharded cluster.
config.shards
The config.shards collection stores information about each shard. You can add shards to or remove shards from the sharded cluster dynamically with the addShard and removeShard commands.
config.databases
The config.databases collection stores information about every database, including whether sharding is enabled for it and which shard is its primary shard. For collections in the database that are not sharded, all data is stored on the database's primary shard.
config.collections
Sharding is applied per collection, and config.collections records the collections that have been sharded. After sharding is enabled for a database, a collection is only distributed across shards once you run the shardCollection command on it.
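
For reference, the commands named in this section take the form of small command documents issued through mongos against the admin database. The Python sketch below only builds those documents as dictionaries; the builder function names, host names, database name, and field name are hypothetical. With a driver such as PyMongo, each dictionary could be passed to `client.admin.command(...)`, which is not done here so that the example runs without a cluster.

```python
def add_shard(replica_set, hosts):
    """Build an addShard command; hosts are 'host:port' strings."""
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


def enable_sharding(database):
    """Build the command that enables sharding for a database."""
    return {"enableSharding": database}


def shard_collection(namespace, field, hashed=False):
    """Build a shardCollection command for a single-field shard key."""
    return {"shardCollection": namespace, "key": {field: "hashed" if hashed else 1}}


# Hypothetical cluster setup: one shard, one database, one hashed collection.
commands = [
    add_shard("rs0", ["host1.example:27018", "host2.example:27018"]),
    enable_sharding("shop"),
    shard_collection("shop.orders", "user_id", hashed=True),
]
```

The command name must be the first key of each document, which Python dictionaries preserve through insertion order. A value of `1` in `key` selects range sharding on that field and `"hashed"` selects hash sharding.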
config.chunks
After sharding is enabled for a collection, a single chunk is created by default. It covers the whole shard key range [minKey, maxKey], so it holds all documents. With hash sharding, you can also create multiple chunks in advance to reduce chunk migration.
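
How a collection goes from that single initial chunk to many can be sketched as follows. This toy model splits a chunk at its median key once it holds more than a fixed number of documents; real clusters decide by data size, and the threshold, the `Chunk` class, and the use of infinities as stand-ins for minKey and maxKey are all assumptions for illustration.

```python
MIN_KEY, MAX_KEY = float("-inf"), float("inf")  # stand-ins for minKey / maxKey
MAX_DOCS_PER_CHUNK = 4  # hypothetical threshold; real clusters split by size


class Chunk:
    def __init__(self, low, high, keys=None):
        self.low, self.high = low, high  # chunk covers [low, high)
        self.keys = sorted(keys or [])


def insert(chunks, key):
    """Insert key into the covering chunk, splitting it at the median if full."""
    for i, chunk in enumerate(chunks):
        if chunk.low <= key < chunk.high:
            chunk.keys = sorted(chunk.keys + [key])
            if len(chunk.keys) > MAX_DOCS_PER_CHUNK:
                mid = chunk.keys[len(chunk.keys) // 2]
                left = Chunk(chunk.low, mid, [k for k in chunk.keys if k < mid])
                right = Chunk(mid, chunk.high, [k for k in chunk.keys if k >= mid])
                chunks[i:i + 1] = [left, right]  # replace one chunk with two
            return


chunks = [Chunk(MIN_KEY, MAX_KEY)]  # a newly sharded collection: one chunk
for key in [10, 20, 30, 40, 50]:
    insert(chunks, key)
ranges = [(c.low, c.high) for c in chunks]
```

After the fifth insert the single chunk is split at key 30 into two chunks. Creating several chunks up front, as the text mentions for hash sharding, skips this gradual splitting and the migrations that follow it.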
config.settings
The config.settings collection mainly stores configuration for the sharded cluster, such as the chunk size and whether the balancer is enabled.

Other collections

config.tags mainly stores information about the tags used in the sharded cluster
config.changelog mainly records all change operations in the sharded cluster. For example, chunk migrations performed by the balancer are recorded in the changelog
config.mongos stores information about every mongos in the current cluster
config.locks stores lock-related information. An operation on a collection, such as moveChunk, must acquire the lock first, which prevents multiple mongos from migrating chunks of the same collection at the same time.
