MongoDB Sharding Architecture


[TOC]

1. Introduction

MongoDB has three core strengths: a flexible schema, high availability, and scalability. The flexible schema comes from JSON documents, high availability from replica sets, and scalability from sharded clusters.

A MongoDB sharded cluster achieves horizontal scalability by distributing data across multiple shards.
Consider moving from a replica set to a sharded cluster when any of the following applies:

    • Storage requirements exceed the disk capacity of a single machine
    • The active data set exceeds the memory of a single machine, so many requests have to read from disk and performance suffers
    • Write IOPS exceed the write capacity of a single MongoDB node

As shown in the diagram, a sharded cluster distributes a collection's data across multiple shards (each a replica set or a single mongod node), giving MongoDB horizontal scale-out capability and broadening the scenarios it can serve.

2. The Sharded Cluster

To implement sharding, MongoDB introduces config servers to store the cluster's metadata and mongos as the entry point for applications; mongos loads the routing information from the config servers and routes each request to the appropriate shard on the back end.
[Figure: a production sharded cluster with exactly 3 config servers, 2 or more mongos query routers, and at least 2 shards, each shard being a replica set.]

Role description

A. Shards
Shards store the data and provide high availability and consistency. A shard can be a standalone mongod instance or a replica set.
In production each shard is normally a replica set, so that a single node failure does not take the data offline. Each database also has a primary shard, which holds that database's unsharded collections.

B. Config servers
Config servers store the cluster's metadata, including the routing rules that map chunks to shards.

C. Query routers (mongos)
mongos is the access entry point of the sharded cluster and does not persist any data itself (all cluster metadata lives on the config servers, and user data lives on the shards).
On startup, mongos loads the metadata from the config servers, begins serving, and routes each user request to the correct shard.
A sharded cluster can run one or several mongos instances to spread client load.
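
For illustration, here is a minimal pymongo sketch of connecting through a mongos and listing the shards it knows about (the host name and port are placeholders, not from the original article):

```python
from pymongo import MongoClient

# Connect to a mongos, not to a shard directly; the address is a placeholder.
client = MongoClient("mongodb://mongos1:27017")

# listShards is an admin command that mongos answers from the config metadata.
for shard in client.admin.command("listShards")["shards"]:
    print(shard["_id"], shard["host"])
```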

3. Data Distribution Strategies

A sharded cluster can distribute a single collection's data across multiple shards. The user picks a field of the documents in the collection to distribute by; this field is the shard key.
Two distribution strategies are currently supported: range-based sharding and hash-based sharding.

Range-based sharding
[Figure: the shard key value space split into contiguous ranges (chunks).]

As shown, the collection is sharded on the field x, whose values span [minKey, maxKey] (x is an integer here, so minKey and maxKey are the smallest and largest integers). The whole value range is split into chunks, each typically capped at 64MB, and each chunk holds one contiguous slice of the data.
Chunk1 contains all documents with x in [minKey, -75), Chunk2 all documents with x in [-75, 25), and so on. All documents in a chunk live on the same shard, and each shard can hold many chunks. The mapping from chunks to shards is stored in the config servers, and MongoDB automatically balances the load based on the number of chunks on each shard.

Range sharding serves range queries well: to find all documents with x in [-30, 10], mongos can route the request directly to Chunk2 and return every matching document.
Its drawback is that a shard key with a clear increasing (or decreasing) trend sends every newly inserted document to the same chunk, so writes cannot scale out. Using _id as the shard key is a typical example: the high-order bytes of MongoDB's auto-generated ObjectIds are a timestamp, so the values increase monotonically.
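
As a sketch of the above with pymongo (database, collection, and field names are made up for illustration), range sharding on an integer field x and issuing a targeted range query might look like this:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# Enable sharding for the database, then shard the collection on x.
# An ascending key ({"x": 1}) selects range-based sharding.
client.admin.command("enableSharding", "testdb")
client.admin.command("shardCollection", "testdb.coll", key={"x": 1})

# The filter includes the shard key, so mongos can route the query to the
# single chunk covering [-75, 25) instead of broadcasting it to every shard.
docs = list(client.testdb.coll.find({"x": {"$gte": -30, "$lte": 10}}))
```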

Hash-based sharding
Hash sharding computes a hash of the document's shard key value (a 64-bit integer) and then applies the range strategy to that hash value to distribute documents across chunks.
[Figure: hash-based partitioning of the shard key space.]

Hash sharding complements range sharding: it scatters documents randomly across the chunks, which spreads the write load evenly and makes up for range sharding's weakness, but it cannot serve range queries efficiently. Any range query has to be dispatched to all back-end shards to find the matching documents.
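
A corresponding pymongo sketch for hash sharding (same hypothetical names as above):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# "hashed" instead of 1 selects hash-based sharding on x (sharding is assumed
# to be enabled for testdb already, as in the previous sketch).
client.admin.command("shardCollection", "testdb.hashcoll", key={"x": "hashed"})

# Equality lookups on x are still targeted (the hash of 42 maps to one chunk)...
one = client.testdb.hashcoll.find_one({"x": 42})
# ...but range predicates cannot use the hash and are broadcast to all shards.
many = list(client.testdb.hashcoll.find({"x": {"$gte": -30, "$lte": 10}}))
```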

Choosing a reasonable shard key
Choose the shard key according to the workload, weighing range sharding against hash sharding. Also make sure the shard key has enough distinct values; otherwise the cluster will develop jumbo chunks, single chunks that grow very large and can no longer be split. For example, if a collection storing user information is sharded on the age field, the very limited set of age values is bound to produce oversized chunks.
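
One common way to work around a low-cardinality field (a sketch with hypothetical names, not prescribed by the original article) is a compound shard key, so chunks can still be split within a single age value:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# Sharding on age alone caps the number of distinct key values and invites
# jumbo chunks; appending _id keeps the key fine-grained enough to split.
client.admin.command("enableSharding", "userdb")
client.admin.command("shardCollection", "userdb.users", key={"age": 1, "_id": 1})
```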

4. mongos Access Mode

All requests are routed, dispatched, and merged by mongos; this is transparent to the client, and through the driver a mongos connection is used exactly like a connection to a mongod.
mongos routes each request to the appropriate shard(s) based on the request type and the shard key, so different kinds of operations carry different restrictions (a routing sketch follows the list below).

    • Query requests
      If the query does not include the shard key, mongos must broadcast it to all shards and merge the results before returning them to the client.
      If the query includes the shard key, mongos computes which chunk it targets and sends the query only to the corresponding shard.

    • Insert requests
      A write must include the shard key; mongos uses it to determine which chunk the document belongs to and sends the write to the shard holding that chunk.

    • Update/Delete requests
      The query condition of an update or delete must include the shard key or _id. With the shard key the request is routed directly to the owning chunk; with only _id it has to be sent to all shards.

    • Other command requests
      Commands other than basic CRUD each have their own handling. For example, listDatabases is forwarded to every shard and to the config server, and the results are merged.
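
A short pymongo sketch of these routing rules, assuming a collection sharded on a uid field (names are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address
coll = client.testdb.orders  # assumed to be sharded on {"uid": 1}

# Filter contains the shard key: mongos routes this to the single owning shard.
targeted = coll.find_one({"uid": 1001})

# No shard key in the filter: mongos scatters the query to every shard and
# merges the results.
scattered = list(coll.find({"status": "paid"}))

# Updates and deletes identify documents by shard key (or _id).
coll.update_one({"uid": 1001}, {"$set": {"status": "shipped"}})
```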

How to connect
A typical connection URI looks like this:

mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

    • mongodb:// prefix, indicating that this is a connection string;
    • username:password@ the user's credentials, required when authentication is enabled;
    • hostX:portX the address list of one or more mongos;
    • /database the database the user account belongs to, used during authentication;
    • ?options extra connection options, e.g. readPreference=secondaryPreferred to route reads to secondaries.

A sharded cluster can expose multiple mongos instances to balance client load, and when one mongos fails the client automatically fails over, spreading requests across the remaining healthy mongos.
When there are many mongos, they can also be grouped by application. For example, with two applications A and B and four mongos, application A can access mongos 1-2 (its URI lists only the addresses of mongos 1-2)
and application B mongos 3-4 (its URI lists only the addresses of mongos 3-4). This isolates access between the applications at the mongos layer (each talks only to its own mongos), while the back-end shards are still shared.
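
Application A's connection could then look like this (a sketch; host names, credentials, and the database name are placeholders):

```python
from pymongo import MongoClient

# Application A lists only mongos1 and mongos2 in its URI; the driver spreads
# requests across them and fails over if one becomes unavailable.
uri_a = ("mongodb://appuser:secret@mongos1:27017,mongos2:27017/appdb"
         "?readPreference=secondaryPreferred")
client_a = MongoClient(uri_a)

# Application B would use a second URI listing only mongos3 and mongos4.
print(client_a.appdb.list_collection_names())
```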

5. Config Metadata

The config servers store all of the sharded cluster's metadata in the config database.
Since version 3.2, the config servers can be deployed as a replica set of their own, which greatly simplifies operating and managing a sharded cluster.

The main collections in the config database are shown in the following table (a small inspection sketch follows it):

| Collection| Description|
|-|-|
| config.shards| Information about every shard; shards are added to or removed from the cluster with the addShard and removeShard commands|
| config.databases| Information about all databases, including whether sharding is enabled and which shard is the primary shard; collections that are not sharded are stored entirely on their database's primary shard|
| config.collections| Sharding is per collection: even after sharding is enabled for a database, you must run shardCollection on each collection whose data should be distributed|
| config.chunks| When a collection is sharded, a single chunk covering the whole shard key range [minKey, maxKey] is created by default, so all documents initially live in that chunk; with hash sharding, multiple chunks can be pre-created to reduce later migrations|
| config.settings| Cluster-wide settings such as the chunk size and whether the balancer is enabled|
| config.tags| Shard tag (zone) information, used to distribute chunks to shards according to tags|
| config.changelog| A record of all change operations in the cluster, such as chunk migrations performed by the balancer|
| config.mongos| Information about all mongos instances currently in the cluster|
| config.locks| Lock information; operations on a collection such as moveChunk must acquire the lock first, so that multiple mongos do not migrate the same collection's chunks at the same time|
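
These collections can be read through mongos like any other (a read-only pymongo sketch; the namespace is hypothetical, and note that from MongoDB 5.0 onward config.chunks is keyed by collection UUID rather than by ns):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address
config = client.config

# Which shards make up the cluster, and which databases have sharding enabled.
print(list(config.shards.find()))
print(list(config.databases.find()))

# Chunk layout of one sharded collection (pre-5.0 chunks are keyed by "ns").
for chunk in config.chunks.find({"ns": "testdb.coll"}):
    print(chunk["min"], chunk["max"], chunk["shard"])
```
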
6. Shard Balancing

MongoDB balances shards automatically: a background process monitors the chunks on each shard, and when the difference in chunk counts between shards reaches a threshold it starts migrating chunks between shards until they are balanced. The whole migration is transparent to the application. Starting with version 3.4, the balancer no longer runs on mongos but on the primary node of the config server replica set.

Migrations do affect cluster performance, so it is common to confine balancing to the business's idle hours by configuring a balancing window.
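
Such a window can be configured by updating config.settings through mongos; a sketch (the times are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# Restrict the balancer to a nightly window so migrations avoid peak hours.
client.config.settings.update_one(
    {"_id": "balancer"},
    {"$set": {"activeWindow": {"start": "23:00", "stop": "06:00"}}},
    upsert=True,
)
```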

Threshold Reference Table
| Number of chunks| Migration threshold|
|-|-|
| Fewer than 20| 2|
|20-79| 4|
|80 and greater| 8|

Migration process

    1. The balancer sends the moveChunk command to the source shard;
    2. The source shard runs its internal moveChunk procedure; during the migration, reads and writes for the chunk still go to the source shard;
    3. The destination shard builds any indexes it is missing;
    4. The destination shard requests and receives a copy of the chunk's data;
    5. After the copy is received, the destination shard asks the source shard for any incremental updates made during the copy and, if there are any, keeps synchronizing them;
    6. Once fully synchronized, the source shard notifies the config server replica set to update the metadata, recording the chunk's new location on the destination shard;
    7. After the metadata is updated, and once no open cursors still reference it, the source shard deletes its copy of the migrated chunk.
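
A migration can also be triggered manually with the moveChunk admin command; a sketch (the namespace, shard key value, and shard name are placeholders):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# Move the chunk of testdb.coll that contains {"x": 0} to the shard named
# "shard0001"; the balancer machinery described above performs the copy.
client.admin.command("moveChunk", "testdb.coll",
                     find={"x": 0}, to="shard0001")
```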
