This article gives a brief introduction to MongoDB's sharding feature and summarizes its components; more detailed aspects will be covered in follow-up articles.
Sharding is a method of distributing data across multiple servers. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Large data sets and high-throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust a server's CPU, and a working set larger than the system's RAM puts pressure on the I/O capacity of its disks.
There are two ways to increase the capacity of the system: horizontal scaling and vertical scaling.
Vertical scaling increases the capacity of a single server, for example by using a more powerful CPU, adding RAM, or adding storage. However, a single server has a hard upper limit, and cloud providers cap the maximum hardware configuration available, so vertical scaling is ultimately limited.
Horizontal scaling spreads the data and load across multiple servers, adding servers to increase overall capacity. Although any single server may not be especially fast or large, each one carries part of the load, which can be more efficient overall than one high-end server. Expanding the deployment's capacity then only requires adding machines, which is often cheaper than upgrading a single server's hardware. The trade-off is greater system complexity and higher maintenance cost.
Sharded Cluster
A sharded cluster consists of the following components:
Shard: each shard holds a subset of the sharded data. Each shard can be deployed as a replica set.
Mongos: mongos acts as a query router, providing the interface between client applications and the sharded cluster.
Config servers: config servers store the cluster's metadata and configuration settings. As of MongoDB 3.4, config servers must be deployed as a replica set.
MongoDB shards data at the collection level, distributing a collection's documents across the shards in the cluster.
Shard Key
MongoDB uses the shard key to distribute a collection's documents across shards. The shard key consists of one or more fields that exist in every document of the sharded collection.
You choose a shard key when sharding a collection, and it cannot be changed afterwards. A sharded collection has exactly one shard key.
A non-empty collection must have an index on the shard key before it can be sharded. If the collection is empty, MongoDB creates the shard-key index automatically when the collection is sharded.
The choice of shard key affects the performance, efficiency, and scalability of the cluster. Even with the best hardware, a poorly chosen shard key can quickly lead to bottlenecks. The shard key also determines which sharding strategy the cluster can use.
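To make the idea concrete, here is a minimal Python sketch (not MongoDB's implementation) of the rule that every document in a sharded collection must carry the shard-key fields; the field names are hypothetical:

```python
# Illustrative sketch: a shard key is one or more fields present in
# every document of the sharded collection.

def shard_key_value(doc, key_fields):
    """Extract the shard-key value from a document; raise if any
    shard-key field is missing."""
    try:
        return tuple(doc[f] for f in key_fields)
    except KeyError as e:
        raise ValueError(f"document is missing shard-key field {e}")

key = ("customer_id", "order_date")   # a compound shard key (hypothetical fields)
doc = {"customer_id": 42, "order_date": "2024-01-15", "total": 99.5}
print(shard_key_value(doc, key))      # prints (42, '2024-01-15')
```

The shard-key value extracted this way is what determines which chunk, and therefore which shard, a document belongs to.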
Chunks
MongoDB partitions sharded data into chunks. Each chunk contains a contiguous range of shard-key values. The balancer migrates chunks between shards to keep the data evenly distributed across the cluster.
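The balancing idea can be sketched with a toy model; this only illustrates the concept, as MongoDB's real balancer considers migration thresholds, zones, and much more:

```python
# Toy model: chunks are items in per-shard lists, and a naive "balancer"
# moves chunks from the most-loaded shard to the least-loaded one until
# chunk counts differ by at most 1.

def balance(shards):
    """shards: dict mapping shard name -> list of chunks (mutated in place)."""
    while True:
        most = max(shards, key=lambda s: len(shards[s]))
        least = min(shards, key=lambda s: len(shards[s]))
        if len(shards[most]) - len(shards[least]) <= 1:
            return shards
        shards[least].append(shards[most].pop())

# chunks written as (low, high) shard-key ranges
shards = {"shardA": [(0, 10), (10, 20), (20, 30), (30, 40)], "shardB": []}
balance(shards)
# each shard now holds 2 chunks
```

In the real system each chunk covers a range of shard-key values, and migrations happen online while the cluster keeps serving reads and writes.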
Advantages of Sharding
Reads and Writes
A sharded cluster distributes read and write load across the shards: each shard handles a subset of the cluster's operations, so read and write capacity can be scaled horizontally by adding shards.
If a query includes the shard key, mongos can target the specific shard that holds the data. This is much more efficient than broadcasting the query to the entire cluster.
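The routing decision can be sketched as follows; this is a conceptual model of mongos behavior, not its actual code:

```python
# Sketch of mongos routing: if the query filter includes the shard key,
# the router targets the shard whose chunk owns that key value; otherwise
# it must broadcast to every shard ("scatter-gather").

def route(query, key_field, chunk_map):
    """chunk_map: list of (low, high, shard) ranges over the shard key."""
    if key_field in query:
        v = query[key_field]
        return [shard for low, high, shard in chunk_map if low <= v < high]
    return sorted({shard for _, _, shard in chunk_map})  # broadcast

chunks = [(0, 100, "shardA"), (100, 200, "shardB")]
route({"user_id": 150}, "user_id", chunks)   # ["shardB"] - targeted
route({"name": "bob"}, "user_id", chunks)    # ["shardA", "shardB"] - broadcast
```

This is why the article recommends including the shard key in queries whenever possible: a targeted query touches one shard, while a scatter-gather query ties up every shard in the cluster.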
Storage Capacity
Each shard holds a subset of the data, so adding shards increases the cluster's storage capacity as the data set grows.
High Availability
A sharded cluster can continue to serve reads and writes even when one or more shards are unavailable: data on the remaining shards stays accessible, while data on the unavailable shards cannot be read or written.
In a production environment, each shard should be deployed as a replica set to provide redundancy and availability.
Things to consider before sharding
The infrastructure of a sharded cluster requires careful capacity planning, execution, and maintenance.
The shard key must be chosen carefully to ensure the performance and efficiency of the cluster, because the shard key cannot be changed or removed after the collection is sharded. If queries do not include the shard key, mongos must broadcast them to every shard, which can be slow.
Sharded and Unsharded Collections
A database can contain both sharded and unsharded collections. Sharded collections are distributed across the shards in the cluster, while unsharded collections reside on the database's primary shard. Each database has its own primary shard.
Connecting to a Sharded Cluster
To interact with the collections in a sharded cluster, clients must connect to a mongos router; clients should never connect directly to a single shard for reads or writes. Connecting to mongos works the same way as connecting to a standalone mongod instance.
Sharding Strategies
MongoDB supports two sharding strategies: hashed sharding and ranged sharding.
Hashed Sharding
With hashed sharding, MongoDB computes a hash of the shard key value, and each chunk covers a range of these hash values.
Even when shard key values are close together, their hashes are likely to differ, so the documents are spread across shards rather than concentrated on one. This distributes write load evenly; the trade-off is that range-based queries on the shard key are unlikely to target a single shard and are instead broadcast to multiple shards.
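A small Python sketch shows the effect. The md5 hash here is purely for a stable, illustrative hash; MongoDB's hashed index uses its own hash function:

```python
# Hashed sharding sketch: shard on a hash of the key value, so adjacent
# key values (e.g. monotonically increasing ids) scatter across shards
# instead of piling onto one.
import hashlib

def hashed_shard(key_value, n_shards):
    """Map a shard-key value to a shard index via a stable hash."""
    h = int(hashlib.md5(str(key_value).encode()).hexdigest(), 16)
    return h % n_shards

print([hashed_shard(i, 3) for i in range(6)])
# consecutive ids land on varying shards rather than one
```

This even spread is exactly what makes hashed sharding attractive for monotonically increasing keys such as timestamps or ObjectIds, which under ranged sharding would all be written to the same chunk.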
Ranged Sharding
With ranged sharding, data is partitioned by ranges of shard key values, and each shard stores a contiguous range of the data.
Documents with similar shard-key values are likely to reside in the same chunk. For ranged sharding, the choice of shard key is critical: a good shard key can greatly improve system performance, while a poor one leads to uneven data distribution, negating the advantages of sharding and creating bottlenecks early.
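The chunk lookup for ranged sharding can be sketched with a sorted list of chunk boundaries; the boundaries and shard names below are hypothetical:

```python
# Ranged sharding sketch: chunks cover contiguous shard-key ranges, so
# documents with nearby key values usually land in the same chunk.
import bisect

# chunk i covers shard-key values in [bounds[i], bounds[i+1])
bounds = [0, 100, 200, 300]
shards = ["shardA", "shardB", "shardC"]  # chunk i lives on shards[i]

def owning_shard(key_value):
    """Find the shard owning the chunk that contains key_value."""
    i = bisect.bisect_right(bounds, key_value) - 1
    return shards[min(max(i, 0), len(shards) - 1)]

print(owning_shard(150), owning_shard(160))  # prints shardB shardB
```

Because nearby key values co-locate, a range query such as 150 <= key < 160 can be satisfied by a single shard, which is the main advantage of ranged sharding over hashed sharding.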