Replication
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
Data redundancy and availability
By storing copies of the same data on different servers, replication provides a degree of fault tolerance: if a single database server goes down, the data remains available.
In some cases, replication can also improve read capacity, because clients can send read operations to different servers. Maintaining copies of data in different data centers can improve data locality and availability for distributed applications. You can also maintain additional copies for dedicated purposes, such as disaster recovery, reporting, or backup.
Replication in MongoDB
A MongoDB replica set consists of several data-bearing nodes and, optionally, one arbiter node. Of the data-bearing nodes, exactly one is the primary; the others are secondaries.
The primary receives all write operations. A replica set can have only one primary capable of confirming writes with the {w: "majority"} write concern, although in some circumstances another mongod instance may transiently believe itself to be primary as well. The primary records all changes to its data set in its operation log, the oplog.
The secondaries replicate the primary's oplog and apply the operations to their own data sets, so that their data reflects the primary's. If the primary becomes unavailable, an eligible secondary holds an election to elect itself as the new primary.
You can add an extra mongod instance to a replica set as an arbiter. An arbiter does not maintain a data set; its purpose is to maintain a quorum by responding to heartbeat and election requests from the other members. Because it stores no data, an arbiter consumes fewer resources than a full data-bearing member. If the replica set has an even number of members, add an arbiter so that a majority of votes can be obtained in an election for primary.
An arbiter always remains an arbiter, whereas a primary may step down and become a secondary, and a secondary may be promoted to primary during an election.
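The reason an arbiter helps can be seen with a little arithmetic on votes. The following is a minimal sketch, not MongoDB code: the function names are hypothetical, and it assumes every member holds exactly one vote.

```python
def votes_needed(voting_members: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


# Two data nodes alone cannot survive a failure; adding an arbiter
# (three voters) lets the set lose one member and still elect a primary.
for members in (2, 3, 4, 5):
    print(members, votes_needed(members), tolerated_failures(members))
```

Under this assumption, growing from three voters to four raises the majority to three without raising the number of failures the set can tolerate, which is why odd member counts are preferred.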
Asynchronous replication
Secondaries apply operations from the primary asynchronously. Because secondaries apply operations after the primary does, the replica set can continue to function even if one or more members fail.
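To make the idea of replaying a log concrete, here is a toy model of asynchronous replication. It is only a sketch under simplifying assumptions: the Primary and Secondary classes are hypothetical, the log is an in-memory list of key/value writes, and no networking or elections are modeled.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered record of every write

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already replayed

    def sync_from(self, primary, batch=None):
        """Replay oplog entries not applied yet (optionally only `batch` of them)."""
        pending = primary.oplog[self.applied:]
        if batch is not None:
            pending = pending[:batch]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)


primary, secondary = Primary(), Secondary()
primary.write("x", 1)
primary.write("x", 2)

stale = secondary.data.get("x")        # None: nothing replicated yet
secondary.sync_from(primary, batch=1)
partial = secondary.data.get("x")      # 1: one entry behind the primary
secondary.sync_from(primary)
caught_up = secondary.data == primary.data
print(stale, partial, caught_up)
```

The intermediate values show why a read from a lagging secondary can return older data than the primary holds.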
Automatic Failover
When the primary does not communicate with the other members of the replica set for more than 10 seconds, an eligible secondary calls an election to nominate itself as the new primary. The first secondary to call an election and receive a majority of the members' votes becomes the primary.
The failover process usually completes within a minute. For example, it may take 10-30 seconds for the remaining members to declare the primary inaccessible, after which one of them calls an election. The election itself may take another 10-30 seconds.
Read operation
By default, clients read from the primary, but they can specify a read preference to send read operations to secondaries. Because replication is asynchronous, reads from a secondary may return data that does not reflect the current state of the primary.
Data sharding
Sharding is a method of distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Large data sets and high-throughput applications can exceed the capacity of a single server. For example, high query rates can exhaust the server's CPU capacity, and working sets larger than the system's RAM put heavy pressure on disk I/O.
There are two ways to cope with system growth: vertical scaling and horizontal scaling.
Vertical scaling means increasing the capacity of a single server, for example by using a more powerful CPU, adding more RAM, or increasing storage space. Limits in available technology may prevent a single machine from being powerful enough for a given workload. In addition, cloud providers impose hard ceilings on the hardware configurations they offer. In practice, therefore, vertical scaling has an upper limit.
Horizontal scaling means dividing the data set and the load across multiple servers, and adding servers to increase capacity as needed. Although each individual machine may be modest in speed or capacity, each handles only a subset of the overall workload, which can be more efficient than a single high-speed, high-capacity server. Expanding capacity only requires adding servers, which can cost less overall than high-end hardware for a single machine. The trade-off is increased complexity in infrastructure deployment and maintenance.
MongoDB supports horizontal scaling through sharding.
Sharded Cluster
A MongoDB sharded cluster consists of the following components:
Shard: each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
mongos: the mongos acts as a query router, providing an interface between client applications and the sharded cluster.
Config servers: config servers store metadata and configuration settings for the cluster. As of MongoDB 3.4, config servers must be deployed as a replica set.
Figure: interaction between the components of a sharded cluster.
Shard Keys
MongoDB uses the shard key to distribute a collection's documents across shards. The shard key consists of an immutable field or fields that exist in every document in the target collection.
You choose the shard key when sharding a collection. As of MongoDB 3.4, the shard key cannot be changed after sharding. A sharded collection can have only one shard key.
To shard a non-empty collection, the collection must have an index that starts with the shard key. For an empty collection, MongoDB creates the index if no appropriate index exists.
The choice of shard key affects the performance, efficiency, and scalability of the cluster. Even a cluster running on the best possible hardware can be bottlenecked by a poor shard key.
Chunks
MongoDB partitions sharded data into chunks. Each chunk has an inclusive lower bound and an exclusive upper bound based on the shard key.
MongoDB uses the sharded cluster balancer to migrate chunks between the shards in the cluster. The balancer attempts to achieve an even distribution of chunks across all shards.
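The goal of evening out chunk counts can be illustrated with a greedy loop. This is a hedged sketch of the general idea, not MongoDB's actual balancer: the balance function, the shard names, and the threshold parameter are all assumptions made for illustration.

```python
def balance(chunks_per_shard, threshold=2):
    """Move one chunk at a time from the fullest to the emptiest shard
    until their chunk counts differ by less than `threshold`."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2, otherwise moves could oscillate")
    counts = dict(chunks_per_shard)
    migrations = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] < threshold:
            return migrations, counts
        counts[fullest] -= 1
        counts[emptiest] += 1
        migrations.append((fullest, emptiest))


migrations, final_counts = balance({"shardA": 8, "shardB": 2, "shardC": 2})
print(len(migrations), final_counts)
# prints: 4 {'shardA': 4, 'shardB': 4, 'shardC': 4}
```

Each entry in the returned list stands for one chunk migration from a source shard to a destination shard.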
Advantages of Sharding
Reads/Writes
In a sharded cluster, MongoDB distributes the read and write workload across the shards, so that each shard processes a subset of the cluster's operations. Both read and write capacity can be scaled horizontally by adding more shards.
For queries that include the shard key, mongos can target the query at a specific shard.
Storage Capacity
Sharding distributes data across the shards in the cluster, so that each shard holds a subset of the total data. As the data set grows, additional shards increase the storage capacity of the cluster.
High Availability
A sharded cluster can continue to perform partial read/write operations even if one or more shards are unavailable. The data on the unavailable shards cannot be accessed during the downtime, but reads and writes directed at the available shards still succeed.
In production environments, each shard should be deployed as a replica set to provide redundancy and availability.
Sharding Considerations
Sharding a cluster requires careful planning, execution, and maintenance.
Careful selection of the shard key is necessary to ensure the performance and efficiency of the cluster. You cannot change the shard key after sharding, nor can you unshard a sharded collection.
Sharding has certain operational requirements and restrictions.
If a query does not include the shard key, mongos performs a broadcast operation and the query is executed on every shard in the cluster. Such scatter/gather queries can take a long time.
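The difference between a targeted query and a broadcast can be sketched as a lookup in a chunk table. The example below is illustrative only: the shard key name user_id, the chunk boundaries, and the shard names are hypothetical, and only equality queries are handled.

```python
from bisect import bisect_right

# Hypothetical chunk table: chunk i covers [BOUNDS[i], BOUNDS[i + 1]),
# and the last chunk is open-ended. OWNERS[i] is the shard holding chunk i.
BOUNDS = [float("-inf"), 100, 200, 300]
OWNERS = ["shardA", "shardB", "shardC", "shardA"]


def route(query, shard_key="user_id"):
    """Return the shards a router would have to contact for an equality query."""
    if shard_key in query:
        chunk = bisect_right(BOUNDS, query[shard_key]) - 1
        return [OWNERS[chunk]]      # targeted: exactly one shard
    return sorted(set(OWNERS))      # broadcast: every shard


print(route({"user_id": 150}))   # ['shardB']
print(route({"name": "alice"}))  # ['shardA', 'shardB', 'shardC']
```

The second call has no shard key to look up, so the only safe answer is to ask every shard and merge the results.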
Sharded and non-sharded collections
A database can contain both sharded and non-sharded collections. Sharded collections are partitioned and distributed across the shards in the cluster. Non-sharded collections are stored on a primary shard. Each database has its own primary shard.
Connecting to a sharded cluster
You must connect to a mongos router to interact with any collection in the sharded cluster, whether sharded or non-sharded. Clients should never connect directly to an individual shard to perform read or write operations.
Sharding Strategy
MongoDB supports two sharding strategies.
Hashed Sharding
Hashed sharding computes a hash of the shard key value. Each chunk is then assigned a range of hashed values.
Shard key values that are close to each other are unlikely to hash into the same chunk. Distribution based on hashed values therefore spreads data more evenly, especially when the shard key changes monotonically.
However, hashed distribution also means that a range query on the shard key is unlikely to target a single shard, which results in more broadcast operations.
Ranged Sharding
Ranged sharding divides data into ranges based on the shard key values. Each chunk is assigned a range of shard key values.
Shard key values that are close to each other are likely to fall in the same chunk, so mongos can route a request only to the shards that contain the requested data.
The efficiency of ranged sharding depends on the shard key chosen. A poorly considered shard key can result in uneven distribution of data, which can negate some of the benefits of sharding or cause performance bottlenecks.
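The effect of the two strategies on a monotonically increasing shard key can be compared with a small simulation. This is a sketch under stated assumptions: the chunk count and chunk width are hypothetical, MD5 is used only as a convenient stand-in hash, and taking the hash modulo the chunk count is a simplification of assigning ranges of hashed values to chunks.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4


def range_chunk(key, chunk_width=1000):
    """Ranged: neighbouring keys share a chunk; large keys land in the last chunk."""
    return min(key // chunk_width, NUM_CHUNKS - 1)


def hashed_chunk(key):
    """Hashed: the chunk is derived from a hash of the key, so neighbours scatter."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


new_keys = range(3000, 3400)  # e.g. an ever-increasing counter
range_load = Counter(range_chunk(k) for k in new_keys)
hash_load = Counter(hashed_chunk(k) for k in new_keys)

print("ranged:", dict(range_load))  # every insert hits the same chunk
print("hashed:", dict(hash_load))   # inserts are spread over the chunks
```

With the ranged function all 400 inserts fall into one chunk, which is the hot spot that hashed sharding avoids, at the cost of scattering range queries.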
Zones in a sharded cluster
In a sharded cluster, you can create zones of sharded data based on the shard key. Each zone can be associated with one or more shards in the cluster, and a shard can be associated with any number of non-conflicting zones. In a balanced cluster, MongoDB migrates the chunks covered by a zone only to the shards associated with that zone.
Each zone covers one or more ranges of shard key values. Each range is always inclusive of its lower bound and exclusive of its upper bound.
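The boundary rule for zone ranges is easy to get wrong, so a small lookup sketch may help. Everything here is hypothetical (zone names, ranges, shard names); it only models the half-open interval check and the restriction of placement to the zone's shards.

```python
ALL_SHARDS = ["shard1", "shard2", "shard3"]

# Hypothetical zone definitions: each range is [low, high).
ZONES = {
    "EU": {"ranges": [(0, 100)], "shards": ["shard1", "shard2"]},
    "US": {"ranges": [(100, 200), (500, 600)], "shards": ["shard3"]},
}


def zone_for(key):
    """Return the zone whose range contains key, or None if no zone covers it."""
    for name, zone in ZONES.items():
        if any(low <= key < high for low, high in zone["ranges"]):
            return name
    return None


def allowed_shards(key):
    """Shards on which data for this key may be placed."""
    zone = zone_for(key)
    return list(ALL_SHARDS) if zone is None else ZONES[zone]["shards"]


print(zone_for(99), zone_for(100), zone_for(200))  # EU US None
print(allowed_shards(550))                         # ['shard3']
```

The key 100 belongs to the second zone rather than the first because the upper bound of a range is excluded.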
High-availability MongoDB cluster deployment for Ali projects