MongoDB Sharded Cluster and Replica Set


1. Sharded Cluster
1.1. Concept
A sharded cluster is a deployment that stores data across multiple machines. It consists mainly of query routers (mongos), shards, and config servers.
The query router (mongos) distributes requests to the appropriate shards based on metadata held on the config servers; it does not store the cluster metadata itself, only a copy cached in memory.
Shards store the data chunks. A collection is split into chunks based on the shard key, and the chunks are distributed across different shards. In a production environment, each shard is typically a replica set.
Config servers store the cluster metadata, including the mapping of chunks to shards. If the config servers go down, the cluster cannot work.
Note:
When mongos restarts, it reads the metadata from the config servers to refresh its in-memory cache.
The config servers are written to when chunks are split or when chunks are migrated between shards.
The config servers can be deployed as a replica set, but that replica set must not contain arbiters or delayed members, and its members must have buildIndexes set to true.
Because a collection's data is spread across multiple shards, a query returns an error if a shard is down; specifying the partial option on the query allows incomplete results to be accepted.
Purpose
Sharding is used when a single machine cannot meet the requirements: not enough memory or disk space, or insufficient read and write throughput.
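As a rough sketch of how these components fit together (the hostnames, ports, paths, and replica set names below are hypothetical examples, not values from this article), a minimal cluster could be started like this:
mongod --configsvr --replSet cfgRS --port 27019 --dbpath /data/config   # config server (member of a config replica set)
mongod --shardsvr --replSet shardA --port 27018 --dbpath /data/shardA   # shard server (member of a shard replica set)
mongos --configdb cfgRS/127.0.0.1:27019 --port 27017                    # query router
Then, from a mongo shell connected to the mongos:
sh.addShard("shardA/127.0.0.1:27018")   // register the shard with the cluster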
1.2. How the data is kept evenly distributed
The cluster uses two background processes, the splitter and the balancer, to keep the data distribution even.
Splitter
The splitter prevents chunks from growing too large. The default chunk size is 64 MB; when a chunk exceeds that size, the splitter divides it into two.
A split operates on metadata rather than on the data itself: it is a logical division of chunks and does not change the actual data placement.
If the chunk size is too small, a large number of chunks are produced, the cluster becomes unbalanced easily, chunks are moved frequently, cluster performance drops, the metadata grows, and query efficiency falls.
If the chunk size is too large, chunks are moved less often and the metadata stays small, which helps queries, but a migration takes a long time once it happens.
Not every collection is sharded; unsharded collections are stored on the database's primary shard.
Data is distributed across different shards only when sharding is enabled for both the database and the collection; otherwise it is stored only on the primary shard (see the sketch after this list).
Insert and update operations can trigger chunk splits.
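As a minimal sketch of enabling sharding (the database name, collection name, and shard key field below are hypothetical examples), run the following from a mongos:
sh.enableSharding("test")                        // enable sharding for the test database
sh.shardCollection("test.users", {user_id: 1})   // range-partition test.users on user_id (the field must be indexed)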
Balancer
The balancer manages the migration of chunks.
When the chunk distribution across the cluster exceeds the migration threshold, the balancer moves chunks between shards.
Adding or removing shards, as well as inserts and deletes, can also cause the balancer to move chunks (see the sketch after this list).
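For reference, a sketch of inspecting and pausing the balancer from a mongos with the standard shell helpers:
sh.getBalancerState()    // true if the balancer is enabled
sh.isBalancerRunning()   // true if a migration round is currently in progress
sh.stopBalancer()        // disable the balancer, for example before maintenance
sh.startBalancer()       // re-enable it afterwards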
1.3. How chunks are assigned to shards
Each collection to be sharded requires an indexed field as its shard key. Based on the shard key, MongoDB splits the data into chunks using either range partitioning or hash partitioning.
Range partitioning
Each chunk covers a contiguous range of shard key values, and every shard key value falls into exactly one range.
Advantages and Disadvantages
Range partitioning supports range queries well: the query router can easily determine which chunks contain the requested data and send the request to the corresponding shards.
Range partitioning can distribute data unevenly.
Hash partitioning
Data is assigned to chunks according to the hash of the shard key.
Advantages and Disadvantages
Data is spread roughly evenly across the chunks.
For range queries, adjacent values end up on different shards, so many shards have to be accessed.
Note
The shard key cannot use a multikey index; that is, the value of the indexed field cannot be an array.
Once a shard key is chosen, it cannot be changed to another field, and the shard key value of a document cannot be modified.
If the cluster is write-heavy, hash partitioning spreads the data evenly across the nodes and distributes the write load evenly over the cluster (see the sketch after this list).
If the cluster is read-heavy, range partitioning keeps adjacent data on the same node, which makes queries easier.
If a query does not specify the shard key field, the query router broadcasts the request to all nodes and waits for their results, which is inefficient.
If a query does specify the shard key field, the query router can send the request to a small number of nodes, which is efficient.
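As a sketch of the write-heavy case above (the collection and field names are hypothetical examples), a hashed shard key is declared like this:
sh.shardCollection("test.events", {device_id: "hashed"})   // hash-partition test.events on device_id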
1.4. Data Migration Process
The balancer sends a moveChunk instruction to the source shard (a manual example follows this list).
The source shard begins moving the specified chunk; reads and writes for that chunk are still routed to the source shard during the migration.
If the destination shard lacks any required indexes, it builds them at this point.
The destination shard starts requesting the documents in the chunk and saves them locally.
If data in the chunk changes on the source shard during the migration, the destination shard synchronizes those changes after the copy has completed.
After synchronization finishes, the destination shard connects to the config servers so they can update the metadata; during this step the source shard blocks writes to the chunk.
The old data on the source shard is then deleted.
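Chunk migration is normally driven by the balancer, but for illustration (the namespace, shard key value, and shard name below are hypothetical examples) a chunk can also be moved manually:
sh.moveChunk("test.users", {user_id: 1000}, "shardB")   // move the chunk containing user_id 1000 to shardB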
1.5. Backing Up and Restoring Data
Backing up data
Command format: mongodump -h dbhost -d dbname -o directory
Example: mongodump -h 127.0.0.1:28002 -d test -o /home/backup
This backs up the local database test to /home/backup.
Recovering data
Command format: mongorestore -h dbhost -d dbname --directoryperdb dbdirectory, where dbdirectory is the directory that holds the backup data.
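Assuming the backup created above, a concrete restore could look like the following sketch (the host, database, and path reuse the earlier example; adjust them to your environment):
mongorestore -h 127.0.0.1:28002 -d test /home/backup/test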

2. Replica Set
2.1. Concept and Characteristics
Concept
A replica set is a group of mongod instances that hold the same data, consisting of a primary and one or more secondaries. At any given time there is only one primary in the set. The primary records every data-changing operation in the oplog (a capped collection); the secondaries read the oplog and apply its operations to their local data in order to stay in sync.
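For illustration, the oplog can be inspected from the mongo shell (a sketch using the standard helpers; oplog.rs is the oplog collection of a replica set):
use local                                          // the oplog lives in the local database
db.oplog.rs.find().sort({$natural: -1}).limit(1)   // show the most recent oplog entry
rs.printReplicationInfo()                          // show the oplog size and time window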

Characteristics
Asynchronous replication
Data on the primary is not replicated to the secondaries in real time.
Automatic failover
When the primary goes down, an election is triggered.
Read operations
Data read from a secondary may not be up to date.
2.2. Replica Set Members
A replica set can contain up to 50 members, of which at most 7 can vote. It includes the following member types:
Primary
The primary accepts both reads and writes. All members can serve reads, but by default read requests go only to the primary; this can be changed with the read preference. The primary's priority must be at least 1.
Secondary
Secondaries accept only read operations. A secondary backs up the data by synchronizing from the primary, and a replica set has at least one secondary. Through the replica set configuration you can control whether a member votes (votes=0) and whether it can become primary (priority=0). A priority-0 member cannot initiate an election and cannot become primary, but it can still vote.
Hidden member
By setting the hidden property on a secondary, the member is hidden from clients and does not accept read or write requests from them; it cannot become primary (priority=0) and can only vote. Hidden members are mainly used for backing up data.
Delayed member
By setting the slaveDelay property on a hidden member, the member replicates data from the primary with a fixed delay, which helps protect the data against accidental changes. A delayed member is a hidden member with an additional delay attribute.
Arbiter
An arbiter stores no data, cannot become primary, and can only vote. Arbiters are mainly used to make the number of members in the replica set odd so that a majority is easier to reach. An arbiter consumes very few resources, but it should not be deployed on the same physical host as the primary or a secondary.
Non-voting member
A non-voting member does not take part in voting, but it stores data and can serve read operations (a configuration sketch covering these member types follows this section).
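A configuration sketch covering the member types above (the set name, hostnames, ports, and delay value are hypothetical examples):
rs.initiate({
  _id: "exampleSet",
  members: [
    {_id: 0, host: "127.0.0.1:28001", priority: 2},                                  // preferred primary
    {_id: 1, host: "127.0.0.1:28002"},                                               // ordinary secondary
    {_id: 2, host: "127.0.0.1:28003", priority: 0, hidden: true, slaveDelay: 3600},  // hidden member, delayed by one hour
    {_id: 3, host: "127.0.0.1:28004", priority: 0, votes: 0},                        // non-voting, data-bearing member
    {_id: 4, host: "127.0.0.1:28005", arbiterOnly: true}                             // arbiter, stores no data
  ]
})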
2.3. Replica Set Management
use admin (switch to the admin database)
config = {_id: "MySet", members: [{_id: 0, host: "127.0.0.1:28001", priority: 2}, {_id: 1, host: "127.0.0.1:28002", priority: 1}]}
rs.initiate(config)
Modifying a replica set configuration
cfg = rs.conf()
cfg.members[0].priority = 1
rs.reconfig(cfg)
Replica Set Maintenance
To perform maintenance, comment out the replSet option in the configuration file and start the member in standalone mode; after maintenance is complete, restore the option and add the member back to the replica set (common maintenance helpers are sketched below).
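Common maintenance helpers, shown as a sketch (the host and port are hypothetical examples):
rs.status()                    // show the current state of every member
rs.add("127.0.0.1:28003")      // add a new member to the set
rs.remove("127.0.0.1:28003")   // remove a member from the set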
2.4. The Majority Principle
Concept
If a replica set has N members, a majority is N/2 + 1 (integer division), so up to N/2 members may be down. For example, with N = 5 a majority is 3 and the set can tolerate 2 members being down. If the number of surviving members is less than a majority, no primary can exist and the set cannot provide write service.

Purpose
The majority principle guarantees that the replica set never has more than one primary at any time. For example, suppose a replica set is deployed across two data centers and the link between them fails. Without the majority rule, the data center that does not contain the primary would elect a primary of its own; when the link recovers, the set would have two primaries and data consistency could not be guaranteed.
2.5. Election
Election preconditions
The replica set must satisfy the majority principle. While an election is in progress, the replica set cannot accept writes.

When an election is triggered
The replica set is initialized or reconfigured
The primary goes down or becomes unreachable over the network, that is, a majority of members cannot connect to the primary
The rs.stepDown(n) command is executed, demoting the primary to a secondary (see the sketch after this list)
A member with a higher priority joins the replica set
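A sketch of manually triggering an election by stepping the primary down (the durations are arbitrary example values):
rs.stepDown(60)   // run on the primary: step down and do not seek re-election for 60 seconds
rs.freeze(30)     // optionally run on a secondary: do not seek election for 30 seconds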

Election features
Members with a higher priority are preferred as primary
The member with the most recent optime is elected primary
If a high-priority member does not have the latest optime, it first catches up by syncing the primary's oplog
A priority-0 member cannot initiate an election and cannot become primary; it can only vote
All members can veto an election, including non-voting members

When to veto an election
The member that initiated the election does not have the latest data
The member that initiated the election has a lower priority than other members
The member seeking election does not hold the most recent optime
2.6. Data rollback
Concept: If the primary fails before all of its writes have been replicated to the secondaries, and the old primary rejoins the replica set after a new primary has been elected, the old primary's data will be inconsistent with the new primary's. The old primary then rolls back the inconsistent data so that it matches the new primary.

Avoiding data rollback
By default, the result is returned to the client as soon as the write succeeds on the primary, which can lead to rollback. The client can change the write concern (writeConcern) so that the result is returned only after the write has succeeded on a majority of members.
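A sketch of such a write (the collection name, document, and timeout are hypothetical examples), asking for acknowledgement from a majority of members:
db.orders.insertOne(
  {item: "abc", qty: 1},
  {writeConcern: {w: "majority", wtimeout: 5000}}   // wait for a majority, up to 5 seconds
)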
2.7. Read and Write Strategies
writeConcern: the result can be returned to the client without waiting for the primary to acknowledge the write, after the write succeeds on the primary, or only after the write succeeds on a majority of members.
readPreference: read only from the primary, read only from a secondary, prefer the primary, prefer a secondary, or read from the member with the lowest network latency.
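A sketch of setting a read preference on a query (the collection name is a hypothetical example); the modes correspond to the options listed above:
db.orders.find().readPref("secondaryPreferred")   // prefer a secondary, fall back to the primary
db.orders.find().readPref("nearest")              // read from the member with the lowest network latency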
2.8. Advantages and Disadvantages of Replica Sets
Advantages
Automatic failover: when the primary goes down, an election selects a new primary, keeping the data safe and available
Data is replicated automatically, without manual intervention
Easy to scale
High data reliability
Disadvantages
High resource consumption
Does not solve the problem of load balancing
Data read by a client may not yet be durable; for example, a client can read recently written data that later fails to reach disk or is rolled back
