MongoDB database high availability and partitioning solutions

MongoDB is a popular document database known for being easy to use, easy to scale, rich in features, and excellent in performance. MongoDB ships with its own high availability and partitioning solutions: the replica set and sharding. We will look at these two features in turn.

1. Replica set

It is often said that a MongoDB replica set requires at least three nodes, but that statement is not quite accurate: a replica set can have as few as one node, at most 12 nodes before version 3.0, and up to 50 nodes starting with 3.0. However, with only one or two nodes MongoDB cannot deliver the real advantages of a replica set, so we generally recommend three or more nodes.

First, let's look at the roles a node can play in a MongoDB replica set (a sample configuration sketch follows the list).

Primary: the primary server. There is only one per replica set; it handles client requests and by default serves both reads and writes.

Secondary: a secondary server. There can be several; each keeps a copy of the primary's data, can be promoted to primary when the primary fails, and can serve read-only traffic.

Hidden: typically used only for backups; it does not serve client read requests.

Secondary-only: can never become primary and only acts as a secondary replica, which keeps under-powered nodes from being elected primary.

Delayed: configured with slaveDelay; it should not handle client requests and generally also needs to be hidden.

Non-voting: a secondary with no vote in elections, used purely as a backup data node.

Arbiter: an arbiter node. It stores no data and only participates in elections; it cannot serve reads or writes.
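
To make these roles concrete, here is a minimal sketch of initiating a replica set that contains most of them, using pymongo's admin command interface. The host names, the set name rs0, and the one-hour delay are illustrative, and the slaveDelay field shown is the pre-5.0 name (newer releases call it secondaryDelaySecs).

```python
from pymongo import MongoClient

# Connect to the node that will run replSetInitiate (host names are made up).
client = MongoClient("mongodb://node1:27017")

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node1:27017"},                        # eligible primary
        {"_id": 1, "host": "node2:27017"},                        # regular secondary
        {"_id": 2, "host": "node3:27017", "priority": 0},         # secondary-only
        {"_id": 3, "host": "node4:27017", "priority": 0,
         "hidden": True},                                         # hidden backup node
        {"_id": 4, "host": "node5:27017", "priority": 0,
         "hidden": True, "slaveDelay": 3600},                     # delayed by 1 hour
        {"_id": 5, "host": "node6:27017", "priority": 0,
         "votes": 0},                                             # non-voting
        {"_id": 6, "host": "node7:27017", "arbiterOnly": True},   # arbiter
    ],
}

client.admin.command("replSetInitiate", config)
```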

Next, consider how a MongoDB replica set synchronizes data. If you know Oracle Data Guard or MySQL master-slave replication, which ship a log to the standby and apply it there, it is easy to guess that MongoDB replica sets take much the same path. The core of this synchronization is the oplog. Like MySQL's binlog, the oplog records every write operation performed on the primary; a secondary synchronizes by copying the oplog and applying it. The oplog has a fixed size, by default 5% of free disk space on 64-bit systems, and it can also be set explicitly with the --oplogsize option. Sizing the oplog appropriately is an important step in production. Why? Because unlike MySQL's binary log, the oplog is a capped, circular log, and unlike Oracle there is no chain of redo log groups and no archived logs. If a secondary falls so far behind the primary that the oplog entries it still needs have already been overwritten, the only way to catch up is a full initial sync, and a full sync is extremely expensive and directly hurts read and write performance on the primary.
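
As a quick way to see how much room you actually have, the sketch below reads the configured oplog size and the time window it currently covers through pymongo; the connection string is illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # illustrative URI
local = client["local"]

# Configured maximum oplog size (collStats reports bytes).
stats = local.command("collStats", "oplog.rs")
print("oplog max size: %.1f MB" % (stats["maxSize"] / 1024.0 / 1024.0))

# Replication window: time span between the oldest and newest oplog entries.
first = local["oplog.rs"].find().sort("$natural", 1).limit(1).next()
last = local["oplog.rs"].find().sort("$natural", -1).limit(1).next()
print("oplog window: %.1f hours" % ((last["ts"].time - first["ts"].time) / 3600.0))
```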

You may also ask whether replication within a MongoDB replica set is real-time. This is really a question of database consistency. MySQL's semi-synchronous replication can guarantee consistency, and Oracle Data Guard's maximum protection mode can guarantee strong consistency, while MongoDB can make writes safer with the getLastError command; but since replication is not a transactional operation, strong consistency of the data cannot be achieved.
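
For completeness, here is how the same write-safety knob looks from a driver today: instead of calling getLastError explicitly, you attach a write concern and let the driver wait for acknowledgement. This is only a sketch; the database, collection, and timeout values are made up.

```python
from pymongo import MongoClient, WriteConcern
from pymongo.errors import WTimeoutError

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # illustrative URI

# Wait until a majority of members have the write (or 5 seconds pass).
orders = client["test"].get_collection(
    "orders", write_concern=WriteConcern(w="majority", wtimeout=5000)
)

try:
    orders.insert_one({"item": "abc", "qty": 1})
except WTimeoutError:
    # The write may still replicate later; only the confirmation timed out.
    print("majority acknowledgement timed out")
```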

Secondaries in a MongoDB replica set typically lag the primary by a few milliseconds, but the delay can grow much larger under heavy load, configuration mistakes, network failures, and so on.
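
If you want to see that lag for yourself, one way is to compare each member's optime against the primary's in the output of replSetGetStatus, as in this sketch (connection string illustrative):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # illustrative URI

status = client.admin.command("replSetGetStatus")
primary_optime = next(
    (m["optimeDate"] for m in status["members"] if m["stateStr"] == "PRIMARY"), None
)

for m in status["members"]:
    lag = (primary_optime - m["optimeDate"]).total_seconds() if primary_optime else None
    print(m["name"], m["stateStr"], "lag (s):", lag)
```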

A MongoDB replica set provides failover, manual switchover, and read/write separation out of the box. You may be wondering how a replica set elects its primary and how it prevents split-brain; hold those questions, they are covered below. By default a replica set directs all read and write traffic to the primary node, but we can use setSlaveOk to spread read pressure across the secondaries. The MongoDB driver also provides five read preferences, as follows (a driver-side example appears after the list):


    • primary: the default; reads only from the primary node;

    • primaryPreferred: reads from the primary most of the time, and falls back to a secondary only when the primary is unavailable;

    • secondary: reads only from secondaries; the catch is that secondary data can be "older" than the primary's;

    • secondaryPreferred: reads from a secondary first, and falls back to the primary when no secondary is available;

    • nearest: reads from the node with the lowest network latency, whether that is the primary or a secondary.
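
In pymongo, for example, a read preference can be attached to a database or collection handle. This is a minimal sketch; the host names and database name are illustrative.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0"  # illustrative
)

db = client.get_database("test", read_preference=ReadPreference.SECONDARY_PREFERRED)

# Reads on this handle go to a secondary when one is available and fall back
# to the primary otherwise; writes always go to the primary.
print(db["orders"].count_documents({}))
```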

Next, let's look at how a MongoDB replica set holds an election. An election can be understood simply as the process of picking a suitable node from the cluster and promoting it to primary. Like many NoSQL databases, the MongoDB replica set uses a Bully-style algorithm, which you can read about in the Wikipedia article.

The idea is that any member of the cluster can declare itself the primary and notify the other nodes; a node that is accepted by the other nodes can become the primary. The MongoDB replica set also has the concept of a "majority": an election must follow the majority rule, a node becomes primary only when it receives support from a majority, and the number of surviving nodes in the replica set must be at least that majority.

In MongoDB 3.0 the maximum number of replica set members rose to 50, but at most 7 members may vote, so the "majority" is still computed from those 7 voting members.

MongoDB triggers an election under the following conditions:


    • when the replica set is initialized;

    • when a secondary node cannot communicate with the primary (the primary may be down, or there may be a network problem);

    • when the primary is manually stepped down with rs.stepDown(sec), which defaults to 60 seconds (see the sketch after this list).
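
From a driver, the same manual step-down can be issued with the replSetStepDown admin command. A minimal sketch, assuming an illustrative connection string; older servers drop all connections when the primary steps down, so the network error is expected:

```python
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # illustrative URI

try:
    # Equivalent of rs.stepDown(60): the primary steps down and will not
    # stand for election again for 60 seconds.
    client.admin.command("replSetStepDown", 60)
except ConnectionFailure:
    # Older primaries close client connections while stepping down,
    # so the command can "fail" on the wire even though it succeeded.
    pass
```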

Then let's look at the election steps:


    • Get the last operation timestamp of each server node. Every mongod has an oplog that records the operations applied on that machine, which makes it easy to compare against the primary to check whether data is in sync, and which can also be used for error recovery;

    • If a majority of the servers in the cluster are down, the surviving nodes all stay secondary and the election stops;

    • If the elected primary, or the most recent sync state of all the secondaries, looks too old, stop the election and wait for manual intervention;

    • If none of the above applies, elect the server node with the most recent last-operation timestamp (that is, the freshest data) as the primary.

When designing a replica set architecture, some people fall into the misconception that the number of member nodes must be odd. Is an even number of replica set members actually a problem?

As the diagram above shows, within a single data center it makes no difference whether the number of replica set members is odd or even. But with two data centers that each hold the same number of members, if the heartbeat between the two rooms is interrupted, the whole cluster becomes unable to elect a primary. This is split-brain in a MongoDB replica set. How do we prevent split-brain? From an architecture perspective, we recommend the following:

The left diagram places the "majority" of members in one primary data center.

Requirement: the replica set's primary should always be in the primary data center.

Disadvantage: if the primary data center goes down, there is no primary available.

The right diagram places an equal number of members in two data centers, with a tie-breaking member (for example an arbiter node) in a third location.

Requirement: cross-data-center disaster tolerance.

Disadvantage: a third location is additionally required.

So the argument that a replica set should have an odd number of members is really aimed at multi-data-center deployments.

In addition, when designing a MongoDB replica set we also need to consider overload, because an overloaded MongoDB performs very poorly. We must therefore estimate the read and write volume carefully and fully allow for the possibility that nodes serving reads or writes will go down.

A MongoDB replica set also involves the concepts of sync, heartbeats, rollback, and so on, which I have briefly summarized below.

Sync

Initial sync: copy all of the data from another node in the replica set. It is triggered when:


    • a secondary node joins for the first time;

    • a secondary node falls behind by more than the oplog size;

    • a rollback fails.

Staying in sync: incremental synchronization after the initial sync has completed.

Note: the sync source is not necessarily the primary; MongoDB chooses a sync source based on ping time, picking a member that is close by and whose data is newer than its own.

Heartbeat


    • Which node is the primary? Which node is down? Which node can serve as a sync source? Heartbeats answer these questions;

    • Every node sends a heartbeat request to every other node every 2 seconds and maintains its own view of the set's state from the results;

    • Through heartbeats the primary learns whether it still satisfies the "majority" condition; if it does not, it steps down and becomes a secondary.

Rollback

If the primary executes a write and then goes down before any secondary has replicated that write, the newly elected primary will not have that operation. When the original primary recovers and becomes a secondary, it must roll back that write before it can resynchronize. If the amount of data to roll back is larger than 300 MB, or the operations to roll back span more than 30 minutes, the rollback fails and a full resync is required.

2. Sharding

Sharding is essentially data partitioning: data is spread across multiple nodes, in other words split horizontally. MongoDB supports automatic sharding; whatever the pros and cons of automatic sharding may be, it is a feature MongoDB can still be proud of.

MongoDB sharding suits the following scenarios:


    • A single server can no longer withstand the pressure, whether that pressure is load, frequent writes, or throughput;

    • The server is running low on disk space;

    • You want to increase the available memory so that more data can be served from memory.

As shown in the figure above, a MongoDB sharded cluster has three components, described below:

mongos: the entry point for requests to the cluster; it routes each data request to the corresponding shard server. A production environment needs multiple mongos instances.

Config server: stores the cluster and shard metadata. mongos reads the configuration from the config servers at startup, and if the configuration changes later the config servers notify all mongos instances to update their state. A production environment needs multiple config servers, and they should also be backed up regularly.

Shard server: the shards that actually store the data. In production each shard should be a replica set.

Here's a quick sketch of how a collection gets split into chunks:

Before sharding, you can think of the collection as one single chunk in which all documents live.

After the shard key is chosen, the collection is split into multiple chunks. At that point the first chunk and the last chunk are bounded by $minKey and $maxKey, which represent negative and positive infinity respectively. These are used internally by MongoDB sharding; we only need to know they exist.

The resulting chunks are then distributed evenly across the nodes.

Some people may wonder how chunks get split. The process (illustrated with four diagrams in the original) is as follows:


    • mongos tracks how much data is written to each chunk and, when a threshold is reached, checks whether the chunk needs to be split;

    • if the chunk needs splitting, mongos updates the chunk metadata on the config server;

    • the config server records the new chunk and adjusts the range (split point) of the old chunk;

    • after the split, mongos resets its tracking for the original chunk and starts tracking the new chunk.

Note: a chunk is a logical concept. A chunk is not physically stored as a page or a file; it is only reflected in the metadata on the config nodes. In other words, splitting a chunk modifies only metadata and does not move any data.
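
You can see this for yourself by reading the chunk metadata straight out of the config database. A small sketch, assuming an illustrative mongos address and namespace, and the pre-5.0 schema in which chunks are keyed by the ns field:

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos1:27017")  # illustrative mongos address

# Chunks exist only as metadata documents in the config database.
for chunk in mongos["config"]["chunks"].find({"ns": "orders_db.orders"}).limit(5):
    print(chunk["shard"], chunk["min"], chunk["max"])
```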

Chunk splitting has its own pitfalls, such as jumbo chunks that appear when no split point can be found, and split storms that occur when the config servers cannot be reached.

A badly chosen shard key: mongos notices that a chunk has reached the threshold and asks the shard to split it, but the shard cannot find a split point, so the chunk just keeps growing.

Chain reaction: bad shard key -> jumbo chunk (cannot be split) -> chunk cannot be moved -> uneven data distribution -> unbalanced writes -> data distribution becomes even more uneven.

Prevention: choose the shard key correctly.

Config servers unreachable: mongos cannot communicate with the config servers and therefore cannot update the metadata, which produces a vicious cycle: it keeps switching between attempting a split and failing, which in turn hurts the performance of mongos and the current shard. This process of repeatedly issuing split requests that never succeed is called a split storm.

Prevention:

1) Keep the config servers available;

2) Restart mongos to reset its write counters.

Having said all this, we still have not covered how to create a sharded cluster. There are two ways:


    • Build a sharded cluster from scratch: usually for a new service that chooses sharding during the initial architecture design;

    • Convert a replica set into a sharded cluster: a service has been running for a while, a single replica set can no longer meet the requirements, and it needs to be converted to a sharded cluster.

For the first case there is little to say beyond choosing a good shard key, which is critical. For the second case, converting a replica set into a sharded cluster, the process is as follows (a command-level sketch appears after the note below):


    1. Deploy the config servers and mongos;

    2. Connect to mongos and add the original replica set to the cluster; that replica set becomes the first shard;

    3. Deploy the other replica sets and add them to the cluster as well;

    4. Modify the client configuration so that all access goes through mongos;

    5. Choose a shard key and enable sharding on the collection.

Note: to shard an existing collection, the shard key must be indexed; if it is not, create the index first.
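
A minimal sketch of steps 2 and 5 through pymongo's admin commands. The mongos address, replica set hosts, database, collection, and shard key are all illustrative.

```python
from pymongo import ASCENDING, MongoClient

mongos = MongoClient("mongodb://mongos1:27017")  # illustrative mongos address

# Step 2: add the original replica set "rs0" as the first shard.
mongos.admin.command("addShard", "rs0/node1:27017,node2:27017,node3:27017")

# Step 5: enable sharding on the database, make sure the shard key is
# indexed, then shard the collection on that key.
mongos.admin.command("enableSharding", "orders_db")
mongos["orders_db"]["orders"].create_index([("user_id", ASCENDING)])
mongos.admin.command("shardCollection", "orders_db.orders", key={"user_id": 1})
```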

A very important component of MongoDB sharding is the balancer, a role actually played by mongos. The balancer is responsible for chunk migration: it periodically checks whether chunks are balanced across the shards and starts migrations when they are not. Migrating a chunk does not block the application; read and write requests go to the old chunk until the migration completes. Once the metadata update is done, any mongos process that tries to access the data at the old location gets an error, but the client never notices: mongos handles the error silently and retries the operation on the new shard.

There is a common misunderstanding here: balancing does not depend on data size. Keep in mind that the measure of balance between shards is the number of chunks, not the amount of data.

In some scenarios chunk migration does hurt performance, for example with a hotspot shard key: all new chunks are created on the hotspot shard, which then has to migrate data away while also absorbing all the writes. Adding a new shard to the cluster likewise makes the balancer trigger a series of migrations.

For a long time we did not know how chunk migration works; a brief summary of the process follows:


    1. The balancer sends a moveChunk command to the source shard;

    2. The source shard starts moving the chunk; during the migration, all operations on this chunk are still routed to the source shard;

    3. The destination shard builds all the indexes that exist on the source shard, unless it already has them;

    4. The destination shard starts requesting the documents in the chunk and receives copies of the data;

    5. After receiving the last document, the destination shard starts syncing any changes that occurred to the chunk during the move;

    6. Once fully synced, the destination shard updates the chunk's metadata on the config servers (the chunk's new location);

    7. After the metadata is updated, and once it has confirmed that no cursors remain open on the chunk, the source shard deletes its copy of the data.

We have now covered the sharding architecture and the sharding process, but the most important part of MongoDB sharding is actually choosing the shard key correctly. What is a shard key? It is the one or two fields of the collection chosen to partition the data. We should choose the shard key carefully at the very beginning, because changing it once the system is in operation is very difficult.

How do we choose a shard key? Let's first analyze shard keys from the perspective of data distribution. The common approaches are:


    • Ascending keys: keys that grow steadily over time, such as a date or an ObjectId.

(Note: MongoDB itself does not have an auto-increment primary key.)

Symptom: new data concentrates on a single shard.

Disadvantage: MongoDB stays busy rebalancing the data.


    • Randomly distributed keys: keys with no pattern in the dataset, such as user names, MD5 values, email addresses, or UUIDs.

Symptom: every shard grows at roughly the same rate, which reduces the number of migrations.

Disadvantage: randomly requesting more data than fits in available memory is inefficient.


    • Location-based keys: "location" here is an abstract concept, such as an IP address, latitude and longitude, or an address.

Symptom: documents with nearby key values are stored in the same range of chunks.

We can also choose an appropriate shard key according to the type of application. The strategies are as follows:


    • Hashed shard key: random distribution (a sketch of creating a hashed shard key appears after this list).

Application type: you want fast data loading and use an ascending key in a large number of queries, but still want writes to be distributed randomly.

Disadvantage: targeted range queries cannot be routed through a hashed key.

Note: the unique option cannot be used, array fields cannot be used, and floating-point values are rounded before hashing.


    • GridFS hashed key: GridFS is well suited to sharding because it can contain a large amount of file data.


    • Tag strategy: when some servers in the cluster are more capable than others (for example SSD-backed), a tag plus ascending shard key scheme can direct more of the load to those servers.

Disadvantage: if the request load exceeds what the powerful server can handle, it is not easy to spread the load to the other servers.
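
A minimal sketch of setting up a hashed shard key through pymongo; the database telemetry, collection events, and field device_id are made-up names.

```python
from pymongo import HASHED, MongoClient

mongos = MongoClient("mongodb://mongos1:27017")  # illustrative mongos address

# Create a hashed index on the shard key field, then shard on its hash.
mongos.admin.command("enableSharding", "telemetry")
mongos["telemetry"]["events"].create_index([("device_id", HASHED)])
mongos.admin.command(
    "shardCollection", "telemetry.events", key={"device_id": "hashed"}
)
```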

That wraps up our look at shard keys. In short, if you choose MongoDB sharding, choose the shard key correctly for your application type from the very start; that is what lets sharding perform at its best, which in turn gives your application excellent performance.

Our discussion of MongoDB sharding is also nearing its end. Finally, let's look at auto-sharding and pre-splitting (manual sharding).

Auto-sharding

Although the official documentation says that data migration has little impact on reads and writes, migration can squeeze hot data out of memory and increase I/O pressure. So consider keeping automatic balancing turned off during normal hours and letting it run only in low-traffic windows.
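
One common way to do this on the MongoDB versions this article discusses is to edit the balancer document in the config database, either to set an active window or to stop balancing outright (newer releases also expose balancerStart/balancerStop commands). A sketch, with an illustrative mongos address and window:

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos1:27017")  # illustrative mongos address
settings = mongos["config"]["settings"]

# Restrict balancing to a low-traffic window (server local time).
settings.update_one(
    {"_id": "balancer"},
    {"$set": {"activeWindow": {"start": "23:00", "stop": "06:00"}}},
    upsert=True,
)

# Or stop the balancer entirely while doing maintenance.
settings.update_one({"_id": "balancer"}, {"$set": {"stopped": True}}, upsert=True)
```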

Also, chunk migration in mongos is single-threaded: a single mongos can move only one chunk at a time.

Pre-splitting

Pre-splitting is usually also called manual sharding, and it requires turning off the auto-balancer in advance. In this mode we need to understand our own data distribution well, partition the data in advance by assigning each shard an appropriate range of shard key values, and then use manual moveChunk commands to achieve manual sharding.
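
A minimal sketch of pre-splitting an already sharded collection and placing a chunk by hand, using the split and moveChunk admin commands. The namespace app.users, the uid boundaries, and the shard name shard0001 are placeholders; real shard names come from sh.status().

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos1:27017")  # illustrative mongos address

# Pre-split "app.users" (already sharded on {"uid": 1}) at chosen boundaries.
for boundary in (1000000, 2000000, 3000000):
    mongos.admin.command("split", "app.users", middle={"uid": boundary})

# Manually place the chunk containing uid=1500000 onto a specific shard.
mongos.admin.command(
    "moveChunk", "app.users", find={"uid": 1500000}, to="shard0001"
)
```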

Automatic sharding sounds like the ideal choice, but in real application scenarios it still has plenty of pitfalls. Unless you keep walking that road and filling in the holes, and have enough strength and experience to control every one of its details, you may prefer a different path. Many companies still avoid it and choose manual sharding, and the biggest reason is that manual sharding is more controllable.