Related concepts of MongoDB replica sets [Repost]

Source: Internet
Author: User
Tags: failover, joins, mongodb client, mongodb driver, prefetch

I. Basic concepts of replica sets

Replica sets

A MongoDB replica set is a cluster of mongod process instances that replicate data among themselves and fail over automatically.

Replication adds redundancy, ensures high availability, simplifies administrative tasks such as backup, and increases read capacity. Most production deployments use replication. In MongoDB, the primary handles write operations, while the other replicating members are secondaries.

A replica set can have a maximum of 12 members, but only 7 of them may vote in elections.

Note: MongoDB also provides traditional master/slave replication, which operates much like a replica set, but master/slave replication does not support automatic failover. Put simply, in master/slave mode the client is configured with the address and port of a specific MongoDB instance, whereas in replica set mode the connection layer (the driver, or mongos in a sharded cluster) hides the dynamic switching of the primary from the client.
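For example (the hostnames and the set name rs0 are placeholders), a replica set client lists a few seed members and names the set, and the driver then discovers and tracks the primary on its own:

    mongo "mongodb://db1.example.net:27017,db2.example.net:27017/?replicaSet=rs0"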

Member Configuration

A member can be one of the following roles:

Role             Can become primary   Visible to clients   Votes in elections   Delays sync   Replicates data
Default          yes                  yes                  yes                  no            yes
Secondary-only   no                   yes                  yes                  no            yes
Hidden           no                   no                   yes                  no            yes
Delayed          no                   yes                  yes                  yes           yes
Arbiters         no                   no                   yes                  no            no
Non-voting       yes                  yes                  no                   no            yes

Table: Replica set role properties

Failover recovery

The replica set is capable of automatic failover recovery. If the primary goes down or becomes unresponsive, and a majority of the replica set members can still reach each other, a new primary is elected.

In most cases, when the primary is down, unavailable, or otherwise unfit to serve as primary, failover completes within a few seconds without administrator intervention.

If the MongoDB deployment does not failover as expected, the following may be the problem:

    • The remaining members amount to less than a majority of the replica set
    • No member is eligible to become a primary

Rollback

In most cases, a rollback operation gracefully recovers the data set when a failover was not clean.

A rollback occurs when the primary accepts a write operation that the other members fail to replicate before the primary goes down. When the former primary rejoins the set and begins replicating again, those writes are rolled back. If the operation was replicated to another member that remains available and can reach a majority of the replica set, no rollback occurs.

Rollbacks removes those operations that are not replicated to ensure consistency of the data set.

Elections

Whenever a failover occurs, an election follows to determine which member becomes the primary.

Elections provide a mechanism for the members of a replica set to elect a new primary automatically, without administrator intervention. Elections allow the replica set to recover from failures quickly and decisively.

When the primary becomes unreachable, the secondary members hold an election, and the first member to receive a majority of the votes becomes the new primary.

Member priority

In a replica set, every member has a priority that helps determine which member is elected primary. By default, all members have a priority of 1.
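Priority is part of the replica set configuration document; a minimal shell sketch (the member index 0 is an arbitrary choice):

    cfg = rs.conf()
    cfg.members[0].priority = 2
    rs.reconfig(cfg)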

Consistency

In MongoDB, reads from the primary always reflect the result of the latest write operation.

If the client is configured with a read preference that permits reads from secondaries, read operations can return results from secondary members that have not yet replicated the most recent updates; such queries may therefore return stale data.

This behavior is sometimes referred to as eventual consistency, because the state of the secondary members will eventually converge to the state of the primary. MongoDB cannot guarantee strong consistency for reads from secondary members.

The only way to guarantee consistent reads from secondary members is to configure the write concern so that a write does not return success until it has succeeded on every node.

II. Replica set architecture and deployment patterns

Architecture

The architecture of a replica set deployment has a significant impact on its capacity and performance. For most production deployments, three members with priority 1 are sufficient.

When designing a replica set architecture, be aware of the following factors:

    • Make sure that the members of the replica set can always elect a primary: run an odd number of members, or run an arbiter plus an even number of members.
    • With geographically dispersed members, know which "group" of members will hold a majority under any network partition. Try to ensure that a primary can be elected among the members in the primary data center.
    • Consider including hidden or delayed members in the replica set to support dedicated functions such as backup, reporting, and testing.
    • Consider keeping one or two members in another data center, configured so that they cannot become primary.
    • Use replica set tags to create custom write rules that let the application control the success threshold of write operations. Use write rules to ensure that an operation propagates to a specific data center, or to machines with different roles, before returning success.

Deployment policy

No single replica set architecture fits every deployment environment.

The minimum recommended replica set architecture is a set of three members: one primary and two secondaries, where either secondary can become primary under the right circumstances.

If the replica set has more than three members, follow these architectural conditions:

    • Keep an odd number of voting members in the set. If there is an even number of voting members, deploy an arbiter to make the number odd (see the one-line sketch after this list).
    • No more than 7 members of the set may vote at the same time
    • If you do not want certain members to become primary during a failover, set their priority to 0.
    • A majority of the set's members should run in the primary data center
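Adding an arbiter from the mongo shell is a one-line operation; the hostname and port below are placeholders:

    rs.addArb("arbiter.example.net:30000")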

Geographically distributed sets

A geographically distributed replica set can tolerate the failure of an entire data center. Such a set keeps at least one member in a backup data center.

Figure: A geographically distributed replica set

If the primary goes down, the replica set elects a new primary. If the primary data center and the standby data center lose connectivity, the secondary in the standby data center cannot become primary; if the primary data center fails entirely, the data must be recovered manually from the standby data center.

Note that in this architecture, to keep an odd number of voting members in the primary data center, you need to add an arbiter there.

Non-production members

Sometimes we want a member that replicates the entire data set but never becomes primary. Such a member can serve as a backup, support reporting operations, or act as a cold standby. These members fall into the following categories (a configuration sketch follows the list):

    • Low priority: set the member's priority (members[n].priority in local.system.replset) to 0 so that it can never become primary
    • Hidden: such a member cannot become primary and is not visible to clients.
    • Voting: changing a member's number of votes affects the outcome of replica set elections.
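A minimal configuration sketch for such a member, assuming the member at index 2 is the one being demoted (hidden members must also carry priority 0):

    cfg = rs.conf()
    cfg.members[2].priority = 0    // can never become primary
    cfg.members[2].hidden = true   // invisible to clients; requires priority 0
    rs.reconfig(cfg)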

III. Replication settings, application, and development behavior

Write concern (write concerns)

Write concern describes, for each MongoDB write operation, how much acknowledgment of the result the application requires. With a weak or disabled write concern, the application sends the write to MongoDB without waiting for the database to respond; with a strong write concern, the application waits for MongoDB to confirm the write. MongoDB offers different write concerns to suit different scenarios.

Write concern levels, ordered from weak to strong (a shell sketch follows the list):

    • Errors ignored: the write does not require any confirmation from MongoDB. This is the most efficient level because there is no response to wait for, but because it hides possible exceptions and errors, it poses a significant risk to the persistence and durability of the data. (Note: this level is not normally used.)
    • Unacknowledged: mongod does not acknowledge the write, as with errors-ignored, but the driver still receives and handles network errors.
    • Acknowledged: mongod confirms that it received the write. At this level the client can catch network errors, duplicate-key errors, and other exceptions. This is the default write concern level; it corresponds to calling getLastError with no parameters.
    • Journaled: mongod confirms the write only after it has been written to the journal. This ensures that the write is not lost if mongod shuts down, guaranteeing the durability of the write operation.
    • Replica acknowledged: a write concern for replica sets. It guarantees that the write propagates to members of the replica set; see "Write concern for replica sets" below.
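As a rough shell illustration of the stronger levels using the legacy getLastError command (the w value of 2 is an arbitrary example):

    db.runCommand({ getLastError: 1 })            // acknowledged (the default)
    db.runCommand({ getLastError: 1, j: true })   // journaled
    db.runCommand({ getLastError: 1, w: 2 })      // replica acknowledged: primary plus one secondary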

Write concern for replica sets

MongoDB's built-in write concern ensures that a write operation succeeds on the primary of the replica set. With write concern, after a write completes the client issues the getLastError command to obtain an object that contains either an error message or an error-free acknowledgment.

    • Verifying write operations

The default write concern confirms the write only on the primary. You can require confirmation from other replica set members through the w option of the getLastError command.

The w option specifies the number of replica set members, including the primary, that must receive the write. You can ensure that a write propagates to a majority of the set's members by specifying a number or the value majority.
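For example, the following shell sketch requires the write to reach a majority of members, giving up after five seconds (the wtimeout value is an arbitrary choice):

    db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })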

    • Modifying the default write concern

You can configure the replica set's own default write concern using getLastErrorDefaults in the replica set settings. The following shell commands create a configuration under which a write must complete on a majority of members before returning.

cfg = rs.conf()
cfg.settings = {}
cfg.settings.getLastErrorDefaults = { w: "majority" }
rs.reconfig(cfg)

    • Custom write concerns

You can combine replica set tags with the getLastErrorDefaults and getLastErrorModes replica set settings to create custom write concerns.

For example, suppose a three-member replica set whose members carry the following tags:

{"Disk": "SSD"}

{"Disk": San, "Disk.san": san}

{"Disk": "Spinning"}

Then create a custom getLastErrorModes value:

cfg = rs.conf()
cfg.settings = { getLastErrorModes: { san: { "disk.san": 1 } } }
rs.reconfig(cfg)

Then use this mode by specifying san as the w option:

db.runCommand({ getLastError: 1, w: "san" })

This operation does not return until the write has propagated to at least one member tagged disk.san.

You can also make a custom mode the default write concern through getLastErrorDefaults (this assumes an ssd mode has been defined via getLastErrorModes):

cfg = rs.conf()
cfg.settings.getLastErrorDefaults = { ssd: 1 }
rs.reconfig(cfg)

Read preference (read preference)

Read preferences describe how a MongoDB client routes read operations to the members of a replica set.

By default, an application directs its read operations to the primary of the replica set. Reading from the primary ensures that reads return the most recent documents. However, if an application does not require fully up-to-date data, it can increase read throughput or reduce latency by distributing some or all reads to secondary members of the replica set.

MongoDB drivers allow client applications to set a read preference per connection, per collection, or per operation.
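In the mongo shell, for example, a preference can be attached to the connection or to a single operation (the records collection is a placeholder):

    db.getMongo().setReadPref("secondaryPreferred")   // per connection
    db.records.find().readPref("nearest")             // per operation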

Read preference modes also apply to clients connecting to a sharded cluster through mongos.

Note: distributing reads across secondary members can increase throughput when an application's read workload is large.

Read preference modes:

    • primary

All read operations go only to the current primary of the replica set; this is the default mode. If the primary is unavailable, the read produces an error or throws an exception.

This mode is incompatible with read preferences that use tag sets.

    • primaryPreferred

Normally, reads go to the primary of the replica set; when the primary is unavailable, such as during a failover, they go to a secondary member of the set.

When the read preference includes a tag set and the primary is available, the client reads from the primary; otherwise it reads from a secondary member matching the specified tags. If no secondary matches the tags, an error results.

    • secondary

Read operations go only to secondary members of the replica set. If no secondary is available, an error results or an exception is thrown.

When the read preference includes a tag set, the client tries to find secondary members matching the specified tags and randomly directs the read to one of them. If no secondary matches, an error results.

    • secondaryPreferred

Normally, reads go to secondary members, but when the replica set consists of a single member, the primary, reads go to the primary.

When the read preference includes a tag set, the client tries to find secondary members matching the specified tags and randomly directs the read to one of them. If no secondary matches, an error results.

    • nearest

The driver's member-selection process reads from the nearest member of the set. This mode does not consider the member's type: it may read from a primary or a secondary.

When the read preference includes a tag set, the client tries to find set members matching the specified tags and directs the read to any one of them.
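In the shell, a tag-aware preference can be expressed as a mode plus an ordered list of tag documents (the disk tag mirrors the tagging example from the write concern section; the empty document is a catch-all fallback):

    db.getMongo().setReadPref("secondary", [ { "disk": "ssd" }, { } ])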

Automatic retry

A MongoDB driver's connections to the mongod instances of a replica set must balance two goals:

    • The client should try to obtain current results, and each connection should, as far as possible, read from the same replica set member.
    • The client should minimize downtime caused by connection loss, network problems, or replica set failover.

Here's how:

    • While the connection is stable, reuse the same mongod as long as possible; the connection is pinned to that mongod.
    • If the connection to the mongod fails, the driver attempts to connect to a new member selected according to the read preference mode.

Reconnection is transparent to the application itself. If the connection permits reads from secondary members, the application may, after a reconnect, receive two successive reads from different secondaries; depending on each secondary's state, the returned documents can reflect the database at different points in time.

    • An error is returned only after the driver has tried to connect to three members of the set selected by the read preference mode and tag set. If the set has fewer than three members, the client returns an error after trying all existing members.

After receiving the error, the driver selects a new member using the specified read preference mode. If no read preference is specified, primary is used.

    • Once a failover is detected, the driver tries to refresh its view of the replica set's state as soon as possible.

Request association (member pinning)

Reads from secondaries can reflect the data set at different points in time, because the secondary members of a replica set lag behind the primary's latest state by varying amounts. To prevent successive reads from jumping around in time, the driver pins the application thread to a specific set member after the first read operation. The thread continues to read from that member until one of the following occurs:

    • The application performs a read with a different read preference setting
    • The thread terminates
    • The client receives a socket exception, for example because of a network error or because mongod closed the connection during a failover. This triggers a reconnect, which is transparent to the application.

If an application thread issues a query in primaryPreferred mode while the primary is unavailable, the thread stays pinned to a secondary member and does not switch back to the primary even after the primary recovers. Similarly, if a thread issues a secondaryPreferred query while all secondary members are down, it continues to read from the primary even after a secondary recovers.

Member Selection

Client drivers, and the mongos instances of a sharded cluster, periodically refresh their view of the replica set's state: which members are up or down, which member has become primary, and the network latency to each mongod instance.

For any operation against a non-primary member, the driver will:

    • Assembles a list of suitable members, taking the member type into account (secondary, primary, or all members)
    • Excludes all members whose tag sets do not match, if a tag set was specified
    • Determines which suitable member is nearest to the client
    • Builds a list of members within an acceptable ping distance (latency window) of that nearest member
    • Randomly selects one member from this list to receive the read operation
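The last three steps can be pictured with a small illustrative function (this is not driver source code; the 15 ms latency window mirrors a common driver default):

    // Illustrative sketch: pick a random member within the latency
    // window of the nearest eligible member.
    function selectMember(members, windowMs) {
        var nearest = Math.min.apply(null, members.map(function (m) {
            return m.pingMs;
        }));
        var candidates = members.filter(function (m) {
            return m.pingMs <= nearest + windowMs;
        });
        return candidates[Math.floor(Math.random() * candidates.length)];
    }
    // e.g. selectMember([{ host: "a", pingMs: 3 }, { host: "b", pingMs: 9 }], 15)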

Shards and mongos

In most sharded clusters, each shard is backed by a replica set, so read preferences apply there as well. Reads with a read preference in a sharded cluster behave exactly as they do against an unsharded replica set.

Unlike with a plain replica set, in a sharded cluster all client interaction with the shards' replica set members goes through mongos. mongos applies the application's read preference, transparently to the application.

No configuration changes are required for a sharded environment to support all read preferences. Each mongos maintains its own connection pool to the replica set members.

IV. Replica set internals and behavior

Oplog internals

Under various exceptional conditions, updates to a secondary's oplog may lag behind the expected time.

All members of a replica set send heartbeats (pings) to every other member and can import operations from any other member's oplog into their own local oplog.

Operations in a replica set's oplog are idempotent. The following processes rely on this idempotency:

    • Initial synchronization
    • Post-rollback catch-up
    • Shard chunk migration
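For example, an operation that is not idempotent as issued, such as an increment, is rewritten into an idempotent form before being logged (the oplog entry below is a conceptual illustration, not exact oplog contents):

    db.counters.update({ _id: 1 }, { $inc: { n: 1 } })
    // logged roughly as a $set of the resulting value, so replaying it
    // twice leaves the document in the same state:
    // { op: "u", ns: "test.counters", o2: { _id: 1 }, o: { $set: { n: 5 } } }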

Read preference internals

MongoDB uses single-master replication to keep the database consistent. However, clients may modify the read preference of each connection to distribute read operations to the replica set's secondary members. A read-heavy deployment can route more queries to secondary members by distributing reads, but those reads may return stale data, since secondary members lag behind the primary to varying degrees.

Election internals

An election is the process by which a replica set chooses a member to become primary. The primary is the only member of a replica set that can accept write operations.

The following events can trigger an election:

    • The replica set is initialized for the first time
    • The primary steps down. The replSetStepDown command can force the primary to step down; a step-down also occurs when a secondary that is eligible for election has a higher priority, or when the primary loses contact with a majority of the set's members. In the last case the primary shuts down all client connections to prevent clients from writing data to a member that is no longer primary.
    • A secondary member loses contact with the primary. An election starts when a secondary cannot establish a stable connection to the primary.
    • A failover occurs.

During an election, every member has one vote, including hidden members, arbiters, and members in the recovering state. Any mongod can veto an election. In the default configuration, all members have an equal chance of becoming primary. However, you can influence elections by setting priorities. In some architectures there may be operational reasons for increasing the likelihood of a particular member becoming primary; for example, a member in a remote data center should not become primary.

Any member can veto an election, even a non-voting member.

A member will veto an election in the following cases:

    • If the member seeking election is not a member of the voters' set
    • If the member seeking election has not caught up with the most recent operation accessible in the replica set
    • If the member seeking election has a lower priority than another member in the set that is also eligible for election
    • If a secondary-only member is the most up-to-date member at the time of the election: another eligible member of the set will first catch up to the state of that secondary and then attempt to become primary
    • If, from the voting member's perspective, the current primary has more recent operations (a higher optime) than the member seeking election
    • The current primary will also veto the election if it has operations as recent as, or more recent than, those of the member seeking election

The first member to receive a majority of the votes in the set becomes primary and remains primary until the next election. Be aware of the following conditions and possible scenarios:

    • Replica set members send a heartbeat packet every two seconds. If a heartbeat receives no response within 10 seconds, the other members mark the unresponsive member as unreachable.
    • Replica set members compare priorities only with other members of the set. The absolute value of a priority does not affect the outcome of an election, with the exception of 0, which means the member can neither become primary nor initiate an election.
    • A replica set member cannot become primary unless it has the highest optime (most recent operation) of any visible member in the set.
    • If the member with the highest priority in the set is within 10 seconds of the latest oplog entry, the set will not elect a different primary; it waits until the highest-priority member has caught up to the most recent operation.

Synchronization

To keep up with the current state of the replica set, members synchronize, or copy, oplog records from other members. Members synchronize data at two different times:

    • Initial sync occurs when MongoDB creates a new database on a new or repaired member. When a new or repaired member joins or rejoins the set, it waits to receive heartbeats from the other members. By default it then synchronizes from the nearest member of the set that has a more recent oplog record, whether that member is the primary or another secondary.
    • After the initial sync, replication proceeds continuously so that members of the replica set keep their data up to date.

Example:

    • If two secondary members are in one data center and the primary is in another, and all three instances start at the same time (with no data or oplog), both secondaries will most likely sync from the primary, since neither secondary has a more recent oplog record. If one of the secondaries restarts, it will likely sync from the other secondary when it rejoins the set, because of its proximity.
    • If the primary is in one facility and a secondary member is in another, and you add another secondary to the second facility, the new secondary syncs from the existing secondary, because it is closer than the primary.

From MongoDB 2.2 onwards, secondary members also apply the following additional sync behavior:

    • A secondary member syncs from a delayed member only if no other member is available
    • Secondary members do not sync from hidden members
    • Secondary members do not sync from members that are in the recovering state
    • For one member to sync from another, both members must have the same value for the buildIndexes field, whether true or false

If the connection over which a secondary pulls oplog records is unresponsive for 30 seconds, the secondary stops syncing from that member. When a connection times out, the member selects a new member to sync from.

Multithreaded replication

MongoDB applies write operations in batches using multiple threads. The replication process splits each batch among a group of threads, applying many operations concurrently.

Although multithreading can apply operations out of order within a batch, a client reading from a secondary never receives documents that reflect an intermediate state that never existed on the primary: to ensure consistency, MongoDB blocks all read operations on the secondary while a batch is being applied.

To improve performance when applying operations, MongoDB pre-fetches into memory all the pages holding the data and indexes that the batch will affect. This prefetch stage minimizes the time MongoDB holds the write lock.

Prefetching indexes to increase replication throughput

By default, secondaries increase replication throughput by pre-fetching the memory pages for the documents, and the associated index entries, that the incoming operations will affect. You can restrict prefetching to the _id field only, or turn the feature off entirely.
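In mongod releases that support it, this behavior is controlled by the replIndexPrefetch option, which applies only to secondaries and accepts all, _id_only, or none, for example:

    mongod --replSet rs0 --replIndexPrefetch _id_only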

V. Master/slave replication

Since version 1.6, replica sets have superseded master/slave replication. All new production architectures should use replica sets instead of master/slave replication.

Replica sets provide a superset of master/slave functionality and make deployments more robust. Choose master/slave replication only when you need a very large number of non-primary nodes or need to replicate only a single database; bear in mind that master/slave replication provides less redundancy and does not fail over automatically.

Failing over to a slave node (promotion)

To permanently switch from a master node A that is unavailable or damaged to a slave node B:

    • Shut down A
    • Stop mongod on B
    • Back up, then remove, all data files in B's dbpath that begin with local
    • Restart mongod on B with the master option (a command sketch follows the list)
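A sketch of the last two steps, assuming B's dbpath is /data/db (the paths here are placeholders):

    mv /data/db/local.* /backup/        # back up and remove the local files
    mongod --dbpath /data/db --master   # restart B as the new master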

Inverting the master/slave relationship

If you have a master A and a slave B and want to swap their roles, follow the procedure below. The procedure assumes A is healthy, up to date, and available.

If A is not healthy but its hardware is fine, skip steps ① and ②, and in step ⑧ replace all of A's files with B's files.

If A is not healthy and its hardware is damaged, replace A with a new machine.

To invert a master/slave deployment:

① Halt writes on A using the fsync command
② Make sure that B is up to date with the state of A
③ Shut down B
④ Back up, then remove, all data files in B's dbpath that begin with local
⑤ Restart B with the master option
⑥ Perform a write on B, which ensures that the oplog provides a new point in time to synchronize from
⑦ Shut down B; B now has a new set of data files that begin with local
⑧ Shut down A and replace all files in A's dbpath that begin with local with copies of the files in B's dbpath that begin with local. Consider compressing B's local files before copying them, as they can be very large.
⑨ Start B with the master option
⑩ Start A with the slave option and include fastsync (a command sketch follows)
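The final two steps might look like this on the command line (hostnames and dbpath values are placeholders):

    mongod --dbpath /data/b --master
    mongod --dbpath /data/a --slave --source b.example.net:27017 --fastsync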

Resyncing a slave that has become too stale

A slave node receives writes asynchronously from the master, pulling data from the master's oplog. The oplog has a limited length, so if a slave lags too far behind, it must be resynchronized. To resync a slave, connect to its mongod and run the resync command:

use admin
db.runCommand({ resync: 1 })

Chaining slave nodes

Slave nodes cannot be chained; each must connect directly to the master node. If a slave attempts to replicate from another slave, you will see a warning in the mongod log.

This article from: http://www.cnblogs.com/geekma/archive/2013/05/09/3068988.html

