Chapter five brings us to some of the most central and complex material in distributed systems: replication and consistency. A distributed system typically keeps copies of the same data on multiple machines connected over a network, so in this article we'll look at how these replicas are created and maintained, and at the various problems that arise along the way.
In a data system, there are several common reasons to replicate data: to keep data geographically close to users (reducing latency), to let the system keep working when some nodes fail (increasing availability), and to serve more read requests than a single machine could (scaling read throughput).
If the replicated data never changes, managing replicas is simple: copy the data to each node once and you are done. The real difficulty lies in handling changes to replicated data, which raises a host of thorny questions: should replication be synchronous or asynchronous, and how should failed replicas be handled? Let's discuss these problems one by one.

2. The leader-follower mechanism
How to keep multiple replicas on different nodes consistent has always been a core problem in distributed systems: every write must be processed by every replica, or the replicas will no longer hold the same data. Leader-follower replication is the most common mechanism for this, so let's walk through how it works.
Many relational databases use this mechanism to synchronize replicas, including PostgreSQL, MySQL, Oracle Data Guard, and SQL Server. So do many non-relational databases and distributed message queues, including MongoDB, RethinkDB, Kafka, and RabbitMQ.

2.1 Synchronous and asynchronous replication
Should replication between leader and followers be synchronous or asynchronous? (In relational databases this is often a configurable option; in other systems, such as Ceph, it is fixed by the system's design.)
Synchronous replication adds considerable latency to writes, while asynchronous replication responds quickly. But asynchronous replication gives no guarantee of how long replication will take: in some cases, such as network problems between nodes or a node recovering from failure, follower data may lag minutes or more behind the leader. And if the leader fails and cannot be recovered, any writes that were not yet replicated to a follower are lost.
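To make the tradeoff concrete, here is a minimal Python sketch of the semi-synchronous configuration described next (the `Leader`/`Follower` classes and in-memory lists are hypothetical stand-ins for real nodes and logs, not any database's API): a write blocks until one designated synchronous follower acknowledges it, while the remaining followers receive the change in the background.

```python
import threading

class Follower:
    """Hypothetical replica node holding an in-memory log."""
    def __init__(self):
        self.log = []

    def replicate(self, record):
        self.log.append(record)

class Leader:
    """Semi-synchronous leader: one follower is synchronous, the rest
    are replicated to asynchronously (fire-and-forget threads)."""
    def __init__(self, sync_follower, async_followers):
        self.sync_follower = sync_follower
        self.async_followers = async_followers
        self.log = []

    def write(self, record):
        self.log.append(record)
        # Block until the synchronous follower confirms the write ...
        self.sync_follower.replicate(record)
        # ... then ship to the others without waiting for their acks.
        for f in self.async_followers:
            threading.Thread(target=f.replicate, args=(record,)).start()
        return "ok"
```

If the synchronous follower becomes slow or unavailable, a real system would promote one of the asynchronous followers to the synchronous role, preserving the guarantee that two nodes hold every acknowledged write.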
The advantage of synchronous replication is that followers are guaranteed to be consistent with the leader: if the leader fails, any follower holds the same data the leader did. The drawback is that a single network or node failure can stall writes: the leader must block all writes and wait until the failed follower's replica is available again. If all followers are synchronous, any single node outage brings the whole system to a halt. In practice, when synchronous replication is enabled on a database, usually one follower is synchronous and the others are asynchronous; if the synchronous follower becomes unavailable or very slow, a previously asynchronous follower is switched into the synchronous role. This guarantees that at least two nodes have an up-to-date copy of the data: the leader and one synchronous follower. This configuration is called semi-synchronous. (Chain replication is a related scheme that loses no data while still providing good performance and availability.)

2.2 Adding a new follower
Sometimes we need to add new followers, either to increase the number of replicas or to replace failed nodes. The new follower must end up with an accurate copy of the data, and simply copying data files from one node to another is usually not enough: clients keep writing to the system, so the data is constantly in flux. We could lock the system and reject client writes until the replicas are consistent, but that would badly hurt availability. What we need is a way to add a follower without downtime:
1. Take a consistent snapshot of the leader's data at some point in time, and copy the snapshot to the new follower node.
2. The follower connects to the leader and requests all data changes that happened after the snapshot. This requires that the snapshot be associated with an exact position in the leader's replication log, usually called the log sequence number.
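The two steps above can be sketched in Python (an in-memory list stands in for the leader's replication log, and its length plays the role of the log sequence number; none of this is a real database API):

```python
class Leader:
    """Leader keeping an append-only log of writes; the log sequence
    number (LSN) is simply an index into that log."""
    def __init__(self):
        self.log = []     # every (key, value) write, in order
        self.state = {}

    def write(self, key, value):
        self.log.append((key, value))
        self.state[key] = value

    def snapshot(self):
        # Step 1: a consistent copy of the state, tagged with its LSN.
        return dict(self.state), len(self.log)

    def changes_since(self, lsn):
        # Step 2: everything written after the snapshot position.
        return self.log[lsn:]

class NewFollower:
    def __init__(self, leader):
        # Step 1: copy the snapshot and remember which LSN it reflects.
        self.state, self.lsn = leader.snapshot()

    def catch_up(self, leader):
        # Step 2: request and apply all changes after the snapshot's LSN.
        for key, value in leader.changes_since(self.lsn):
            self.state[key] = value
            self.lsn += 1
```

Because the snapshot carries an exact log position, writes that arrive while the snapshot is being copied are not lost; the follower simply replays them during catch-up.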
In a distributed system any node can fail, and for routine operations and maintenance we also need to be able to restart individual nodes without stopping the whole system. Whatever the cause, we want the impact of a single node's outage to be as small as possible.
Each follower keeps, on its local disk, a log of the data changes it has received from the leader. If a follower crashes and restarts, or the network between it and the leader is temporarily interrupted, the follower can look up the last transaction it processed before the failure, reconnect to the leader, and request all the data changes that occurred while it was disconnected. (This is the same idea as adding a new follower.)
Handling a leader failure is trickier: one of the followers must be promoted to the new leader, clients must discover the new leader and send subsequent requests to it, and the remaining followers must start consuming changes from it. Leader failover usually follows this process:
1. Determine that the leader has failed. Most systems use a timeout: if a node does not respond for some period, say 30 seconds, it is presumed dead. (A centralized system can use a lease mechanism instead.) During my graduate studies I looked closely at Cassandra, which adopts the failure-detection algorithm from Naohiro Hayashibara's "The Phi Accrual Failure Detector": it judges whether a node has failed from an accumulated suspicion level rather than a single threshold, which suits distributed systems with a peer-to-peer architecture well.
2. Choose a new leader. In a centralized architecture such as HDFS, a central node can simply appoint the new leader. In a decentralized architecture, the new leader is chosen through an election; there are many distributed agreement protocols to draw on: 2PC, 3PC, Paxos, Raft, and so on.
3. Reconfigure the system to use the new leader. If the old leader later rejoins the cluster, it may still believe it is the leader; the system must ensure that the old leader steps down to become a follower and acknowledges the new leader.
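The three steps can be sketched as follows (the cluster dictionary, `lsn` fields, and 30-second default are illustrative assumptions; a real election would run a protocol such as Raft rather than a local `max()`):

```python
def detect_failure(last_heartbeat, now, timeout=30.0):
    """Step 1: simple timeout-based failure detection."""
    return (now - last_heartbeat) > timeout

def elect_new_leader(followers):
    """Step 2: a stand-in for a real election protocol: promote the
    follower whose replicated log is most up to date (highest LSN)."""
    return max(followers, key=lambda f: f["lsn"])

def failover(cluster, now):
    """Step 3: reconfigure the cluster; the old leader, if it ever
    returns, must find itself demoted to a follower."""
    if detect_failure(cluster["leader"]["last_heartbeat"], now):
        new_leader = elect_new_leader(cluster["followers"])
        cluster["followers"] = [f for f in cluster["followers"]
                                if f is not new_leader]
        cluster["followers"].append(cluster["leader"])  # old leader demoted
        cluster["leader"] = new_leader
    return cluster
```

Promoting the follower with the highest log sequence number minimizes the number of discarded writes, but under asynchronous replication even that follower may be missing the old leader's most recent writes.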
In the asynchronous-replication case, the new leader may not have received all of the old leader's writes before the failure. The most common solution is to discard the old leader's unreplicated writes, but this clearly violates clients' expectations of write durability.
In some fault scenarios, two nodes may both believe they are the leader, a condition known as split brain. Both leaders will then accept writes, and data is likely to be lost or corrupted.
When to fail over is also worth examining: a longer timeout means a longer recovery time after a leader failure, but if the timeout is too short there may be unnecessary failovers. A temporary load spike, for example, can push a node's response time past the timeout, and an unnecessary failover then makes a bad situation worse, not better. For this reason, some operations teams prefer to perform failovers manually, even when the system supports doing so automatically.
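The Phi Accrual idea mentioned above addresses exactly this timeout dilemma. A rough sketch (the normal approximation of heartbeat intervals is a simplification of mine; the published detector is more careful): instead of a fixed cutoff, suspicion grows continuously with the time since the last heartbeat, scaled by the observed distribution of past intervals.

```python
import math

def phi(time_since_last, intervals):
    """Suspicion level for a node, given the seconds since its last
    heartbeat and a window of past inter-heartbeat intervals.
    Roughly: phi = 1 means ~10% chance the node is still alive,
    phi = 3 means ~0.1%, and so on."""
    mean = sum(intervals) / len(intervals)
    var = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    std = max(math.sqrt(var), 1e-6)
    # Probability that a live node's heartbeat is merely late,
    # under a normal approximation of the interval distribution.
    z = (time_since_last - mean) / std
    p_later = 0.5 * math.erfc(z / math.sqrt(2))
    return -math.log10(max(p_later, 1e-12))
```

An application then picks a phi threshold that trades false positives against detection time; because the threshold is relative to observed intervals, it adapts automatically when a load spike stretches heartbeats.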
Logs are central to keeping replicas consistent, so let's briefly survey the approaches to log replication. The simplest is statement-based replication, in which the leader forwards every write statement to its followers, but this has several problems:
(1) Statements that call nondeterministic functions, such as now() for the current date and time or rand() for a random number, produce different results on each replica. (One workaround is to have the leader replace the nondeterministic call with the concrete value it computed.)
(2) Statements that use an auto-incrementing column, or that depend on existing data in the database (for example, UPDATE ... WHERE <condition>), must execute in exactly the same order on every replica, or the replicas will diverge. (With asynchronous forwarding, statements may arrive in random order; this can be avoided by attaching sequence numbers and enforcing ordering.)
(3) Statements with side effects (triggers, stored procedures, user-defined functions) may produce different side effects on each replica.
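Problem (1) and its suggested fix can be demonstrated directly (the `"INSERT rand()"` string is a toy stand-in for a real SQL statement, not actual SQL handling):

```python
import random

def replay_raw(statements):
    """Replay statements verbatim: each replica evaluates rand() for
    itself, so two replicas almost surely end up with different data."""
    state = []
    for stmt in statements:
        if stmt == "INSERT rand()":
            state.append(random.random())  # nondeterministic per replica
        else:
            state.append(stmt)             # deterministic literal
    return state

def rewrite_statements(statements):
    """The fix: the leader evaluates the nondeterministic call once
    and ships the concrete value in its place."""
    return [random.random() if s == "INSERT rand()" else s
            for s in statements]
```

After rewriting, every replica replays only deterministic values, so replaying the same log always yields the same state.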
Write-ahead Log Replication
The write-ahead log is an append-only sequence of bytes containing every write. We can use this exact same log to build a replica on another node: besides writing the log to disk, the leader sends it over the network to its followers, and a follower that processes the log builds a data structure identical to the leader's. The drawback of this approach is that the log describes the data at a very low level, which tightly couples replication to the storage engine.
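A toy version of log shipping (JSON lines stand in for a storage engine's actual byte format, which this is not): the follower applies exactly the bytes the leader wrote, so the two logs end up byte-identical, and so does the coupling to the format.

```python
import json

class Node:
    def __init__(self):
        self.wal = b""    # append-only byte log
        self.state = {}

    def apply_bytes(self, chunk):
        # Replay the exact bytes: one JSON-encoded write per line.
        self.wal += chunk
        for line in chunk.splitlines():
            key, value = json.loads(line)
            self.state[key] = value

def leader_write(leader, followers, key, value):
    chunk = (json.dumps([key, value]) + "\n").encode()
    leader.apply_bytes(chunk)
    for f in followers:   # ship the very same bytes over the network
        f.apply_bytes(chunk)
```

Because replication happens at the byte level, any change to the on-disk format (here, the JSON encoding) must be understood by every replica at once, which is exactly the coupling problem described above.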
Row-based Log Replication
Row-based replication is similar to the write-ahead approach, but it decouples the replication log from the storage engine. This log is called a logical log, and it usually describes writes at the granularity of a row:
For an inserted row, the log contains the new values of all columns.
For a deleted row, the log contains enough information to uniquely identify the deleted row (typically the primary key).
For an updated row, the log contains enough information to uniquely identify the updated row, plus the new values of all columns.
Because the logical log is decoupled from the storage engine, it is easier to keep backward compatible, which allows the leader and followers to run different versions of the data system, or even different storage engines. The logical log format is also easier for external applications to parse: you can ship its contents to an external system such as a data warehouse for offline analysis, or use it to build custom indexes and caches.
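A sketch of applying the three kinds of logical-log records listed above (the record shape, with `op`, `pk`, and `columns` fields, is invented for illustration; real formats such as MySQL's row-based binlog differ):

```python
def apply_logical(state, record):
    """Apply one row-granularity logical-log record to a replica.
    `state` maps primary key -> row (a dict of column values)."""
    op = record["op"]
    if op == "insert":
        state[record["pk"]] = record["columns"]        # all column values
    elif op == "update":
        state[record["pk"]].update(record["columns"])  # pk finds the row
    elif op == "delete":
        del state[record["pk"]]                        # pk alone suffices
    return state
```

Note that nothing here depends on how rows are laid out on disk; any consumer that understands the record shape, including an external cache or data warehouse loader, can replay the same log.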
Replication can increase a system's scalability (handling more requests than a single machine) and reduce latency (placing replicas closer to users). Writes must go through the leader, but read-only queries can be served by any replica, so for read-heavy workloads this read-scaling architecture is very reasonable. However, for the reasons discussed above, we usually do not use fully synchronous replication, and that allows visible inconsistencies: run the same query on the leader and on a follower and you may get different results, because not every write has reached the follower yet. The inconsistency is only temporary, and this guarantee is known as eventual consistency.
These situations are something we have to understand and handle. (Chapter five is packed with material, so I'll work through it across several reading notes. See you in the next one!)
Replication and replica synchronization: "Designing Data-Intensive Applications" reading notes, part 6