Leases and failure detection in PacificA: the strategy for leader election in Kafka

Source: Internet
Author: User

PacificA is Microsoft's replication technology for log-based distributed storage systems. Even though the configuration manager maintains the true state of the current configuration, this alone does not guarantee that at most one node acts as primary at any time, because the local views of the configuration held by different servers are not necessarily synchronized. In particular, we must avoid the situation where an old primary and a new primary process queries at the same time: the old primary may not be aware that a reconfiguration has created a new primary and removed it from the configuration. Because the new primary may already have processed new updates, the old primary could be answering queries from out-of-date state, which violates strong consistency.

Our solution is to use leases. The primary takes a lease from each secondary by periodically sending beacons and waiting for acknowledgments. The primary considers a lease expired once a specified lease period has passed since the sending time of the last acknowledged beacon. When the lease from any secondary expires, the primary no longer considers itself the primary and stops processing queries and updates. In this case, the primary contacts the configuration manager to remove that secondary from the current configuration.

A secondary acknowledges beacons as long as the sender is still the primary in its current configuration. If a specified grace period has passed since the last beacon was received from the primary, the secondary considers its lease on the primary expired and contacts the configuration manager to remove the current primary and make itself the new primary.

Assuming no clock drift, the lease is guaranteed to expire on the primary before it expires on the secondary, as long as the grace period is equal to or longer than the lease period.
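
The lease bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not PacificA's actual code: the class names (PrimaryLease, SecondaryLease), the method names, and the choice to pass time in explicitly as a number are all assumptions made for the example, and clocks are assumed to be perfectly synchronized.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PrimaryLease:
    """Primary-side lease bookkeeping (hypothetical name, for illustration)."""
    lease_period: float
    # secondary id -> send time of the newest beacon that secondary acknowledged
    # (assumed to be seeded with the time each secondary joined)
    acked_send_time: Dict[str, float]

    def on_ack(self, secondary: str, beacon_sent_at: float) -> None:
        # The lease runs from when the beacon was SENT, not from when the
        # acknowledgment arrived, so network delay can only shorten it.
        previous = self.acked_send_time.get(secondary, float("-inf"))
        self.acked_send_time[secondary] = max(previous, beacon_sent_at)

    def expired(self, now: float) -> List[str]:
        """Secondaries whose lease has run out at time `now`."""
        return [s for s, sent in self.acked_send_time.items()
                if now - sent >= self.lease_period]

    def may_serve(self, now: float) -> bool:
        # A single expired lease is enough to stop queries and updates.
        return not self.expired(now)


@dataclass
class SecondaryLease:
    """Secondary-side view of its lease on the primary (hypothetical name)."""
    grace_period: float
    current_primary: str
    last_beacon_at: float  # time the last accepted beacon was received

    def on_beacon(self, sender: str, now: float) -> bool:
        """Return True if the beacon should be acknowledged."""
        if sender != self.current_primary:
            return False  # sender is not the primary in our configuration
        self.last_beacon_at = now
        return True

    def should_take_over(self, now: float) -> bool:
        # Only after the grace period may the secondary ask the
        # configuration manager to replace the primary.
        return now - self.last_beacon_at >= self.grace_period
```

With a lease period of 2 and a grace period of 3, a beacon sent at t=10 and received at t=10.5 lets the primary serve until t=12, while the secondary waits until t=13.5 before proposing itself. The primary measures from the send time and the secondary from the later receive time, so with a grace period at least as long as the lease period the primary always gives up first.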

A secondary proposes a configuration change to try to take over the primary role if and only if its lease on the old primary has expired. Therefore the old primary is guaranteed to have resigned before the new primary is established, so at most one primary is active at any time.

We use the lease mechanism as the failure detection mechanism. Similar failure detection mechanisms are used in other systems, such as GFS, Boxwood, and Bigtable/Chubby. The key difference is that in those systems leases are obtained from a central entity. In our scenario, the monitoring used for failure detection always takes place between two servers that already need to communicate with each other for data processing: the primary communicates with the secondaries when it processes updates, and the beacon and acknowledgment messages also travel between the primary and the secondaries. In this way, failure detection accurately captures the state of the communication channels used for replication. When a communication channel is busy, the data-processing messages themselves can serve as beacons and acknowledgments. Actual beacon and acknowledgment messages are sent only when the channel is idle, which minimizes the overhead of failure detection.

Furthermore, a decentralized implementation eliminates both the load on a central entity and the dependency on it. The load can be significant, because beacon and acknowledgment messages are exchanged periodically between the central entity and every server in the system, and the interval between exchanges must be fairly small to guarantee fast failure detection. In a centralized scheme, the unavailability of the central entity (for example, because of a network partition) can make the entire system unavailable, because every primary has to resign when it loses its lease from the central entity.
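
The idea that data-processing messages double as beacons, with explicit beacons sent only on idle channels, can be sketched as follows. This is an illustrative sketch only: BeaconScheduler, its methods, and the beacon_interval parameter are hypothetical names, and time is again passed in as a plain number rather than read from a clock.

```python
from typing import Dict, Iterable, List


class BeaconScheduler:
    """Decides when the primary must send an explicit beacon (hypothetical)."""

    def __init__(self, beacon_interval: float, secondaries: Iterable[str]):
        self.beacon_interval = beacon_interval
        # secondary id -> time of the last message of ANY kind sent to it
        self.last_sent: Dict[str, float] = {s: float("-inf") for s in secondaries}
        self.explicit_beacons_sent = 0

    def on_data_message_sent(self, secondary: str, now: float) -> None:
        # A replication message implicitly acts as a beacon.
        self.last_sent[secondary] = now

    def tick(self, now: float) -> List[str]:
        """Return the secondaries whose channel has been idle long enough
        to need an explicit beacon, and record those beacons as sent."""
        idle = [s for s, sent in self.last_sent.items()
                if now - sent >= self.beacon_interval]
        for s in idle:
            self.last_sent[s] = now
        self.explicit_beacons_sent += len(idle)
        return idle
```

A busy channel never shows up in the result of tick, so the cost of failure detection falls only on idle channels. The acknowledgment side, where a secondary's replies to data messages stand in for beacon acknowledgments, would follow the same pattern and is omitted here.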
