Lease and failure detection in PacificA: the strategy for leader election in Kafka

Last Update: 2015-08-31 Source: Internet

Author: User

PacificA is Microsoft's replication technology for log-based distributed storage systems. Even though the configuration manager maintains the true state of the current configuration, the primary invariant (at most one server considers itself the primary at any time) does not necessarily hold, because the local views of the configuration are not necessarily synchronized across servers. In particular, we must avoid the situation where an old primary (master) node and a new primary node both process queries at the same time: the old primary may not be aware that a reconfiguration has created a new primary and removed it from the configuration. Because the new primary may already have processed new updates, the old primary could be answering queries from out-of-date state, which violates strong consistency.

Our solution is to use leases. The primary takes a lease from each secondary (slave) node by periodically sending beacons and waiting for acknowledgments. If the specified lease period has passed since the send time of the last acknowledged beacon, the primary considers the lease expired. When the lease from any secondary expires, the primary no longer considers itself the primary and stops processing queries and updates. In this case, the primary contacts the configuration manager to remove that secondary from the current configuration.

A secondary acknowledges a beacon as long as the sender is still the primary in its current configuration. If the grace period has passed since it last received a beacon from the primary, the secondary considers its lease on the primary expired and contacts the configuration manager to remove the current primary and make itself the new primary. Assuming there is no clock skew, the lease is guaranteed to expire on the primary before it expires on the secondary, as long as the grace period is equal to or longer than the lease period.
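The lease rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PacificA's implementation: the class names `PrimaryLease` and `SecondaryLease`, their methods, and the example periods are hypothetical, and timestamps are passed in explicitly as plain numbers so that the no-clock-skew assumption stays visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PrimaryLease:
    """Primary-side view of the leases it holds from its secondaries (hypothetical helper)."""
    lease_period: float
    # Send time of the most recent beacon that each secondary has acknowledged.
    last_acked_send_time: Dict[str, float] = field(default_factory=dict)

    def on_ack(self, secondary: str, beacon_send_time: float) -> None:
        # The lease is measured from when the beacon was SENT, not when the ack arrived.
        prev = self.last_acked_send_time.get(secondary, float("-inf"))
        self.last_acked_send_time[secondary] = max(prev, beacon_send_time)

    def expired_secondaries(self, now: float) -> List[str]:
        # Secondaries whose lease has run out; the primary would ask the
        # configuration manager to remove these from the configuration.
        return sorted(s for s, sent in self.last_acked_send_time.items()
                      if now - sent > self.lease_period)

    def may_serve(self, now: float) -> bool:
        # One expired lease is enough for the primary to stop serving.
        return not self.expired_secondaries(now)


@dataclass
class SecondaryLease:
    """Secondary-side view of its lease on the current primary (hypothetical helper)."""
    grace_period: float
    last_beacon_time: float

    def on_beacon(self, now: float) -> None:
        self.last_beacon_time = now

    def should_replace_primary(self, now: float) -> bool:
        # Only after the grace period may the secondary ask the configuration
        # manager to remove the primary and promote itself.
        return now - self.last_beacon_time > self.grace_period


if __name__ == "__main__":
    # Assumed example values: lease period 10, grace period 12 (grace >= lease).
    primary = PrimaryLease(lease_period=10.0)
    primary.on_ack("s1", beacon_send_time=0.0)   # beacon sent at t=0
    secondary = SecondaryLease(grace_period=12.0, last_beacon_time=1.0)  # received at t=1
    for t in (5.0, 10.5, 13.5):                  # the link fails after t=1
        print(t, primary.may_serve(t), secondary.should_replace_primary(t))
```

In this example the primary stops serving once more than 10 time units have passed since it sent the beacon at t=0, while the secondary waits until more than 12 units have passed since it received that beacon at t=1. Because the primary counts from the send time and the secondary from the later receive time, a grace period at least as long as the lease period means the old primary always steps down first.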
A secondary (slave) node proposes a configuration change in an attempt to take over as primary (master) if and only if its lease on the old primary has expired. Therefore, the old primary is guaranteed to have resigned before the new primary is established, and the primary invariant holds.

We use the lease mechanism as the failure detection mechanism. Similar failure detection mechanisms are used in other systems, such as GFS, Boxwood, and Bigtable/Chubby. The key difference is that in those systems leases are obtained from a central entity. In our scenario, monitoring for failure detection always takes place between two servers that already communicate with each other for data processing: the primary communicates with the secondaries when it processes updates, and beacons and acknowledgments are also exchanged between the primary and the secondaries. In this way, failure detection accurately captures the state of the communication channels used for replication. When a channel is busy, the data-processing messages themselves can serve as beacons and acknowledgments; actual beacons and acknowledgments are sent only when the channel is idle, which minimizes the overhead of failure detection.

Furthermore, a decentralized implementation eliminates both the load on a central entity and the dependence on it. The load can be significant, because beacons and acknowledgments must be exchanged periodically between the central entity and every server in the system, and the interval must be fairly small to guarantee fast failure detection. In a centralized scheme, the unavailability of the central entity (for example, because of a network partition) can make the entire system unavailable, because every primary has to resign when it loses its lease from the central entity.
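The idea that data-processing traffic can stand in for beacons, with explicit beacons sent only on an idle channel, can be illustrated with a small Python sketch. The class `ReplicationChannel`, its methods, and the `beacon_interval` parameter are hypothetical names chosen for this example; the source does not describe how PacificA implements this.

```python
from typing import Optional


class ReplicationChannel:
    """Primary-to-secondary channel that piggybacks beacons on data traffic (hypothetical)."""

    def __init__(self, beacon_interval: float) -> None:
        self.beacon_interval = beacon_interval
        self.last_sent: Optional[float] = None  # time of the last message of any kind
        self.explicit_beacons = 0               # beacons sent only for failure detection

    def send_update(self, now: float) -> None:
        # A data-processing message doubles as a beacon, so it resets the idle timer.
        self.last_sent = now

    def tick(self, now: float) -> bool:
        """Send an explicit beacon only if the channel has been idle for a full interval.

        Returns True when an explicit beacon was sent.
        """
        idle = self.last_sent is None or now - self.last_sent >= self.beacon_interval
        if idle:
            self.last_sent = now
            self.explicit_beacons += 1
        return idle


if __name__ == "__main__":
    channel = ReplicationChannel(beacon_interval=2.0)
    channel.send_update(0.0)
    channel.send_update(1.5)
    print(channel.tick(3.0))   # channel was busy recently
    print(channel.tick(4.0))   # channel has been idle for a full interval
```

While updates keep flowing, `tick` never sends anything extra; the explicit beacon counter only grows during idle periods, which is where the saving in failure detection overhead comes from.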

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
