Distributed Lock Based on ZooKeeper


There are currently three popular approaches to implementing distributed locks: based on a database, on Redis, or on ZooKeeper. The first two are well documented elsewhere on the web, so let's look at how to implement a distributed lock with ZooKeeper.

What is ZooKeeper?

ZooKeeper (abbreviated zk in the industry) is a centralized service that provides configuration management, distributed coordination, and naming. These are basic, essential functions of a distributed system, yet implementing them while maintaining consistency and availability, and at the same time achieving high throughput and low latency, is actually very difficult. ZooKeeper provides these functions so that developers can build their own distributed systems on top of it.

Although ZooKeeper's implementation is complicated, the model it exposes is very simple. ZooKeeper provides a multi-level node namespace (each node is called a znode). Each node is identified by a slash-separated path, and every node except the root has a parent, which is very similar to a file system. For example, /foo/doo denotes a znode; its parent is /foo, whose parent is /, and / is the root node with no parent. Unlike a file system, every one of these nodes can have associated data, whereas in a file system only files hold data and directories do not. To guarantee high throughput and low latency, ZooKeeper keeps this tree structure in memory, which means it cannot be used to store large amounts of data: each node can hold at most 1 MB.

To ensure high availability, ZooKeeper is deployed as a cluster, so that as long as a majority of the machines in the cluster are available (i.e. a certain number of machine failures can be tolerated), ZooKeeper remains available. When using ZooKeeper, the client needs to know the list of cluster machines and establishes a TCP connection to one of them to use the service. The client uses this TCP connection to send requests, receive results, receive watch events, and send heartbeats. If the connection is broken, the client can connect to another machine.

The architecture diagram is as follows:

A client's read requests can be handled by any machine in the cluster, and if a read request registers a watch on a node, that watch is also handled by the ZooKeeper machine the client is connected to. Write requests, however, are forwarded to the other ZooKeeper machines and must reach agreement before success is returned. Therefore, as the number of machines in the cluster grows, read throughput increases while write throughput decreases.

Ordering is a very important property of ZooKeeper. All updates are totally ordered, and each update is stamped with a unique identifier called a zxid (ZooKeeper transaction id). Reads, by contrast, are ordered only relative to updates: the result of a read request carries the latest zxid seen by the server that processed it.

How can we use ZooKeeper to implement a distributed lock?

Before describing the algorithm, let's look at a few interesting features of ZooKeeper nodes:

  • Sequential nodes: suppose the parent node is /lock. We can create child nodes under it, and ZooKeeper offers an optional sequential feature. For example, if we create the child node "/lock/node-" and request sequencing, ZooKeeper automatically appends an increasing integer suffix based on the number of existing children: the first child created is /lock/node-0000000000, the next is /lock/node-0000000001, and so on.
  • Ephemeral nodes: a client can create an ephemeral (temporary) node; when the session ends or times out, ZooKeeper deletes the node automatically.
  • Event watching: when reading data, we can also set a watch on the node. When the node's data or structure changes, ZooKeeper notifies the client. ZooKeeper currently supports four event types: 1) node created; 2) node deleted; 3) node data changed; 4) children changed. A short sketch of these features follows this list.
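
The following is a minimal sketch of these three features using the native ZooKeeper Java client. The connection string is a placeholder and the sketch assumes the /lock parent node already exists; it is for illustration only, not production code.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeFeaturesDemo {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper (placeholder address; the third argument is a default watcher)
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Ephemeral + sequential child: ZooKeeper appends a 10-digit counter and
        // deletes the node automatically when the session ends
        String path = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("created " + path);

        // Read the children of /lock and register a watch in the same call;
        // the watch fires once when the child list changes
        List<String> children = zk.getChildren("/lock",
                event -> System.out.println("event: " + event.getType()));
        System.out.println("children: " + children);

        zk.close();
    }
}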

The following describes how to use ZooKeeper to implement a distributed lock. Assume the root node of the lock space is /lock:

  • The client connects to ZooKeeper and creates an ephemeral, sequential child node under /lock; the child corresponding to the first client is /lock-0000000000, the second is /lock-0000000001, and so on.
  • The client gets the list of children under /lock and checks whether the child it created has the smallest sequence number in the current list. If so, it holds the lock; otherwise it watches for changes to the children of /lock and, after receiving a change notification, repeats this step until it obtains the lock.
  • Execute the business code.
  • When the business flow is complete, delete the corresponding child node to release the lock.

The ephemeral node created in step 1 ensures that the lock is released even when something fails. Consider this scenario: suppose the child node created by client A has the smallest sequence number. After acquiring the lock, the machine the client runs on crashes, so the client never deletes its child node. If the node were persistent, the lock would never be released and we would have a deadlock. Because the node is ephemeral, once the client crashes ZooKeeper stops receiving its heartbeats, eventually deems the session invalid, and deletes the ephemeral node, releasing the lock.

In addition, careful readers may wonder about the atomicity of getting the child list and setting the watch in step 2. Consider this scenario: client A's child is /lock-0000000000 and client B's is /lock-0000000001. When client B fetches the child list, it finds that it does not hold the smallest sequence number; but before it can set its watch, client A finishes its business flow and deletes /lock-0000000000. Does client B's watch miss this event and wait forever? This problem does not exist, because in the API provided by ZooKeeper the read operation and the watch registration are performed atomically; that is, the watch is set at the same moment the child list is read, so no event can be lost.

Finally, there is a major optimization to this algorithm. If 1000 clients are currently waiting for the lock, then when the holder releases it all 1000 clients are woken up: ZooKeeper has to notify 1000 clients, which blocks other operations, whereas in the best case only the client corresponding to the new smallest node should be woken. What to do? When registering the watch, each client should watch only the child node immediately before its own. For example, if the child list is /lock-0000000000, /lock-0000000001, /lock-0000000002, then the client with sequence number 1 watches for deletion of the node with sequence number 0, and the client with sequence number 2 watches for deletion of the node with sequence number 1.

The adjusted distributed lock algorithm process is as follows:

  • The client connects to ZooKeeper and creates an ephemeral, sequential child node under /lock; the child corresponding to the first client is /lock-0000000000, the second is /lock-0000000001, and so on.
  • The client gets the list of children under /lock and checks whether the child it created has the smallest sequence number in the current list. If so, it holds the lock; otherwise it watches for deletion of the child node immediately before its own and, after receiving the deletion notification, repeats this step until it obtains the lock (see the sketch after this list).
  • Execute the business code.
  • When the business flow is complete, delete the corresponding child node to release the lock.
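
As a rough illustration of this adjusted flow, here is a minimal sketch built directly on the native ZooKeeper client. The class and node names (SimpleZkLock, /lock/node-) are made up for illustration, and reentrancy, error handling, and session-expiry handling are all omitted; it is a sketch of the algorithm above, not a production lock.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical minimal lock following the adjusted algorithm above
public class SimpleZkLock
{
    private final ZooKeeper zk;
    private String ourPath;   // the ephemeral sequential node we created

    public SimpleZkLock(ZooKeeper zk) { this.zk = zk; }

    public void lock() throws Exception
    {
        // Step 1: create an ephemeral, sequential child under /lock
        ourPath = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String ourNode = ourPath.substring("/lock/".length());

        while (true)
        {
            // Step 2: get and sort the children, check whether ours is the smallest
            List<String> children = zk.getChildren("/lock", false);
            Collections.sort(children);
            int ourIndex = children.indexOf(ourNode);
            if (ourIndex == 0)
            {
                return;   // smallest sequence number: we hold the lock
            }

            // Otherwise watch only the node immediately before ours
            String previous = "/lock/" + children.get(ourIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            // exists() sets the watch; if it returns null the previous node is
            // already gone, so simply loop and re-check
            if (zk.exists(previous, event -> latch.countDown()) != null)
            {
                latch.await();   // wait until the previous node is deleted
            }
        }
    }

    public void unlock() throws Exception
    {
        // Steps 3 and 4: after the business code runs, delete our node to release the lock
        zk.delete(ourPath, -1);
    }
}

Curator packages essentially this pattern, plus reentrancy and retry handling, in InterProcessMutex, which is what the rest of this article uses.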

Source code analysis of Curator

Although the API exposed by the native ZooKeeper client is already quite simple, implementing a distributed lock ourselves is still troublesome. We can instead use the ZooKeeper distributed lock implementation provided by the open-source project Curator.

We only need to add the following dependency (using Maven):

<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>4.0.0</version>
</dependency>

Then it is ready to use. The code is as follows:

public static void main(String[] args) throws Exception {
    // Create the ZooKeeper client
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.newClient(
            "10.21.41.181:2181,10.21.42.47:2181,10.21.49.252:2181", retryPolicy);
    client.start();

    // Create the distributed lock; the root node of the lock space is /curator/lock
    InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");

    // Acquire the lock, then run the business logic
    mutex.acquire();
    System.out.println("Enter mutex");

    // Business logic done; release the lock
    mutex.release();

    // Close the client
    client.close();
}

The key operations are mutex.acquire() and mutex.release(), which is extremely convenient!
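
In practice the release call usually goes in a finally block, so that the lock is released even if the business code throws. A small usage sketch, reusing the client and lock path from the example above:

InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");
mutex.acquire();
try {
    // business code runs while the lock is held
    System.out.println("Enter mutex");
} finally {
    // always release, even if the business code throws
    mutex.release();
}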

Next, let's analyze how lock acquisition is implemented in the source code. The acquire method is as follows:

/**
 * Acquire the lock, blocking while it is held by someone else. The call is
 * reentrant for the same thread (i.e. the lock can be acquired repeatedly);
 * the number of acquire calls must match the number of release calls.
 *
 * @throws Exception ZK errors, connection interruptions
 */
@Override
public void acquire() throws Exception
{
    if ( !internalLock(-1, null) )
    {
        throw new IOException("Lost connection while trying to acquire lock: " + basePath);
    }
}

Note that when communication with ZooKeeper fails, acquire throws an exception directly and leaves retrying to the caller. It calls internalLock(-1, null), where the parameters mean: block indefinitely while the lock is held by someone else.
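
If blocking forever is undesirable, InterProcessMutex also provides a timed variant, acquire(long time, TimeUnit unit), which returns a boolean instead of blocking indefinitely. A small sketch, reusing the mutex from the earlier example:

// Wait at most five seconds for the lock instead of blocking forever
if (mutex.acquire(5, TimeUnit.SECONDS)) {
    try {
        // business code runs while the lock is held
    } finally {
        mutex.release();
    }
} else {
    // the lock was not obtained within the timeout; handle accordingly
}

Back to the blocking path: the internalLock code is as follows: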

private boolean internalLock(long time, TimeUnit unit) throws Exception
{
    // Reentrancy for the same thread is handled here: if this thread already holds
    // the lock, just increase the acquire count in the corresponding data structure
    // and return success directly
    Thread currentThread = Thread.currentThread();
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null )
    {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }

    // This is where the lock is actually acquired in ZooKeeper
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    if ( lockPath != null )
    {
        // After the lock is acquired, record which thread holds it; on re-entry
        // only the count in lockData needs to be increased
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }

    // Reaching here means the blocking wait ended without obtaining the lock;
    // in this context that implies a ZooKeeper communication problem, so return false
    return false;
}

Detailed comments have been added to the code above, so it is not expanded further here. Let's look at how the lock is actually acquired in ZooKeeper:

String attemptlock (long time, timeunit unit, byte [] locknodebytes) throws exception {// parameter initialization, skipped here //... // spin get lock while (! Isdone) {isdone = true; try {// create a temporary and ordered sub-node ourpath = driver in the lock space. createsthelock (client, path, locallocknodebytes); // determines whether to obtain the lock (the smallest subnode serial number). If the lock is obtained, the system returns the result directly, otherwise, the hasthelock = internallockloop (startmillis, millistowait, ourpath);} catch (keeperexception. nonodeexception e) {// For nonodeexception, the Code ensures that the nonodeexception will be thrown only when the session expires. Therefore, retry if (client. getzookeeperclient (). getretrypolicy (). allowretry (retrycount ++, system. currenttimemillis ()-startmillis,
Retryloop. getdefaultretrysleeper () {isdone = false;} else {Throw e ;}}// if the lock is obtained, return the path of the subnode if (hasthelock) {return ourpath ;} return NULL ;}

The code above mainly involves two steps:

  • driver.createsTheLock: creates the ephemeral, sequential child node. The implementation is straightforward and not expanded here; the main thing to note is the set of node creation modes: 1) PERSISTENT; 2) PERSISTENT_SEQUENTIAL; 3) EPHEMERAL; 4) EPHEMERAL_SEQUENTIAL.
  • internalLockLoop: blocks and waits until the lock is obtained.

Let's look at how internalLockLoop checks for the lock and blocks while waiting. Some irrelevant code is removed here and only the main flow is kept:

// Spin until the lock is obtained
while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock )
{
    // Get the list of all child nodes, sorted by sequence number in ascending order
    List<String> children = getSortedChildren();

    // Check by sequence number whether our child node is the smallest one
    String sequenceNodeName = ourPath.substring(basePath.length() + 1);    // +1 to include the slash
    PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
    if ( predicateResults.getsTheLock() )
    {
        // It is the smallest child node, so the lock is obtained
        haveTheLock = true;
    }
    else
    {
        // Otherwise get the path of the previous child node
        String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();

        // The object monitor is used for thread synchronization here. While the lock is
        // not held, watch the previous child node for deletion and call wait(); when that
        // node is deleted (i.e. the lock is released), the callback wakes this thread via
        // notifyAll(), and the thread spins again to check whether it now holds the lock.
        synchronized(this)
        {
            try
            {
                // getData() is used here instead of checkExists() because, if the previous
                // child has already been deleted, it throws an exception and does not set
                // the watch. checkExists() could also tell whether the node exists, but it
                // would leave a watch set that can never fire, which is a resource leak
                // from ZooKeeper's point of view.
                client.getData().usingWatcher(watcher).forPath(previousSequencePath);

                if ( millisToWait != null )
                {
                    // A bounded wait time was set
                    millisToWait -= (System.currentTimeMillis() - startMillis);
                    startMillis = System.currentTimeMillis();
                    if ( millisToWait <= 0 )
                    {
                        doDelete = true;    // the wait time has expired; delete the corresponding child node
                        break;
                    }

                    // Wait for the remaining time
                    wait(millisToWait);
                }
                else
                {
                    // Wait indefinitely
                    wait();
                }
            }
            catch ( KeeperException.NoNodeException e )
            {
                // When getData sets the watch and the previous child has already been deleted,
                // a NoNodeException is thrown; just spin once more, no extra handling needed
            }
        }
    }
}

For the detailed logic, see the comments in the code. The watcher registered in the code simply wakes up the current thread when the event callback fires, so that the loop re-checks whether the lock is now held; it is relatively simple and is not expanded here.
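
For intuition, such a watcher looks roughly like the following. This is a hand-written sketch of the idea, not Curator's actual code:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Sketch only: a watcher whose sole job is to wake threads blocked in wait()
// on this object, so they can re-check whether they now hold the lock
public class WakeUpWatcherSketch
{
    final Watcher watcher = new Watcher()
    {
        @Override
        public void process(WatchedEvent event)
        {
            synchronized (WakeUpWatcherSketch.this)
            {
                WakeUpWatcherSketch.this.notifyAll();
            }
        }
    };
}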

