Implementing distributed locks with Zookeeper (1): understanding Zookeeper


There are currently three popular ways to implement distributed locks: databases, Redis, and Zookeeper. This article focuses on distributed locks based on Zookeeper; the other two will be covered later. Let's look at how to implement a distributed lock with Zookeeper.

What is Zookeeper?

Zookeeper (commonly abbreviated zk) is a centralized service that provides configuration management, distributed coordination, and naming. These are basic, essential functions for distributed systems, yet implementing them while maintaining consistency and availability at high throughput and low latency is very hard. Zookeeper provides these functions as a service so that developers can build their own distributed systems on top of it.

Although Zookeeper's implementation is complicated, the model it exposes is very simple. Zookeeper provides a multi-level node namespace; each node is called a znode. A node is addressed by a slash-separated path, and every node except the root has a parent, much like a file system. For example, /foo/doo denotes a znode whose parent is /foo, whose parent in turn is /; / is the root and has no parent. Unlike a file system, however, every node can have associated data, whereas in a file system only files hold data and directories do not. To guarantee high throughput and low latency, Zookeeper keeps this tree structure in memory, which means it should not be used to store large amounts of data: each node can hold at most 1 MB.
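
As a concrete illustration of the znode model, here is a minimal sketch using the plain org.apache.zookeeper Java client; the server address 127.0.0.1:2181 and the paths are placeholder values, not part of the original article.

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeExample {
    public static void main(String[] args) throws Exception {
        // Connect and wait until the session is established
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Parents are not created automatically, so create /foo first,
        // then /foo/doo; any znode (not only "leaf" nodes) may carry data
        zk.create("/foo", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/foo/doo", "some data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Read the data back (no watch, Stat not needed)
        System.out.println(new String(zk.getData("/foo/doo", false, null)));

        zk.close();
    }
}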

For high availability, zookeeper is deployed as a cluster, so that the service remains available as long as a majority of the machines in the cluster are up (tolerating a certain number of machine failures). To use the service, a client needs the list of cluster machines and establishes a TCP connection to one of them. The client uses this connection to send requests, receive results, receive watch events, and send heartbeat packets. If the connection breaks, the client can connect to another machine.

[Architecture diagram: clients connected over TCP to the servers of a zookeeper cluster]

A client's read requests can be handled by any machine in the cluster, and if a read request registers a watch on a node, that watch is also handled by the machine the client is connected to. Write requests, however, are forwarded among the zookeeper machines and must be agreed upon by a majority before success is returned. Consequently, as the number of machines in the cluster grows, read throughput increases while write throughput decreases.

Ordering is a very important property of zookeeper. All updates are totally ordered, and each update is stamped with a unique identifier called the zxid (Zookeeper Transaction Id). Reads are ordered relative to updates: the response to a read request carries the latest zxid seen by the server that handled it.

How can zookeeper be used to implement a distributed lock?

Before describing the algorithm, let's look at several useful features of nodes in zookeeper:

Ordered (sequential) nodes: suppose the parent node is /lock. We can create child nodes under it, and zookeeper offers an optional sequential flag. For example, if we create the child "/lock/node-" with the sequential flag, zookeeper automatically appends an increasing integer sequence number based on how many children have been created: the first child becomes /lock/node-0000000000, the next /lock/node-0000000001, and so on.

Temporary (ephemeral) nodes: a client can create an ephemeral node; when the session ends or times out, zookeeper deletes the node automatically.

Event listening (watches): when reading data, the client can also register a watch on the node. When the node's data or structure changes, zookeeper notifies the client. Zookeeper currently supports four event types: 1) node created; 2) node deleted; 3) node data changed; 4) children changed.
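
All three features appear in the short sketch below with the plain zookeeper client (assuming a connected ZooKeeper handle zk, as in the earlier example, and that /lock already exists):

// Ordered + temporary: EPHEMERAL_SEQUENTIAL appends an increasing 10-digit
// suffix and removes the node automatically when the session ends
String path = zk.create("/lock/node-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
System.out.println(path);   // e.g. /lock/node-0000000003

// Event listening: getChildren reads the child list and, in the same call,
// registers a one-shot watch that fires when the children of /lock change
zk.getChildren("/lock", event ->
        System.out.println("children changed: " + event.getType()));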

The following describes how to use zookeeper to implement a distributed lock. Assume that the root node of the lock space is /lock:

The client connects to zookeeper and creates a temporary, ordered child node under /lock. The child node of the first client is /lock-0000000000, the second is /lock-0000000001, and so on.

The client reads the list of children under /lock and checks whether the child it created has the smallest sequence number in the list. If it does, the client holds the lock; otherwise it watches for changes to the children of /lock, and after receiving a child-change notification it repeats this step until it obtains the lock.

Execute the business logic.

When the business logic is finished, delete the corresponding child node to release the lock.

The temporary node created in step 1 ensures the lock is released even after a failure. Consider this scenario: suppose the child node created by client A currently has the smallest sequence number; after obtaining the lock, the machine client A runs on goes down, so the client never deletes its child node. If the node were permanent, the lock would never be released and we would have a deadlock. Because the node is temporary, once the client goes down zookeeper stops receiving its heartbeat packets, eventually declares the session invalid, and deletes the temporary node, releasing the lock.

In addition, a careful reader may wonder whether reading the child list and setting the watch in step 2 is atomic. Consider this scenario: client A's child node is /lock-0000000000 and client B's is /lock-0000000001. When client B reads the child list it finds that its node is not the smallest, but before it can set the watch, client A finishes its business logic and deletes /lock-0000000000. Would the watch set by client B miss this event and wait forever? This problem does not exist: in the API provided by zookeeper, the read and the watch registration are performed as one atomic operation, that is, the watch is set at the same time the child list is read, so no event can be lost, as the snippet below illustrates.
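
In code, this atomicity is just the fact that a single call returns the child list and registers the watch at the same time. A sketch, assuming a connected ZooKeeper handle zk as in the earlier examples:

// One call: read the children of /lock AND register the watch atomically,
// so a deletion that happens right after the read cannot be missed
List<String> children = zk.getChildren("/lock", event -> {
    // woken up here: re-check whether our node is now the smallest
});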

Finally, there is a major optimization to this algorithm. Suppose 1000 clients are waiting for the lock. When the client holding the lock releases it, all 1000 waiting clients are woken up, and zookeeper must notify all 1000 of them, which blocks other operations. Ideally only the client whose node becomes the new minimum should be woken. What should we do? When registering the watch, each client should watch only the child node immediately before its own. For example, if the child list is /lock-0000000000, /lock-0000000001, /lock-0000000002, then the client with sequence number 1 watches for the deletion of the node with sequence number 0, and the client with sequence number 2 watches for the deletion of the node with sequence number 1.

The adjusted distributed lock algorithm is as follows (a minimal sketch with the raw zookeeper client appears after the steps):

The client connects to zookeeper and creates a temporary, ordered child node under /lock. The child node of the first client is /lock-0000000000, the second is /lock-0000000001, and so on.

The client reads the list of children under /lock and checks whether the child it created has the smallest sequence number in the list. If it does, the client holds the lock; otherwise it watches for the deletion of the child node immediately before its own, and after receiving the deletion notification it repeats this step until it obtains the lock.

Execute the business logic.

When the business logic is finished, delete the corresponding child node to release the lock.
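
Here is that sketch: a minimal, illustrative implementation of the adjusted flow with the plain zookeeper client. SimpleZkLock is a hypothetical class name; it assumes /lock already exists and that a connected ZooKeeper handle is passed in, and it omits the timeouts and error handling that a real implementation such as Curator provides.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkLock {
    private final ZooKeeper zk;
    private String ourPath;

    public SimpleZkLock(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // Step 1: create a temporary, ordered child node under /lock
        ourPath = zk.create("/lock/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String ourName = ourPath.substring("/lock/".length());

        while (true) {
            // Step 2: read the child list and sort it by sequence number
            List<String> children = zk.getChildren("/lock", false);
            Collections.sort(children);

            int ourIndex = children.indexOf(ourName);
            if (ourIndex == 0) {
                return; // smallest sequence number: we hold the lock
            }

            // Otherwise watch only the child node immediately before ours
            String prev = "/lock/" + children.get(ourIndex - 1);
            CountDownLatch deleted = new CountDownLatch(1);
            try {
                // getData registers the watch; if the predecessor is already
                // gone it throws NoNodeException and we simply loop again
                zk.getData(prev, event -> {
                    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                        deleted.countDown();
                    }
                }, null);
                deleted.await();
            } catch (KeeperException.NoNodeException ignore) {
                // predecessor deleted between getChildren and getData: re-check
            }
        }
    }

    public void unlock() throws Exception {
        // Step 4: delete our own child node to release the lock
        zk.delete(ourPath, -1);
    }
}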

Source code analysis of Curator

Although the API exposed by the native zookeeper client is already very simple, implementing a distributed lock by hand is still troublesome. Instead, we can directly use the zookeeper distributed lock implementation provided by the open-source project Curator.

We only need to add the following dependency (for maven):

<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-recipes</artifactId>
    <version>4.0.0</version>
</dependency>

Then you can use it! The code is as follows:

public static void main(String[] args) throws Exception {
    // Create the zookeeper client
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.newClient(
            "10.21.41.181:2181,10.21.42.47:2181,10.21.49.252:2181", retryPolicy);
    client.start();

    // Create a distributed lock; the root node of the lock space is /curator/lock
    InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");

    // Obtain the lock and run the business logic
    mutex.acquire();
    System.out.println("Enter mutex");

    // Business logic finished; release the lock
    mutex.release();

    // Close the client
    client.close();
}

The core operations are simply mutex.acquire() and mutex.release(); it could hardly be more convenient.
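
In practice it is common to pair the timed acquire overload with try/finally so the lock is always released. A sketch (the 10-second timeout is just an example value; java.util.concurrent.TimeUnit must be imported):

InterProcessMutex mutex = new InterProcessMutex(client, "/curator/lock");
if (mutex.acquire(10, TimeUnit.SECONDS)) {   // wait at most 10 seconds for the lock
    try {
        // business logic that must run while holding the lock
    } finally {
        mutex.release();                      // always release, even on exceptions
    }
} else {
    // the lock could not be obtained within the timeout
}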

Next we will analyze the source code implementation of the lock acquisition. The acquire method is as follows:

/*
 * Acquire the mutex, blocking until it is available. This operation is
 * re-entrant for the same thread; each call to acquire must be matched by
 * a call to release.
 *
 * @throws Exception ZK errors, connection interruptions
 */
@Override
public void acquire() throws Exception
{
    if ( !internalLock(-1, null) )
    {
        throw new IOException("Lost connection while trying to acquire lock: " + basePath);
    }
}

Note that when communication with zookeeper fails, acquire throws the exception directly and leaves retrying to the caller. The code calls internalLock(-1, null); these arguments mean that when the lock is held by someone else, the call blocks and waits forever. The internalLock code is as follows:

private boolean internalLock(long time, TimeUnit unit) throws Exception
{
    // Handle re-entrance by the same thread: if the lock is already held,
    // just increment the acquire count in the bookkeeping structure and return success
    Thread currentThread = Thread.currentThread();
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null )
    {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }

    // Actually acquire the lock in zookeeper
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    if ( lockPath != null )
    {
        // The lock was obtained; record which thread holds it so that
        // re-entrant calls only need to bump the count in LockData
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }

    // Blocking returned without the lock; this implies a zookeeper
    // communication problem handled by the calling context
    return false;
}

Detailed comments have been added to the code, so it will not be expanded on further. Let's look at how the lock is actually acquired in zookeeper:

String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception
{
    // Parameter initialization is omitted here
    // ...

    // Spin until the attempt is done
    while ( !isDone )
    {
        isDone = true;
        try
        {
            // Create a temporary, ordered child node in the lock space
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            // Check whether we hold the lock (smallest sequence number). If so,
            // return directly; otherwise block and wait for the delete
            // notification from the previous child node
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        }
        catch ( KeeperException.NoNodeException e )
        {
            // The code guarantees that NoNodeException is only thrown when the
            // session has expired, so retry according to the retry policy
            if ( client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++,
                    System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) )
            {
                isDone = false;
            }
            else
            {
                throw e;
            }
        }
    }

    // If the lock was obtained, return the path of our child node
    if ( hasTheLock )
    {
        return ourPath;
    }
    return null;
}

The code above mainly involves two steps:

driver.createsTheLock: creates a temporary, ordered child node. The implementation is straightforward and not expanded here; the main thing to note is the available node creation modes: 1) PERSISTENT (permanent); 2) PERSISTENT_SEQUENTIAL (permanent and ordered); 3) EPHEMERAL (temporary); 4) EPHEMERAL_SEQUENTIAL (temporary and ordered).

internalLockLoop: blocks and waits until the lock is obtained.

Let's look at how internalLockLoop checks whether it holds the lock and how it blocks and waits. Some irrelevant code has been removed here, keeping only the main flow:

// Spin until the lock is obtained
while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock )
{
    // Get the list of all child nodes, sorted by sequence number in ascending order
    List<String> children = getSortedChildren();

    // Use the sequence number to check whether our child node is the smallest one
    String sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash
    PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
    if ( predicateResults.getsTheLock() )
    {
        // Our child node is the smallest, so we hold the lock
        haveTheLock = true;
    }
    else
    {
        // Otherwise, find the child node immediately before ours
        String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();

        // The object monitor is used for thread synchronization: while the lock is not
        // obtained we wait() here; when the previous child node is deleted (i.e. its
        // owner releases the lock), the watcher callback wakes this thread with
        // notifyAll(), and the loop spins again to re-check whether we now hold the lock
        synchronized(this)
        {
            try
            {
                // getData() is used here instead of checkExists() because if the previous
                // child node has already been deleted, getData() throws an exception and
                // does not leave a watch behind. checkExists() would also tell us whether
                // the node exists, but it would register a watch that can never fire,
                // which leaks resources on the zookeeper server
                client.getData().usingWatcher(watcher).forPath(previousSequencePath);

                if ( millisToWait != null )
                {
                    // A blocking timeout was set
                    millisToWait -= (System.currentTimeMillis() - startMillis);
                    startMillis = System.currentTimeMillis();
                    if ( millisToWait <= 0 )
                    {
                        doDelete = true;    // timed out; delete our child node
                        break;
                    }

                    // Wait for the remaining time
                    wait(millisToWait);
                }
                else
                {
                    // Wait indefinitely
                    wait();
                }
            }
            catch ( KeeperException.NoNodeException e )
            {
                // If the previous child node was already deleted when getData() tried to
                // set the watch, a NoNodeException is thrown; just spin once more,
                // no extra handling is needed
            }
        }
    }
}

The detailed logic is explained in the comments above. The event watcher set in the code simply wakes the current thread when the event callback fires, so that it re-runs the spin check; it is quite simple and will not be expanded on further.
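
Conceptually, that watcher amounts to little more than the following (a sketch, not Curator's exact code; the field name watcher matches the snippet above):

private final Watcher watcher = event -> {
    synchronized (this) {
        notifyAll();   // wake up wait()/wait(millisToWait) in internalLockLoop
    }
};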


Summary:

The above covers distributed locks based on Zookeeper. In the next article I will introduce distributed locks based on Redis; if you are interested, stay tuned.
