Implementing distributed locks based on Redis

Background
Many Internet applications have scenarios that require locking, such as flash sales, global auto-increment IDs, and floor numbering for forum posts. Most solutions are built on a database. Redis runs as a single process with a single thread and uses a queue to turn concurrent access into serial access, so connections from multiple clients do not compete with one another inside Redis. In addition, Redis provides commands such as SETNX and GETSET that make it convenient to implement a distributed locking mechanism.

Redis Command Introduction
To implement a distributed lock with Redis, four commands need to be introduced first.

SETNX command (SET if Not eXists)
Syntax:
SETNX key value
Function:
If and only if key does not exist, sets the value of key to value and returns 1. If the given key already exists, SETNX does nothing and returns 0.

GETSET command
Syntax:
GETSET key value
Function:
Sets the value of the given key to value and returns the old value of key. Returns an error if key exists but is not a string type, and returns nil if key does not exist.

GET command
Syntax:
GET key
Function:
Returns the string value associated with key, or the special value nil if key does not exist.

DEL command
Syntax:
DEL key [key ...]
Function:
Deletes the given key or keys. Keys that do not exist are ignored.
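
The return values described above can be checked against a small in-memory model. The sketch below is a hypothetical stand-in written for illustration: the class name FakeRedis and its dict storage are assumptions, not part of Redis or of any client library. Its method names mirror the setnx, getset, get, and delete calls found in common Python clients, and Python's None plays the role of nil.

```python
class FakeRedis:
    """Hypothetical in-memory model of the four commands (string keys only)."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        # SETNX: set only if the key does not exist; 1 on success, 0 otherwise.
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def getset(self, key, value):
        # GETSET: write the new value and return the old one (None models nil).
        old = self._data.get(key)
        self._data[key] = value
        return old

    def get(self, key):
        # GET: the stored value, or None if the key does not exist.
        return self._data.get(key)

    def delete(self, *keys):
        # DEL: remove the given keys, ignore missing ones, count the removals.
        removed = 0
        for key in keys:
            if key in self._data:
                del self._data[key]
                removed += 1
        return removed


r = FakeRedis()
first = r.setnx("foo.lock", "100")     # key absent, so the value is set
second = r.setnx("foo.lock", "200")    # key present, so nothing changes
old = r.getset("foo.lock", "300")      # hands back the previous value
current = r.get("foo.lock")
removed = r.delete("foo.lock", "missing.key")
after = r.get("foo.lock")              # the key is gone again
```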

We do not need many tools: the distributed lock relies on just these four commands. The concrete implementation, however, has many details that need careful thought, because with many concurrent processes in a distributed system a mistake at any point can lead to a deadlock that blocks every process.

Implementing the lock

SETNX can be used directly to acquire a lock. For example, to lock the keyword foo, a client can try
SETNX foo.lock <current Unix time>

If 1 is returned, the client has acquired the lock and can proceed. After the operation is completed, it releases the lock with
DEL foo.lock

If 0 is returned, foo is already locked by another client. If the lock is non-blocking, the call can simply return. If it is a blocking call, the client enters a retry loop until it acquires the lock or the retry times out. The ideal is beautiful, but reality is cruel: a lock that uses only SETNX is subject to race conditions, and in some specific cases it causes a deadlock.
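
The non-blocking and blocking variants described above might look like the following sketch. It assumes a client object that exposes setnx and delete; a tiny in-memory stand-in (FakeRedis, hypothetical) is used so the example runs without a server. The function names and the timeout and retry_interval parameters are illustrative choices, not from the original. This version still has the deadlock problem: if the holder never calls unlock, nobody else can lock.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only setnx and delete."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def try_lock(client, name):
    """Non-blocking: a single SETNX attempt, True if the lock was acquired."""
    return client.setnx(name + ".lock", str(int(time.time()))) == 1


def lock(client, name, timeout=1.0, retry_interval=0.01):
    """Blocking: retry SETNX until it succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if try_lock(client, name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(retry_interval)


def unlock(client, name):
    """Release the lock with DEL."""
    client.delete(name + ".lock")


r = FakeRedis()
got_first = try_lock(r, "foo")               # the lock is free
got_second = try_lock(r, "foo")              # already held, returns at once
timed_out = lock(r, "foo", timeout=0.05)     # gives up after the timeout
unlock(r, "foo")
got_after_unlock = lock(r, "foo", timeout=0.05)
```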

Handling deadlocks

In the process above, if the client holding the lock takes too long, is killed, or crashes for some other reason before it can release the lock, the result is a deadlock. The lock therefore needs an expiry check. That is why we store the current timestamp as the value when locking: by comparing the current time with the timestamp stored in Redis, a client can treat the lock as expired once the difference exceeds a threshold, which prevents the lock from being held indefinitely. Under high concurrency, however, if several clients detect the expired lock at the same time, crudely delete it, and then lock again with SETNX, a race condition arises in which multiple clients acquire the lock simultaneously.

C1 acquires the lock and crashes. C2 and C3 call SETNX, which returns 0. They read the timestamp in foo.lock, compare it with the current time, and discover that the lock has timed out.
C2 sends DEL foo.lock.
C2 sends SETNX foo.lock and acquires the lock.
C3 sends DEL foo.lock. The lock that C3 deletes at this point is actually C2's lock.
C3 sends SETNX foo.lock and acquires the lock.

At this point both C2 and C3 hold the lock, which is a race condition, and under higher concurrency even more clients may acquire it. DEL therefore cannot be used directly when the lock has timed out. Fortunately we have GETSET. Suppose there is now another client, C4, and let us see how GETSET avoids this situation.
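
The interleaving above can be replayed step by step. The sketch below uses a hypothetical in-memory stand-in (FakeRedis), made-up timestamps, and an assumed EXPIRE of 10 seconds. It only demonstrates the order in which the commands reach Redis, not real concurrency.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in exposing setnx, get, and delete."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


EXPIRE = 10          # assumed lock lifetime in seconds
KEY = "foo.lock"

r = FakeRedis()
r.setnx(KEY, 100)    # C1 locks at t=100 and then crashes

now = 200            # much later, C2 and C3 both try to lock
c2_setnx = r.setnx(KEY, now)                   # 0: the lock is held
c3_setnx = r.setnx(KEY, now)                   # 0: the lock is held
c2_sees_timeout = now > r.get(KEY) + EXPIRE    # both read C1's old timestamp
c3_sees_timeout = now > r.get(KEY) + EXPIRE

r.delete(KEY)                      # C2 deletes the expired lock
c2_acquired = r.setnx(KEY, now)    # C2 locks
r.delete(KEY)                      # C3 deletes, but this is C2's lock
c3_acquired = r.setnx(KEY, now)    # C3 locks as well

both_hold_lock = c2_acquired == 1 and c3_acquired == 1
```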

C1 acquires the lock and crashes. C2 and C3 call SETNX, which returns 0. They call GET to read the foo.lock timestamp T1, compare it with the current time, and discover that the lock has timed out.
C4, which has likewise read T1, sends the GETSET command to foo.lock,
GETSET foo.lock <current Unix time>
and gets back the old timestamp T2 of foo.lock.

If T1 = T2, C4 has acquired the lock.
If T1 != T2, another client C5 called GETSET before C4 and took the lock, so C4 did not acquire it. C4 can only sleep and enter the next loop iteration.
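
The GETSET step can be replayed in the same way, again with a hypothetical in-memory stand-in and made-up timestamps. C4 and C5 are modelled as two sequential calls to an illustrative helper, take_over_expired_lock, which is not from the original. Only the client whose GETSET returns the value it previously read with GET wins.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in exposing setnx, get, and getset."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old


def take_over_expired_lock(client, key, t1, now):
    """Try to claim a lock whose timestamp t1 was judged expired.

    Returns True only if GETSET hands back the same timestamp that was
    read earlier, meaning no other client replaced it in between.
    """
    t2 = client.getset(key, now)
    return t2 == t1


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)        # C1 locks at t=100 and crashes

t1_c5 = r.get(KEY)       # C5 reads the expired timestamp
t1_c4 = r.get(KEY)       # C4 reads the same timestamp

c5_won = take_over_expired_lock(r, KEY, t1_c5, now=200)   # C5's GETSET arrives first
c4_won = take_over_expired_lock(r, KEY, t1_c4, now=201)   # C4 gets C5's value back
final_value = r.get(KEY)  # C4's write has overwritten C5's timestamp
```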

The only remaining question is whether the new timestamp that C4 writes to foo.lock affects the lock. C4 and C5 execute within a very short time of each other, and both values written to foo.lock are valid timestamps that differ only by a small margin, so the lock is not affected.
To make the lock more robust, the client that has acquired the lock should call GET again before critical business steps and compare the returned value with the timestamp T0 that it wrote, so that it notices if the lock has been unexpectedly deleted with DEL for some other reason. The steps and situations above are easy to find in other resources. Client-side processing and failures are complicated, though, and crashes are not the only problem. A client may be blocked in some operation for a long time and then try the DEL command when the lock is already in another client's hands. Improper handling may also cause deadlocks, and an unreasonable sleep setting may crush Redis under high concurrency. The most common problems are the following.

What logic should be used when GET returns nil?

The first approach follows the timeout logic
C1 acquires the lock, finishes processing, and deletes the lock with DEL. Just before that DEL, C2 tries to set timestamp T0 on foo.lock with SETNX, finds that another client holds the lock, and moves on to the GET step.
C2 sends a GET command to foo.lock and gets the return value T1 (nil).
C2 evaluates the comparison T0 > T1 + expire and enters the GETSET flow.
C2 calls GETSET to write timestamp T0 to foo.lock, which returns the original value T2 of foo.lock.
If T2 = T1, C2 has acquired the lock. If T2 != T1, it has not.

The second approach follows the SETNX logic
C1 acquires the lock, finishes processing, and deletes the lock with DEL. Just before that DEL, C2 tries to set timestamp T0 on foo.lock with SETNX, finds that another client holds the lock, and moves on to the GET step.
C2 sends a GET command to foo.lock and gets the return value T1 (nil).
C2 loops and enters the SETNX logic again.

Both approaches seem to work, but logically the first one has a problem. When GET returns nil, the lock has been deleted rather than timed out, so the client should go through the SETNX logic to lock. The problem with the first approach is that normal locking should go through SETNX, yet when the lock has just been released it goes through GETSET instead. If the comparison is not handled properly, this causes a deadlock. Sadly, I ran into exactly this, as the next problem shows.
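
One way to write the second approach is to treat a nil from GET as "lock released" and go straight back to SETNX. The sketch below is an illustration under assumptions, not the author's code: FakeRedis is a hypothetical in-memory stand-in whose before_get hook simulates C1's DEL landing between C2's SETNX and GET, and the name acquire_once and the EXPIRE value are made up for the example.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in. `before_get` lets the example
    inject another client's command just before the next GET."""

    def __init__(self):
        self._data = {}
        self.before_get = None

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        if self.before_get is not None:
            hook, self.before_get = self.before_get, None
            hook()
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


EXPIRE = 10  # assumed lock lifetime in seconds


def acquire_once(client, key, now):
    """One pass of the lock attempt; True if the lock was acquired."""
    if client.setnx(key, now) == 1:
        return True
    t1 = client.get(key)
    if t1 is None:
        # The holder released the lock between our SETNX and GET.
        # This is a release, not a timeout, so lock through SETNX again.
        return client.setnx(key, now) == 1
    if now > t1 + EXPIRE:
        # Genuinely expired: take over with GETSET. A nil result means
        # the key was deleted in between, so our write created the lock.
        t2 = client.getset(key, now)
        return t2 is None or t2 == t1
    return False


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)                       # C1 holds the lock
r.before_get = lambda: r.delete(KEY)    # C1's DEL lands just before C2's GET
c2_acquired = acquire_once(r, KEY, now=105)
lock_value = r.get(KEY)
```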

What should be done when GETSET returns nil?

C1 and C2 call GET, and C1 gets T1. At this point C3, which has a better network connection, quickly acquires the lock and then deletes it with DEL. C2's GET then returns T2 (nil). C1 and C2 both enter the timeout handling logic.
C1 sends the GETSET command to foo.lock and gets the return value T11 (nil).
C1 compares T1 with T11, finds that they differ, and concludes that it did not acquire the lock.
C2 sends the GETSET command to foo.lock and gets the return value T22 (the timestamp written by C1).
C2 compares T2 with T22, finds that they differ, and concludes that it did not acquire the lock.

At this point both C1 and C2 believe that they did not acquire the lock, when in fact C1 already holds it: its logic does not consider the case where GETSET returns nil and simply compares the GET and GETSET values. Why does this happen? First, with multiple clients, the commands that reach Redis are not ordered per client. Between two commands that look consecutive from a single client's point of view, the Redis server may have executed many commands from other clients, such as DEL and SETNX. Second, the clocks of the clients are not synchronized, or not synchronized in a strict sense.
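
The scenario can be replayed to compare the plain comparison with a nil-aware one. This is a sketch under assumptions: FakeRedis is a hypothetical in-memory stand-in, the timestamps are made up, C3's acquisition is modelled as a GETSET, and the two check functions are illustrative names. A nil from GETSET means the key did not exist, so the caller's own write created the lock.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for the four commands."""

    def __init__(self):
        self._data = {}

    def setnx(self, key, value):
        if key in self._data:
            return 0
        self._data[key] = value
        return 1

    def get(self, key):
        return self._data.get(key)

    def getset(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        return old

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def naive_check(t_get, t_getset):
    # Compares only the GET and GETSET values, as in the failing logic.
    return t_getset == t_get


def nil_aware_check(t_get, t_getset):
    # GETSET returning nil: the key was absent, so this write holds the lock.
    return t_getset is None or t_getset == t_get


KEY = "foo.lock"
r = FakeRedis()
r.setnx(KEY, 100)            # an old, expired lock

t1 = r.get(KEY)              # C1 reads T1 = 100
r.getset(KEY, 200)           # C3 acquires the lock (modelled as GETSET) ...
r.delete(KEY)                # ... finishes, and deletes it
t2 = r.get(KEY)              # C2 reads T2 = nil

t11 = r.getset(KEY, 201)     # C1's GETSET returns nil
t22 = r.getset(KEY, 202)     # C2's GETSET returns C1's timestamp

c1_naive = naive_check(t1, t11)        # C1 wrongly thinks it has no lock
c2_naive = naive_check(t2, t22)
c1_fixed = nil_aware_check(t1, t11)    # C1 recognises that it holds the lock
c2_fixed = nil_aware_check(t2, t22)
```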

Issues with timestamps

The value of foo.lock is a timestamp, so with multiple clients the lock is only reliable if the clocks of all servers are synchronized. If the clocks differ, clients will judge the lock timeout inconsistently, which results in race conditions.
Whether the lock has timed out depends strictly on the timestamp, and the timestamp itself has limited precision. If our precision is one second, locking, executing the operation, and unlocking can usually all complete within the same second, so the cases described above occur easily. It is therefore better to raise the precision to milliseconds, which makes the lock much safer.
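
A small illustration of the precision point, using two made-up instants half a second apart. With second precision both writes carry the same value, so a comparison such as T1 = T2 cannot tell them apart; with millisecond precision they differ. The helper names are illustrative.

```python
import time


def seconds_stamp(t):
    """Timestamp truncated to whole seconds."""
    return int(t)


def millis_stamp(t):
    """Timestamp truncated to whole milliseconds."""
    return int(t * 1000)


# Two made-up instants 500 ms apart, e.g. two lock writes within one second.
t_a = 1700000000.25
t_b = 1700000000.75

same_in_seconds = seconds_stamp(t_a) == seconds_stamp(t_b)
same_in_millis = millis_stamp(t_a) == millis_stamp(t_b)
gap_ms = millis_stamp(t_b) - millis_stamp(t_a)

# In real use, the value written to the lock would come from the clock:
now_ms = millis_stamp(time.time())
```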

Problems with distributed locks

1: A timeout mechanism is necessary. Once the client holding the lock crashes, there must be an expiry mechanism. Otherwise no other client can acquire the lock, which causes a deadlock.
2: In a distributed lock, the timestamps of multiple clients cannot be guaranteed to be strictly consistent, so under certain conditions the lock may get mixed up between clients. The mechanism must tolerate such low-probability events.
3: Lock only the critical processing step. A good habit is to prepare the related resources first, such as the database connection, then acquire the lock, perform the operation directly, and release the lock, so that the lock is held for as short a time as possible.
4: Whether to check the lock while holding it. If the logic depends strictly on the state of the lock, it is best to check the lock at the critical steps. In our tests, however, each lock check under high concurrency costs a few milliseconds, while our whole locked section takes less than 10 milliseconds, so we chose not to do the lock check.
5: Choosing the sleep time. To reduce the load on Redis, the retry loop for acquiring the lock must sleep between attempts. How long to sleep takes some thought: it needs to be calculated from the QPS of your Redis and the lock processing time.
6: As for why the Redis MULTI, EXPIRE, and WATCH mechanisms are not used, you can consult the references to find the reason.
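
As a rough illustration of item 5, the sketch below estimates a lower bound for the sleep interval. The model is an assumption made for this example, not the author's calculation: every waiting client issues a fixed number of Redis commands per retry, and all retries together should stay within a command rate budget that you reserve for the lock. It ignores command latency and the lock processing time, so treat the result as a starting point for measurement.

```python
def min_sleep_seconds(waiting_clients, commands_per_retry, command_budget_per_second):
    """Smallest sleep per retry that keeps lock polling within the budget.

    Polling rate = waiting_clients * commands_per_retry / sleep, so
    sleep >= waiting_clients * commands_per_retry / budget.
    """
    if command_budget_per_second <= 0:
        raise ValueError("budget must be positive")
    return waiting_clients * commands_per_retry / command_budget_per_second


# Made-up numbers: 100 waiting clients, 2 commands per retry (SETNX + GET),
# and a budget of 10,000 lock-related commands per second.
sleep_s = min_sleep_seconds(100, 2, 10_000)
sleep_ms = sleep_s * 1000
```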

Lock test data

Without sleep
First, the lock retries without sleep. For a single request, we measured the time to lock, execute, and unlock.


Locking and unlocking are both fast. We then ran a load test with

ab -n 1000 -c 100 'http://sandbox6.wanke.etao.com/test/test_sequence.php?tbpm=t'
that is, ab with a concurrency of 100 and a total of 1000 requests.


We find that the time to acquire the lock becomes longer, the execution time while holding the lock becomes longer, and deleting the lock takes nearly 10 ms. Why is this?
1: After acquiring the lock, our execution logic calls Redis again, and Redis is noticeably slower under high concurrency.
2: The lock deletion time grows from the previous 0.2 ms to 9.8 ms, so performance drops by a factor of nearly 50.
In this case we measured a QPS of 49. We eventually found that the QPS is related to the total number of requests in the load test: with a concurrency of 100 and a total of 100 requests, the QPS exceeds 110.

With sleep

Single request

For a single request, performance is comparable to the run without the sleep mechanism. We then ran the load test under the same conditions.

The time to acquire the lock is significantly longer, while the lock release time is significantly shorter, only half of what it is without the sleep mechanism. The execution time becomes longer because we re-create the database connection during execution. We can also compare the command load on Redis.

In the load chart, the tall, sharp peak in the middle is the load without the sleep mechanism, and the low, flat part is the load with the sleep mechanism. The load is reduced by about 50%. The disadvantage of sleeping is that the QPS drops noticeably: under our load test conditions it was only 35, and some requests timed out. Weighing everything, we still decided to use the sleep mechanism, mainly to prevent Redis from being crushed under high concurrency. Unfortunately we have run into that before, so we will definitely use the sleep mechanism.

Resources

http://www.worlduc.com/FileSystem/18/2518/590664/9f63555e6079482f831c8ab1dcb8c19c.pdf
http://redis.io/commands/setnx
http://www.blogjava.net/caojianhua/archive/2013/01/28/394847.html
