Implementing Distributed Locks Based on Redis

Source: Internet
Author: User


Background
Many Internet products have scenarios that require locking, such as flash sales (seckill), generating globally incrementing IDs, and assigning floor (reply sequence) numbers. Most solutions are built on a database. Redis runs as a single process with a single thread and queues incoming commands, so concurrent requests are executed serially, and multiple clients do not compete with each other for Redis connections. In addition, Redis provides commands such as SETNX and GETSET that make it straightforward to implement a distributed lock.

Redis Command Introduction
Implementing a distributed lock with Redis relies on two key commands, SETNX and GETSET, together with the basic GET and DEL commands.

SETNX command (SET if Not eXists)
Syntax:
SETNX key value
Function:
If and only if the key does not exist, set the value of the key to value and return 1. If the given key already exists, SETNX does not take any action and returns 0.

GETSET command
Syntax:
GETSET key value
Function:
Set the value of the given key to value and return the old value of the key. If the key exists but does not hold a string value, an error is returned. If the key does not exist, nil is returned.

GET command
Syntax:
GET key
Function:
Returns the string value associated with the key. If the key does not exist, the special value nil is returned.

DEL command
Syntax:
DEL key [key ...]
Function:
Delete one or more given keys. Keys that do not exist are ignored.

The command set is small but sufficient: a distributed lock can be built from these four commands alone. The implementation, however, involves many details that must be considered carefully, because when many processes run concurrently across machines, any mistake can cause a deadlock that blocks every process.
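To make the behavior of these four commands concrete, the sketch below models them in memory. It is only an illustration under stated assumptions: FakeRedis is a hypothetical class whose method names (setnx, getset, get, delete) mirror the redis-py client, it runs in a single process without networking or concurrency, and it ignores the wrong-type error GETSET can return. True/False stand in for the 1/0 replies and None stands in for nil.

```python
from typing import Dict, Optional


class FakeRedis:
    """Hypothetical single-process stand-in for SETNX, GETSET, GET and DEL.

    Method names mirror the redis-py client; values are plain strings.
    """

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setnx(self, key: str, value: str) -> bool:
        # SETNX: set only if the key is absent; True ~ reply 1, False ~ reply 0.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def getset(self, key: str, value: str) -> Optional[str]:
        # GETSET: store the new value and return the old one (None ~ nil).
        old = self.data.get(key)
        self.data[key] = value
        return old

    def get(self, key: str) -> Optional[str]:
        # GET: return the value, or None (nil) if the key does not exist.
        return self.data.get(key)

    def delete(self, *keys: str) -> int:
        # DEL: remove the given keys, ignore missing ones, return the count removed.
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


r = FakeRedis()
print(r.setnx("foo.lock", "100"), r.setnx("foo.lock", "200"))  # True False
print(r.getset("foo.lock", "300"), r.get("foo.lock"))          # 100 300
print(r.delete("foo.lock", "missing"), r.get("foo.lock"))      # 1 None
```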

Lock implementation

SETNX can be used directly to take a lock. For example, to lock the keyword foo, a client can try
SETNX foo.lock <current unix time>

If it returns 1, the client has obtained the lock and can proceed with its work. When it is done, it runs
DEL foo.lock

to release the lock.
If it returns 0, foo has already been locked by another client. If the call is non-blocking, the client can simply return. If the call is blocking, it enters a retry loop until it obtains the lock or the retry times out. That is the ideal; reality is harsher. Locking with SETNX alone is subject to a race condition, and in certain situations it can lead to deadlock.
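A minimal sketch of the SETNX-only lock just described, with both the non-blocking and the blocking (retry until timeout) variants. The function names, the retry interval, and the in-memory FakeRedis stand-in are hypothetical; a redis-py client could be passed as r instead, since it offers setnx and delete. The lock deliberately has no expiry, which is exactly the weakness discussed next.

```python
import time
from typing import Dict


class FakeRedis:
    """Hypothetical in-memory stand-in (redis-py-style setnx/delete)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setnx(self, key: str, value: str) -> bool:
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, *keys: str) -> int:
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


def acquire_naive(r, key: str, blocking: bool = False,
                  timeout: float = 1.0, retry_interval: float = 0.01) -> bool:
    """SETNX-only lock. No expiry: if the holder dies, the key stays forever."""
    deadline = time.monotonic() + timeout
    while True:
        # Store the current Unix time, as in "SETNX foo.lock <current unix time>".
        if r.setnx(key, str(int(time.time()))):
            return True                      # reply 1: lock acquired
        if not blocking or time.monotonic() >= deadline:
            return False                     # reply 0: give up (non-blocking or timed out)
        time.sleep(retry_interval)           # blocking mode: wait, then retry


def release_naive(r, key: str) -> None:
    r.delete(key)                            # DEL foo.lock


r = FakeRedis()
print(acquire_naive(r, "foo.lock"))                               # True
print(acquire_naive(r, "foo.lock", blocking=True, timeout=0.05))  # False
release_naive(r, "foo.lock")
print(acquire_naive(r, "foo.lock"))                               # True
```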

Handle deadlocks

With the approach above, if the client holding the lock runs for too long, is killed, or crashes for some other reason before it can release the lock, a deadlock occurs. The lock therefore needs a timeout check. When locking, we store the current timestamp as the value of the lock; other clients compare that stored timestamp with the current time, and if the difference exceeds the timeout, the lock is treated as expired. This prevents a lock from being held indefinitely. However, under high concurrency, if several clients detect the expired lock at the same time and each one simply deletes it and then locks again with SETNX, a race condition occurs in which multiple clients obtain the lock at once.

C1 acquires the lock and crashes. C2 and C3 each call SETNX, get 0, and read the timestamp stored in foo.lock. Comparing it with the current time, both conclude that the lock has timed out.
C2 sends DEL to foo.lock.
C2 sends SETNX to foo.lock and obtains the lock.
C3 sends DEL to foo.lock. This DEL actually deletes the lock that C2 has just taken.
C3 sends SETNX to foo.lock and obtains the lock.
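The interleaving above can be replayed deterministically in a single thread to see why DEL followed by SETNX is unsafe. The timestamps (C1 locks at 0, C2 and C3 check at 100), the 10-unit timeout, and the in-memory FakeRedis stand-in are all hypothetical.

```python
from typing import Dict, List, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in (redis-py-style setnx/get/delete)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setnx(self, key: str, value: str) -> bool:
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, *keys: str) -> int:
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


def is_expired(value: Optional[str], now: int, expire: int) -> bool:
    return value is not None and now > int(value) + expire


def replay_del_then_setnx(now: int = 100, expire: int = 10) -> List[str]:
    r = FakeRedis()
    r.setnx("foo.lock", "0")                 # C1 locks at t=0, then crashes
    r.setnx("foo.lock", str(now))            # C2: SETNX returns 0 (False)
    r.setnx("foo.lock", str(now))            # C3: SETNX returns 0 (False)
    c2_seen = r.get("foo.lock")              # both read the same stale "0"
    c3_seen = r.get("foo.lock")
    holders: List[str] = []
    if is_expired(c2_seen, now, expire):
        r.delete("foo.lock")                 # C2 deletes the stale lock
        if r.setnx("foo.lock", str(now)):
            holders.append("C2")             # C2 now holds the lock
    if is_expired(c3_seen, now, expire):     # C3 acts on its old read...
        r.delete("foo.lock")                 # ...and deletes C2's lock
        if r.setnx("foo.lock", str(now)):
            holders.append("C3")
    return holders


print(replay_del_then_setnx())  # ['C2', 'C3']: two clients believe they hold the lock
```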

In this case both C2 and C3 obtain the lock, which is a race condition; with higher concurrency, even more clients may obtain it. Therefore, a timed-out lock cannot simply be removed with DEL. Fortunately, we have GETSET. Suppose there is another client, C4; let's see how GETSET avoids this situation.

C1 acquires the lock and crashes. C2 and C3 call SETNX, get 0, and then call GET to read the timestamp T1 stored in foo.lock; comparing it with the current time shows that the lock has timed out.
C4, which has likewise read T1 and found the lock timed out, sends GETSET to foo.lock:
GETSET foo.lock <current unix time>
and receives the old timestamp T2 stored in foo.lock.

If T1 == T2, C4 has obtained the lock.
If T1 != T2, another client, C5, has already called GETSET and taken the lock first. C4 has not obtained the lock; it can only sleep and enter the next iteration of the loop.
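Putting the SETNX, GET, and GETSET steps together gives a retry loop like the following. This is a sketch, not the author's code: it uses millisecond timestamps, the parameter names (expire_ms, timeout_ms, sleep_ms) and the FakeRedis stand-in are hypothetical, and it already applies the two nil rules argued for later in this article (a nil from GET means go back to SETNX; a nil from GETSET means the lock was acquired).

```python
import time
from typing import Callable, Dict, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in (redis-py-style setnx/get/getset)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setnx(self, key: str, value: str) -> bool:
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def getset(self, key: str, value: str) -> Optional[str]:
        old = self.data.get(key)
        self.data[key] = value
        return old


def now_ms() -> int:
    return time.time_ns() // 1_000_000


def acquire(r, key: str, expire_ms: int = 1000, timeout_ms: int = 3000,
            sleep_ms: int = 10, clock: Callable[[], int] = now_ms) -> Optional[int]:
    """Return the timestamp we wrote (our token) on success, None on timeout."""
    deadline = clock() + timeout_ms
    while clock() < deadline:
        t0 = clock()
        if r.setnx(key, str(t0)):
            return t0                          # key was free: lock acquired
        t1 = r.get(key)                        # timestamp of the current holder
        if t1 is None:
            continue                           # released in between: retry SETNX
        if t0 > int(t1) + expire_ms:           # holder's lock has timed out
            t2 = r.getset(key, str(t0))        # atomically swap in our timestamp
            if t2 is None or int(t2) == int(t1):
                return t0                      # we were first to replace T1
            # T2 != T1: another client swapped first, so we did not get the lock
        time.sleep(sleep_ms / 1000)            # back off before the next round
    return None


r = FakeRedis()
print(acquire(r, "foo.lock") is not None)       # True
print(acquire(r, "foo.lock", timeout_ms=50))    # None: held and not yet expired
r.data["foo.lock"] = str(now_ms() - 5000)       # simulate a crashed holder
print(acquire(r, "foo.lock") is not None)       # True: stale lock taken over
```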

The only remaining question is whether C4 writing its own new timestamp into foo.lock affects the lock. In practice, C4 and C5 execute GETSET at almost the same moment, so the timestamp C4 writes differs only slightly from the valid one written by C5; it has no real effect on the lock.
To make the lock more robust, the client holding the lock should call GET again before performing critical operations, read the current value T1, and compare it with the timestamp T0 it wrote. This detects the case where the lock was unexpectedly released by a DEL from another client. The steps and details above are easy to find in other references.

In reality, however, client processes fail in complicated ways. A client may not simply crash: it may block on some operation for a long time and then execute DEL after the lock has already passed to another client. Improper handling can also cause deadlocks, and unreasonable sleep settings can overwhelm Redis under high concurrency. The most common problems are:

What should happen when GET returns nil?

The first approach follows the timeout logic.
C1 acquires the lock, finishes its work, and is about to DEL the lock. Just before that DEL, C2 calls SETNX with timestamp T0 on foo.lock, finds that another client holds the lock, and moves on to GET.
C2 sends GET to foo.lock, which has been deleted by now, and receives T1 (nil).
C2 evaluates T0 > T1 + expire, which passes with a nil T1, and enters the GETSET branch.
C2 calls GETSET to write T0 into foo.lock and receives the previous value of foo.lock, T2.
If T2 == T1, C2 has obtained the lock; if T2 != T1, it has not.

The second approach loops back to the SETNX logic.
C1 acquires the lock, finishes its work, and is about to DEL the lock. Just before that DEL, C2 calls SETNX with timestamp T0 on foo.lock, finds that another client holds the lock, and moves on to GET.
C2 sends GET to foo.lock and receives T1 (nil).
C2 loops and goes back to SETNX.

Both approaches appear to work, but the first one is logically flawed. When GET returns nil, the lock has been deleted rather than timed out, so the client should lock through the normal SETNX path. The first approach instead takes the GETSET path for a lock that was simply released, which can lead to deadlock; I ran into this myself during implementation. The next problem shows how.
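The conclusion can be captured in a small decision function that a client runs after SETNX returns 0 and GET returns T1. The function name, its string results, and the millisecond values are hypothetical; it shows only the branching, not the Redis calls themselves.

```python
from typing import Optional


def next_step_after_get(t1: Optional[str], now_ms: int, expire_ms: int) -> str:
    """What a client should do after SETNX returned 0 and GET returned t1."""
    if t1 is None:
        # The key is gone: the holder released it with DEL, it did not time out.
        # Take the normal locking path again instead of GETSET.
        return "SETNX"
    if now_ms > int(t1) + expire_ms:
        return "GETSET"   # the holder's timestamp is stale: try to take over
    return "SLEEP"        # the lock is held and still valid: back off, then retry


print(next_step_after_get(None, 5_000, 1_000))    # SETNX
print(next_step_after_get("3000", 5_000, 1_000))  # GETSET
print(next_step_after_get("4500", 5_000, 1_000))  # SLEEP
```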

What should happen when GETSET returns nil?

C1 and C2 both call GET, and C1 receives T1. Meanwhile C3, which has a faster network connection, quickly acquires the lock and releases it with DEL, so C2's GET returns T2 (nil). Both C1 and C2 enter the timeout-handling logic.
C1 sends GETSET to foo.lock and receives T11 (nil).
C1 compares T1 with T11, finds them different, and concludes that it did not obtain the lock.
C2 sends GETSET to foo.lock and receives T22 (the timestamp written by C1).
C2 compares T2 with T22, finds them different, and concludes that it did not obtain the lock.

At this point both C1 and C2 believe they failed to obtain the lock. In fact C1 did obtain it, but its logic did not consider GETSET returning nil; it only compared the GET and GETSET values. Why does this happen? First, when many clients are connected to Redis, the commands issued by one client are not executed back to back: between two commands that look consecutive from that client's point of view, many commands from other clients, such as DEL and SETNX, may have been executed. Second, the clocks of different clients are not synchronized, or not strictly synchronized.
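One way to encode the missing case is to treat a nil from GETSET as success: nil means the key did not exist when GETSET ran, so that GETSET created the lock and no other client holds it. The helper below is hypothetical, not the author's code, and the timestamps in the replay are made up.

```python
from typing import Optional


def won_lock_after_getset(t1: Optional[str], t2: Optional[str]) -> bool:
    """t1: value this client read with GET; t2: old value returned by its GETSET."""
    if t2 is None:
        # The key did not exist when GETSET ran, so this GETSET created it:
        # no other client holds the lock, and this client now owns it.
        return True
    # Otherwise the client wins only if nobody changed the value since its GET.
    return t1 is not None and int(t2) == int(t1)


# Replaying the scenario above with hypothetical timestamps:
print(won_lock_after_getset("100", None))  # C1: True (comparing T1 with T11 alone said False)
print(won_lock_after_getset(None, "250"))  # C2: False (it received C1's timestamp)
```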

Timestamp Problems

The value of foo.lock is a timestamp, so for the lock to be valid across multiple clients, the clocks of all servers must be synchronized. If the servers' clocks differ, clients with skewed clocks may misjudge whether the lock has timed out, which leads to race conditions.
Whether a lock has timed out depends strictly on the timestamp, and the timestamp itself has limited precision. If the precision is one second, locking, executing the operation, and unlocking usually all complete within a single second, so the cases described above are easy to hit. It is therefore best to raise the precision to milliseconds, which keeps the lock safe at millisecond granularity.
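The effect of precision can be shown with plain integer arithmetic. The instants below are hypothetical millisecond values; truncating them to seconds makes a lock held for 0.2 s look a full second old, and a lock held for almost a second look brand new. The helper current_ms is one way (an assumption, not the author's code) to produce millisecond timestamps in Python.

```python
import time


def current_ms() -> int:
    """Current Unix time in whole milliseconds."""
    return time.time_ns() // 1_000_000


def held_for(lock_ts: int, now_ts: int) -> int:
    """Apparent lock age, in whatever unit the timestamps use."""
    return now_ts - lock_ts


# Hypothetical instants in milliseconds (kept small for readability).
locked_at, checked_at = 10_900, 11_100                   # really held for 0.2 s
print(held_for(locked_at // 1000, checked_at // 1000))   # 1: looks like a full second
print(held_for(locked_at, checked_at))                   # 200: the true age in ms

locked_at, checked_at = 10_000, 10_999                   # really held for almost 1 s
print(held_for(locked_at // 1000, checked_at // 1000))   # 0: looks brand new
```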

Notes on Distributed Locks

1: A timeout mechanism is required. If the client holding the lock crashes, the lock must expire; otherwise no other client can ever obtain it, resulting in deadlock.
2: In a distributed setting, the timestamps of multiple clients cannot be guaranteed to be strictly consistent, so under certain conditions two clients may end up holding the lock at once. A reasonable mechanism is needed to tolerate such low-probability events.
3: Lock only the critical section. It is good practice to prepare related resources in advance: for example, connect to the database first, then acquire the lock, perform the operation immediately, and release the lock, minimizing the time the lock is held.
4: Should the lock be checked while it is held? If correctness depends strictly on the lock state, it is best to check the lock at critical steps. However, in our tests under high concurrency each check took several milliseconds, while our entire locked section took less than 10 milliseconds, so we chose not to check the lock.
5: Sleep tuning. To reduce load on Redis, a client should sleep between retries when trying to acquire the lock. Choosing the sleep time takes care; calculate it from your Redis QPS and the lock holding time.
6: As for why Redis mechanisms such as MULTI, EXPIRE, and WATCH were not used, see the references for the reasons.
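For the lock check in point 4, and the GET-and-compare step recommended earlier, a holder can compare the stored value with the timestamp it wrote before acting or releasing. This is a hedged sketch with hypothetical names and an in-memory stand-in. GET and DEL are separate commands, so a small window remains in which the lock can change hands; closing it would require WATCH/MULTI or a server-side script, which this article deliberately does not use. Note also that if a losing client's GETSET overwrote the holder's timestamp (the small-difference case discussed earlier), the check fails and the lock is simply left to expire.

```python
from typing import Dict, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in (redis-py-style setnx/get/delete)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setnx(self, key: str, value: str) -> bool:
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, *keys: str) -> int:
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


def still_own(r, key: str, token: int) -> bool:
    """Does the lock still hold the timestamp (token) this client wrote?"""
    current = r.get(key)
    return current is not None and int(current) == token


def release_if_owner(r, key: str, token: int) -> bool:
    """DEL the lock only if we still own it (GET then DEL: not atomic)."""
    if still_own(r, key, token):
        r.delete(key)
        return True
    return False  # another client owns it now: leave its lock alone


r = FakeRedis()
r.setnx("foo.lock", "1000")                    # we locked with token 1000
print(still_own(r, "foo.lock", 1000))          # True
r.data["foo.lock"] = "2500"                    # lock expired and was taken over
print(release_if_owner(r, "foo.lock", 1000))   # False: the new holder's lock survives
print(r.get("foo.lock"))                       # 2500
```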

Lock Test Data

Without Sleep
First, no sleep is used between lock retries. We measured the lock, execution, and unlock times of a single request.


Locking and unlocking are both very fast. We then stress-tested the endpoint with ab, using a concurrency of 100 and 1,000 requests in total:

ab -n 1000 -c 100 'http://sandbox6.wanke.etao.com/test/test_sequence.php?tbpm=t'


The time to obtain the lock grows, the execution time while holding the lock also grows, and deleting the lock now takes nearly 10 ms. Why?
1: After acquiring the lock, our execution logic makes further Redis calls, and under high concurrency Redis responds noticeably more slowly.
2: Deleting the lock takes longer, rising from 0.2 ms to 9.8 ms, nearly 50 times slower.
Under these conditions the stress test reached 49 QPS. We also found that QPS depends on the total number of requests: with a concurrency of 100 and only 100 requests in total, QPS exceeded 110. Next, we tried the sleep mechanism.

With Sleep

For a single request, performance is comparable to the case without sleep. We then ran the stress test again under the same conditions.

Lock acquisition takes noticeably longer, while lock release is noticeably faster, taking only about half the time measured without sleep. Execution time also increases, because we re-create the database connection during execution. We also compared the command load on Redis in both cases.

In the Redis load chart, the middle and upper sections show the load without sleep, and the lower section shows the load with sleep; the load drops by about 50%. Sleep has its own drawback, of course: QPS drops noticeably, to only 35 under our test conditions, and some requests timed out. Weighing everything, we decided to use sleep, mainly to keep Redis from being overwhelmed under high concurrency, which we have experienced before.

References

http://www.worlduc.com/FileSystem/18/2518/590664/9f63555e6079482f831c8ab1dcb8c19c.pdf
http://redis.io/commands/setnx
http://www.blogjava.net/caojianhua/archive/2013/01/28/394847.html

