Http://redis.io/topics/distlock
Distributed locks are a useful technique when different processes need to access a shared resource in a mutually exclusive way. There are many third-party libraries and articles describing how to implement a distributed lock manager with Redis, but these libraries differ greatly in how they are implemented, and many of the simpler ones could achieve better reliability with only a slightly more complex design. The purpose of this article is to present a more canonical algorithm for implementing a distributed lock manager with Redis, which we call Redlock, and which we believe is safer and more reliable than the usual approaches. We also hope the community will analyze the algorithm, provide feedback, and use it as a basis for designing more complex and reliable algorithms, or better new ones.
Implementations
Before describing the algorithm itself, here are the projects that already implement it and can be used as references:
- Redlock-rb (Ruby implementation). There is also a fork of Redlock-rb that adds features to make distributed locks easier to work with.
- Redlock-py (Python implementation).
- redlock-php (PHP implementation).
- PHPRedisMutex (a more complete PHP implementation).
- Redsync.go (Go implementation).
- Redisson (Java implementation).
- Redis::DistLock (Perl implementation).
- Redlock-cpp (C++ implementation).
- Redlock-cs (C#/.NET implementation).
- Node-redlock (NodeJS implementation). Includes support for lock extension.
Safety and liveness guarantees
Before describing our design, we would like to state three properties that, in our view, are the minimum needed to use distributed locks effectively.
- Safety property: mutual exclusion. At any given moment, only one client can hold a given lock.
- Liveness property A: deadlock freedom. Eventually it is always possible to acquire a lock, even if the client that held it crashed or a network partition occurred.
- Liveness property B: fault tolerance. As long as the majority of Redis nodes are working properly, clients should be able to acquire and release locks.
Why failover-based implementations are not good enough
To understand what we want to improve, let's look at the current state of most Redis-based distributed lock libraries. The simplest way to implement a distributed lock with Redis is to create a key in a single instance, usually with an expiry (using Redis' key expiration feature) so that every lock is eventually released (see liveness property A above). When a client wants to release the lock, it simply deletes the key. On the surface this works, but there is a problem: the single instance is a single point of failure in our architecture. What happens if the Redis master goes down? One might say: add a slave node and fail over to it when the master goes down! Unfortunately this is not viable: because Redis replication is asynchronous, it cannot guarantee the mutual exclusion safety property. There is an obvious race condition in this scheme, for example:
- Client A acquires the lock on the master.
- The master crashes before the write of the key is replicated to the slave.
- The slave is promoted to master.
- Client B acquires the same lock that A is still holding, because the new master has no record of it. Mutual exclusion is violated.
Of course, in some special circumstances this is not a problem at all: for example, if it is acceptable that multiple clients hold the lock at the same time during an outage, then this replication-based solution is perfectly fine. Otherwise, we recommend the approach described in this article.
Correct implementation with a single instance
Before discussing how to overcome the limits of the single-instance setup, let's check how to do it correctly in this simple case, because this is a viable solution for applications that can tolerate the race condition described above, and because locking into a single instance is the foundation of the distributed algorithm described later. To acquire the lock, run the following command:

SET resource_name my_random_value NX PX 30000

This command sets the key only if it does not already exist (the NX option), with an expiry of 30000 milliseconds (the PX option), and sets its value to "my_random_value". This value must be unique across all clients and all lock requests. The random value is essentially there so that the lock can be released safely: with the Lua script below we tell Redis to remove the key only if it exists and its value is exactly the one we expect.
if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end
This is important because it avoids removing a lock that was created by another client. For example, a client may acquire the lock, get blocked by some operation for longer than the lock validity time, and later try to delete a lock that has already expired and been acquired by another client. Simply using DEL is not safe, since one client could remove another client's lock. With the script above, every lock is "signed" with a random string, so a lock can only be removed by the client that originally set it.
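To make the two halves concrete, here is a minimal sketch of the single-instance acquire/release pattern using the redis-py client (the client library, key name, token length and timeout below are illustrative assumptions, not part of the algorithm itself):

import secrets
import redis

# The unlock script from above: delete the key only if it still holds our value.
UNLOCK_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

r = redis.Redis(host="localhost", port=6379)

def acquire(resource, ttl_ms=30000):
    # Try to take the lock; return the random token on success, None otherwise.
    token = secrets.token_hex(20)                     # the unique random value
    ok = r.set(resource, token, nx=True, px=ttl_ms)   # SET resource token NX PX ttl
    return token if ok else None

def release(resource, token):
    # Release the lock only if we still own it.
    return r.eval(UNLOCK_SCRIPT, 1, resource, token) == 1

token = acquire("resource_name")
if token:
    try:
        pass  # ... do the work protected by the lock ...
    finally:
        release("resource_name", token)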
What should this random string be? I assume it is 20 bytes taken from /dev/urandom, but you can find cheaper ways to make it unique enough for your task. For example, a safe option is to use /dev/urandom to seed an RC4 keystream and generate a pseudo-random string from that. A much simpler solution is the UNIX timestamp in milliseconds combined with the client ID; it is not as safe, but probably sufficient for most environments.
The expiry we set on the key, the "lock validity time", is both the auto-release time of the lock and the time a client has to perform its task before another client may acquire the lock, counted from the moment the lock is acquired. So we now have a good way to acquire and release a lock: in a non-distributed, single-instance, never-failing system, this approach is perfectly safe. Next, let's see what to do in a distributed environment where we cannot rely on those guarantees.
Redlock algorithm
In the distributed version of the algorithm we assume we have N Redis masters. These nodes are totally independent: we do not use replication or any other implicit coordination mechanism. We have already described how to safely acquire and release a lock on a single instance, so we take for granted that the algorithm uses that method on every single node. In our examples we set N = 5, a reasonable value, so we run 5 masters on different computers or virtual machines to ensure that, in most cases, they will not fail at the same time. To acquire the lock, a client performs the following steps:
1. Get the current time in milliseconds.
2. Try to acquire the lock on all N nodes sequentially, using the same key name and random value on every node. During this step the client uses, on each request, a timeout that is small compared to the total lock auto-release time: for example, if the auto-release time is 10 seconds, the per-node timeout could be in the 5-50 millisecond range. This prevents the client from blocking for a long time on a node that is down; if a node is unavailable, we should move on to the next one as soon as possible.
3. Compute how long it took to acquire the lock, by subtracting the timestamp obtained in step 1 from the current time. The lock is considered acquired only if the client obtained it on a majority of the nodes (at least 3 in this example) and the total elapsed time is less than the lock validity time.
4. If the lock was acquired, its effective validity time is the initial validity time minus the time elapsed while acquiring it, as computed in step 3.
5. If the client failed to acquire the lock for any reason (either it could not lock N/2+1 nodes or the elapsed time exceeded the validity time), it releases the lock on all the nodes, including the ones it believes it did not lock.
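The five steps above map fairly directly onto code. The sketch below is a simplified illustration built with the redis-py client; the node addresses, the per-node socket timeout and the clock-drift constant are assumptions chosen for the example (see the implementations listed earlier, such as redlock-py, for production-ready code):

import time
import secrets
import redis

# Illustrative set of N = 5 independent masters; addresses are placeholders.
# socket_timeout keeps each per-node request short (step 2).
NODES = [redis.Redis(host="127.0.0.1", port=p, socket_timeout=0.05)
         for p in (6379, 6380, 6381, 6382, 6383)]
QUORUM = len(NODES) // 2 + 1          # N/2 + 1
CLOCK_DRIFT_FACTOR = 0.01             # assumed drift compensation (plus 2 ms below)

UNLOCK_SCRIPT = ('if redis.call("get", KEYS[1]) == ARGV[1] then '
                 'return redis.call("del", KEYS[1]) else return 0 end')

def lock_instance(node, resource, token, ttl_ms):
    try:
        return bool(node.set(resource, token, nx=True, px=ttl_ms))
    except redis.RedisError:
        return False

def unlock_instance(node, resource, token):
    try:
        node.eval(UNLOCK_SCRIPT, 1, resource, token)
    except redis.RedisError:
        pass  # best effort: an unreachable node will expire the key on its own

def redlock_acquire(resource, ttl_ms=10000):
    token = secrets.token_hex(20)
    start = time.monotonic()                                   # step 1
    acquired = sum(lock_instance(n, resource, token, ttl_ms)   # step 2
                   for n in NODES)
    elapsed_ms = (time.monotonic() - start) * 1000             # step 3
    drift = ttl_ms * CLOCK_DRIFT_FACTOR + 2
    validity = ttl_ms - elapsed_ms - drift                     # step 4
    if acquired >= QUORUM and validity > 0:
        return token, validity
    for n in NODES:                                            # step 5
        unlock_instance(n, resource, token)
    return None, 0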
Is the algorithm asynchronous?
The algorithm relies on the assumption that, while there is no synchronized clock across processes, the local clocks of the different processes advance at roughly the same rate. This assumption is not perfectly accurate, but the error is small compared with the auto-release time of the lock. It is similar to computers in the real world: every computer has its own local clock, but the drift between them is usually very small. At this point we need to state the mutual-exclusion rule more precisely: a client holding a lock is guaranteed exclusivity only if it finishes its work within the lock validity time (as described in step 3 of the algorithm), minus a small delta that compensates for clock drift between processes (usually just a few milliseconds). If you want to learn more about systems based on bounded clock drift, an interesting reference is "Leases: an efficient fault-tolerant mechanism for distributed file cache consistency".
Retrying on failure
When a client fails to acquire the lock, it should retry after a random delay, in order to desynchronize multiple clients that try to acquire the lock for the same resource at the same time (otherwise it is possible that nobody ends up with the lock). Likewise, the faster a client tries to acquire the lock on the majority of Redis nodes, the smaller the window in which several clients can compete and force retries, so ideally the client should send the SET commands to all N nodes at the same time using multiplexing. It is worth stressing that a client that fails to acquire the lock on the majority of nodes must release the locks it did acquire as soon as possible, so that there is no need to wait for the keys to expire before the lock can be acquired again (however, if a network partition prevents the client from reaching the Redis nodes, availability is lost until the keys expire).
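A hedged sketch of such a retry loop, reusing the redlock_acquire() helper from the earlier sketch (the delay bounds and retry count are illustrative):

import random
import time

def acquire_with_retry(resource, ttl_ms=10000, retries=3):
    for _ in range(retries):
        token, validity = redlock_acquire(resource, ttl_ms)
        if token:
            return token, validity
        # A random delay desynchronizes clients competing for the same resource.
        time.sleep(random.uniform(0.05, 0.2))
    return None, 0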
Release lock
Releasing the lock is simple: just release it on all the nodes, regardless of whether the client believes it successfully locked that particular node.
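In code this is just the safe-delete script sent to every node, reusing the NODES list and the unlock_instance() helper from the acquisition sketch:

def redlock_release(resource, token):
    # Ask every node to delete the key, but only if it still holds our token.
    for node in NODES:
        unlock_instance(node, resource, token)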
Proof of security
Is the algorithm safe? Let's examine what happens in different scenarios. To start, let's assume that a client was able to acquire the lock on the majority of nodes, so all those nodes contain a key with the same time to live. Note, however, that the key was set at a slightly different moment on each node, so the keys will also expire at different times. Assume, in the worst case, that the first key was set at time T1 (the moment we contacted the first server) and the last key was set at time T2 (the moment we received the reply from the last server). Starting from T2, we can be sure that the key that expires first will still exist for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT, where TTL is the lock expiry, (T2 - T1) is the time spent acquiring the last lock, and CLOCK_DRIFT compensates for clock differences between processes (and for the (T2 - T1) interval itself). All the other keys expire later, so we can be sure that all the keys exist simultaneously for at least that long.
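For concreteness, a tiny worked example of that bound (all numbers are illustrative):

TTL = 10000        # lock validity time in milliseconds
T1, T2 = 0, 480    # first SET sent at T1, last reply received at T2 (ms)
CLOCK_DRIFT = 102  # assumed drift compensation in ms

MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT
print(MIN_VALIDITY)  # 9418 ms during which every key is guaranteed to still exist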
During the time in which the majority of keys are set, no other client can acquire the lock, since it is impossible to succeed on N/2+1 nodes if N/2+1 keys already exist. So if a lock was acquired, it cannot be acquired again at the same time (that would violate the mutual exclusion property). We also want to make sure that multiple clients trying to acquire the lock at the same time cannot all succeed. If a client took close to, or more than, the maximum lock validity time (the TTL we use in the SET command) to lock the majority of nodes, it considers the lock invalid and releases the nodes it locked, so we only need to consider the case where the majority was locked in less than the validity time. In that case, by the argument above, no other client can re-acquire the lock during MIN_VALIDITY. So multiple clients could hold the lock at the same time only if the time needed to lock the majority exceeded the TTL, and in that case the locks are invalid anyway. We would be very happy if someone could provide a formal proof of the safety of this algorithm, or find a bug.
Liveness arguments
The liveness of the system is based on three main features:
1. The auto release of the lock (keys expire after the timeout), so a lock always becomes available to be acquired again.
2. The fact that clients normally release the lock when it is no longer needed or when the task is completed, so we usually don't have to wait for keys to expire to re-acquire the lock.
3. The fact that when a client needs to retry, it waits a time that is comparatively longer than the time needed to acquire the majority of locks, which makes it less likely that clients competing for the same resource keep colliding and retrying.
However, on a network partition we pay an availability penalty equal to the TTL, so if partitions keep happening, the unavailability persists. This happens every time a client acquires a lock and gets partitioned away from the Redis nodes before it can release it. Basically, if the network keeps partitioning indefinitely, the system becomes unavailable for an indefinite amount of time.
Performance, failure recovery, and Fsync
Many users who use Redis as a lock server need not only low latency to acquire and release locks, but also high throughput, that is, the number of acquire/release operations that can be performed per second. To meet this requirement, the client can use multiplexing to talk with the N servers and reduce latency (or use "poor man's multiplexing": put the sockets in non-blocking mode, send all the commands, then read all the replies, assuming the round-trip time between the client and each Redis node is similar).
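True non-blocking multiplexing needs an event loop, but the latency benefit can be approximated with a thread pool; the sketch below reuses the NODES list and the lock_instance() helper from the earlier Redlock sketch:

from concurrent.futures import ThreadPoolExecutor

def lock_all_nodes_parallel(resource, token, ttl_ms):
    # Send SET ... NX PX to every node at (almost) the same time, so the total
    # latency is close to that of the slowest node instead of the sum of all.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        results = pool.map(lambda n: lock_instance(n, resource, token, ttl_ms), NODES)
    return sum(results)  # number of nodes on which the lock was acquired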
However, if we want the system to recover from node failures automatically, we also need to consider persistence. To see the problem, let's assume we configure Redis with no persistence at all. A client acquires the lock on 3 of the 5 nodes. One of the nodes on which the client acquired the lock is then restarted: at this point there are again 3 nodes on which the lock can be acquired (the restarted one plus two others), so another client can acquire the same lock, violating the mutual exclusion property we stated earlier.
Things are much better if we enable AOF persistence. For example, we can upgrade a server by sending it the SHUTDOWN command and restarting it: because Redis expiration is implemented semantically, time still effectively elapses while the server is off, so all our requirements are preserved. This is fine as long as it is a clean shutdown, but what about an unplanned outage? If Redis is configured, as by default, to fsync to disk roughly once per second, it is possible that after a restart our lock key is missing. In theory, to guarantee lock safety in the face of any kind of restart, we would need to set fsync to always in the persistence settings, but this in turn makes performance far worse than traditional systems used to implement distributed locks at the same level.

However, the problem is not as bad as it looks at first glance. Basically the algorithm remains safe as long as a node that crashes and restarts no longer participates in any lock that was active when it went down, so that the set of locks currently in use is entirely held by nodes other than the one rejoining the system. To guarantee this, we only need to keep an instance that crashed unavailable, after it restarts, for at least the maximum TTL we use; after that time, all the locks that were active when it crashed will have expired and been released automatically. A delayed-restart policy basically achieves safety even without any Redis persistence feature, but note that it can translate into an availability penalty: for example, if a majority of nodes crash, the whole system becomes globally unavailable for the TTL (globally meaning that no lock at all can be acquired during that time).
Extending the lock to make the algorithm more reliable
If the work performed by clients consists of small steps, it is possible to use a smaller lock validity time by default and extend the algorithm with a lock extension mechanism. Basically, if the client notices during its computation that the lock validity time is about to expire, it can extend the lock by sending a Lua script to all the instances that extends the TTL of the key, provided the key still exists and its value is still the random value the client assigned when it acquired the lock. The client should only consider the lock still held if it was able to extend it on the majority of instances within the validity time (basically using the same conditions as when acquiring the lock). However, this does not technically change the algorithm, so the maximum number of lock re-acquisition (extension) attempts should be limited to a reasonable value, otherwise the liveness properties discussed earlier can be violated.
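The extension script is symmetric to the release script: it bumps the expiry only if the key still holds the caller's random value. A sketch reusing the NODES, QUORUM and redis import from the earlier Redlock sketch (the helper name and quorum check are assumptions):

EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""

def redlock_extend(resource, token, ttl_ms):
    # Count how many nodes agreed to extend; treat the extension as valid
    # only if a majority of the nodes accepted it.
    extended = 0
    for node in NODES:
        try:
            extended += node.eval(EXTEND_SCRIPT, 1, resource, token, ttl_ms)
        except redis.RedisError:
            pass
    return extended >= QUORUM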