Distributed locks are a very useful primitive in many environments where different processes must operate on shared resources in a mutually exclusive way. A number of libraries and blog posts describe how to implement a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach, and many use a simple approach with weaker guarantees than a slightly more complex design can provide.
This article attempts to provide a more canonical algorithm for implementing distributed locks with Redis. We propose an algorithm called Redlock, which implements a DLM that we believe is safer than the vanilla single-instance approach. We hope the community will analyze it, provide feedback, and use it as a starting point for implementations or for more complex or alternative designs.
Implementations
Before describing the algorithm, here are some existing implementations for reference.
- Redlock-rb (Ruby implementation).
- redlock-php (PHP implementation).
- Redsync.go (Go implementation).
- Redisson (Java implementation).
Safety and liveness guarantees
We model our design with just three properties that, in our view, are the minimum guarantees needed to use distributed locks effectively:
1. Safety property: mutual exclusion. At any given moment, only one client can hold a lock.
2. Liveness property A: deadlock free. It is always eventually possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
3. Liveness property B: fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.
Why failover-based implementations are not enough
To understand what we want to improve, let us first analyze the current state of most Redis-based distributed lock libraries.
The simplest way to use Redis to lock a resource is to create a key in an instance. The key is usually created with a limited time to live, using the Redis expiry feature, so that it is eventually released (liveness property A in our list). When the client needs to release the resource, it deletes the key.
Superficially this works well, but there is a problem: it is a single point of failure in the architecture. What happens if the Redis master goes down? We could add a replica and switch to it when the master is unavailable. Unfortunately this is not viable: because Redis replication is asynchronous, we cannot use it to implement the mutual exclusion safety property.
This model has an obvious race condition:
- Client A acquires the lock on the master.
- The master crashes before the write to the key is transmitted to the replica.
- The replica is promoted to master.
- Client B acquires the lock on the same resource for which A already holds a lock. Safety violation!
Sometimes it is perfectly fine that, under special circumstances such as a failure, multiple clients can hold the lock at the same time. If that is acceptable for you, you can use the replication-based solution. Otherwise, we recommend the method described in this article.
Correct implementation with a single instance
Before trying to overcome the limitation of the single-instance setup described above, let us check how to do it correctly in this simple case. It is a viable solution for applications where an occasional race condition is acceptable, and locking on a single instance is the foundation of the distributed algorithm described here.
To acquire the lock, use the following command:
SET resource_name my_random_value NX PX 30000
This command sets the key only if it does not already exist (NX option), with an expiry of 30000 milliseconds (PX option). The key is set to the value "my_random_value". This value must be unique across all clients and all lock requests.
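The acquire step can be sketched in Python as follows. This is only an illustration under stated assumptions: `SingleInstance` is a hypothetical in-memory stand-in for one Redis instance (not a real client), and `set_nx_px` and `acquire` are names invented here. With the redis-py client, the equivalent call would be along the lines of `r.set(name, value, nx=True, px=30000)`.

```python
import secrets
import time


class SingleInstance:
    """Hypothetical in-memory stand-in for one Redis instance (not a real client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry time in seconds)

    def set_nx_px(self, key, value, ttl_ms):
        """Mimic `SET key value NX PX ttl_ms`: write only if the key is absent or expired."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # the key still exists, so NX refuses the write
        self._data[key] = (value, now + ttl_ms / 1000.0)
        return True


def acquire(instance, resource, ttl_ms=30000):
    """Return a fresh random token if the lock was taken, otherwise None."""
    token = secrets.token_hex(20)  # unique per client and per lock request
    return token if instance.set_nx_px(resource, token, ttl_ms) else None
```

The caller must keep the returned token, because it is needed later to release the lock safely.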
The random value is used to release the lock safely, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect. This is accomplished by the following Lua script:
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
This is important to avoid removing a lock that was created by another client. For example, a client may acquire the lock, get blocked in some operation for longer than the lock validity time (the time at which the key expires), and later remove the lock, which by then has already been acquired by another client. Using just DEL is not safe, because a client may remove another client's lock. With the script above, every lock is "signed" with a random string, so it is removed only if it is still the one set by the client trying to remove it.
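The hazard can be reproduced with a small simulation. The sketch below uses a hypothetical in-memory `SingleInstance` class (not a Redis client) in which expiry is triggered by hand. Its `compare_and_del` method mirrors the logic of the Lua script, while `plain_del` mirrors a bare DEL.

```python
class SingleInstance:
    """Hypothetical in-memory stand-in for one Redis instance."""

    def __init__(self):
        self.data = {}  # key -> value; expiry is triggered manually below

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def expire_now(self, key):
        self.data.pop(key, None)

    def plain_del(self, key):
        return 1 if self.data.pop(key, None) is not None else 0

    def compare_and_del(self, key, expected):
        # Same logic as the Lua script: delete only if the value still matches.
        if self.data.get(key) == expected:
            del self.data[key]
            return 1
        return 0


def scenario(release):
    inst = SingleInstance()
    inst.set_nx("resource_name", "token-A")  # client A acquires the lock
    inst.expire_now("resource_name")         # A stalls past the validity time
    inst.set_nx("resource_name", "token-B")  # client B acquires the lock
    release(inst)                            # A finally releases
    return inst.data.get("resource_name")    # who holds the lock now?


unsafe = scenario(lambda i: i.plain_del("resource_name"))
safe = scenario(lambda i: i.compare_and_del("resource_name", "token-A"))
print(unsafe, safe)  # None token-B
```

With the plain delete, client B's lock disappears. With the compare-and-delete, it survives.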
What should this random string be? We use 20 bytes read from /dev/urandom, but you can find cheaper ways to make it unique enough for your task. For example, a safe choice is to seed RC4 with /dev/urandom and generate a pseudo-random stream from it. A simpler solution is to combine a UNIX timestamp with a client ID; it is not as safe, but it is sufficient in many environments.
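As a rough illustration, both options could look like this in Python. The function names are made up for this sketch, `os.urandom` draws from the operating system's random source, and the microsecond resolution of the timestamp is an assumption of the example rather than a requirement.

```python
import os
import time


def random_token():
    """20 random bytes from the OS random source, hex-encoded (40 characters)."""
    return os.urandom(20).hex()


def cheap_token(client_id):
    """Weaker alternative: UNIX time in microseconds joined with a client ID."""
    micros = time.time_ns() // 1000
    return f"{micros}:{client_id}"
```

The second variant is only unique if client IDs are unique and a single client never issues two lock requests within the same microsecond.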
The time to live we set on the key is called the "lock validity time". It is both the auto-release time and the time the client has to perform its operation before another client may be able to acquire the lock again, without technically violating the mutual exclusion guarantee, which is limited to a given window of time starting from the moment the lock is acquired.
We now have a good way to acquire and release the lock. For a non-distributed system made of a single instance that is always available, this method is safe. Let us extend the concept to a distributed system, where we have no such guarantee.
Redlock algorithm
In the distributed version of the algorithm, we assume there are N Redis masters. These nodes are completely independent, so we do not use replication or any other implicit coordination mechanism. We have already described how to safely acquire and release a lock on a single instance, and we take for granted that the algorithm uses this method on each instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different physical or virtual machines to ensure that they fail as independently as possible.
To acquire the lock, the client performs the following steps:
1. It gets the current time in milliseconds.
2. It tries to acquire the lock on all N instances sequentially, using the same key name and the same random value on every instance. When setting the lock on each instance, the client uses a timeout that is small compared to the lock's auto-release time. For example, if the auto-release time is 10 seconds, the timeout could be in the range of 5 to 50 milliseconds. This prevents the client from staying blocked for a long time on a Redis node that is down: if an instance is unavailable, the client should move on to the next instance as soon as possible.
3. It computes how much time elapsed while acquiring the lock by subtracting the timestamp obtained in step 1 from the current time. If and only if the client acquired the lock on the majority of instances (at least 3) and the total elapsed time is less than the lock validity time, the lock is considered acquired.
4. If the lock was acquired, its validity time is the initial validity time minus the time elapsed, as computed in step 3.
5. If the client failed to acquire the lock (either it could not lock N/2+1 instances or the validity time is negative), it tries to unlock all instances, even those it believes it was not able to lock.
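The steps above can be sketched in Python against in-memory stand-ins. This is a simplified model, not a reference implementation. `Instance` is a hypothetical class, an unreachable node raises `ConnectionError` immediately instead of being detected through a real per-instance timeout, and all instances share the client's clock, which is monotonic here by choice of this sketch.

```python
import secrets
import time


class Instance:
    """Hypothetical in-memory stand-in for one independent Redis master."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}  # key -> (value, expiry in ms)

    def set_nx_px(self, key, value, ttl_ms, now):
        if not self.up:
            raise ConnectionError("instance unreachable")
        held = self.data.get(key)
        if held and held[1] > now:
            return False
        self.data[key] = (value, now + ttl_ms)
        return True

    def release(self, key, value):
        # Compare-and-delete, as in the single-instance script.
        if self.up and self.data.get(key, (None,))[0] == value:
            del self.data[key]


def now_ms():
    return time.monotonic() * 1000.0


def redlock_acquire(instances, resource, ttl_ms, clock=now_ms):
    token = secrets.token_hex(20)
    quorum = len(instances) // 2 + 1
    start = clock()                                   # step 1
    locked = 0
    for inst in instances:                            # step 2
        try:
            if inst.set_nx_px(resource, token, ttl_ms, clock()):
                locked += 1
        except ConnectionError:
            continue                                  # move on to the next instance
    elapsed = clock() - start                         # step 3
    validity = ttl_ms - elapsed                       # step 4
    if locked >= quorum and validity > 0:
        return token, validity
    for inst in instances:                            # step 5: unlock everywhere
        inst.release(resource, token)
    return None
```

For example, with five instances of which two are down, `redlock_acquire(nodes, "res", 10000)` returns a token and the remaining validity in milliseconds. With three instances down it returns `None` and leaves no keys behind.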
Is the algorithm asynchronous?
The algorithm relies on the assumption that, although there is no synchronized clock across processes, the local time in every process flows at approximately the same rate, with an error that is small compared to the auto-release time of the lock. This assumption closely resembles real-world computers: every computer has a local clock, and we can usually rely on different computers having a small clock drift.
At this point we need to state the mutual exclusion rule more precisely: it is guaranteed only as long as the client holding the lock finishes its work within the lock validity time (as obtained in step 3), minus some time (a few milliseconds to compensate for clock drift between processes).
For more information about similar systems that require a bounded clock drift, this paper is a good reference: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.
Retry on failure
When a client cannot acquire the lock, it should try again after a random delay, in order to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (which may result in a split-brain condition where nobody wins). Also, the faster a client tries to acquire the lock on the majority of instances, the smaller the window for a split-brain condition (and the need for a retry), so ideally the client should send the SET commands to the N instances at the same time using multiplexing.
It is worth stressing how important it is for clients that fail to acquire the majority of locks to release the (partially) acquired locks as soon as possible, so that there is no need to wait for key expiry before the lock can be acquired again. However, if a network partition occurs and the client can no longer communicate with the Redis instances, there is an availability penalty to pay while waiting for the keys to expire.
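A retry loop with a random delay might look like the sketch below. The names and default values are illustrative assumptions. `try_acquire` stands for any callable that makes one full acquisition attempt, releases partially acquired locks itself on failure, and returns `None` when it does not obtain the lock.

```python
import random
import time


def acquire_with_retry(try_acquire, retries=3, max_delay_ms=200,
                       sleep=time.sleep, rng=random.random):
    """Call try_acquire up to `retries` times, sleeping a random delay between attempts."""
    for attempt in range(retries):
        lock = try_acquire()
        if lock is not None:
            return lock
        if attempt < retries - 1:
            # A random delay desynchronizes clients competing for the same resource.
            sleep(rng() * max_delay_ms / 1000.0)
    return None
```

The `sleep` and `rng` parameters exist only so that the delay can be replaced in tests. In normal use the defaults apply.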
Releasing the lock
Releasing the lock is simple: the client releases the lock on all instances, whether or not it believes it was able to successfully lock a given instance.
Safety arguments
Is the algorithm safe? We can try to understand what happens in different scenarios. To start, assume a client is able to acquire the lock on the majority of instances. All those instances contain a key with the same time to live. However, the keys were set at different times, so they also expire at different times. If the first key was set at worst at time T1 (the time sampled before contacting the first server) and the last key was set at worst at time T2 (the time the reply from the last server was received), we can be sure that the first key to expire will exist for at least min_validity = TTL - (T2 - T1) - clock_drift. All the other keys expire later, so we are sure that the keys are simultaneously set for at least this time.
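As a worked example of this bound, the short function below evaluates the formula. The numbers are arbitrary and chosen only for illustration; in particular, the 102 ms drift allowance is an assumption of this example, not a value prescribed by the algorithm.

```python
def min_validity_ms(ttl_ms, t1_ms, t2_ms, clock_drift_ms):
    """min_validity = TTL - (T2 - T1) - clock_drift, all in milliseconds."""
    return ttl_ms - (t2_ms - t1_ms) - clock_drift_ms


# TTL of 10 s, 120 ms spent contacting the instances, 102 ms drift allowance.
print(min_validity_ms(10000, 1000, 1120, 102))  # 9778
```

A result of zero or less means the keys cannot be assumed to coexist at all, so the lock must be treated as not acquired.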
During the time in which the majority of keys are set, another client cannot acquire the lock, since N/2+1 SET NX operations cannot succeed if N/2+1 keys already exist. So once a lock has been acquired, it cannot be acquired again at the same time (which would violate the mutual exclusion property).
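The counting argument rests on the fact that any two sets of N/2+1 instances share at least one instance. For small N this can be checked by brute force, as in the sketch below (the function name is made up for the example).

```python
from itertools import combinations


def quorums_overlap(n, size):
    """True if every pair of `size`-element subsets of n instances intersects."""
    nodes = range(n)
    return all(set(a) & set(b)
               for a in combinations(nodes, size)
               for b in combinations(nodes, size))


print(quorums_overlap(5, 3))  # True: any two majorities of 5 share an instance
print(quorums_overlap(5, 2))  # False: {0, 1} and {2, 3} are disjoint
```

Because two majorities always intersect, a second client would have to succeed with SET NX on at least one instance where the key already exists, which cannot happen while the key is alive.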
However, we also want to make sure that multiple clients trying to acquire the lock at the same time cannot succeed simultaneously.
If a client locked the majority of instances in a time close to, or greater than, the maximum validity time of the lock (essentially the TTL used for SET), it considers the lock invalid and unlocks the instances. So we only need to consider the case where a client locked the majority of instances in less than the validity time. In that case, by the argument above, no client should be able to re-acquire the lock for min_validity. Multiple clients can therefore lock N/2+1 instances at the same time (with "time" being the end of step 2) only if the time taken to lock the majority exceeded the TTL, which makes the lock invalid.
Can you provide a formal proof of safety, point to similar existing algorithms, or find a bug? We would greatly appreciate it.
Liveness arguments
The liveness of the system is based on three main features:
- The automatic release of the lock (keys expire): eventually every key becomes available to be locked again.
- Clients usually cooperate by removing the lock when it was not acquired, or when it was acquired and the work is finished, so we are unlikely to have to wait for keys to expire before re-acquiring the lock.
- When a client needs to retry, it waits for a time that is considerably longer than the time needed to acquire the majority of locks, which makes a split-brain condition during resource contention probabilistically unlikely.
However, we pay an availability penalty equal to the TTL on network partitions, so if partitions are continuous, we can pay this penalty indefinitely. This happens every time a client acquires a lock and is partitioned away before it can remove the lock.
Basically, if network partitions continue indefinitely, the system may remain unavailable indefinitely.
Performance, crash recovery, and fsync
Many users who use Redis as a lock server need high performance, both in terms of the latency to acquire and release a lock and of the number of acquire/release operations that can be performed per second. To meet this requirement, the strategy for talking to the N Redis servers with reduced latency is multiplexing (or poor man's multiplexing: put the socket in non-blocking mode, send all the commands, and read all the replies later, assuming that the round-trip time between the client and each instance is similar).
However, there is another consideration regarding persistence if we want to target a crash-recovery system model.
To see the problem, assume we configure Redis with no persistence at all. A client acquires the lock on 3 of 5 instances. One of the instances on which the client acquired the lock is restarted. At this point there are again 3 instances that can be locked for the same resource, so another client can lock it, violating the exclusivity of the lock.
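This scenario can be reproduced with a small simulation. The `Instance` class and the `try_lock` helper below are hypothetical in-memory stand-ins. Key expiry is not modeled, because everything is assumed to happen within the TTL.

```python
class Instance:
    """Hypothetical in-memory stand-in for a Redis master with no persistence."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def restart_without_persistence(self):
        self.data.clear()  # every key is lost on restart


def try_lock(reachable, total, key, token):
    """Lock the reachable instances; succeed if a majority of all instances was locked."""
    quorum = total // 2 + 1
    locked = sum(1 for inst in reachable if inst.set_nx(key, token))
    return locked >= quorum


nodes = [Instance() for _ in range(5)]
a_holds = try_lock(nodes[:3], 5, "res", "token-A")  # A locks instances 0, 1, 2
nodes[2].restart_without_persistence()              # instance 2 forgets A's key
b_holds = try_lock(nodes[2:], 5, "res", "token-B")  # B locks instances 2, 3, 4
print(a_holds, b_holds)  # True True
```

Both clients believe they hold the lock at the same time, which is exactly the safety violation described above.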
If we enable AOF persistence, things improve considerably. For example, we can upgrade a server by sending SHUTDOWN and restarting it. Because Redis expires are implemented semantically so that time virtually still elapses while the server is off, all our requirements are met. However, everything is fine only as long as the shutdown is clean. What about a power outage? If Redis is configured, as it is by default, to fsync to disk every second, it is possible that our key is missing after a restart. In theory, if we want to guarantee lock safety in the face of any kind of instance restart, we need to enable fsync=always in the persistence settings (the appendfsync always directive). This in turn reduces performance to the level of the CP systems traditionally used to implement distributed locks safely.
However, things are better than they look at first glance. The safety of the algorithm is retained as long as an instance that restarts after a crash no longer participates in any currently active lock, so that the locks active at the time of the restart were all obtained by locking instances other than the one rejoining the system.
To guarantee this, we only need to make an instance unavailable after a crash for a bit longer than the maximum TTL we use, that is, the time needed for all the keys of the locks that existed when the instance crashed to become invalid and be automatically released.
Using delayed restarts, it is basically possible to achieve safety even without any Redis persistence, but note that this may translate into an availability penalty. For example, if a majority of instances crash, the system becomes globally unavailable for the TTL (here "globally" means that no resource at all can be locked during this time).
Making the algorithm more reliable: extending the lock
If the work performed by clients consists of small steps, it is possible to use a shorter lock validity time by default and to extend the algorithm with a lock extension mechanism. If the client is in the middle of its computation when the lock validity approaches a low value, it may extend the lock by sending a Lua script to all instances that extends the TTL of the key if the key exists and its value is still the random value the client assigned when the lock was acquired.
The client should consider the lock re-acquired only if it was able to extend the lock on the majority of instances within the validity time (the algorithm is very similar to the one used when acquiring the lock).
However, this does not technically change the algorithm, so the maximum number of lock re-acquisition attempts should be limited; otherwise one of the liveness properties is violated.
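A possible shape for the extension step is sketched below, again against hypothetical in-memory instances. It models the conditional TTL extension, the majority rule, and a cap on the number of extensions. The elapsed-time check used during acquisition is left out for brevity, and `max_extensions=3` is an arbitrary value chosen for the example.

```python
class Instance:
    """Hypothetical in-memory stand-in for one Redis master."""

    def __init__(self):
        self.data = {}  # key -> [value, expiry in ms]

    def extend(self, key, value, ttl_ms, now_ms):
        """Extend the TTL only if the key exists, is unexpired, and holds our value."""
        held = self.data.get(key)
        if held and held[0] == value and held[1] > now_ms:
            held[1] = now_ms + ttl_ms
            return True
        return False


def extend_lock(instances, key, token, ttl_ms, now_ms,
                extensions_used, max_extensions=3):
    """Return True if the lock was extended on a majority of instances."""
    if extensions_used >= max_extensions:
        return False  # cap re-acquisitions so that other clients are not starved
    quorum = len(instances) // 2 + 1
    extended = sum(1 for inst in instances
                   if inst.extend(key, token, ttl_ms, now_ms))
    return extended >= quorum
```

The caller is expected to track `extensions_used` itself and to stop working on the protected resource as soon as `extend_lock` returns `False`.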