Our scheduled tasks used to be deployed on a single machine. To remove that single point of failure while still guaranteeing that each task is executed by only one machine, we need a lock, so I took some time to study the question: how do you implement a distributed lock? This article shows how to implement a distributed lock with Redis, with examples. I hope it serves as a useful reference.
The essence of a lock is mutual exclusion: at any moment, at most one client can hold a given lock. The simplest way to build a distributed lock on Redis is to acquire the lock by creating a key in the instance and to release it by deleting that key. A reliable distributed lock, however, has to handle many more details, so let's look at how to write one correctly.
Single-node distributed lock with SETNX
We can implement a simple lock directly on Redis's SETNX (SET if Not eXists) semantics. Since Redis 2.6.12, the SET command accepts the NX and PX options, so the value and its expiry are set in a single command.
Acquiring the lock:
SET resource_name my_random_value NX PX 30000
Releasing the lock (a Lua script run with EVAL, with the lock key passed as KEYS[1] and my_random_value as ARGV[1]):
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
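To try the two pieces above without a running server, here is a minimal in-memory sketch in Python. FakeRedis, acquire, and release are hypothetical names, not a real client library: set_nx_px imitates SET key value NX PX, compare_and_delete imitates the Lua release script, and the caller passes the current time (now_ms) explicitly so that expiry is deterministic. A real client would send the actual command and script to Redis.

```python
import secrets


class FakeRedis:
    """Hypothetical single-node stand-in for Redis, kept in memory (not a real client)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at_ms)

    def _get_live(self, key, now_ms):
        """Returns the stored value, treating expired keys as absent."""
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] <= now_ms:
            del self._data[key]
            return None
        return entry[0]

    def set_nx_px(self, key, value, px, now_ms):
        """Imitates SET key value NX PX px: writes only if the key does not exist."""
        if self._get_live(key, now_ms) is not None:
            return False
        self._data[key] = (value, now_ms + px)
        return True

    def compare_and_delete(self, key, value, now_ms):
        """Imitates the release script: DEL only if GET returns our value."""
        current = self._get_live(key, now_ms)
        if current is not None and current == value:
            del self._data[key]
            return 1
        return 0


def acquire(r, resource, ttl_ms, now_ms):
    """Returns the random token on success, or None if the lock is held."""
    token = secrets.token_hex(16)  # plays the role of my_random_value
    return token if r.set_nx_px(resource, token, ttl_ms, now_ms) else None


def release(r, resource, token, now_ms):
    return r.compare_and_delete(resource, token, now_ms)


r = FakeRedis()
token = acquire(r, "resource_name", 30000, now_ms=0)
print(acquire(r, "resource_name", 30000, now_ms=10))   # None: the lock is held
print(release(r, "resource_name", "someone-else", 20))  # 0: wrong token
print(release(r, "resource_name", token, 20))           # 1: the owner releases
```

The random token is what lets release tell the owner apart from every other client.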
Several details need to be noted:
First, set a timeout when acquiring the lock. Otherwise, if the client crashes or loses its connection because of a network problem while holding the lock, the lock is never released and the system deadlocks. In the command above, PX 30000 makes the key expire after 30,000 ms (30 seconds).
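The NX option makes the existence check and the write a single atomic step, and PX sets the expiry in the same command. Sending SETNX followed by a separate EXPIRE would not be atomic: a client that crashed between the two commands would leave behind a lock that never expires.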
When releasing the lock, the script checks redis.call("get", KEYS[1]) == ARGV[1]: the current value of the lock key KEYS[1] is read from Redis, and ARGV[1] is the my_random_value the client generated when it acquired the lock. This check guarantees that only the holder of a lock can release it. Suppose we skipped it:
Client A acquires the lock, and then its thread stalls for longer than the lock's expiration time.
After the lock expires, client B acquires it.
Client A resumes, finishes its work, and sends DEL to Redis, deleting the lock that now belongs to B.
Client C acquires the lock. Two clients, B and C, now hold the same lock at once.
The root cause is that client A released the lock held by client B.
Releasing the lock must be done in a Lua script so that it is atomic. Release consists of three steps: GET, compare, and DEL. If they do not run atomically, the lock can expire and be acquired by another client between the compare and the DEL, and the DEL then removes that other client's lock. Redis runs a script as a single unit, so no other command can interleave with these steps.
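The client A / B / C timeline can be replayed in a few lines. This is a single-threaded Python simulation with a manual clock; LockStore, unsafe_release, safe_release, and replay are hypothetical names, and the tokens are the fixed strings "A", "B", and "C" for readability instead of random values. Because the simulation is single-threaded, safe_release can check and delete in two Python statements; against a real server that check-then-delete must run inside the Lua script so that no other command can slip in between.

```python
class LockStore:
    """Hypothetical in-memory single-node store with a manual clock (not Redis)."""

    def __init__(self):
        self.data = {}  # key -> (token, expires_at_ms)

    def get(self, key, now):
        entry = self.data.get(key)
        if entry is not None and entry[1] <= now:
            del self.data[key]  # an expired key counts as absent
            entry = None
        return entry[0] if entry is not None else None

    def set_nx_px(self, key, token, px, now):
        if self.get(key, now) is not None:
            return False
        self.data[key] = (token, now + px)
        return True

    def delete(self, key):
        self.data.pop(key, None)


def unsafe_release(store, key, token, now):
    store.delete(key)  # plain DEL: removes the key no matter who owns it


def safe_release(store, key, token, now):
    if store.get(key, now) == token:  # the ownership check done by the Lua script
        store.delete(key)


def replay(release):
    """Replays the A/B/C timeline; returns the clients that believe they hold the lock."""
    store, key, ttl = LockStore(), "job-lock", 1000
    holders = set()
    if store.set_nx_px(key, "A", ttl, now=0):
        holders.add("A")
    # A stalls past the TTL, so the key expires at t=1000 without A noticing.
    if store.set_nx_px(key, "B", ttl, now=1500):
        holders.add("B")
    release(store, key, "A", now=1600)  # A wakes up and releases "its" lock
    holders.discard("A")
    if store.set_nx_px(key, "C", ttl, now=1700):
        holders.add("C")
    return holders


print(sorted(replay(unsafe_release)))  # ['B', 'C']: both believe they hold the lock
print(sorted(replay(safe_release)))    # ['B']
```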
With these details handled, we have a distributed lock on a single Redis node.
This lock still has a single point of failure: the Redis node itself. You might argue that Redis supports master-slave replication and that we can fail over to the slave when the master fails, but Redis replication is asynchronous, so the following can happen:
Client A acquires the lock on the master.
The master crashes before the lock key has been replicated to the slave.
The slave is promoted to master, and client B acquires the same lock from it.
Because of the master's failure, two clients hold the lock at the same time. If your system can tolerate multiple holders for a short period, this simple solution is good enough.
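The failover race can be simulated in a few lines. This hypothetical Python sketch models the master and the slave as plain dictionaries and asynchronous replication as a backlog of writes the master has acknowledged but not yet shipped; failover_scenario is an illustrative name, not part of any Redis API.

```python
def failover_scenario(replicated_before_crash):
    """Returns (A got the lock, B got the lock) for one master crash."""
    master, slave = {}, {}
    backlog = []  # writes acknowledged by the master but not yet sent to the slave

    def set_nx(node, key, value):
        if key in node:
            return False
        node[key] = value
        return True

    # 1. Client A locks on the master; the master replies before replicating.
    a_ok = set_nx(master, "lock", "token-A")
    if a_ok:
        backlog.append(("lock", "token-A"))
    # 2. Replication may or may not catch up before the crash.
    if replicated_before_crash:
        while backlog:
            key, value = backlog.pop(0)
            slave[key] = value
    # 3. The master crashes; anything still in its backlog is lost with it.
    backlog.clear()
    # 4. The slave is promoted, and client B tries the same lock on it.
    b_ok = set_nx(slave, "lock", "token-B")
    return a_ok, b_ok


print(failover_scenario(replicated_before_crash=False))  # (True, True): both hold it
print(failover_scenario(replicated_before_crash=True))   # (True, False)
```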
If your system cannot tolerate even a brief period with multiple holders, the Redis project offers an official solution: Redlock.
Implementing Redlock
To remove the Redis single point of failure, the author of Redis proposed Redlock, a clever and concise scheme.
The core idea of Redlock is to use several Redis masters at the same time for redundancy. These nodes are completely independent and do not replicate data to one another.
Suppose we have n Redis nodes, where n is an odd number greater than 2. Redlock works as follows:
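Get the current time in milliseconds.
Try to acquire the lock on each of the n nodes in turn, using the single-node method above with the same key and the same random value on every node. Each attempt should use a timeout much shorter than the lock validity time, so that an unreachable node does not stall the client.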
If the lock was acquired on at least n/2+1 nodes (a majority, with n/2 rounded down, e.g. 3 of 5) and the total time spent acquiring it is less than the lock validity time, the client holds a valid lock. Its remaining validity is the initial validity time minus the time spent acquiring the lock.
If the lock was acquired on fewer than n/2+1 nodes, or acquiring it took longer than the lock validity time, acquisition has failed, and the client must send a release request to all nodes.
Releasing the lock is simple: send the release operation to every Redis node, regardless of whether the lock was acquired on that node.
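The steps above can be sketched in Python against in-memory nodes. Node, redlock_acquire, and redlock_release are hypothetical names, a down node is simulated with a flag instead of a network timeout, and the sketch leaves out refinements of the full algorithm such as an allowance for clock drift. The clock is injectable so that the timing rule can be tested.

```python
import secrets
import time


class Node:
    """Hypothetical in-memory Redis master; up=False simulates an unreachable node."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}  # key -> (token, expires_at_ms)

    def set_nx_px(self, key, token, px, now_ms):
        if not self.up:
            return False  # a real client would hit its short per-node timeout here
        entry = self.data.get(key)
        if entry is not None and entry[1] > now_ms:
            return False  # someone else holds an unexpired lock
        self.data[key] = (token, now_ms + px)
        return True

    def release(self, key, token):
        entry = self.data.get(key)
        if self.up and entry is not None and entry[0] == token:
            del self.data[key]


def monotonic_ms():
    return int(time.monotonic() * 1000)


def redlock_acquire(nodes, key, ttl_ms, clock=monotonic_ms):
    """Returns (token, remaining_validity_ms) on success, otherwise None."""
    token = secrets.token_hex(16)
    start = clock()  # step 1: current time
    # step 2: try every node with the same key and token
    acquired = sum(node.set_nx_px(key, token, ttl_ms, clock()) for node in nodes)
    validity = ttl_ms - (clock() - start)  # step 3: validity minus time spent
    if acquired >= len(nodes) // 2 + 1 and validity > 0:
        return token, validity
    redlock_release(nodes, key, token)  # step 4: undo any partial acquisition
    return None


def redlock_release(nodes, key, token):
    for node in nodes:  # step 5: release on every node, acquired or not
        node.release(key, token)


nodes = [Node(), Node(), Node(up=False), Node(), Node()]
result = redlock_acquire(nodes, "resource_name", 10000)
print(result is not None)  # True: 4 of 5 nodes is a majority
```

For example, with n = 5 a client needs at least 5 // 2 + 1 = 3 nodes, and if it asked for a 10,000 ms lock but spent 300 ms acquiring it, it can rely on the lock for only about 9,700 ms.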
There are several details to note as well:
After a failed attempt, the client should wait a random delay before retrying instead of a fixed one. Otherwise several clients may retry at the same moment, each win some of the nodes, and none reach a majority. (The probability is low, but not zero.)
If a master node crashes, it should rejoin only after a delay longer than the lock validity time. Otherwise the following can happen:
Suppose there are three Redis nodes: A, B, and C.
Client foo acquires the lock on A and B.
B crashes and loses all of its in-memory data.
B restarts immediately.
Client bar acquires the lock on B and C.
Now two clients, foo and bar, both hold the lock.
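Delaying a crashed node's restart beyond the lock validity time prevents this: every lock that existed before the crash has expired by the time the node rejoins. Alternatively, if performance is not a priority, you can enable Redis persistence (for example AOF with appendfsync always) so that a restarted node still remembers its locks.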
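The random retry delay from the first detail can be sketched as a small Python helper. acquire_with_retry and its parameters are hypothetical; try_acquire stands for any single acquisition attempt (for instance one Redlock round) that returns a lock handle or None, and the sleep and random-number functions are injectable so the helper can be tested without waiting.

```python
import random
import time


def acquire_with_retry(try_acquire, attempts=3, min_delay_ms=50, max_delay_ms=200,
                       sleep=time.sleep, rng=random):
    """Calls try_acquire() up to `attempts` times, pausing a random delay between tries."""
    for attempt in range(attempts):
        result = try_acquire()
        if result:
            return result
        if attempt < attempts - 1:
            # Jittered delay, so competing clients do not retry in lockstep
            delay_ms = rng.uniform(min_delay_ms, max_delay_ms)
            sleep(delay_ms / 1000.0)
    return None


if __name__ == "__main__":
    outcomes = iter([None, None, "lock-token"])
    print(acquire_with_retry(lambda: next(outcomes)))  # prints lock-token after two waits
```

A fixed delay would keep clients that collided once colliding again; the random spread makes it likely that one of them retries first and wins a majority.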
Summary
Having worked through Redis distributed locks, I find that, as with most distributed systems, the principle is simple, but making the system reliable requires attention to many details and corner cases.
The Redlock algorithm implements a distributed lock in a simple, efficient way, and the idea is quite clever.
But is Redlock actually safe? I plan to discuss that question in a follow-up article.