At present, most large-scale web sites and applications are deployed in a distributed fashion, and data consistency in distributed scenarios has always been an important topic. The CAP theorem tells us that no distributed system can satisfy consistency, availability, and partition tolerance at the same time; at most two of the three can be guaranteed. So many systems have to make a trade-off at the beginning of their design. In the vast majority of Internet scenarios, strong consistency is sacrificed in exchange for high availability: the system usually only needs to guarantee "eventual consistency", as long as the time it takes to get there is acceptable to users.
In many scenarios, guaranteeing eventual consistency of data requires a lot of supporting technology, such as distributed transactions and distributed locks. Sometimes we need to make sure that a method can only be executed by one thread at a time. In a stand-alone environment, Java provides many concurrency-related APIs for this, but those APIs are powerless in distributed scenarios. In other words, the plain Java API cannot provide distributed locking, which is why there are so many distributed lock implementation schemes today.
For implementing distributed locks, the following schemes are commonly used today:
- Implement a distributed lock based on a database
- Implement a distributed lock based on a cache (Redis, memcached, Tair)
- Implement a distributed lock based on ZooKeeper
Before analyzing these implementations, let's first think about what the distributed lock we need should look like. (Here we use a method lock as the example; locking a resource works the same way.)
- In a distributed application cluster, it guarantees that the same method can be executed by only one thread on one machine at any given time.
- The lock should be reentrant (to avoid deadlocks).
- The lock should preferably be a blocking lock (decide whether you need this according to your business).
- Acquiring and releasing the lock should be highly available.
- Acquiring and releasing the lock should perform well.
Implementing distributed locks based on a database
Based on a database table
The simplest way to implement a distributed lock is probably to create a lock table directly and then operate on the data in that table. When we want to lock a method or resource, we add a record to the table; when we want to release the lock, we delete that record.
Create such a database table:
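The original table definition isn't reproduced here, so below is a minimal sketch of what such a table might look like (assuming MySQL with InnoDB; the table name, column names, and sizes are illustrative):

```sql
-- Illustrative lock table; names and sizes are assumptions, not the original DDL.
CREATE TABLE method_lock (
  id BIGINT NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  method_name VARCHAR(64) NOT NULL COMMENT 'name of the locked method',
  description VARCHAR(255) DEFAULT NULL COMMENT 'remark',
  update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    ON UPDATE CURRENT_TIMESTAMP COMMENT 'record update time',
  PRIMARY KEY (id),
  UNIQUE KEY uidx_method_name (method_name)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COMMENT = 'methods currently locked';
```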
When we want to lock a method, execute the following SQL:
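Something along these lines, where the values are placeholders:

```sql
INSERT INTO method_lock (method_name, description)
VALUES ('someMethodName', 'lock for someMethodName');
```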
Because there is a unique constraint on method_name, if multiple requests are submitted to the database at the same time, the database guarantees that only one insert operation succeeds. We can then consider the thread whose operation succeeded to have obtained the method's lock, and it may execute the method body.
When the method has finished executing and you want to release the lock, execute the following SQL:
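Again a sketch with placeholder values:

```sql
DELETE FROM method_lock WHERE method_name = 'someMethodName';
```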
This simple implementation has the following problems:
1. The lock is strongly dependent on the availability of the database. The database is a single point of failure; once it goes down, the business system becomes unavailable.
2. The lock has no expiration time. If the unlock operation fails, the lock record stays in the database forever, and no other thread can acquire the lock again.
3. The lock can only be non-blocking, because a failed insert reports an error immediately. A thread that fails to acquire the lock does not enter any queue; to acquire it again, it must trigger the lock operation once more.
4. The lock is non-reentrant: the same thread cannot acquire the lock again before releasing it, because the record already exists in the table.
Of course, there are ways to solve these problems.
- Database a single point? Deploy two databases with two-way data synchronization, and switch quickly to the standby once the primary goes down.
- No expiration time? Run a scheduled task that periodically cleans up timed-out lock records in the database.
- Non-blocking? Use a while loop that keeps retrying until the insert succeeds, and only then return success.
- Non-reentrant? Add a field to the table recording the host and thread information of the current lock holder. On the next acquisition attempt, query the database first; if the current host and thread information matches the record, simply grant the lock to that thread. (A combined sketch follows this list.)
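Putting the last two bullets together, a rough Java/JDBC sketch might look like the following. It assumes an extra owner column has been added to the method_lock table sketched above; all identifiers are illustrative and error handling is minimal:

```java
import java.net.InetAddress;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative sketch of a blocking, reentrant database lock.
// Assumes a method_lock table with a unique key on method_name
// plus an added 'owner' column; not production-ready.
public class DbLockSketch {
    private final Connection conn;

    public DbLockSketch(Connection conn) {
        this.conn = conn;
    }

    // Identify the current holder by host + thread, as the bullet above suggests.
    private String owner() throws Exception {
        return InetAddress.getLocalHost().getHostName()
                + "/" + Thread.currentThread().getName();
    }

    public void lock(String methodName) throws Exception {
        String owner = owner();
        while (true) {
            // Reentrancy: if this host/thread already holds the lock, return at once.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT owner FROM method_lock WHERE method_name = ?")) {
                ps.setString(1, methodName);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next() && owner.equals(rs.getString(1))) {
                        return;
                    }
                }
            }
            // Blocking behavior: retry the insert until the unique key lets us in.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO method_lock (method_name, owner) VALUES (?, ?)")) {
                ps.setString(1, methodName);
                ps.setString(2, owner);
                ps.executeUpdate();
                return; // insert succeeded: we now hold the lock
            } catch (SQLException duplicateKey) {
                Thread.sleep(100); // someone else holds it; retry shortly
            }
        }
    }

    public void unlock(String methodName) throws Exception {
        // Only delete the record if we are actually the owner.
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM method_lock WHERE method_name = ? AND owner = ?")) {
            ps.setString(1, methodName);
            ps.setString(2, owner());
            ps.executeUpdate();
        }
    }
}
```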
Based on database exclusive locks
Besides adding and removing records in the table, we can actually also implement a distributed lock with the help of the locks that come with the database.
We reuse the database table we just created. Distributed locks can be implemented through the database's exclusive locks. With the MySQL InnoDB engine, the lock operation can be implemented as follows: append for update to the query statement, and the database will add an exclusive lock to the matched record during the query. (A few extra words here: InnoDB uses a row-level lock only when the retrieval goes through an index; otherwise it uses a table-level lock. Since we want a row-level lock here, method_name needs an index, and notably this index must be created as a unique index, otherwise there will be problems where multiple overloaded methods cannot be accessed at the same time. For overloaded methods, it is recommended to include the parameter types in the name as well.) Once a record holds an exclusive lock, no other thread can add an exclusive lock to that row.
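Concretely, the locking query might look like this (placeholder value again):

```sql
SELECT id FROM method_lock WHERE method_name = 'someMethodName' FOR UPDATE;
```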
We can consider the thread that obtains the exclusive lock to have obtained the distributed lock. Once the lock is held, the method's business logic can be executed. After the method finishes, the lock is released by calling connection.commit().
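A minimal sketch of this approach, assuming the same illustrative table and a row that already exists for the method (auto-commit must be disabled, otherwise the lock is released as soon as the statement returns):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of locking via SELECT ... FOR UPDATE; identifiers are illustrative.
public class ForUpdateLockSketch {
    private final Connection conn;

    public ForUpdateLockSketch(Connection conn) {
        this.conn = conn;
    }

    public boolean lock(String methodName) throws Exception {
        conn.setAutoCommit(false); // keep the transaction open while we hold the lock
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM method_lock WHERE method_name = ? FOR UPDATE")) {
            ps.setString(1, methodName);
            try (ResultSet rs = ps.executeQuery()) {
                // Blocks inside the database until the exclusive row lock is granted.
                return rs.next();
            }
        }
    }

    public void unlock() throws Exception {
        conn.commit(); // committing the transaction releases the exclusive lock
    }
}
```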
This approach effectively solves the problems mentioned above of being unable to release the lock and of non-blocking behavior.
- Blocking lock? The for update statement returns immediately when it succeeds, and when it cannot acquire the lock it stays blocked until it succeeds.
- Service goes down after locking, lock can't be released? With this approach, when the service goes down the database releases the lock itself, because the connection drops and the transaction is rolled back.
However, this still cannot directly solve the database single-point problem and the reentrancy problem.
There may also be another problem. Although we have a unique index on method_name and explicitly use for update to take a row-level lock, MySQL optimizes queries: even if an index field is used in the condition, whether the index is actually used to retrieve the data is decided by MySQL by comparing the cost of different execution plans. If MySQL decides that a full-table scan is more efficient, for example for small tables, it will not use the index, in which case InnoDB uses a table lock instead of a row lock. If that happens, it's tragic...
Another problem: since we use exclusive locks for the distributed lock, an exclusive lock that is held for a long time without committing occupies a database connection. Once there are many connections like this, the database connection pool may be exhausted.
Summary
To summarize the two ways of implementing distributed locks with a database: both depend on a table in the database. One determines whether a lock currently exists by whether a record exists in the table; the other implements the distributed lock through the database's exclusive locks.
Advantages of implementing distributed locks with a database
Straightforward and easy to understand, operating on the database directly.
Disadvantages of implementing distributed locks with a database
All kinds of problems come up along the way, and solving them makes the whole scheme more and more complicated.
Operating on the database has a certain overhead, so performance is a concern.
Using database row-level locks is not necessarily reliable, especially when the lock table is small.
Implementing distributed locks based on a cache
Compared with database-based distributed lock schemes, cache-based implementations perform better. Moreover, many caches can be deployed as clusters, which solves the single-point problem.
There are many mature cache products, including Redis, memcached, and our company's internal Tair.
This article takes Tair as the example for analyzing a cache-based distributed lock scheme. There are plenty of articles on the web about Redis and memcached, along with some mature frameworks and algorithms that can be used directly.
Implementing distributed locks based on Tair is actually similar to doing so with Redis: the core is implemented with the TairManager.put method.
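The original Tair code isn't shown here. Since, as noted, the Redis approach is similar, here is a rough Jedis-based equivalent in which setnx plays the role of Tair's put (key and value are illustrative):

```java
import redis.clients.jedis.Jedis;

// Rough Redis analogue of the Tair-based lock; key and value are illustrative.
public class CacheLockSketch {
    private final Jedis jedis;

    public CacheLockSketch(Jedis jedis) {
        this.jedis = jedis;
    }

    public boolean tryLock(String key) {
        // SETNX succeeds only if the key does not exist yet.
        // Note: no expiry is set here, which is exactly problem 1 below.
        return jedis.setnx(key, "This is a lock.") == 1;
    }

    public void unlock(String key) {
        jedis.del(key);
    }
}
```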
This implementation still has several problems:
1. The lock has no expiration time. If the unlock operation fails, the lock record stays in Tair, and no other thread can acquire the lock again.
2. The lock can only be non-blocking: whether it succeeds or fails, it returns immediately.
3. The lock is non-reentrant: after a thread acquires the lock, it cannot acquire it again before releasing it, because the key already exists in Tair and the put operation would fail.
Of course, there are also ways to solve these problems.
- No expiration time? Tair's put method supports passing in an expiration time, and the data is deleted automatically once that time arrives.
- Non-blocking? Retry in a while loop.
- Non-reentrant? After a thread acquires the lock, save the current host and thread information, and check whether we are the lock's current owner before the next acquisition.
But how long should the expiration time be set? If it is too short, the lock may be released automatically before the method finishes executing, which produces concurrency problems. If it is too long, other threads waiting for the lock may have to wait longer than necessary. This problem exists for database-based distributed locks as well.
Summary
A cache can be used instead of a database to implement distributed locks. This gives better performance, and many cache services are deployed as clusters, avoiding the single-point problem. Many cache services also provide methods suitable for implementing distributed locks, such as Tair's put method and Redis's setnx method. These cache services all support automatic deletion of expired data, so you can directly set a timeout to control the release of the lock.
Advantages of implementing distributed locks with a cache
Good performance, and fairly convenient to implement.
Disadvantages of implementing distributed locks with a cache
Controlling the lock's expiration time via a timeout is not very reliable.
Implementing distributed locks based on ZooKeeper
Distributed locks can be implemented based on ZooKeeper's ephemeral sequential nodes.
The general idea: whenever a client wants to lock a method, it creates a unique ephemeral sequential node under the directory node that corresponds to that method on ZooKeeper. Determining whether a client holds the lock is then simple: just check whether its node has the smallest sequence number among the sequential nodes. Releasing the lock is just deleting the ephemeral node. This also avoids deadlocks caused by locks that can never be released when a service goes down.
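A highly simplified raw-ZooKeeper sketch of this recipe (paths are illustrative; error handling, reentrancy, and session-loss handling are omitted):

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.*;

// Highly simplified sketch of the ephemeral-sequential-node recipe;
// not production code. Path names are chosen for illustration.
public class ZkLockSketch {
    private final ZooKeeper zk;
    private final String lockDir = "/locks/my_method"; // one directory per method
    private String myNode;

    public ZkLockSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // 1. Create an ephemeral sequential node under the method's directory.
        myNode = zk.create(lockDir + "/node-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        while (true) {
            // 2. If our node has the smallest sequence number, we hold the lock.
            List<String> children = zk.getChildren(lockDir, false);
            Collections.sort(children);
            if (myNode.equals(lockDir + "/" + children.get(0))) {
                return;
            }
            // 3. Otherwise watch the node just before ours and wait for it to go away.
            int myIndex = children.indexOf(myNode.substring(lockDir.length() + 1));
            String predecessor = lockDir + "/" + children.get(myIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            if (zk.exists(predecessor, event -> latch.countDown()) != null) {
                latch.await();
            }
        }
    }

    public void unlock() throws Exception {
        // Deleting the ephemeral node releases the lock; it is also deleted
        // automatically by ZK if the client's session dies.
        zk.delete(myNode, -1);
    }
}
```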
Let's see whether ZooKeeper can solve the problems mentioned earlier.
- Lock cannot be released? ZooKeeper solves this effectively: the client creates an ephemeral node in ZK when acquiring the lock, and if the client suddenly dies after acquiring it (the session connection is broken), the ephemeral node is deleted automatically, and other clients can acquire the lock again.
- Non-blocking lock? A blocking lock can be implemented with ZooKeeper: the client creates a sequential node in ZK and registers a watcher on it. When a node changes, ZooKeeper notifies the client, and the client checks whether its own node now has the smallest sequence number among all current nodes. If so, it has acquired the lock and can execute the business logic.
- Non-reentrant? ZooKeeper also solves non-reentrancy effectively: when creating its node, the client writes the current host and thread information directly into the node. The next time it wants to acquire the lock, it compares that with the data of the current smallest node. If the data matches its own information, it gets the lock directly; if not, it creates another ephemeral sequential node and joins the queue.
- Single point? ZooKeeper solves the single-point problem effectively: ZK is deployed as a cluster, and as long as more than half of the machines in the cluster are alive, it can keep serving requests.
You can directly use the curator ZooKeeper client library, which encapsulates a reentrant lock service. The InterProcessMutex provided by curator is its distributed lock implementation: the acquire method is used to acquire the lock, and the release method is used to release it.
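Typical usage looks roughly like this (the connect string and lock path are placeholders):

```java
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Illustrative curator usage; connect string and lock path are placeholders.
public class CuratorLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my_method");
        if (lock.acquire(10, TimeUnit.SECONDS)) { // blocks for up to 10 seconds
            try {
                // critical section: only one client across the cluster gets here
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}
```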
The distributed lock implemented with ZK seems to fully meet all the expectations for a distributed lock listed at the beginning of this article. However, it does have a downside: its performance may not be as high as a cache-based service's, because every lock acquisition and release dynamically creates and destroys an ephemeral node. Moreover, creating and deleting nodes in ZK can only be executed by the leader server, which then synchronizes the data to all the follower machines.
In fact, using ZooKeeper can also introduce concurrency problems, though not commonly. Consider this situation: due to network jitter, the client's session connection to the ZK cluster is broken. ZK then thinks the client has died and deletes the ephemeral node, at which point another client can acquire the distributed lock, and a concurrency problem arises. This problem is uncommon because ZK has a retry mechanism: once the ZK cluster stops detecting the client's heartbeat, it retries, and the curator client supports several retry policies. The ephemeral node is only deleted after multiple retries have failed. (So choosing a suitable retry policy is also important: you need to find a balance between lock granularity and concurrency.)
Summary
Advantages of implementing distributed locks with ZooKeeper
Effectively solves the single-point problem, the non-reentrancy problem, the non-blocking problem, and the problem of locks that cannot be released. Relatively simple to implement.
Disadvantages of implementing distributed locks with ZooKeeper
Performance is not as good as cache-based distributed locks, and you need to understand ZK's principles.
Comparison of the three schemes
None of these approaches can be perfect. Just as with CAP, complexity, reliability, performance, and so on cannot all be satisfied at once; so choosing the one that best suits your application scenario is the right way to go.
From the perspective of difficulty of understanding (low to high):
Database > Cache > ZooKeeper
From the perspective of implementation complexity (low to high):
ZooKeeper >= Cache > Database
From the perspective of performance (high to low):
Cache > ZooKeeper >= Database
From the perspective of reliability (high to low):
ZooKeeper > Cache > Database