A word on distributed locks, process locks, and thread locks

Last Update: 2017-09-09 Source: Internet

Author: User

Tags: zookeeper


In distributed cluster systems, thread locks often cannot cover every scenario, so a different technique, the distributed lock, has to be introduced.

Thread lock: familiar to everyone, it is mainly used to lock methods and code blocks. When a method or code block is locked, at most one thread executes that code at any moment. When several threads access the locked method or code block of the same object, only one of them runs it at a time, and the others must wait until the current thread finishes before they can execute it. The other threads can, however, still access the unlocked code blocks of that object.
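A minimal Python sketch of the thread-lock behavior just described, using the standard `threading.Lock`. The `Counter` class and the thread counts are made up for illustration: eight threads increment a shared counter through a locked method, so the total comes out exact, while the unlocked `read_unlocked` method stays callable by any thread at any time.

```python
import threading


class Counter:
    """Hypothetical shared object with one locked method and one unlocked method."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # At most one thread runs this block at a time; the others wait here.
        with self._lock:
            self.value += 1

    def read_unlocked(self) -> int:
        # No lock: other threads may call this while one thread holds the lock.
        return self.value


def run(threads: int = 8, per_thread: int = 10_000) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.read_unlocked()


print(run())  # prints 80000
```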

Process lock: controls access to a shared resource by multiple processes on the same operating system. Because processes are independent of one another, one process cannot directly control another's access to a resource, so a local system facility such as a semaphore is used instead (operating system basics).
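A process lock can be sketched with Python's `multiprocessing` module, whose `Lock` is built on an operating-system semaphore on Linux and macOS. This sketch assumes a POSIX system because it uses the `fork` start method to stay short, and the worker counts are arbitrary. Four processes increment one counter in shared memory, and the lock keeps every increment intact.

```python
import multiprocessing as mp

# Assumption: a POSIX system; the "fork" start method is unavailable on Windows.
CTX = mp.get_context("fork")


def work(lock, counter, times: int) -> None:
    for _ in range(times):
        # Only one process at a time may do this read-modify-write.
        with lock:
            counter.value += 1


def run(processes: int = 4, per_process: int = 1_000) -> int:
    lock = CTX.Lock()                        # cross-process lock (semaphore-backed)
    counter = CTX.Value("i", 0, lock=False)  # raw shared int, guarded only by `lock`
    procs = [CTX.Process(target=work, args=(lock, counter, per_process))
             for _ in range(processes)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # prints 4000
```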

Distributed lock: when the processes are not on the same system, a distributed lock is used to control their access to the resource.

Original article (discuss it with the author there): http://www.cnblogs.com/intsmaze/p/6384105.html

What is a distributed lock and how is it implemented?

Put simply, as Intsmaze sees it, implementing a distributed lock requires a third-party storage medium to hold information such as the lock metadata. Suppose a distributed cluster needs to operate on a row of data whose serial number is unique. We use that serial number as the lock ID. Before a process manipulates the data, it checks the third-party storage for the lock ID; if the ID is absent, it writes the lock ID and then operates on the data. The check and the write must be a single atomic operation in the storage; otherwise two processes could both see the ID as absent and both take the lock. When another process wants to access the same data, it first looks up the lock ID in the third-party storage. If the ID is present, it concludes that another process is currently using the data and keeps polling the storage to see whether the lock has been released. When the owning process finishes with the data, it deletes the lock ID from the storage, so that one of the polling processes can take control of the lock.
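The steps above can be sketched in Python. To keep the example runnable without a server, the third-party storage is replaced by a hypothetical in-memory `LockStore` whose `put_if_absent` is atomic, and threads stand in for processes on different machines; in practice the store would be something like Redis or ZooKeeper. The lock ID `row:42` is likewise made up.

```python
import threading
import time


class LockStore:
    """Hypothetical stand-in for the third-party storage that holds lock IDs."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}  # lock ID -> owner
        self._mutex = threading.Lock()   # makes each store operation atomic

    def put_if_absent(self, lock_id: str, owner: str) -> bool:
        # Check and write in one atomic step.
        with self._mutex:
            if lock_id in self._ids:
                return False
            self._ids[lock_id] = owner
            return True

    def delete(self, lock_id: str) -> None:
        with self._mutex:
            self._ids.pop(lock_id, None)


def acquire(store: LockStore, lock_id: str, owner: str, poll: float = 0.001) -> None:
    # Keep polling the store until our write of the lock ID succeeds.
    while not store.put_if_absent(lock_id, owner):
        time.sleep(poll)


def release(store: LockStore, lock_id: str) -> None:
    store.delete(lock_id)


def demo(workers: int = 5, updates: int = 20) -> int:
    store = LockStore()
    row = {"qty": 0}  # the shared data whose serial number is 42

    def worker(name: str) -> None:
        for _ in range(updates):
            acquire(store, "row:42", name)
            try:
                current = row["qty"]
                time.sleep(0)  # yield, so a missing lock would likely lose updates
                row["qty"] = current + 1
            finally:
                release(store, "row:42")

    threads = [threading.Thread(target=worker, args=(f"p{i}",)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row["qty"]


print(demo())  # prints 100
```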

To add one more point: thread locks, process locks, and distributed locks are the same idea; they differ only in the scope they cover. By scope: distributed lock > process lock > thread lock. Wherever a thread lock would do, a distributed lock would also work, and so would a process lock. The larger the scope, however, the more complex the technology.

In all my years of Java EE development, I never once felt the pain of distributed locks!

Anyone with Java EE experience may object here: to handle high concurrency, a system is deployed as a Tomcat cluster whose servers all access the same database, so several servers modify the same data at the same time, yet we don't use distributed locks on those servers. By the explanation above, when JVM processes on two different systems access the same database resource at the same time, we should be controlling that access with a distributed lock.

That reasoning is not wrong, but it forgets the nature of the database. If two servers were accessing (say, through a URL) and modifying one row of data stored directly on a server's hard disk, we would indeed need a distributed lock. Here, however, the data lives in a database, and the database is itself a service program that receives requests from external systems on multiple threads. The two servers send their requests over network I/O to the database server, where the database service process handles them. Because the database receives and processes requests on many threads, concurrent access to a given row of a table is controlled by the database service itself (that is, by thread locking inside the database code), through features such as row locks. Since the database already applies locking to requests arriving from multiple external systems, we do not need to build distributed locks on the application servers.
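A small Python experiment illustrates the point under stated assumptions: each thread stands in for one application server with its own database connection, and SQLite stands in for the database server. SQLite locks the whole database file for writes rather than a single row, but the conclusion is the same: the database, not the application, serializes the conflicting writes. The `item` table and the numbers are hypothetical. 120 sale attempts race for 100 units of stock, and the conditional `UPDATE` never oversells even though the application takes no lock of its own.

```python
import os
import sqlite3
import tempfile
import threading


def run(servers: int = 4, attempts: int = 30, stock: int = 100) -> tuple[int, int]:
    """Return (units sold, stock left) after all simulated servers finish."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shop.db")
        setup = sqlite3.connect(path, isolation_level=None)  # autocommit
        setup.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
        setup.execute("INSERT INTO item (id, stock) VALUES (1, ?)", (stock,))
        setup.close()

        sold = [0] * servers

        def server(i: int) -> None:
            # One connection per simulated server; timeout waits out other writers.
            conn = sqlite3.connect(path, timeout=30, isolation_level=None)
            try:
                for _ in range(attempts):
                    # The read-modify-write happens inside the database in one statement.
                    cur = conn.execute(
                        "UPDATE item SET stock = stock - 1 WHERE id = 1 AND stock > 0")
                    sold[i] += cur.rowcount
            finally:
                conn.close()

        threads = [threading.Thread(target=server, args=(i,)) for i in range(servers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(path)
        (left,) = check.execute("SELECT stock FROM item WHERE id = 1").fetchone()
        check.close()
        return sum(sold), left


print(run())  # prints (100, 0)
```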

If you need to update several rows of the database at once, row locks alone cannot guarantee consistency. Should we use a distributed lock in that case? We can; note that I say "can", not "must". Why? Because the database itself provides a mechanism for this: transactions and their isolation levels. Of course, you could also use a distributed lock instead of relying on the database's transactions.
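The transaction route can be sketched with Python's built-in `sqlite3` module. The `account` table and the amounts are hypothetical; rows 2 and 3 simply echo a multi-row update. Both `UPDATE` statements run in one transaction, so when the second one violates the `CHECK` constraint, the first is rolled back as well and neither row changes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(2, 100), (3, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Update two rows as one unit: both changes commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn: sqlite3.Connection) -> dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = make_db()
print(transfer(conn, 2, 3, 30), balances(conn))   # prints True {2: 70, 3: 80}
print(transfer(conn, 3, 2, 500), balances(conn))  # prints False {2: 70, 3: 80}
```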

Does the design of a distributed lock need to take the business into account?

A distributed lock design is not a universal solution; it fits only certain business scenarios. To apply it across a whole business, you must fully understand the business requirements and design accordingly. The reason is the same as why you must watch for business issues when using MyBatis's second-level cache, which is scoped by namespace.

Suppose we use a distributed lock whose lock ID covers rows 2 and 3 of a table together. Any other operation that updates rows 2 and 3 must first obtain this lock before it is allowed to modify them. But an operation that modifies only row 2 never asks for this lock: it goes straight to the database and merely waits until the database releases its row lock on row 2 before proceeding. So distributed locks cannot be used everywhere, only in particular scenarios, for example a business system that never modifies row 2 on its own.
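A tiny Python sketch makes the naming issue concrete. `LockRegistry` is a hypothetical in-memory stand-in for the lock store, and the `table:rows` format of the lock ID is an assumption for illustration. A lock is exclusive only among callers that compute the same ID, so holding `account:2,3` does nothing to stop a caller that asks for `account:2`.

```python
import threading


class LockRegistry:
    """Hypothetical stand-in for a lock store: at most one holder per lock ID."""

    def __init__(self) -> None:
        self._held: set[str] = set()
        self._mutex = threading.Lock()

    def try_acquire(self, lock_id: str) -> bool:
        with self._mutex:
            if lock_id in self._held:
                return False
            self._held.add(lock_id)
            return True

    def release(self, lock_id: str) -> None:
        with self._mutex:
            self._held.discard(lock_id)


def lock_id_for(table: str, rows: list[int]) -> str:
    # One ID for the whole row set; sorting makes [3, 2] and [2, 3] agree.
    return f"{table}:" + ",".join(str(r) for r in sorted(rows))


reg = LockRegistry()
print(reg.try_acquire(lock_id_for("account", [2, 3])))  # prints True
print(reg.try_acquire(lock_id_for("account", [3, 2])))  # prints False: same ID, excluded
print(reg.try_acquire(lock_id_for("account", [2])))     # prints True: different ID, not excluded
```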

Distributed locks for HBase storage systems

In real development we often operate on HBase from distributed processes. HBase is an excellent non-relational (NoSQL) database. Traditional databases provide transactions, but HBase transactions are row-level: they guarantee atomicity, consistency, isolation, and durability (the ACID properties) for the data of a single row. To provide these transactional properties, HBase uses various concurrency control strategies, including several locking mechanisms and MVCC. Because HBase supports only row-level transactions, it cannot provide ACID guarantees by itself when the business needs to manipulate two or more rows concurrently.

When database traffic is too heavy, how can we relieve the load besides master/slave replication?

The database creates a thread for each client request. To modify a particular row, that thread must obtain the row lock, and other client threads must wait until the previous thread releases the lock before they are allowed to modify the data. If many client threads modify the same row, the threads that have not obtained the lock keep polling on the database machine, which increases the load on the database.

We can use a distributed lock to move that waiting for the row lock onto the client machines, so that database threads no longer poll constantly. For example, before a client sends a request to operate on a row, it first takes the lock on the client side; if it cannot acquire the lock, it does not send the request to the database at all, so the database bears no polling load. Of course, distributed locks must be designed around the business requirements; otherwise, poorly chosen lock IDs can lead to inconsistent reads, stale data, and similar problems.
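A toy Python measurement of that idea follows. Everything in it is hypothetical: threads stand in for client processes, `RowLocks` stands in for the distributed lock service, and `FakeDatabase` only counts how many requests for the same row it is handling at once. When every client must hold the row's lock before sending its request, that count never exceeds one, so the waiting happens on the client side rather than in the database.

```python
import threading
import time


class FakeDatabase:
    """Hypothetical database that records overlapping requests for one row."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.value = 0

    def update_row(self) -> None:
        with self._mutex:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.001)  # pretend the database is doing work
        with self._mutex:
            self.value += 1
            self.in_flight -= 1


class RowLocks:
    """Stand-in for the distributed lock service: one lock per row ID."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def for_row(self, row_id: str) -> threading.Lock:
        with self._guard:
            if row_id not in self._locks:
                self._locks[row_id] = threading.Lock()
            return self._locks[row_id]


def run(clients: int = 8, calls: int = 5, use_lock: bool = True) -> tuple[int, int]:
    db, locks = FakeDatabase(), RowLocks()

    def client() -> None:
        for _ in range(calls):
            if use_lock:
                # Wait here, on the client side, before any request is sent.
                with locks.for_row("row:42"):
                    db.update_row()
            else:
                db.update_row()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.value, db.peak


print(run())  # prints (40, 1)
```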

Which third-party storage should hold distributed locks?

The popular choices today are ZooKeeper and Redis, and each has advantages. Redis is a popular in-memory cache that can scale horizontally to handle more requests, so it can respond quickly to high-concurrency read and write requests for lock data. Its AOF persistence and Sentinel failover also help prevent problems caused by losing lock data when a node goes down.
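With Redis, a lock is commonly taken with one atomic command, `SET <key> <token> NX PX <ttl-ms>`, and released by deleting the key only if it still holds the caller's token (usually inside a short Lua script so the check and the delete stay atomic). The TTL means a lock whose holder crashes eventually expires instead of blocking everyone. So that the sketch runs without a server, `FakeRedis` below is a hypothetical in-memory imitation of just those semantics, not the redis-py client API; with a real server you would issue the commands above instead.

```python
import time
import uuid
from typing import Callable, Optional


class FakeRedis:
    """Hypothetical in-memory imitation of the Redis behavior a lock needs.

    Meant for single-threaded use; a real Redis server runs commands one at a time.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)

    def _live(self, key: str) -> Optional[tuple[str, float]]:
        item = self._data.get(key)
        if item is not None and item[1] <= self._clock():
            del self._data[key]  # expired: treat the key as gone
            return None
        return item

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        """Mimics `SET key value NX PX ttl_ms`: write only if the key is absent."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        """Mimics the usual release script: delete only if the value still matches."""
        item = self._live(key)
        if item is not None and item[0] == value:
            del self._data[key]
            return True
        return False


def try_lock(r: FakeRedis, key: str, ttl_ms: int = 30_000) -> Optional[str]:
    token = uuid.uuid4().hex  # unique per holder, so only the holder can release
    return token if r.set_nx_px(key, token, ttl_ms) else None


def unlock(r: FakeRedis, key: str, token: str) -> bool:
    return r.delete_if_equals(key, token)
```

The injectable clock lets the expiry be exercised without sleeping, and the token check is what stops a slow former holder from deleting a lock that has since passed to someone else.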

I prefer ZooKeeper. It implements a Paxos-style distributed consensus protocol (ZAB), handles high request loads without strain, and the failure of a single node does not affect the consistency of the lock data. It also has a watch (notification) mechanism: when a program releases a lock, other programs are notified promptly and can take control of it, so there is no need to write a polling implementation.
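ZooKeeper's documented lock recipe works roughly like this: each client creates an ephemeral sequential node under a lock path; the client holding the lowest sequence number owns the lock; every other client sets a watch on the node immediately before its own and sleeps until ZooKeeper notifies it that the node was deleted. The Python sketch below simulates that flow in a single process so it runs without a ZooKeeper ensemble. `FakeLockPath` is hypothetical (not the ZooKeeper or Kazoo API), threads stand in for clients, and a `threading.Event` per node stands in for a watch. It does not model ephemeral nodes disappearing when a client's session dies.

```python
import threading
import time


class FakeLockPath:
    """Hypothetical stand-in for a ZooKeeper lock path holding sequential nodes."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._next_seq = 0
        self._nodes: dict[int, threading.Event] = {}  # seq -> "node deleted" event

    def acquire(self) -> int:
        with self._mutex:  # like creating a sequential node: unique, increasing number
            seq = self._next_seq
            self._next_seq += 1
            self._nodes[seq] = threading.Event()
        while True:
            with self._mutex:
                lower = [s for s in self._nodes if s < seq]
                if not lower:
                    return seq  # lowest node: this client holds the lock
                watched = self._nodes[max(lower)]  # watch only the node just before ours
            watched.wait()  # block until notified; no polling loop

    def release(self, seq: int) -> None:
        with self._mutex:
            deleted = self._nodes.pop(seq)
        deleted.set()  # notify the watcher, if any


def demo(clients: int = 6) -> tuple[list[int], bool]:
    path = FakeLockPath()
    granted: list[int] = []
    state = {"holders": 0, "overlap": False}
    guard = threading.Lock()

    def client() -> None:
        seq = path.acquire()
        with guard:
            state["holders"] += 1
            state["overlap"] |= state["holders"] > 1
            granted.append(seq)
        time.sleep(0.001)  # work done while holding the lock
        with guard:
            state["holders"] -= 1
        path.release(seq)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return granted, state["overlap"]


print(demo())  # prints ([0, 1, 2, 3, 4, 5], False)
```

Watching only the predecessor, rather than the whole lock path, lets each release wake exactly one waiter, and it hands the lock out in node-creation order.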

I wrote this introduction to distributed locks and thread locks in the editor a year ago but never found time to present it plainly and clearly. On many forums I have noticed newcomers to big data talking about distributed locks without really understanding when a distributed lock is needed and when a thread lock is enough. As a result, they often introduce distributed locks where a thread lock would obviously suffice, making the overall system design more complex.

I also think ZooKeeper is a great technology. In the big data field it often appears only as the coordinator for some framework, so many developers overlook how capable it is. In today's popular microservice architectures, though, many features can be built with ZooKeeper, such as distributed locks and a configuration center.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
