How to Use locks intelligently


Tips for using locks efficiently in Java


Level: Intermediate

Dai Xiaojun (daixiaoj@cn.ibm.com), Software Engineer, IBM China Software Development Center
Gan Zhi (ganzhi@cn.ibm.com), Senior Software Engineer, IBM China Software Development Center
Zhang Yue (yuezbj@cn.ibm.com), Software Engineer, IBM China Software Development Center

July 10, 2009

As a mechanism for protecting critical sections, locks are widely used in multi-threaded programs. Whether it is the synchronized keyword or the ReentrantLock class in the java.util.concurrent package, a lock is a powerful tool in the hands of multi-threaded application developers. But a powerful tool is often a double-edged sword: excessive or incorrect use of locks degrades the performance of multi-threaded applications, and the problem is becoming increasingly apparent now that multi-core platforms are mainstream.


Contended locks are the main cause of performance bottlenecks in multi-threaded applications

The distinction between contended and uncontended locks matters a great deal for performance. If a lock is used by only one thread from start to finish, the JVM can optimize away most of its cost. If a lock is used by multiple threads but only one thread ever tries to acquire it at any given time, its overhead is somewhat higher. We call both of these uncontended locks. The most serious performance impact occurs when multiple threads try to acquire the lock at the same time; in that case the JVM cannot optimize it away, and a switch from user mode to kernel mode usually occurs. Modern JVMs apply many optimizations to uncontended locks so that they barely affect performance. The following optimizations are common:

  • If a lock object is accessible only to the current thread, no other thread can ever acquire it for synchronization, so the JVM can remove the lock request entirely (lock elision).
  • Escape analysis can determine whether a local object reference escapes to the heap; if it does not, the object can be treated as thread-local.
  • The compiler can also perform lock coarsening: adjacent synchronized blocks guarded by the same lock are merged, reducing unnecessary lock acquisitions and releases.

Therefore, do not worry too much about the overhead of uncontended locks. Focus instead on optimizing the critical sections where lock contention actually occurs.
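As a small illustration of lock elision (the class and method names below are our own, and whether the lock is actually elided depends on the JIT): StringBuffer's methods are synchronized, but when an instance never escapes the creating thread, escape analysis lets the JVM remove the lock operations entirely.

```java
// Illustrative sketch only: class and method names are ours; actual elision
// depends on the JVM's escape analysis and JIT compiler.
public class LockElisionDemo {
    // StringBuffer.append() is synchronized, but this buffer is confined to
    // the current thread's stack frame and never escapes, so a modern JVM
    // can remove (elide) the lock acquisitions altogether.
    public static String concat(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a); // synchronized call, but the lock is elidable
        sb.append(b);
        return sb.toString();
    }
}
```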




Methods to reduce lock contention

Many developers, worried about the performance cost of synchronization, try to minimize the use of locks, even leaving some critical sections unprotected when errors there seem extremely unlikely. This does not improve performance; it introduces bugs that are very hard to find, because they occur with low probability and are difficult to reproduce.

Therefore, to preserve program correctness, the first step in addressing synchronization-related performance loss is not to remove locks, but to reduce lock contention. In general there are three ways to do so: shorten the lock hold time, reduce the frequency of lock requests, or replace exclusive locks with other coordination mechanisms. These approaches encompass many best practices, described in the following sections.

Avoid time-consuming computation in the critical section

A common technique for making code thread-safe is to put a "big lock" around the entire function; in Java, for example, declaring the whole method as synchronized. But what needs protecting is the object's shared state, not the code itself.

Notes on the JLM report

%MISS: percentage of lock requests that failed
GETS: total number of lock requests = FAST + SLOW + REC
NONREC: total number of non-recursive lock requests
SLOW: number of non-recursive requests that failed to get the lock (the thread blocked)
FAST: number of non-recursive requests that got the lock = NONREC - SLOW
REC: number of recursive lock requests
TIER2: number of inner-loop spins to get the lock, on platforms that support three-tier spin locks
TIER3: number of outer-loop spins to get the lock, on platforms that support three-tier spin locks
%UTIL: lock utilization = total lock hold time / sampling time
AVER-HTM: average lock hold time = total lock hold time / total number of non-recursive acquisitions

Excessively long lock hold times limit application scalability. As Brian Goetz notes in Java Concurrency in Practice, if an operation holds a lock for more than 2 milliseconds and every operation needs that lock, then no matter how many idle processors are available, the application's throughput cannot exceed 500 operations per second. Reducing the hold time to 1 millisecond raises the lock-bound throughput to 1000 operations per second. This is in fact a conservative estimate of the cost of holding a lock too long, because it does not include the overhead of contending for the lock: busy-waiting and thread context switches caused by failed acquisitions also waste CPU time. The most effective way to reduce the likelihood of lock contention is to hold locks for as short a time as possible. This can be achieved by moving code that does not need lock protection out of the synchronized block, especially operations that are expensive or may block, such as I/O.

In Example 1, we use JLM (Java Lock Monitor) to observe lock usage. foo1 uses synchronized to protect the entire method, while foo2 protects only the shared variable maph. AVER-HTM shows the average hold time of each lock. As you can see, once the unrelated statements are moved out of the synchronized block, the lock hold time drops, and so does the program's execution time.

Example 1. Avoiding time-consuming computation in the critical section



import java.util.Map;
import java.util.HashMap;

public class TimeConsumingLock implements Runnable {
    private final Map<String, String> maph = new HashMap<String, String>();

    private int opNum;

    public TimeConsumingLock(int on) {
        opNum = on;
    }

    public synchronized void foo1(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        if (null == key) {
            return;
        } else {
            maph.put(key, value);
        }
    }

    public void foo2(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        if (null == key) {
            return;
        } else {
            synchronized (this) {
                maph.put(key, value);
            }
        }
    }

    public void run() {
        for (int i = 0; i < opNum; i++) {
            //foo1(i); // time consuming
            foo2(i);   // this will be better
        }
    }
}

Results from the JLM report

Result using foo1:

MON-NAME [08121048] TimeConsumingLock@d7968db8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 5318465 5318465 35 0 349190349 38 8419428

Execution time: 16106 milliseconds

Result using foo2:

MON-NAME [d594c53c] TimeConsumingLock@d6dd67b0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 5635938 5635938 71 0 373087821 27 8968423

Execution time: 12157 milliseconds

Lock splitting and lock striping

Another way to reduce lock contention is to reduce the frequency with which threads request the lock. Lock splitting and lock striping are two such techniques: independent state variables should be protected by independent locks, whereas developers sometimes mistakenly use a single lock to protect them all. These techniques reduce lock granularity and thereby improve scalability, but the locks must be allocated carefully to reduce the risk of deadlock.

If a lock guards more than one independent state variable, you may be able to split it so that each new lock guards different variables, improving scalability. With this change, the request frequency on each lock decreases. Splitting a lock can effectively turn most acquisitions into uncontended ones, improving both performance and scalability.

In Example 2, we split the lock that originally protected two independent object variables into two locks, each protecting one of them. In the JLM result, the original lock SplittingLock@d6dd3078 becomes two locks, java/util/HashSet@d6dd7be0 and java/util/HashSet@d6dd7be0, and both the number of GETS requests and the degree of contention (SLOW, TIER2, TIER3) drop sharply. The program's execution time falls from 12981 ms to 4797 ms.

When contention on a lock is fierce, splitting it is likely to yield two locks that are still fiercely contended. Although this allows two threads to execute concurrently, a modest improvement in scalability, it still cannot significantly improve concurrency on a system with many processors.

Lock splitting can be extended further, dividing one lock into a set of locks that each guard an independent subset of objects; this is lock striping. For example, the implementation of ConcurrentHashMap uses an array of 16 locks, each protecting 1/16 of the hash buckets. Assuming hash values are evenly distributed, this reduces the number of requests on each lock to roughly 1/16 of the original. The technique allows ConcurrentHashMap to support up to 16 concurrent writers, and when heavily loaded access on a multi-processor system demands better concurrency, the number of locks can be increased further.

In Example 3, we simulate the lock striping used in ConcurrentHashMap, using four locks to protect different parts of an array. In the JLM result, the original lock StrippingLock@d79962d8 becomes four locks such as java/lang/Object@d79964b8, and the lock contention (TIER2, TIER3) drops sharply. The program's execution time falls from 5536 ms to 1857 ms.

Example 2. Lock splitting



import java.util.HashSet;
import java.util.Set;

public class SplittingLock implements Runnable {
    private final Set<String> users = new HashSet<String>();
    private final Set<String> queries = new HashSet<String>();
    private int opNum;

    public SplittingLock(int on) {
        opNum = on;
    }

    public synchronized void addUser1(String u) {
        users.add(u);
    }

    public synchronized void addQuery1(String q) {
        queries.add(q);
    }

    public void addUser2(String u) {
        synchronized (users) {
            users.add(u);
        }
    }

    public void addQuery2(String q) {
        synchronized (queries) {
            queries.add(q);
        }
    }

    public void run() {
        for (int i = 0; i < opNum; i++) {
            String user = new String("user");
            user += i;
            addUser1(user);

            String query = new String("query");
            query += i;
            addQuery1(query);
        }
    }
}

Results from the JLM report

Results using addUser1 and addQuery1:

MON-NAME [d5848cb0] SplittingLock@d6dd3078 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 9004711 9004711 101 0 482982391 44 10996987

Execution time: 12981 milliseconds

Results using addUser2 and addQuery2:

MON-NAME [d5928c98] java/util/HashSet@d6dd7be0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1875510 1875510 38 0 108706364 14 2546875

MON-NAME [d5928c98] java/util/HashSet@d6dd7be0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 272365 272365 0 0 15154239 352397 1 3042

Execution time: 4797 milliseconds

Example 3. Lock striping



public class StrippingLock implements Runnable {
    private final Object[] locks;
    private static final int N_LOCKS = 4;
    private final String[] share;
    private int opNum;
    private int nAnum;

    public StrippingLock(int on, int anum) {
        opNum = on;
        nAnum = anum;
        share = new String[nAnum];
        locks = new Object[N_LOCKS];
        for (int i = 0; i < N_LOCKS; i++)
            locks[i] = new Object();
    }

    public synchronized void put1(int indx, String k) {
        share[indx] = k; // acquires the object lock
    }

    public void put2(int indx, String k) {
        synchronized (locks[indx % N_LOCKS]) {
            share[indx] = k; // acquires only the corresponding stripe's lock
        }
    }

    public void run() {
        // The expensive put
        /* for (int i = 0; i < opNum; i++) {
            put1(i % nAnum, Integer.toString(i + 1));
        } */
        // The cheap put
        for (int i = 0; i < opNum; i++) {
            put2(i % nAnum, Integer.toString(i + 1));
        }
    }
}

Results from the JLM report

Result using put1:

MON-NAME [08121228] StrippingLock@d79962d8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 4830690 4830690 460 0 229538313 18 5010789

Execution time: 5536 milliseconds

Result using put2:

MON-NAME [08121388] java/lang/Object@d79964b8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 4591046 4591046 1517 0 151042525 13 3016162

MON-NAME [08121330] java/lang/Object@d79964c8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1717579 1717579 523 0 50596994 5 958796

MON-NAME [081213e0] java/lang/Object@d79964d8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1814296 1814296 536 0 58043786 5 1113454

MON-NAME [08121438] java/lang/Object@d79964e8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 3126427 3126427 901 0 96627408 1857005 1979

Execution time: 1857 milliseconds

Avoid hot fields

In some applications, a shared variable caches a frequently needed computed result, and every update operation must modify that variable to keep it valid; examples include a container's size, a counter, or the head-node reference of a linked list. In a multi-threaded application, such a shared variable must be protected by a lock, and this optimization, common in single-threaded code, then becomes a "hot field" that limits scalability. If a queue is designed to sustain high throughput under multi-threaded access, consider not updating the queue size on every enqueue and dequeue. ConcurrentHashMap avoids this problem by maintaining an independent counter in each segment of its array, protected by that segment's lock, instead of maintaining one global count.
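The per-segment-counter idea can be sketched as follows. The class name and slot count are our own, and ConcurrentHashMap's real implementation differs in detail: each thread updates one of several independent slots, and the total is computed on demand by summing them, so no single counter becomes a hot field.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of a striped counter (names and slot count are ours).
public class StripedCounter {
    private static final int N_SLOTS = 16;
    private final AtomicLongArray slots = new AtomicLongArray(N_SLOTS);

    public void increment() {
        // Spread updates across slots by thread identity to reduce contention.
        int idx = (int) (Thread.currentThread().getId() % N_SLOTS);
        slots.incrementAndGet(idx);
    }

    public long sum() {
        // Reads sum all slots; slightly stale totals are acceptable here.
        long total = 0;
        for (int i = 0; i < N_SLOTS; i++) {
            total += slots.get(i);
        }
        return total;
    }
}
```

Later JDKs packaged this idea directly as java.util.concurrent.atomic.LongAdder (Java 8).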

Alternatives to exclusive locks

The third technique for mitigating the performance impact of contended locks is to abandon exclusive locks in favor of more efficient concurrency mechanisms for managing shared state, such as concurrent containers, read-write locks, immutable objects, and atomic variables.

java.util.concurrent.locks.ReadWriteLock implements a multiple-reader / single-writer lock: multiple readers can access the shared resource concurrently, but a writer must acquire the lock exclusively. For data structures where most operations are reads, ReadWriteLock offers better concurrency than an exclusive lock.
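A minimal sketch of this pattern (the class name and guarded map are our own choices): readers share the read lock and never block one another, while a writer's critical section remains exclusive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative read-mostly cache guarded by a ReadWriteLock (names are ours).
public class ReadMostlyCache {
    private final Map<String, String> map = new HashMap<String, String>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();   // shared: many readers may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();  // exclusive: blocks both readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```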

Atomic variables provide a way to avoid the lock contention caused by "hot field" updates, such as counters, sequence generators, or the head-node reference of a linked-list data structure.

In Example 4, we use atomic operations to update each element of an array, avoiding an exclusive lock. The program's execution time drops from 23550 ms to 842 ms.

Example 4. An array updated with atomic operations



import java.util.concurrent.atomic.AtomicLongArray;

public class AtomicLock implements Runnable {
    private final long d[];
    private final AtomicLongArray a;
    private int aSize;

    public AtomicLock(int size) {
        aSize = size;
        d = new long[size];
        a = new AtomicLongArray(size);
    }

    public synchronized void set1(int idx, long val) {
        d[idx] = val;
    }

    public synchronized long get1(int idx) {
        long ret = d[idx];
        return ret;
    }

    public void set2(int idx, long val) {
        a.addAndGet(idx, val);
    }

    public long get2(int idx) {
        long ret = a.get(idx);
        return ret;
    }

    public void run() {
        for (int i = 0; i < aSize; i++) {
            // The slower operations
            //set1(i, i);
            //get1(i);

            // The quicker operations
            set2(i, i);
            get2(i);
        }
    }
}

Results using set1 and get1
Execution time: 23550 milliseconds

Results using set2 and get2
Execution time: 842 milliseconds

Use concurrent containers

The java.util.concurrent package provides highly efficient thread-safe concurrent containers. They guarantee thread safety while optimizing the common operations for access by many threads, so they deliver good performance and scalability in multi-threaded applications running on multi-core platforms. The Amino project provides additional efficient concurrent containers and algorithms.
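As a small sketch (class and method names are ours), ConcurrentHashMap needs no external lock even under concurrent access; its putIfAbsent and conditional replace make the common check-then-act sequence atomic:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical hit counter built on a concurrent container (names are ours).
public class ConcurrentContainerDemo {
    private final ConcurrentMap<String, Integer> hits =
        new ConcurrentHashMap<String, Integer>();

    public int record(String page) {
        for (;;) {
            // Atomically insert the initial count only if the key is absent;
            // returns the previous value, or null if there was none.
            Integer prev = hits.putIfAbsent(page, 1);
            if (prev == null) {
                return 1; // first hit
            }
            // Conditional replace is a compare-and-set: it succeeds only if
            // the current value still equals prev; otherwise retry the loop.
            if (hits.replace(page, prev, prev + 1)) {
                return prev + 1;
            }
        }
    }
}
```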

Use immutable data and thread-local data

Immutable data remains unchanged throughout its lifetime, so each thread can safely keep its own copy for fast reads.

ThreadLocal data belongs to its owning thread alone, so it is never shared between threads. ThreadLocal can improve many existing designs that share data: for example, an object pool or waiting queue shared by all threads can become a per-thread object pool or waiting queue. Using a work-stealing scheduler instead of a traditional FIFO-queue scheduler is another example of applying thread-local data.
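A common sketch of this technique (the class name is ours): java.text.SimpleDateFormat is not thread-safe, so rather than guarding one shared instance with a lock, each thread keeps its own copy via ThreadLocal; with no sharing, no lock is needed.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative per-thread formatter (class name is ours).
public class ThreadLocalDemo {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                // Each thread lazily gets its own private instance.
                return new SimpleDateFormat("yyyy-MM-dd");
            }
        };

    public static String format(Date d) {
        // No synchronization: the formatter is confined to this thread.
        return FORMAT.get().format(d);
    }
}
```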




Conclusion

Locks are an indispensable tool for developing multi-threaded applications, and now that multi-core platforms are mainstream, using them correctly is becoming a basic skill for developers. Although lock-free programming and transactional memory have already appeared on software developers' horizons, for the foreseeable future lock-based programming remains the most important parallel programming skill. We hope the methods proposed in this article help you use locks correctly.
