Level: Intermediate

Dai Xiaojun (Daixiaoj@cn.ibm.com), Software Engineer, IBM China Software Development Center
Gan Zhi (Ganzhi@cn.ibm.com), Senior Software Engineer, IBM China Software Development Center
Zhang Yue (Yuezbj@cn.ibm.com), Software Engineer, IBM China Software Development Center

July 10, 2009
Locks are widely used in multi-threaded programs to protect critical sections. Whether through the synchronized keyword built into the Java language or the ReentrantLock class in the java.util.concurrent package, locks are a powerful tool in the hands of multi-threaded application developers. Powerful tools, however, cut both ways: excessive or incorrect use of locks degrades the performance of multi-threaded applications, a problem that becomes increasingly apparent now that multi-core platforms are mainstream.
Contended locks are the main cause of performance bottlenecks in multi-threaded applications
Distinguishing between contended and uncontended locks matters greatly for performance. If a lock is used by only one thread from start to finish, the JVM can optimize away most of the cost it incurs. If a lock is used by multiple threads but only one thread ever tries to acquire it at any given moment, its overhead is somewhat higher. We call both of the above uncontended locks. The most serious performance impact occurs when multiple threads attempt to acquire a lock at the same time: the JVM cannot optimize this case away, and a switch from user mode to kernel mode usually occurs. Modern JVMs apply many optimizations to uncontended locks so that they barely affect performance. Common optimizations include:
- Lock elision: if a lock object can be accessed only by the current thread, no other thread can ever acquire it and synchronize on it, so the JVM can remove the lock request entirely.
- Escape analysis can determine whether a local object reference escapes to the heap. If it does not, the lock on that object can be treated as thread-local and eliminated.
- The compiler can also perform lock coarsening: adjacent synchronized blocks that use the same lock are merged, reducing unnecessary lock acquisitions and releases.
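The last optimization can be made concrete with a small sketch (the class and method names here are ours, not from the article): two adjacent synchronized blocks on the same monitor are exactly the pattern a JIT compiler may merge into a single acquire/release pair.

```java
import java.util.ArrayList;
import java.util.List;

public class CoarseningCandidate {
    private final List<String> log = new ArrayList<String>();

    // Two adjacent synchronized blocks on the same monitor: a JIT
    // compiler may coarsen them into one lock acquisition, removing
    // an unnecessary release/re-acquire pair. The code behaves the
    // same either way; only the locking cost changes.
    public void append(String a, String b) {
        synchronized (log) {
            log.add(a);
        }
        synchronized (log) {
            log.add(b);
        }
    }

    public int size() {
        synchronized (log) {
            return log.size();
        }
    }
}
```

Whether coarsening actually happens depends on the JVM and JIT tier; the point is that writing small adjacent synchronized blocks does not necessarily cost two lock round-trips.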
Therefore, do not worry about the overhead of uncontended locks. Focus performance tuning on the critical sections where lock contention actually occurs.
Methods to reduce lock contention
Worried about the performance cost of synchronization, many developers try to minimize the use of locks, even leaving unprotected some critical sections that seem to have an extremely low probability of going wrong. This does not improve performance; it only introduces bugs that are hard to debug, because they occur with very low probability and are difficult to reproduce. Therefore, to keep the program correct, the first step in addressing the performance cost of synchronization is not to remove locks but to reduce lock contention. In general there are three ways to do so: hold locks for less time, request locks less frequently, or replace exclusive locks with other coordination mechanisms. These three approaches encompass many best practices, described in the sections below.

Avoid time-consuming computation in the critical section
A common way to make code thread-safe is to put a "big lock" around an entire function, for example by declaring a whole Java method synchronized. However, what needs protecting is the shared state of the object, not the code itself.
JLM report legend

%MISS: percentage of lock requests that failed
GETS: total number of lock requests = FAST + SLOW + REC
NONREC: total number of non-recursive lock requests
SLOW: number of non-recursive requests that failed to acquire the lock (the thread blocked)
FAST: number of non-recursive requests that acquired the lock = NONREC - SLOW
REC: number of recursive lock requests
TIER2: number of inner-loop spins to acquire the lock, on platforms that support three-tier spin locks
TIER3: number of outer-loop spins to acquire the lock, on platforms that support three-tier spin locks
%UTIL: lock utilization = total lock hold time / sampling time
AVER-HTM: average lock hold time = total lock hold time / total number of non-recursive lock requests
Holding a lock for a long time limits the scalability of an application. As Brian Goetz notes in Java Concurrency in Practice, if an operation holds a lock for more than 2 milliseconds and every operation requires that lock, then no matter how many idle processors are available, throughput cannot exceed 500 operations per second. Reducing the hold time to 1 millisecond raises the lock-bound throughput to 1000 operations per second. This is in fact a conservative estimate of the cost of holding a lock for a long time, because it does not include the overhead of lock contention itself: busy-waiting and thread switching caused by failed lock acquisitions also waste CPU time. The most effective way to reduce the likelihood of lock contention is to hold locks for as short a time as possible. This can be done by moving code that does not need lock protection out of the synchronized block, especially expensive or potentially blocking work such as I/O operations. In Example 1 we use JLM (Java Lock Monitor) to observe lock usage. foo1 protects the entire function with synchronized, while foo2 protects only the shared variable maph. AVER_HTM shows the average hold time of each lock. After the irrelevant statements are moved out of the synchronized block, the lock hold time drops and the program's execution time shortens.

Example 1. Avoid time-consuming computation in the critical section
import java.util.Map;
import java.util.HashMap;

public class TimeConsumingLock implements Runnable {
    private final Map<String, String> maph = new HashMap<String, String>();
    private int opNum;

    public TimeConsumingLock(int on) {
        opNum = on;
    }

    public synchronized void foo1(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        if (null == key) {
            return;
        } else {
            maph.put(key, value);
        }
    }

    public void foo2(int k) {
        String key = Integer.toString(k);
        String value = key + "value";
        if (null == key) {
            return;
        } else {
            synchronized (this) {
                maph.put(key, value);
            }
        }
    }

    public void run() {
        for (int i = 0; i < opNum; i++) {
            //foo1(i); // time consuming
            foo2(i);   // this will be better
        }
    }
}
Results from the JLM report

Result of using foo1:

MON-NAME [08121048] TimeConsumingLock@d7968db8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 5318465 5318465 35 0 349190349 38 8419428

Execution time: 16106 milliseconds

Result of using foo2:

MON-NAME [d594c53c] TimeConsumingLock@d6dd67b0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 5635938 5635938 71 0 373087821 27 8968423

Execution time: 12157 milliseconds
Lock splitting and lock striping
Another way to reduce lock contention is to lower the frequency with which threads request locks. Lock splitting and lock striping achieve this. Independent state variables should be protected by independent locks, but developers sometimes mistakenly use a single lock to guard all of them. These techniques reduce lock granularity and thereby improve scalability; however, the locks must be assigned carefully to keep the risk of deadlock low.

If a lock guards more than one independent state variable, you may be able to split it so that each lock guards a different variable, improving scalability. After the change, the request frequency on each lock is lower. Splitting can effectively turn most lock acquisitions into uncontended ones, improving both performance and scalability. In Example 2 we split a lock that originally protected two independent object variables into two locks, each protecting one variable. In the JLM results, the original lock SplittingLock@d6dd3078 becomes two locks, Java/util/HashSet@d6dd7be0 and Java/util/HashSet@d6dd7be0, and both the number of requests (GETS) and the degree of contention (SLOW, TIER2, TIER3) drop sharply. The program's execution time falls from 12981 ms to 4797 ms.

When contention on a lock is fierce, splitting it is likely to yield two locks that are still heavily contended. Although this lets two threads execute concurrently and brings a minor improvement in scalability, it still cannot significantly improve concurrency on a system with many processors. Lock splitting can be extended to lock striping: the lock is divided into a set of locks, each guarding an independent subset of the data. For example, the implementation of ConcurrentHashMap uses an array of 16 locks, each protecting 1/16 of the HashMap's buckets. Assuming hash values are evenly distributed, this reduces the requests on each lock to roughly 1/16 of the original and allows ConcurrentHashMap to support up to 16 concurrent writers. When heavily loaded multi-processor systems need still better concurrency, the number of locks can be increased. In Example 3 we simulate ConcurrentHashMap's use of lock striping, using four locks to protect different parts of an array. In the JLM results, the original lock StrippingLock@d79962d8 becomes four locks such as Java/lang/Object@d79964b8, and the degree of contention (TIER2, TIER3) drops sharply. The program's execution time falls from 5536 ms to 1857 ms.

Example 2. Lock splitting
import java.util.HashSet;
import java.util.Set;

public class SplittingLock implements Runnable {
    private final Set<String> users = new HashSet<String>();
    private final Set<String> queries = new HashSet<String>();
    private int opNum;

    public SplittingLock(int on) {
        opNum = on;
    }

    public synchronized void addUser1(String u) {
        users.add(u);
    }

    public synchronized void addQuery1(String q) {
        queries.add(q);
    }

    public void addUser2(String u) {
        synchronized (users) {
            users.add(u);
        }
    }

    public void addQuery2(String q) {
        synchronized (queries) {
            queries.add(q);
        }
    }

    public void run() {
        for (int i = 0; i < opNum; i++) {
            String user = new String("user");
            user += i;
            addUser1(user);

            String query = new String("query");
            query += i;
            addQuery1(query);
        }
    }
}
Results from the JLM report

Result of using addUser1 and addQuery1:

MON-NAME [d5848cb0] SplittingLock@d6dd3078 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 9004711 9004711 101 0 482982391 44 10996987

Execution time: 12981 milliseconds

Result of using addUser2 and addQuery2:

MON-NAME [d5928c98] Java/util/HashSet@d6dd7be0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1875510 1875510 38 0 108706364 14 2546875

MON-NAME [d5928c98] Java/util/HashSet@d6dd7be0 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 272365 272365 0 0 15154239 352397 1 3042

Execution time: 4797 milliseconds
Example 3. Lock striping
public class StrippingLock implements Runnable {
    private final Object[] locks;
    private static final int N_LOCKS = 4;
    private final String[] share;
    private int opNum;
    private int n_anum;

    public StrippingLock(int on, int anum) {
        opNum = on;
        n_anum = anum;
        share = new String[n_anum];
        locks = new Object[N_LOCKS];
        for (int i = 0; i < N_LOCKS; i++)
            locks[i] = new Object();
    }

    public synchronized void put1(int indx, String k) {
        share[indx] = k; // acquire the object lock
    }

    public void put2(int indx, String k) {
        synchronized (locks[indx % N_LOCKS]) {
            share[indx] = k; // acquire the corresponding lock
        }
    }

    public void run() {
        // the expensive put
        /*
        for (int i = 0; i < opNum; i++) {
            put1(i % n_anum, Integer.toString(i + 1));
        }
        */
        // the cheap put
        for (int i = 0; i < opNum; i++) {
            put2(i % n_anum, Integer.toString(i + 1));
        }
    }
}
Results from the JLM report

Result of using put1:

MON-NAME [08121228] StrippingLock@d79962d8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 4830690 4830690 460 0 229538313 18 5010789

Execution time: 5536 milliseconds

Result of using put2:

MON-NAME [08121388] Java/lang/Object@d79964b8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 4591046 4591046 1517 0 151042525 13 3016162

MON-NAME [08121330] Java/lang/Object@d79964c8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1717579 1717579 523 0 50596994 5 958796

MON-NAME [081213e0] Java/lang/Object@d79964d8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 1814296 1814296 536 0 58043786 5 1113454

MON-NAME [08121438] Java/lang/Object@d79964e8 (Object)
%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM
0 3126427 3126427 901 0 96627408 1857005 1979

Execution time: 1857 milliseconds
Avoid hot fields
In some applications, a shared variable is used to cache a commonly used computed result, for example the size of a linked list, a counter, or a reference to the list's head node. The shared variable must be modified on every update to stay valid, so in a multi-threaded application it must be protected by a lock. An optimization that works well in single-threaded code thus becomes a "hot field" that limits scalability. If a queue is designed to sustain high throughput under multi-threaded access, consider not updating a size field on every enqueue and dequeue operation. ConcurrentHashMap avoids this problem by maintaining an independent counter in each lock-striped segment of the array, protected by that segment's lock, instead of maintaining a single global count.

Alternatives to exclusive locks
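The per-segment counter idea just described can be sketched as a striped counter. The class name and the stripe-selection scheme below are illustrative assumptions, not code from ConcurrentHashMap: each thread increments one of several AtomicLongArray slots instead of a single hot field, and reads sum all slots.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical striped counter: updates are spread across several
// slots to avoid a single "hot field"; reading sums all slots.
public class StripedCounter {
    private final AtomicLongArray slots;

    public StripedCounter(int stripes) {
        slots = new AtomicLongArray(stripes);
    }

    public void increment() {
        // Pick a slot per thread so concurrent threads usually
        // touch different slots (a simple, illustrative scheme).
        int idx = (int) (Thread.currentThread().getId() % slots.length());
        slots.incrementAndGet(idx);
    }

    public long sum() {
        long total = 0;
        for (int i = 0; i < slots.length(); i++) {
            total += slots.get(i);
        }
        return total;
    }
}
```

The trade-off is that sum() is no longer a single atomic read; it is a moving snapshot, which is acceptable for statistics-style counters.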
The third technique for mitigating the performance impact of contended locks is to give up exclusive locks and manage shared state with more efficient concurrent mechanisms, such as concurrent containers, read-write locks, immutable objects, and atomic variables. java.util.concurrent.locks.ReadWriteLock implements a multiple-reader, single-writer lock: multiple readers can access the shared resource concurrently, but a writer must acquire the lock exclusively. For data structures on which most operations are reads, ReadWriteLock offers better concurrency than an exclusive lock. Atomic variables provide a way to avoid the lock contention caused by "hot field" updates, such as counters, sequence generators, or the head-node reference of a linked list. In Example 4 we use atomic operations to update each element of an array, avoiding an exclusive lock; the program's execution time drops from 23550 ms to 842 ms.

Example 4. Atomic operations on an array
import java.util.concurrent.atomic.AtomicLongArray;

public class AtomicLock implements Runnable {
    private final long d[];
    private final AtomicLongArray a;
    private int a_size;

    public AtomicLock(int size) {
        a_size = size;
        d = new long[size];
        a = new AtomicLongArray(size);
    }

    public synchronized void set1(int idx, long val) {
        d[idx] = val;
    }

    public synchronized long get1(int idx) {
        long ret = d[idx];
        return ret;
    }

    public void set2(int idx, long val) {
        a.addAndGet(idx, val);
    }

    public long get2(int idx) {
        long ret = a.get(idx);
        return ret;
    }

    public void run() {
        for (int i = 0; i < a_size; i++) {
            // the slower operations
            //set1(i, i);
            //get1(i);

            // the quicker operations
            set2(i, i);
            get2(i);
        }
    }
}
Result of using set1 and get1:
Execution time: 23550 milliseconds

Result of using set2 and get2:
Execution time: 842 milliseconds
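ReadWriteLock, mentioned above as another alternative to an exclusive lock, can be sketched as a read-mostly cache. ReadMostlyCache is a hypothetical class name of ours; the pattern assumes reads vastly outnumber writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A read-mostly map guarded by a ReadWriteLock: many readers may
// hold the read lock concurrently, while a writer acquires the
// write lock exclusively.
public class ReadMostlyCache {
    private final Map<String, String> map = new HashMap<String, String>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    public String get(String key) {
        rw.readLock().lock();
        try {
            return map.get(key);   // concurrent with other readers
        } finally {
            rw.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        rw.writeLock().lock();
        try {
            map.put(key, value);   // exclusive access
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

Note that if writes are frequent, the extra bookkeeping of a ReentrantReadWriteLock can make it slower than a plain exclusive lock; it pays off only when reads dominate.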
Use concurrent containers
The java.util.concurrent package provides highly efficient, thread-safe concurrent containers. These containers guarantee thread safety while optimizing the common operations performed under heavy multi-threaded access, which makes them well suited to applications running on multi-core platforms, offering high performance and high scalability. The Amino project provides additional efficient concurrent containers and algorithms.

Use immutable data and thread-local data
Immutable data stays unchanged throughout its lifetime, so each thread can safely keep its own copy for fast reads. ThreadLocal data is visible only to the thread that owns it, so no data sharing between threads occurs. ThreadLocal can be used to improve many existing shared-data designs: for example, an object pool or a waiting queue shared by all threads can become a per-thread object pool or waiting queue. Replacing a traditional FIFO-queue scheduler with a work-stealing scheduler is an example of using thread-local data.
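The per-thread object idea can be sketched as follows. PerThreadBuffer is a hypothetical class of ours; it uses the initialValue() override idiom of ThreadLocal, so each thread reuses its own StringBuilder and no lock is needed at all.

```java
// Hypothetical per-thread buffer "pool": every thread gets its own
// StringBuilder via ThreadLocal, so access needs no synchronization.
public class PerThreadBuffer {
    private static final ThreadLocal<StringBuilder> BUF =
        new ThreadLocal<StringBuilder>() {
            @Override
            protected StringBuilder initialValue() {
                return new StringBuilder();
            }
        };

    public static StringBuilder buffer() {
        StringBuilder sb = BUF.get();
        sb.setLength(0); // reset before reuse; no other thread sees it
        return sb;
    }
}
```

Each thread pays the allocation cost once and then reuses the buffer, trading a small amount of per-thread memory for the complete elimination of lock contention.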
Conclusion
Locks are an indispensable tool for developing multi-threaded applications, and now that multi-core platforms are mainstream, using locks correctly is becoming a basic skill for developers. Although lock-free programming and transactional memory have already entered software developers' field of view, for the foreseeable future lock-based programming remains the most important parallel programming technique. We hope the methods proposed in this article help you use locks correctly.