Level: Introductory
Brian Goetz (brian@quiotix.com), Chief Consultant, Quiotix Corp.
September 28, 2003
In addition to many other useful concurrency building blocks, Doug Lea's util.concurrent package contains high-performance, thread-safe implementations of the major collection types, List and Map. In this month's Java theory and practice, Brian Goetz shows how replacing Hashtable or synchronizedMap with ConcurrentHashMap can benefit many concurrent programs. Share your thoughts on this article with the author and other readers in the discussion forum.
The first associative collection class in the Java class library was Hashtable, which was part of JDK 1.0. Hashtable provided an easy-to-use, thread-safe associative map, which was certainly convenient. However, the thread safety came at a price: all methods of Hashtable are synchronized. At that time, even uncontended synchronization carried a measurable performance cost. HashMap, the successor to Hashtable that arrived with the Collections framework in JDK 1.2, addressed thread safety by providing an unsynchronized base class and a synchronized wrapper, Collections.synchronizedMap. Separating the base functionality from thread safety lets users who need synchronization have it, while users who do not need it do not pay for it.
The simple approach to synchronization taken by both Hashtable and synchronizedMap (synchronizing every method of Hashtable or of the synchronized Map wrapper object) has two principal deficiencies. First, it is an impediment to scalability, because only one thread can access the hash table at a time. Second, it is still insufficient to provide true thread safety, because many common compound operations require additional synchronization. Simple operations such as get() and put() can complete safely without additional synchronization, but several common sequences of operations, such as iteration or put-if-absent, still require external synchronization to avoid data races.
Conditional thread safety
The synchronized collection wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe: every individual operation is thread-safe, but a sequence of operations whose control flow depends on the result of a previous operation may be subject to data races. The first snippet in Listing 1 shows the common put-if-absent idiom: if an entry is not already in the Map, add it. Unfortunately, between the time containsKey() returns and the time put() is called, another thread may insert a value with the same key. To ensure exactly one insertion, wrap the pair of statements in a synchronized block that synchronizes on the Map m.
The other examples in Listing 1 deal with iteration. In the first, the result of List.size() can become invalid while the loop runs, because another thread can remove entries from the list. If the timing is unlucky and another thread removes an entry just after the last iteration of the loop begins, List.get() is called with an index that no longer exists; for an ArrayList-backed list it throws IndexOutOfBoundsException, and the loop fails. What can you do to avoid this? If another thread may be accessing a List while you are iterating over it, you must lock the entire List for the duration of the iteration by wrapping the loop in a synchronized block that synchronizes on the List l. This addresses the data race, but at a further cost to concurrency, because locking the entire List during iteration can block other threads from accessing the list for a long time.
The Collections framework introduced iterators for traversing a list or other collection, which streamlines iterating over the elements of a collection. However, the iterators implemented in the java.util collection classes are fail-fast: if one thread changes a collection while another thread is traversing it through an Iterator, the next Iterator.hasNext() or Iterator.next() call throws ConcurrentModificationException. As in the previous example, to prevent ConcurrentModificationException you must lock the entire List while iterating by wrapping the iteration in a synchronized block that synchronizes on the List l. (Alternatively, you can call List.toArray() and iterate over the array without synchronization, but this can be expensive if the list is large.)
Listing 1. Common race conditions in synchronized maps
Map m = Collections.synchronizedMap(new HashMap());
List l = Collections.synchronizedList(new ArrayList());

// put-if-absent idiom -- contains a race condition
// may require external synchronization
if (!m.containsKey(key))
    m.put(key, value);

// ad-hoc iteration -- contains race conditions
// may require external synchronization
for (int i = 0; i < l.size(); i++) {
    doSomething(l.get(i));
}

// normal iteration -- can throw ConcurrentModificationException
// may require external synchronization
for (Iterator i = l.iterator(); i.hasNext(); ) {
    doSomething(i.next());
}
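The race conditions in Listing 1 can be closed by holding the wrapper's own lock around each compound operation. The following is a minimal sketch rather than part of the original listing: the class name SynchronizedIdioms and its helper methods are hypothetical, and generics are used for brevity. It relies on the documented behavior that the wrappers returned by Collections.synchronizedMap and Collections.synchronizedList synchronize on themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: client-side locking on synchronized wrappers.
final class SynchronizedIdioms {

    // put-if-absent made atomic: the check and the insert both run while
    // holding the wrapper's lock. Returns the value now mapped to the key.
    static <K, V> V putIfAbsent(Map<K, V> syncMap, K key, V value) {
        synchronized (syncMap) {
            if (!syncMap.containsKey(key)) {
                syncMap.put(key, value);
                return value;
            }
            return syncMap.get(key);
        }
    }

    // Iteration under the list's lock: no other thread can modify the list
    // through the wrapper until the loop finishes (and none can read it either).
    static int sumLocked(List<Integer> syncList) {
        synchronized (syncList) {
            int sum = 0;
            for (int x : syncList) {
                sum += x;
            }
            return sum;
        }
    }

    // Alternative: copy the contents in one synchronized call, then iterate
    // over the private copy without holding any lock.
    static int sumSnapshot(List<Integer> syncList) {
        Integer[] snapshot = syncList.toArray(new Integer[0]);
        int sum = 0;
        for (int x : snapshot) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        List<Integer> l = Collections.synchronizedList(new ArrayList<>());
        l.add(1);
        l.add(2);
        System.out.println(putIfAbsent(m, "a", 1));
        System.out.println(sumLocked(l) + " " + sumSnapshot(l));
    }
}
```

Both iteration variants are correct but neither is free: the locked loop blocks every other user of the list for its whole duration, and the snapshot pays for a full copy on every traversal.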
False sense of confidence
The conditional thread safety provided by synchronizedList and synchronizedMap also presents a hidden hazard: developers assume that because these collections are synchronized they are fully thread-safe, and then neglect to synchronize compound operations properly. The result is that such programs appear to work under light load, but under heavy load they may start throwing NullPointerException or ConcurrentModificationException.
Scalability problems
Scalability describes how an application's throughput behaves as its workload and available computing resources increase. A scalable program can handle a proportionally larger workload when given more processors, memory, or I/O bandwidth. Locking a shared resource for exclusive access is a scalability bottleneck: it prevents other threads from accessing that resource, even if idle processors are available to schedule those threads. To achieve scalability, we must eliminate or reduce our dependence on exclusive resource locks.
The larger problem with the synchronized collection wrappers, and with the earlier Hashtable and Vector classes, is that they synchronize on a single lock. This means that only one thread can access the collection at a time; if one thread is reading from a Map, all other threads that want to read from or write to it must wait. The most common Map operations, get() and put(), may involve more processing than is obvious: when traversing a hash bucket to find a specific key, get() may have to call Object.equals() on a large number of candidates. If the hashCode() function used by the key class does not spread values evenly over the hash range, or there are a large number of hash collisions, certain bucket chains can become much longer than others, and traversing a long hash chain and calling equals() on some percentage of its elements can be slow. The problem with get() and put() being expensive under these conditions is not only that access is slow, but that all other threads are locked out of the Map while the hash chain is being traversed.
(Translator's note: a hash table stores objects in buckets chosen by a hash value, a number computed from the key. Each hash value maps to one bucket, so to find an object you compute the hash of its key and search only the matching bucket, which reduces the number of objects that must be examined.)
The fact that get() can take considerable time to execute is made much worse in some cases by the conditional thread safety issue discussed earlier. The race conditions illustrated in Listing 1 often make it necessary to hold the lock on a collection for much longer than a single operation takes. If you hold the lock on the collection for an entire iteration, other threads may be stalled waiting for the lock for a long time.
Example: A simple cache
One of the most common uses of Map in server applications is to implement a cache. Server applications may cache file contents, generated pages, results of database queries, DOM trees associated with parsed XML files, and many other types of data. The primary purpose of a cache is to reuse the results of previous processing in order to reduce service time and increase throughput. A typical characteristic of a cache workload is that retrievals are much more common than updates, so (ideally) a cache should offer very good get() performance. A cache that is a performance bottleneck is worse than no cache at all.
If you use synchronizedMap to implement a cache, you introduce a potential scalability bottleneck into your application. Only one thread can access the Map at a time, and that includes both threads retrieving a value from the Map and threads inserting a new (key, value) pair into it.
Reducing lock granularity
One way to improve the concurrency of HashMap while still providing thread safety is to dispense with the single lock for the entire table and use a lock for each hash bucket (or, more commonly, a pool of locks in which each lock protects several buckets). This means that multiple threads can access different portions of a Map simultaneously, without contending for a single collection-wide lock. This approach directly improves the scalability of insertion, retrieval, and removal operations. Unfortunately, this concurrency has a cost: methods that operate on the collection as a whole, such as size() or isEmpty(), become harder to implement, because they may need to acquire many locks at once or risk returning an inaccurate result. For situations such as implementing a cache, however, this is a sensible trade-off, because retrieval and insertion are frequent, whereas size() and isEmpty() are called far less often.
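The lock-pool idea can be illustrated with a small sketch. This is not how ConcurrentHashMap is implemented; StripedMap, its stripe count, and its method names are hypothetical, and each "stripe" here is simply an ordinary HashMap guarded by its own lock. Keys are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lock-striped map: each lock guards one segment of the key space.
final class StripedMap<K, V> {
    private final Object[] locks;
    private final List<Map<K, V>> segments;

    StripedMap(int stripes) {
        locks = new Object[stripes];
        segments = new ArrayList<>(stripes);
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
            segments.add(new HashMap<>());
        }
    }

    // Maps a key to a stripe index; the mask keeps the hash non-negative.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    V get(K key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            return segments.get(i).get(key);
        }
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            return segments.get(i).put(key, value);
        }
    }

    V remove(K key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            return segments.get(i).remove(key);
        }
    }

    // Whole-map operation: locks one stripe at a time, so under concurrent
    // updates the total may already be out of date when it is returned.
    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                total += segments.get(i).size();
            }
        }
        return total;
    }
}
```

Threads working on keys that fall into different stripes never contend with each other, while size() shows the trade-off described above: it must either take every lock at once or, as in this sketch, accept a possibly stale answer.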
ConcurrentHashMap
The ConcurrentHashMap class from util.concurrent (which will also appear in the java.util.concurrent package in JDK 1.5) is a thread-safe implementation of Map that provides far better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, simultaneous reads and writes can usually execute concurrently, and multiple simultaneous writes can often execute concurrently. (A related class, ConcurrentReaderHashMap, offers similar multiple-reader concurrency but allows only a single active writer.) ConcurrentHashMap is designed to optimize retrieval operations; in fact, successful get() operations usually complete with no locking at all. Achieving thread safety without locking is tricky and requires a deep understanding of the details of the Java Memory Model. The ConcurrentHashMap implementation, along with the rest of util.concurrent, has been extensively reviewed by concurrency experts for correctness and thread safety. Next month's article will look at the implementation details of ConcurrentHashMap.
ConcurrentHashMap achieves higher concurrency by slightly relaxing the promises it makes to callers. A retrieval returns the value inserted by the most recently completed insert operation, and may also return a value added by an insertion that is concurrently in progress (but it never returns a nonsense result). Iterators obtained from the collection views of a ConcurrentHashMap (keySet(), values(), and entrySet()) return each element at most once and never throw ConcurrentModificationException, but they may or may not reflect insertions or removals made after the iterator was constructed. No table-wide locking is needed to provide thread safety while iterating over the collection. ConcurrentHashMap can replace synchronizedMap or Hashtable in any application that does not rely on locking the entire table to prevent updates.
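As a rough illustration of these guarantees, the sketch below uses the java.util.concurrent version of the class. The ConcurrentCacheDemo class and its method names are hypothetical; putIfAbsent() is the atomic put-if-absent operation defined by the ConcurrentMap interface in java.util.concurrent, which replaces the externally synchronized idiom from Listing 1.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache wrapper built on java.util.concurrent.ConcurrentHashMap.
final class ConcurrentCacheDemo {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Atomic put-if-absent with no external lock. Returns whichever value
    // ended up in the map: the existing one, or the candidate if none existed.
    String intern(String key, String candidate) {
        String previous = cache.putIfAbsent(key, candidate);
        return previous != null ? previous : candidate;
    }

    // Removes entries while iterating over the key set. The iterator is weakly
    // consistent, so this never throws ConcurrentModificationException.
    int removeByPrefix(String prefix) {
        int removed = 0;
        for (String key : cache.keySet()) {
            if (key.startsWith(prefix) && cache.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ConcurrentCacheDemo demo = new ConcurrentCacheDemo();
        System.out.println(demo.intern("page:/index", "first"));
        System.out.println(demo.intern("page:/index", "second"));
        System.out.println(demo.removeByPrefix("page:"));
    }
}
```

The same removeByPrefix loop written against a synchronizedMap would need a synchronized block around the whole traversal, and would fail with ConcurrentModificationException on a plain HashMap.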
These relaxed guarantees let ConcurrentHashMap scale far better than Hashtable without sacrificing efficiency in many common cases, such as a shared cache.
How much better?
Table 1 gives a rough idea of the difference in scalability between Hashtable and ConcurrentHashMap. In each run, N threads concurrently execute a tight loop in which they retrieve random keys from either a Hashtable or a ConcurrentHashMap, performing a put() on 80 percent of the failed retrievals and 1 percent of the successful retrievals. Tests were run on a dual-processor Xeon system running Linux. The data show the run time for 10,000,000 iterations, measured in milliseconds and normalized to the one-thread ConcurrentHashMap case. As you can see, ConcurrentHashMap continues to scale as the number of threads grows, whereas the performance of Hashtable degrades almost immediately once lock contention appears.
The number of threads in this test may seem small compared to a typical server application. However, because each thread does nothing but hit the table repeatedly, this simulates the contention of a much larger number of threads using the table while doing some amount of real work.
Table 1. Scalability of Hashtable versus ConcurrentHashMap
Number of threads | ConcurrentHashMap | Hashtable
1 | 1.00 | 1.03
2 | 2.59 | 32.40
4 | 5.58 | 78.23
8 | 13.21 | 163.48
16 | 27.58 | 341.21
32 | 57.27 | 778.41
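The test procedure described above could be approximated with a harness along the following lines. This is only a sketch under stated assumptions, not the benchmark used to produce Table 1: the MapScalabilityBench class, the key-space size, and the iteration counts are all hypothetical, there is no JIT warm-up, and results will vary widely between machines and JVMs.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical harness: N threads hammer one shared map with random lookups.
final class MapScalabilityBench {

    // Starts `threads` workers that each perform `iterations` lookups on `map`
    // and returns the elapsed wall-clock time in milliseconds.
    static long run(Map<Integer, Integer> map, int threads, int iterations, int keySpace) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < iterations; i++) {
                    Integer key = rnd.nextInt(keySpace);
                    boolean hit = map.get(key) != null;
                    // put() on 1% of successful and 80% of failed retrievals
                    double putChance = hit ? 0.01 : 0.80;
                    if (rnd.nextDouble() < putChance) {
                        map.put(key, i);
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long chm = run(new ConcurrentHashMap<>(), n, 1_000_000, 100_000);
            long ht = run(new Hashtable<>(), n, 1_000_000, 100_000);
            System.out.println(n + " threads: ConcurrentHashMap " + chm
                    + " ms, Hashtable " + ht + " ms");
        }
    }
}
```

Because both map types are exercised through the Map interface, the only variable between the two runs at each thread count is the locking strategy of the implementation.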
CopyOnWriteArrayList
The CopyOnWriteArrayList class is intended as a replacement for ArrayList in concurrent applications where traversals greatly outnumber insertions or removals. This is quite common when an ArrayList is used to store a list of listeners, such as in AWT or Swing applications or in JavaBeans classes in general. (The related CopyOnWriteArraySet uses a CopyOnWriteArrayList to implement the Set interface.)
If you use an ordinary ArrayList to store a list of listeners, then as long as the list remains mutable and may be accessed by multiple threads, you must either lock the entire list during iteration or clone it before iteration, and both options are expensive. CopyOnWriteArrayList instead creates a fresh copy of the list whenever a mutating operation is performed, and its iterators are guaranteed to return the state of the list at the time the iterator was constructed, without throwing ConcurrentModificationException. There is no need to clone the list before iterating or to lock it during iteration, because the copy of the list that the iterator sees never changes. In other words, CopyOnWriteArrayList contains a mutable reference to an immutable array, so as long as that reference is held, you get all the thread-safety benefits of immutability without the need for locking.
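A listener list is the typical use. The sketch below is illustrative only: the EventSource class and its Listener interface are hypothetical names, and the java.util.concurrent version of CopyOnWriteArrayList is assumed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event source that keeps its listeners in a CopyOnWriteArrayList.
final class EventSource {

    interface Listener {
        void onEvent(String event);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);      // copies the backing array
    }

    void removeListener(Listener listener) {
        listeners.remove(listener);   // copies the backing array
    }

    // Iterates over the snapshot that was current when the loop started.
    // No locking or cloning is needed, and a listener may add or remove
    // listeners (including itself) while the event is being delivered.
    void fire(String event) {
        for (Listener listener : listeners) {
            listener.onEvent(event);
        }
    }

    int listenerCount() {
        return listeners.size();
    }
}
```

A listener that unregisters itself from inside onEvent() does not disturb the delivery in progress: the remaining listeners in the snapshot are still called, and the removal takes effect from the next fire() onward. The price is one array copy per add or remove, which is why the class suits lists that are read far more often than they are changed.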
Conclusion
The synchronized collection classes, Hashtable and Vector, and the synchronized wrapper classes, Collections.synchronizedMap and Collections.synchronizedList, provide a basic, conditionally thread-safe implementation of Map and List. However, several factors make them unsuitable for highly concurrent applications: their single collection-wide lock is an impediment to scalability, and it is often necessary to lock a collection for a considerable time during iteration to prevent ConcurrentModificationException. The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread safety, with some minor compromises in their promises to callers. ConcurrentHashMap and CopyOnWriteArrayList are not necessarily useful everywhere you might use HashMap or ArrayList, but they are designed to optimize specific common situations. Many concurrent applications will benefit from their use.