Java Theory and Practice: Concurrent Collection Classes


In addition to many other useful concurrent building blocks, Doug Lea's util.concurrent package contains high-performance, thread-safe implementations of the major collection types List and Map. In this month's Java Theory and Practice, Brian Goetz shows how many concurrent programs will benefit from simply replacing Hashtable or synchronizedMap with ConcurrentHashMap.

The first associative collection class to appear in the Java class library was Hashtable, which was part of JDK 1.0. Hashtable provided an easy-to-use, thread-safe, associative map, which was certainly convenient. However, the thread safety came at a price: all methods of Hashtable were synchronized, and at that time even uncontended synchronization had a measurable performance cost. HashMap, the successor to Hashtable, appeared as part of the Collections framework in JDK 1.2 and addressed thread safety by providing an unsynchronized base class and a synchronized wrapper, Collections.synchronizedMap. Separating the basic functionality from thread safety allowed users who needed synchronization to have it, while users who did not need it no longer had to pay for it.

The simple approach to synchronization taken by both Hashtable and synchronizedMap (synchronizing each method on the Hashtable or on the synchronized Map wrapper object) has two principal deficiencies. First, it is an impediment to scalability, because only one thread can access the hash table at a time. Second, it is still insufficient to provide true thread safety, because many common compound operations require additional synchronization. While simple operations such as get() and put() can complete safely without additional synchronization, several common sequences of operations, such as iteration or put-if-absent, still require external synchronization to avoid data races.

Conditional thread safety

The synchronized collection wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe: every individual operation is thread-safe, but a sequence of operations whose control flow depends on the result of an earlier operation may be subject to data races. The first fragment in Listing 1 shows the common put-if-absent idiom: if an entry is not already in the Map, add it. Unfortunately, between the time the containsKey() method returns and the time the put() method is called, another thread may insert a value with the same key. If you want to guarantee that only one insert happens, you need to wrap the pair of statements in a synchronized block that synchronizes on the Map m.
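The fix described above can be sketched as follows. This is a minimal illustration, not code from the article: the class and method names (SyncPutIfAbsent, putIfAbsent) are hypothetical, and generics are used for readability even though the article's listing uses raw types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SyncPutIfAbsent {
    // Hypothetical helper: inserts value only if key is absent.
    // Returns the existing value, or null if this call performed the insert.
    static <K, V> V putIfAbsent(Map<K, V> m, K key, V value) {
        // Holding the wrapper's own lock makes the check and the insert one atomic step.
        synchronized (m) {
            if (!m.containsKey(key)) {
                m.put(key, value);
                return null;
            }
            return m.get(key);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<String, Integer>());
        System.out.println(putIfAbsent(m, "a", 1)); // null: this call inserted
        System.out.println(putIfAbsent(m, "a", 2)); // 1: the earlier value wins
    }
}
```

The block only helps if every thread that performs a compound operation on the map synchronizes on the same wrapper object.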

The other examples in Listing 1 deal with iteration. In the first of them, the result of the list's size() call may become invalid while the loop is executing, because another thread can remove an entry from the list. If the timing is unlucky and another thread deletes an entry just after the loop enters its last iteration, get() is called with an index that is no longer valid and the loop fails with an exception: an ArrayList throws IndexOutOfBoundsException from get(), and an implementation that returned null instead would likely cause doSomething() to throw a NullPointerException. What can you do to avoid this? If another thread may be accessing a List while you are iterating over it, you must lock the entire List for the duration of the iteration by wrapping the loop in a synchronized block that synchronizes on the List l. This removes the data race, but at a further cost to concurrency, because locking the entire List during iteration can block other threads from accessing it for a long time.

The Collections framework introduced iterators for traversing a list or other collection, which streamlines the process of iterating over the elements of a collection. However, the iterators implemented by the java.util collection classes are fail-fast: if one thread modifies a collection while another thread is traversing it through an Iterator, the next Iterator.hasNext() or Iterator.next() call throws ConcurrentModificationException. Just as in the previous example, if you want to prevent ConcurrentModificationException you must lock the entire List while you are iterating, by wrapping the iteration in a synchronized block that synchronizes on the List l. (Alternatively, you can call List.toArray() and iterate over the array without synchronization, but this is expensive if the list is large.)
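Both options from the paragraph above can be sketched side by side. The class and method names here are hypothetical, and summing integers merely stands in for whatever work the loop body does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class SafeIteration {
    // Option 1: hold the list's lock for the whole traversal.
    // Other threads cannot touch the list until the loop finishes.
    static int sumLocked(List<Integer> l) {
        int sum = 0;
        synchronized (l) {
            for (Iterator<Integer> i = l.iterator(); i.hasNext(); ) {
                sum += i.next();
            }
        }
        return sum;
    }

    // Option 2: copy to an array (toArray() on the wrapper is itself synchronized),
    // then iterate over the private snapshot without holding any lock.
    static int sumSnapshot(List<Integer> l) {
        Integer[] snapshot = l.toArray(new Integer[0]);
        int sum = 0;
        for (Integer x : snapshot) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> l = Collections.synchronizedList(new ArrayList<Integer>());
        l.add(1); l.add(2); l.add(3);
        System.out.println(sumLocked(l) + " " + sumSnapshot(l));
    }
}
```

Option 1 trades concurrency for a consistent view; option 2 trades a copy of the whole list for a shorter lock hold time.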

Listing 1. Common race conditions in synchronized collections

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    Map m = Collections.synchronizedMap(new HashMap());
    List l = Collections.synchronizedList(new ArrayList());

    // put-if-absent idiom: contains a race condition
    // may require external synchronization
    if (!m.containsKey(key))
        m.put(key, value);

    // ad-hoc iteration: contains race conditions
    // may require external synchronization
    for (int i = 0; i < l.size(); i++) {
        doSomething(l.get(i));
    }

    // normal iteration: can throw ConcurrentModificationException
    // may require external synchronization
    for (Iterator i = l.iterator(); i.hasNext(); ) {
        doSomething(i.next());
    }

A false sense of confidence

The conditional thread safety provided by synchronizedList and synchronizedMap also presents a hidden hazard: developers assume that because these collections are synchronized they are fully thread-safe, and so neglect to synchronize compound operations properly. The result is that while these programs appear to work under light load, under heavy load they may start throwing NullPointerException or ConcurrentModificationException.


Scalability Issues

Scalability describes how an application's throughput behaves as its workload and the available computing resources increase. A scalable program can handle a proportionally larger workload when given more processors, memory, or I/O bandwidth. Locking a shared resource for exclusive access is a scalability bottleneck: it prevents other threads from accessing that resource, even if idle processors are available to run those threads. To achieve scalability, we must eliminate or reduce our dependence on exclusive resource locks.

The larger problem with the synchronized collection wrappers, and with the earlier Hashtable and Vector classes, is that they synchronize on a single lock. This means that only one thread can access the collection at a time; if one thread is reading from a Map, all other threads that want to read from or write to it must wait. The most common Map operations, get() and put(), may involve more work than is obvious. When traversing a hash bucket to find a specific key, get() may have to call Object.equals() on a large number of candidates. If the hashCode() function used by the key class does not spread values evenly over the hash range, or produces a large number of hash collisions, some bucket chains will be much longer than others, and traversing a long chain while calling equals() on some percentage of its elements can be slow. The problem with get() and put() being expensive under these conditions is not only that access is slow, but also that all other threads are locked out of the Map while that hash chain is being traversed.

(Translator's note: a hash table stores objects in buckets according to a numeric key called the hash value, which is computed from the contents of the object. Objects with different hash values generally land in different buckets. To find an object, you only need to compute its hash value and search the corresponding bucket. Locating the bucket quickly reduces the number of objects that must be searched.)

The fact that get() can take a significant amount of time to execute in some cases is made much worse by the conditional thread-safety issue discussed above. The race conditions illustrated in Listing 1 often make it necessary to hold the single collection-wide lock for much longer than it takes to execute a single operation. If you hold the lock on the collection for an entire iteration, other threads may be stalled for a long time waiting for the lock.

Example: a simple cache

One of the most common uses of Map in server applications is to implement a cache. Server applications may cache file contents, generated pages, the results of database queries, DOM trees associated with parsed XML files, and many other types of data. The primary purpose of a cache is to reduce service time and increase throughput by reusing the results of previous computations. A typical characteristic of a cache workload is that retrievals are much more common than updates, so (ideally) a cache should offer very good get() performance. A cache that impedes performance is worse than no cache at all.

If you use synchronizedMap to implement a cache, you introduce a potential scalability bottleneck into your application. Only one thread can access the Map at a time, and that includes every thread that wants to retrieve a value from the Map as well as every thread that wants to insert a new (key, value) pair into it.

Reducing lock granularity

One way to improve the concurrency of a HashMap while still providing thread safety is to dispense with the single lock for the entire table and use a lock for each hash bucket (or, more commonly, a pool of locks in which each lock protects several buckets). This means that multiple threads can access different portions of a Map at the same time without contending for a single collection-wide lock. This approach immediately improves the scalability of insert, retrieve, and remove operations. Unfortunately, this concurrency comes at a cost: it becomes more difficult to implement methods that operate on the collection as a whole, such as size() or isEmpty(), because they may need to acquire many locks at once or else risk returning an inaccurate result. However, for situations such as implementing a cache, this is a sensible trade-off, because retrieval and insertion are frequent while size() and isEmpty() are called much less often.
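The lock-pool idea can be sketched as a map split into independently locked segments. This is a simplified illustration under stated assumptions, not the ConcurrentHashMap implementation: the StripedMap class and its methods are hypothetical, each segment is an ordinary HashMap guarded by its own monitor, null keys are not supported, and the stripe count must be positive.

```java
import java.util.HashMap;
import java.util.Map;

final class StripedMap<K, V> {
    // Each segment holds many buckets and is guarded by its own lock.
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            segments[i] = new HashMap<K, V>();
        }
    }

    private Map<K, V> segmentFor(Object key) {
        // Mask off the sign bit so the index is never negative.
        return segments[(key.hashCode() & 0x7fffffff) % segments.length];
    }

    V get(Object key) {
        Map<K, V> s = segmentFor(key);
        synchronized (s) { return s.get(key); }
    }

    V put(K key, V value) {
        Map<K, V> s = segmentFor(key);
        synchronized (s) { return s.put(key, value); }
    }

    // Whole-collection operations must visit every segment. The locks are taken
    // one at a time, so under concurrent updates the total may already be stale.
    int size() {
        int n = 0;
        for (Map<K, V> s : segments) {
            synchronized (s) { n += s.size(); }
        }
        return n;
    }
}
```

Threads whose keys hash to different segments never block each other, while size() shows the cost: it either tolerates an approximate answer, as here, or must hold every segment lock at once.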


ConcurrentHashMap

The ConcurrentHashMap class in the util.concurrent package (which also appears in JDK 1.5, in the java.util.concurrent package) is a thread-safe implementation of Map that offers far better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, simultaneous reads and writes can usually execute concurrently, and multiple simultaneous writes can often execute concurrently. (A related class offers similar concurrency for multiple readers but allows only one active writer.) ConcurrentHashMap is designed to optimize retrieval operations; in fact, a successful get() usually completes without any locking at all. Achieving thread safety without locking is tricky and requires a deep understanding of the details of the Java memory model. The ConcurrentHashMap implementation, along with the rest of the util.concurrent package, has been extensively reviewed by concurrency experts for correctness and thread safety. Next month's article will look at the implementation details of ConcurrentHashMap.

ConcurrentHashMap achieves higher concurrency by slightly relaxing the promises it makes to callers. A retrieval operation returns the value inserted by the most recently completed insert operation, and may also return a value added by an insert operation that is concurrently in progress (but it will never return a nonsense result). Iterators over a ConcurrentHashMap, obtained from its keySet(), values(), or entrySet() views, return each element at most once and never throw ConcurrentModificationException, but they may or may not reflect insertions or removals that occur after the iterator is constructed. No table-wide lock is needed to provide thread safety when iterating over the collection. ConcurrentHashMap can be used as a replacement for synchronizedMap or Hashtable in any application that does not rely on locking the entire table to prevent updates.
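The relaxed iterator behavior can be seen in a small sketch. It assumes the java.util.concurrent version of the class (JDK 5 and later) rather than the original util.concurrent package; the WeaklyConsistentDemo class and its method names are hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WeaklyConsistentDemo {
    // Modifies the map while iterating over it and returns how many entries were seen.
    static int iterateWhileModifying() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        int seen = 0;
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            it.next();
            seen++;
            // A HashMap iterator would fail fast after this insert; here it is
            // allowed, and the iterator may or may not observe the new entry.
            map.put("c", 3);
        }
        return seen;
    }

    // putIfAbsent() is atomic, so no external lock is needed for this idiom.
    // It returns null only for the call that actually inserted the value.
    static boolean insertOnce(ConcurrentHashMap<String, Integer> map, String key, Integer value) {
        return map.putIfAbsent(key, value) == null;
    }
}
```

The exact number of entries the loop sees is deliberately unspecified: "a" and "b" are always visited, while "c" may or may not be.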

These compromises enable ConcurrentHashMap to provide far better scalability than Hashtable without sacrificing its effectiveness in a wide variety of common use cases, such as shared caches.
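A shared cache of the kind mentioned above might look like the following sketch. It assumes the java.util.concurrent API; the PageCache class and its render() method are hypothetical stand-ins for an application's own expensive computation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PageCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();

    // Hypothetical stand-in for an expensive computation such as rendering a page.
    private String render(String path) {
        return "<html>" + path + "</html>";
    }

    String get(String path) {
        String page = cache.get(path);   // common case: a hit, no table-wide lock
        if (page == null) {
            String fresh = render(path);
            // Two threads may both render the same page; putIfAbsent keeps only
            // the first result, so every caller ends up with the same value.
            String prior = cache.putIfAbsent(path, fresh);
            page = (prior != null) ? prior : fresh;
        }
        return page;
    }

    int size() {
        return cache.size();
    }
}
```

The design accepts occasional duplicate rendering on a miss in exchange for never blocking readers on a hit, which suits the retrieval-heavy workload described earlier.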

How much better?

Table 1 gives a rough comparison of the scalability of Hashtable and ConcurrentHashMap. In each run, n threads concurrently executed a tight loop in which they retrieved random key values from either a Hashtable or a ConcurrentHashMap, with 80 percent of the failed retrievals performing a put() operation and 1 percent of the successful retrievals performing a remove(). The tests were run on a dual-processor Xeon system running Linux. The data shows the run time in milliseconds for 10,000,000 iterations, normalized to the single-thread case for ConcurrentHashMap. As you can see, the performance of ConcurrentHashMap continues to scale as the number of threads increases, while the performance of Hashtable degrades almost immediately once lock contention appears.

The number of threads in this test may look small compared with a typical server application. However, because each thread does nothing but hit the table repeatedly, this simulates the contention produced by a much larger number of threads that use the table while also doing some amount of real work.

Table 1. Scalability of Hashtable versus ConcurrentHashMap

Threads    ConcurrentHashMap    Hashtable
1          1.00                 1.03
2          2.59                 32.40
4          5.58                 78.23
8          13.21                163.48
16         27.58                341.21
32         57.27                778.41


CopyOnWriteArrayList

The CopyOnWriteArrayList class is intended as a replacement for ArrayList in concurrent applications where traversals greatly outnumber insertions and removals. This is quite common when an ArrayList is used to hold a list of listeners, as in AWT or Swing applications or in JavaBean classes in general. (The related CopyOnWriteArraySet uses a CopyOnWriteArrayList to implement the Set interface.)

If you use an ordinary ArrayList to hold a list of listeners, then as long as the list is mutable and may be accessed by more than one thread, you must either lock the entire list during iteration or clone it before iterating, both of which are expensive. CopyOnWriteArrayList instead creates a fresh copy of the list whenever an operation changes it, and its iterators are guaranteed to return the state of the list as it was when the iterator was created, without throwing ConcurrentModificationException. You do not have to clone the list before iterating or lock it during iteration, because the copy of the list that the iterator sees never changes. In other words, CopyOnWriteArrayList contains a mutable reference to an immutable array, so as long as you hold on to that reference you get the thread-safety benefits of immutability without any locking.
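The listener-list use case can be sketched as follows, assuming the java.util.concurrent version of CopyOnWriteArrayList. The EventSource class and its methods are hypothetical, and Runnable stands in for a real listener interface.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class EventSource {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<Runnable>();

    void addListener(Runnable r) {
        listeners.add(r);       // copies the backing array, then swaps it in
    }

    void removeListener(Runnable r) {
        listeners.remove(r);    // also copy-on-write
    }

    // Notifies every listener registered at the moment iteration starts and
    // returns how many were called. No lock is held and no clone is made:
    // the for-each iterator reads the array snapshot it was created with.
    int fire() {
        int notified = 0;
        for (Runnable r : listeners) {
            r.run();            // a listener may add or remove listeners safely
            notified++;
        }
        return notified;
    }
}
```

A listener that registers or unregisters another listener from inside run() does not disturb the traversal in progress; the change becomes visible on the next call to fire().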


Conclusion

The synchronized collection classes Hashtable and Vector, and the synchronized wrappers Collections.synchronizedMap and Collections.synchronizedList, provide basic, conditionally thread-safe implementations of Map and List. However, several factors make them unsuitable for highly concurrent applications: their single collection-wide lock is an impediment to scalability, and it is often necessary to hold that lock for a considerable time during iteration to prevent ConcurrentModificationException. The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread safety, at the price of slightly weaker promises to their callers. ConcurrentHashMap and CopyOnWriteArrayList are not necessarily useful everywhere you might use HashMap or ArrayList, but they are designed to optimize specific common situations. Many concurrent applications will benefit from using them.

(Reposted from http://hi.baidu.com/netpet/blog/item/30c701f41819466cdcc474cf.html)
