Concurrent collection classes in Java

Source: Internet
Author: User

The first associative collection class to appear in the Java class library was Hashtable, which was part of JDK 1.0. Hashtable provided an easy-to-use, thread-safe, associative map capability, and it was certainly convenient. However, the thread safety came at a price: all of Hashtable's methods are synchronized. At that time, even uncontended synchronization had a measurable performance cost. HashMap, the successor to Hashtable, appeared as part of the Collections framework in JDK 1.2 and addressed thread safety by providing an unsynchronized base class and a synchronized wrapper, Collections.synchronizedMap. By separating the base functionality from thread safety, Collections.synchronizedMap lets users who need synchronization have it, while users who do not need it do not have to pay for it.

The simple approach to synchronization taken by both Hashtable and synchronizedMap (synchronizing each method on the Hashtable or on the synchronized Map wrapper object) has two principal deficiencies. First, it is an impediment to scalability, because only one thread can access the hash table at a time. Second, it is still insufficient to provide true thread safety, because many common compound operations require additional synchronization. While simple operations such as get() and put() complete safely without extra synchronization, several common sequences of operations, such as iteration or put-if-absent, still require external synchronization to avoid data races.

Conditional thread safety

The synchronized collection wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe: every individual operation is thread-safe, but a sequence of operations whose control flow depends on the results of earlier operations can be subject to data races. The first fragment in Listing 1 shows the common put-if-absent idiom: if an entry is not already in the Map, add it. Unfortunately, between the time containsKey() returns and the time put() is called, another thread may have inserted a value with the same key. To guarantee exactly one insertion, wrap the pair of statements in a synchronized block that synchronizes on the Map m.

The other fragments in Listing 1 deal with iteration. In the first, the result of List.size() can become invalid while the loop executes, because another thread can remove an entry from the list. With unlucky timing, an entry is removed by another thread just after the loop condition is checked for the last iteration, and List.get() is then called with an index that no longer exists. For an ArrayList-backed list this call throws an IndexOutOfBoundsException, so the loop fails partway through. What can you do to avoid this? If another thread may be accessing a List while you are iterating over it, you must lock the entire List for the duration of the iteration by wrapping the loop in a synchronized block that synchronizes on the List l. This removes the data race, but at a further cost to concurrency, because locking the whole List during iteration can block other threads from accessing it for a long time.

The Collections framework introduced iterators for traversing a list or other collection, which streamlines the process of iterating over a collection's elements. However, the iterators implemented by the java.util collection classes are fail-fast: if one thread changes a collection while another thread is traversing it through an Iterator, a subsequent call on that iterator, typically Iterator.next(), throws ConcurrentModificationException. As with the previous example, to prevent ConcurrentModificationException you must lock the entire List while iterating, by wrapping the loop in a synchronized block that synchronizes on the List l. (Alternatively, you can call List.toArray() and iterate over the resulting array without synchronization, but this can be expensive if the list is large.)
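The fail-fast behavior and the toArray() alternative can be illustrated with a small sketch. The class and method names below are hypothetical, and the modification is made from the iterating thread itself so that the outcome is repeatable; with two threads the same check fires, but only when the timing happens to line up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class FailFastDemo {

    // Changes the list while an iterator is open. Returns true if the
    // iterator noticed the change and threw ConcurrentModificationException.
    static boolean modificationDetected() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
                String s = it.next();
                if (s.equals("a")) {
                    list.add("d"); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a private array copy instead of the shared list.
    // On a Collections.synchronizedList wrapper, toArray() is a single
    // synchronized call, so the copy is taken atomically; the loop itself
    // then needs no lock, at the cost of copying the whole list.
    static int sumOfSnapshot(List<Integer> shared) {
        Integer[] snapshot = shared.toArray(new Integer[0]);
        int sum = 0;
        for (Integer n : snapshot) {
            sum += n;
        }
        return sum;
    }
}
```

The snapshot approach trades memory and copying time for a shorter lock hold, which is why it only pays off for small lists or infrequent traversals.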

Listing 1. Common race conditions in synchronized collections

    Map m = Collections.synchronizedMap(new HashMap());
    List l = Collections.synchronizedList(new ArrayList());

    // put-if-absent idiom -- contains a race condition
    // may require external synchronization
    if (!m.containsKey(key))
      m.put(key, value);

    // ad-hoc iteration -- contains race conditions
    // may require external synchronization
    for (int i = 0; i < l.size(); i++) {
      doSomething(l.get(i));
    }

    // normal iteration -- can throw ConcurrentModificationException
    // may require external synchronization
    for (Iterator i = l.iterator(); i.hasNext(); ) {
      doSomething(i.next());
    }
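The external synchronization that the comments in Listing 1 call for might look like the following sketch. It assumes the arguments are wrappers created by Collections.synchronizedMap and Collections.synchronizedList, which lock on the wrapper object itself; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Map;

final class ClientSideLocking {

    // put-if-absent as one atomic step: the check and the insert both run
    // while holding the same lock the wrapper uses for its own methods.
    // Returns the existing value, or null if the new value was inserted.
    static <K, V> V putIfAbsent(Map<K, V> m, K key, V value) {
        synchronized (m) {
            if (!m.containsKey(key)) {
                m.put(key, value);
                return null;
            }
            return m.get(key);
        }
    }

    // Iteration with the whole list locked: no other thread can add or
    // remove elements until the loop has finished.
    static int totalLength(List<String> l) {
        int total = 0;
        synchronized (l) {
            for (String s : l) {
                total += s.length();
            }
        }
        return total;
    }
}
```

This only works if every thread locks on the same wrapper object; synchronizing on the underlying HashMap or ArrayList instead would not exclude callers that go through the wrapper.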

A false sense of confidence

The conditional thread safety provided by synchronizedList and synchronizedMap carries a hidden hazard: developers assume that because these collections are synchronized they are fully thread-safe, and so neglect to synchronize compound operations properly. The result is that such programs appear to work under light load, but under heavy load they may start throwing NullPointerException or ConcurrentModificationException.

Scalability Issues

Scalability describes how an application's throughput behaves as its workload and available computing resources increase. A scalable program can handle a proportionally larger workload when given more processors, memory, or I/O bandwidth. Locking a shared resource for exclusive access is a scalability bottleneck: it prevents other threads from accessing that resource, even if idle processors are available to run those threads. To achieve scalability, we must eliminate or reduce our dependence on exclusive resource locks.

The bigger problem with the synchronized Collections wrappers, and with the earlier Hashtable and Vector classes, is that they synchronize on a single lock. Only one thread can access the collection at a time, so if one thread is reading from a Map, every other thread that wants to read from or write to it must wait. The most common Map operations, get() and put(), may involve more work than is obvious: when traversing a hash bucket to find a specific key, get() may have to call Object.equals() on a large number of candidates. If the hashCode() function used by the key class does not spread values evenly over the hash range, or if there are many hash collisions, certain bucket chains may be much longer than others. Traversing a long hash chain and calling equals() on some percentage of its elements can be slow. The problem with get() and put() being expensive under these conditions is not only that the access itself is slow, but also that all other threads are locked out of the Map while that hash chain is being traversed.

(Translator's note: a hash table stores objects in buckets according to a numeric key called the hash value, which is computed from the object's contents. Each distinct hash value maps to a bucket. To find an object, you compute its hash value and search only the corresponding bucket; locating the bucket quickly reduces the number of objects that have to be examined.)

The fact that get() can take considerable time to execute is made much worse in some cases by the conditional thread-safety problem discussed earlier. The race conditions illustrated in Listing 1 often make it necessary to hold the lock on the collection for much longer than a single operation takes. If you hold the lock on the collection for an entire iteration, other threads may be stalled for a long time waiting for it.

Example: a simple cache

One of the most common uses of Map in server applications is to implement a cache. Server applications may cache file contents, generated pages, results of database queries, DOM trees associated with parsed XML files, and many other types of data. The main purpose of a cache is to reuse the results of earlier processing in order to reduce service time and increase throughput. A typical characteristic of cache workloads is that retrievals far outnumber updates, so (ideally) a cache should offer very good get() performance. A cache that impedes performance is worse than no cache at all.

If you use synchronizedMap to implement a cache, you introduce a potential scalability bottleneck into your application. Only one thread can access the Map at a time, and this applies both to threads that are fetching a value from the Map and to threads that are inserting a new (key, value) pair.
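To make the bottleneck concrete, here is a minimal sketch of such a cache. The SynchronizedCache class and its loader function are hypothetical names chosen for illustration; the point is only that every lookup, hit or miss, has to take the one lock guarding the whole map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class SynchronizedCache<K, V> {

    private final Map<K, V> map = Collections.synchronizedMap(new HashMap<>());
    private final Function<K, V> loader; // hypothetical: computes a missing value

    SynchronizedCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = map.get(key);        // takes the map-wide lock, even for a hit
        if (value == null) {
            // Computed outside the lock, so two threads that miss on the
            // same key at the same time may both compute the value.
            value = loader.apply(key);
            map.put(key, value);       // takes the map-wide lock again
        }
        return value;
    }
}
```

With many reader threads, the calls to map.get() are serialized even though they do not conflict with each other, which is exactly the workload a cache is supposed to make fast.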

Reduce lock granularity

One way to improve the concurrency of a HashMap while still providing thread safety is to dispense with the single lock for the entire table and use a lock for each hash bucket (or, more commonly, a pool of locks in which each lock protects several buckets). Multiple threads can then access different portions of the Map at the same time without contending for a single collection-wide lock. This approach immediately improves the scalability of insertion, retrieval, and removal operations. Unfortunately, this concurrency has a cost: it becomes harder to implement methods that operate on the collection as a whole, such as size() or isEmpty(), because they may need to acquire many locks at once or risk returning an inaccurate result. For situations such as cache implementation this is a sensible trade-off, because retrievals and insertions are frequent while size() and isEmpty() are called far less often.
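The lock-pool idea can be sketched as follows. This is an illustrative toy under stated assumptions, not how any library class is implemented: StripedMap is a made-up name, each lock guards its own small HashMap (standing in for a group of buckets), and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StripedMap<K, V> {

    private final Object[] locks;                              // one lock per stripe
    private final List<Map<K, V>> stripes = new ArrayList<>(); // data guarded by locks[i]

    StripedMap(int stripeCount) {
        locks = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new Object();
            stripes.add(new HashMap<>());
        }
    }

    // Picks the stripe for a key; the mask keeps the index non-negative.
    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % locks.length;
    }

    V get(K key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {   // blocks only threads using the same stripe
            return stripes.get(i).get(key);
        }
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {
            return stripes.get(i).put(key, value);
        }
    }

    // Whole-collection operation: visits every stripe, taking one lock at a
    // time. Under concurrent updates the total may already be out of date
    // when it is returned, which is the trade-off described in the text.
    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                total += stripes.get(i).size();
            }
        }
        return total;
    }
}
```

An exact size() would have to hold all the stripe locks simultaneously, which briefly turns the map back into a single-lock collection.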

ConcurrentHashMap

The ConcurrentHashMap class from the util.concurrent package (which will also appear in the java.util.concurrent package in JDK 1.5) is a thread-safe implementation of Map that provides far better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, simultaneous reads and writes can usually execute concurrently, and multiple simultaneous writes can often execute concurrently. (A related class offers similar multiple-reader concurrency but allows only a single active writer.) ConcurrentHashMap is designed to optimize retrieval operations; in fact, a successful get() usually completes without any locking at all. Achieving thread safety without locking is subtle and requires a deep understanding of the details of the Java Memory Model. The ConcurrentHashMap implementation, along with the rest of util.concurrent, has been reviewed by concurrency experts for correctness and thread safety. Next month's article will look at the implementation details of ConcurrentHashMap.

ConcurrentHashMap achieves higher concurrency by slightly relaxing the promises it makes to callers. A retrieval returns the value inserted by the most recently completed insert operation, and may also return a value added by an insert that is concurrently in progress (but it never returns a nonsense result). Iterators over a ConcurrentHashMap (obtained through its keySet(), values(), or entrySet() views) return each element at most once and never throw ConcurrentModificationException, but they may or may not reflect insertions or removals that occur after the iterator was constructed. No table-wide lock is needed to provide thread safety when iterating over the collection. In any application that does not rely on locking the entire table to prevent updates, ConcurrentHashMap can be used as a replacement for synchronizedMap or Hashtable.
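A short sketch of what this looks like from the caller's side, using the java.util.concurrent version of the class. The ConcurrentMapDemo class and its method names are hypothetical; putIfAbsent() and the keySet() view are part of the standard API.

```java
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentMapDemo {

    // Atomic put-if-absent without any external lock. Returns whichever
    // value ends up associated with the key: the existing one if the key
    // was already present, otherwise the value just inserted.
    static String register(ConcurrentHashMap<String, String> map, String key, String value) {
        String previous = map.putIfAbsent(key, value);
        return previous != null ? previous : value;
    }

    // Removes entries from the map while iterating over it. A HashMap
    // iterator would throw ConcurrentModificationException here; this
    // iterator tolerates the removals. Returns how many keys were visited.
    static int drain(ConcurrentHashMap<String, Integer> map) {
        int visited = 0;
        for (String key : map.keySet()) {
            map.remove(key);
            visited++;
        }
        return visited;
    }
}
```

Because the iterator is only weakly consistent, code that needs a stable view of the whole map at one instant still has to arrange that separately, for example by copying the entries first.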

These improvements let ConcurrentHashMap provide far higher scalability than Hashtable, without giving up effectiveness in many common cases such as shared caches.

So how much better is it?

Table 1 gives a rough idea of the difference in scalability between Hashtable and ConcurrentHashMap. In each run, N threads concurrently execute a tight loop in which they retrieve random key values from either a Hashtable or a ConcurrentHashMap; 80 percent of the failed retrievals go on to perform a put(), and 1 percent of the successful retrievals go on to perform a remove(). The tests ran on a dual-processor Xeon system running Linux. The data are run times for 10,000,000 iterations, measured in milliseconds and then normalized to the single-thread ConcurrentHashMap case. As you can see, ConcurrentHashMap continues to scale as the number of threads grows, whereas the performance of Hashtable degrades almost immediately once there is lock contention.

The number of threads in this test may look small compared with a typical server application. However, because each thread does nothing but hit the table repeatedly, the test simulates the contention that a much larger number of threads would produce when using the table while doing real work.

Table 1. Scalability of Hashtable versus ConcurrentHashMap

Threads    ConcurrentHashMap    Hashtable
1          1.00                 1.03
2          2.59                 32.40
4          5.58                 78.23
8          13.21                163.48
16         27.58                341.21
32         57.27                778.41

CopyOnWriteArrayList

In concurrent applications where traversals greatly outnumber insertions or removals, the CopyOnWriteArrayList class is intended as a replacement for ArrayList. This is often the case when the list stores listeners, as in AWT or Swing applications or in JavaBeans classes in general. (The related CopyOnWriteArraySet uses a CopyOnWriteArrayList to implement the Set interface.)

If you use an ordinary ArrayList to store a list of listeners, then as long as the list is mutable and may be accessed by more than one thread, you must either lock the entire list while iterating or clone it (under the lock) before iterating, and both options are expensive. CopyOnWriteArrayList instead creates a fresh copy of the list whenever a mutating operation is performed, and its iterators are guaranteed to return the state of the list as it was when the iterator was constructed, without throwing ConcurrentModificationException. You do not need to clone the list before iterating or lock it during iteration, because the copy of the list that the iterator sees never changes. In other words, CopyOnWriteArrayList holds a mutable reference to an immutable array, so as long as you hold on to that reference you get the thread-safety benefits of immutability without locking.
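A listener list of this kind might be sketched as follows. EventSource and its nested Listener interface are hypothetical names used only for illustration; CopyOnWriteArrayList is the java.util.concurrent version of the class.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class EventSource {

    // Hypothetical callback interface for this sketch.
    interface Listener {
        void onEvent(String event);
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener l) {
        listeners.add(l);      // copies the backing array
    }

    void removeListener(Listener l) {
        listeners.remove(l);   // copies the backing array
    }

    // No lock and no clone: the for-each loop iterates over the array that
    // was current when its iterator was created, so listeners may add or
    // remove themselves during the callback without disturbing the loop.
    void fire(String event) {
        for (Listener l : listeners) {
            l.onEvent(event);
        }
    }
}
```

The cost moves from the readers to the writers: each addListener() or removeListener() call copies the whole array, which is acceptable only because registrations are rare compared with event deliveries.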

Conclusion

The synchronized collection classes Hashtable and Vector, and the synchronized wrapper classes Collections.synchronizedMap and Collections.synchronizedList, provide basic, conditionally thread-safe implementations of Map and List. However, several factors make them unsuitable for highly concurrent applications: their single collection-wide lock is an impediment to scalability, and it is often necessary to lock a collection for a considerable time during iteration to prevent ConcurrentModificationException. The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread safety, at the cost of slightly weaker promises to their callers. ConcurrentHashMap and CopyOnWriteArrayList are not necessarily useful everywhere you might use HashMap or ArrayList, but they are designed to optimize specific common situations. Many concurrent applications will benefit from using them.

