Java theory and practice: Concurrent collections classes

ConcurrentHashMap and CopyOnWriteArrayList offer thread safety and improved scalability.

Level: elementary

Brian Goetz (Brian@quiotix.com), Chief Consultant, Quiotix Corp

September 28, 2003

In addition to many other useful concurrency building blocks, Doug Lea's util.concurrent package contains high-performance, thread-safe implementations of the workhorse collection types List and Map. In this month's Java theory and practice, Brian Goetz shows you how to use ConcurrentHashMap to replace Hashtable or synchronizedMap, and how many concurrent programs will benefit.


The first associative collection class to appear in the Java class library was Hashtable, which was part of JDK 1.0. Hashtable provided an easy-to-use, thread-safe, associative map capability, and it was certainly convenient. However, the thread safety came at a price: all methods of Hashtable were synchronized. At that time, even uncontended synchronization had a measurable performance cost. The successor to Hashtable, HashMap, which appeared as part of the Collections framework in JDK 1.2, addressed thread safety by providing an unsynchronized base class and a synchronized wrapper, Collections.synchronizedMap. Separating the base functionality from the thread safety allowed users who needed synchronization to have it, while users who did not need it did not have to pay for it.

The simple approach to synchronization taken by both Hashtable and synchronizedMap (synchronizing every method of the Hashtable or of the synchronized Map wrapper object) has two principal deficiencies. First, it is an impediment to scalability, because only one thread can access the hash table at a time. Second, it is still insufficient to provide true thread safety, because many common compound operations require additional synchronization. Simple operations such as get() and put() can complete safely without additional synchronization, but several common sequences of operations, such as iteration or put-if-absent, still require external synchronization to avoid data races.

Conditionally thread-safe

The synchronized collections wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe: every individual operation is thread-safe, but a sequence of operations whose control flow depends on the results of earlier operations may be subject to data races. The first snippet in Listing 1 shows the common put-if-absent idiom: if an entry is not already in the Map, add it. Unfortunately, as written, another thread can insert a value with the same key between the time the containsKey() method returns and the time the put() method is called. If you want to ensure exactly-once insertion, you need to wrap the pair of statements in a synchronized block that synchronizes on the Map m.
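The fix described above can be sketched as follows. This is a minimal illustration, not code from the article: the class name PutIfAbsentExample and its helper method are hypothetical, and the sketch assumes that m is the wrapper returned by Collections.synchronizedMap, which uses itself as its lock.

```java
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class PutIfAbsentExample {
    /**
     * Atomically inserts value if key is absent and returns the value
     * that is associated with key afterwards.
     * Assumes m is the wrapper returned by Collections.synchronizedMap.
     */
    static <K, V> V putIfAbsent(Map<K, V> m, K key, V value) {
        // Hold the wrapper's own lock across BOTH calls, so that no other
        // thread can insert the key between containsKey() and put().
        synchronized (m) {
            if (!m.containsKey(key)) {
                m.put(key, value);
                return value;
            }
            return m.get(key);
        }
    }
}
```

Every thread that performs the compound operation has to go through the same lock for this to work; a single caller that skips the synchronized block reintroduces the race.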

The other examples in Listing 1 deal with iteration. In the first of them, the result of List.size() can become invalid during the execution of the loop, because another thread can delete items from the list. If the timing is unlucky, and another thread deletes an item just after the last iteration of the loop has been entered, List.get() is called with an index that is no longer valid and, for an ArrayList-backed list, throws an IndexOutOfBoundsException. So what can you do to avoid this? If another thread may be accessing a List while you are iterating over it, you must lock the entire List for the duration of the iteration by wrapping the loop in a synchronized block that synchronizes on the List l. This addresses the data race, but it has a further cost for concurrency: locking the entire List during iteration can block other threads from accessing the list for a long time.

The Collections framework introduced iterators for traversing a list or other collection, which streamlines the process of iterating over the elements of a collection. However, the iterators implemented in the java.util Collections classes are fail-fast: if one thread modifies a collection while another thread is traversing it through an Iterator, the next Iterator.hasNext() or Iterator.next() call will throw ConcurrentModificationException. Just as in the previous example, if you want to prevent ConcurrentModificationException, you must lock the entire List while you are iterating by wrapping the iteration in a synchronized block that synchronizes on the List l. (Alternatively, you can call List.toArray() and iterate over the array without synchronization, but this can be expensive if the list is large.)
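Both options from the paragraph above can be sketched as follows. The class SafeIterationExample and its two methods are hypothetical names chosen for illustration, and the sketch assumes the list passed in is the wrapper returned by Collections.synchronizedList.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
final class SafeIterationExample {
    /** Option 1: hold the list's lock for the whole traversal. */
    static int sumWhileLocked(List<Integer> l) {
        int sum = 0;
        synchronized (l) {           // blocks every other user of l until the loop ends
            for (Integer value : l) {
                sum += value;
            }
        }
        return sum;
    }

    /** Option 2: copy the contents, then traverse the private copy without locking. */
    static int sumOverSnapshot(List<Integer> l) {
        // toArray() on the synchronized wrapper holds the lock only while copying.
        Integer[] snapshot = l.toArray(new Integer[0]);
        int sum = 0;
        for (Integer value : snapshot) {
            sum += value;
        }
        return sum;
    }
}
```

The first variant trades concurrency for a consistent view; the second trades memory and copying time for a shorter lock hold, and the snapshot may already be stale by the time it is traversed.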

Listing 1. Common race conditions in synchronized collections


    // Requires java.util.* (Map, HashMap, List, ArrayList, Collections, Iterator)
    Map m = Collections.synchronizedMap(new HashMap());
    List l = Collections.synchronizedList(new ArrayList());

    // put-if-absent idiom -- contains a race condition
    // may require external synchronization
    if (!m.containsKey(key))
        m.put(key, value);

    // ad-hoc iteration -- contains race conditions
    // may require external synchronization
    for (int i = 0; i < l.size(); i++) {
        doSomething(l.get(i));
    }

    // normal iteration -- can throw ConcurrentModificationException
    // may require external synchronization
    for (Iterator i = l.iterator(); i.hasNext(); ) {
        doSomething(i.next());
    }

False sense of confidence

The conditional thread safety provided by synchronizedList and synchronizedMap presents a hidden hazard: developers assume that because these collections are synchronized, they are fully thread-safe, and so they neglect to synchronize compound operations properly. The result is that these programs appear to work under light load, but under heavy load they may start throwing exceptions such as NullPointerException or ConcurrentModificationException.



Scalability problems

Scalability describes how an application's throughput behaves as its workload and available computing resources increase. A scalable program can handle a proportionally larger workload when given more processors, memory, or I/O bandwidth. Locking a shared resource for exclusive access is a potential scalability bottleneck: it prevents other threads from accessing that resource, even if idle processors are available to run those threads. To achieve scalability, we must eliminate or reduce our dependence on exclusive resource locks.

The bigger problem with the synchronized Collections wrappers, and with the earlier Hashtable and Vector classes, is that they synchronize on a single lock. This means that only one thread can access the collection at a time: if one thread is reading from a Map, every other thread that wants to read from it or write to it must wait. The most common Map operations, get() and put(), may involve more work than is obvious. When traversing a hash bucket to find a specific key, get() may have to call Object.equals() on a large number of candidates. If the hashCode() function used by the key class does not spread values evenly over the hash range, or produces a large number of hash collisions, some bucket chains will be much longer than others, and traversing a long hash chain and calling equals() on some percentage of its elements can be slow. The problem with get() and put() being expensive under these conditions is not only that access is slow, but also that all other threads are locked out of the Map while that hash chain is being traversed.

(Translator's note: A hash table stores objects in buckets according to a number called the hash value, which is computed from the object. Each hash value maps to one bucket. To find an object, you compute its hash value and search only the corresponding bucket; locating the bucket quickly reduces the number of objects that have to be examined.)

The fact that get() can take a significant amount of time to execute is made much worse in some cases by the conditional thread-safety problem discussed earlier. The race conditions illustrated in Listing 1 often make it necessary to hold the single collection-wide lock for much longer than a single operation takes. If you hold the lock on the collection for an entire iteration, other threads may be stalled for a long time waiting for it to be released.

Example: A simple cache

One of the most common uses of Map in server applications is to implement a cache. Server applications may cache file contents, generated pages, results of database queries, DOM trees associated with parsed XML files, and many other types of data. The primary purpose of a cache is to reduce service time and increase throughput by reusing the results of a previous computation. A typical characteristic of a cache workload is that retrievals are much more common than updates, so (ideally) a cache should offer very good get() performance. A cache that impedes the performance of the application is worse than no cache at all.

If you use synchronizedMap to implement a cache, you introduce a potential scalability bottleneck into your application. Only one thread can access the Map at a time, and this includes both the threads that are retrieving a value from the Map and the threads that want to insert a new (key, value) pair into it.

Reducing lock granularity

One approach to improving the concurrency of a HashMap while maintaining thread safety is to dispense with the single lock for the entire table and use a lock for each hash bucket (or, more commonly, a pool of locks in which each lock protects several buckets). This means that multiple threads can access different portions of a Map simultaneously without contending for a single collection-wide lock. This approach directly improves the scalability of insertion, retrieval, and removal operations. Unfortunately, this concurrency comes at a cost: methods that operate on the collection as a whole, such as size() or isEmpty(), become harder to implement, because they may need to acquire many locks at once or else risk returning an inaccurate result. For situations such as implementing a cache, however, this is a sensible trade-off, because retrieval and insertion operations are frequent, whereas size() and isEmpty() operations are far less so.
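The lock-pool idea can be sketched as follows. This is a deliberately simplified, hypothetical class (StripedMap is not part of util.concurrent or the JDK, and it is not how ConcurrentHashMap is implemented): each stripe is a small HashMap guarded by its own lock, and a key's hash code selects the stripe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a pool of locks; not a production-quality map.
final class StripedMap<K, V> {
    // Each stripe is an independent HashMap that doubles as its own lock object.
    private final List<Map<K, V>> stripes = new ArrayList<Map<K, V>>();

    StripedMap(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new HashMap<K, V>());
        }
    }

    private Map<K, V> stripeFor(Object key) {
        // Mask off the sign bit so the index is never negative.
        int index = (key.hashCode() & 0x7fffffff) % stripes.size();
        return stripes.get(index);
    }

    V get(K key) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {      // only threads hitting this stripe contend
            return stripe.get(key);
        }
    }

    V put(K key, V value) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {
            return stripe.put(key, value);
        }
    }

    /** Whole-collection operation: visits every stripe, so the total may be stale on return. */
    int size() {
        int total = 0;
        for (Map<K, V> stripe : stripes) {
            synchronized (stripe) {
                total += stripe.size();
            }
        }
        return total;
    }
}
```

Note how size() shows the cost mentioned above: it locks the stripes one after another, so entries added to an already-counted stripe are missed, and an exact answer would require holding every lock at once.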




ConcurrentHashMap

The ConcurrentHashMap class from the util.concurrent package (which will also appear in the java.util.concurrent package in JDK 1.5) is a thread-safe implementation of Map that offers far better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, simultaneous reads and writes can usually execute concurrently, and multiple simultaneous writes can often execute concurrently. (A related class offers similar concurrency for multiple readers, but allows only a single active writer.) ConcurrentHashMap is designed to optimize retrieval operations; in fact, successful get() operations usually complete without any locking at all. Achieving thread safety without locking is tricky and requires a deep understanding of the details of the Java Memory Model. The ConcurrentHashMap implementation, along with the rest of the util.concurrent package, has been extensively reviewed by concurrency experts for correctness and thread safety. We will look at the implementation details of ConcurrentHashMap in next month's article.

ConcurrentHashMap achieves higher concurrency by slightly relaxing the promises it makes to callers. A retrieval operation returns the value inserted by the most recently completed insert operation, and may also return a value added by an insert operation that is concurrently in progress (but in no case will it return a nonsense result). Iterators over a ConcurrentHashMap (obtained from its keySet(), values(), or entrySet() views) return each element at most once and never throw ConcurrentModificationException, but they may or may not reflect insertions or removals that occur after the iterator was constructed. No table-wide locking is needed to provide thread safety while iterating over the collection. ConcurrentHashMap can be used as a replacement for synchronizedMap or Hashtable in any application that does not rely on the ability to lock the entire table to prevent updates.
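The iterator behavior described above can be illustrated with a short sketch. It uses java.util.concurrent.ConcurrentHashMap from the JDK rather than the original util.concurrent class, and the class and method names (WeaklyConsistentIterationExample, removeOddValues) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, for illustration only.
final class WeaklyConsistentIterationExample {
    /**
     * Removes every entry with an odd value while iterating over the map.
     * With a plain HashMap this pattern typically throws
     * ConcurrentModificationException; with ConcurrentHashMap it does not.
     * Returns the number of entries removed.
     */
    static int removeOddValues(ConcurrentHashMap<String, Integer> map) {
        int removed = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() % 2 != 0) {
                map.remove(entry.getKey());   // structural change during traversal
                removed++;
            }
        }
        return removed;
    }
}
```

The same tolerance applies when the modification comes from another thread, although in that case the iterator may or may not see the change.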

These compromises enable ConcurrentHashMap to provide far better scalability than Hashtable, without compromising its effectiveness in a wide variety of common use cases, such as shared caches.
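A shared cache of the kind mentioned above might look like the following minimal sketch. It assumes the java.util.concurrent API of a current JDK (ConcurrentMap.putIfAbsent and java.util.function.Function, neither of which is discussed in the article); the SimpleCache class and its loader parameter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache, for illustration only: no eviction, no expiry.
final class SimpleCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<K, V>();
    private final Function<K, V> loader;   // computes a value on a cache miss; must not return null

    SimpleCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = entries.get(key);        // common case: a hit, no collection-wide lock
        if (cached != null) {
            return cached;
        }
        V computed = loader.apply(key);     // may run in several threads for the same key
        V existing = entries.putIfAbsent(key, computed);   // atomic put-if-absent
        return existing != null ? existing : computed;     // first writer wins
    }

    int size() {
        return entries.size();
    }
}
```

The design choice here is to accept occasional duplicate computation on a simultaneous miss rather than to block readers; callers always end up with the one value that was stored first.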

How much better?

Table 1 gives a rough idea of the scalability differences between Hashtable and ConcurrentHashMap. In each run, N threads concurrently executed a tight loop in which they retrieved random key values from either a Hashtable or a ConcurrentHashMap, with 80 percent of the failed retrievals performing a put() operation and 1 percent of the successful retrievals performing a remove(). The tests were performed on a dual-processor Xeon system running Linux. The data shows the run time for 10,000,000 iterations, normalized to the one-thread case for ConcurrentHashMap. As you can see, the performance of ConcurrentHashMap remains scalable as the number of threads grows, whereas the performance of Hashtable degrades almost immediately in the presence of lock contention.

The number of threads in this test may seem small compared to typical server applications. However, because each thread does nothing but operate on the table repeatedly, the test simulates the contention that a much larger number of threads would produce if they used the table while also doing some amount of real work.

Table 1. Scalability of Hashtable versus ConcurrentHashMap

Threads   ConcurrentHashMap   Hashtable
1         1.00                1.03
2         2.59                32.40
4         5.58                78.23
8         13.21               163.48
16        27.58               341.21
32        57.27               778.41
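A workload of the shape described for Table 1 could be approximated with a harness like the one below. This is a rough sketch, not the author's benchmark: the class name, the parameters, and the use of ThreadLocalRandom are assumptions, and the timings it produces on modern hardware and JVMs will not match the table.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical benchmark harness, for illustration only.
final class MapScalabilityBenchmark {
    /**
     * Runs the mixed get/put/remove workload against map with the given
     * number of threads and returns the elapsed wall-clock time in nanoseconds.
     */
    static long run(Map<Integer, Integer> map, int threads,
                    int iterationsPerThread, int keyRange) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom random = ThreadLocalRandom.current();
                for (int i = 0; i < iterationsPerThread; i++) {
                    Integer key = random.nextInt(keyRange);
                    if (map.get(key) == null) {
                        if (random.nextInt(100) < 80) {
                            map.put(key, key);      // 80% of failed retrievals insert
                        }
                    } else if (random.nextInt(100) < 1) {
                        map.remove(key);            // 1% of successful retrievals remove
                    }
                }
            });
        }
        long start = System.nanoTime();
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("benchmark interrupted", e);
            }
        }
        return System.nanoTime() - start;
    }
}
```

To compare the two classes, call run() once with a new Hashtable and once with a new ConcurrentHashMap for each thread count, after a few warm-up runs so that JIT compilation does not dominate the measurement.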




CopyOnWriteArrayList

The CopyOnWriteArrayList class is intended as a replacement for ArrayList in concurrent applications where traversals greatly outnumber insertions or removals. This is quite common when an ArrayList is used to store a list of listeners, such as in AWT or Swing applications, or in JavaBean classes in general. (The related CopyOnWriteArraySet uses a CopyOnWriteArrayList to implement the Set interface.)

If you use an ordinary ArrayList to store a list of listeners, then as long as the list remains mutable and may be accessed by multiple threads, you must either lock the entire list during iteration or clone it before iteration, and both approaches have a significant cost. CopyOnWriteArrayList instead creates a fresh copy of the list whenever a mutative operation is performed, and its iterators are guaranteed to return the state of the list at the time the iterator was constructed, without throwing ConcurrentModificationException. You do not need to clone the list before iterating over it or lock it during iteration, because the copy of the list that the iterator sees will not change. In other words, CopyOnWriteArrayList contains a mutable reference to an immutable array, so as long as that reference is held fixed, you get all the thread-safety benefits of immutability without the need for locking.
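The listener-list use case can be sketched as follows. The EventSource class and its methods are hypothetical; only CopyOnWriteArrayList itself comes from java.util.concurrent.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical event source, for illustration only.
final class EventSource {
    private final List<Runnable> listeners = new CopyOnWriteArrayList<Runnable>();

    void addListener(Runnable listener) {
        listeners.add(listener);       // copies the backing array
    }

    void removeListener(Runnable listener) {
        listeners.remove(listener);    // copies the backing array
    }

    int listenerCount() {
        return listeners.size();
    }

    /** Notifies every listener registered at the moment iteration starts. */
    void fire() {
        // No lock and no clone: the iterator works on the array snapshot it
        // was created with, so listeners may add or remove listeners
        // (even themselves) during the callback.
        for (Runnable listener : listeners) {
            listener.run();
        }
    }
}
```

A listener that unregisters itself inside its callback is still invoked for the current event and is simply absent from the next one.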




Conclusion

The synchronized collections classes, Hashtable and Vector, and the synchronized wrapper classes, Collections.synchronizedMap and Collections.synchronizedList, provide a basic, conditionally thread-safe implementation of Map and List. However, several factors make them unsuitable for highly concurrent applications: their single collection-wide lock is an impediment to scalability, and it is often necessary to lock a collection for a considerable time during iteration to prevent ConcurrentModificationException.

The ConcurrentHashMap and CopyOnWriteArrayList implementations provide much higher concurrency while preserving thread safety, at the price of some minor compromises in their promises to callers. ConcurrentHashMap and CopyOnWriteArrayList are not necessarily useful everywhere you might use HashMap or ArrayList, but they are designed to optimize specific common situations. Many concurrent applications will benefit from their use.
