Reprinted from http://blog.csdn.net/liuzhengkang/article/details/2916620
Collections are among the most commonly used data structures in programming, and concurrent code almost always leans on them: two threads may need to share a queue as a critical section, or a HashMap may serve as a cache copy of an external file. This article focuses on ConcurrentHashMap, one of the three kinds of concurrent collections introduced in JDK 1.5 (ConcurrentHashMap, the CopyOnWrite collections, and the concurrent queues), and examines it in enough theoretical detail that the knowledge pays off in real project development.

Before Tiger (JDK 5.0), the two maps we used most were HashMap and Hashtable. As everyone knows, HashMap is not synchronized, while Hashtable synchronizes every method, so the choice was straightforward: use HashMap in single-threaded code for efficiency, and Hashtable in multi-threaded code for safety. But while we enjoy that convenience of the JDK, we also bear an unfortunate cost. Analyzing Hashtable shows that synchronized is applied to the whole hash table: every operation locks the entire table so that one thread monopolizes it, and behind that safety lies enormous waste. The sharp-eyed Doug Lea promptly offered a solution: ConcurrentHashMap.

The main difference between ConcurrentHashMap and Hashtable is the granularity of the lock and how it is applied. Hashtable locks the entire hash table, while ConcurrentHashMap locks a single bucket, or segment. (The original article illustrates the two schemes with a figure: Hashtable on the left, ConcurrentHashMap on the right.) ConcurrentHashMap divides the hash table into 16 segments by default, and common operations such as get, put and remove lock only the segment they currently need. Where previously only one thread could enter at a time, now 16 writer threads can run simultaneously (writers must lock; readers are almost unrestricted, as discussed below), so the gain in concurrency is obvious.
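To see the segment idea in practice, here is a small runnable sketch (the class name SegmentDemo and the key layout are mine, not from the article). It creates a ConcurrentHashMap with the default concurrencyLevel of 16 and lets 16 writer threads insert disjoint key ranges; because the keys spread across segments, the writers can proceed largely in parallel, which a Hashtable would serialize.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class SegmentDemo {
    public static void main(String[] args) throws InterruptedException {
        // initialCapacity = 16, loadFactor = 0.75, concurrencyLevel = 16 (the default
        // number of segments, the "buckets" of this article)
        final Map<Integer, Integer> map =
                new ConcurrentHashMap<Integer, Integer>(16, 0.75f, 16);
        final CountDownLatch done = new CountDownLatch(16);
        // 16 writer threads, each inserting its own disjoint key range
        for (int t = 0; t < 16; t++) {
            final int base = t * 1000;
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000; i++)
                        map.put(base + i, i);
                    done.countDown();
                }
            }).start();
        }
        done.await();
        System.out.println(map.size()); // prints 16000
    }
}
```

All 16,000 insertions land safely even though no external lock is taken; the map's internal per-segment locking is enough.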
Even more surprising is ConcurrentHashMap's read concurrency: most reads take no lock at all, so reading is almost fully concurrent, and writes lock at a very fine granularity, which is much faster than before (the effect is most apparent with many buckets). Only operations such as size need to lock the whole table. For iteration, ConcurrentHashMap uses an iterator different from the fail-fast iterators of the traditional collections (see the earlier article "Java API memo --- Collection"), which we call a weakly consistent iterator. With this kind of iterator, when the collection changes after the iterator is created, no ConcurrentModificationException is thrown; the change instead produces new data without disturbing the original data, and the new data takes over once the iterator is finished. The iterating thread thus keeps using the original old data while writing threads complete their changes concurrently. More importantly, this guarantees the continuity and scalability of concurrent execution by multiple threads, which is the key to the performance improvement.

Next, let's look at some of the important methods in ConcurrentHashMap; knowing the implementation mechanics makes the class more useful in practice. ConcurrentHashMap has three main entity classes: ConcurrentHashMap (the whole hash table), Segment (a bucket), and HashEntry (a node); the original article shows their relationship in a figure. The get method (note that the methods discussed here belong to Segment, i.e. the bucket, because ConcurrentHashMap's biggest improvement is refining the lock granularity to the bucket) first checks whether the current bucket's element count is 0. If it is 0, nothing can possibly be found, so it simply returns null; this avoids an unnecessary search and avoids an error at minimal cost.
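The weakly consistent iterator can be demonstrated in a few lines (the class name WeakIterDemo is mine). The map is modified after the iterator is created; a fail-fast iterator from HashMap or Hashtable would throw ConcurrentModificationException here, while ConcurrentHashMap's iterator simply keeps going and may or may not reflect the change.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeakIterDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator();
        // Modify the map while the iterator is live: no ConcurrentModificationException
        map.put("c", 3);
        map.remove("a");
        while (it.hasNext())
            it.next(); // the iterator may or may not observe "c"; it never fails
        System.out.println("no exception");
    }
}
```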
It then gets the head node (via a method covered below) and walks the chain, using the hash and the key to decide whether a node holds the requested value; if it does and the value is not null, the search has succeeded and the value is returned directly. The program is simple, but there is one puzzling spot: what is return readValueUnderLock(e) for? Studying its code shows that it takes the lock and then returns the value. But there is already a V v = e.value that read the node's value; isn't readValueUnderLock(e) redundant? In fact it exists entirely for the sake of concurrency. When v is null, it is possible that another thread is in the middle of modifying the node, and the preceding get took no lock. By the Bernstein conditions, a read concurrent with a write (read-after-write or write-after-read) can yield inconsistent data, so here the entry e is locked and read again to guarantee that the correct value is obtained. One is obliged to admire the rigor of Doug Lea's thinking. The whole get operation locks only in this rare case; compared with the old Hashtable, the gain in concurrency is inevitable!
```java
V get(Object key, int hash) {
    if (count != 0) {            // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
```
```java
V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
```
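A side effect of the "null value means a writer may be mid-flight" convention is that ConcurrentHashMap cannot allow null values at all: a stored null would be indistinguishable from the transient state readValueUnderLock guards against. A quick demo (class name NullValueDemo is mine):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullValueDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<String, String>();
        try {
            map.put("key", null); // rejected: a stored null would be ambiguous
        } catch (NullPointerException expected) {
            System.out.println("null values rejected");
        }
        map.put("key", "value");
        System.out.println(map.get("key"));     // prints value
        System.out.println(map.get("missing")); // prints null, which can only mean "absent"
    }
}
```

Because null can never be a real value, get returning null always means the key is absent, with no locking needed to be sure.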
The put operation locks the entire segment, which of course is for concurrency safety: modifications cannot be carried out concurrently. There must also be a capacity check to ensure a rehash happens when capacity becomes insufficient. The harder line to understand is int index = hash & (tab.length - 1): the segment's internal table is the real hashtable, i.e. each segment is a hashtable in the traditional sense (the structural difference between the two can be seen in the original article's figure). This line locates the entry's slot in the table, and first is then the head node of the chain in that slot. If a matching e != null is found, that node's value is simply replaced (when onlyIfAbsent == false); otherwise a new entry is created whose successor is first, and tab[index] is made to point at it. What does that mean? It means the new entry is inserted at the head of the chain. The rest is very easy to understand.
```java
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = (HashEntry<K,V>) tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        } else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
```
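The onlyIfAbsent flag is what backs the public putIfAbsent method from the ConcurrentMap interface. A short demo of the two paths (class name PutDemo is mine):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        System.out.println(map.put("k", 1));         // null: no previous mapping
        System.out.println(map.put("k", 2));         // 1: old value, replaced
        System.out.println(map.putIfAbsent("k", 9)); // 2: existing value kept (onlyIfAbsent path)
        System.out.println(map.get("k"));            // 2
    }
}
```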
The remove operation is very similar to put, but note one difference: what is the middle for loop for (the line marked *)? From the code, it clones all the entries preceding the one being removed and splices them back onto the front of the list. Is that necessary? Cloning the preceding elements on every delete? This is in fact dictated by the immutability of HashEntry: look closely at its definition and you will find that every field except value is declared final. That means that once the next field has been set it can never be changed again, so instead of simply unlinking the node, all the nodes that precede it must be cloned. As for why HashEntry is made immutable: immutable objects can be read without any synchronization, which saves locking overhead. For more on immutability, see the earlier article "Some programming tips for advanced threading".
```java
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = (HashEntry<K,V>) tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)        // (*)
                    newFirst = new HashEntry<K,V>(p.key, p.hash, newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
```
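The value parameter above is what makes the conditional remove(key, value) of the ConcurrentMap interface possible: the entry is removed only when the stored value matches. A short demo (class name RemoveDemo is mine):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RemoveDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("k", 1);
        System.out.println(map.remove("k", 2));   // false: value does not match, nothing removed
        System.out.println(map.remove("k", 1));   // true: value matches, entry removed
        System.out.println(map.containsKey("k")); // false
    }
}
```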
```java
static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}
```
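The clone-the-prefix trick in remove follows directly from next being final. The idea can be sketched in isolation with a hypothetical ImmutableListRemove class (mine, not JDK code): since links can never change, removal reuses the suffix after the removed node as-is and rebuilds only the prefix before it.

```java
// Minimal sketch: removing from a singly linked list whose links are final,
// by sharing the suffix after the removed node and cloning the prefix before it.
public class ImmutableListRemove {
    static final class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    static Node remove(Node first, int target) {
        Node e = first;
        while (e != null && e.value != target)
            e = e.next;
        if (e == null)
            return first;       // not found: list unchanged
        Node newFirst = e.next; // the suffix is shared, not copied
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.value, newFirst); // clone the prefix
        return newFirst;
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null))); // 1 -> 2 -> 3
        Node result = remove(list, 2);
        for (Node p = result; p != null; p = p.next)
            System.out.print(p.value + " "); // prints "1 3 "
    }
}
```

Note that, as in the real remove, the cloned prefix ends up in reverse order relative to the original; with a single preceding node, as here, that is invisible, and for a hash chain the order does not matter anyway.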
The above analyzes a few of the simplest operations. For reasons of space, the implementations of rehash, the iterators and so on are not discussed here; interested readers can consult the JDK source.
Finally, there is the question of how ConcurrentHashMap and HashMap actually compare in performance. Brian Goetz has measured this in his article: http://www.ibm.com/developerworks/cn/java/j-jtp07233/
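For a rough feel (not a rigorous measurement like Goetz's; JIT warm-up, thread scheduling and GC all distort a sketch this small), one can time concurrent puts against Hashtable and ConcurrentHashMap. The class name RoughBench and the workload shape are mine; on a multi-core machine the ConcurrentHashMap run is typically much faster, but the numbers themselves should not be trusted.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class RoughBench {
    // Time 'threads' writers each inserting 100,000 distinct keys; returns elapsed ms.
    static long time(final Map<Integer, Integer> map, int threads) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int base = t * 100000;
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 100000; i++)
                        map.put(base + i, i);
                    done.countDown();
                }
            }).start();
        }
        done.await();
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> ht = new Hashtable<Integer, Integer>();
        Map<Integer, Integer> chm = new ConcurrentHashMap<Integer, Integer>();
        System.out.println("Hashtable: " + time(ht, 8) + " ms, "
                + "ConcurrentHashMap: " + time(chm, 8) + " ms");
        // Both maps end up with all 800,000 entries regardless of timing
        System.out.println("entries: " + ht.size() + " / " + chm.size());
    }
}
```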