Collections are among the most commonly used data structures in programming, and concurrent code almost always depends on them: two threads may share a queue as a critical region, or a HashMap may serve as a cache of external data. This article analyzes ConcurrentHashMap, one of the three families of concurrent collections introduced in JDK 1.5 (the Concurrent*, CopyOnWrite*, and concurrent Queue types), so that we understand its principles in detail; that understanding pays off in serious project development. Before Tiger (JDK 5), the most heavily used map types were HashMap and Hashtable. As we all know, HashMap performs no synchronization, while Hashtable synchronizes every method, so the trade-off is direct: use HashMap for single-threaded efficiency, and Hashtable for multi-threaded safety. But while we enjoy what the JDK gives us, we also bear the cost it imposes. Looking at Hashtable, its synchronized methods target the entire hash table: every operation locks the whole table for exclusive use by one thread, which is a huge waste just to get safety. Doug Lea, with his sharp eye, promptly offered the solution: ConcurrentHashMap. The main difference between ConcurrentHashMap and Hashtable is the granularity of the lock and how it is applied. In the original figure, the left side shows Hashtable's approach, locking the entire hash table, while the right side shows ConcurrentHashMap's approach, locking a single bucket (segment). ConcurrentHashMap divides the hash table into 16 segments by default, and common operations such as get, put, and remove lock only the segment currently in use. Where previously only one thread could enter at a time, now 16 writer threads can proceed simultaneously (writers must lock; readers are almost unrestricted, as discussed below), so the gain in concurrency is obvious.
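To make the segment-level locking concrete, here is a minimal, hypothetical demo (thread count and key ranges are arbitrary choices for illustration, not anything from the JDK source): sixteen writer threads populate one ConcurrentHashMap at the same time with no external synchronization, and no updates are lost, because writers touching different segments do not block each other.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentWriteDemo {
    public static void main(String[] args) throws InterruptedException {
        final Map<Integer, Integer> map = new ConcurrentHashMap<Integer, Integer>();
        final int threads = 16;      // matches the default segment count
        final int perThread = 1000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    // Each thread writes a disjoint key range; no lost updates.
                    for (int i = 0; i < perThread; i++) {
                        map.put(id * perThread + i, i);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        if (map.size() != threads * perThread) {
            throw new AssertionError("lost updates: " + map.size());
        }
        System.out.println("all " + map.size() + " entries present");
    }
}
```

Running the same experiment against a plain HashMap would corrupt the table or lose entries; against a Hashtable it would be safe but serialized on a single table-wide lock.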
Even more striking is ConcurrentHashMap's read concurrency: most reads require no lock at all, so reads proceed almost fully in parallel, and because the write lock is so fine-grained, writes are also much faster than before (more noticeably so with more segments). The entire table must be locked only for operations such as size(). During iteration, ConcurrentHashMap uses an approach different from the traditional collection's fail-fast iterator (see the earlier article "Java API memo - Collections"): a weakly consistent iterator. In this mode, if the collection changes after the iterator is created, no ConcurrentModificationException is thrown. Instead, a modification builds new data without disturbing the old; once the change is complete, the head pointer is swapped to point at the new data. The iterating thread can keep reading the old data while writer threads complete their changes concurrently. More importantly, this preserves the continuity and scalability of concurrent execution across multiple threads, and it is the key to the performance improvement. Next, let's look at several important methods of ConcurrentHashMap; once we know the implementation mechanism, we can use it with more confidence. There are three main entity classes in ConcurrentHashMap: ConcurrentHashMap itself (the whole hash table), Segment (a bucket), and HashEntry (a node); their relationship can be seen in the figure above. The get method (note that the analysis here is per segment, since ConcurrentHashMap's biggest improvement is refining lock granularity to the segment) first checks whether the segment's element count is 0. If it is 0, nothing can possibly be found, so null is returned immediately; this avoids an unnecessary search at minimal cost.
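The weakly consistent iterator described above is easy to observe directly. The following short demo modifies a ConcurrentHashMap while iterating it, which is allowed, and then does the same to a plain HashMap, whose fail-fast iterator throws ConcurrentModificationException:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeakIteratorDemo {
    public static void main(String[] args) {
        Map<String, Integer> chm = new ConcurrentHashMap<String, Integer>();
        chm.put("a", 1);
        chm.put("b", 2);
        // Weakly consistent iterator: modifying during iteration is fine.
        for (String k : chm.keySet()) {
            chm.put("c", 3); // no ConcurrentModificationException
        }

        Map<String, Integer> hm = new HashMap<String, Integer>();
        hm.put("a", 1);
        hm.put("b", 2);
        boolean threw = false;
        try {
            for (String k : hm.keySet()) {
                hm.put("c", 3); // fail-fast iterator detects the change
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("HashMap iterator should be fail-fast");
        }
        System.out.println("ok");
    }
}
```

Note that the weakly consistent iterator may or may not reflect entries added after it was created; it only guarantees that it will not fail and will see the state from no earlier than its creation.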
After obtaining the head node (via a method covered below), get walks the chain, comparing hash and key one entry at a time. If an entry matches and its value is non-null, the value has been found and is returned directly. The code is simple, but one point is puzzling: what is `return readValueUnderLock(e)` for? Studying its code shows that it acquires the lock and then reads the value again. But we have already read the node's value with `V v = e.value`, so is `readValueUnderLock(e)` redundant? In fact, it exists entirely for the sake of concurrency. If v is null, another thread may be in the middle of modifying the node, and since the preceding steps of get hold no lock, by the Bernstein conditions a read racing with a write can observe inconsistent data. Therefore we must lock entry e and read the value again to guarantee a correct result; one has to admire Doug Lea's rigor here. The whole get operation takes a lock only in this rare case, so compared with Hashtable, the improvement in concurrency is unmistakable!
```java
V get(Object key, int hash) {
    if (count != 0) {                    // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
```
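The null recheck in get is an instance of a more general pattern: read optimistically without a lock, and fall back to a locked re-read only when a possibly transient state is observed. Here is a minimal, self-contained sketch of that pattern; the class and method names are hypothetical illustrations, not JDK source.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the "optimistic read, recheck under lock" pattern
// used by Segment.get above; names are illustrative only.
public class OptimisticBox<V> {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile V value;

    public void set(V v) {
        lock.lock();
        try {
            value = v;                 // writers always hold the lock
        } finally {
            lock.unlock();
        }
    }

    public V get() {
        V v = value;                   // unlocked fast path, usually enough
        if (v != null)
            return v;
        return readUnderLock();        // rare slow path, like readValueUnderLock(e)
    }

    private V readUnderLock() {
        lock.lock();
        try {
            return value;              // re-read after any in-flight write finishes
        } finally {
            lock.unlock();
        }
    }
}
```

Because the fast path is lock-free, readers pay for the lock only in the rare moment when they catch a write in flight, which is exactly why ConcurrentHashMap's reads are almost fully concurrent.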