Java Theory and Practice: building a better HashMap

ConcurrentHashMap is part of Doug Lea's util.concurrent package and offers a far higher degree of concurrency than Hashtable or synchronizedMap. It also avoids locking entirely for most successful get() operations, which yields very good throughput for concurrent applications. This month, Brian Goetz takes a careful look at the code of ConcurrentHashMap and explores how Doug Lea achieved such a remarkable result without sacrificing thread safety.

In the July installment of Java Theory and Practice ("Concurrent collections classes"), we reviewed scalability bottlenecks and discussed how to achieve higher concurrency and throughput with shared data structures. Sometimes the best way to learn is to study the work of an expert, so this month we will analyze the implementation of ConcurrentHashMap from Doug Lea's util.concurrent package. A version of ConcurrentHashMap, optimized for the Java Memory Model (JMM) being specified by JSR 133, will be included in the java.util.concurrent package of JDK 1.5. The version in util.concurrent has passed thread-safety audits under both the old and the new memory models.

Optimized for throughput

ConcurrentHashMap uses several techniques to achieve a high level of concurrency and to avoid locking, including using multiple write locks for different hash buckets and exploiting the uncertainties of the JMM to minimize the time that locks are held, or to avoid acquiring locks at all. It is optimized for the most common usage, which is retrieving a value that is likely to already exist in the map. In fact, most successful get() operations run without any locking at all. (Warning: don't try this yourself! Outsmarting the JMM is much harder than it looks. The util.concurrent classes were written by concurrency experts and subjected to rigorous peer review for JMM safety.)

Multiple write locks

Recall that the main obstacle to the scalability of Hashtable (or its alternative, Collections.synchronizedMap) is that it uses a single map-wide lock, which must be held for the entirety of an insertion, removal, or retrieval operation, and sometimes even for the entirety of an iteration. As long as that lock is held, all other threads are blocked from accessing the map, which limits concurrency even when idle processors are available.

ConcurrentHashMap does away with the single map-wide lock, replacing it with a collection of 32 locks, each of which protects a subset of the hash buckets. The locks are used primarily by mutative operations (put() and remove()). Having 32 separate locks means that up to 32 threads can modify the map at the same time. This does not necessarily mean that when fewer than 32 threads are writing to the map concurrently, no writes will ever block: 32 is the theoretical upper limit on concurrent write threads, and it may not be reached in practice. Still, 32 is much better than 1, and it is more than adequate for most applications running on current-generation computer systems.
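The lock-striping idea can be sketched as follows. This is a minimal illustration, not the actual util.concurrent code: the class name StripedMap, the stripe-selection arithmetic, and the use of one HashMap per stripe are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of lock striping: the key's hash selects one of 32 lock objects,
// so writers that hit different stripes never contend with each other.
public class StripedMap {
    private static final int N_LOCKS = 32;          // one lock per bucket subset
    private final Object[] locks = new Object[N_LOCKS];
    private final Map<Object, Object>[] segments;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        segments = new Map[N_LOCKS];
        for (int i = 0; i < N_LOCKS; i++) {
            locks[i] = new Object();
            segments[i] = new HashMap<>();
        }
    }

    private int stripeFor(Object key) {
        // Mask off the sign bit so the index is always in [0, N_LOCKS)
        return (key.hashCode() & 0x7FFFFFFF) % N_LOCKS;
    }

    public Object put(Object key, Object value) {
        int i = stripeFor(key);
        synchronized (locks[i]) {                   // lock only this stripe
            return segments[i].put(key, value);
        }
    }

    public Object get(Object key) {
        int i = stripeFor(key);
        synchronized (locks[i]) {                   // unlike this sketch, the real
            return segments[i].get(key);            // ConcurrentHashMap usually
        }                                           // avoids locking on get()
    }
}
```

With this layout, two threads calling put() with keys that hash to different stripes proceed fully in parallel; only keys that land in the same stripe contend for the same monitor.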

Map-scoped operations

With 32 separate locks, each protecting a subset of the hash buckets, exclusive access to the map requires acquiring all 32 locks. Some map-wide operations, such as size() and isEmpty(), may be able to avoid locking the entire map at once (by appropriately weakening the semantics of these operations), but other operations, such as rehashing the map (expanding the number of hash buckets and redistributing elements as the map grows), must guarantee exclusive access. The Java language provides no simple way to acquire a variable-sized set of locks; fortunately, this is needed only rarely, and when it is, it can be accomplished with a recursive method.
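Because each synchronized block must be lexically scoped, one way to hold a variable number of monitors at once is to recurse, acquiring one lock per stack frame and running the exclusive operation at the bottom of the recursion. The following is a hedged sketch of that technique; the class and method names (LockAll, withAllLocks) are invented for illustration.

```java
// Sketch of acquiring a variable-sized set of locks via recursion: each
// recursive call holds one more monitor; the action runs only once all of
// them are held, and they are released as the stack unwinds.
public class LockAll {
    private final Object[] locks;

    public LockAll(int n) {
        locks = new Object[n];
        for (int i = 0; i < n; i++) locks[i] = new Object();
    }

    public void withAllLocks(Runnable action) {
        withLocksFrom(0, action);
    }

    private void withLocksFrom(int index, Runnable action) {
        if (index == locks.length) {
            action.run();                  // exclusive access: all locks held
        } else {
            synchronized (locks[index]) {  // hold locks[index], then recurse
                withLocksFrom(index + 1, action);
            }
        }
    }
}
```

A map-wide operation such as a rehash could then be passed in as the action, with the guarantee that no stripe-level writer can run while it executes.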

JMM Overview

Before diving into the implementations of put(), get(), and remove(), let's take a quick look at the JMM, which governs how one thread's actions on memory (reads and writes) affect how other threads see memory. Because of the performance benefits of using processor registers and per-processor caches to speed up memory access, the Java Language Specification (JLS) permits some memory operations not to be immediately visible to all other threads. The language provides two mechanisms for ensuring the consistency of memory operations across threads: synchronized and volatile.

According to the JLS, "in the absence of explicit synchronization, an implementation is free to update the main memory in an order that may be surprising." This means that without synchronization, writes made in a given thread may appear to a different thread in a different order, and the time it takes for an update to a memory variable to propagate from one thread to another is unpredictable.
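The kind of surprise the JLS is describing can be illustrated with a hypothetical two-thread publication pattern; the class and field names here are invented for the example.

```java
// Without synchronization, a reader thread that observes ready == true is
// NOT guaranteed to observe data == 42: the two writes may become visible
// in the opposite order, or the write to data may not have propagated yet.
public class UnsafePublication {
    private int data;
    private boolean ready;

    public void writer() {
        data = 42;       // write 1
        ready = true;    // write 2: may become visible before write 1
    }

    public int reader() {
        // In another thread, this may return 0 (stale) even after writer()
        // has completed; within the writing thread itself it returns 42.
        return ready ? data : -1;
    }
}
```

Declaring ready as volatile (under the JSR 133 semantics) or guarding both fields with a common lock restores the expected ordering and visibility.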

While the most common reason for using synchronization is to guarantee atomic access to critical sections of code, synchronization actually provides three separate functions: atomicity, visibility, and ordering. Atomicity is straightforward: synchronization enforces a reentrant mutex, preventing more than one thread from executing a block of code protected by a given monitor at the same time. Unfortunately, most texts focus on atomicity alone and ignore the other aspects. But synchronization also plays a significant role in the JMM, causing the JVM to execute memory barriers when a monitor is acquired and released.

When a thread acquires a monitor, it executes a read barrier, invalidating any variables cached in thread-local memory (such as a processor cache or processor registers), which forces the processor to reread the variables used in the synchronized block from main memory. Similarly, when it releases the monitor, the thread executes a write barrier, flushing any modified variables back to main memory. The combination of mutual exclusion and memory barriers means that as long as a program follows the correct synchronization rules (that is, it synchronizes whenever writing a variable that may later be read by another thread, or reading a variable that may have last been written by another thread), each thread sees the correct value of any shared variables it uses.
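As a concrete example of following those rules, a counter shared between threads can guard both its write and its read with the same monitor; the class name SyncCounter is an assumption made for this sketch.

```java
// Every access to 'value' goes through the same monitor, so each increment
// is atomic and each read sees the most recently committed value.
public class SyncCounter {
    private int value;

    public synchronized void increment() { value++; }  // write barrier on release
    public synchronized int get() { return value; }    // read barrier on acquire

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.get());  // always 40000 with synchronization
    }
}
```

Without the synchronized keyword on increment(), some increments would be lost to read-modify-write races, and the reader might also see a stale value.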

Strange things can happen if shared variables are accessed without synchronization. Some changes may be reflected across threads instantly, while others may take some time (a consequence of how caches behave). Without synchronization, you cannot guarantee that memory contents are consistent (related variables may disagree with one another) or current (some values may be stale). The common (and recommended) way to avoid these hazards is, of course, to synchronize properly. However, in some cases, such as in very widely used library classes like ConcurrentHashMap, it may be worth applying extra expertise and effort during development (quite possibly many times the normal effort) to achieve higher performance.
