Principle and Implementation of HashMap without locks

Source: Internet
Author: User
Tags: rehash

In "Vaccine: The Infinite Loop of Java HashMap", we saw that java.util.HashMap cannot be used directly in a multi-threaded environment. Common ways to get a hash map that is usable from multiple threads include java.util.Hashtable, java.util.Collections.synchronizedMap, and java.util.concurrent.ConcurrentHashMap.

In their implementation details, these approaches all rely on mutex locks to a greater or lesser extent. Mutex locks can block threads, reduce throughput, and lead to a series of problems such as deadlock and priority inversion.

CAS (Compare And Swap) is a feature provided by the underlying hardware that atomically checks a value and changes it. Some applications of CAS are described in detail in "Implementation of lock-free queues".

Atomic operations in Java

In the java.util.concurrent.atomic package, Java provides many convenient atomic types, all of which are built on CAS operations at the lowest level.

For example, if we want to implement a global public counter, we can:

 
 
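    import java.util.concurrent.atomic.AtomicInteger;

    private AtomicInteger counter = new AtomicInteger(3);

    public void addCounter() {
        for (;;) {
            // Read the current value and compute the value we want to store.
            int oldValue = counter.get();
            int newValue = oldValue + 1;
            // Store it only if nobody changed the counter in the meantime.
            if (counter.compareAndSet(oldValue, newValue))
                return;
        }
    }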

The compareAndSet method checks whether the current value of counter is oldValue. If so, it sets the counter to the new value newValue and returns true to indicate success. Otherwise, the operation fails and false is returned.

If another thread changes the counter while we are calculating the new value, compareAndSet fails. In that case we only need to wrap the operation in a loop and keep retrying; eventually the counter will be successfully incremented by 1. (In fact, AtomicInteger already defines the incrementAndGet and decrementAndGet methods for the common +1/-1 operations, so in practice we only need to call those.)
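As a small illustration of the built-in method, the sketch below increments one shared counter from several threads with incrementAndGet. The class name CounterDemo, the helper countWith, and the thread and iteration counts are chosen here for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterDemo {
    // Starts `threads` threads that each add 1 to a shared counter `perThread` times,
    // waits for all of them, and returns the final counter value.
    static int countWith(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementAndGet(); // atomic +1, no explicit retry loop needed
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // prints 40000: no increment is lost
    }
}
```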

In addition to AtomicInteger, the java.util.concurrent.atomic package also provides the AtomicReference and AtomicReferenceArray types, which represent an atomic reference and an array of atomic references respectively.
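The following sketch shows the same compare-and-set pattern on these two types. The class AtomicRefDemo and its helper methods are hypothetical names used only for this example; compareAndSet on references compares by identity, not by equals.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class AtomicRefDemo {
    // Replaces the referenced object only if the reference still holds `expected`.
    static boolean swapIfUnchanged(AtomicReference<String> ref, String expected, String update) {
        return ref.compareAndSet(expected, update);
    }

    // Publishes `value` into slot `i` only if that slot is still empty (null).
    static boolean setIfAbsent(AtomicReferenceArray<String> slots, int i, String value) {
        return slots.compareAndSet(i, null, value);
    }

    public static void main(String[] args) {
        String first = "first";
        AtomicReference<String> ref = new AtomicReference<>(first);
        System.out.println(swapIfUnchanged(ref, first, "second")); // true
        System.out.println(swapIfUnchanged(ref, first, "third"));  // false: ref no longer holds `first`

        AtomicReferenceArray<String> slots = new AtomicReferenceArray<>(4);
        System.out.println(setIfAbsent(slots, 2, "a")); // true
        System.out.println(setIfAbsent(slots, 2, "b")); // false: slot 2 is already taken
    }
}
```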

Implementation of lockless linked list

Before implementing the lockless HashMap, let's take a look at the simple implementation method of the lockless list.

Take the insert operation as an example. To insert a node C, we find the node A that should precede it and A's current successor B, point C's next pointer at B, and then point A's next pointer at C.

However, in the middle of this operation another thread may insert a node (say D) directly between A and B. If we do not check for this, the node inserted by the other thread may be lost (as shown in Figure 3). We can use a CAS operation when assigning the next pointer of node A to verify that it still points to B; if the next pointer of node A has changed, we retry the entire insert operation. The code is as follows:

 
 
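    import java.util.concurrent.atomic.AtomicReference;

    private void listInsert(Node head, Node c) {
        for (;;) {
            // a is the node that should precede c; b is a's current successor.
            Node a = findInsertionPlace(head), b = a.next.get();
            // Link c in front of b before making c visible.
            c.next.set(b);
            // Publish c only if a still points to b; otherwise retry from the start.
            if (a.next.compareAndSet(b, c))
                return;
        }
    }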

(The next field of the Node class is of type AtomicReference<Node>, that is, an atomic reference to a Node.)
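For a version that can be run on its own, here is a minimal sketch of a sorted list with the same CAS-retry insert. The class LockFreeSortedList, the int keys, and the fill and keys helpers are assumptions made for this example; the walk to the insertion point is written inline so that a and b are always read together. It supports insertion only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class LockFreeSortedList {
    static final class Node {
        final int key;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(int key) { this.key = key; }
    }

    private final Node head = new Node(Integer.MIN_VALUE); // sentinel; its key is never compared

    void insert(int key) {
        Node c = new Node(key);
        for (;;) {
            // Walk to the insertion point: a is the predecessor, b its successor.
            Node a = head, b = a.next.get();
            while (b != null && b.key < key) {
                a = b;
                b = a.next.get();
            }
            c.next.set(b);
            // Succeeds only if nobody changed a.next since we read b.
            if (a.next.compareAndSet(b, c)) return;
        }
    }

    // Snapshot of the keys in list order.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next.get(); n != null; n = n.next.get()) out.add(n.key);
        return out;
    }

    // Inserts the keys 0 .. threads*perThread-1 from `threads` concurrent threads.
    static LockFreeSortedList fill(int threads, int perThread) {
        LockFreeSortedList list = new LockFreeSortedList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) list.insert(offset + threads * k);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 3).keys()); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```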

The search operation on the lockless linked list is no different from that on an ordinary linked list. For the delete operation, we find node A in front of the node to be deleted and node B behind it, and then use a CAS operation to verify and update the next pointer of node A so that it points to node B.
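The search and delete steps just described can be sketched as follows. The class ListSearchDelete and its method names are hypothetical, and the insert method is included only so that the example is self-contained.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ListSearchDelete {
    static final class Node {
        final int key;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(int key) { this.key = key; }
    }

    private final Node head = new Node(Integer.MIN_VALUE); // sentinel, never removed

    // Sorted insert with CAS retry.
    void insert(int key) {
        Node c = new Node(key);
        for (;;) {
            Node a = head, b = a.next.get();
            while (b != null && b.key < key) { a = b; b = a.next.get(); }
            c.next.set(b);
            if (a.next.compareAndSet(b, c)) return;
        }
    }

    // Search is a plain traversal; reading needs no CAS.
    boolean contains(int key) {
        Node n = head.next.get();
        while (n != null && n.key < key) n = n.next.get();
        return n != null && n.key == key;
    }

    // Unlinks the first node holding `key`; returns false if there is none.
    boolean remove(int key) {
        for (;;) {
            Node a = head, target = a.next.get();
            while (target != null && target.key < key) { a = target; target = a.next.get(); }
            if (target == null || target.key != key) return false;
            Node b = target.next.get();
            // Swing a.next from the target to b only if a still points at the target.
            if (a.next.compareAndSet(target, b)) return true;
        }
    }

    public static void main(String[] args) {
        ListSearchDelete list = new ListSearchDelete();
        list.insert(1); list.insert(2); list.insert(3);
        System.out.println(list.remove(2));   // true
        System.out.println(list.contains(2)); // false
        System.out.println(list.contains(3)); // true
    }
}
```

This mirrors the description above and nothing more. The CAS checks only node A's pointer, so a node that another thread inserts directly after the removed node at the same moment can be lost; lock-free list designs usually mark a node as deleted before unlinking it to close that gap.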

Difficulties and breakthroughs in the lockless HashMap

A HashMap has four basic operations: Insert, Delete, Search, and ReHash. A typical HashMap implementation uses an array in which each element is a linked list of nodes. On each linked list we can perform insert, delete, and search with the methods described above, but the ReHash operation is more difficult.

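During the ReHash process (Figure 4), a typical operation is to traverse each node in the old table, calculate its position in the new table, and then move it to the new table. Moving one node requires three pointer operations: the next pointer of its predecessor in the old list, its own next pointer, and the next pointer of its new predecessor in the new list all have to change.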

These three pointer operations must be completed at the same time to make the move atomic. However, it is not hard to see that a CAS operation can only guarantee that one variable is verified and updated atomically, which cannot meet the need to verify and update three pointers simultaneously.

So let's take another approach. Since moving nodes is so difficult, we can keep all the nodes in one ordered list at all times and so avoid moving them at all. In a typical HashMap implementation the length of the array is always 2^i, and mapping a hash value to an array subscript is simply a modulo operation on the array length (that is, only the lowest i bits of the hash are kept). During ReHash, the length of the array doubles to 2^(i+1). Each node in the linked list at entry j of the old array either stays at entry j of the new array or moves to entry j + 2^i, and the only thing that decides between the two is bit i + 1 of the hash value, counting from the lowest bit: if it is 0 the node stays at entry j; otherwise it goes to entry j + 2^i.
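The arithmetic can be checked with a few lines of code. This is only an illustration; the class SplitDemo and the method indexFor are names invented for it.

```java
final class SplitDemo {
    // Subscript for a table whose length is a power of two: keep the low bits of the hash.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int oldLength = 8, newLength = 16;
        for (int hash : new int[] {3, 11, 27}) {
            System.out.println(hash + ": " + indexFor(hash, oldLength)
                    + " -> " + indexFor(hash, newLength));
        }
        // prints:
        // 3: 3 -> 3
        // 11: 3 -> 11
        // 27: 3 -> 11
    }
}
```

All three hashes share entry 3 while the length is 8. After doubling, each one lands either on entry 3 or on entry 3 + 8 = 11, depending on the fourth-lowest bit of the hash.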

As shown in Figure 5, all nodes are sorted in ascending order of their bit-reversed hash values (for example, 1101 -> 1011). When the array size is 8, 2 and 18 are in one group; 3, 11, and 27 are in another group. At the beginning of each group, a sentinel node is inserted to facilitate subsequent operations. To make sure the sentinel node sorts at the very front of its group, we set the highest bit of a normal node's hash (which becomes the lowest bit after reversal) to 1, while the sentinel node leaves this bit unset.
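A minimal sketch of how these sort keys could be computed is given below. The names SplitOrderKeys, regularKey, sentinelKey, and before are assumptions for this example. Reversing the hash and then setting the lowest bit gives the same result as setting the highest bit first and then reversing, and the keys are compared as unsigned 32-bit values.

```java
final class SplitOrderKeys {
    // Key of a normal node: bit-reversed hash with the lowest bit forced to 1.
    static int regularKey(int hash) {
        return Integer.reverse(hash) | 1;
    }

    // Key of the sentinel of a group: bit-reversed subscript, lowest bit left at 0.
    static int sentinelKey(int bucketIdx) {
        return Integer.reverse(bucketIdx);
    }

    // True if keyA sorts before keyB (unsigned comparison).
    static boolean before(int keyA, int keyB) {
        return Integer.compareUnsigned(keyA, keyB) < 0;
    }

    public static void main(String[] args) {
        // 1101 reversed within 32 bits ends up in the top four bits as 1011.
        System.out.println(Integer.toBinaryString(Integer.reverse(0b1101) >>> 28)); // 1011
        System.out.println(before(sentinelKey(3), regularKey(3)));  // true: sentinel leads its group
        System.out.println(before(regularKey(3), regularKey(11)));  // true
        System.out.println(before(regularKey(18), sentinelKey(3))); // true: group 2 precedes group 3
    }
}
```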

When the array is expanded to 16 (see Figure 6), the second group is split into a group containing only 3 and a group containing 11 and 27, but the relative order of the nodes remains unchanged. In this way, we do not need to move nodes during ReHash.
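To see that the order really does not change, the sketch below lays out the same five hashes for both array sizes. SplitOrderLayout and layout are hypothetical names; "S3" stands for the sentinel of group 3, and only the sentinels of non-empty groups are listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SplitOrderLayout {
    // Returns the list order for an array of `size` entries holding `hashes`.
    static List<String> layout(int size, int... hashes) {
        // Sorted by the unsigned value of the bit-reversed key.
        TreeMap<Long, String> ordered = new TreeMap<>();
        for (int hash : hashes) {
            int bucket = hash & (size - 1);
            // Sentinel of the group this hash belongs to (lowest key bit 0).
            ordered.put(Integer.toUnsignedLong(Integer.reverse(bucket)), "S" + bucket);
            // The normal node itself (lowest key bit 1).
            ordered.put(Integer.toUnsignedLong(Integer.reverse(hash) | 1), String.valueOf(hash));
        }
        return new ArrayList<>(ordered.values());
    }

    public static void main(String[] args) {
        System.out.println(layout(8, 2, 18, 3, 11, 27));  // [S2, 2, 18, S3, 3, 11, 27]
        System.out.println(layout(16, 2, 18, 3, 11, 27)); // [S2, 2, 18, S3, 3, S11, 11, 27]
    }
}
```

The only difference between the two layouts is the new sentinel S11, which slots in between 3 and 11. No existing node changes position.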

Implementation Details

Because copying the array during expansion takes a lot of time, we divide the whole array into blocks and create the blocks lazily. When accessing a subscript, we only need to check whether the block containing that subscript has been created, and create it if it has not.
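One possible shape for such a lazily created, block-divided array is sketched below. The class LazyBuckets, the block size of 16, and the method names are assumptions made for illustration and are not taken from the original implementation.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

final class LazyBuckets<T> {
    private static final int BLOCK_BITS = 4;               // 16 slots per block (assumed)
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private final AtomicReferenceArray<AtomicReferenceArray<T>> blocks;

    LazyBuckets(int maxBlocks) {
        blocks = new AtomicReferenceArray<>(maxBlocks);    // all blocks start out null
    }

    // Returns the value at `idx`, or null if the slot or its whole block is still unset.
    T get(int idx) {
        AtomicReferenceArray<T> block = blocks.get(idx >>> BLOCK_BITS);
        return block == null ? null : block.get(idx & (BLOCK_SIZE - 1));
    }

    // Stores `value` at `idx`, creating the enclosing block on first use.
    void set(int idx, T value) {
        int b = idx >>> BLOCK_BITS;
        AtomicReferenceArray<T> block = blocks.get(b);
        if (block == null) {
            AtomicReferenceArray<T> fresh = new AtomicReferenceArray<>(BLOCK_SIZE);
            // Only one thread wins this CAS; the others reuse the winner's block.
            block = blocks.compareAndSet(b, null, fresh) ? fresh : blocks.get(b);
        }
        block.set(idx & (BLOCK_SIZE - 1), value);
    }

    // Number of blocks that have actually been allocated so far.
    int allocatedBlocks() {
        int n = 0;
        for (int i = 0; i < blocks.length(); i++) {
            if (blocks.get(i) != null) n++;
        }
        return n;
    }
}
```

With this layout, growing the table never copies anything: a larger subscript range simply touches blocks that did not exist before.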

In addition, size is defined as the subscript range currently in use, with an initial value of 2; expanding the array only requires doubling size. count is defined as the total number of nodes currently in the HashMap (not counting sentinel nodes).

Initially, all items in the array except item 0 are null. Item 0 points to a linked list containing only a sentinel node, which represents the starting point of the whole chain. The initial state is shown in Figure 7: the light green area marks the subscript range that is not currently in use, and the dotted arrows mark blocks that exist logically but have not actually been created.

Initialize subscript operation

Null items in the array are considered uninitialized. Initializing a subscript means creating its sentinel node. Initialization is recursive: if the parent subscript has not been initialized, it is initialized first. (The parent of a subscript is the subscript obtained by clearing its highest set binary bit.) The code is as follows:

 
 
    private void initializeBucket(int bucketIdx) {
        // The parent is bucketIdx with its highest set bit cleared.
        int parentIdx = bucketIdx ^ Integer.highestOneBit(bucketIdx);
        if (getBucket(parentIdx) == null)
            initializeBucket(parentIdx);
        // Sentinel node: bit-reversed subscript, lowest bit left at 0.
        Node dummy = new Node();
        dummy.hash = Integer.reverse(bucketIdx);
        dummy.next = new AtomicReference<>();
        // Insert the sentinel starting from the parent's sentinel and record it.
        setBucket(bucketIdx, listInsert(getBucket(parentIdx), dummy));
    }

getBucket is an encapsulated method for reading the content of a given subscript in the array, and setBucket is its counterpart for writing. listInsert searches from the specified position for the appropriate place to insert the given node. If a node with the same hash already exists in the linked list, that existing node is returned; otherwise the newly inserted node is returned.
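The parent relation used by initializeBucket can be tried out in isolation. In the sketch below, ParentIndex, parentOf, and ancestors are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentIndex {
    // Parent of a subscript: the same value with its highest set bit cleared.
    static int parentOf(int bucketIdx) {
        return bucketIdx ^ Integer.highestOneBit(bucketIdx);
    }

    // Subscripts that must be initialized before `bucketIdx`, nearest parent first.
    static List<Integer> ancestors(int bucketIdx) {
        List<Integer> chain = new ArrayList<>();
        int i = bucketIdx;
        while (i != 0) {
            i = parentOf(i);
            chain.add(i);
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(ancestors(13)); // [5, 1, 0]  (1101 -> 101 -> 1 -> 0)
        // The parent's sentinel key sorts before the child's, so the child's
        // sentinel can always be found by walking forward from the parent's.
        System.out.println(Integer.compareUnsigned(Integer.reverse(5), Integer.reverse(13)) < 0); // true
    }
}
```

Every chain ends at subscript 0, which is why item 0 has to be initialized up front.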

Insert operation
  • First, take the hashCode of the key modulo the HashMap size to obtain the array subscript to insert at.
  • Then check whether that subscript is null. If it is null, initialize the subscript.
  • Construct a new node and insert it at the appropriate position. Note that the hash value stored in the node should be the original hashCode after bit reversal, with the lowest bit set to 1.
  • Add 1 to the node counter. If there are too many nodes after adding 1, we only need to change size to size * 2, which is all that expanding the array (ReHash) requires.
Search operation
  • Find the subscript of the node to be searched in the array.
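  • Check whether the subscript is null. If it is null, initialize the subscript first, as in the insert operation: nodes inserted before an expansion are reachable only through an ancestor subscript until the new sentinel node exists, so treating null as "not found" would miss them.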
  • Enter the linked list at the corresponding position and search forward until the node is found or the search passes beyond the node range of this group.
Delete operation
  • Find the subscript of the node to be deleted in the array.
  • Check whether the subscript is null. If it is null, initialize the subscript.
  • Find the node to be deleted and remove it from the linked list. (Note that, thanks to the sentinel nodes, any normal element is referenced only by its single predecessor node; it is never referenced by both a predecessor node and a pointer in the array, so there is no need to modify multiple pointers at the same time.)
  • Reduce the number of nodes by 1.
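Putting the pieces together, the insert and search steps can be sketched as below. This is a simplified illustration under several assumptions, not the original implementation: the class SplitOrderedMapSketch, the fixed MAX_BUCKETS array (instead of lazily created blocks), the putIfAbsent semantics, and the growth threshold of two nodes per subscript are all choices made here. Lookups also initialize a missing subscript, since nodes inserted before an expansion can only be reached through an ancestor subscript until then. Deletion is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class SplitOrderedMapSketch<K, V> {
    static final class Node<K, V> {
        final int key; // bit-reversed hash, compared as an unsigned int
        final K k;     // null for sentinel nodes
        final V v;
        final AtomicReference<Node<K, V>> next = new AtomicReference<>();
        Node(int key, K k, V v) { this.key = key; this.k = k; this.v = v; }
    }

    private static final int MAX_BUCKETS = 1 << 16; // assumed fixed upper bound
    private final AtomicReferenceArray<Node<K, V>> buckets = new AtomicReferenceArray<>(MAX_BUCKETS);
    private final AtomicInteger size = new AtomicInteger(2);  // subscript range in use
    private final AtomicInteger count = new AtomicInteger(0); // normal nodes only

    SplitOrderedMapSketch() {
        buckets.set(0, new Node<>(0, null, null)); // sentinel of subscript 0 starts the chain
    }

    // Inserts `node` in key order starting at `start`; returns an equal existing node if any.
    private Node<K, V> listInsert(Node<K, V> start, Node<K, V> node) {
        for (;;) {
            Node<K, V> a = start, b = a.next.get();
            while (b != null && Integer.compareUnsigned(b.key, node.key) < 0) {
                a = b;
                b = a.next.get();
            }
            for (Node<K, V> e = b; e != null && e.key == node.key; e = e.next.get()) {
                if (e.k == null ? node.k == null : e.k.equals(node.k)) return e;
            }
            node.next.set(b);
            if (a.next.compareAndSet(b, node)) return node;
        }
    }

    // Returns the sentinel of `idx`, initializing it (and its parents) on demand.
    private Node<K, V> bucket(int idx) {
        Node<K, V> sentinel = buckets.get(idx);
        if (sentinel == null) {
            Node<K, V> parent = bucket(idx ^ Integer.highestOneBit(idx));
            sentinel = listInsert(parent, new Node<>(Integer.reverse(idx), null, null));
            buckets.set(idx, sentinel);
        }
        return sentinel;
    }

    // Inserts the mapping unless the key is already present; keys must not be null.
    boolean putIfAbsent(K k, V v) {
        int h = k.hashCode();
        Node<K, V> node = new Node<>(Integer.reverse(h) | 1, k, v);
        if (listInsert(bucket(h & (size.get() - 1)), node) != node) return false;
        int n = count.incrementAndGet(), s = size.get();
        // "ReHash": just widen the subscript range; no node is moved.
        if (n > 2 * s && s < MAX_BUCKETS) size.compareAndSet(s, s * 2);
        return true;
    }

    V get(K k) {
        int h = k.hashCode(), key = Integer.reverse(h) | 1;
        for (Node<K, V> e = bucket(h & (size.get() - 1)).next.get();
             e != null && Integer.compareUnsigned(e.key, key) <= 0; e = e.next.get()) {
            if (e.key == key && k.equals(e.k)) return e.v;
        }
        return null; // walked past the key's position without finding it
    }
}
```

A short usage example: after `map.putIfAbsent(11, "a")`, a call to `map.get(11)` returns "a" both before and after the subscript range has doubled, because the node stays where it is and only a new sentinel is added in front of it.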
