Java Concurrency in Plain Terms (17): Concurrent Containers Part 2, ConcurrentMap (2) [Repost]


I originally wanted to give a comprehensive, in-depth treatment of ConcurrentHashMap, but there are already plenty of articles online that analyze HashMap and ConcurrentHashMap. So this section concentrates on the details: less theory, more about the internal design principles and ideas.

To discuss the structure of ConcurrentHashMap we first have to discuss the structure of HashMap, so we start with a brief look at HashMap.

HashMap principle

Let's start from the beginning. Suppose we want to store objects together: how should the container be designed? There are basically only two ways to go. One is to divide the storage into slots, one object per slot, so that objects can be fetched or traversed by slot number. The other is to chain the objects together, which requires each object to hold at least a reference (or pointer) to the next one. The first is obviously the idea of an array, the second the idea of a linked list. Every container implementation is built on an array, a linked list, or a combination of both. HashMap takes the array as its basis (with a linked list in each slot, as described later).

Once we have a container that can store objects, a map needs two more things:

    • The ability to locate elements quickly: the point of a map is to get the desired result for a query key quickly, so this step has to be as fast as possible.
    • The ability to expand automatically: a container is best when its capacity does not have to be managed by hand. The less the caller needs to know about the underlying details, the easier and the safer the container is to use.

First, condition 1: locating an element quickly. This belongs to the field of algorithms and data structures, and hashing is usually a simple and workable approach. A hash algorithm maps a binary value of arbitrary length to a smaller binary value of fixed length. The well-known MD2, MD4, MD5, SHA-1 and so on all fall into this category. For the principles behind specific algorithms, see any book on algorithms and data structures. One point deserves special mention: because a large set is mapped onto a small set, several elements will inevitably map to the same result. This is called a "collision". We will need this fact later, so I won't elaborate on it for now.
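
To make the idea of a collision concrete, here is a minimal sketch. The class name CollisionDemo and the helper collide are made up for illustration; the only library behaviour relied on is String.hashCode(), for which "Aa" and "BB" are a well-known pair of different strings with the same hash code.

```java
// Illustrative sketch: two different keys that share one hash code.
final class CollisionDemo {
    /** Returns true when two distinct strings have the same hash code. */
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        System.out.println(collide("Aa", "BB"));
    }
}
```

Any hash-based container has to cope with pairs like this, which is why the collision handling discussed later matters.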

Now condition 2. If condition 1 is met, every element is mapped to some location; once the capacity is expanded, the location an element maps to has to change. For a hash algorithm, changing the size of the small target set means the old mapping is no longer valid, so the mapping of every existing element has to be recomputed. This is known as the rehash process.

With this bit of theory in hand, let's see how HashMap is implemented.

At the core of HashMap there is, inevitably, an array of objects (the table field); the transient modifier on it simply means the field is not serialized. size is the number of elements in the map, threshold is the element count at which the table must be expanded, and loadFactor is the load factor (loadFactor > 0) used to compute threshold. The capacity is table.length, the size of the array. In other words, once the number of stored elements reaches loadFactor times the capacity (table.length * loadFactor), the capacity has to be expanded. Each expansion in HashMap doubles the size of the array.
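
The relationship between capacity, load factor and threshold can be sketched as follows. ThresholdDemo and its threshold helper are hypothetical names; the formula is the one given in the text (capacity times load factor, truncated to an int).

```java
// Illustrative sketch: how the expansion threshold follows the capacity.
final class ThresholdDemo {
    /** Threshold as described above: capacity * loadFactor, truncated. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        // Each expansion doubles the capacity, so the threshold doubles too.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```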

Next, how is an element mapped into the table array? The keys form an effectively unbounded set while table is a small finite one, so one approach is to take the key's hashCode value modulo the table size, which looks fine. Java, however, uses something cheaper: since bitwise AND (&) is more efficient than modulus (%), it ANDs the hash value with the array size minus 1 to obtain the array index. Why does this work well? Reference 7 analyzes this point in great detail; its author is careful and thorough about the ideas behind it.

Listing 1 The indexFor fragment

static int indexFor(int h, int length) {
    // length is a power of 2, so length - 1 is a mask of low-order 1 bits
    return h & (length - 1);
}
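
As a quick check of the claim that the AND can stand in for a modulus, the sketch below compares the two for a power-of-2 length. IndexForDemo and sameAsModulo are illustrative names; Math.floorMod is used because it returns a non-negative result for negative hash values, which is the behaviour the mask has.

```java
// Illustrative sketch: h & (length - 1) versus a non-negative modulus.
final class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** True when the mask agrees with floorMod for the given length. */
    static boolean sameAsModulo(int h, int length) {
        return indexFor(h, length) == Math.floorMod(h, length);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + " -> " + indexFor(h, 16)
                    + ", matches modulo: " + sameAsModulo(h, 16));
        }
        // With a length of 10 the mask is 9 (binary 1001), so only the
        // indices 0, 1, 8 and 9 can ever be produced.
        System.out.println(indexFor(7, 10) + " vs " + Math.floorMod(7, 10));
    }
}
```

The last line hints at why the table length has to be a power of 2: for any other length the mask has gaps and some slots are never used.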

As noted earlier, mapping a large set onto a small one inevitably produces collisions, meaning that different keys are mapped to the same slot. So how does HashMap deal with that?

HashMap addresses the problem in two ways:

    1. Elements with the same array index are linked into a list, and that list is traversed when an element is looked up.
    2. Elements are distributed across the array as evenly as possible.

For problem 1, HashMap uses the following data structure. Each element of table is a Map.Entry, and an Entry holds four fields: key, value, hash and next. key and value are the stored data. hash is the hash of the element's key (which is ultimately mapped to an array index); the hashes of all elements on one linked list yield the same array index under indexFor from Listing 1. next is the reference to the next element. Elements on the same list are connected through next.
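
The entry structure just described can be sketched like this. It is a simplified stand-in, not the JDK class: EntryDemo, its nested Entry and chainLength are illustrative names, and only the four fields named in the text are modelled.

```java
// Illustrative sketch of the four-field entry and a chain linked through next.
final class EntryDemo {
    /** Simplified stand-in for HashMap's internal entry. */
    static final class Entry<K, V> {
        final K key;
        V value;
        final int hash;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Counts the entries reachable from head by following next. */
    static int chainLength(Entry<?, ?> head) {
        int n = 0;
        for (Entry<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hash code, so they share a slot.
        Entry<String, Integer> older = new Entry<>("Aa".hashCode(), "Aa", 1, null);
        Entry<String, Integer> head = new Entry<>("BB".hashCode(), "BB", 2, older);
        System.out.println(chainLength(head));
    }
}
```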

Now for problem 2: how are the elements distributed across the array as evenly as possible? First, Listing 2 applies a series of transformations to the key's hashCode so that it is better suited to hashing into a small table.

Listing 2 Second hash applied to hashCode

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

As for Listing 2, I have not found the rationale behind this particular hash, nor any good references on it. Reference 1 analyzes the process and considers it an effective approach; interested readers can study it.
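
Without claiming to explain the design, one visible effect of the function can be shown with a small experiment. The sketch assumes the shift constants 20, 12, 7 and 4 of the JDK 6 era HashMap source; SpreadDemo and the sample hash codes are made up for illustration. The samples differ only in their high bits, so the plain mask puts them all in slot 0, while the second hash folds the high bits down into the index.

```java
// Illustrative sketch: the second hash lets high-order bits affect the index.
final class SpreadDemo {
    /** The supplemental hash, with the constants assumed from JDK 6. */
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int hc : hashCodes) {
            System.out.println(Integer.toHexString(hc)
                    + ": raw index " + indexFor(hc, 16)
                    + ", index after hash() " + indexFor(hash(hc), 16));
        }
    }
}
```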

Second, as described for Listing 1, the hash is ANDed with the array length minus 1 so that the indices are spread as evenly as possible. This is covered in reference 7.

Third, the array is constructed with a length that is a power of 2. Listing 3 shows this. Why a power of 2? Reference 7 analyzes this as well: it helps distribute the elements as evenly as possible.

Listing 3 Constructing the array in HashMap

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

this.loadFactor = loadFactor;
threshold = (int) (capacity * loadFactor);
table = new Entry[capacity];
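
The rounding loop in Listing 3 can be tried on its own. In this sketch CapacityDemo and roundUpToPowerOfTwo are illustrative names, and the clamp to 2^30 is an assumption added here so that the loop cannot overflow for very large requests.

```java
// Illustrative sketch: rounding a requested capacity up to a power of 2.
final class CapacityDemo {
    private static final int MAX = 1 << 30; // assumed upper bound for this sketch

    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAX) {
            initialCapacity = MAX; // keep the shift below from overflowing
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int[] requested = {1, 10, 16, 17, 1000};
        for (int r : requested) {
            System.out.println(r + " -> " + roundUpToPowerOfTwo(r));
        }
    }
}
```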

In addition, the default loadFactor of 0.75 and the default capacity of 16 were chosen on the basis of extensive statistical analysis. I saw the relevant data analysis a long time ago but can no longer find it; interested readers can look it up. I won't go into it here.

With these principles in hand, analyzing the individual methods of HashMap is not hard.

Listing 4 The get operation of HashMap

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    // Walk the list hanging off the slot this hash maps to
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

Listing 4 shows the get operation of HashMap. It first checks whether the key is null, because null is always mapped to slot 0 of the table (see Listing 2 and Listing 1 above). Otherwise it computes the index into the table. Once the Map.Entry at that index is found, it starts traversing the linked list. Different hashes may map to the same table[index], while equal keys always produce the same hash, so an entry matches a key when hash(key) == e.hash and key.equals(e.key). From this we can see that Object.hashCode() only serves to map equal elements onto the same list, while Object.equals() is what actually compares two elements. This is why hashCode() and equals() should always be overridden together.
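
The division of labour between hashCode() and equals() can be observed with java.util.HashMap itself. In this sketch the Key class is hypothetical and its hashCode() is deliberately constant, which forces every key onto the same list; lookups still return the right value because equals() makes the final decision.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: hashCode() picks the list, equals() picks the entry.
final class EqualsHashCodeDemo {
    /** Hypothetical key type with a deliberately poor, constant hash code. */
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // every Key collides with every other Key
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, Integer> sample() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = sample();
        System.out.println(map.get(new Key("a")));
        System.out.println(map.get(new Key("b")));
        System.out.println(map.get(new Key("c")));
    }
}
```

A constant hash code is correct but slow, since every lookup degenerates into a list scan; it is used here only to make the collision certain.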

Listing 5 The put operation of HashMap

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // An existing entry for the key: replace its value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // The new entry becomes the head; the old list hangs off its next
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

Listing 5 shows the put operation of HashMap. Compared with get, put first performs a lookup: if an entry for the key is found, its value is modified directly; otherwise a new element is added. The new element becomes the head of the list, that is, the element that occupies the table slot; if the slot already held elements, the whole existing list is attached behind the new element. This means a newly added element is reached before the elements on the same list that were added earlier. Then comes expansion: once the number of elements reaches the threshold (threshold = table.length * loadFactor), the array is doubled.
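
The lookup-then-insert-at-head behaviour can be reproduced in a deliberately tiny map. This is a sketch under stated simplifications, not the JDK implementation: TinyChainedMap, Node and headKeyOfBucket are made-up names, the table is fixed at 16 slots with no resizing, null keys are not supported, and the second hash from Listing 2 is skipped.

```java
// Illustrative sketch only: fixed 16 slots, no resizing, non-null keys.
final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    /** Replaces the value of an existing key, otherwise inserts at the head. */
    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]);
        return null;
    }

    V get(Object key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e.value;
            }
        }
        return null;
    }

    /** Key stored at the head of the slot that the given key maps to. */
    K headKeyOfBucket(Object key) {
        Node<K, V> head = table[indexFor(key.hashCode())];
        return head == null ? null : head.key;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa", so same slot
        System.out.println(map.headKeyOfBucket("Aa"));
        System.out.println(map.put("Aa", 3));
        System.out.println(map.get("Aa"));
    }
}
```

Inserting at the head keeps put cheap, because the new entry never has to walk to the end of the list after the lookup has failed.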

Listing 6 The expansion process of HashMap

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int) (newCapacity * loadFactor);
}

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                // Re-link each entry at the head of its new slot
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

Listing 6 shows the expansion process of HashMap. Expansion causes every element to be placed again in the new array, and this process is called rehash. It is clearly expensive, because the index of every element has to be recomputed. Choosing a suitable initial size is therefore the key to HashMap efficiency: too large wastes space, too small leads to frequent rehashing. loadFactor can also be taken into account here.
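
One consequence of the power-of-2 table and the mask from Listing 1 is easy to verify: when the capacity doubles, an element either keeps its index or moves up by exactly the old capacity. The sketch below checks this; RehashIndexDemo and staysOrMovesByOldCapacity are illustrative names.

```java
// Illustrative sketch: where an element can land after the table doubles.
final class RehashIndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** True when the new index is the old one, or the old one plus oldCapacity. */
    static boolean staysOrMovesByOldCapacity(int h, int oldCapacity) {
        int oldIndex = indexFor(h, oldCapacity);
        int newIndex = indexFor(h, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        // All four hashes share slot 5 in a table of 16.
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            System.out.println(h + ": " + indexFor(h, 16) + " -> " + indexFor(h, 32));
        }
    }
}
```

So a rehash splits each old list across at most two new slots, but it still has to visit every element to find out which one.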

For example, to store 1000 elements with the default load factor of 0.75, a capacity of 1024 is not enough, because 1000 > 0.75 * 1024 = 768. A capacity of 2048 is needed, which is 1048 slots more than the number of elements. If you know there will be at most 1000 elements and set the load factor to 1, then 1024 is a good choice. It is also worth stressing that, statistically, a larger load factor means longer lists, and therefore more iterations when looking up an element. So everything is a trade-off.
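
The arithmetic in this example can be written out as a small helper. SizingDemo and capacityFor are hypothetical names; the sketch assumes the threshold formula from the text and stops growing at 2^30 to avoid overflow.

```java
// Illustrative sketch: smallest power-of-2 capacity that avoids a rehash.
final class SizingDemo {
    private static final int MAX = 1 << 30; // assumed upper bound for this sketch

    /** Smallest power of 2 whose threshold is at least expectedSize. */
    static int capacityFor(int expectedSize, float loadFactor) {
        int capacity = 1;
        while (capacity < MAX && (int) (capacity * loadFactor) < expectedSize) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1000, 0.75f));
        System.out.println(capacityFor(1000, 1.0f));
    }
}
```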

A question arises here: once the capacity of the map (the size of the array) has been expanded, can it be reduced again? Say the map holds 10,000 elements at first, but after running for a while its size never exceeds 10. Can the capacity shrink back to 10 or 16? The answer is no. Once enlarged, the capacity cannot be reduced; the only way to control it is to construct a new map.
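
Constructing a new map is a one-liner with the copy constructor of java.util.HashMap. In this sketch ShrinkByCopyDemo and compact are illustrative names; the capacity itself is not observable through the public API, so the example only shows that the contents survive the copy, on the assumption that the copy sizes its table from the current number of entries.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: "shrinking" a map by copying it into a fresh one.
final class ShrinkByCopyDemo {
    /** Returns a new map with the same mappings as the grown one. */
    static <K, V> Map<K, V> compact(Map<K, V> grown) {
        return new HashMap<>(grown);
    }

    public static void main(String[] args) {
        Map<Integer, String> grown = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            grown.put(i, "v" + i);
        }
        // Drop all but ten entries; the old table keeps its enlarged size.
        grown.keySet().removeIf(k -> k >= 10);

        Map<Integer, String> compacted = compact(grown);
        System.out.println(compacted.size() + " entries, equal contents: "
                + compacted.equals(grown));
    }
}
```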

HashMap's internal iterators are also important, but space is limited, so I won't cover them here; interested readers can study them on their own.

Hashtable works almost the same way as HashMap, so it is not discussed. LinkedHashMap adds two references, before and after, to Map.Entry, forming a doubly linked list that connects all the entries; this makes ordered traversal, LRU caches and so on possible. I won't discuss it further here.
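
As a brief illustration of the LRU use just mentioned, the sketch below builds a small cache on LinkedHashMap. LruCache and maxEntries are made-up names; the access-order constructor and the removeEldestEntry hook are part of the public LinkedHashMap API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: an LRU cache on top of LinkedHashMap's linked entries.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        // true selects access order, so get() moves an entry to the tail
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the cache is over its limit
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```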

The internal data structures of memcached are implemented with ideas similar to HashMap; interested readers can see references 8, 9 and 10.

To keep this article from getting too long, the workings of ConcurrentHashMap are left for the next installment. Note that although ConcurrentHashMap and HashMap have related names and somewhat similar implementation principles, ConcurrentHashMap makes some fairly large internal adjustments in order to support concurrency better. The next article describes them in detail.

Resources:

    1. Analysis of HashMap hash method
    2. Research on Hash storage mechanism by analyzing JDK source code
    3. Java Theory and Practice: hashing
    4. Java Theory and Practice: building a better HashMap
    5. JDK 1.6 ConcurrentHashMap
    6. The implementation details of ConcurrentHashMap
    7. Deep understanding of HashMap
    8. memcached: Data structure
    9. memcached: Storage management data structure
    10. memcached
