Introduction to the internal implementation of HashMap

Source: Internet
Author: User


Weighing Time Against Space

 

HashMap stores data in key-value pairs.

 

If memory were unlimited, I could use the hash value of each key directly as an array index and fetch the value by that index. But space is expensive, and nobody has unlimited room.

 

If time were unlimited, I could put the data in an unordered array and search it sequentially. Sooner or later I would find the data. But time is money, and nobody can afford to wait that long.

 

Therefore, HashMap takes a compromise approach that balances time against space. This is the "linked list hash method" (chaining), a classic way of handling collisions in hash tables.

  Linked List hash

 

So what is the "linked list hash"? Picture an array drawn vertically, where each slot of the array holds a linked list. You can think of this array as N buckets, each with a chain hanging from it.

 

What is the array for? Each slot of the array holds a linked list.

 

What is the linked list for? It stores the map's data. For example, if a HashMap has two keys, key1 and key2, two linked-list nodes are created to store the two key-value pairs (each key-value pair is wrapped in an Entry object).

 

How are the nodes linked together? Each Entry contains the key, the value, a reference to the next node (next), and the hash value. The nodes are strung together through next.

 

The HashMap storage process is as follows. Compute the hash value of the key to be saved (this decides which bucket the key-value pair goes into) and find the corresponding bucket from that hash value. If the bucket is empty, put the entry in directly. If the bucket already holds data (that is, one or more key-value pairs are on the bucket's chain), walk the chain and compare each entry's key with the key being saved. If an equal key is found, the old value for that key is overwritten. If no equal key is found, the new entry is placed at the head of the chain, so the earliest entries end up at the tail. (Head insertion is the behavior of the JDK 7 implementation quoted in this article.)
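
To make the storage steps above concrete, here is a minimal sketch of a chained hash table in Java. It is not the JDK source: the class name SimpleChainedMap and its fixed, power-of-two capacity are assumptions made for illustration, and resizing is left out entirely.

```java
import java.util.Objects;

// Illustrative sketch only; SimpleChainedMap is a hypothetical name, not the JDK class.
class SimpleChainedMap<K, V> {
    // One node in a bucket's chain: key, value, cached hash, and a link to the next node.
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;

    @SuppressWarnings("unchecked")
    SimpleChainedMap(int capacity) {
        // Assumption: capacity is a power of two, so the mask below is a valid index.
        table = (Entry<K, V>[]) new Entry[capacity];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    // Returns the previous value for the key, or null if the key was not present.
    V put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = indexFor(hash);
        // Walk the chain in the bucket and overwrite if an equal key is found.
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // No equal key: insert the new entry at the head of the chain.
        table[i] = new Entry<>(hash, key, value, table[i]);
        return null;
    }

    // Returns the value for the key, or null if the key is not present.
    V get(K key) {
        int hash = Objects.hashCode(key);
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Creating the sketch with a capacity of 1 forces every key into the same bucket, which is an easy way to watch the chain comparison and head insertion at work.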

  Load factor

 

The number of buckets determines how many entries a HashMap can hold, and it is directly related to lookup efficiency. For example, suppose you go to the class next door to find James. If there are 10 people in the class, you will find him quickly. If there are 100, it may take you quite a while. That is why the HashMap constructors look like this:

 

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

 

The three constructors set two things: initialCapacity and loadFactor. The former is the initial number of buckets (that is, the array size), and the latter is the load factor, a measure of how full the hash table is allowed to get before its capacity is increased automatically. For example, if the initial size of the array is 100 and the load factor is 0.6, then once 60 entries have been stored, the array must be resized before more can be stored. This addresses the efficiency problem mentioned above.
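
The relationship between capacity, load factor, and the resize point can be sketched in a few lines of Java. The class LoadFactorDemo and its threshold method are hypothetical names used only for this illustration, and the calculation is a simplification of what the JDK does internally.

```java
// Illustrative sketch; LoadFactorDemo and threshold() are hypothetical names.
class LoadFactorDemo {
    // Number of entries the table may hold before it has to grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // The example from the text: 100 buckets with a load factor of 0.6.
        System.out.println(threshold(100, 0.6f));
        // 16 buckets with the default load factor of 0.75.
        System.out.println(threshold(16, 0.75f));
    }
}
```

With 16 buckets and a load factor of 0.75, the resize point works out to 12 entries.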

 

When the load factor is small, lookups are faster but space is wasted. When the load factor is large, space is used well but lookups take longer. You cannot have it both ways. After weighing the trade-offs, the default load factor is 0.75, which generally does not need to be changed.

  Remainder Method

 

There is another problem. Given a key's hash value, how do we decide which bucket the entry goes into? If the entries end up crowded together in the array, queries will be slow. If they are spread too thinly, space is wasted. Java uses the remainder method to keep the data evenly distributed across the array.

 

The remainder method is simply the modulo operation. For example, if the array length is 100 and the key's hash value is 80, then 80 % 100 leaves a remainder of 80, so the entry goes to index 80. Java does not compute it quite that way, though. The source code is as follows:

 

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}

 

The code above is the method that adds an Entry to a HashMap. bucketIndex is the index in the array where the current entry goes. Set aside the resizing on the third line for now and focus on the indexFor method, which is the "modulo" step. Let's take a look at it:

 

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

 

h is the hash value of the key, and length is the length of the array. The method simply computes h & (length-1). When length is a power of 2, this expression is equivalent to taking h modulo the array length, but the bitwise operation is cheaper than a division. The lesson for programmers is that a bitwise operation can be a worthwhile replacement for arithmetic where the two are equivalent.
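
The equivalence between the mask and the modulo can be checked with a short Java sketch. IndexForDemo and agreesWithModulo are hypothetical names introduced for this illustration; indexFor copies the one-line body shown above, and Math.floorMod (available since Java 8) is used as the reference because it always returns a non-negative result for a positive length.

```java
// Illustrative sketch; IndexForDemo and agreesWithModulo are hypothetical names.
class IndexForDemo {
    // Same computation as the indexFor method quoted above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Returns true if masking and floor-modulo agree for every h in [from, to].
    static boolean agreesWithModulo(int length, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A power-of-two length: the mask matches the modulo for every h tested.
        System.out.println(agreesWithModulo(16, -1000, 1000));
        // A length that is not a power of two: the two computations diverge.
        System.out.println(agreesWithModulo(15, 0, 100));
    }
}
```

The check passes for a length of 16 and fails for a length of 15, which is why the mask is only a valid substitute for the modulo when the length is a power of 2.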

  Uniform Distribution

 

Another interesting detail is the comment in the code above: "length must be a non-zero power of 2". In other words, the length of the array must be 2 to the power of n.

 

Why must it be a power of 2?

 

Suppose the length is not a power of 2, for example 15, and h is 2, 3, and 4 in turn. Then h & (length-1) gives:

 

h       length-1    h & (length-1)
0010    1110        0010, that is, 2
0011    1110        0010, that is, 2
0100    1110        0100, that is, 4

 

You see, a collision already occurs with just three test values. Why?

 

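The reason is this: if length is not a power of 2, then at least one bit of length-1 is 0, and a 0 bit ANDed with either 0 or 1 always gives 0. That is to say, no matter what h is, that bit of h & (length-1) is always 0. With a length of 15, length-1 is 1110, so the lowest bit of the result is always 0 and every index whose lowest bit is 1 (every odd index) stays vacant. This leaves the data unevenly distributed in the array and hurts query efficiency. When length is a power of 2, length-1 consists entirely of 1 bits (for example, 16-1 is 1111), so every index can be reached.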
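
The effect on distribution can be observed directly with a small Java sketch that collects every index produced by h & (length-1) over a range of hash values. BucketSpreadDemo and reachableIndexes are hypothetical names used only for this illustration.

```java
import java.util.TreeSet;

// Illustrative sketch; BucketSpreadDemo and reachableIndexes are hypothetical names.
class BucketSpreadDemo {
    // Collects every index that h & (length - 1) produces for h in [0, maxHash].
    static TreeSet<Integer> reachableIndexes(int length, int maxHash) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h <= maxHash; h++) {
            seen.add(h & (length - 1));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Length 15: the mask is 1110, so only even indexes can ever be reached.
        System.out.println(reachableIndexes(15, 99));
        // Length 16: the mask is 1111, so every index from 0 to 15 is reached.
        System.out.println(reachableIndexes(16, 99));
    }
}
```

With a length of 15 only 8 distinct indexes are ever used, while a length of 16 uses all 16.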

 

Reading data is much simpler. Use the hash value of the key to find the index in the table array, walk the chain at that index to find the Entry with a matching key, and return the value stored in that Entry.

 


 
