How the HashMap works

Last Update:2016-12-23 Source: Internet

Author: User

Tags rehash

This section gives you an in-depth introduction to how hash_map works. If you only want to use it and are not interested in the underlying principles, you can skip this section, but I still recommend reading it: a little more understanding does no harm.

hash_map is based on a hash table. The biggest advantage of a hash table is that storing and looking up data takes far less time, close to constant time, at the cost of higher memory consumption. With more and more memory available today, trading space for time is usually worthwhile. Hash tables are also easy to code against.

The basic idea is to store the elements in an array with a large index range. You design a function (the hash function) that maps each element's key to a function value (the hash value, which serves as the array index), and then store the element in that array slot. Put simply, each element is "classified" by its key and stored in the place that corresponds to its "class". That place is called a bucket.

However, there is no guarantee that keys and function values correspond one to one, so different elements may well produce the same function value. This is a "collision" (also called a "conflict"): different elements are sorted into the same "class". In general, "direct addressing" and "collision resolution" are the two major features of a hash table.

hash_map first allocates a block of memory that forms many buckets, and then uses the hash function to map keys to different buckets for storage. The insertion process is:

1. Get the key.
2. Compute the hash value with the hash function.
3. Get the bucket number (usually the hash value modulo the number of buckets).
4. Store the key and value inside the bucket.

The lookup process is:

1. Get the key.
2. Compute the hash value with the hash function.
3. Get the bucket number (usually the hash value modulo the number of buckets).
4. Compare the elements inside the bucket with the key; if none is equal, the key is not found.
5. Return the value of the matching record.

In hash_map, direct addressing is provided by the hash function, and collisions are resolved with the comparison function. As you can see, if each bucket holds only one element, a lookup needs only one comparison. When many buckets are empty, many lookups (those for keys that are not present) are even faster.

Thus, to use a hash table, the user has to provide two things: a hash function (hashCode) and a comparison function (equals).
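The insertion and lookup steps above can be sketched as a tiny chained hash table. This is only an illustration under simplifying assumptions (a fixed number of buckets, no resizing, no null keys); the class name SimpleHashTable and its helpers are hypothetical and are not how java.util.HashMap or hash_map is actually written.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny chained hash table, not java.util.HashMap.
class SimpleHashTable<K, V> {
    // One key-value record stored inside a bucket.
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    SimpleHashTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Steps 2 and 3: hash the key, then reduce the hash modulo the bucket count.
    // floorMod keeps the index non-negative even for negative hash values.
    private int bucketIndex(K key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Step 4 of insertion: store the key and value in the bucket,
    // overwriting the value if an equal key is already present.
    void put(K key, V value) {
        List<Entry<K, V>> bucket = buckets.get(bucketIndex(key));
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }

    // Lookup: compare the keys inside the bucket with equals();
    // null means "not found".
    V get(K key) {
        for (Entry<K, V> e : buckets.get(bucketIndex(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Note how the hash function only selects the bucket, while equals() decides which record inside the bucket matches. With one element per bucket, get() performs a single comparison.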

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function distributes the elements properly among the buckets. The time required to iterate over a collection view is proportional to the "capacity" (the number of buckets) of the HashMap instance plus its size (the number of key-value mappings). Therefore, if iteration performance is important, do not set the initial capacity too high (or the load factor too low).

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is rehashed so that it has roughly twice the number of buckets.

As a general rule, the default load factor (0.75) offers a good tradeoff between time and space costs. Higher values reduce the space overhead but increase the lookup cost (which is reflected in most operations of the HashMap class, including get and put). When setting the initial capacity, take into account the expected number of entries in the map and its load factor, so as to minimize the number of rehash operations. No rehash occurs if the initial capacity is greater than the maximum number of entries divided by the load factor. If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large initial capacity stores them more efficiently than letting the table rehash automatically as it grows.

Note that this implementation is not synchronized. If multiple threads access a map concurrently and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or removes one or more mappings; merely changing the value associated with a key that the instance already contains is not a structural modification.) This is typically done by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map: Map m = Collections.synchronizedMap(new HashMap(...));
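As a rough sketch of the two recommendations above (size the map up front, and wrap it at creation time), the following helper derives an initial capacity from an expected entry count and returns a synchronized map. The class and method names are hypothetical; HashMap(int, float) and Collections.synchronizedMap are standard JDK calls.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CapacityExample {
    // Smallest capacity for which capacity * loadFactor is at least
    // expectedEntries, so that inserting that many entries needs no rehash.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    static Map<String, Integer> newSynchronizedMap(int expectedEntries) {
        int capacity = initialCapacityFor(expectedEntries, 0.75f);
        // Wrap at creation time so that no unsynchronized reference escapes.
        return Collections.synchronizedMap(new HashMap<>(capacity, 0.75f));
    }
}
```

For 1000 expected entries and a load factor of 0.75, the helper asks for a capacity of 1334. HashMap may round the requested capacity up internally, which only adds headroom.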

How HashMap works has been a common Java interview question in recent years. Almost every Java programmer knows HashMap, knows where to use it, and knows the difference between Hashtable and HashMap, so why is this question so special? Because of its depth. It often comes up in interviews for senior or intermediate positions. Investment banks are especially fond of it and may even ask you to implement HashMap to test your programming skills. The introduction of ConcurrentHashMap and other concurrent collections has made the question more complicated still. Let's start the journey!

Let's start with some simple questions.

"Have you ever used HashMap?" "What is HashMap? Why did you use it?"

Almost everyone will answer "yes" and then list some features of HashMap: HashMap accepts null keys and null values while Hashtable does not; HashMap is not synchronized; HashMap is fast; HashMap stores key-value pairs; and so on. This shows that you have used HashMap and are quite familiar with it. But then the interviewer takes a sharp turn and starts asking tricky questions about HashMap's more fundamental details. The interviewer may ask:

"Do you know how HashMap works?" "Do you know how HashMap's get() method works?"

You may well answer, "I never looked into the details; it is a standard Java API, and you can look at the Java source code or OpenJDK," or "I can find the answer with Google."

But some candidates will give this answer: "HashMap is based on the principle of hashing. We use put(key, value) to store objects in a HashMap and get(key) to retrieve them. When we pass the key and value to the put() method, it first calls the hashCode() method on the key, and the returned hashCode is used to locate the bucket position in which to store the Entry object." The key point here is that HashMap stores both the key object and the value object in the bucket, as a Map.Entry. This helps in understanding the logic of retrieving objects. If you are not aware of this, or mistakenly think that only values are stored in buckets, you will not be able to explain how objects are retrieved from a HashMap. This answer is quite correct and shows that the candidate really knows how hashing and HashMap work. But it is only the beginning of the story: once the interviewer adds some real-world scenarios that Java programmers encounter every day, wrong answers become frequent. The next question may be about collision detection in HashMap and how collisions are resolved:

"What happens when two objects have the same hashCode?" This is where the real confusion begins. Some candidates will answer that because the hashCode is the same, the two objects are equal, so HashMap will throw an exception or will not store them. The interviewer may then remind them that there are two methods, equals() and hashCode(), and tell them that two objects with the same hashCode are not necessarily equal. Some candidates give up at this point, while others can move on and answer: "Because the hashCode is the same, the bucket position is the same, and a 'collision' occurs. Because HashMap stores objects in a linked list, this Entry (a Map.Entry object containing the key-value pair) is stored in the linked list." This answer is very reasonable. There are many ways to deal with collisions; this one is the simplest, and it is the approach HashMap takes. But the story is not over yet. The interviewer will continue:

"If two keys have the same hashCode, how do you get the value object?" The candidate will answer: when we call the get() method, HashMap uses the hashCode of the key object to find the bucket position and then gets the value object. The interviewer reminds the candidate that two value objects may be stored in the same bucket, and the candidate answers: the linked list is traversed until the value object is found. The interviewer then asks: you have no value object to compare against, so how can you be sure you have found the right one? Candidates cannot answer this question unless they know that HashMap stores key-value pairs in the linked list.

Candidates who remember this important point will say that, after finding the bucket position, HashMap calls the key's equals() method to find the correct node in the linked list and finally returns the value object being looked for. The perfect answer!
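The scenario discussed above can be tried out directly. The sketch below relies on the fact that the strings "Aa" and "BB" have the same hashCode (2112) but are not equal; the class name CollisionExample is just for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionExample {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        // "Aa" and "BB" are different strings with the same hashCode,
        // so both entries land in the same bucket.
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
        // equals() on the key picks the right entry inside the shared bucket.
        System.out.println(map.get("Aa"));                      // first
        System.out.println(map.get("BB"));                      // second
    }
}
```

Both entries are stored, no exception is thrown, and each get() returns the value that belongs to its own key.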

Candidates often stumble at this step because they confuse the hashCode() and equals() methods: hashCode() is used first, to locate the bucket, and equals() comes into play only afterwards, when the matching entry is picked out to fetch the value object. Some good developers will point out that using immutable objects declared final, with proper equals() and hashCode() implementations, reduces collisions and improves efficiency. Immutability makes it possible to cache the hashCode of each key, which speeds up retrieval overall, so String and wrapper classes such as Integer are a good choice as keys.
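A minimal sketch of such a key class might look like the following. CustomerId and its fields are hypothetical names chosen for this example; the point is only that the class is final, its fields are final, and equals() and hashCode() are derived from the same fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key type, for illustration only.
final class CustomerId {
    private final String region;
    private final int number;

    CustomerId(String region, int number) {
        this.region = region;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CustomerId)) return false;
        CustomerId that = (CustomerId) other;
        return number == that.number && Objects.equals(region, that.region);
    }

    // Equal objects must return equal hash codes, so the same fields are used.
    @Override
    public int hashCode() {
        return Objects.hash(region, number);
    }
}

class CustomerIdDemo {
    public static void main(String[] args) {
        Map<CustomerId, String> names = new HashMap<>();
        names.put(new CustomerId("EU", 7), "Alice");
        // A different but equal key instance finds the same entry.
        System.out.println(names.get(new CustomerId("EU", 7))); // Alice
        System.out.println(names.get(new CustomerId("EU", 8))); // null
    }
}
```

Because the fields can never change, the hashCode computed at put() time is guaranteed to match the one computed at get() time.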

If you think it's over, you'll be surprised to hear the next question: "What if the size of the HashMap exceeds the capacity defined by the load factor?" Unless you really know how HashMap works, you will not be able to answer this question. The default load factor is 0.75, which means that when the number of entries exceeds 75% of the number of buckets, the map is resized: as with other collection classes (such as ArrayList), it creates a bucket array twice the size of the original and puts the existing objects into the new bucket array. This process is called rehashing, because it calls the hash method to find the new bucket position.
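The arithmetic behind resizing can be sketched as follows. This is a simplified model that assumes the bucket number is the hash value modulo the capacity; the ResizeMath class and its method names are hypothetical.

```java
class ResizeMath {
    // Number of entries above which a table with this capacity is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Bucket number for a hash value; floorMod keeps the result non-negative.
    static int bucketIndex(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));  // 12
        System.out.println(threshold(32, 0.75f));  // 24
        // Hash values 21 and 37 share bucket 5 while there are 16 buckets...
        System.out.println(bucketIndex(21, 16));   // 5
        System.out.println(bucketIndex(37, 16));   // 5
        // ...but are separated once the bucket array is doubled to 32.
        System.out.println(bucketIndex(21, 32));   // 21
        System.out.println(bucketIndex(37, 32));   // 5
    }
}
```

With 16 buckets and a load factor of 0.75, the 13th entry pushes the size past the threshold of 12. After doubling, every entry's bucket position has to be recomputed, which is why the process is called rehashing.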

If you can answer that, here comes the next question: "Do you understand what can go wrong when a HashMap is resized?" You may not be able to answer, in which case the interviewer will remind you that with multiple threads a race condition may arise.

Resizing a HashMap is subject to a race condition: if two threads both find that the HashMap needs resizing, they will try to resize it at the same time. In older HashMap implementations (before Java 8), the order of the elements stored in a linked list is reversed during resizing, because when moving elements to the new bucket position HashMap places them at the head of the linked list rather than at the tail, in order to avoid tail traversal. If the race condition occurs, you end up with an infinite loop. At this point you can ask the interviewer in return: why on earth would anyone use HashMap in a multi-threaded environment? :)
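The order reversal described above can be shown with a single-threaded sketch. It assumes, for simplicity, that every node of the old chain moves to the same new bucket; the Node class and the transfer method are hypothetical stand-ins, not the JDK's own code.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertionTransfer {
    // Minimal singly linked chain node (illustrative, not the JDK's entry class).
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Moves every node of the old chain to a new chain by inserting at the head.
    static Node transfer(Node oldHead) {
        Node newHead = null;
        Node current = oldHead;
        while (current != null) {
            Node next = current.next; // remember the rest of the old chain
            current.next = newHead;   // link the node in front of the new chain
            newHead = current;
            current = next;
        }
        return newHead;
    }

    // Collects the keys of a chain in traversal order.
    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.key);
        }
        return out;
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(keys(chain));           // [A, B, C]
        System.out.println(keys(transfer(chain))); // [C, B, A]
    }
}
```

Each node's next pointer is rewritten during the transfer. If two threads run such a loop over the same nodes without synchronization, their pointer updates can interleave so that two nodes end up pointing at each other, and a later traversal of that bucket never terminates.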

Enthusiastic readers have contributed more questions about HashMap:

Why are String and wrapper classes such as Integer suitable as keys? String and wrapper classes such as Integer work well as HashMap keys, and String is the most commonly used. String is immutable and final, and it already overrides the equals() and hashCode() methods. The other wrapper classes share these properties. Immutability is necessary because the hashCode() of a key must not change: if a key returns a different hashCode when it is put in and when it is looked up, you cannot find the object you want in the HashMap. Immutability has other advantages as well, such as thread safety. If you can guarantee that the hashCode stays constant simply by declaring a field final, then do so. Because the equals() and hashCode() methods are used when retrieving an object, it is very important that the key object overrides both of them correctly. If two unequal objects return different hashCodes, collisions are less likely, which improves the performance of HashMap.
Can we use a custom object as a key? This is an extension of the previous question. Of course you can use any object as a key, as long as it follows the rules for defining equals() and hashCode() and does not change after it has been inserted into the map. If the custom object is immutable, it already satisfies this condition, because it cannot change once it has been created.
Can we use ConcurrentHashMap instead of Hashtable? This is another very popular interview question, because more and more people use ConcurrentHashMap. We know that Hashtable is synchronized, but ConcurrentHashMap performs better under concurrency because it locks only part of the map, depending on the concurrency level. ConcurrentHashMap can certainly take the place of Hashtable, but Hashtable provides stronger thread safety. See this blog post for the differences between Hashtable and ConcurrentHashMap.
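As a small illustration of ConcurrentHashMap under concurrent writes, the sketch below lets several threads increment one shared counter. The class and method names are hypothetical; ConcurrentHashMap.merge is a standard JDK method that updates a key atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCounter {
    // Starts the given number of threads, each incrementing the same key,
    // and returns the final count once all of them have finished.
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // merge performs the read-modify-write atomically for this key.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000
    }
}
```

No external locking is needed here. Doing the same with a plain HashMap and separate get() and put() calls could lose updates or corrupt the map.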

I personally like this question because of its depth and breadth and the number of different concepts it touches indirectly. Let's look at which topics these questions cover:

The concept of hashing
How HashMap resolves collisions
The use of equals() and hashCode(), and their importance in HashMap
The benefits of immutable objects
Race conditions when HashMap is used by multiple threads
Resizing a HashMap

Summary: how HashMap works

HashMap is based on the hashing principle: we store and retrieve objects through the put() and get() methods. When we pass a key-value pair to put(), it calls the hashCode() method of the key object to compute the hashCode, which is used to find the bucket position in which to store the entry. When an object is retrieved, the correct key-value pair is found through the equals() method of the key object, and the value object is returned. HashMap uses a linked list to resolve collisions: when a collision occurs, the entry is stored in the next node of the linked list. Each linked-list node holds a key-value pair.

What happens when two different key objects have the same hashCode? They are stored in the linked list at the same bucket position. The equals() method of the key object is used to locate the right key-value pair.

Because HashMap has many benefits, I have used it as a cache in e-commerce applications. Because Java is used so widely in the financial sector, and for performance reasons, we often use HashMap and ConcurrentHashMap. You can read more articles about HashMap and Hashtable.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.