Important properties and optimization of HashMap in Java

Source: Internet
Author: User

Preface

HashMap plays an important role in Java development, so it is worth summarizing its key attributes and optimization points. Understanding them also helps you use HashMap more efficiently in practical work. This article does not analyze the HashMap source code in detail. It is intended for Java beginners and for readers who want a deeper understanding of HashMap efficiency.

 

I. Key attributes of HashMap

1,

/**
* The default initial capacity - MUST be a power of two. (The initial size of the map's internal array.)
*/
DEFAULT_INITIAL_CAPACITY = 16; (default capacity)

 

2,

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* (Maximum capacity. This value is used if the specified capacity is greater than the maximum capacity.)
*/
MAXIMUM_CAPACITY = 1 << 30; (maximum capacity)

 

3,

/**
* The load factor used when none specified in constructor.
*/
DEFAULT_LOAD_FACTOR = 0.75f; (default load factor)

 

4,

/**
* The next size value at which to resize (capacity * load factor).
* (This value determines when the map is resized.)
*/
threshold;

 

5,

Bucket (one slot of the internal array, the smallest storage unit; in the source code it holds an Entry)

 

II. How a HashMap is created and how put works

As its name suggests, HashMap uses a hash algorithm to manage the objects stored in it. Internally it relies on two data structures: an array and linked lists.

1. Initialization

If no parameters are specified, the default values are used for initialization.

this.loadFactor = DEFAULT_LOAD_FACTOR;
threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
table = new Entry[DEFAULT_INITIAL_CAPACITY]; (an array of buckets is allocated to store the objects)

If loadFactor and initialCapacity are specified:

this.loadFactor = loadFactor;

The program uses initialCapacity to calculate the actual capacity, which is the smallest power of 2 that is greater than or equal to initialCapacity. (For example, if the initial capacity is 15, the capacity is 16; if the initial capacity is 3, the capacity is 4.)

threshold = (int) (capacity * loadFactor);

table = new Entry[capacity];
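The capacity rounding and threshold calculation described above can be sketched as follows. This is only an illustration: the class name CapacitySketch and its method names are hypothetical and are not taken from the JDK source.

```java
// Illustrative sketch only; CapacitySketch and its methods are hypothetical names.
final class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two that is >= initialCapacity,
    // capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity && capacity < MAXIMUM_CAPACITY) {
            capacity <<= 1;
        }
        return capacity;
    }

    // The size at which the map is resized: capacity * load factor, truncated.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(15)); // 16
        System.out.println(roundUpToPowerOfTwo(3));  // 4
        System.out.println(threshold(16, 0.75f));    // 12
    }
}
```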

2. put

    During put, objects that hash to the same bucket form a linked list, and lookups within a linked list are linear. After an object is put into the map, the map's size is checked. If the size is greater than or equal to threshold (capacity * load factor), the capacity is doubled. Note that the map is resized when the size reaches the threshold, not when it exceeds the capacity. Resizing allocates a new array and recomputes the bucket index of every entry, so its performance cost is relatively high.
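A minimal sketch of this put process, assuming a simplified chained table, might look like the following. ChainedMapSketch and all of its members are hypothetical names used for illustration; the hash spreading and resize details are simplified and do not reproduce the JDK implementation.

```java
import java.util.Objects;

// Minimal chained hash table sketch; hypothetical names, not the JDK implementation.
final class ChainedMapSketch<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final float loadFactor = 0.75f;
    private Node<K, V>[] table;
    private int size;
    private int threshold;

    @SuppressWarnings("unchecked")
    ChainedMapSketch() {
        table = (Node<K, V>[]) new Node[16];
        threshold = (int) (16 * loadFactor);
    }

    // Assumed index function: mix the hash bits, then mask to the table length
    // (valid because the length is always a power of two here).
    private static int indexFor(Object key, int length) {
        int h = (key == null) ? 0 : key.hashCode();
        return (h ^ (h >>> 16)) & (length - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key, table.length);
        // Linear search of the bucket's linked list for an existing key.
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]);
        // Resize when the size reaches the threshold, not the capacity.
        if (++size >= threshold) {
            resize(table.length * 2);
        }
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key, table.length)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    // Allocates a new array and re-inserts every entry at its new index.
    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = indexFor(head.key, newCapacity);
                head.next = table[i];
                table[i] = head;
                head = next;
            }
        }
        threshold = (int) (newCapacity * loadFactor);
    }

    int size() { return size; }
    int capacity() { return table.length; }
}
```

With the default capacity of 16 and load factor of 0.75, the sketch keeps its 16 buckets for the first 11 entries and doubles to 32 buckets when the 12th entry is added.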

 

  III. Optimize HashMap

If the internal array of a hash map contains only one element, all items map to that single array position and form one long linked list. Because updates and lookups search the linked list linearly, this is much slower than the case where each array index holds only one object, so efficiency is very low. The time needed to access or update a linked list grows linearly with the length of the list, whereas using a hash function to access or update a single element in an array is independent of the array's size. In asymptotic (Big-O) notation, the former is O(n) and the latter is O(1). Therefore, it makes sense to use a larger array rather than crowd too many items into too few array positions.
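The effect of the array size on list length can be illustrated with a small sketch that counts how many items land in the fullest bucket. It assumes simple modulo indexing and hash values 0 to 999; ChainLengthSketch and longestChain are hypothetical names.

```java
// Illustrative sketch; hypothetical names, assumes index = hash mod bucket count.
final class ChainLengthSketch {
    // Returns the length of the longest bucket list for the given hash values.
    static int longestChain(int[] hashes, int buckets) {
        int[] counts = new int[buckets];
        int longest = 0;
        for (int h : hashes) {
            int i = Math.floorMod(h, buckets);
            counts[i]++;
            longest = Math.max(longest, counts[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int[] hashes = new int[1000];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = i;
        }
        System.out.println(longestChain(hashes, 1));    // 1000: one long list
        System.out.println(longestChain(hashes, 16));   // 63
        System.out.println(longestChain(hashes, 1024)); // 1: one item per bucket
    }
}
```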

Resizing a Map implementation

In hash terms, each position in the internal array is called a bucket, and the number of available buckets (the size of the internal array) is called the capacity. To handle any number of items effectively, a Map implementation can resize itself. However, resizing is expensive: all elements must be re-inserted into the new array, because a different array size means that objects now map to different index values. Keys that previously collided may no longer collide, while keys that previously did not collide may now do so. This shows that if you presize the Map large enough, you can reduce or even eliminate resizing, which may significantly improve speed.
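How a change of array size moves keys to different buckets can be sketched as follows. The example assumes simple modulo indexing with capacities 11 and 23, which is an illustrative choice and not the bit-masking that HashMap itself uses; ResizeIndexSketch and indexFor are hypothetical names.

```java
// Illustrative sketch; hypothetical names, assumes index = hash mod capacity.
final class ResizeIndexSketch {
    static int indexFor(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        // Hashes 1 and 12 collide at capacity 11 but not at capacity 23.
        System.out.println(indexFor(1, 11) + " " + indexFor(12, 11)); // 1 1
        System.out.println(indexFor(1, 23) + " " + indexFor(12, 23)); // 1 12

        // Hashes 1 and 24 do not collide at capacity 11 but do at capacity 23.
        System.out.println(indexFor(1, 11) + " " + indexFor(24, 11)); // 1 2
        System.out.println(indexFor(1, 23) + " " + indexFor(24, 23)); // 1 1
    }
}
```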

A simple test on a 1.4.2 JVM, filling a HashMap with a large number of items (more than 1 million), illustrates this. Table 5 shows the results, with all times normalized to the presized server-mode time (see the associated files). With a presized Map, the client-mode and server-mode JVMs take almost the same time (once the JIT compilation phase is discounted). With the default size, however, the Map goes through several expensive resize operations: it takes over 50% longer in server mode and almost three times as long in client mode!

Table 5: Time needed to populate a HashMap, presized versus default size

                Client Mode   Server Mode
Presized        100%          100%
Default size    294%          157%

 

Using the load factor

To decide when to resize, a hash-based Map does not count the depth of the linked list in each bucket. Instead, it uses an additional parameter to estimate roughly how densely the buckets are filled. This parameter, called the "load factor", indicates how much load the Map will take on before it resizes. The relationship between the load factor, the Map size (the number of items), and the capacity is simple:

 

  • If (Map size) > (load factor) x (capacity), the Map is resized

 

For example, with the default load factor of 0.75 and a capacity of 11, 11 x 0.75 = 8.25, which is rounded down to 8 elements. Therefore, adding the 8th item to this Map causes it to resize itself to a larger capacity. Conversely, to calculate the initial capacity needed to avoid resizing, divide the number of items to be added by the load factor and round up. For example,

 

  • For 100 items with a load factor of 0.75, the capacity should be set to 100/0.75 = 133.33, rounded up to 134 (or to 135 to use an odd number)
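The two calculations above can be sketched in Java as follows. The class PresizeSketch and its methods are hypothetical helpers written for illustration; only the HashMap(int initialCapacity, float loadFactor) constructor comes from the standard library.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; PresizeSketch and its methods are hypothetical helpers.
final class PresizeSketch {
    // Number of items at which a map with this capacity is resized.
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Initial capacity needed to hold expectedItems without resizing.
    static int capacityFor(int expectedItems, float loadFactor) {
        return (int) Math.ceil(expectedItems / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(resizeThreshold(11, 0.75f)); // 8
        System.out.println(capacityFor(100, 0.75f));    // 134

        // Presize the map so that adding 100 items should not trigger a resize.
        Map<String, Integer> map = new HashMap<>(capacityFor(100, 0.75f), 0.75f);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 100
    }
}
```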

 

An odd number of buckets lets a Map improve efficiency by reducing the number of collisions. Although my tests (see the associated files) did not show that a prime number always gives better efficiency, a prime capacity is the ideal case. Some Maps after version 1.4 (such as HashMap and LinkedHashMap, but not Hashtable or IdentityHashMap) use hash functions that require a power-of-2 capacity, but these Maps calculate the next power of 2 themselves, so you do not have to calculate it yourself.

The load factor itself is a trade-off between space and time. A smaller load factor takes up more space but reduces the likelihood of collisions, which speeds up access and updates. A load factor greater than 0.75 is probably unwise, and a load factor greater than 1.0 is definitely unwise, because it guarantees collisions. A load factor smaller than 0.50 brings little benefit, but as long as you presize the Map effectively, a small load factor costs only memory, not performance. However, a smaller load factor also means that if you do not presize the Map, it will be resized more often, which reduces performance, so keep this in mind when adjusting the load factor.

 

 
