A detailed description of the latest Java array

Last Update:2015-07-27 Source: Internet

Author: User


HashMap in Java, explained in detail: http://alex09.iteye.com/blog/539545

Summary:

1. Just as with an array of a reference type: when we put a Java object into an array, the object itself is not placed in the array; only a reference to it is, and each array element is a reference variable.

2. HashMap uses a hash algorithm to determine where each element is stored.

3. Internally, HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, its storage location is determined by the hash algorithm.

HashMap and HashSet are two important members of the Java Collections Framework: HashMap is a commonly used implementation of the Map interface, and HashSet is a commonly used implementation of the Set interface. Although the two classes implement different interface specifications, their underlying hash storage mechanism is exactly the same; HashSet itself is implemented on top of HashMap.
The hash storage mechanism is analyzed below through the source code of HashMap and HashSet.
In fact, HashSet and HashMap have a lot in common. For HashSet, the system uses a hash algorithm to determine where each element is stored, so that elements can be stored and retrieved quickly. For HashMap, the system treats each key-value pair as a single unit and always computes its storage location with the hash algorithm, so that the key-value pairs of the Map can be stored and retrieved quickly.

Before introducing how collections store their elements, it is important to point out that although a collection is said to store Java objects, it does not actually hold the objects themselves; a Set, for example, holds only references to those objects. In other words, a Java collection is really a collection of reference variables that point to the actual Java objects.

Collections and References

Just as with an array of a reference type: when we put a Java object into an array, the object itself is not placed in the array; only a reference to it is, and each array element is a reference variable.
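The point about references can be checked with a small sketch. The class and method names here (ReferenceDemo, mutateAfterAdd) are made up for illustration and do not come from the article; the sketch only assumes the standard java.util.ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceDemo {
    // Adds a StringBuilder to a list, then mutates it through the original
    // variable and returns what the list element looks like afterwards.
    static String mutateAfterAdd() {
        StringBuilder sb = new StringBuilder("before");
        List<StringBuilder> list = new ArrayList<>();
        list.add(sb);          // the list stores a reference, not a copy
        sb.append("-after");   // change the object through the original variable
        return list.get(0).toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateAfterAdd()); // prints "before-after"
    }
}
```

Because the list element and the local variable refer to the same object, the change made after add() is visible through the list.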

Storage implementation of HashMap
When a program puts several key-value pairs into a HashMap, as in the following code snippet:

Java code

HashMap<String, Double> map = new HashMap<String, Double>();
map.put("language", 80.0);
map.put("mathematics", 89.0);
map.put("English", 78.2);

HashMap uses a hash algorithm to determine where each element is stored.

When the program executes map.put("language", 80.0);, the system calls the hashCode() method of "language" to get its hash code. Every Java object has a hashCode() method that returns its hash code. Once the system has the hash code of the key, it uses that value to determine where the element is stored.

We can look at the source code of the put(K key, V value) method of the HashMap class:

Java code

public V put(K key, V value)
{
    // If key is null, call putForNullKey to handle it
    if (key == null)
        return putForNullKey(value);
    // Compute the hash value from the key's hashCode
    int hash = hash(key.hashCode());
    // Find the index in the table that corresponds to this hash value
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain through e.next
    for (Entry<K,V> e = table[i]; e != null; e = e.next)
    {
        Object k;
        // Found an existing key equal to the key being put
        // (same hash value, and equals() returns true)
        if (e.hash == hash && ((k = e.key) == key
            || key.equals(k)))
        {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No Entry with an equal key exists at index i (possibly no Entry at all)
    modCount++;
    // Add the key and value at index i
    addEntry(hash, key, value, i);
    return null;
}

The code above uses an important internal interface, Map.Entry; each Map.Entry is a key-value pair. As the code shows, when the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes the storage location of each Entry from the key alone. This also supports the earlier conclusion: the value of a Map can be regarded as an attachment to its key, and once the system has decided where the key is stored, the value is stored there with it.

The put method above relies on hash(), a method that computes a hash code from the return value of hashCode(). It is a purely mathematical calculation:
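The return value of put can be observed from the outside. The following is a small sketch against the public java.util.HashMap API; the class name PutReturnDemo and the method putTwice are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Puts the same key twice and collects: the first return value,
    // the second return value, and the value finally stored.
    static Double[] putTwice() {
        Map<String, Double> map = new HashMap<>();
        Double first = map.put("language", 80.0);   // no previous mapping: null
        Double second = map.put("language", 95.5);  // equal key: old value is returned
        return new Double[] { first, second, map.get("language") };
    }

    public static void main(String[] args) {
        Double[] r = putTwice();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]); // prints "null, 80.0, 95.5"
    }
}
```

The second call finds an existing entry with an equal key, replaces only its value, and hands back the old value, which matches the branch that returns oldValue in the listing.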

Java code

static int hash(int h)
{
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

For any given object, as long as its hashCode() return value is the same, the hash code computed by hash(int h) is always the same. The program then calls indexFor(int h, int length) to calculate the index in the table array at which the object should be stored. The code of the indexFor(int h, int length) method is as follows:

Java code

static int indexFor(int h, int length)
{
    return h & (length - 1);
}

This method is very clever: it always obtains the object's storage position through h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2, as explained later in the discussion of the HashMap constructors.

When length is always a power of 2, h & (length - 1) is a very clever design: assuming h=5 and length=16, h & (length - 1) yields 5; if h=6 and length=16, it yields 6; ... if h=15 and length=16, it yields 15; but when h=16 and length=16, it yields 0; and when h=17 and length=16, it yields 1. This guarantees that the computed index always falls within the bounds of the table array.
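The arithmetic above can be reproduced with a short sketch. The wrapper class IndexForDemo is a hypothetical name; indexFor is copied from the listing in the article.

```java
class IndexForDemo {
    // Same expression as the indexFor method shown in the article.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of 2, as HashMap guarantees for its table
        for (int h : new int[] { 5, 6, 15, 16, 17 }) {
            System.out.println("h=" + h + " -> index " + indexFor(h, length));
        }
        // A negative hash still maps into the array bounds.
        System.out.println("h=-1 -> index " + indexFor(-1, length)); // index 15
    }
}
```

For non-negative h and a power-of-2 length, the result equals h % length, but the bitwise form also keeps negative hash values inside the range 0 to length - 1.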

As the source code of the put method shows, when the program puts a key-value pair into a HashMap, it first determines the storage location of the Entry from the hashCode() return value of the key: if the keys of two Entry objects have the same hashCode() return value, the two entries are stored at the same location. If the two keys then compare equal with equals(), the value of the newly added Entry overwrites the value of the Entry already in the collection, but the key is not overwritten. If the two keys compare unequal with equals(), the newly added Entry forms an Entry chain with the Entry already in the collection, and the newly added Entry sits at the head of the chain; see the description of the addEntry() method below for details.

When a key-value pair is added to a HashMap, the hashCode() return value of its key determines where the pair (that is, the Entry object) is stored. When the keys of two Entry objects have the same hashCode() return value, the result of comparing the keys with equals() decides whether the new value overwrites the old one (equals() returns true) or an Entry chain is formed (equals() returns false).
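The two outcomes can be seen with a deliberately bad key type. BadKey and CollisionDemo are hypothetical names invented for this sketch; every BadKey reports the same hash code, so all keys collide, and only equals() separates them. The sketch observes behavior through the public API only and makes no claim about how a particular JDK version lays out the bucket internally.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key: constant hashCode forces every key into one bucket.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    // Returns: size after two colliding keys, size after re-putting an
    // equal key, and the value stored for key "a" at the end.
    static int[] sizes() {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);   // same hash, equals() false: both entries kept
        int afterCollision = map.size();
        map.put(new BadKey("a"), 3);   // same hash, equals() true: value overwritten
        return new int[] { afterCollision, map.size(), map.get(new BadKey("a")) };
    }

    public static void main(String[] args) {
        int[] r = sizes();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]); // prints "2, 2, 3"
    }
}
```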

The put method above also calls addEntry(hash, key, value, i);. addEntry is a method of HashMap with package access that does nothing but add a key-value pair. Here is the code of the method:

Java code

void addEntry(int hash, K key, V value, int bucketIndex)
{
    // Get the Entry at the specified bucketIndex
    Entry<K,V> e = table[bucketIndex]; // ①
    // Put the newly created Entry at bucketIndex and let it point to the original Entry
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the number of key-value pairs in the Map has reached the threshold
    if (size++ >= threshold)
        // Expand the table to twice its current length
        resize(2 * table.length); // ②
}

The code of this method is short, but it contains an elegant design: the system always places the newly added Entry object at index bucketIndex of the table array. If an Entry object is already at that index, the newly added Entry points to the original Entry (producing an Entry chain). If there is no Entry at that index, that is, the variable e at the line marked ① is null, the newly added Entry points to null and no Entry chain is produced.

JDK Source Code

The JDK installation directory contains a compressed file, src.zip, that holds the source files of the Java base class library. Interested readers can open this archive and read the class library source, which is a great help in improving programming skills. Note that the source in src.zip does not contain the explanatory comments shown above; they were added by the author.

Performance options for hash algorithms

As the code above shows, when a bucket stores an Entry chain, the most recently added Entry is always at the head of the chain in the bucket, and the Entry that was placed in the bucket first is at the end of the chain.
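The head-insertion order can be illustrated with a minimal linked chain. Node and ChainDemo are hypothetical stand-ins for this sketch; Node keeps only a key and a next link, whereas the Entry class in the listing also stores the hash and the value.

```java
class ChainDemo {
    // Hypothetical stand-in for an Entry: a key plus a link to the next node.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Inserts each key at the head of a single bucket, then reads the chain
    // from head to tail.
    static String chainOrder(String... keys) {
        Node head = null;                 // empty bucket
        for (String k : keys) {
            head = new Node(k, head);     // new node points to the previous head
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainOrder("a", "b", "c")); // prints "c -> b -> a"
    }
}
```

The key inserted first ("a") ends up at the tail, so it is the most expensive one to reach when the chain is traversed.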

Two more variables appear in the code above:

* size: the number of key-value pairs contained in the HashMap.
* threshold: the limit on the number of key-value pairs the HashMap can hold; its value equals the capacity of the HashMap multiplied by the load factor.

The line marked ② shows that when size++ >= threshold, HashMap automatically calls the resize method to expand its capacity. Each expansion doubles the capacity of the HashMap.
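As a rough model of this growth rule, the sketch below replays only the size++ >= threshold check from the addEntry listing, assuming an initial capacity of 16 and a load factor of 0.75. ThresholdDemo and capacityAfter are hypothetical names, and other JDK versions may use a different resize condition.

```java
class ThresholdDemo {
    // threshold = capacity * load factor, truncated to int as in the listing.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Simulates the capacity after a number of insertions of distinct keys,
    // using the same check as the addEntry listing.
    static int capacityAfter(int insertions) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = threshold(capacity, loadFactor);
        int size = 0;
        for (int i = 0; i < insertions; i++) {
            if (size++ >= threshold) {        // same condition as in addEntry
                capacity *= 2;                // resize(2 * table.length)
                threshold = threshold(capacity, loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // prints 12
        System.out.println(capacityAfter(12));    // prints 16
        System.out.println(capacityAfter(13));    // prints 32
    }
}
```

Under this model the first resize happens while the 13th entry is being added, because size is compared with the threshold before it is incremented.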

The table used in the code above is simply an ordinary array. Every array has a fixed length, and the length of this array is the capacity of the HashMap. HashMap provides the following constructors:

* HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
* HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
* HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.

When a HashMap is created, the system automatically creates a table array to hold the Entry objects of the HashMap. The following is the code of one of the HashMap constructors:

Java code

// Create a HashMap with the specified initial capacity and load factor
public HashMap(int initialCapacity, float loadFactor)
{
    // The initial capacity must not be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException(
            "Illegal initial capacity: " +
            initialCapacity);
    // If the initial capacity exceeds the maximum capacity, clamp it to the maximum
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be a number greater than 0
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException(
            "Illegal load factor: " +
            loadFactor);
    // Find the smallest power of 2 that is not less than initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    // Set the threshold to capacity * load factor
    threshold = (int) (capacity * loadFactor);
    // Initialize the table array
    table = new Entry[capacity]; // ①
    init();
}

The while loop in the code above is a concise way to find the smallest power of 2 that is greater than or equal to initialCapacity and to use it as the actual capacity of the HashMap (stored in the capacity variable). For example, given an initialCapacity of 10, the actual capacity of the HashMap is 16.
The line marked ① shows that table is essentially an array whose length is capacity.
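The rounding loop can be tried in isolation. CapacityDemo and roundUpToPowerOfTwo are hypothetical names for this sketch; the loop body is the one from the constructor listing. The sketch assumes the argument has already been clamped to at most 2^30, as the constructor does with its maximum capacity, since larger inputs would overflow the shift.

```java
class CapacityDemo {
    // Smallest power of 2 that is greater than or equal to initialCapacity.
    // Assumes initialCapacity <= 1 << 30 (the constructor clamps it first).
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // prints 16
        System.out.println(roundUpToPowerOfTwo(16)); // prints 16
        System.out.println(roundUpToPowerOfTwo(17)); // prints 32
    }
}
```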

For HashMap and its subclasses, a hash algorithm determines where the elements of the collection are stored. When the system initializes a HashMap, it creates an Entry array of length capacity. Each position in this array that can hold an element is called a bucket; every bucket has its own index, and the system can quickly access the elements stored in a bucket through that index.

At any time, each bucket of a HashMap stores only one element (that is, one Entry). Because an Entry object can contain a reference variable (the last parameter of the Entry constructor) that points to the next Entry, it is possible for a bucket to hold a single Entry that points to another Entry, which forms an Entry chain, as shown in Figure 1:

Figure 1. Storage schematic of the HashMap

Read implementation of HashMap

HashMap performs best when each bucket holds a single Entry, that is, when no Entry chain has been formed through the next reference. When the program retrieves a value by key, the system first computes the key's hashCode() return value, uses that value to find the key's index in the table array, takes the Entry at that index, and finally returns the value corresponding to the key. Look at the code of the get(Object key) method of the HashMap class:

Java code

public V get(Object key)
{
    // If key is null, call getForNullKey to fetch the corresponding value
    if (key == null)
        return getForNullKey();
    // Compute the hash code from the key's hashCode value
    int hash = hash(key.hashCode());
    // Start from the Entry at the computed index in the table array
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
        e != null;
        // Move on to the next Entry in the Entry chain
        e = e.next) // ①
    {
        Object k;
        // If this Entry's key is the same as the key being searched for
        if (e.hash == hash && ((k = e.key) == key
            || key.equals(k)))
            return e.value;
    }
    return null;
}

As the code above shows, if each bucket of the HashMap holds only one Entry, HashMap can fetch the Entry from the bucket quickly by its index. When a hash collision occurs, a single bucket holds not one Entry but an Entry chain, and the system has to traverse the entries one by one until it finds the one being searched for. If that Entry happens to be at the very end of the chain (the Entry that was placed in the bucket first), the system must loop to the end of the chain before it finds the element.

To sum up, HashMap treats each key-value pair as a single unit internally, and that unit is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, the hash algorithm determines its storage location; when an Entry needs to be retrieved, the hash algorithm finds its storage location again and the Entry is taken out directly. This is why HashMap can store and retrieve its entries quickly, just as our mothers taught us in everyday life: put different things in different places, so that you can find them quickly when you need them.
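The store and lookup paths described in this article can be condensed into a toy map. TinyHashMap is a hypothetical teaching sketch, not the JDK class: it assumes a fixed table of 16 buckets, never resizes, omits the supplemental hash() step, and does not support null keys.

```java
class TinyHashMap<K, V> {
    // One key-value pair plus a link to the next entry in the same bucket.
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        final Entry<K, V> next;
        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Overwrites the value of an equal key, otherwise inserts at the chain head.
    public V put(K key, V value) {
        int hash = key.hashCode();               // null keys are not handled here
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    // Walks the chain in the key's bucket until an equal key is found.
    public V get(Object key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key)))
                return e.value;
        }
        return null;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        TinyHashMap<String, Double> map = new TinyHashMap<>();
        map.put("language", 80.0);
        map.put("mathematics", 89.0);
        System.out.println(map.get("mathematics")); // prints 89.0
    }
}
```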

When a HashMap is created, it has a load factor whose default value is 0.75. This is a tradeoff between time and space costs: increasing the load factor reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of looking up data, and lookup is the most frequent operation (both get() and put() of HashMap perform a lookup). Reducing the load factor improves lookup performance but increases the memory occupied by the hash table.

With this knowledge, we can adjust the load factor to actual needs when creating a HashMap. If the program cares more about space overhead and memory is tight, the load factor can be increased moderately; if the program cares more about time overhead and memory is plentiful, the load factor can be reduced moderately. Usually, programmers do not need to change the load factor.

If you know from the start that a HashMap will hold many key-value pairs, you can give it a larger initial capacity when creating it. If the number of entries in the HashMap never exceeds the threshold (capacity * load factor), HashMap never needs to call resize() to reallocate the table array, which ensures good performance. Of course, setting the initial capacity too high at the start wastes space (the system has to create an Entry array of length capacity), so the initial capacity should also be chosen carefully when creating a HashMap.
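One way to pick an initial capacity is to work backwards from the expected number of entries. PresizeDemo, capacityFor and presized are hypothetical helpers for this sketch; only the HashMap(int initialCapacity) constructor comes from the JDK. The calculation assumes the threshold rule described above (capacity * load factor).

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest capacity whose threshold (capacity * loadFactor) is at least
    // expectedEntries. HashMap will round it up to a power of 2 itself.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Creates a map sized for the expected number of entries at the
    // default load factor of 0.75.
    static Map<String, Integer> presized(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries, 0.75f));
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(12, 0.75f));  // prints 16
        System.out.println(capacityFor(100, 0.75f)); // prints 134
        Map<String, Integer> scores = presized(100);
        scores.put("language", 80);
        System.out.println(scores.size());           // prints 1
    }
}
```

Requesting 134 leads to an actual table length of 256 after rounding, which is the space cost the paragraph above warns about.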


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
