A detailed description of the latest Java array

Source: Internet
Author: User

Detailed explanation of HashMap in Java: http://alex09.iteye.com/blog/539545

Summary:

1. Just as with an array of reference types, putting a Java object into an array does not store the object itself in the array. Only a reference to the object is stored, and each array element is a reference variable.

2. HashMap uses a so-called "hash algorithm" to determine where each element is stored.

3. Internally, HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, its storage location is determined by the hash algorithm.

HashMap and HashSet are two important members of the Java Collections Framework: HashMap is a commonly used implementation of the Map interface, and HashSet is a commonly used implementation of the Set interface. Although the two classes implement different interface specifications, their underlying hash storage mechanism is exactly the same; HashSet itself is even implemented using HashMap.
The hash storage mechanism is analyzed below through the source code of HashMap and HashSet.
In fact, HashSet and HashMap have much in common. For HashSet, the system uses the hash algorithm to determine where each element of the set is stored, so that elements can be stored and retrieved quickly. For HashMap, the system treats each key-value pair as a whole and always computes its storage location with the hash algorithm, so that the key-value pairs of the Map can be stored and retrieved quickly.
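The idea that a set can be built on top of a map can be illustrated with a minimal sketch. The class name MapBackedSet and the PRESENT placeholder below are assumptions made for this illustration; this is not the JDK's HashSet source, only one way such a wrapper could look.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not the JDK's HashSet source.
class MapBackedSet<E> {
    // One shared placeholder value stored under every key.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // Returns true only if the element was not already in the set.
    boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    boolean contains(Object element) {
        return map.containsKey(element);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("Java")); // true: newly added
        System.out.println(set.add("Java")); // false: already present
        System.out.println(set.size());      // 1
    }
}
```

Each set element becomes a key of the map, so the element's storage location is decided by exactly the same key hashing that the map uses.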

Before introducing how collections store elements, it is important to point out that although a collection appears to store Java objects, it does not actually place the objects themselves in the Set; it only keeps references to those objects. In other words, a Java collection is really a collection of reference variables that point to the actual Java objects.

Collections and References

Just as with an array of reference types, putting a Java object into an array does not actually store the object in the array. Only a reference to the object is stored, and each array element is a reference variable.
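A short sketch can make the reference semantics visible. The class and method names are chosen for this illustration only: an object is mutated through its original variable after being added to an array and a list, and the change is observed through the container.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceDemo {
    // Returns the text seen through the list after the original object is mutated.
    static String mutateThroughOriginal() {
        StringBuilder sb = new StringBuilder("Java");
        List<StringBuilder> list = new ArrayList<>();
        list.add(sb);          // stores a reference, not a copy
        sb.append(" HashMap"); // mutate via the original variable
        return list.get(0).toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Java");
        StringBuilder[] array = { sb };
        System.out.println(array[0] == sb);          // true: same object
        System.out.println(mutateThroughOriginal()); // Java HashMap
    }
}
```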

Storage implementation of HashMap
When the program puts several key-value pairs into a HashMap, take the following code snippet as an example:

Java code
    HashMap<String, Double> map = new HashMap<String, Double>();
    map.put("language", 80.0);
    map.put("mathematics", 89.0);
    map.put("English", 78.2);




HashMap uses a so-called "Hash algorithm" to determine where each element is stored.

When the program executes map.put("language", 80.0);, the system calls the hashCode() method of "language" to obtain its hashCode value. Every Java object has a hashCode() method that returns this value. Once the hashCode value of the object is known, the system uses it to determine where the element is stored.
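As a small illustration of this point, two distinct String objects with equal content return the same hashCode() value, which is why either of them finds the same storage location. The class and helper names below are assumptions for this sketch.

```java
class HashCodeDemo {
    // True when both strings report the same hashCode() value.
    static boolean sameHash(String x, String y) {
        return x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        String a = "language";
        String b = new String("language"); // distinct object, equal content
        System.out.println(a == b);         // false: different objects
        System.out.println(a.equals(b));    // true: equal content
        System.out.println(sameHash(a, b)); // true: equal objects, equal hash codes
    }
}
```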

We can look at the source code of the put(K key, V value) method of the HashMap class:

Java code
  public V put(K key, V value)
  {
      // If key is null, delegate to putForNullKey
      if (key == null)
          return putForNullKey(value);
      // Compute the hash value from the key's hashCode()
      int hash = hash(key.hashCode());
      // Find the index in the table that corresponds to this hash value
      int i = indexFor(hash, table.length);
      // If the Entry at index i is not null, walk the chain through each Entry's next reference
      for (Entry<K,V> e = table[i]; e != null; e = e.next)
      {
          Object k;
          // Found an Entry whose key equals the key being put
          // (same hash value, and same reference or equals() returns true)
          if (e.hash == hash && ((k = e.key) == key
              || key.equals(k)))
          {
              V oldValue = e.value;
              e.value = value;
              e.recordAccess(this);
              return oldValue;
          }
      }
      // No Entry with an equal key was found at index i
      modCount++;
      // Add the key and value at index i
      addEntry(hash, key, value, i);
      return null;
  }


The put method above uses an important nested interface, Map.Entry; each Map.Entry is in fact one key-value pair. As the code shows, when the system decides where to store a key-value pair in the HashMap, it does not consider the value in the Entry at all; it computes the storage location of each Entry from the key alone. This confirms the earlier conclusion: the value in a Map can be regarded as an appendage of the key, and once the system has determined where the key is stored, the value is stored there with it.

The put method also calls hash(), which computes a hash value from the hashCode() return value. It is a purely mathematical calculation:

Java code
    static int hash(int h)
    {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }




For any given object, as long as its hashCode() return value is the same, the hash value computed by hash(int h) is always the same. The program then calls the indexFor(int h, int length) method to compute the index in the table array at which the object should be stored. The code of indexFor(int h, int length) is as follows:

Java code
    static int indexFor(int h, int length)
    {
        return h & (length - 1);
    }



This method is very ingenious: it always obtains the storage position as h & (table.length - 1), and the length of HashMap's underlying array is always a power of 2 (2^n), as the later discussion of the HashMap constructor shows.

When length is always a power of 2, h & (length - 1) is a very clever design: assuming h = 5 and length = 16, h & (length - 1) yields 5; if h = 6 and length = 16, it yields 6; ... if h = 15 and length = 16, it yields 15; but when h = 16 and length = 16, it yields 0; and when h = 17 and length = 16, it yields 1; ... This ensures that the computed index always lies within the bounds of the table array.
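The arithmetic above can be checked with a small standalone sketch. The class name IndexForDemo is an assumption for this illustration; hash and indexFor are re-declared locally, with the shift constants 20, 12, 7 and 4 taken from the JDK 6 HashMap source, so that the example runs outside HashMap.

```java
class IndexForDemo {
    // Supplemental hash, re-declared here with the JDK 6 shift constants.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Masks the hash down to a valid index; length is assumed to be a power of 2.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {5, 6, 15, 16, 17}) {
            // Prints 5 -> 5, 6 -> 6, 15 -> 15, 16 -> 0, 17 -> 1
            System.out.println(h + " -> " + indexFor(h, length));
        }
        // Whatever the key's hash value is, the index stays inside the table.
        int bucket = indexFor(hash("language".hashCode()), length);
        System.out.println(bucket >= 0 && bucket < length); // true
    }
}
```

Because length - 1 has all of its low bits set when length is a power of 2, the mask keeps exactly the low bits of h. For a non-negative h this gives the same result as h % length, and it also yields a valid index when h is negative.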

According to the source code of the put method above, when the program puts a key-value pair into a HashMap, it first determines the storage location of the Entry from the hashCode() return value of the key: if the keys of two Entry objects have the same hashCode() return value, they are stored in the same location. If the keys of the two Entry objects compare as true through equals(), the value of the newly added Entry overwrites the value of the existing Entry, but the key is not overwritten. If they compare as false, the newly added Entry forms an Entry chain with the existing Entry in that location, and the newly added Entry sits at the head of the chain. The description of the addEntry() method below gives the details.

When a key-value pair is added to a HashMap, the hashCode() return value of its key determines where the pair (that is, the Entry object) is stored. When the keys of two Entry objects have the same hashCode() return value, the result of comparing the keys with equals() decides whether the value is overwritten (equals() returns true) or an Entry chain is formed (equals() returns false).
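Both outcomes can be observed from the outside with the real java.util.HashMap. The Key class below is a hypothetical type written for this sketch; its hashCode() is deliberately constant so that every instance collides.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: constant hashCode(), so all keys share one bucket.
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<Key, Double> map = new HashMap<>();
        System.out.println(map.put(new Key("math"), 89.0));    // null: new entry
        System.out.println(map.put(new Key("math"), 95.0));    // 89.0: equal key, value overwritten
        System.out.println(map.put(new Key("english"), 78.2)); // null: same hash, unequal key
        System.out.println(map.size());                        // 2
    }
}
```

put returns the previous value when an equal key already exists and null otherwise, so the return values show which branch was taken.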

The put method above also calls addEntry(hash, key, value, i);. addEntry is a method of HashMap with package access that is used only to add a key-value pair. Here is the code of the method:

Java code
  void addEntry(int hash, K key, V value, int bucketIndex)
  {
      // Get the Entry at the specified bucketIndex
      Entry<K,V> e = table[bucketIndex];   // ①
      // Put the newly created Entry at bucketIndex and let it point to the original Entry
      table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
      // If the number of key-value pairs in the Map has reached the threshold
      if (size++ >= threshold)
          // Double the length of the table
          resize(2 * table.length);   // ②
  }




The code of this method is simple, but it contains a very elegant design: the system always places the newly added Entry object at the bucketIndex position of the table array. If an Entry object already exists at bucketIndex, the newly added Entry points to the original Entry (forming an Entry chain). If there is no Entry at bucketIndex, that is, the variable e at the line marked ① is null, the newly added Entry points to null and no Entry chain is formed.
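The head insertion described here can be sketched with a stripped-down bucket array. Node, HeadInsertDemo and chain are names invented for this illustration; the sketch mirrors only the shape of the addEntry code quoted in this article and leaves out hashing and resizing.

```java
class HeadInsertDemo {
    // Hypothetical minimal node: a key, a value and a link to the next node.
    static final class Node {
        final String key;
        final double value;
        final Node next;
        Node(String key, double value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[4];

    // Same shape as addEntry: the new node becomes the head of the bucket.
    void addEntry(String key, double value, int bucketIndex) {
        Node e = table[bucketIndex];                  // may be null
        table[bucketIndex] = new Node(key, value, e); // new node points to the old head
    }

    // Lists the keys of one bucket from head to tail.
    String chain(int bucketIndex) {
        StringBuilder sb = new StringBuilder();
        for (Node e = table[bucketIndex]; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HeadInsertDemo demo = new HeadInsertDemo();
        demo.addEntry("first", 1.0, 2);
        demo.addEntry("second", 2.0, 2);
        demo.addEntry("third", 3.0, 2);
        System.out.println(demo.chain(2)); // third -> second -> first
    }
}
```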

JDK Source Code

In the JDK installation directory you can find a compressed file named src.zip that contains all the source files of the Java base class library. Readers who are interested in learning can open this file and read the source code of the Java class library, which is very helpful for improving programming ability. Note that the source code in src.zip does not contain the comments shown above; those were added by the author.

Performance options of the Hash algorithm

As the addEntry code above shows, when a bucket holds an Entry chain, the most recently added Entry is always at the head of the chain, and the Entry that was placed in the bucket first is at the end of the chain.

The addEntry method above uses two more variables:

* size: the number of key-value pairs contained in the HashMap.
* threshold: the limit on the number of key-value pairs the HashMap can hold; its value equals the capacity of the HashMap multiplied by the load factor.

The line marked ② in addEntry shows that when size++ >= threshold, HashMap automatically calls the resize method to expand its capacity. Each expansion doubles the capacity of the HashMap.
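The interaction of size, threshold and doubling can be traced with a small simulation. ResizeDemo and capacityAfter are hypothetical names; the sketch applies only the rule quoted in this article and, as a simplifying assumption, ignores the maximum capacity limit.

```java
class ResizeDemo {
    // Simulates the capacity after adding n distinct keys, applying the rule
    // "if (size++ >= threshold) double the capacity" on every addition.
    static int capacityAfter(int n, int capacity, float loadFactor) {
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size++ >= threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Capacity 16 with load factor 0.75 gives a threshold of 12.
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32
        System.out.println(capacityAfter(25, 16, 0.75f)); // 64
    }
}
```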

The table array used by HashMap is just an ordinary array. Every array has a fixed length, and the length of this array is the capacity of the HashMap. HashMap provides the following constructors:

* HashMap(): constructs a HashMap with an initial capacity of 16 and a load factor of 0.75.
* HashMap(int initialCapacity): constructs a HashMap with an initial capacity of initialCapacity and a load factor of 0.75.
* HashMap(int initialCapacity, float loadFactor): constructs a HashMap with the specified initial capacity and the specified load factor.

When a HashMap is created, the system automatically creates a table array to hold the Entry objects of the HashMap. The following is the code of one of the HashMap constructors:

Java code
  // Create a HashMap with the specified initial capacity and load factor
  public HashMap(int initialCapacity, float loadFactor)
  {
      // The initial capacity cannot be negative
      if (initialCapacity < 0)
          throw new IllegalArgumentException(
              "Illegal initial capacity: " +
              initialCapacity);
      // If the initial capacity exceeds the maximum capacity, use the maximum capacity
      if (initialCapacity > MAXIMUM_CAPACITY)
          initialCapacity = MAXIMUM_CAPACITY;
      // The load factor must be a number greater than 0
      if (loadFactor <= 0 || Float.isNaN(loadFactor))
          throw new IllegalArgumentException(
              "Illegal load factor: " + loadFactor);
      // Compute the smallest power of 2 that is not less than initialCapacity
      int capacity = 1;
      while (capacity < initialCapacity)
          capacity <<= 1;
      this.loadFactor = loadFactor;
      // Set the threshold to capacity * load factor
      threshold = (int) (capacity * loadFactor);
      // Initialize the table array
      table = new Entry[capacity];   // ①
      init();
  }




The while loop in the constructor above is a concise piece of code: it finds the smallest power of 2 that is greater than or equal to initialCapacity and uses it as the actual capacity of the HashMap (held in the capacity variable). For example, given an initialCapacity of 10, the actual capacity of the HashMap is 16.
The line marked ① shows that table is in essence an array of length capacity.
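The rounding loop can be run on its own to see which capacities result. CapacityDemo and roundUpToPowerOfTwo are names assumed for this sketch. The argument is assumed to be at most 1 << 30, the bound the constructor enforces through MAXIMUM_CAPACITY before the loop runs.

```java
class CapacityDemo {
    // Smallest power of 2 that is not less than initialCapacity.
    // Assumes initialCapacity <= (1 << 30); larger values would overflow the shift.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}
```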

For HashMap and its subclasses, a hash algorithm determines where the elements of the collection are stored. When the system initializes a HashMap, it creates an Entry array of length capacity. Each position in this array that can hold elements is called a "bucket"; each bucket has its own index, and the system can quickly access the elements stored in a bucket through that index.

At any time, each "bucket" of a HashMap stores only one element (that is, one Entry). Because an Entry object can contain a reference variable (the last parameter of the Entry constructor) that points to the next Entry, the following situation is possible: a bucket holds only one Entry, but that Entry points to another Entry, which forms an Entry chain, as shown in Figure 1:



Figure 1. Storage schematic of HashMap

Read implementation of HashMap

HashMap performs best when each bucket holds a single Entry, that is, when no Entry chain has been formed through the next references. When the program retrieves a value by its key, the system first computes the hashCode() return value of the key, uses it to find the index of the key in the table array, takes the Entry at that index, and finally returns the value that corresponds to the key. Look at the code of the get(Object key) method of the HashMap class:

Java code
  public V get(Object key)
  {
      // If key is null, call getForNullKey to fetch the corresponding value
      if (key == null)
          return getForNullKey();
      // Compute the hash value from the key's hashCode()
      int hash = hash(key.hashCode());
      // Start from the Entry at the computed index in the table array
      for (Entry<K,V> e = table[indexFor(hash, table.length)];
          e != null;
          // Move on to the next Entry in the Entry chain
          e = e.next)   // ①
      {
          Object k;
          // If the key of this Entry equals the key being searched for
          if (e.hash == hash && ((k = e.key) == key
              || key.equals(k)))
              return e.value;
      }
      return null;
  }




As the code above shows, if each bucket of the HashMap holds only one Entry, HashMap can take the Entry out of the bucket quickly by its index. In the case of a "hash conflict", a bucket holds not a single Entry but an Entry chain, and the system must traverse the Entry objects one by one until it finds the one being searched for. If that Entry happens to be at the very end of the chain (it was the first one placed in the bucket), the system must loop all the way to the end before it finds the element.
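The cost of a long chain can be made concrete by counting how many entries a lookup visits. The sketch below uses hypothetical names (ChainLookupDemo, Node, buildChain, stepsToFind) and models a single bucket only, under the head insertion order described in this article.

```java
class ChainLookupDemo {
    // Hypothetical minimal chain node for one bucket.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Builds a chain by inserting each key at the head, in the order given.
    static Node buildChain(String... keys) {
        Node head = null;
        for (String key : keys) {
            head = new Node(key, head);
        }
        return head;
    }

    // Number of nodes visited until key is found, or -1 if it is absent.
    static int stepsToFind(Node head, String key) {
        int steps = 0;
        for (Node e = head; e != null; e = e.next) {
            steps++;
            if (e.key.equals(key)) {
                return steps;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Node head = buildChain("first", "second", "third");
        System.out.println(stepsToFind(head, "third"));   // 1: newest entry is at the head
        System.out.println(stepsToFind(head, "first"));   // 3: oldest entry is at the tail
        System.out.println(stepsToFind(head, "missing")); // -1: whole chain traversed
    }
}
```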

To sum up, HashMap treats each key-value pair as a whole internally, and that whole is an Entry object. HashMap uses an Entry[] array to hold all key-value pairs. When an Entry object needs to be stored, its storage location is determined by the hash algorithm; when an Entry needs to be retrieved, the same hash algorithm finds its storage location and the Entry is taken out directly. This is why HashMap can store and retrieve its Entry objects quickly, just as our mothers taught us in real life: put different things in different places so that you can find them quickly when you need them.

When a HashMap is created, it has a default load factor of 0.75, which is a tradeoff between time and space costs. Increasing the load factor reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of looking up data, and lookup is the most frequent operation (both the get() and put() methods of HashMap perform a lookup). Reducing the load factor improves lookup performance but increases the memory occupied by the hash table.

With this knowledge, we can adjust the load factor to actual needs when creating a HashMap. If the program is more concerned about space and memory is tight, the load factor can be increased appropriately; if the program is more concerned about time and memory is plentiful, the load factor can be reduced appropriately. Usually, programmers do not need to change the load factor.

If you know from the start that a HashMap will hold many key-value pairs, you can give it a large initial capacity at creation time. If the number of Entry objects in the HashMap never exceeds the threshold (capacity * load factor), HashMap never needs to call the resize() method to reallocate the table array, which ensures good performance. Of course, setting the initial capacity too high at the start wastes space (the system has to create an Entry array of length capacity), so the initial capacity should also be chosen with care when creating a HashMap.
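One way to apply this advice is to derive the initial capacity from the expected number of entries. The helper capacityFor below is hypothetical, not a JDK method; it simply divides the expected size by the load factor so that the resulting threshold is not exceeded.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Hypothetical helper: an initial capacity large enough that
    // capacity * loadFactor is at least expectedSize.
    static int capacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / loadFactor);
    }

    public static void main(String[] args) {
        int expected = 100;
        int initialCapacity = capacityFor(expected, 0.75f);
        System.out.println(initialCapacity); // 134

        // The constructor rounds the requested capacity up to a power of 2.
        Map<String, Double> scores = new HashMap<>(initialCapacity, 0.75f);
        for (int i = 0; i < expected; i++) {
            scores.put("student" + i, 60.0 + i % 40);
        }
        System.out.println(scores.size()); // 100
    }
}
```

With the constructor logic described in this article, a request for 134 becomes a table of 256 buckets with a threshold of 192, so adding 100 entries should not trigger a resize.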
