Java HashMap Code parsing for the Java Collection Framework
After the overview of the Java Collection Framework, let's dig into the source code of the concrete collection classes, starting with the familiar yet unfamiliar HashMap.

Originally published at http://www.codeceo.com/article/java-hashmap-java-collection.html

Signature
    public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable
As you can see, HashMap inherits:

- the marker interface Cloneable, which indicates that HashMap overrides the java.lang.Object#clone() method; HashMap implements a shallow copy
- the marker interface Serializable, which indicates that a HashMap object can be serialized
Interestingly, HashMap inherits both the abstract class AbstractMap and the interface Map, even though the signature of AbstractMap is

    public abstract class AbstractMap<K,V> implements Map<K,V>
Stack Overflow explains:

Syntactically, inheriting the Map interface again is superfluous; it just makes it clear to anyone reading the code that HashMap belongs to the Map family, so it serves as documentation.

AbstractMap is effectively a helper class: it already provides default implementations for some Map operations, so a concrete subclass without special behavior can directly use the implementations AbstractMap provides.
Cloneable interface
It's evil, don't use it.
The design of the Cloneable interface is very poor. The most fatal point is that it declares no clone method, which means a class we write can fully implement this interface without ever overriding the clone method.
For the shortcomings of Cloneable, see the reasons given by the author of Effective Java; in the linked article, Josh Bloch also explains how to implement a deep copy properly, so I won't repeat that here.
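To make the pitfall concrete, here is a minimal sketch (the Point class is made up for illustration): it compiles even though clone is never overridden, and the inherited clone() remains inaccessible from outside.

    // Compiles fine: Cloneable declares no methods, so nothing
    // forces us to override clone()
    class Point implements Cloneable {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    class CloneableDemo {
        public static void main(String[] args) {
            Point p = new Point(1, 2);
            // p.clone(); // would not compile: clone() is protected in
            //            // java.lang.Object, and implementing Cloneable
            //            // does not make it public
        }
    }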
Map interface
In Eclipse's outline panel you can see that the Map interface contains the following member methods and inner classes:

(figure: Map fields and methods)

As you can see, the member methods boil down to create, read, update and delete, which also reflects that when we write programs, we should be data-oriented.
Although, as the previous article noted, Map is not a Collection, it provides three collection views, corresponding one-to-one to the following three methods (a short usage sketch follows the list):

- Set<K> keySet(), providing a set view of the keys
- Collection<V> values(), providing a collection view of the values
- Set<Map.Entry<K, V>> entrySet(), providing a set view of the key-value pairs, where the inner interface Map.Entry is used to represent a pair
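A quick sketch of the three views in use (the map contents are made up for illustration); the views are backed by the map, so changes to the map show up in them:

    import java.util.HashMap;
    import java.util.Map;

    class MapViewsDemo {
        public static void main(String[] args) {
            Map<String, Integer> ages = new HashMap<>();
            ages.put("alice", 30);
            ages.put("bob", 25);

            // set view of the keys
            for (String name : ages.keySet())
                System.out.println(name);

            // collection view of the values
            for (int age : ages.values())
                System.out.println(age);

            // set view of the key-value pairs
            for (Map.Entry<String, Integer> e : ages.entrySet())
                System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }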
AbstractMap abstract class
AbstractMap provides a skeletal implementation of the Map interface's methods, which reduces the effort of implementing the interface.
For example:

To implement an immutable (unmodifiable) map, simply extend AbstractMap and implement its entrySet method; the set returned by that method must not support add or remove, and the set's iterator must not support the remove operation.

Conversely, to implement a mutable (modifiable) map, extend AbstractMap, override its put method, and make the iterator of the set returned by entrySet support the remove method. A sketch of the first case follows.
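Here is a minimal sketch of an unmodifiable single-entry map built this way (the class name SingletonMap is made up for illustration):

    import java.util.AbstractMap;
    import java.util.Collections;
    import java.util.Set;

    // a minimal unmodifiable map holding exactly one mapping
    class SingletonMap<K, V> extends AbstractMap<K, V> {
        private final K key;
        private final V value;

        SingletonMap(K key, V value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public Set<Entry<K, V>> entrySet() {
            // Collections.singleton returns an immutable set, so add/remove
            // (and the iterator's remove) throw UnsupportedOperationException;
            // AbstractMap derives get, containsKey, size, etc. from this view
            return Collections.singleton(
                    new SimpleImmutableEntry<>(key, value));
        }
    }

Calling put on such a map throws UnsupportedOperationException, because that is exactly what AbstractMap's default put does.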
Design concept

Hash table

HashMap is a map implementation based on a hash table. The hash table (also known as an associative array) is a common data structure that most modern languages support natively. The concept is fairly simple: the key, after being run through a hash function, yields the index of a bucket (or slot), and the bucket holds the value we want to retrieve, as shown below:

(figure: hash table demo)
It is easy to see that different keys may produce the same index after passing through the same hash function; that is, collisions occur, and they are unavoidable. So when implementing a concrete class on top of a hash table, you need to:

- design a good hash function, so as to reduce collisions as much as possible, and
- decide how to resolve the collisions that do occur.

We will look at how HashMap solves these two problems in the following sections.
Some features of HashMap
- It is not thread-safe, and both keys and values may be null; Hashtable, conversely, is thread-safe and allows neither null keys nor null values.
- The order of the elements is not guaranteed, and the position of the same element may change over time (in the resize case).
- The put and get operations have O(1) time complexity.
- The time to traverse its collection views is proportional to its capacity (the number of slots) plus its size (the number of entries), so if traversal performance matters, do not set the capacity too high or the load factor too low (when the number of entries exceeds capacity * loadFactor, a resize occurs, and resizing rehashes the keys).
- Since HashMap is not thread-safe, if the map is structurally modified while a thread is iterating over one of its collection views (adding or removing entries counts; merely changing the value of an existing entry is not a structural change), a ConcurrentModificationException is thrown. The technical term is fail-fast, and failing early is essential in multi-threaded programs.
Map m = Collections.synchronizedMap(new HashMap(...));
In this way, a thread-safe map can be obtained.
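A small sketch of the fail-fast behavior; even a single thread triggers it when it structurally modifies the map during iteration:

    import java.util.HashMap;
    import java.util.Map;

    class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> m = new HashMap<>();
            m.put("a", 1);
            m.put("b", 2);
            m.put("c", 3);

            for (String key : m.keySet()) {
                // structural modification during iteration: the iterator
                // detects the modCount change on its next next() call and
                // throws ConcurrentModificationException
                m.remove("a");
            }
        }
    }

To remove entries while iterating, use the iterator's own remove method instead.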
Source analysis
Let's start with the constructors. HashMap follows the conventions of the collections framework, providing a no-argument constructor and a constructor whose single parameter is of type Map. Besides these, it provides two constructors for setting the HashMap's capacity and load factor.
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
As the code shows, both the capacity and the load factor have default values, and the capacity has a maximum value:
    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
As you can see, the default load factor is 0.75, which the JDK considers the best trade-off between time and space complexity: a higher factor reduces storage space, but increases lookup time (lookups include the put and get methods of HashMap).

What seems strange here is that the capacity must be a power of two (16 by default). Why is that? To answer this question, we need to understand the design principles behind HashMap's hash function.
Design principles of the hash function
    /**
     * Retrieves the object's hash code and applies a supplemental hash
     * function to the result, which defends against poor quality hash
     * functions. This is critical because HashMap uses power-of-two length
     * hash tables, which otherwise encounter collisions for hashCodes that
     * do not differ in lower bits. Note: null keys always map to hash 0,
     * thus index 0.
     */
    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length - 1);
    }
Seeing so many bit operations may make your head spin. Just get the principle straight; after all, bit operations are very fast, and they shouldn't be avoided merely because they are hard to read.

There is plenty of discussion of this question online. Here is my own understanding, kept as easy to follow as possible.
Given a hash table of capacity length (that is, the number of buckets or slots), to map each key to an index in [0, length) (note: a half-open interval) with as few collisions as possible, there are generally two approaches (contrasted in the sketch below):

- make length a prime number, and compute the index as hashCode(key) mod length
- make length a power of two, and compute the index as hashCode(key) & (length-1)

Hashtable uses method 1; HashMap uses method 2.
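A sketch contrasting the two indexing strategies (the helper names are made up; method 1 masks off the sign bit first so the modulus is non-negative, as Hashtable does):

    class IndexingDemo {
        // method 1 (Hashtable-style): table length is a prime
        static int indexByMod(int hashCode, int primeLength) {
            return (hashCode & 0x7FFFFFFF) % primeLength;
        }

        // method 2 (HashMap-style): table length is a power of two,
        // so length - 1 is an all-ones bit mask over the low bits
        static int indexByMask(int hashCode, int powerOfTwoLength) {
            return hashCode & (powerOfTwoLength - 1);
        }

        public static void main(String[] args) {
            System.out.println(indexByMod("foo".hashCode(), 31));  // in [0, 31)
            System.out.println(indexByMask("foo".hashCode(), 32)); // in [0, 32)
        }
    }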
Since our topic is HashMap, why method 1 uses a prime is well covered elsewhere, and I won't go into it here.
Let's focus on method 2, which is actually easier to understand: because length is a power of two, the low bits of length-1 are all 1s; ANDing it with hashCode(key) yields an index within [0, length). For example, with length = 16, length-1 = 15 = 0b1111, so hashCode(key) & 0b1111 always falls in [0, 16).
But there is a problem: if hashCode(key) has more significant bits than length and its low bits vary little, there will be many collisions. For example:

Hash values in Java are 32-bit integers, and the default capacity of HashMap is 16. Suppose the hash values of two objects are 0xABAB0000 and 0xBABA0000: their low 16 bits are identical, so ANDing either with 16-1 gives the same index, producing a collision.

The root cause of the collision is that the mask keeps only the low bits and discards the high ones, so we need an extra hash function, not just the object's plain hashCode method.
That is exactly what HashMap's hash function does.

First, there is a random hashSeed to reduce the chance of collisions.

Then, if the key is a String, sun.misc.Hashing.stringHash32((String) k) is used to compute its hash.

Finally, a series of unsigned right shifts XOR the high bits into the low bits to reduce the chance of collisions.
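A small sketch of the effect, reusing the two example hash values above and assuming hashSeed is 0 (the helper names are made up): raw masking collides, while the supplemental shifts let the high bits influence the index, so the two typically land in different buckets.

    class SupplementalHashDemo {
        // the shift/XOR part of the JDK 7 hash shown above,
        // minus the hashSeed/String special cases
        static int supplementalHash(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        static int indexFor(int h, int length) {
            return h & (length - 1);
        }

        public static void main(String[] args) {
            int h1 = 0xABAB0000, h2 = 0xBABA0000;
            // without the extra mixing, both land in bucket 0 of a 16-slot table
            System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));
            // with the mixing, the high bits matter and the indexes differ
            System.out.println(indexFor(supplementalHash(h1), 16) + " "
                    + indexFor(supplementalHash(h2), 16));
        }
    }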
Where do the shift offsets 20, 12, 7, 4 come from? Since Java hash values are 32 bits, these numbers XOR the high bits against the low bits; as for exactly how they were chosen, it is unclear. I searched the web for a long while without finding a unified, convincing explanation. You can refer to the following links:
- http://stackoverflow.com/questions/7922019/openjdks-rehashing-mechanism/7922219#7922219
- http://stackoverflow.com/questions/9335169/understanding-strange-java-hash-function/9336103#9336103
- http://stackoverflow.com/questions/14453163/can-anybody-explain-how-java-design-hashmaps-hash-function/14479945#14479945
HashMap.Entry

The entries stored in a HashMap are HashMap.Entry objects, which implement Map.Entry. The most important part is the constructor:
    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        // setter, getter, equals, toString methods omitted

        public final int hashCode() {
            // the entry's hash is the key's hash XORed with the value's hash
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }
As you can see, Entry implements a singly linked list, using the next member variable to link entries together.

Having introduced the Entry object, let's look at a more important member variable:
    /**
     * The table, resized as necessary. Length MUST always be a power of two.
     */
    // HashMap internally maintains an array of type Entry named table,
    // which holds the added Entry objects
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
You may be wondering: if Entry is a singly linked list, why do we need a table of array type?

I dug out my old algorithms book: this is in fact a way of resolving collisions, called separate chaining (also known as open hashing). The effect is shown below:

(figure: separate chaining to handle collisions in a hash table)

That is, entries with the same index value are stored in the form of a singly linked list.

Visualization of separate chaining
I found a great website that visualizes all kinds of common algorithms; it instantly makes you appreciate how far ahead some foreign universities are. The following link simulates a hash table that resolves collisions with separate chaining; feel free to play with it:

- https://www.cs.usfca.edu/~galles/visualization/OpenHash.html
Get operation
The get operation is simpler than the put operation, so let's introduce it first.
    public V get(Object key) {
        // handle the null key case separately
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        // an entry with a null key is stored in table[0], but the keys of the
        // other entries on table[0]'s collision chain are not necessarily null,
        // so the chain must be traversed to find out whether the key exists
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        // first locate the index into the table,
        // then traverse the collision chain to find out whether the key exists
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
Put operation (including updates)

Because the put operation may require resizing the HashMap, its implementation is slightly more complex:
    private void inflateTable(int toSize) {
        // helper used to inflate the HashMap to the specified capacity:
        // find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        // threshold is the resize trigger: once the number of entries
        // exceeds it, the HashMap is resized and the entries are rehashed
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        // this loop is the key part: when table[i] already holds entries
        // at the new key's index i, we enter the loop body
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // check whether the key being inserted already exists; if so,
            // replace the previous oldValue with this value (an update
            // operation) and return the previous oldValue
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        // the new key did not exist in the HashMap before, so increment
        // modCount to record a structural change
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        // if adding this element pushes the HashMap's size past the
        // threshold, resize first
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // double the previous capacity
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        // first fetch the collision chain at that index, which may be null
        Entry<K,V> e = table[bucketIndex];
        // then add the new entry at the head of the chain, so later
        // insertions come first (I really didn't notice this at first).
        // Note that table[bucketIndex] itself stores no node data; it acts
        // as the head pointer of a singly linked list, and the data lives
        // in the collision chain
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

Let's see below how HashMap resizes; the truth is about to be revealed:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // if the maximum capacity has already been reached, return directly
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        // the return value of initHashSeedAsNeeded decides whether the
        // entries' hash values need to be recomputed
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        // iterate over the current table, moving its elements into newTable
        for (Entry<K,V> e : table) {
            while (null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                // the next two lines use the same trick as put:
                // later entries are inserted at the head of the chain
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

    /**
     * Initialize the hashing mask value. We defer initialization until we
     * really need it.
     */
    final boolean initHashSeedAsNeeded(int capacity) {
        boolean currentAltHashing = hashSeed != 0;
        boolean useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        // when hashSeed is non-zero or useAltHashing holds, the entries'
        // hash values are recomputed; for the role of useAltHashing see
        // http://stackoverflow.com/questions/29918624/what-is-the-use-of-holder-class-in-hashmap
        boolean switching = currentAltHashing ^ useAltHashing;
        if (switching) {
            hashSeed = useAltHashing
                ? sun.misc.Hashing.randomHashSeed(this)
                : 0;
        }
        return switching;
    }
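A short usage sketch of the behavior described above: put returns the previous value when it updates an existing key, and null when the key is new.

    import java.util.HashMap;
    import java.util.Map;

    class PutDemo {
        public static void main(String[] args) {
            Map<String, Integer> m = new HashMap<>();
            System.out.println(m.put("a", 1)); // null: "a" absent, new entry created
            System.out.println(m.put("a", 2)); // 1: entry updated, old value returned
            System.out.println(m.get("a"));    // 2
        }
    }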
Remove operation
    public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        // if the removed key existed, its corresponding value is returned
        return (e == null ? null : e.value);
    }

    final Entry<K,V> removeEntryForKey(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        // two Entry references are used here, acting as two pointers, to keep
        // the collision chain from breaking; this is the usual way to delete
        // a node from a singly linked list
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        // when there is a collision chain at table[i], traverse its elements
        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e) // the collision chain has only one entry
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
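Correspondingly, a quick sketch of remove's return value:

    import java.util.HashMap;
    import java.util.Map;

    class RemoveDemo {
        public static void main(String[] args) {
            Map<String, Integer> m = new HashMap<>();
            m.put("k", 42);
            System.out.println(m.remove("k")); // 42: value of the removed entry
            System.out.println(m.remove("k")); // null: key no longer present
        }
    }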
At this point, HashMap's create, read, update and delete operations have all been covered.

In general, the time complexity of these four operations is considered O(1), because HashMap's hash function is good and the probability of collision is small.
Serialization of HashMap

What we have covered so far are basically the core points of HashMap. But there is one more serious question: the table array that stores the entries is declared transient, meaning it is not included when the map is serialized. Why is that?
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
To answer this question, we need to be clear about the following fact:

- the Object.hashCode method is native, so the same key may produce different hash values on different JVM instances

Picture the following scenario: we compute the hash value and index of object A on machine A and insert it into a HashMap, then serialize the HashMap and load it on machine B, where the same object's hash value and index are not the same as on machine A. Looking up object A on machine B would then return the wrong result.

So when serializing a HashMap object, the entry table must not be serialized as-is, because it would be wrong on another machine.
For this reason, HashMap overrides the writeObject and readObject methods.
    private void writeObject(java.io.ObjectOutputStream s)
        throws IOException
    {
        // Write out the threshold, loadFactor, and any hidden stuff
        s.defaultWriteObject();

        // Write out number of buckets
        if (table == EMPTY_TABLE) {
            s.writeInt(roundUpToPowerOf2(threshold));
        } else {
            s.writeInt(table.length);
        }

        // Write out size (number of mappings)
        s.writeInt(size);

        // Write out keys and values (alternating)
        if (size > 0) {
            for (Map.Entry<K,V> e : entrySet0()) {
                s.writeObject(e.getKey());
                s.writeObject(e.getValue());
            }
        }
    }

    private static final long serialVersionUID = 362498820763181265L;

    private void readObject(java.io.ObjectInputStream s)
         throws IOException, ClassNotFoundException
    {
        // Read in the threshold (ignored), loadFactor, and any hidden stuff
        s.defaultReadObject();
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        }

        // set other fields that need values
        table = (Entry<K,V>[]) EMPTY_TABLE;

        // Read in number of buckets
        s.readInt(); // ignored.

        // Read number of mappings
        int mappings = s.readInt();
        if (mappings < 0)
            throw new InvalidObjectException("Illegal mappings count: " +
                                             mappings);

        // capacity chosen by number of mappings and desired load (if >= 0.25)
        int capacity = (int) Math.min(
                    mappings * Math.min(1 / loadFactor, 4.0f),
                    // we have limits...
                    HashMap.MAXIMUM_CAPACITY);

        // allocate the bucket array
        if (mappings > 0) {
            inflateTable(capacity);
        } else {
            threshold = capacity;
        }

        init();  // Give subclass a chance to do its thing.

        // Read the keys and values, and put the mappings in the HashMap
        for (int i = 0; i < mappings; i++) {
            K key = (K) s.readObject();
            V value = (V) s.readObject();
            putForCreate(key, value);
        }
    }

    private void putForCreate(K key, V value) {
        int hash = null == key ? 0 : hash(key);
        int i = indexFor(hash, table.length);

        /**
         * Look for preexisting entry for key. This will never happen for
         * clone or deserialize. It will only happen for construction if the
         * input Map is a sorted map whose ordering is inconsistent w/ equals.
         */
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                e.value = value;
                return;
            }
        }

        createEntry(hash, key, value, i);
    }
Simply put, when serializing, each entry's key and value are written out separately, and when deserializing, they are read back and re-inserted one by one.
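A round-trip sketch under these rules; the keys are rehashed on the deserializing side, so lookups keep working (the file name map.ser is made up for illustration):

    import java.io.*;
    import java.util.HashMap;

    class SerializeDemo {
        public static void main(String[] args) throws Exception {
            HashMap<String, Integer> m = new HashMap<>();
            m.put("answer", 42);

            // serialize: writeObject stores the size plus alternating
            // keys and values, not the transient table
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new FileOutputStream("map.ser"))) {
                out.writeObject(m);
            }

            // deserialize: readObject rebuilds the table, rehashing each
            // key with this JVM's hash function
            try (ObjectInputStream in = new ObjectInputStream(
                    new FileInputStream("map.ser"))) {
                @SuppressWarnings("unchecked")
                HashMap<String, Integer> copy =
                        (HashMap<String, Integer>) in.readObject();
                System.out.println(copy.get("answer")); // 42
            }
        }
    }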
Summary

Having summed up HashMap, I found that some of the core ideas here, such as hash table collision resolution, were all covered back in algorithms class, but with "age" I had almost forgotten them. I think the forgetting is:

- partly because time is scarce
- partly because I never understood them well in the first place

Think about these things more often in ordinary times, so you can cope when performance problems and troubleshooting come up.

One more point: when analyzing a specific class or method, don't spend too much time at first on the minutiae of boundary conditions; that is not worth the effort. It's not that boundary conditions are unimportant; program bugs are often precisely boundary conditions that weren't thought through. It's just that we should analyze those boundary conditions after understanding the general idea of the class or method. If you analyze them at the very outset, you'll be completely at sea; as your grasp of the working principle deepens, the scenarios behind those boundary conditions become possible to understand.