Earlier articles on List covered ArrayList and LinkedList, and finally CopyOnWriteArrayList. Looking back at the first two:
(1) ArrayList is backed by an array: appending and lookup by index are fast, while inserting and deleting in the middle are slower
(2) LinkedList is backed by a linked list: appending is fast and insertion and deletion are easy, but lookup is slow
So is there a data structure that can combine the advantages of both of these? Yes, the answer is HashMap.
HashMap is a very common, convenient, and useful collection that stores data as key-value (k-v) pairs. As before, the implementation principles of HashMap are explained below with the help of figures.
Four points of concern about HashMap
| Focus point | Conclusion |
|---|---|
| Whether HashMap allows null | Both key and value may be null |
| Whether HashMap allows duplicate data | A repeated key overwrites the existing entry; repeated values are allowed |
| Whether HashMap is ordered | Unordered; specifically, traversing a HashMap generally does not return elements in the order they were put |
| Whether HashMap is thread-safe | Not thread-safe |
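The first two rows of the table can be checked with a few lines against java.util.HashMap. This is only a small sketch; the class name HashMapBasicsDemo and the helper putTwice are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasicsDemo {
    // Puts the same key twice and returns what the second put() reports
    // as the previous value (hypothetical helper, for illustration only).
    static String putTwice(Map<String, String> map, String key, String first, String second) {
        map.put(key, first);
        return map.put(key, second);
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, null);   // a null key and a null value are both accepted
        map.put("a", "same");
        map.put("b", "same");  // two keys may share the same value
        String old = putTwice(map, "c", "1", "2"); // the second put overwrites the first

        System.out.println(map.containsKey(null)); // true
        System.out.println(old);                   // 1
        System.out.println(map.get("c"));          // 2
        System.out.println(map.size());            // 4
    }
}
```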
Add data
First, take a look at Entry, the storage unit of HashMap:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}
An earlier article on LinkedList noted that LinkedList is a doubly linked list. Judging from HashMap's Entry, the entries form a singly linked list, because an Entry only holds a reference to its successor (next) and none to its predecessor. Drawn as a figure, the data structure looks like this:
Next, suppose I have this piece of code:
1 public static void main(String[] args)
2 {
3     Map<String, String> map = new HashMap<String, String>();
4     map.put("111", "111");
5     map.put("222", "222");
6 }
Let's see what this code does. Starting with line 3, a HashMap is created with new:
1 public HashMap() {
2     this.loadFactor = DEFAULT_LOAD_FACTOR;
3     threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
4     table = new Entry[DEFAULT_INITIAL_CAPACITY];
5     init();
6 }
Note that init() on line 5 is an empty method. It is a hook that HashMap subclasses such as LinkedHashMap use during construction. DEFAULT_INITIAL_CAPACITY is 16, so new HashMap() builds an Entry array of size 16 whose slots all hold the default value (null), as shown:
So new creates an Entry array of size 16. Next, line 4 puts a pair whose key and value are both the string 111. Here is what put does underneath:
 1 public V put(K key, V value) {
 2     if (key == null)
 3         return putForNullKey(value);
 4     int hash = hash(key.hashCode());
 5     int i = indexFor(hash, table.length);
 6     for (Entry<K,V> e = table[i]; e != null; e = e.next) {
 7         Object k;
 8         if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
 9             V oldValue = e.value;
10             e.value = value;
11             e.recordAccess(this);
12             return oldValue;
13         }
14     }
15
16     modCount++;
17     addEntry(hash, key, value, i);
18     return null;
19 }
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    return h & (length-1);
}
Let's walk through the steps of the put method:
1. Lines 2 to 3 show that HashMap allows a null key; the entry for a null key is placed at index 0 of the array by default
2. Line 4 takes the key's hashCode (hashCode is a method of Object, so every object has one) and runs a hash computation on it. According to the comments in the JDK source, this hash function scrambles the given hashCode to guard against poor hash values produced by bad hash algorithms. Why poor hash values need guarding against is discussed later in this article
3. Line 5 uses the recomputed hash and the length of the Entry array to obtain a position in the array. Note the bitwise & used in place of a modulo operation, which makes the code slightly faster. The correctness of this substitute for modulo depends on length being a power of 2, as readers familiar with binary will see. This is why the HashMap constructor behaves as it does: if you specify an initial capacity initialCapacity that is not a power of 2, HashMap computes the smallest power of 2 greater than initialCapacity and uses it as the initial size of the Entry array. For ease of explanation, assume that i comes out as 1 for both the string 111 and the string 222
4. Lines 6 to 14 first check whether the same key already exists in the structure. If it does, the value is overwritten, the old value is returned, and the code that follows is not executed. Note the recordAccess method: it also exists for subclasses such as LinkedHashMap and is empty in HashMap. Also note how keys are compared: the hash is compared first, and only when the hashes match is equals checked, which greatly improves HashMap's efficiency. Readers unfamiliar with hashCode can read my article on the role of hashCode
5. modCount++ on line 16 serves the fail-fast mechanism: each structural modification of the HashMap increments this value once
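Step 3 can be made concrete with a short sketch. It shows that h & (length - 1) agrees with a modulo when length is a power of 2, that a length such as 10 leaves most buckets unreachable, and how a requested capacity can be rounded up. The class name IndexForDemo and the helpers roundUpToPowerOfTwo and distinctIndexes are made up for illustration; the rounding loop is my assumption about how such a constructor could work, not a quotation of the JDK source.

```java
import java.util.HashSet;
import java.util.Set;

class IndexForDemo {
    // Same expression as indexFor in the listing: keep only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Assumed rounding strategy: smallest power of two >= initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Counts how many different bucket indexes the hashes 0..samples-1 reach.
    static int distinctIndexes(int length, int samples) {
        Set<Integer> seen = new HashSet<Integer>();
        for (int h = 0; h < samples; h++) {
            seen.add(indexFor(h, length));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(indexFor(37, 16));         // 5, same as 37 % 16
        System.out.println(roundUpToPowerOfTwo(17));  // 32
        System.out.println(distinctIndexes(16, 100)); // 16: every bucket is reachable
        System.out.println(distinctIndexes(10, 100)); // 4: only 0, 1, 8 and 9 are reachable
    }
}
```

With length 10 the mask is binary 1001, so only two bits of the hash survive and most of the array would never be used. This is why the array size has to be a power of 2 for the & trick to work.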
Then comes the key addEntry method:
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}
Assuming the new Entry lives at address 0x00000001, put("111", "111") can be pictured as:
Every new Entry is placed at table[1]. Note that the hash stored inside it is the rehashed value, not the key's original hashCode. The 111---->111 pair stored at table[1] holds a reference to whatever was previously at table[1], so the previous head can still be reached; this is the singly linked list. Now look at what put("222", "222") does; one figure makes it clear:
The new Entry again takes over table[1] and holds a reference to the previous table[1], that is, the 111---->111 pair.
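The head insertion described above can be reproduced with a stripped-down chained table. This is a minimal sketch and not the JDK code: TinyChainedMap and Node are hypothetical names, keys are plain Strings, null keys and resizing are left out, and a capacity of 1 is used only to force both keys into the same bucket.

```java
class TinyChainedMap {
    // One link of the chain: it knows its successor but not its predecessor.
    static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table;
    int size;

    TinyChainedMap(int capacity) {
        table = new Node[capacity]; // capacity is assumed to be a power of two
    }

    // Overwrites the value of an existing key, otherwise links a new node at the head.
    String put(String key, String value) {
        int i = key.hashCode() & (table.length - 1);
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                String old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node(key, value, table[i]); // the new node points at the old head
        size++;
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap(1); // one bucket, so every key collides
        map.put("111", "111");
        map.put("222", "222");
        System.out.println(map.table[0].key);      // 222
        System.out.println(map.table[0].next.key); // 111
    }
}
```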
At this point the process by which HashMap puts data is clear. One question remains: how does HashMap grow? Look at the addEntry method again:
1 void addEntry(int hash, K key, V value, int bucketIndex) {
2     Entry<K,V> e = table[bucketIndex];
3     table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
4     if (size++ >= threshold)
5         resize(2 * table.length);
6 }
Lines 4 to 5 show that after every Entry is added, HashMap checks whether it needs to grow. Resizing itself is not covered here, because resizing can lead to an infinite loop when HashMap is used incorrectly. That topic is worth exploring and is a problem I have actually run into at work, so the next article explains in detail why incorrect use of HashMap can cause an infinite loop.
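Leaving the resize internals aside, the moment at which the check on line 4 fires can be simulated with plain arithmetic. The sketch below rests on two assumptions that the listings do not state: DEFAULT_LOAD_FACTOR is 0.75, and after a resize the threshold is recomputed as the new capacity times the load factor. ResizeThresholdDemo and capacityAfter are made-up names.

```java
class ResizeThresholdDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // assumed value

    // Simulates only the size/threshold bookkeeping of addEntry; stores no data.
    static int capacityAfter(int insertions) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
        int size = 0;
        for (int n = 0; n < insertions; n++) {
            if (size++ >= threshold) { // same comparison as line 4 of addEntry
                capacity *= 2;
                // Assumption: the threshold follows the new capacity.
                threshold = (int) (capacity * DEFAULT_LOAD_FACTOR);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```

Because size++ is a post-increment, the comparison sees the size before the new entry is counted, so under these assumptions the first doubling happens on the 13th insertion.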
Delete data
There is a section of code:
1 public static void main(String[] args)
2 {
3     Map<String, String> map = new HashMap<String, String>();
4     map.put("111", "111");
5     map.put("222", "222");
6     map.remove("111");
7 }
Line 6 deletes an element. Lines 4 to 5 add two key-value pairs, following the figures above. To see what happens on deletion, here is the HashMap source for removing the entry for a given key:
1 public V remove(Object key) {
2     Entry<K,V> e = removeEntryForKey(key);
3     return (e == null ? null : e.value);
4 }

 1 final Entry<K,V> removeEntryForKey(Object key) {
 2     int hash = (key == null) ? 0 : hash(key.hashCode());
 3     int i = indexFor(hash, table.length);
 4     Entry<K,V> prev = table[i];
 5     Entry<K,V> e = prev;
 6
 7     while (e != null) {
 8         Entry<K,V> next = e.next;
 9         Object k;
10         if (e.hash == hash &&
11             ((k = e.key) == key || (key != null && key.equals(k)))) {
12             modCount++;
13             size--;
14             if (prev == e)
15                 table[i] = next;
16             else
17                 prev.next = next;
18             e.recordRemoval(this);
19             return e;
20         }
21         prev = e;
22         e = next;
23     }
24
25     return e;
26 }
The steps for removing an element are:
1. Use the hash of the key to find which position in table holds the entry to delete
2. Keep a prev that refers to the Entry before the one to be deleted; e can be regarded as the current position
3. Traverse the list starting from table[i]. When a matching Entry is found, check whether it is the one at table[i]:
(1) If it is (lines 14 to 15), table[i] is simply set to the node after it; nothing behind it needs to move
(2) If it is not (lines 16 to 17), prev is the Entry before e, and prev's next is pointed at e's next node. The Entry represented by e is thus dropped from the list, and the entries before and after e are linked together
remove("111") can be pictured as:
The whole process only needs to change the next reference of one node, which is very convenient.
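The same prev/e bookkeeping can be tried on a bare singly linked chain. This is a sketch under simplifying assumptions: ChainRemoveDemo, Node and describe are hypothetical names, keys are non-null Strings, and the chain's head is returned rather than written back into a table slot.

```java
class ChainRemoveDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Unlinks the first node with the given key and returns the (possibly new) head.
    static Node remove(Node head, String key) {
        Node prev = head;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (e.key.equals(key)) {
                if (prev == e) {
                    head = next;      // the match is the head: the head moves forward
                } else {
                    prev.next = next; // otherwise the predecessor skips over e
                }
                return head;
            }
            prev = e;
            e = next;
        }
        return head; // key not found: the chain is unchanged
    }

    // Renders the chain as "a -> b -> c" for printing.
    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("222", new Node("111", null));
        System.out.println(describe(head));                // 222 -> 111
        System.out.println(describe(remove(head, "111"))); // 222
    }
}
```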
Modifying data
Modifying an element also goes through put. Here is the source again:
 1 public V put(K key, V value) {
 2     if (key == null)
 3         return putForNullKey(value);
 4     int hash = hash(key.hashCode());
 5     int i = indexFor(hash, table.length);
 6     for (Entry<K,V> e = table[i]; e != null; e = e.next) {
 7         Object k;
 8         if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
 9             V oldValue = e.value;
10             e.value = value;
11             e.recordAccess(this);
12             return oldValue;
13         }
14     }
15     modCount++;
16     addEntry(hash, key, value, i);
17     return null;
18 }
This was in fact covered earlier: lines 6 to 14 are the logic for modifying an element. If a key already exists in the data structure, its value is overwritten, which is the code on line 10.
Inserting data
The so-called "inserting an element", as I understand it, presupposes an ordered data structure, such as ArrayList or LinkedList or, going further afield, a database, where rows follow one another in order.
HashMap's ordering, by contrast, is based on hashCode, and hashCode is an effectively arbitrary number, so the order of the entries in a HashMap is effectively arbitrary too. HashMap does not maintain the insertion order of its elements the way LinkedHashMap does, so it makes no sense to talk about inserting an element at a position in a HashMap.
Therefore, HashMap does not have the concept of inserting.
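The contrast with LinkedHashMap is easy to observe. In this small sketch (IterationOrderDemo and iterationOrder are made-up names), the LinkedHashMap always iterates in put order, while the HashMap iterates in whatever order its buckets yield, which may or may not match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Puts the keys into the given map in order, then returns them as iterated.
    static List<String> iterationOrder(Map<String, Integer> map, String... keys) {
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry", "date"};
        // LinkedHashMap keeps the order in which the keys were put.
        System.out.println(iterationOrder(new LinkedHashMap<String, Integer>(), keys));
        // HashMap makes no promise about order; compare the two lines on your JVM.
        System.out.println(iterationOrder(new HashMap<String, Integer>(), keys));
    }
}
```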
More on the importance of hashCode
As mentioned earlier, HashMap rehashes the key's hashCode to guard against bad hashCodes produced by poor hash algorithms. So why guard against bad hashCodes?
A bad hashCode means hash collisions, that is, several different keys may share the same hashCode. A poor hash algorithm raises the probability of collisions, which lowers HashMap's performance in two ways:
1. Suppose there are 10 keys and 6 of them have the same hashCode. The entries for the other four keys are spread evenly over positions in the table, while one position has 6 entries chained to it. This defeats the purpose of HashMap: the structure performs well only on the premise that entries are evenly distributed across the table positions, but here the distribution is 1 1 1 1 6. We therefore need hashCode to be strongly random, so that entries are distributed as evenly as possible and HashMap stays efficient.
2. Consider the code HashMap runs when traversing the linked list at a table position:
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
Because of the "&&" operator, the hash is compared first; if the hashes differ, the entry is skipped immediately and equals is never called. The hash is an int, so comparing it is fast, whereas equals often compares a whole series of fields and is slower. A high probability of hash collisions means more calls to equals, which inevitably lowers HashMap's efficiency.
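The second point can be observed by counting equals calls. The sketch below uses a hypothetical Key class whose hashCode is either distinct per key or a constant; BadHashDemo and countEqualsCalls are made-up names. Exact counts depend on the JDK version, so the example prints them without claiming specific numbers.

```java
import java.util.HashMap;
import java.util.Map;

class BadHashDemo {
    static int equalsCalls = 0;

    // Hypothetical key: either a distinct hashCode per id or one constant for all.
    static final class Key {
        final int id;
        final boolean constantHash;

        Key(int id, boolean constantHash) {
            this.id = id;
            this.constantHash = constantHash;
        }

        @Override
        public int hashCode() {
            return constantHash ? 42 : id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++; // count every comparison HashMap asks for
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Inserts n distinct keys and reports how often equals() ran.
    static int countEqualsCalls(int n, boolean constantHash) {
        equalsCalls = 0;
        Map<Key, Integer> map = new HashMap<Key, Integer>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i, constantHash), i);
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("distinct hashCodes: " + countEqualsCalls(50, false));
        System.out.println("constant hashCode:  " + countEqualsCalls(50, true));
    }
}
```

With distinct hashCodes the hash comparison rejects every other entry before equals is reached, so equals is never called; with a constant hashCode every put has to fall back on equals.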
Why is the HashMap table transient?
Here is a detail worth noticing:
transient Entry[] table;
The table is declared transient, which means its contents are not serialized by default. Have you ever wondered why?
In my view, writing it this way is essential. HashMap is built on hashCode, and hashCode, as a method of Object, is native:
public native int hashCode();
This means hashCode depends on the underlying implementation, and different virtual machines may use different hashCode algorithms. Put another way, the same key might have hashCode=1 on virtual machine A, hashCode=2 on virtual machine B, and hashCode=3 on virtual machine C.
That is a problem. Java has made cross-platform support its biggest selling point since its inception. If table were not transient, a map serialized by a program on virtual machine A might not work on virtual machine B, and cross-platform support would be lost, because:
1. On virtual machine A the key has hashCode=100 and is stored at, say, table[4]
2. On virtual machine B the key has hashCode=101, so the lookup goes to table[5] and obviously does not find it
The whole program would break. To avoid this, HashMap provides its own serialization for the table: writeObject writes the keys and values at the end of the serialized stream:
private void writeObject(java.io.ObjectOutputStream s)
    throws IOException
{
    Iterator<Map.Entry<K,V>> i =
        (size > 0) ? entrySet0().iterator() : null;

    // Write out the threshold, loadfactor, and any hidden stuff
    s.defaultWriteObject();

    // Write out number of buckets
    s.writeInt(table.length);

    // Write out size (number of Mappings)
    s.writeInt(size);

    // Write out keys and values (alternating)
    if (i != null) {
        while (i.hasNext()) {
            Map.Entry<K,V> e = i.next();
            s.writeObject(e.getKey());
            s.writeObject(e.getValue());
        }
    }
}
And readObject rebuilds the HashMap data structure:
private void readObject(java.io.ObjectInputStream s)
    throws IOException, ClassNotFoundException
{
    // Read in the threshold, loadfactor, and any hidden stuff
    s.defaultReadObject();

    // Read in number of buckets and allocate the bucket array;
    int numBuckets = s.readInt();
    table = new Entry[numBuckets];

    init();  // Give subclass a chance to do its thing.

    // Read in size (number of Mappings)
    int size = s.readInt();

    // Read the keys and values, and put the mappings in the HashMap
    for (int i=0; i<size; i++) {
        K key = (K) s.readObject();
        V value = (V) s.readObject();
        putForCreate(key, value);
    }
}
It is a more laborious approach, but it preserves cross-platform behavior.
This example also tells us that although the virtual machine in use is HotSpot most of the time, it is still worth keeping cross-platform concerns in mind rather than ignoring other virtual machines.
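From the outside, all of this is invisible: a HashMap that goes through an ObjectOutputStream and back comes out with the same mappings, rebuilt entry by entry on the reading side. A minimal round trip, using only the standard java.io classes (HashMapSerializationDemo and roundTrip are made-up names), could look like this:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class HashMapSerializationDemo {
    // Serializes the map to a byte array and reads a fresh copy back.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> roundTrip(HashMap<String, String> original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("111", "111");
        map.put("222", "222");
        HashMap<String, String> copy = roundTrip(map);
        System.out.println(copy.equals(map)); // true: same mappings
        System.out.println(copy == map);      // false: a newly built map
    }
}
```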
The difference between HashMap and Hashtable
HashMap and Hashtable are similar key-value collections, and the differences between them are a frequently asked question. Here is a brief summary:
1. Hashtable is thread-safe: all of its public methods are synchronized. HashMap is not thread-safe
2. Hashtable does not allow null; a null key or value causes a NullPointerException. HashMap has no such restriction
3. The two points above are the most important differences. A further, minor difference, mentioned only in passing, is that the two use different rehash algorithms. Hashtable's is:
private int hash(Object k) {
    // hashSeed will be zero if alternative hashing is disabled.
    return hashSeed ^ k.hashCode();
}
This hashSeed is produced by the randomHashSeed method of the sun.misc.Hashing class. HashMap's rehash algorithm was shown above:
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
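The null-handling difference in point 2 can be checked directly. In this small sketch, NullHandlingDemo and its two helper methods are made-up names; only java.util.HashMap and java.util.Hashtable are real.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullHandlingDemo {
    // True if the map stores a null value without complaint.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // True if the map stores a null key without complaint.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullValue(new HashMap<String, String>()));   // true
        System.out.println(acceptsNullKey(new HashMap<String, String>()));     // true
        System.out.println(acceptsNullValue(new Hashtable<String, String>())); // false
        System.out.println(acceptsNullKey(new Hashtable<String, String>()));   // false
    }
}
```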