The hash table is one of the most common data structures. Ruby, JavaScript and other dynamic languages support it directly at the syntax level. In Java it is a little more roundabout (it feels awkward every time I use it; I do not know whether you feel the same).
This chapter is quite tangled: I have read it before and still cannot follow all of it. Fortunately, the parts that are hard to understand are all starred sections.
A hash table is this kind of thing: it stores values using a key as the index, and can find them again when needed, usually quickly :)
The code above certainly does not pass its test yet (PS: please forget about java.util.Hashtable).
To be fast enough, it is best to compute the storage location directly from the key, so that the entry can be found in a single step.
Run the test and the array index goes out of bounds, because the hash method has not been implemented yet.
The hash method is meant to distribute keys over the slots of the table with equal probability. The simplest choice is a division hash:
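private int hash(int hashCode) {
    int capacity = table.length;
    // Clear the sign bit first: a negative hashCode would otherwise give a negative index.
    return (hashCode & 0x7fffffff) % capacity;
}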
The capacity should not be a power of 2. Otherwise the hash value is just the low k bits of the hashCode; the high bits are wasted, which may cause many collisions.
A good choice is a prime number that is not close to an integer power of 2.
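To make the point about capacities concrete, here is a small sketch. The class and method names (DivisionHashDemo, index, distinctSlots) are made up for illustration and are not part of the article's HashTable. It feeds 16 hash codes that differ only above their low 8 bits into a division hash, once with capacity 256 and once with the prime 251.

```java
// Illustrative sketch only: names are hypothetical, not from the article.
final class DivisionHashDemo {

    // Division hash: clear the sign bit, then take the remainder by the capacity.
    static int index(int hashCode, int capacity) {
        return (hashCode & 0x7fffffff) % capacity;
    }

    // Counts how many different slots the given hash codes land in.
    static int distinctSlots(int[] hashCodes, int capacity) {
        boolean[] used = new boolean[capacity];
        int distinct = 0;
        for (int hashCode : hashCodes) {
            int slot = index(hashCode, capacity);
            if (!used[slot]) {
                used[slot] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // 16 hash codes whose low 8 bits are all zero.
        int[] hashCodes = new int[16];
        for (int i = 0; i < hashCodes.length; i++) {
            hashCodes[i] = i << 8;
        }
        System.out.println("capacity 256: " + distinctSlots(hashCodes, 256) + " slot(s) used");
        System.out.println("capacity 251: " + distinctSlots(hashCodes, 251) + " slot(s) used");
    }
}
```

With capacity 256 only the low 8 bits matter, so every key lands in the same slot; with the prime 251 the high bits take part in the remainder and the keys spread out.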
Run the test now :)
But wait, sometimes we need to do this:
Java code
@Test
public void test2() {
    HashTable ht = new HashTable();
    Object o1 = new Object();
    ht.put("o1", o1);
    Object anotherO1 = new Object();
    ht.put("o1", anotherO1); // update
    assertEquals(anotherO1, ht.get("o1"));
}
To support updates like this, we need to refactor the code so that the key is stored together with the value.
First, add a structure that holds the key and the value:
Java code
public class HashTable {
    public static class Entry {
        private String key;
        private Object value;

        public Entry(String key, Object value) {
            this.key = key;
            this.value = value;
        }

        public String getKey() {
            return key;
        }

        public Object getValue() {
            return value;
        }
    }

    private Entry[] table = new Entry[1000]; // the original Object[] becomes Entry[]

    // ... put, get and hash go here
}
Refactor put:
Java code
public void put(String key, Object value) {
    int hash = hash(key.hashCode());
    if (table[hash] == null || table[hash].getKey().equals(key)) {
        table[hash] = new Entry(key, value);
    } else {
        throw new RuntimeException("Collision! What should I do?");
    }
}
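Refactor get:
Java code
public Object get(String key) {
    int hash = hash(key.hashCode());
    Entry entry = table[hash];
    // Return the value only when the stored key really matches the requested one.
    return entry != null && entry.getKey().equals(key) ? entry.getValue() : null;
}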
As you can see, the test passed again :)
Next, look at the multiplication hash.
Java code
private int hash(int hashCode) {
    int capacity = table.length;
    double A = 0.6180339887; // the all-purpose golden ratio
    // Clear the sign bit so that the fractional part is never negative.
    return (int) ((((hashCode & 0x7fffffff) * A) % 1) * capacity);
}
Multiply the hashCode by the constant A (0 < A < 1), take the fractional part of the product, and multiply that by the capacity.
Knuth suggests that the golden ratio, (√5 - 1)/2 ≈ 0.618, is an ideal value for A. Anyone who trades stocks will know this number.
Advantages of the multiplication hash:
There is no special requirement on the capacity. It is usually chosen to be an integer power of 2.
In that case a shift can replace the multiplication by the capacity.
The golden ratio constant A can then be expressed as 2654435769 / 2^32.
The computation can be simplified as follows:
Assume that the machine word length is w bits; here w = 32. The product hashCode * 2654435769 is a 2w-bit value with a high 32-bit half and a low 32-bit half. ((1L << 32) - 1) is 32 one-bits, so ANDing (&) the product with it keeps only the low 32 bits. The hash value is the top p bits of those low 32 bits, that is, the low 32 bits shifted right by (32 - p).
((hashCode * 2654435769L) & ((1L << 32) - 1)) >> (32 - p) // the & keeps only the low 32 bits
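As a sanity check on this simplification, the following sketch compares the floating-point form with the shift form. It is only an illustration: MultiplicationHashDemo, floatingIndex and shiftIndex are hypothetical names, and the comparison is restricted to small non-negative hash codes, where the double arithmetic is exact.

```java
// Illustrative sketch only: names are hypothetical, not from the article.
final class MultiplicationHashDemo {
    static final long K = 2654435769L;        // floor(A * 2^32) for A = (sqrt(5) - 1) / 2
    static final double A = K / 4294967296.0; // K / 2^32, exactly representable as a double

    // Floating-point form: fractional part of hashCode * A, scaled to 2^p slots.
    // Assumes a small non-negative hashCode so that the product is exact.
    static int floatingIndex(int hashCode, int p) {
        double product = hashCode * A;
        double fraction = product - Math.floor(product);
        return (int) (fraction * (1 << p));
    }

    // Fixed-point form: the top p bits of the low 32 bits of hashCode * K.
    static int shiftIndex(int hashCode, int p) {
        return (int) (((K * hashCode) & ((1L << 32) - 1)) >> (32 - p));
    }

    public static void main(String[] args) {
        int p = 10;
        for (int h = 0; h < 1000; h++) {
            if (floatingIndex(h, p) != shiftIndex(h, p)) {
                System.out.println("mismatch at " + h);
                return;
            }
        }
        System.out.println("both forms agree for hash codes 0..999");
    }
}
```

For large or negative hash codes the double product is rounded, so the two forms may differ there; the shift form avoids floating point altogether.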
Let's refactor the code:
First, the array size becomes 2^p:
Java code private int p = 10; private Entry[] table = new Entry[1 << p];
Then:
Java code private int hash(int hashCode) { long k = 2654435769L; return (int)(((k * hashCode) & ((1L << 32) - 1)) >> (32 - p)); }
The tests still pass.
Next, let's add a few more elements and try to break it.
Java code @Test public void test3() { HashTable ht = new HashTable(); for (int i = 0; i < 1000; i++) { Object o = new Object(); ht.put("key" + i, o); assertEquals(o, ht.get("key" + i)); System.out.println("Ok: " + i); } }
Run the test. It fails, and the console output stops at 108.
A RuntimeException: a collision. What should we do?
Collisions can be resolved with the chaining method or with the open addressing method.
First, the chaining method.
Start by refactoring Entry so that each entry can link to the next one:
Java code public static class Entry { private String key; private Object value; private Entry next; public Entry(String key, Object value) { this(key, value, null); } public Entry(String key, Object value, Entry next) { this.key = key; this.value = value; this.next = next; } public String getKey() { return key; } public Object getValue() { return value; } public void setValue(Object value) { this.value = value; } public Entry getNext() { return next; } }
At the same time, a setValue method is added to make it easier to update an element in the linked list.
Then refactor put:
Java code
public void put(String key, Object value) {
    int hash = hash(key.hashCode());
    Entry entry = table[hash];
    if (entry == null) { // the slot is unused, so use it directly
        table[hash] = new Entry(key, value);
        return;
    }
    for (Entry o = entry; o != null; o = o.getNext()) {
        if (o.getKey().equals(key)) { // the key already exists, so update it
            o.setValue(value);
            return;
        }
    }
    table[hash] = new Entry(key, value, entry); // chain the new entry in front of the old ones
}
As you can see, the test runs normally :)
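One caveat: the tests so far read back the key that was just inserted, which always sits at the head of its chain. The get method shown earlier only inspects that first entry, so a general lookup also has to walk the list. A minimal self-contained sketch of that idea follows; ChainedTableSketch and its Node class are hypothetical stand-ins for the article's HashTable and Entry, and the table is kept deliberately tiny so that chains actually form.

```java
// Illustrative sketch only: a stand-in for the article's HashTable, not its actual code.
final class ChainedTableSketch {

    private static final class Node {
        final String key;
        Object value;
        final Node next;

        Node(String key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[4]; // tiny on purpose, to force collisions

    private int indexOf(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public void put(String key, Object value) {
        int index = indexOf(key);
        for (Node n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) { // existing key: update in place
                n.value = value;
                return;
            }
        }
        table[index] = new Node(key, value, table[index]); // new key: prepend to the chain
    }

    public Object get(String key) {
        // Walk the whole chain, not just its head.
        for (Node n = table[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}
```

With 20 keys in 4 slots every chain holds several entries, and each key is still found because get compares keys along the chain.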
However, as the number of elements in the hash table grows, so does the collision rate. It is best to expand the capacity automatically once the number of elements reaches a certain threshold, so that lookups stay fast.
But first, let's see what the collision rate of the current hash table is when test3 runs.
To find out, we refactor again and count the collisions as they occur.
Java code
private int size = 0; // number of elements in the table
private int collideCount = 0; // number of collisions

public int getSize() {
    return size;
}

public float getCollideRate() {
    return size > 0 ? ((float) collideCount) / size : 0;
}
Java code
public void put(String key, Object value) {
    int hash = hash(key.hashCode());
    Entry entry = table[hash];
    if (entry == null) {
        table[hash] = new Entry(key, value);
        size++; // here
        return;
    }
    collideCount++; // here
    for (Entry o = entry; o != null; o = o.getNext()) {
        if (o.getKey().equals(key)) {
            o.setValue(value);
            return;
        }
    }
    table[hash] = new Entry(key, value, entry);
    size++; // and here
}
Test:
Java code @Test public void test4() { HashTable ht = new HashTable(); for (int i = 0; i < 1000; i++) { ht.put("key" + i, new Object()); } System.out.println(ht.getCollideRate()); }
Output: 0.309
The table has 1024 slots and holds 1000 elements, and 309 of the insertions collided. That is a serious accident rate.
Next we refactor the HashTable class so that it expands whenever the number of elements exceeds 0.75 of the capacity (the load factor):
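For a rough sense of scale, the measured rate can be compared with an idealized hash that throws keys into slots uniformly at random. This is a textbook estimate, not something measured on the article's code, and CollisionEstimate is a hypothetical helper name: with n keys and m slots, the expected number of insertions that land in an already occupied slot is n - m * (1 - (1 - 1/m)^n).

```java
// Illustrative sketch only: an idealized estimate, not a measurement of the article's table.
final class CollisionEstimate {

    // Expected number of insertions that hit an occupied slot when n distinct keys
    // are placed uniformly at random into m slots.
    static double expectedCollisions(int n, int m) {
        double emptyProbability = Math.pow(1.0 - 1.0 / m, n); // chance a given slot stays empty
        double expectedOccupied = m * (1.0 - emptyProbability);
        return n - expectedOccupied;
    }

    public static void main(String[] args) {
        System.out.println(expectedCollisions(1000, 1024) / 1000);
    }
}
```

Under that assumption, 1000 keys in 1024 slots give about 361 collisions, a rate of roughly 0.36, so a nearly full table collides often even with a well-behaved hash function.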
Java code private int p = 4; private Entry[] table = new Entry[1 << p]; private float loadFactor = 0.75f;
First, the initial capacity is 16 (1 << 4), and the load factor is 0.75.
Java code
public void put(String key, Object value) {
    if (table.length * loadFactor < size) {
        resize();
    }
    // ... the rest of put is unchanged
}
Check before every put, and resize if necessary.
Java code private void resize() { Entry[] old = table; p += 1; table = new Entry[1 << p]; size = 0; collideCount = 0; for (int i = 0; i < old.length; i++) { Entry entry = old[i]; while (entry != null) { put(entry.getKey(), entry.getValue()); entry = entry.getNext(); } } }
Write a test:
Java code @Test public void test5() { HashTable ht = new HashTable(); for (int i = 0; i < 1000; i++) { Object o = new Object(); ht.put("key" + i, o); assertEquals(o, ht.get("key" + i)); } System.out.println(ht.getSize()); assertTrue(ht.getSize() == 1000); System.out.println(ht.getCollideRate()); }
Again 1000 elements are added, and this time the printed collision rate is 0.08.
The initial size of our hash table is 16, so getting to 1000 elements takes several resizes, and each resize is relatively expensive.
We could refactor the code to accept the capacity in the constructor, which would avoid unnecessary resize overhead.
That is not done here, because this code only illustrates the algorithm. In real code you would of course use java.util.HashMap.
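Purely as an illustration of the constructor idea, the exponent p could be derived from the expected number of elements. The helper below is hypothetical (CapacityHint and exponentFor are not part of the article's class); it assumes the same resize condition as above, table.length * loadFactor < size, and the same minimum of 16 slots.

```java
// Illustrative sketch only: a hypothetical helper, not part of the article's HashTable.
final class CapacityHint {

    // Smallest exponent p (at least 4) such that a table with 2^p slots can hold
    // expectedSize elements without exceeding the load factor.
    static int exponentFor(int expectedSize, float loadFactor) {
        int p = 4; // 16 slots, the same starting point as the article's table
        while (p < 30 && (1 << p) * loadFactor < expectedSize) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        int p = exponentFor(1000, 0.75f);
        System.out.println("p = " + p + ", capacity = " + (1 << p));
    }
}
```

A constructor could then set p to this value and allocate new Entry[1 << p] once, instead of growing from 16 slots step by step.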
Collisions can also be resolved with the open addressing method.
It is very easy too. Let's add two methods, put2 and get2, to implement it.
We use the simplest variant, linear probing:
Java code
public void put2(String key, Object value) {
    if (table.length * loadFactor < size) {
        resize();
    }
    int hash = hash(key.hashCode());
    Entry entry = table[hash];
    int nextHash = hash;
    while (entry != null) {
        if (entry.getKey().equals(key)) {
            entry.setValue(value);
            return;
        }
        nextHash = (nextHash + 1) % table.length; // check the next position
        entry = table[nextHash];
    }
    table[nextHash] = new Entry(key, value);
    size++;
    if (hash != nextHash) {
        collideCount++;
    }
}
Java code public Object get2(String key) { int hash = hash(key.hashCode()); Entry entry = table[hash]; while (entry != null) { if (entry.getKey().equals(key)) { return entry.getValue(); } hash = (hash + 1) % table.length; entry = table[hash]; } return null; }
Similarly, write a test
Java code @Test public void test6() { HashTable ht = new HashTable(); for (int i = 0; i < 1000; i++) { Object o = new Object(); ht.put2("key" + i, o); assertEquals(o, ht.get2("key" + i)); } System.out.println(ht.getSize()); assertTrue(ht.getSize() == 1000); System.out.println(ht.getCollideRate()); }
Linear probing is easy to implement, but it tends to make entries pile up next to each other. The book calls this primary clustering.
Quadratic probing or double hashing can be used instead to avoid this phenomenon better.
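To show how the three probing schemes differ, here is a sketch of the probe sequences only, not of a full table. The names are hypothetical, the home slot h is assumed to be non-negative, and the table size m is assumed to be a power of 2. The quadratic variant uses triangular offsets (0, 1, 3, 6, ...), one common choice among several, and the double-hashing variant takes its step as a parameter, standing in for a second hash function.

```java
// Illustrative sketch only: probe sequences, with hypothetical names.
final class ProbeSequences {

    // Linear probing: h, h+1, h+2, ...
    static int[] linear(int h, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (h + i) % m;
        }
        return seq;
    }

    // Quadratic probing with triangular offsets: h, h+1, h+3, h+6, ...
    static int[] quadratic(int h, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (h + i * (i + 1) / 2) % m;
        }
        return seq;
    }

    // Double hashing: the step comes from a second hash of the key.
    // An odd step guarantees that a power-of-two table is fully covered.
    static int[] doubleHash(int h, int step, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (h + i * step) % m;
        }
        return seq;
    }

    // Number of different slots that a probe sequence touches.
    static int distinct(int[] seq, int m) {
        boolean[] seen = new boolean[m];
        int count = 0;
        for (int slot : seq) {
            if (!seen[slot]) {
                seen[slot] = true;
                count++;
            }
        }
        return count;
    }
}
```

All three sequences reach every slot of a 16-slot table, but keys that share a home slot follow different paths under double hashing when their steps differ, which is what breaks up the clusters.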
//----------
Let's take a look at the implementation of java.util.HashMap to understand hash tables better.
First look at put:
Java code
public V put(K key, V value) {
    if (key == null) // null can also be a key
        return putForNullKey(value);
    int hash = hash(key.hashCode()); // note this
    int i = indexFor(hash, table.length); // and this
    for (Entry<K, V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i); // and this
    return null;
}
In this code, hash, indexFor and addEntry are the parts we care about.
In addition: HashMap allows a null key.
There is also this if statement:
Java code
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
It first checks whether the hash values are equal, and only then calls equals.
This also gives us the rule for overriding equals and hashCode: if you override equals, you must also override hashCode. If two objects are equal according to equals, their hashCode values must be equal too. Otherwise they will not work correctly in containers such as HashMap. See Effective Java.
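A small sketch of that rule in practice. PointKey is a made-up example class; the combination 31 * x + y is just one simple way to derive a hash code from the same fields that equals compares.

```java
// Illustrative sketch only: PointKey is a hypothetical example class.
final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PointKey)) {
            return false;
        }
        PointKey that = (PointKey) other;
        return x == that.x && y == that.y;
    }

    // Built from the same fields as equals, so equal objects get equal hash codes.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        java.util.Map<PointKey, String> map = new java.util.HashMap<PointKey, String>();
        map.put(new PointKey(1, 2), "found");
        // A different but equal instance locates the same entry.
        System.out.println(map.get(new PointKey(1, 2)));
    }
}
```

If hashCode were left at the default from Object, the second PointKey(1, 2) would almost always hash to a different value, the e.hash == hash check would fail, and the lookup would return null even though equals says the keys match.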
Let's take a look at hash and indexFor (with my own annotation added):
Java code /** * Applies a supplemental hash function to a given hashCode, which * defends against poor quality hash functions. This is critical * <strong>because HashMap uses power-of-two length hash tables</strong>, that * otherwise encounter collisions for hashCodes that do not differ * in lower bits. Note: Null keys always map to hash 0, thus index 0. */ static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } /** * Returns index for hash code h. */ static int indexFor(int h, int length) { return h & (length-1); }
hash produces a better-mixed value from the original hashCode. Because the table capacity is exactly an integer power of 2, this step is necessary; otherwise the high bits of the hash code would be wasted in the modulo operation. See the division hash above.
indexFor: because length is a power of 2, h & (length - 1) is equivalent to h mod length, and unlike Java's % operator it never returns a negative index.
Therefore, HashMap uses an improved division hash.
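Both claims can be checked with a few lines. In the sketch below, hash and indexFor are copied from the HashMap source quoted above; the wrapper class IndexForDemo and the choice of sample values are assumptions made for the demonstration.

```java
// Illustrative sketch only: hash and indexFor as quoted above, wrapped in a hypothetical class.
final class IndexForDemo {

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // For a power-of-two length, masking matches the mathematical modulus.
        for (int h : new int[] {0, 1, 17, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (floorMod: " + Math.floorMod(h, length) + ")");
        }
        // Two hash codes that differ only in their high bits.
        int a = 1 << 16;
        int b = 2 << 16;
        System.out.println("without hash(): " + indexFor(a, length) + " and " + indexFor(b, length));
        System.out.println("with hash():    " + indexFor(hash(a), length) + " and " + indexFor(hash(b), length));
    }
}
```

Without the supplemental hash both sample codes fall into slot 0, because only their low 4 bits are used; after hash() the high bits have been folded down and the two codes land in different slots.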
Let's take a look at addentry.
Java code void addEntry(int hash, K key, V value, int bucketIndex) { Entry<K, V> e = table[bucketIndex]; table[bucketIndex] = new Entry<K, V>(hash, key, value, e); if (size++ >= threshold) resize(2 * table.length); }
Here, too, the table grows by doubling its length.