Introduction to Algorithms: Hash Tables

Source: Internet
Author: User

This article is reposted from: http://www.iteye.com/topic/570646

The hash table is one of the most common data structures. Dynamic languages such as Ruby and JavaScript support it at the syntax level. In Java it is a little more roundabout (I feel awkward every time I use it; I wonder whether you feel the same).

This chapter gave me a hard time: I have read it before, and there are parts I still cannot follow. Fortunately, all the parts I cannot follow are starred sections.

A hash table is this kind of thing: it stores values indexed by a key and finds them again when needed, usually quickly :)

Java code
    @Test
    public void test() {
        HashTable ht = new HashTable();

        Object o1 = new Object();
        ht.put("o1", o1);

        Object o2 = new Object();
        ht.put("o2", o2);

        assertEquals(o1, ht.get("o1"));
        assertEquals(o2, ht.get("o2"));
    }

The get method should run in constant time, that is, with O(1) complexity.

Setting that aside for now, HashTable has to look at least like this:

Java code
    public class HashTable {

        public void put(String key, Object value) {
            //...
        }

        public Object get(String key) {
            //...
            return null;
        }
    }

The code above certainly does not pass the test yet. (PS: please forget about java.util.Hashtable.)

For get to be fast enough, it is best to compute the storage location directly from the key, so that the value can be found in one step.

Similar to this:

    public class HashTable {
        private Object[] table = new Object[1000];

        public void put(String key, Object value) {
            int hash = hash(key.hashCode());
            if (table[hash] == null) {
                table[hash] = value;
            } else {
                throw new RuntimeException("Collision! What should I do?");
            }
        }

        public Object get(String key) {
            int hash = hash(key.hashCode());
            return table[hash];
        }

        private int hash(int hashCode) {
            return -1; // should be a number in [0, table.length - 1]
        }
    }

Run the test and it fails with an ArrayIndexOutOfBoundsException, because the hash method has not been implemented yet.

The hash function should scatter keys over the table slots as evenly as possible.

The options include the division method, the multiplication method, and so on.

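Let's try the division method first:

Java code
    private int hash(int hashCode) {
        int capacity = table.length;
        // Mask off the sign bit first: in Java, % yields a negative result for a negative hashCode.
        return (hashCode & 0x7FFFFFFF) % capacity;
    }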

The capacity should not be a power of 2. Otherwise the hash value is just the low k bits of the hashCode; the high bits are wasted, which may cause many collisions.

A good choice is a prime number that is not close to an exact power of 2.
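To make the power-of-2 problem concrete, here is a minimal sketch. The class name `DivisionCapacityDemo`, the helper `distinctBuckets`, and the sample keys (multiples of 16, which differ only above the low 4 bits) are assumptions made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: names and sample keys are assumptions, not from the article.
class DivisionCapacityDemo {
    // Division-method hash: the remainder of the hash code divided by the capacity.
    static int hash(int hashCode, int capacity) {
        return (hashCode & 0x7FFFFFFF) % capacity; // mask the sign bit so the index is never negative
    }

    // Counts how many different buckets the keys 0, 16, 32, ..., 16 * 96 land in.
    static int distinctBuckets(int capacity) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 97; i++) {
            buckets.add(hash(i * 16, capacity));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("capacity 16: " + distinctBuckets(16)); // 1
        System.out.println("capacity 97: " + distinctBuckets(97)); // 97
    }
}
```

With capacity 16 only the low 4 bits of each key are used, so all 97 keys share one bucket. With the prime capacity 97 each of these keys gets its own bucket.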

Run the test now :)

But wait, sometimes we need to do this:

Java code
    @Test
    public void test2() {
        HashTable ht = new HashTable();

        Object o1 = new Object();
        ht.put("o1", o1);

        Object anotherO1 = new Object();
        ht.put("o1", anotherO1); // update

        assertEquals(anotherO1, ht.get("o1"));
    }

We need to refactor the code so that the key is stored as well.

First, add a structure that holds the key and the value:

Java code
    public class HashTable {
        public static class Entry {
            private String key;
            private Object value;

            public Entry(String key, Object value) {
                this.key = key;
                this.value = value;
            }

            public String getKey() {
                return key;
            }

            public Object getValue() {
                return value;
            }
        }

        private Entry[] table = new Entry[1000]; // the original Object[] becomes Entry[]

        // ... put, get, and hash follow
    }

Refactor put:

Java code
    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        if (table[hash] == null || table[hash].getKey().equals(key)) {
            table[hash] = new Entry(key, value);
        } else {
            throw new RuntimeException("Collision! What should I do?");
        }
    }

Refactor get

Java code
    public Object get(String key) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        return entry == null ? null : entry.getValue();
    }

As you can see, the test passed again :)

Now let's look at the multiplication method:

Java code
    private int hash(int hashCode) {
        int capacity = table.length;
        double a = 0.6180339887; // the almighty golden ratio
        // Assumes a non-negative hashCode; a negative one would give a negative index.
        return (int) (((hashCode * a) % 1) * capacity);
    }

Multiply the hashCode by a constant A (0 < A < 1), take the fractional part of the product, and multiply that by the capacity.

Knuth suggests that the golden ratio, (√5 - 1)/2 ≈ 0.618, is an ideal value for A. Anyone who plays the stock market will know this number.

Advantages of the multiplication method:

The capacity has no special requirements. It is usually chosen to be a power of 2.

That way, the computation can be done with an integer multiplication and a shift instead of floating-point arithmetic.

The golden ratio constant A can then be expressed as 2654435769 / 2^32.

The computation simplifies as follows.

Assume the machine word length is w bits; here w is 32. The product hashCode * 2654435769 is a 2w-bit value whose first 32 bits are the high word and whose last 32 bits are the low word. ((1 << 32) - 1) is 32 one-bits, so ANDing (&) it with hashCode * 2654435769 keeps the low 32 bits. The hash value is the top p bits of those low 32 bits, obtained by shifting them right by (32 - p):

    ((hashCode * 2654435769) & ((1 << 32) - 1)) >> (32 - p)   // the mask keeps only the low 32 bits

Let's refactor the code.

First, the array size becomes 2^p:

Java code
    private int p = 10;
    private Entry[] table = new Entry[1 << p];

Then:

Java code
    private int hash(int hashCode) {
        long k = 2654435769L;
        return (int) (((k * hashCode) & ((1L << 32) - 1)) >> (32 - p));
    }

The tests still pass.
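As a sanity check, the shift-based form can be compared against the floating-point form. The sketch below is only an illustration: the class and method names are assumptions, and A is taken to be exactly 2654435769 / 2^32 so that both forms work from the same constant. The two are compared only for small non-negative keys, where the double arithmetic is exact.

```java
// Illustrative sketch: class and method names are assumptions.
class MultiplicationHashDemo {
    static final long S = 2654435769L; // A * 2^32 rounded to an integer, A = (sqrt(5) - 1) / 2

    // Floating-point form: floor(2^p * frac(k * A)), with A taken as S / 2^32.
    static int hashFloat(int k, int p) {
        double a = S / 4294967296.0; // 4294967296 = 2^32
        return (int) (((k * a) % 1) * (1 << p));
    }

    // Fixed-point form: keep the low 32 bits of k * S, then take their top p bits.
    static int hashShift(int k, int p) {
        return (int) (((S * k) & ((1L << 32) - 1)) >> (32 - p));
    }

    public static void main(String[] args) {
        System.out.println(hashShift(123456, 14)); // 67
        System.out.println(hashFloat(123456, 14)); // 67
    }
}
```

For k = 123456 and p = 14 the product k * S is 327706022297664, its low 32 bits are 17612864, and the top 14 of those bits give 67. Unlike the floating-point form, the shift form also returns a valid index for negative hash codes, because the mask keeps the low 32 bits of the two's-complement product.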

Next, let's add more elements and try to break it.

Java code
    @Test
    public void test3() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put("key" + i, o);
            assertEquals(o, ht.get("key" + i));
            System.out.println("Ok: " + i);
        }
    }

Run the test: it fails, and the console output only gets as far as 108.

A RuntimeException is thrown: a collision. What should we do?

Collisions can be resolved by chaining or by open addressing.

Chaining first.

First, refactor Entry so that it can link to another Entry:

Java code
    public static class Entry {
        private String key;
        private Object value;
        private Entry next;

        public Entry(String key, Object value) {
            this(key, value, null);
        }

        public Entry(String key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public String getKey() {
            return key;
        }

        public Object getValue() {
            return value;
        }

        public void setValue(Object value) {
            this.value = value;
        }

        public Entry getNext() {
            return next;
        }
    }

A setValue method is added at the same time, to make it easier to update an element in the linked list.

Then refactor put:

Java code
    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        if (entry == null) { // the slot is unused, so use it directly
            table[hash] = new Entry(key, value);
            return;
        }
        for (Entry o = entry; o != null; o = o.getNext()) {
            if (o.getKey().equals(key)) { // the key already exists, so update it
                o.setValue(value);
                return;
            }
        }
        table[hash] = new Entry(key, value, entry); // link the new entry in at the head of the chain
    }

As you can see, the test runs normally :)
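One caveat: with chaining, a bucket can hold several keys, so a get that only looks at table[hash] returns whatever sits at the head of the chain. test3 does not notice, because it reads each key right after inserting it, when that key is still at the head. A minimal sketch of a lookup that walks the chain follows; the class name, the deliberately tiny table, and the simple division hash are assumptions for this illustration.

```java
// Illustrative sketch of a chained table whose get walks the whole chain.
// The class name, the tiny capacity, and the simple hash are assumptions for the demo.
class ChainedTableSketch {
    static final class Node {
        final String key;
        Object value;
        final Node next;

        Node(String key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[4]; // deliberately tiny, so chains form quickly

    private int hash(int hashCode) {
        return (hashCode & 0x7FFFFFFF) % table.length;
    }

    public void put(String key, Object value) {
        int h = hash(key.hashCode());
        for (Node n = table[h]; n != null; n = n.next) {
            if (n.key.equals(key)) { // key already present: update in place
                n.value = value;
                return;
            }
        }
        table[h] = new Node(key, value, table[h]); // prepend to the chain
    }

    public Object get(String key) {
        // Walk the chain: the wanted key is not necessarily at the head.
        for (Node n = table[hash(key.hashCode())]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}
```

With only 4 buckets, inserting 20 keys forces every bucket to hold a chain, and each key can still be read back.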

However, as the number of elements in the hash table grows, so does the collision rate. It is best to expand the capacity automatically once the number of elements reaches a certain threshold, so that lookups stay fast.

But first, let's see what the collision rate of the current hash table is when test3 runs.

To find out, we refactor the code to count collisions:

Java code
    private int size = 0;         // number of elements in the table
    private int collideCount = 0; // number of collisions

    public int getSize() {
        return size;
    }

    public float getCollideRate() {
        return size > 0 ? ((float) collideCount) / size : 0;
    }

Java code
    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        if (entry == null) {
            table[hash] = new Entry(key, value);
            size++; // here
            return;
        }
        collideCount++; // here
        for (Entry o = entry; o != null; o = o.getNext()) {
            if (o.getKey().equals(key)) {
                o.setValue(value);
                return;
            }
        }
        table[hash] = new Entry(key, value, entry);
        size++; // and here
    }

Test:

Java code
    @Test
    public void test4() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            ht.put("key" + i, new Object());
        }
        System.out.println(ht.getCollideRate());
    }

Output: 0.309

The total capacity is 1024 and there are 1000 elements, 309 of which collided. That is a serious pile-up.
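For a rough sense of scale, the measured count can be compared with what an idealized hash function would give. The sketch below assumes every key lands in a uniformly random bucket, independently of the others, which is an assumption and not a property of the hash used here; the class and method names are made up for the illustration.

```java
// Illustrative estimate under an idealized assumption: each key lands in a
// uniformly random bucket, independently of all other keys.
class CollisionEstimate {
    // Expected number of insertions that hit an already occupied bucket when
    // n distinct keys go into m buckets: n - m * (1 - (1 - 1/m)^n).
    static double expectedCollisions(int n, int m) {
        double expectedOccupied = m * (1 - Math.pow(1 - 1.0 / m, n));
        return n - expectedOccupied;
    }

    public static void main(String[] args) {
        System.out.println(expectedCollisions(1000, 1024));
    }
}
```

For n = 1000 and m = 1024 this comes out at roughly 361, so by that yardstick 309 is not unusually bad. At a load this close to 1, a large number of collisions is to be expected from any hash function.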

Next we refactor the HashTable class so that it expands whenever the number of elements exceeds 0.75 of the capacity (the load factor):

Java code
    private int p = 4;
    private Entry[] table = new Entry[1 << p];
    private float loadFactor = 0.75f;

The initial capacity is now 16 (1 << 4), and the load factor is 0.75.

Java code
    public void put(String key, Object value) {
        if (table.length * loadFactor < size) {
            resize();
        }
        // ... the rest of put is unchanged
    }

Check before every put, and resize if necessary.

Java code
    private void resize() {
        Entry[] old = table;

        p += 1;
        table = new Entry[1 << p];
        size = 0;
        collideCount = 0;

        // Re-insert every entry, because each index depends on the new table size.
        for (int i = 0; i < old.length; i++) {
            Entry entry = old[i];
            while (entry != null) {
                put(entry.getKey(), entry.getValue());
                entry = entry.getNext();
            }
        }
    }

Write a test:

Java code
    @Test
    public void test5() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put("key" + i, o);
            assertEquals(o, ht.get("key" + i));
        }
        System.out.println(ht.getSize());
        assertTrue(ht.getSize() == 1000);
        System.out.println(ht.getCollideRate());
    }

This time 1000 elements are added as well, and the collision rate printed at the end is 0.08.

Our hash table starts with a capacity of 16 and grows until it holds 1000 elements, so it has to resize several times, and resizing is relatively expensive.

We could refactor the code to take the capacity in the constructor and so avoid unnecessary resize overhead.

We will not do that here, since this code only illustrates the algorithm. It is worth keeping in mind when you use java.util.HashMap, though.
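A minimal sketch of the presizing idea follows. The class `PresizeSketch` and both helpers are hypothetical names introduced for illustration. `exponentFor` picks the exponent p for a table like the one above, and `presizedMap` applies the same reasoning to java.util.HashMap through its `HashMap(int initialCapacity, float loadFactor)` constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: helper names are assumptions, not part of the article's HashTable.
class PresizeSketch {
    // Smallest exponent p (at least 4) such that 2^p * loadFactor >= expectedSize,
    // so a table of 2^p slots can take expectedSize elements without resizing.
    static int exponentFor(int expectedSize, float loadFactor) {
        int p = 4;
        while (p < 30 && (1 << p) * loadFactor < expectedSize) {
            p++;
        }
        return p;
    }

    // The same idea with java.util.HashMap: ask for enough capacity up front.
    static Map<String, Object> presizedMap(int expectedSize) {
        float loadFactor = 0.75f;
        int initialCapacity = (int) (expectedSize / loadFactor) + 1;
        return new HashMap<>(initialCapacity, loadFactor);
    }
}
```

For 1000 elements at a load factor of 0.75, `exponentFor` returns 11, a table of 2048 slots, since 1024 * 0.75 = 768 is too small.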

The other way to resolve collisions is open addressing.

It is also very easy. Let's add two methods, put2 and get2, to implement it.

We use the simplest variant, linear probing:

Java code
    public void put2(String key, Object value) {
        // Caveat: resize() re-inserts with put (chaining); a table built with put2 should re-insert with put2.
        if (table.length * loadFactor < size) {
            resize();
        }
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        int nextHash = hash;
        while (entry != null) {
            if (entry.getKey().equals(key)) {
                entry.setValue(value);
                return;
            }
            nextHash = (nextHash + 1) % table.length; // probe the next slot
            entry = table[nextHash];
        }
        table[nextHash] = new Entry(key, value);
        size++;
        if (hash != nextHash) {
            collideCount++;
        }
    }

Java code
    public Object get2(String key) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        while (entry != null) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
            hash = (hash + 1) % table.length;
            entry = table[hash];
        }
        return null;
    }

Similarly, write a test

Java code
    @Test
    public void test6() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put2("key" + i, o);
            assertEquals(o, ht.get2("key" + i));
        }
        System.out.println(ht.getSize());
        assertTrue(ht.getSize() == 1000);
        System.out.println(ht.getCollideRate());
    }

Linear probing is easy to implement, but entries tend to pile up in long runs. The book calls this primary clustering.

Quadratic probing or double hashing avoids this phenomenon better.
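As a rough illustration of double hashing, the probe sequence can be sketched as below. This is an assumption-laden sketch, not the book's code: the class name, the way the second hash is derived from the upper bits of the hash code, and the power-of-2 table size are all choices made for the example.

```java
// Illustrative sketch of double hashing; class and method names are assumptions.
class DoubleHashingSketch {
    // i-th probe for a key: (h1 + i * h2) mod m, where m is a power of 2.
    // h2 is forced to be odd, so it is coprime to m and the probe sequence
    // visits every slot exactly once in m steps.
    static int probe(int hashCode, int i, int m) {
        int h1 = hashCode & (m - 1);
        int h2 = ((hashCode >>> 16) | 1) & (m - 1);
        return (h1 + i * h2) & (m - 1);
    }
}
```

Linear probing is the special case where the step h2 is always 1. Here two keys that start at the same slot usually continue with different steps, which is what breaks up the clusters.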

//----------

Let's take a look at the implementation of java.util.HashMap to understand hash tables better.

First look at put:

Java code
    public V put(K key, V value) {
        if (key == null) // null can also be a key
            return putForNullKey(value);
        int hash = hash(key.hashCode()); // of interest
        int i = indexFor(hash, table.length); // of interest
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i); // of interest
        return null;
    }

In this code, hash, indexFor, and addEntry are the parts we care about.

Also note: HashMap allows a null key.

There is an if statement:

Java code
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {

It first checks whether the hash values are equal, and only then calls equals.

This also gives us the rule for overriding equals and hashCode: if you override equals, you must override hashCode too. If two objects are equal according to equals, their hashCode values must be equal. Otherwise they will not work correctly in containers such as HashMap. See Effective Java.
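A small sketch of that rule in practice follows. The `Point` class and the `PointLookupDemo` helper are made-up examples, not taken from the article or from Effective Java.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch: the Point class is a made-up example.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    // Overridden together with equals: equal points produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class PointLookupDemo {
    static String lookup() {
        Map<Point, String> names = new HashMap<>();
        names.put(new Point(1, 2), "a");
        // A different but equal instance finds the entry: its hash code selects
        // the same bucket, and equals then confirms the match.
        return names.get(new Point(1, 2));
    }
}
```

If hashCode were left as inherited from Object, the second `new Point(1, 2)` would most likely hash to a different bucket and the lookup would return null, even though equals says the two points are the same.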

Let's take a look at hash and indexFor (the emphasis in the comment is mine):

Java code
    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * <strong>because HashMap uses power-of-two length hash tables</strong>, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

hash produces a better hash value from the original hashCode. Because the table capacity is always a power of 2, this step is necessary; otherwise the high bits of the hash code would be wasted in the modulo operation. See the division method above.
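The effect can be seen on a small example. In the sketch below, hash and indexFor are copied from the listing above; the demo class, the helper `distinctBuckets`, and the sample hash codes (i << 16, which differ only in their upper bits) are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: hash and indexFor are copied from the listing above;
// the demo class and the sample hash codes are assumptions.
class SupplementalHashDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Number of buckets used by the 16 hash codes i << 16 (identical low 16 bits)
    // in a table of 16 slots, with or without the supplemental hash.
    static int distinctBuckets(boolean useSupplementalHash) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16;
            buckets.add(indexFor(useSupplementalHash ? hash(h) : h, 16));
        }
        return buckets.size();
    }
}
```

Without the supplemental hash all 16 codes fall into bucket 0, because their low 4 bits are identical. With it, the upper bits are folded down and the 16 codes spread over all 16 buckets.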

indexFor is equivalent to h % length, given that length is a power of 2 and h is non-negative.
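The equivalence is easy to check. The sketch below uses a made-up class name and helper names; `Math.floorMod` is the standard library method that always returns a non-negative result for a positive divisor.

```java
// Illustrative check; the class and helper names are assumptions.
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True when the mask agrees with the % operator for this h (length must be a power of 2).
    static boolean sameAsRemainder(int h, int length) {
        return indexFor(h, length) == h % length;
    }

    // True when the mask agrees with Math.floorMod, which is never negative here.
    static boolean sameAsFloorMod(int h, int length) {
        return indexFor(h, length) == Math.floorMod(h, length);
    }
}
```

For h = 37 and length = 16 both the mask and % give 5. For h = -37 the % operator gives -5, while the mask gives 11, the same as `Math.floorMod`, so the mask is also safe to use as an array index for negative hash values.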

So HashMap uses an improved division method.

Let's take a look at addentry.

Java code
    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K, V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

Here too, the table grows by doubling.
