Introduction to Algorithms: Hash Tables

Last update: 2018-12-04. Source: Internet

Author: User

This article is reproduced from: http://www.iteye.com/topic/570646

The hash table is one of the most commonly used data structures. Dynamic languages such as Ruby and JavaScript support it directly in their syntax. In Java it is a bit more roundabout (every time I use it, it feels a little awkward; I wonder if you feel the same).

This chapter really tied me in knots. I had read it before, and on rereading I still couldn't understand all of it. Fortunately, every part I couldn't follow is a starred (advanced) section.

A hash table stores values indexed by a key and lets you find them again by that key when needed, usually quickly :)

Java code

    // JUnit 4 test; assumes: import org.junit.Test; import static org.junit.Assert.*;
    @Test
    public void test() {
        HashTable ht = new HashTable();

        Object o1 = new Object();
        ht.put("o1", o1);

        Object o2 = new Object();
        ht.put("o2", o2);

        assertEquals(o1, ht.get("o1"));
        assertEquals(o2, ht.get("o2"));
    }

Ideally, get should run in constant time, O(1), at least on average.

Never mind complexity for now. At a minimum, HashTable looks like this:

Java code

    public class HashTable {

        public void put(String key, Object value) {
            // ...
        }

        public Object get(String key) {
            // ...
            return null;
        }
    }

The code above certainly does not pass the test yet. (PS: please forget about java.util.Hashtable for now.)

To make lookups fast enough, it is best to compute the storage location directly from the key. That way the value can be found in a single step.

Something like this:

    public class HashTable {
        private Object[] table = new Object[1000];

        public void put(String key, Object value) {
            int hash = hash(key.hashCode());
            if (table[hash] == null) {
                table[hash] = value;
            } else {
                throw new RuntimeException("Collision! What should I do?");
            }
        }

        public Object get(String key) {
            int hash = hash(key.hashCode());
            return table[hash];
        }

        private int hash(int hashCode) {
            return -1; // should return a number in [0, table.length - 1]
        }
    }

Run the test and it fails with an ArrayIndexOutOfBoundsException, because the hash method has not been implemented yet.

The job of the hash function is to map each key into the table so that every slot is equally likely.

Common methods include the division method, the multiplication method, and so on.

Let's try the division method first:

Java code

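    private int hash(int hashCode) {
        int capacity = table.length;
        // Mask off the sign bit: in Java, % on a negative hashCode would give a negative index.
        return (hashCode & 0x7fffffff) % capacity;
    }
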
The capacity should not be a power of 2. If the capacity is 2^k, then hashCode % capacity is just the low k bits of the hash code. The high bits are ignored entirely, which can cause many collisions.
A good choice is a prime that is not too close to an exact power of 2.
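To see why a power-of-2 capacity is risky for the division method, here is a small, self-contained check. It is a sketch rather than code from the original article. The class name DivisionHashDemo and the choice of keys (hash codes that differ only above bit 16) are illustrative assumptions. With capacity 1024, every key lands in the same bucket. With the prime 701, the keys spread out.

```java
import java.util.HashSet;
import java.util.Set;

final class DivisionHashDemo {
    /** Division-method index; the sign bit is masked off so negative hash codes stay in range. */
    static int index(int hashCode, int capacity) {
        return (hashCode & 0x7fffffff) % capacity;
    }

    /** Counts how many distinct buckets the hash codes i << 16 (i = 0..n-1) fall into. */
    static int distinctBuckets(int n, int capacity) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < n; i++) {
            buckets.add(index(i << 16, capacity)); // these hash codes differ only in high bits
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(100, 1024)); // power of 2, only the low 10 bits count: prints 1
        System.out.println(distinctBuckets(100, 701));  // prime not close to a power of 2: prints 100
    }
}
```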
Run the test now: it passes :)
But wait, sometimes we need to do this:
Java code

    @Test
    public void test2() {
        HashTable ht = new HashTable();

        Object o1 = new Object();
        ht.put("o1", o1);

        Object anotherO1 = new Object();
        ht.put("o1", anotherO1); // update the existing key

        assertEquals(anotherO1, ht.get("o1"));
    }

To support this, we need to refactor the code so that the table also stores the key.
First, add a structure that holds both the key and the value:
Java code

    public class HashTable {

        public static class Entry {
            private String key;
            private Object value;

            public Entry(String key, Object value) {
                this.key = key;
                this.value = value;
            }

            public String getKey() {
                return key;
            }

            public Object getValue() {
                return value;
            }
        }

        private Entry[] table = new Entry[1000]; // changed from Object[] to Entry[]

        // ... put, get and hash follow
    }

Refactor put:
Java code

    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        // Store if the slot is empty, or overwrite if it already holds the same key.
        if (table[hash] == null || table[hash].getKey().equals(key)) {
            table[hash] = new Entry(key, value);
        } else {
            throw new RuntimeException("Collision! What should I do?");
        }
    }

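Refactor get:
Java code

    public Object get(String key) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        // Return the value only if this slot actually holds the requested key.
        return (entry == null || !entry.getKey().equals(key)) ? null : entry.getValue();
    }
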
Run the tests again, and they pass :)
Now let's look at the multiplication method:
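Java code

    private int hash(int hashCode) {
        int capacity = table.length;
        double a = 0.6180339887; // the all-purpose golden ratio, (sqrt(5) - 1) / 2
        double product = hashCode * a;
        // Fractional part in [0, 1); using Math.floor keeps it non-negative for negative hash codes.
        double fraction = product - Math.floor(product);
        return (int) (fraction * capacity);
    }
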
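In words: multiply the hash code by a constant A (0 < A < 1), take the fractional part of the product, multiply that by the capacity, and round down.
Knuth suggests that A ≈ (√5 − 1)/2 ≈ 0.618 is likely to work reasonably well. This is the golden ratio, which anyone who trades stocks will recognize.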
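Written as a formula, with m the capacity and k the hash code, the multiplication method is shown below. The second line is a worked example with illustrative values (k = 123456, m = 2^14 = 16384), and the intermediate values are rounded.

```latex
h(k) = \left\lfloor m \,(kA \bmod 1) \right\rfloor, \qquad A = \frac{\sqrt{5}-1}{2} \approx 0.6180339887

kA \approx 76300.004115, \quad kA \bmod 1 \approx 0.004115, \quad m\,(kA \bmod 1) \approx 16384 \times 0.004115 \approx 67.42, \quad h(k) = 67
```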
Advantages of the multiplication method:
The choice of capacity is not critical. It is typically a power of 2.
With a power-of-2 capacity, the final multiplication by the capacity can be replaced by a bit shift.
The golden-ratio constant A can then be expressed as 2654435769 / 2^32.
This simplifies the computation:
Assume the machine word is w bits; here w = 32. The product hashCode * 2654435769 is a 2w-bit value, so it must be computed in 64-bit arithmetic. Its first 32 bits are the high word and its last 32 bits are the low word. ((1 << 32) - 1) is thirty-two 1 bits, so ANDing it with hashCode * 2654435769 keeps only the low 32 bits. The hash value is the top p bits of those low 32 bits, which means shifting the low 32 bits right by (32 - p):

    ((hashCode * 2654435769) & ((1 << 32) - 1)) >> (32 - p)   // keep only the low 32 bits, then take their top p bits
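The shift-based formula can be checked on its own. This is a minimal sketch; the class MultiplicativeHash and its main method are illustrative, not from the article. It uses 64-bit long arithmetic so the full product fits, and it applies >>> to the masked, non-negative value. For k = 123456 and p = 14 it gives 67. That agrees with computing floor(16384 × frac(123456 × 0.6180339887)) by hand.

```java
final class MultiplicativeHash {
    /** floor(A * 2^32) for Knuth's constant A = (sqrt(5) - 1) / 2. */
    static final long A_SCALED = 2654435769L;

    /** Returns an index in [0, 2^p), for 1 <= p <= 31. */
    static int hash(int hashCode, int p) {
        // A 32-bit constant times a 32-bit int fits in a signed 64-bit long.
        long low32 = (A_SCALED * hashCode) & 0xFFFFFFFFL; // keep only the low 32 bits
        return (int) (low32 >>> (32 - p));                // their top p bits are the index
    }

    public static void main(String[] args) {
        System.out.println(hash(123456, 14)); // prints 67
        System.out.println(hash(-42, 10));    // negative hash codes also map into [0, 1024)
    }
}
```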
Let's refactor the code.
First, make the array size 2^p:
Java code

    private int p = 10;
    private Entry[] table = new Entry[1 << p];

Then:
Java code

    private int hash(int hashCode) {
        long k = 2654435769L;
        // Keep the low 32 bits of the 64-bit product, then take their top p bits.
        return (int) (((k * hashCode) & ((1L << 32) - 1)) >> (32 - p));
    }

The tests still pass.
Next, let's add more elements and try to break it.
Java code

    @Test
    public void test3() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put("key" + i, o);
            assertEquals(o, ht.get("key" + i));
            System.out.println("Ok: " + i);
        }
    }

Run the test. It fails, and the console prints only up to 108.
The put threw a RuntimeException because of a collision. What should we do?
Collisions can be resolved by chaining or by open addressing.
Let's start with chaining.
First, refactor Entry so that each entry can link to the next one in its chain:
Java code

    public static class Entry {
        private String key;
        private Object value;
        private Entry next;

        public Entry(String key, Object value) {
            this(key, value, null);
        }

        public Entry(String key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public String getKey() {
            return key;
        }

        public Object getValue() {
            return value;
        }

        public void setValue(Object value) {
            this.value = value;
        }

        public Entry getNext() {
            return next;
        }
    }

We also add a setValue method, which makes it easy to update an element that is already in the linked list.
Then refactor put:
Java code

    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        if (entry == null) { // this slot is unused, so store the entry directly
            table[hash] = new Entry(key, value);
            return;
        }
        for (Entry o = entry; o != null; o = o.getNext()) {
            if (o.getKey().equals(key)) { // the key already exists, so update it
                o.setValue(value);
                return;
            }
        }
        table[hash] = new Entry(key, value, entry); // prepend the new entry to the chain
    }

As you can see, the test now passes :)
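One thing the walkthrough leaves implicit is that get still reads only the entry stored directly in the slot, which is the head of the chain. test3 passes anyway, because put prepends each new entry and the test calls get right after put. A get that walks the chain looks roughly like the sketch below. The sketch is self-contained, so it repeats a simplified Entry and put. The class name ChainedHashTable and its division-method index function are assumptions for illustration, not the article's code.

```java
final class ChainedHashTable {
    private static final class Entry {
        final String key;
        Object value;
        final Entry next;

        Entry(String key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] table;

    ChainedHashTable(int capacity) {
        table = new Entry[capacity];
    }

    // Assumed index function for this sketch: division method with the sign bit masked off.
    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    void put(String key, Object value) {
        int i = indexFor(key);
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { // existing key: update in place
                e.value = value;
                return;
            }
        }
        table[i] = new Entry(key, value, table[i]); // prepend to the chain
    }

    Object get(String key) {
        // Walk the whole chain, not just its head.
        for (Entry e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With a deliberately tiny table (say, new ChainedHashTable(3)), every chain is long. Even so, every key inserted earlier can still be found.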
However, as the number of elements in the hash table grows, so does the collision rate. It is best to expand the capacity automatically once the number of elements reaches a certain threshold, so that lookups stay fast.
But first, let's measure the collision rate of the current hash table when test3 runs.
So we refactor the code to count collisions:
Java code

    private int size = 0;         // number of elements in the table
    private int collideCount = 0; // number of collisions

    public int getSize() {
        return size;
    }

    public float getCollideRate() {
        return size > 0 ? ((float) collideCount) / size : 0;
    }

Java code

    public void put(String key, Object value) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        if (entry == null) {
            table[hash] = new Entry(key, value);
            size++; // here
            return;
        }
        collideCount++; // here
        for (Entry o = entry; o != null; o = o.getNext()) {
            if (o.getKey().equals(key)) {
                o.setValue(value);
                return;
            }
        }
        table[hash] = new Entry(key, value, entry);
        size++; // and here
    }

Test:
Java code

    @Test
    public void test4() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            ht.put("key" + i, new Object());
        }
        System.out.println(ht.getCollideRate());
    }

Output: 0.309
The capacity is 1024 and there are 1000 elements, 309 of which collided on insertion. That's a serious pile-up.
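For comparison, here is the collision count you would expect if the hash function behaved like a uniformly random one. This is simple uniform hashing, an idealized assumption. With m = 1024 slots and n = 1000 insertions of distinct keys, an insertion collides when its slot is already occupied. The expected number of collisions is therefore n minus the expected number of occupied slots:

```latex
E[\text{occupied}] = m\left(1 - \left(1 - \tfrac{1}{m}\right)^{n}\right) \approx 1024\,(1 - 0.3764) \approx 638.5

E[\text{collisions}] = n - E[\text{occupied}] \approx 1000 - 638.5 = 361.5, \qquad \text{rate} \approx 0.36
```

So the measured 0.309 is in the same range as an ideal random hash. This suggests the collisions come mainly from the table being almost full (a load of about 0.98), not from a poor hash function.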
Next, we refactor the HashTable class so that it expands its capacity whenever the number of elements exceeds 0.75 of the capacity (the load factor).
Java code

    private int p = 4;
    private Entry[] table = new Entry[1 << p];
    private float loadFactor = 0.75f;

The initial capacity is 16 (1 << 4), and the load factor is 0.75.
Java code

    public void put(String key, Object value) {
        if (table.length * loadFactor < size) {
            resize();
        }
        // ... the rest of put is unchanged
    }

Before each put, check the load and resize if necessary:
Java code

    private void resize() {
        Entry[] old = table;

        p += 1;
        table = new Entry[1 << p];
        size = 0;
        collideCount = 0;

        // Rehash every entry of every old chain into the new, larger table.
        for (int i = 0; i < old.length; i++) {
            Entry entry = old[i];
            while (entry != null) {
                put(entry.getKey(), entry.getValue());
                entry = entry.getNext();
            }
        }
    }

Write a test:
Java code

    @Test
    public void test5() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put("key" + i, o);
            assertEquals(o, ht.get("key" + i));
        }
        System.out.println(ht.getSize());
        assertEquals(1000, ht.getSize());
        System.out.println(ht.getCollideRate());
    }

This time we again add 1000 elements, and the printed collision rate drops to 0.08.
Our table starts with 16 slots, so growing it to hold 1000 elements takes several resizes: seven, doubling from 16 up to 2048. Each resize is relatively expensive.
We could refactor the code so that the caller specifies the initial capacity in the constructor, avoiding unnecessary resizes.
We won't do that here, since the point is only to illustrate the algorithm. In real code, just use java.util.HashMap.
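If you do take that advice, java.util.HashMap lets you avoid the intermediate resizes by passing an initial capacity to its HashMap(int initialCapacity) constructor. A common rule of thumb, sketched below, is to request about expected / 0.75 slots, because 0.75 is HashMap's default load factor. The helper name presized is hypothetical. On JDK 19 and later, HashMap.newHashMap(int numMappings) performs this calculation for you.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMapDemo {
    /** Hypothetical helper: a HashMap sized so `expected` entries fit under the default 0.75 load factor. */
    static <K, V> Map<K, V> presized(int expected) {
        int initialCapacity = (int) Math.ceil(expected / 0.75);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<String, Object> map = presized(1000);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, new Object());
        }
        System.out.println(map.size()); // prints 1000
    }
}
```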
Another way to resolve collisions is open addressing.
It is also very easy. Let's implement it with two new methods, put2 and get2.
The simplest scheme is linear probing:
Java code

    public void put2(String key, Object value) {
        if (table.length * loadFactor < size) {
            resize();
        }
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        int nextHash = hash;
        while (entry != null) {
            if (entry.getKey().equals(key)) {
                entry.setValue(value);
                return;
            }
            nextHash = (nextHash + 1) % table.length; // check the next position
            entry = table[nextHash];
        }
        table[nextHash] = new Entry(key, value);
        size++;
        if (hash != nextHash) {
            collideCount++;
        }
    }

Java code

    public Object get2(String key) {
        int hash = hash(key.hashCode());
        Entry entry = table[hash];
        while (entry != null) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
            hash = (hash + 1) % table.length; // check the next position
            entry = table[hash];
        }
        return null;
    }

As before, write a test:
Java code

    @Test
    public void test6() {
        HashTable ht = new HashTable();

        for (int i = 0; i < 1000; i++) {
            Object o = new Object();
            ht.put2("key" + i, o);
            assertEquals(o, ht.get2("key" + i));
        }
        System.out.println(ht.getSize());
        assertEquals(1000, ht.getSize());
        System.out.println(ht.getCollideRate());
    }

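Note that in the code above, resize() re-inserts entries through the chaining put, so after a resize the table mixes chains with probe sequences. A fully open-addressing table should rehash with the probing insert itself. Here is a minimal sketch under that assumption. The class name LinearProbingTable, its index function, and the fixed 0.75 load factor are illustrative choices. Deletion is left out, since under open addressing it needs tombstones.

```java
final class LinearProbingTable {
    private String[] keys = new String[16];
    private Object[] values = new Object[16];
    private int size = 0;

    // Assumed index function: division method with the sign bit masked off.
    private int indexFor(String key, int capacity) {
        return (key.hashCode() & 0x7fffffff) % capacity;
    }

    void put(String key, Object value) {
        if (size + 1 > keys.length * 0.75) {
            resize(); // keep some slots empty so every probe sequence ends
        }
        int i = indexFor(key, keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                values[i] = value; // existing key: update in place
                return;
            }
            i = (i + 1) % keys.length; // linear probing: try the next slot
        }
        keys[i] = key;
        values[i] = value;
        size++;
    }

    Object get(String key) {
        int i = indexFor(key, keys.length);
        while (keys[i] != null) { // an empty slot ends the probe sequence
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }

    int size() {
        return size;
    }

    private void resize() {
        String[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new String[oldKeys.length * 2];
        values = new Object[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                put(oldKeys[j], oldValues[j]); // re-insert with the same probing rule
            }
        }
    }
}
```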
Linear probing is easy to implement, but occupied slots tend to pile up into long runs. The book calls this primary clustering.
Quadratic probing, or better still double hashing, reduces this effect.
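Double hashing probes slot (h1 + i·h2) mod m for i = 0, 1, 2, and so on. As a result, keys that share a starting slot usually follow different probe sequences. One standard way to make every sequence cover the whole table is to use a power-of-2 capacity and force h2 to be odd, so that h2 and m are relatively prime. Below is a minimal sketch. The method names probe and slotsVisited, and passing h1 and h2 in directly, are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class DoubleHashing {
    /**
     * Slot for the i-th probe: (h1 + i * h2) mod m.
     * Assumes m is a power of 2 and h2 is odd, so the sequence visits all m slots.
     */
    static int probe(int h1, int h2, int i, int m) {
        return (h1 + i * h2) & (m - 1); // & (m - 1) is mod m for a power-of-2 m
    }

    /** Number of distinct slots visited by the first m probes. */
    static int slotsVisited(int h1, int h2, int m) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < m; i++) {
            seen.add(probe(h1, h2, i, m));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited(5, 3, 16)); // odd step: prints 16
        System.out.println(slotsVisited(5, 4, 16)); // even step shares a factor with 16: prints 4
    }
}
```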
//----------
Now let's look at how java.util.HashMap is implemented, to understand hash tables better.
First, put:
Java code

    public V put(K key, V value) {
        if (key == null)                        // null can also be a key
            return putForNullKey(value);
        int hash = hash(key.hashCode());        // <-- of interest
        int i = indexFor(hash, table.length);   // <-- of interest
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);          // <-- of interest
        return null;
    }

In this code, hash, indexFor, and addEntry are the parts we care about.
Note also that HashMap allows a null key.
There is also this if statement:

Java code

    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {

It first checks whether the hash values are equal, and only then calls equals.
This also explains the rule for overriding equals and hashCode: if you override equals, you must also override hashCode, because two objects that are equal must have equal hash codes. Otherwise they will not work correctly in containers such as HashMap. See Effective Java.
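As a small illustration of that rule, here is a value class that overrides both methods consistently, using java.util.Objects.hash for the hash code. The class Point is a hypothetical example. Because equal points have equal hash codes, a HashMap lookup with a different but equal instance finds the entry. If hashCode were left as the default identity-based one, that lookup would almost always miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // equal points => equal hash codes
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "A");
        System.out.println(map.get(new Point(1, 2))); // a different but equal instance: prints A
    }
}
```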
Let's take a look at hash and indexFor (the highlighting in the comment is mine):
Java code

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * <strong>because HashMap uses power-of-two length hash tables</strong>, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

hash derives a better-mixed hash value from the original hashCode. This step is necessary because the table capacity is always a power of 2. Without it, the high bits of the hash code would be wasted by the modulo operation (see the division method above).
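The effect is easy to observe. The sketch below copies the hash function quoted above, which comes from older versions of HashMap; Java 8 replaced it with a simpler spreading step. It feeds the function hash codes that differ only in bits 20 to 23. When those hash codes are masked directly with length - 1 = 15, they all land in bucket 0. After the supplemental hash, they spread across all 16 buckets. The class name SupplementalHashDemo is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

final class SupplementalHashDemo {
    // Same body as the HashMap.hash(int) quoted above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Distinct buckets (out of 16) for hash codes i << 20, i = 0..15, with or without the supplemental hash. */
    static int distinctBuckets(boolean supplemental) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 20; // only bits 20..23 vary
            buckets.add(indexFor(supplemental ? hash(h) : h, 16));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(false)); // prints 1
        System.out.println(distinctBuckets(true));  // prints 16
    }
}
```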
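indexFor computes h & (length - 1). When length is a power of 2, this equals h modulo length, but it is cheaper to compute than %.
So HashMap uses an improved version of the division method.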
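The claim that h & (length - 1) equals h modulo length can be checked directly. For a power-of-2 length, the result matches Math.floorMod(h, length), the non-negative remainder. This holds even for negative h, where Java's % operator would return a negative value. The class name IndexForCheck in this quick sketch is illustrative.

```java
final class IndexForCheck {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** True if indexFor agrees with Math.floorMod for every h in [from, to] and the given power-of-2 length. */
    static boolean agrees(int from, int to, int length) {
        for (int h = from; h <= to; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees(-100_000, 100_000, 16)); // prints true
        System.out.println(-7 % 16);                      // prints -7: plain % keeps the sign
        System.out.println(indexFor(-7, 16));             // prints 9
    }
}
```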
Finally, let's look at addEntry:
Java code

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e); // prepend to the chain
        if (size++ >= threshold)
            resize(2 * table.length);
    }

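Like our chained put, addEntry prepends the new entry to the bucket's chain. Once the size reaches the threshold, the table's capacity is also doubled.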

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
