Study notes: Java hash tables

Source: Internet
Author: User

A hash table (also known as a hash list) is a table that stores key-value pairs. It is not simply called a map (the general name for key-value storage) because of one specific feature: it maps a key directly to a position in the table, which makes access very fast. The mapping function is called the hash function.

1) For a key, f(key) is its storage location, and f is the hash function.

2) If key1 != key2 but f(key1) == f(key2), this is called a collision. Collisions are unavoidable, because the set of possible keys is unbounded while the table capacity is always limited (see the study questions at the end of the article). What we aim for is that every key is equally likely to be mapped to any address in the hash table; a hash function with this property is called a uniform hash function.
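To make the collision idea concrete, here is a small sketch using Java's own String.hashCode(). The strings "Aa" and "BB" are different keys with the same hash code, so they land in the same slot of any table. The class name CollisionDemo and the helper slot() are made up for this illustration.

```java
// Hypothetical demo class; not part of any library.
class CollisionDemo {
    // Maps a key to a slot of a table with the given capacity.
    static int slot(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        String key1 = "Aa";
        String key2 = "BB";
        // Different keys ...
        System.out.println(key1.equals(key2));                    // false
        // ... but the same hash code, hence the same slot: a collision.
        System.out.println(key1.hashCode() == key2.hashCode());   // true
        System.out.println(slot(key1, 11) == slot(key2, 11));     // true
    }
}
```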

There are several common hash functions:
- Direct addressing: take the key itself, or a linear function of the key, as the hash address. That is, H(key) = key or H(key) = a * key + b, where a and b are constants (this kind of hash function is also called an identity function).
- Digit analysis method
- Mid-square method
- Folding method
- Random number method
- Division-remainder method: take the remainder of the key divided by a number p that is not greater than the table length m. That is, H(key) = key MOD p, p <= m. The modulo can be applied to the key directly, or after folding, mid-square, or a similar operation. The choice of p is very important: generally take a prime, or m itself. A badly chosen p easily produces synonyms (different keys with the same hash address).
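A few of the methods above can be sketched in Java as follows. This is only a rough illustration for non-negative int keys; the class name HashFunctions and all method names are invented here, and the mid-square and folding variants work on decimal digits, which is just one possible interpretation.

```java
// Hypothetical illustration of some textbook hash functions (non-negative keys assumed).
class HashFunctions {
    // Direct addressing: H(key) = a * key + b.
    static int direct(int key, int a, int b) {
        return a * key + b;
    }

    // Division-remainder method: H(key) = key MOD p, with p <= table length m.
    static int divisionRemainder(int key, int p) {
        return Math.floorMod(key, p);
    }

    // Mid-square method: square the key and keep the middle decimal digits.
    static int midSquare(int key, int digits) {
        String square = Long.toString((long) key * key);
        if (square.length() <= digits) {
            return Integer.parseInt(square);
        }
        int start = (square.length() - digits) / 2;
        return Integer.parseInt(square.substring(start, start + digits));
    }

    // Folding method: cut the key into groups of decimal digits and add them up.
    static int fold(int key, int groupDigits) {
        int base = 1;
        for (int i = 0; i < groupDigits; i++) {
            base *= 10;
        }
        int sum = 0;
        for (int rest = key; rest > 0; rest /= base) {
            sum += rest % base;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(direct(5, 2, 3));            // 13
        System.out.println(divisionRemainder(100, 7));  // 2
        System.out.println(midSquare(1234, 3));         // 1234^2 = 1522756 -> 227
        System.out.println(fold(123456789, 3));         // 789 + 456 + 123 = 1368
    }
}
```

In practice the results of mid-square or folding would still be reduced modulo the table length, as the division-remainder item notes.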

As you can imagine, when the number of entries in a table gets close to the size of the table, the probability of collisions rises sharply. The table capacity therefore needs to be enlarged once the ratio (number of entries / table capacity) reaches a certain threshold; this threshold is called the load factor.
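As a rough sketch of how a load factor turns into a resize decision, consider the helper below. The class and method names are hypothetical, and whether a real implementation grows at "reaches" or "exceeds" the threshold differs between libraries; the version here assumes "reaches".

```java
// Hypothetical helper class for illustration; not JDK code.
class LoadFactorSketch {
    // Number of entries at which the table should grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Assumption: grow as soon as the entry count reaches the threshold.
    static boolean needsResize(int size, int capacity, float loadFactor) {
        return size >= threshold(capacity, loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(11, 0.75f));        // 8
        System.out.println(threshold(16, 0.75f));        // 12
        System.out.println(needsResize(7, 11, 0.75f));   // false
        System.out.println(needsResize(8, 11, 0.75f));   // true
    }
}
```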

There are two main ways to resolve collisions:
- Separate chaining: different elements that hash to the same address are linked together in a linked list. This is also called the zipper method.
- Open addressing: if the address is already taken, look for a free slot near it. Variants include linear probing, quadratic probing, and double hashing.
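The two strategies can be compared side by side with a small sketch. It inserts int keys with a plain key MOD capacity hash, once with separate chaining and once with linear probing. Everything here (class name, method names, the choice of hash) is made up for illustration, and duplicate keys are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of the two collision resolution strategies.
class CollisionResolutionSketch {
    // Separate chaining: each slot holds a linked list of the keys hashed to it.
    static List<LinkedList<Integer>> chain(int[] keys, int capacity) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            table.add(new LinkedList<>());
        }
        for (int key : keys) {
            table.get(Math.floorMod(key, capacity)).add(key);
        }
        return table;
    }

    // Open addressing with linear probing: on a collision, try the next slot.
    // Assumes keys.length <= capacity, otherwise the probe loop would never end.
    static Integer[] linearProbe(int[] keys, int capacity) {
        Integer[] table = new Integer[capacity];
        for (int key : keys) {
            int index = Math.floorMod(key, capacity);
            while (table[index] != null) {
                index = (index + 1) % capacity;
            }
            table[index] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {5, 12, 19, 6};   // 5, 12 and 19 all map to slot 5 when capacity is 7
        System.out.println(chain(keys, 7));
        System.out.println(Arrays.toString(linearProbe(keys, 7)));
    }
}
```

With chaining, slot 5 simply holds three keys. With linear probing, 12 moves on to slot 6, 19 wraps around to slot 0, and 6 (whose home slot is now taken) ends up in slot 1, which shows how one collision can push later keys away from their home slots.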


Next, take a look at Java's Hashtable implementation.

java.util.Hashtable is essentially an array whose elements are singly linked lists of key-value entries.
    private transient Entry[] table;   // entry array

    private static class Entry<K,V> implements Map.Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next;   // next entry: each bucket is a singly linked list
        ...
    }
We can use the constructor that takes the array size and the load factor, or the default constructor. The default array size is 11 and the default load factor is 0.75.

    public Hashtable(int initialCapacity, float loadFactor) {
        ....
    }

    public Hashtable() {
        this(11, 0.75f);
    }

When the array needs to grow, the new size becomes oldCapacity * 2 + 1, which of course does not guarantee that the array size is always a prime.
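A quick way to check this claim is to follow the growth sequence starting from the default size 11. The sketch below uses made-up names (GrowthSketch, nextCapacity, isPrime) and only reproduces the oldCapacity * 2 + 1 rule described above, not the actual rehash code.

```java
// Hypothetical sketch: follow the capacity sequence 11, 23, 47, ...
class GrowthSketch {
    // Growth rule described in the text.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity * 2 + 1;
    }

    // Simple trial-division primality test.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int capacity = 11;
        for (int i = 0; i < 5; i++) {
            System.out.println(capacity + " prime? " + isPrime(capacity));
            capacity = nextCapacity(capacity);
        }
    }
}
```

The sequence is 11, 23, 47, 95, 191. The first three are primes, but 95 = 5 * 19 is not, so the table size stops being prime after the third resize.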
Now look at put, the method that inserts an element:

    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // (the rest of the method, which adds a new entry, is not shown in this excerpt)
    }
Java's Object class has several methods, one of which is hashCode(). This means every object in Java has this method, and calling it returns the object's own hash code. Hashtable takes this hash code modulo the table length to get the index, and stores colliding entries in a linked list at that position.
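The mask in (hash & 0x7FFFFFFF) % tab.length is there because hashCode() may be negative, and in Java a negative number modulo a positive one stays negative, which would not be a valid array index. The sketch below (the class name IndexSketch and the method index are invented) shows the difference for the key -7, whose Integer hash code is simply -7.

```java
// Hypothetical sketch of Hashtable-style index computation.
class IndexSketch {
    // Clear the sign bit first, then take the remainder.
    static int index(Object key, int tableLength) {
        return (key.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        Integer key = -7;
        System.out.println(key.hashCode() % 11);   // -7, not usable as an index
        System.out.println(index(key, 11));        // 6, always in 0..10
    }
}
```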

HashMap and Hashtable are functionally almost the same. However, the initial array size of HashMap is 16 instead of 11, and when the array grows its size becomes twice the original. The default load factor is also 0.75. The put method is shown below; note the changes in how the hash value and the index are computed:

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }
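To see why the supplemental hash matters for a power-of-two table, the sketch below feeds two hash codes that differ only in their upper bits through indexFor, with and without hash(). The two helper methods follow the listing quoted in this article (with the shift constants 20 and 12 in the first line of hash()); the class name HashMapIndexSketch and the sample values are made up.

```java
// Hypothetical demo of the hash()/indexFor() pair described in the text.
class HashMapIndexSketch {
    // Supplemental hash: mixes the upper bits into the lower bits.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // For a power-of-two length this keeps only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000;
        int b = 0x20000;   // differs from a only above the low 4 bits
        // Without the supplemental hash both keys fall into slot 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length));               // 0 0
        // With it, the upper bits influence the index and the keys separate.
        System.out.println(indexFor(hash(a), length) + " " + indexFor(hash(b), length));   // 1 2
    }
}
```

Because h & (length - 1) looks only at the low bits, any set of hash codes that share those bits would pile into one bucket without the extra mixing step.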
Next, look at the hash tables in other open source Java libraries.

There are several open source Java collection implementations, each with different goals and emphases. The analysis of their hash tables below looks at the following aspects: the default load factor and growth policy, the hash function, and the collision resolution method.

1. Trove: the Trove library provides a set of efficient basic collection classes.

gnu.trove.map.hash.THashMap has the inheritance chain THash -> TObjectHash -> THashMap, and it stores its keys and values internally in two arrays. It resolves collisions with open addressing. Because open addressing needs more free space, its default load factor is 0.5 instead of 0.75. Here is a step-by-step look at the code:

With default initialization the load factor is 0.5, and the array size is obtained from PrimeFinder, so it is always a prime.

    /** The load above which rehashing occurs. */
    public static final float DEFAULT_LOAD_FACTOR = 0.5f;

    protected int setUp(int initialCapacity) {
        int capacity;

        capacity = PrimeFinder.nextPrime(initialCapacity);
        computeMaxSize(capacity);
        computeNextAutoCompactionAmount(initialCapacity);

        return capacity;
    }

Then look at its put method. insertKey(T key) contains the hash algorithm: the hash code is taken modulo the array length to get the index. It first checks whether that position is occupied; if it is, the collision is resolved with double hashing, which is the code in the insertKeyRehash() method.

    public V put(K key, V value) {
        // insertKey() inserts the key if a slot is found and returns the index
        int index = insertKey(key);
        return doPut(value, index);
    }

    protected int insertKey(T key) {
        consumeFreeSlot = false;

        if (key == null)
            return insertKeyForNull();

        final int hash = hash(key) & 0x7fffffff;
        int index = hash % _set.length;
        Object cur = _set[index];

        if (cur == FREE) {
            consumeFreeSlot = true;
            _set[index] = key;   // insert value
            return index;        // empty, all done
        }

        if (cur == key || equals(key, cur)) {
            return -index - 1;   // already stored
        }

        return insertKeyRehash(key, index, hash, cur);
    }
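Since insertKeyRehash() itself is not shown, here is a generic double hashing sketch to illustrate the idea. It is not Trove's code: the class name, the choice of the second hash (1 + hash % (length - 2)) and the probing direction are assumptions made for this example. It does keep the same return convention as insertKey() above: the slot index for a new key, or -index - 1 if the key is already stored.

```java
// Hypothetical double hashing sketch; not the actual Trove implementation.
class DoubleHashingSketch {
    private final Object[] slots;

    // Assumes a prime capacity of at least 3, so every step size visits all slots.
    DoubleHashingSketch(int primeCapacity) {
        slots = new Object[primeCapacity];
    }

    // Returns the index where key was stored, or -index - 1 if it was already present.
    int insert(Object key) {
        int hash = key.hashCode() & 0x7FFFFFFF;
        int length = slots.length;
        int index = hash % length;
        // Second hash: a step in 1..length-2, never 0, so probing always moves on.
        int step = 1 + hash % (length - 2);
        for (int probes = 0; probes < length; probes++) {
            Object cur = slots[index];
            if (cur == null) {
                slots[index] = key;
                return index;
            }
            if (cur.equals(key)) {
                return -index - 1;
            }
            index = (index + step) % length;
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        DoubleHashingSketch table = new DoubleHashingSketch(11);
        System.out.println(table.insert(5));    // 5: home slot is free
        System.out.println(table.insert(16));   // 2: home slot 5 taken, step 8
        System.out.println(table.insert(27));   // 6: home slot 5 taken, step 1
        System.out.println(table.insert(16));   // -3: already stored at index 2
    }
}
```

Keys 16 and 27 collide on the same home slot but use different step sizes, so they do not follow each other along the same probe path. That is the advantage of double hashing over linear probing.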
2. Javolution: provides Java solutions for real-time, embedded, and high-performance systems.

The hash table in Javolution is javolution.util.FastMap, which internally uses a doubly linked list. Its default initial size is 16, and it doubles when it grows. The load factor is not explicitly defined, but the following statement shows that its value is 0.5:

    if (...) {   // Table more than half empty.
        map.resizeTable(_isShared);
    }

Then look at the put function. What stands out is how the index and the slot are obtained: they are derived entirely by shifting the key hash, which yields the index and also helps avoid collisions.

    private final Object put(Object key, Object value, int keyHash,
            boolean concurrent, boolean noReplace, boolean returnEntry) {
        final FastMap map = getSubMap(keyHash);
        final Entry[] entries = map._entries;   // Atomic.
        final int mask = entries.length - 1;
        int slot = -1;
        for (int i = keyHash >> map._keyShift;; i++) {
            Entry entry = entries[i & mask];
            if (entry == null) {
                slot = slot < 0 ? i & mask : slot;
                break;
            } else if (entry == Entry.NULL) {
                slot = slot < 0 ? i & mask : slot;
            } else if ((key == entry._key) || ((keyHash == entry._keyHash)
                    && (_isDirectKeyComparator ? key.equals(entry._key)
                            : _keyComparator.areEqual(key, entry._key)))) {
                if (noReplace) {
                    return returnEntry ? entry : entry._value;
                }
                Object prevValue = entry._value;
                entry._value = value;
                return returnEntry ? entry : prevValue;
            }
        }
        ...
    }
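The loop above probes slot after slot (i++) under a power-of-two mask, and it distinguishes a never-used slot (null) from a marker slot (Entry.NULL). A plausible reading is that the marker stands for a removed entry that can be reused. The simplified sketch below illustrates that pattern under this assumption. It stores keys only, ignores the _keyShift step and the sub-maps, and all of its names are invented.

```java
// Hypothetical sketch of mask-based linear probing with "removed" markers.
class TombstoneProbingSketch {
    private static final Object REMOVED = new Object();   // plays the role of a marker entry
    private final Object[] keys;

    // Assumes a power-of-two capacity so that (i & mask) is a valid index.
    TombstoneProbingSketch(int powerOfTwoCapacity) {
        keys = new Object[powerOfTwoCapacity];
    }

    // Returns the slot that holds key after the call.
    // Assumes the table always keeps at least one never-used (null) slot.
    int put(Object key) {
        int mask = keys.length - 1;
        int slot = -1;
        for (int i = key.hashCode();; i++) {
            Object cur = keys[i & mask];
            if (cur == null) {
                if (slot < 0) slot = i & mask;   // first free slot ends the search
                break;
            } else if (cur == REMOVED) {
                if (slot < 0) slot = i & mask;   // remember it, but keep looking for the key
            } else if (cur.equals(key)) {
                return i & mask;                 // key already present
            }
        }
        keys[slot] = key;
        return slot;
    }

    // Marks the key's slot as removed instead of clearing it, so probe chains stay intact.
    boolean remove(Object key) {
        int mask = keys.length - 1;
        for (int i = key.hashCode();; i++) {
            Object cur = keys[i & mask];
            if (cur == null) {
                return false;
            }
            if (cur != REMOVED && cur.equals(key)) {
                keys[i & mask] = REMOVED;
                return true;
            }
        }
    }

    public static void main(String[] args) {
        TombstoneProbingSketch table = new TombstoneProbingSketch(8);
        System.out.println(table.put(1));      // 1
        System.out.println(table.put(9));      // 2 (slot 1 is taken)
        System.out.println(table.put(17));     // 3 (slots 1 and 2 are taken)
        System.out.println(table.remove(9));   // true, slot 2 becomes a marker
        System.out.println(table.put(17));     // 3, still found past the marker
        System.out.println(table.put(25));     // 2, the marker slot is reused
    }
}
```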
