How HashMap resolves hash conflicts: methods and ideas

Source: Internet
Author: User

When I was asked about hash functions recently, I drew a blank for a moment. After reviewing the topic, I concluded that the difficulty comes down to two things: designing the hash() algorithm and resolving conflicts (collisions). The article below explains this clearly, and from it I also learned how Java's HashMap resolves conflicts.

Text:

In the Java programming language there are two basic structures: the array and the reference (a simulated pointer). All data structures can be built from these two, and HashMap is no exception. Take the following snippet, which puts multiple key-value pairs into a HashMap, as an example:

HashMap<String, String> m = new HashMap<String, String>();
m.put("a", "rrr1");
m.put("b", "tt9");
m.put("c", "tt8");
m.put("d", "g7");
m.put("e", "d6");
m.put("f", "d4");
m.put("g", "d4");
m.put("h", "d3");
m.put("i", "d2");
m.put("j", "d1");
m.put("k", "1");
m.put("o", "2");
m.put("p", "3");
m.put("q", "4");
m.put("r", "5");
m.put("s", "6");
m.put("t", "7");
m.put("u", "8");
m.put("v", "9");

HashMap uses a so-called "hash algorithm" to determine where each element is stored. When the program calls map.put(String, Object), the system calls the hashCode() method of the String key to obtain its hashCode value. Every Java object has a hashCode() method through which this value can be obtained. The system then determines the storage location of the element from that hashCode value. The source code (from an older JDK release; later versions restructure this method) is as follows:

 

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // Check whether the bucket at this index already holds an entry with the
        // same hash and an equal key. If it does, the new value overwrites the
        // old one and the old value is returned.
        // Entries with the same hash always land at the same index. If their keys
        // differ, a hash conflict has occurred.
        // After a conflict, a single bucket holds not one Entry but a chain of
        // entries, and the system has to walk the chain in order until it finds
        // the entry it wants. If that entry sits at the end of the chain (it was
        // the first one placed in the bucket), the loop must run to the end.
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
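The overwrite-and-return-old-value behavior of put() can be observed from the outside with the public java.util.HashMap API. The following is a minimal sketch; the class name PutReturnDemo and the helper putTwice are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Puts the same key twice and reports: the value returned by each put(),
    // the value finally stored, and the map size.
    static String[] putTwice(String key, String first, String second) {
        Map<String, String> m = new HashMap<>();
        String r1 = m.put(key, first);   // key is new: nothing to return
        String r2 = m.put(key, second);  // key exists: old value is returned
        return new String[] { r1, r2, m.get(key), String.valueOf(m.size()) };
    }

    public static void main(String[] args) {
        for (String s : putTwice("a", "rrr1", "new")) {
            System.out.println(s);
        }
    }
}
```

The first call returns null because no entry with that key existed, the second returns the value it replaced, and the map still contains a single entry.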

 

The code above uses an important nested interface, Map.Entry. Each Map.Entry is a key-value pair. As the code shows, when HashMap decides where to store a key-value pair, it does not consider the value at all; it computes the storage location of each Entry from the key alone. In other words, the value in a Map can be regarded as an appendage of the key: once the system has determined where the key is stored, the value is simply saved alongside it.

I modified the HashMap program to provoke hash conflicts deliberately: the initial capacity of HashMap is 16, but I put more than 16 elements into the map and disabled its resize() method so that the table could not grow. With more elements than buckets, some slots of the underlying Entry[] table must then hold more than one entry.
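To make the "location depends on the key only" point concrete, here is a rough sketch that maps keys to bucket indexes in a fixed table of 16 slots and counts how many keys land in each slot. The hash() and indexFor() helpers are modeled on older JDK releases, and their exact bit shifts should be treated as an assumption; the class name BucketIndexDemo and the 19 lowercase single-letter keys are chosen for this illustration.

```java
class BucketIndexDemo {
    // Supplemental hash that spreads the higher bits downward
    // (modeled on older JDK releases; an assumption, not a quotation).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Maps a hash to a slot; length must be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // The bucket is derived from the key alone; the value plays no part.
    static int bucketOf(String key, int capacity) {
        return indexFor(hash(key.hashCode()), capacity);
    }

    // Counts how many keys fall into each bucket of a table that never grows.
    static int[] chainLengths(String[] keys, int capacity) {
        int[] lengths = new int[capacity];
        for (String key : keys) {
            lengths[bucketOf(key, capacity)]++;
        }
        return lengths;
    }

    public static void main(String[] args) {
        String[] keys = "a b c d e f g h i j k o p q r s t u v".split(" ");
        int[] lengths = chainLengths(keys, 16);
        for (int i = 0; i < lengths.length; i++) {
            System.out.println("bucket " + i + ": " + lengths[i] + " entries");
        }
    }
}
```

Since 19 keys are distributed over 16 slots, at least one bucket must report more than one entry, whatever the hash function does.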

Each bucket in HashMap takes the form of a singly linked list. One problem every hash table has to solve is conflicting hash values, and there are usually two approaches: the linked-list method (separate chaining) and the open addressing method. The linked-list method organizes objects with the same hash value into a linked list placed in the slot for that hash value. Open addressing uses a probing algorithm: when a slot is occupied, it keeps looking for the next usable slot. java.util.HashMap uses the linked-list method, and the list is singly linked. The core code that builds the list is as follows:

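void addEntry(int hash, K key, V value, int bucketIndex) {
    // Remember the current head of the bucket (null if the bucket is empty).
    Entry<K,V> e = table[bucketIndex];
    // The new entry becomes the head and points to the previous head.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}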

The method is very simple, but it embodies a design decision: the system always places the newly added Entry at index bucketIndex of the table array. If an Entry already exists at bucketIndex, the new Entry points to the existing one, which produces an Entry chain. If there is no Entry at bucketIndex, the variable e in the code above is null, so the new Entry points to null and no chain is formed.

When a HashMap has no hash conflicts and no linked list has formed, looking up an element is very fast: get() can locate the element directly. Once a linked list appears, however, a single bucket holds not one Entry but an Entry chain, and the system has to walk the entries in order until it finds the one it wants. If that Entry sits at the end of the chain (it was the first one placed in the bucket), the system must loop all the way to the end to find it.
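The head insertion and the resulting lookup cost can be sketched for a single bucket as follows. This is an illustration only, not the JDK code; the class ChainDemo and its methods add and stepsToFind are hypothetical names.

```java
class ChainDemo {
    static final class Entry {
        final String key;
        final String value;
        final Entry next;

        Entry(String key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Entry head; // the single bucket modeled here

    // Head insertion: the new entry points at the previous head.
    void add(String key, String value) {
        head = new Entry(key, value, head);
    }

    // Number of entries visited until the key is found, or -1 if it is absent.
    int stepsToFind(String key) {
        int steps = 0;
        for (Entry e = head; e != null; e = e.next) {
            steps++;
            if (e.key.equals(key)) {
                return steps;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChainDemo bucket = new ChainDemo();
        bucket.add("first", "1");
        bucket.add("second", "2");
        bucket.add("third", "3");
        System.out.println(bucket.stepsToFind("third")); // 1: newest entry is the head
        System.out.println(bucket.stepsToFind("first")); // 3: oldest entry is the tail
    }
}
```

The entry added first ends up at the tail, so finding it costs as many steps as the chain is long.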

A HashMap is created with a load factor, which defaults to 0.75. This value is a compromise between time and space costs. Increasing the load factor reduces the memory occupied by the hash table (that is, the Entry array) but increases the time cost of lookups, and lookup is the most frequent operation (both get() and put() need it). Decreasing the load factor improves lookup performance but increases the memory the hash table occupies.
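The arithmetic behind the load factor can be sketched as below, assuming the resize threshold is simply capacity multiplied by load factor and that the table doubles when it grows, as in the addEntry() code shown earlier. The class ThresholdDemo and both of its methods are hypothetical helpers, not JDK API.

```java
class ThresholdDemo {
    // Entry count at which the table is considered full enough to grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Capacity reached after doubling until the given number of entries
    // no longer exceeds the threshold (a simplified model of resizing).
    static int capacityFor(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        while (entries > threshold(capacity, loadFactor)) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));       // 12
        System.out.println(capacityFor(19, 16, 0.75f)); // 32
        System.out.println(capacityFor(19, 16, 1.5f));  // 16
    }
}
```

With the default 0.75, a table of 16 slots grows once it holds more than 12 entries, so 19 entries end up in 32 slots. A load factor of 1.5 keeps the same 19 entries in 16 slots, which saves memory but guarantees chains.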

Thinking: writing a simple example

#include <iostream>

#define LISTSIZE 3 // size of the hash list
#define TESTSIZE 8 // number of nodes added during testing; if greater than LISTSIZE, conflicts must occur

using namespace std;

struct Node {
    int key;
    int value;
    Node *next = NULL;
    // int times = 0;
};

class Hashtest {
public:
    Hashtest();
    // ~Hashtest();
    int hash(int key);
    void addNode(Node &p);
    Node *findNode(int key);
private:
    Node *hashlist[LISTSIZE];
};

Hashtest::Hashtest() {
    // The loop body is truncated in the source; clearing every slot is the assumed intent.
    for (int i = 0; i < LISTSIZE; i++) {
        hashlist[i] = NULL;
    }
}

// No destructor: for convenience the test in main creates the nodes in an array
// on the stack instead of with new Node() on the heap, so there is nothing to delete.

// A simple hash function.
int Hashtest::hash(int key) {
    int pos = key % LISTSIZE;
    return pos;
}

// Add a data item p (key, value): use key and hash() to locate the head pointer
// of the list where p belongs, then append p at the tail.
void Hashtest::addNode(Node &p) {
    int pos = this->hash(p.key);
    Node *head = hashlist[pos];
    if (head == NULL) {
        hashlist[pos] = &p;
        return;
    }
    while (head->next != NULL) {
        head = head->next;
    }
    head->next = &p;
}

// Find a data item: locate the slot in hashlist first, then search the linked list.
Node *Hashtest::findNode(int itemkey) {
    int pos = this->hash(itemkey);
    Node *head = hashlist[pos];
    // Test for NULL before dereferencing head.
    while (head != NULL && head->key != itemkey) {
        head = head->next;
    }
    return head;
}

// The storage structure is in effect a two-dimensional array that is half static, half dynamic.

int main() {
    Hashtest ht;
    Node nodelist[TESTSIZE];
    // The rest of main is truncated in the source; the loops below are a
    // reconstruction consistent with the output described after the code.
    for (int i = 0; i < TESTSIZE; i++) {
        nodelist[i].key = i;
        nodelist[i].value = i;
        ht.addNode(nodelist[i]);
    }
    for (int i = 0; i < TESTSIZE; i++) {
        cout << ht.findNode(i)->value << endl;
    }
    return 0;
}

The output is 0 to (TESTSIZE - 1), as expected. The logic is fine, but the efficiency question can only be assessed in theory.
        

On lookup time: hash() finds hashlist[i] in O(1), so the cost lies mainly in searching the linked list. If the node is at the head of the list, that is cheap; if it is at the tail, the whole list has to be traversed.

Could a times field be added to Node to record how often it has been accessed, at the cost of some extra space? During maintenance the linked list would be re-sorted at some interval so that frequently accessed nodes sit at the head. The open question is whether a node that was accessed often in the past can be expected to stay popular in the future.

This reminded me of the LRU (Least Recently Used) algorithm from cache eviction policies. Is there a corresponding "most recently used first" algorithm? The idea is to add a step to findNode(): each time a Node is queried, it is probably active and more likely to be queried again, so move it to the front of the linked list. Because moving a node within a linked list is very cheap, this may save a full traversal on the next query, or on the next n queries.
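The idea described here is commonly known as the move-to-front heuristic for self-organizing lists. A minimal sketch for one bucket's chain follows, written in Java rather than C++; the class MoveToFrontChain and its methods addLast, find and positionOf are hypothetical names, and whether the heuristic pays off depends on the access pattern.

```java
class MoveToFrontChain {
    static final class Node {
        final int key;
        final int value;
        Node next;

        Node(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    Node head;

    // Appends at the tail, like addNode() in the example above.
    void addLast(int key, int value) {
        Node n = new Node(key, value);
        if (head == null) {
            head = n;
            return;
        }
        Node cur = head;
        while (cur.next != null) {
            cur = cur.next;
        }
        cur.next = n;
    }

    // Finds the node and, if it is not already the head, relinks it at the front.
    Node find(int key) {
        Node prev = null;
        for (Node cur = head; cur != null; prev = cur, cur = cur.next) {
            if (cur.key == key) {
                if (prev != null) {
                    prev.next = cur.next; // unlink from its current position
                    cur.next = head;      // relink in front of the old head
                    head = cur;
                }
                return cur;
            }
        }
        return null;
    }

    // 1-based position of the key in the chain, without reordering; -1 if absent.
    int positionOf(int key) {
        int pos = 1;
        for (Node cur = head; cur != null; cur = cur.next, pos++) {
            if (cur.key == key) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        MoveToFrontChain chain = new MoveToFrontChain();
        chain.addLast(0, 0);
        chain.addLast(3, 3);
        chain.addLast(6, 6); // 0, 3 and 6 share a bucket when LISTSIZE is 3
        System.out.println(chain.positionOf(6)); // 3
        chain.find(6);
        System.out.println(chain.positionOf(6)); // 1
    }
}
```

Relinking touches only three references, which is why the move itself is cheap compared with re-sorting the chain by access count.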

I have never worked on massive data processing, so I have not had to worry about this in practice.
