Learning notes: Hashtable and HashMap


After reviewing these fundamentals over the past few days, I realized how shaky my grasp of them still was; no wonder I had been confused. Still, I wonder whether this kind of review is worth much: will I have forgotten it all again in a while? In day-to-day work, this sort of knowledge tends to be picked up only when it is needed. It is not that I never paid attention to the basics, but that I forget them once the immediate need has passed. That is why writing notes is a good habit. A concept read once is hard to remember, whereas organizing it into a written document makes it much easier to retain and to look up later.

Why use a hash table? This reminds me of something from a previous job. Many years ago I was writing Delphi, and the software did a lot of batch data processing: the data had to be pulled into memory, and then several data sets were traversed to search and compare records. With a large amount of data this was very slow, memory errors occurred frequently, and we could never find the cause. Later, an experienced programmer joined the team and proposed solving the problem with a hash table. After testing, performance improved greatly.

Here is a simple analysis. Our data objects were located by comparing their primary key field, a string of length 40. To find one record in a data set, we had to traverse it and check whether each primary key matched. This had two problems:

1. String-to-string comparison. With a large data volume this is expensive, since each comparison may examine up to 40 characters.
2. The data was stored in an in-memory linked list, so locating a record meant searching through the list.

How can these two problems be solved inside loops that process large amounts of data?

1. Make primary key comparisons cheaper, for example by comparing an integer instead, which is only 4 bytes.
2. Use a better algorithm to search for, or directly locate, a record.

A hash table meets both requirements. Its storage structure is an array plus linked lists. First, the data is stored in an array, and indexing into an array is very fast. Second, the key is hashed to an int, so most candidates are ruled out by a cheap integer comparison; a full string comparison is only needed for the few entries that land in the same slot.

Once you see the benefits, you have the motivation to keep learning, step by step.

What is a hash table? The name should be familiar to everyone. Let's look at the definition.
A hash table (also called a hash map) is a data structure that accesses records directly by their key. That is, it applies a function to the key to map it to a position in a table and accesses the record stored there, which speeds up lookup. This mapping function is called a hash function, and the array that stores the records is called a hash table.
To understand it more concretely, we need to learn more about the concept of hashing. Let's continue with Wikipedia's explanation:
 
 
  • If the key is $k$, its value is stored at location $f(k)$. Therefore, the record being looked up can be retrieved directly, without any comparisons. The correspondence $f$ is called a hash function, and a table built on this idea is called a hash table.
  • Different keys may map to the same hash address, that is, $k_1 \neq k_2$ but $f(k_1) = f(k_2)$; this phenomenon is called a collision. Keys that share the same function value are called synonyms with respect to that hash function. In summary, a hash function $f(k)$ together with a collision-handling method maps a set of keys onto a finite, contiguous set of addresses (a range), and each key's "image" in that address set is used as the record's storage location in the table. Such a table is called a hash table, the mapping process is called hashing, and the resulting storage location is called the hash address.
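A tiny sketch of these definitions, assuming the illustrative hash function f(k) = k mod 7 over small non-negative integer keys (the class name DirectHash is hypothetical). Storing and finding a record means jumping straight to slot f(k), and keys 3 and 10 are synonyms that collide in slot 3.

```java
public class DirectHash {
    // Illustrative table size; f(k) = k mod M is an assumption for this sketch.
    static final int M = 7;

    static int f(int k) {
        return k % M;   // keys are assumed non-negative
    }

    public static void main(String[] args) {
        String[] table = new String[M];   // records are stored at index f(k)
        table[f(3)] = "record for key 3";
        table[f(15)] = "record for key 15";

        // Lookup needs no key comparison: go straight to slot f(k).
        System.out.println(table[f(15)]);   // record for key 15

        // 3 != 10 but f(3) == f(10) == 3: a collision, so 3 and 10 are synonyms.
        System.out.println(f(3) == f(10));  // true

        // Without a collision-handling method, storing key 10 overwrites key 3's record.
        table[f(10)] = "record for key 10";
        System.out.println(table[f(3)]);    // record for key 10
    }
}
```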
Several important concepts are introduced here: keys, hash functions, and collisions. The explanation above is already clear, and none of it is hard to understand. There are many ways to implement a hash function.

A hash function makes access to a sequence of data faster and more efficient, because it lets data elements be located directly.
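As a small illustration of turning a string key into an array index, the sketch below computes the polynomial hash that the Java documentation specifies for String.hashCode() (s[0]*31^(n-1) + ... + s[n-1], in int arithmetic) and then reduces it with the division (remainder) method. The names polyHash and slot, and the table size 11, are illustrative assumptions.

```java
public class StringHashIndex {
    // Polynomial hash, the formula documented for String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // int overflow wraps, as in String.hashCode()
        }
        return h;
    }

    // Division (remainder) method: clear the sign bit, then reduce to [0, tableSize).
    static int slot(String key, int tableSize) {
        return (polyHash(key) & 0x7FFFFFFF) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));                       // 96354
        System.out.println(polyHash("abc") == "abc".hashCode());   // true
        System.out.println(slot("abc", 11));                       // 5
    }
}
```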

Put plainly, every method boils down to using a number, either taken directly from the key or computed from it, as the array subscript, that is, as the storage address, so that storing and retrieving can go straight to the right slot: simple and efficient. Every method shares one problem: different keys may produce the same hash result, which is the collision problem. There are several ways to resolve it:
  • Open addressing: if a collision occurs, keep probing for the next empty hash address.
  • Separate chaining: store the colliding records in a linked list at the colliding location.
  • Rehashing: when a hash computation collides, compute a new address with another hash function, repeating until no collision occurs.
  • Public overflow area: create a basic table and an overflow table, and place colliding records in the overflow table.
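Open addressing keeps every record inside the array itself instead of chaining. A minimal sketch of its simplest form, linear probing, might look like this; the class name LinearProbingTable is hypothetical, and the sketch assumes a fixed capacity with no deletion or resizing. The keys "Aa" and "BB" are used because both have hashCode 2112, so they collide.

```java
public class LinearProbingTable {
    // Fixed-capacity open-addressing table; no deletion or resizing in this sketch.
    private final String[] keys;
    private final String[] values;

    LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Home slot of a key: non-negative hash modulo the capacity.
    private int home(String key) {
        return (key.hashCode() & 0x7FFFFFFF) % keys.length;
    }

    void put(String key, String value) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(key)) {
                keys[i] = key;          // empty slot or same key: store here
                values[i] = value;
                return;
            }
            i = (i + 1) % keys.length;  // collision: try the next slot
        }
        throw new IllegalStateException("table is full");
    }

    String get(String key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                return null;            // an empty slot ends the probe sequence
            }
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(8);
        t.put("Aa", "first");
        t.put("BB", "second");   // same home slot as "Aa", so it moves to the next slot
        System.out.println(t.get("Aa") + " " + t.get("BB"));  // first second
    }
}
```

Because nothing is ever deleted here, stopping at the first empty slot in get is safe; supporting removal would need tombstone markers.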
Reference article: The mechanism of External Table.

By now we have a fair understanding of what a hash table does and why it helps, which gives us the motivation to keep studying it. The Delphi hash table class mentioned above also stores data as an array plus linked lists, but its source code cannot be found, so let's study Java's Hashtable class instead.
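The benefit described in the Delphi story can be made concrete in Java. This sketch uses a hypothetical Record class, makeKey helper and 100,000-row data set (none of them from the original project): it counts the string comparisons a linear scan needs to find the last record, then looks the same key up in a java.util.HashMap, which hashes the key once and only calls equals on entries in the matching bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinearVsHash {
    // Hypothetical record type standing in for the in-memory data objects.
    static final class Record {
        final String key;      // 40-character string primary key
        final String payload;

        Record(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Number of String.equals calls made by the linear scan.
    static int comparisons = 0;

    // Sequential search: compare the target against every key until a match.
    static Record linearFind(List<Record> records, String key) {
        for (Record r : records) {
            comparisons++;
            if (r.key.equals(key)) {
                return r;
            }
        }
        return null;
    }

    // Hypothetical key format: the index zero-padded to 40 characters,
    // so keys share long prefixes and each equals call examines many characters.
    static String makeKey(int i) {
        return String.format("%040d", i);
    }

    public static void main(String[] args) {
        int n = 100_000;
        List<Record> list = new ArrayList<>();
        Map<String, Record> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Record r = new Record(makeKey(i), "row " + i);
            list.add(r);
            index.put(r.key, r);
        }
        String target = makeKey(n - 1);

        Record viaScan = linearFind(list, target);
        // prints "row 99999 found after 100000 comparisons"
        System.out.println(viaScan.payload + " found after " + comparisons + " comparisons");

        // HashMap hashes the key to an int, jumps to one bucket, and calls equals only there.
        Record viaHash = index.get(target);
        System.out.println(viaHash.payload + " found via hash lookup");
    }
}
```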
  • Storage Structure
Java's Hashtable class in the JDK uses an array plus singly linked lists as its storage structure. With the concepts above, this is easy to understand: the array is the contiguous address space that stores the data, and the linked lists resolve collisions. This combines the efficient addressing of an array with linked-list storage for colliding entries. First, the array:
    /**
     * The hash table data.
     */
    private transient Entry[] table;

Next, the linked-list entry class:

    /**
     * Hashtable collision list.
     */
    private static class Entry<K,V> implements Map.Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next;

        protected Entry(int hash, K key, V value, Entry<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        protected Object clone() {
            return new Entry<K,V>(hash, key, value,
                    (next==null ? null : (Entry<K,V>) next.clone()));
        }

        // Map.Entry Ops

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public V setValue(V value) {
            if (value == null)
                throw new NullPointerException();

            V oldValue = this.value;
            this.value = value;
            return oldValue;
        }

        public boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;

            return (key==null ? e.getKey()==null : key.equals(e.getKey())) &&
                   (value==null ? e.getValue()==null : value.equals(e.getValue()));
        }

        public int hashCode() {
            return hash ^ (value==null ? 0 : value.hashCode());
        }

        public String toString() {
            return key.toString()+"="+value.toString();
        }
    }

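It is actually quite simple: the table is an array of Entry objects. Each Entry holds one key-value pair together with the key's hash, and its next field links it to the next Entry in the same bucket, which provides the linked list.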
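These Entry objects are what java.util.Hashtable hands out through entrySet(). A short sketch (EntryDemo is a hypothetical name) checks the Map.Entry behavior shown in the class above: hashCode combines the key's hash and the value's hash with XOR, equals compares key and value, and setValue writes through to the table but rejects null.

```java
import java.util.AbstractMap;
import java.util.Hashtable;
import java.util.Map;

public class EntryDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);

        // Each element of entrySet() is a Map.Entry for one key-value pair in the table.
        Map.Entry<String, Integer> e = table.entrySet().iterator().next();
        System.out.println(e.getKey() + "=" + e.getValue());   // a=1

        // Map.Entry contract: hashCode is keyHash ^ valueHash; equals compares key and value.
        System.out.println(e.hashCode() == ("a".hashCode() ^ Integer.valueOf(1).hashCode()));  // true
        System.out.println(e.equals(new AbstractMap.SimpleEntry<>("a", 1)));                  // true

        // setValue writes through to the table; like put, it rejects null values.
        e.setValue(2);
        System.out.println(table.get("a"));   // 2
        try {
            e.setValue(null);
        } catch (NullPointerException ex) {
            System.out.println("null value rejected");
        }
    }
}
```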

 

  • Data Access
With the data structures covered, let's open the Hashtable source and see how it works. The key parts are the code that stores and retrieves data. First, put:
    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        modCount++;
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash();

            tab = table;
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry.
        Entry<K,V> e = tab[index];
        tab[index] = new Entry<K,V>(hash, key, value, e);
        count++;
        return null;
    }
  • Note the synchronized keyword: every public method of Hashtable is synchronized, so one Hashtable can be shared between threads without extra locking, at the cost of acquiring a lock on every call.
  • Note: if value is null, a NullPointerException is thrown. A null key also fails, because key.hashCode() is called on it.
  • Data is stored as a key-value pair. The hash is computed from the key, and it determines the position in the array.
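The null rules can be checked directly. This sketch (NullPolicy and rejects are hypothetical names) contrasts Hashtable with java.util.HashMap, which, as documented, permits a null key and null values.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullPolicy {
    // Returns true if map.put(key, value) throws NullPointerException.
    static boolean rejects(Map<String, String> map, String key, String value) {
        try {
            map.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hashtable: explicit null check on value; key.hashCode() fails on a null key.
        System.out.println(rejects(new Hashtable<String, String>(), "k", null));   // true
        System.out.println(rejects(new Hashtable<String, String>(), null, "v"));   // true
        // HashMap, for comparison, accepts a null key and null values.
        System.out.println(rejects(new HashMap<String, String>(), "k", null));     // false
        System.out.println(rejects(new HashMap<String, String>(), null, "v"));     // false
    }
}
```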

Let's look at the key code:

    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;

First get the key's hashCode, then clear the sign bit with & 0x7FFFFFFF so the value is non-negative, and finally take the remainder modulo the array length to get the array index. This gives a simple and efficient way to find the storage location.
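Why & 0x7FFFFFFF and not simply hash % tab.length or Math.abs(hash)? In this sketch (IndexDemo is a hypothetical name; the table length 11 is arbitrary), Java's % keeps the sign of a negative hash, and Math.abs(Integer.MIN_VALUE) overflows back to a negative number, while masking the sign bit always yields a valid index.

```java
public class IndexDemo {
    // Same index formula as Hashtable.put/get: clear the sign bit, then take the remainder.
    static int index(int hash, int length) {
        return (hash & 0x7FFFFFFF) % length;
    }

    public static void main(String[] args) {
        int length = 11;
        // Java's % keeps the sign of the dividend, so a negative hash gives a negative index.
        System.out.println(-7 % length);                           // -7
        System.out.println(index(-7, length));                     // 6
        // Math.abs is not a safe substitute: Math.abs(Integer.MIN_VALUE) is still negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) % length);  // -2
        System.out.println(index(Integer.MIN_VALUE, length));      // 0
    }
}
```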

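  • Next, put walks the linked list in that bucket to check whether an entry with the same key already exists (same hash and equals returns true). If so, it replaces the value and returns the old one. Otherwise, if the entry count has reached the threshold, it rehashes (grows) the table and recomputes the index. Finally, it creates a new Entry to hold the data and places it at the head of the bucket; if the bucket already holds colliding entries, the new Entry's next points to them, so the collision is resolved by the linked list.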
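The steps of put can be reproduced in a stripped-down, unsynchronized sketch. MiniHashtable is a hypothetical class, not the JDK source; the load factor 0.75 and the growth rule "old length × 2 + 1" are assumed to mirror Hashtable's defaults, and modCount, iteration and removal are left out.

```java
import java.util.Objects;

public class MiniHashtable<K, V> {
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;   // next entry in the same bucket

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    private Entry<K, V>[] table;
    private int count;
    private int threshold;

    @SuppressWarnings("unchecked")
    public MiniHashtable(int capacity) {
        table = (Entry<K, V>[]) new Entry[capacity];
        threshold = (int) (capacity * LOAD_FACTOR);
    }

    public V put(K key, V value) {
        Objects.requireNonNull(value);            // like Hashtable: no null values
        int hash = key.hashCode();                // a null key throws NullPointerException
        int index = (hash & 0x7FFFFFFF) % table.length;
        // 1. Same key already in the bucket? Replace the value and return the old one.
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // 2. Threshold reached? Grow the array and recompute the index.
        if (count >= threshold) {
            rehash();
            index = (hash & 0x7FFFFFFF) % table.length;
        }
        // 3. Insert at the head of the bucket; colliding entries stay linked via next.
        table[index] = new Entry<>(hash, key, value, table[index]);
        count++;
        return null;
    }

    public V get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void rehash() {
        Entry<K, V>[] old = table;
        Entry<K, V>[] grown = (Entry<K, V>[]) new Entry[old.length * 2 + 1];
        for (Entry<K, V> head : old) {
            for (Entry<K, V> e = head; e != null; ) {
                Entry<K, V> next = e.next;
                int i = (e.hash & 0x7FFFFFFF) % grown.length;   // re-bucket by stored hash
                e.next = grown[i];
                grown[i] = e;
                e = next;
            }
        }
        table = grown;
        threshold = (int) (grown.length * LOAD_FACTOR);
    }

    public int capacity() {
        return table.length;
    }

    public static void main(String[] args) {
        MiniHashtable<String, Integer> t = new MiniHashtable<>(4);   // threshold 3
        t.put("Aa", 1);
        t.put("BB", 2);                        // same hashCode as "Aa": chained in one bucket
        System.out.println(t.put("Aa", 10));   // 1 (old value returned)
        t.put("C", 3);
        t.put("D", 4);                         // count has reached 3, so the table grows to 9
        System.out.println(t.get("Aa") + " " + t.get("BB") + " " + t.capacity());  // 10 2 9
    }
}
```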

 

Data is retrieved with get:

    public synchronized V get(Object key) {
        Entry tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

The array index is computed the same way as in put. This way of locating data is very efficient, because the index only needs to be computed once. Of course, if there are collisions, get has to walk the bucket's linked list, comparing each entry's hash and key until it finds the matching item.
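To see the collision path of get in action, the sketch below uses a hypothetical key class, BadKey, whose hashCode() always returns 42, so every entry lands in the same bucket of a real java.util.Hashtable. Lookups still return the right value because get compares the hash and then calls equals on each entry in the chain.

```java
import java.util.Hashtable;

public class CollisionLookup {
    // Hypothetical key type whose hashCode always collides.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;   // every key maps to the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Hashtable<BadKey, Integer> table = new Hashtable<>();
        for (int i = 0; i < 5; i++) {
            table.put(new BadKey("k" + i), i);
        }
        // All five keys share hash 42, so get() walks the chain and uses equals() to decide.
        System.out.println(table.get(new BadKey("k3")));   // 3
        System.out.println(table.get(new BadKey("k9")));   // null
    }
}
```

Correctness survives, but every lookup here degrades to a linear scan of one chain, which is why a well-spread hashCode matters.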

Now let's look at HashMap. Its implementation is basically the same as Hashtable's, and its storage structure is similar, but there are some minor differences:
