Detailed explanation of the source code of HashMap and Hashtable

Source: Internet
Author: User
Tags rehash

Hashtable has been available since JDK 1.0, so let's first look at how it works, and then explore how HashMap works and how the two differ.

 

Hashtable has the following main fields:

 

/**
 * The hash table data.
 */
private transient Entry[] table;

/**
 * The total number of entries in the hash table.
 */
private transient int count;

/**
 * The table is rehashed when its size exceeds this threshold. (The
 * value of this field is (int)(capacity * loadFactor).)
 *
 * @serial
 */
private int threshold;

 

The most important one is the table array. It is the basic data structure of the entire Hashtable! Let's take a look at this field.

private transient Entry[] table;

 

We can see that the basic data structure of Hashtable is an array of Entry objects. Entry is a static nested class of Hashtable, and each Entry is a node in a singly linked chain, so every slot of the array holds the head of a chain. Let's analyze it in detail.

 

 

private static class Entry<K,V> implements Map.Entry<K,V> {
    int hash;
    K key;
    V value;
    Entry<K,V> next;
...

...

 

Does this remind you of the data structures course at school? The Entry class defines a simple singly linked structure: it holds the hash, the key, the value, and next, a reference to the next Entry in the chain.

Here I want to emphasize: the Hashtable data structure is an array whose slots each hold a singly linked chain of entries.
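To make that shape concrete, here is a minimal sketch of an array whose slots hold singly linked chains. It is not the JDK source: the class name ChainedTableSketch, the Node class, and the addFirst and chainLength methods are hypothetical names chosen for illustration, and keys and values are fixed to String and int to keep it short.

```java
// Minimal sketch of "an array of singly linked chains"; not the JDK source.
class ChainedTableSketch {
    // One node of a chain: key, value, and a reference to the next node.
    static final class Node {
        final String key;
        final int value;
        final Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // A plain one-dimensional array; each slot holds the head of a chain, or null.
    final Node[] table = new Node[4];

    // Link a new node in front of whatever chain is already in the slot.
    void addFirst(int index, String key, int value) {
        table[index] = new Node(key, value, table[index]);
    }

    // Count the nodes in one slot by following the next references.
    int chainLength(int index) {
        int n = 0;
        for (Node e = table[index]; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ChainedTableSketch t = new ChainedTableSketch();
        t.addFirst(0, "a", 1);
        t.addFirst(0, "b", 2);
        t.addFirst(2, "c", 3);
        for (int i = 0; i < t.table.length; i++) {
            System.out.println("slot " + i + " holds " + t.chainLength(i) + " entries");
        }
    }
}
```

The array itself has only one dimension; the second "direction" comes from following next inside a slot.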

 

 

 

 

Next, let's take a look at the Hashtable constructors.

The most commonly used one is the no-argument constructor:

 

public Hashtable() {
    this(11, 0.75f);
}

 

This constructor calls another constructor.

 

public Hashtable(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: " +
                                           initialCapacity);
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal Load: " + loadFactor);

    if (initialCapacity == 0)
        initialCapacity = 1;
    this.loadFactor = loadFactor;
    table = new Entry[initialCapacity];
    threshold = (int)(initialCapacity * loadFactor);
}

 

After reading the code, we can see that this constructor initializes the table and threshold fields. table has been described in detail above. So what is threshold?

In fact, threshold has a great impact on Hashtable performance. The table is an array of fixed length, so once the number of entries stored in it grows beyond a certain point, reads and writes become slow, because many entries end up in the same singly linked chain. Each read or write may have to compare against every entry in the chain, so the longer the chain, the slower the operation.

So when the number of entries reaches threshold, Hashtable calls rehash(). rehash() grows the table to about twice its size (oldCapacity * 2 + 1 in the JDK source) and then moves every entry from the old table into the new one. This process is expensive.

threshold = (int)(initialCapacity * loadFactor);

The Hashtable class provides constructors that let the user set these values. If you roughly know how many entries a Hashtable will hold, you should specify initialCapacity yourself. This can save a lot of rehash overhead.
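As a rough illustration of that advice, the sketch below evaluates the threshold formula for the default arguments and picks an initial capacity for an expected number of entries. The class ThresholdSketch and the helper capacityFor are hypothetical names, not JDK API, and the sizing rule is only one reasonable choice.

```java
import java.util.Hashtable;

// Sketch only: ThresholdSketch and capacityFor are illustrative names, not JDK API.
class ThresholdSketch {
    // Same formula as in the constructor shown above.
    static int threshold(int initialCapacity, float loadFactor) {
        return (int) (initialCapacity * loadFactor);
    }

    // Assumed sizing rule: choose a capacity whose threshold is at least expectedEntries.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        // Default constructor arguments: capacity 11, load factor 0.75.
        System.out.println(threshold(11, 0.75f)); // prints 8

        // Pre-size a table for about 1000 entries.
        int capacity = capacityFor(1000, 0.75f);
        System.out.println(capacity); // prints 1334
        Hashtable<Integer, String> table = new Hashtable<>(capacity, 0.75f);
        for (int i = 0; i < 1000; i++) {
            table.put(i, "value " + i);
        }
        System.out.println(table.size()); // prints 1000
    }
}
```

With the default constructor the same 1000 puts would trigger several rehash() calls along the way, each of which moves every entry into a new array.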

 

Now let's take a look at the put and get operations of Hashtable.

 

 

public synchronized V get(Object key) {
    Entry tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            return e.value;
        }
    }
    return null;
}

 

First, let's look at the get method. get is the most basic method in Hashtable: it uses the key to look up the value.

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;

These two lines obtain the hash code from the key and compute the index in the table from it, that is, which slot of the array to look in.

As for why the hash is ANDed with 0x7FFFFFFF: this clears the sign bit, so the value is never negative and the remainder is always a valid array index. This is the index calculation that Hashtable uses. HashMap uses a different one, and you could also define your own. If you want to know the details of other algorithms, search for them online.
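The effect of the mask is easiest to see with a negative hash code. The following sketch isolates the index calculation from get; the class IndexSketch and the method indexFor are hypothetical names used only for this illustration.

```java
// Sketch of the index calculation used in get; indexFor is an illustrative name.
class IndexSketch {
    // Clear the sign bit, then reduce the hash to a slot in [0, tableLength).
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int hash = -7;
        // In Java the remainder takes the sign of the dividend,
        // so a negative hash would give a negative, unusable index.
        System.out.println(hash % 11); // prints -7

        // With the mask applied the result is always a valid index.
        int index = indexFor(hash, 11);
        System.out.println(index >= 0 && index < 11); // prints true
    }
}
```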

 

Now that we have the index, we can walk the singly linked chain of entries in that slot of the table array to find the value.

 

for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
    if ((e.hash == hash) && e.key.equals(key)) {
        return e.value;
    }
}

 

The for statement simply walks the singly linked chain of entries. The if statement checks whether the keys are the same. Here we run into an important topic in learning Java: the relationship between hashCode() and equals().

Everyone has learned that if hashCode() returns the same value for two objects, equals() may still return false, whereas if equals() returns true, hashCode() must return the same value. But why? The answer is in the code above!

The data structure of Hashtable is an array of singly linked chains. From hashCode() we get the hash and the index, which determine the slot of the table array that the key belongs to. However, this is not enough: the slot holds a singly linked chain that may contain one key or many, so we still have to find our key among the entries in that chain, and that requires equals(). Only the entry whose hash matches and whose key is equal to ours is the one we are looking for. After finding it, we simply return its value. If you have any questions about the Entry class, refer to the earlier explanation.
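A small experiment shows both halves of this rule at work. The strings "Aa" and "BB" are a well-known pair with the same hashCode() that are not equal, so they always land in the same slot and only equals() can tell them apart. The class CollisionSketch and its build method are hypothetical names for this sketch.

```java
import java.util.Hashtable;

// Sketch: two different keys with the same hash code stored in one Hashtable.
class CollisionSketch {
    static Hashtable<String, Integer> build() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("Aa", 1);
        table.put("BB", 2);
        return table;
    }

    public static void main(String[] args) {
        // Same hash code, so both keys map to the same slot of the table.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        // Not equal, so equals() can still distinguish them within the chain.
        System.out.println("Aa".equals("BB")); // prints false

        Hashtable<String, Integer> table = build();
        System.out.println(table.get("Aa")); // prints 1
        System.out.println(table.get("BB")); // prints 2
    }
}
```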

 

 

 

public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            V old = e.value;
            e.value = value;
            return old;
        }
    }

    modCount++;
    if (count >= threshold) {
        // Rehash the table if the threshold is exceeded
        rehash();

        tab = table;
        index = (hash & 0x7FFFFFFF) % tab.length;
    }

    // Creates the new entry.
    Entry<K,V> e = tab[index];
    tab[index] = new Entry<K,V>(hash, key, value, e);
    count++;
    return null;
}

 

Next, let's look at the put method. Once you understand get, put is easy to follow.

First, the value to be put into the Hashtable cannot be null; otherwise a NullPointerException is thrown.

Second, check whether the key is already in the Hashtable. If it is, its value is replaced with the new one and the old value is returned.

Third, check whether the number of entries has reached threshold. If it has, rehash() is called, which is expensive, as explained earlier.

 

Finally, and most importantly, we need to save the key-value pair in the Hashtable.

 

Entry<K,V> e = tab[index];
tab[index] = new Entry<K,V>(hash, key, value, e);

1. Get the Entry currently stored at that index of the table array (the head of the chain, or null if the slot is empty).

2. Create a new Entry from the given key and value, with the old head as its next, and store it at that index of the table.

protected Entry(int hash, K key, V value, Entry<K,V> next) {
    this.hash = hash;
    this.key = key;
    this.value = value;
    this.next = next;
}

This is the Entry class constructor. Simply put, a new Entry object is added at the head of a singly linked chain. It also follows that entries written later can be found relatively quickly, because the most recently written entry is always at the head of its chain.
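The return values of put described above can be observed directly on java.util.Hashtable. The sketch below is only a usage example; the class PutSketch and the helper describePut are hypothetical names.

```java
import java.util.Hashtable;

// Usage sketch for Hashtable.put; describePut is an illustrative helper.
class PutSketch {
    // put returns null for a new key and the previous value for an existing key.
    static String describePut(Hashtable<String, String> table, String key, String value) {
        String old = table.put(key, value);
        return old == null ? "inserted" : "replaced " + old;
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        System.out.println(describePut(table, "k", "v1")); // prints inserted
        System.out.println(describePut(table, "k", "v2")); // prints replaced v1
        System.out.println(table.get("k"));                // prints v2
        System.out.println(table.size());                  // prints 1
    }
}
```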

 

 

Now that we have read Hashtable, let's take a look at HashMap.

HashMap can be regarded as an upgraded version of Hashtable. It has been available since JDK 1.2.

In general, HashMap cleans up the Hashtable code. For example, it removes hard-coded values and improves code reuse.

However, there are two main differences between the two.

1. The read and write operations of HashMap are unsynchronized, so be careful when using it in a multi-threaded environment.

Hashtable is synchronized.

This difference comes from the synchronized keyword on Hashtable's read and write methods.

 

HashMap

public V put(K key, V value)

public V get(Object key)

 

Hashtable

public synchronized V get(Object key)

public synchronized V put(K key, V value)

 

Someone may ask: "synchronized ensures thread safety, so why not use it everywhere?"

This is a matter of efficiency. A synchronized method must acquire and release a lock on every call, which can noticeably affect performance. Many programs run in a single thread, or are already thread-safe by other means, so synchronized is redundant for them.
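When several threads do share one map, the synchronized behaviour is what keeps it consistent. The sketch below fills a map from two threads and checks the final size. It runs once with a Hashtable and once with a HashMap wrapped by Collections.synchronizedMap, which is one standard way to get synchronized access to a HashMap. The class SyncSketch and the method fillFromTwoThreads are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Sketch: two threads writing disjoint keys into one shared, synchronized map.
class SyncSketch {
    static int fillFromTwoThreads(Map<Integer, Integer> map) {
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put(i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) map.put(i, i);
        });
        a.start();
        b.start();
        try {
            // Wait for both writers to finish before reading the size.
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Every put is synchronized, so no update is lost.
        System.out.println(fillFromTwoThreads(new Hashtable<>())); // prints 2000
        System.out.println(fillFromTwoThreads(
                Collections.synchronizedMap(new HashMap<>())));    // prints 2000
    }
}
```

Passing a plain HashMap to the same method is not safe: concurrent puts may lose entries or corrupt the table, which is why the example does not do so.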

 

2. The second difference is that HashMap accepts null keys and null values, while Hashtable throws a NullPointerException for them.

HashMap

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);

 

Hashtable

public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }
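The difference in null handling can be checked with a few lines against the real classes. This is a usage sketch; the class NullSketch and the helper acceptsNulls are hypothetical names.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Sketch: compare how HashMap and Hashtable react to null keys and values.
class NullSketch {
    // Returns true if the map stores a null key and a null value without complaint.
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "value for the null key");
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNulls(new HashMap<>()));   // prints true
        System.out.println(acceptsNulls(new Hashtable<>())); // prints false
    }
}
```

In Hashtable a null value is rejected by the explicit check shown above, and a null key fails when put calls key.hashCode().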

 

 
