Research on .NET collection classes - hash table (1) - Hashtable, Dictionary

Source: Internet
Author: User
Tags net hash

Today we explore hash tables. The built-in .NET hash table container is the Hashtable class, and Dictionary<TKey, TValue> is the corresponding generic hash table.

Hash table-Hashtable instantiation

When we instantiate an ArrayList or List<T> without specifying a capacity, the internal storage initially points to a shared static empty array. The first add operation replaces it with an array of length 4, and whenever that array is full its capacity is doubled.
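The growth steps described above can be observed through the public Capacity property of List<T>. The following is a minimal sketch; the class and method names (ListGrowthDemo, CapacitiesWhileAdding) are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Records every distinct Capacity value seen while adding items,
    // so the growth steps of the internal array become visible.
    public static List<int> CapacitiesWhileAdding(int itemCount)
    {
        var list = new List<int>();                  // no capacity specified
        var seen = new List<int> { list.Capacity };  // capacity before any Add
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
            {
                seen.Add(list.Capacity);
            }
        }
        return seen;
    }

    public static void Main()
    {
        // With the growth policy described above: 0, 4, 8, 16
        Console.WriteLine(string.Join(", ", CapacitiesWhileAdding(9)));
    }
}
```

Passing an expected size to the constructor, for example new List<int>(1000), avoids these intermediate reallocations and copies.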

Hashtable behaves similarly. With new Hashtable() and no capacity, the built-in bucket array gets a default length of 11 in the version of the source shown here. If a capacity is specified, as in new Hashtable(20), the bucket array is not given a length of 20. Instead, the constructor divides the requested capacity by the load factor (0.72 by default; the result is the value num in the code below), then loops over a built-in table of primes and picks the first prime that is not smaller than that result. For a capacity of 20 this gives 20 / 0.72, about 27.8, truncated to 27, so the bucket array gets 29 slots. The length of the bucket array in a Hashtable is therefore always a prime number. Related code:

 

public Hashtable(int capacity, float loadFactor)
{
    ......
    // The default value is 11
    int num2 = (num > 11.0) ? HashHelpers.GetPrime((int) num) : 11;
    this.buckets = new bucket[num2];
    ......
}

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
internal static int GetPrime(int min)
{
    ......
    // Loop over the prime table and return the first prime that is not smaller than min
    for (int i = 0; i < primes.Length; i++)
    {
        int num2 = primes[i];
        if (num2 >= min)
        {
            return num2;
        }
    }
    ......
}

// Initialize the prime table
static HashHelpers()
{
    primes = new int[] {
        3, 7, 11, 0x11, 0x17, 0x1d, 0x25, 0x2f, 0x3b, 0x47, 0x59, 0x6b,
        0x83, 0xa3, 0xc5, 0xef, 0x125, 0x161, 0x1af, 0x209, 0x277, 0x2f9,
        0x397, 0x44f, 0x52f, 0x63d, 0x78b, 0x91d, 0xaf1, 0xd2b, 0xfd1, 0x12fd,
        0x16cf, 0x1b65, 0x20e3, 0x2777, 0x2f6f, 0x38ff, 0x446f, 0x521f, 0x628d, 0x7655,
        0x8e01, 0xaa6b, 0xcc89, 0xf583, 0x126a7, 0x1619b, 0x1a857, 0x1fd3b, 0x26315, 0x2dd67,
        0x3701b, 0x42023, 0x4f361, 0x5f0ed, 0x72125, 0x88e31, 0xa443b, 0xc51eb, 0xec8c1, 0x11bdbf,
        0x154a3f, 0x198c4f, 0x1ea867, 0x24ca19, 0x2c25c1, 0x34fa1b, 0x3f928f, 0x4c4987, 0x5b8b6f, 0x6dda89
    };
}

 

 

From the source code we can draw a conclusion: Hashtable always creates a bucket array that is larger than the capacity we asked for.
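The sizing rule can be tried out with a small sketch. It is not the framework code: BucketSizeSketch and BucketCount are hypothetical names, the prime table is truncated to its first entries, and the sketch assumes that the elided value num in the constructor is the requested capacity divided by 0.72f * loadFactor. Under that assumption the prime lookup is applied to the scaled value rather than to the capacity itself.

```csharp
using System;

public static class BucketSizeSketch
{
    // First entries of the prime table from the excerpt (truncated for brevity).
    static readonly int[] Primes =
        { 3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239 };

    // Mirrors the loop in GetPrime: the first table entry that is >= min.
    public static int GetPrime(int min)
    {
        foreach (int p in Primes)
        {
            if (p >= min) return p;
        }
        throw new ArgumentOutOfRangeException(nameof(min), "beyond the truncated table");
    }

    // Hypothetical helper: bucket-array length for a requested capacity.
    // Assumption: raw size = capacity / (0.72f * loadFactor).
    public static int BucketCount(int capacity, float loadFactor = 1.0f)
    {
        float effective = 0.72f * loadFactor;
        double raw = capacity / effective;
        return raw > 11.0 ? GetPrime((int)raw) : 11;
    }

    public static void Main()
    {
        Console.WriteLine(GetPrime(20));      // 23: first prime >= 20
        Console.WriteLine(BucketCount(0));    // 11: the default size
        Console.WriteLine(BucketCount(20));   // 29: GetPrime(27), since 20 / 0.72 is about 27.8
        Console.WriteLine(BucketCount(100));  // 163: GetPrime(138)
    }
}
```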

Hash table-Hashtable access method

Hashtable contains an array field of type bucket. A bucket is a struct that stores a key, a value, and a hash value.

 

 

Like ArrayList and List<T>, Hashtable stores its data in an internal array. Since both are backed by an array, how does access in a Hashtable differ from access in an ArrayList?

When a Hashtable adds data (for example, hash.Add(key, value)), it first calls GetHashCode() on the key to obtain an Int32 hash value. It clears the sign bit of that value so that the result is non-negative, then takes the remainder (the '%' operation) of dividing it by the length of the internal array (the buckets array). The result is always smaller than the array length, so it can be used as an array index that identifies a slot. This process of deriving an array index from a key is called mapping.

Of course, two different keys may map to the same index. This is called a hash collision, and it is not analyzed here.
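The mapping step can be sketched in a few lines. BucketIndexSketch and IndexFor are names invented for this illustration; the real Hashtable goes through its own GetHash method and may use a custom comparer, which the sketch ignores.

```csharp
using System;

public static class BucketIndexSketch
{
    // Clears the sign bit of the key's hash code, then takes the remainder
    // by the bucket count, as in the description above.
    public static int IndexFor(object key, int bucketCount)
    {
        uint hash = (uint)(key.GetHashCode() & 0x7fffffff);
        return (int)(hash % (uint)bucketCount);
    }

    public static void Main()
    {
        // The hash code of an int is the int itself, so the results are predictable.
        Console.WriteLine(IndexFor(14, 11));  // 3
        Console.WriteLine(IndexFor(25, 11));  // 3 as well: a hash collision with key 14
        Console.WriteLine(IndexFor(7, 11));   // 7
    }
}
```

Keys 14 and 25 differ but land in the same slot of an 11-slot array, which is exactly the collision case mentioned above.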

The following is part of the source code:

 
private void Insert(object key, object nvalue, bool add)
{
    ......
    uint num3 = this.InitHash(key, this.buckets.Length, out num, out num2);
    int num4 = 0;
    int index = -1;
    // num6 is the array index obtained through the remainder operation
    int num6 = (int) (num % this.buckets.Length);
    ......
}

private uint InitHash(object key, int hashsize, out uint seed, out uint incr)
{
    // Get the hash code of the key and clear its sign bit so that the value is non-negative
    uint num = (uint) (this.GetHash(key) & 0x7fffffff);
    seed = num;
    // incr is the step used to handle hash collisions
    incr = 1 + (uint) (((seed >> 5) + 1) % (hashsize - 1));
    return num;
}

Retrieving a value from a Hashtable works the same way: the key is mapped to its index in the array and the value is read directly. As long as there is no collision, no loop over the elements is needed and the time complexity is O(1). An ArrayList, by contrast, has to scan the array and compare elements one by one to find a value, which is O(n). This is the advantage of a hash table for lookups.
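A short usage sketch of the two lookup styles follows. The class name LookupDemo and its helper methods are invented for this illustration; the collection calls themselves (Hashtable.Add, the indexer, ContainsKey, ArrayList.IndexOf) are standard members.

```csharp
using System;
using System.Collections;

public static class LookupDemo
{
    public static Hashtable BuildTable(int n)
    {
        var table = new Hashtable();
        for (int i = 0; i < n; i++) table.Add(i, "value" + i);
        return table;
    }

    public static ArrayList BuildList(int n)
    {
        var list = new ArrayList();
        for (int i = 0; i < n; i++) list.Add(i);
        return list;
    }

    public static void Main()
    {
        Hashtable table = BuildTable(1000);
        ArrayList list = BuildList(1000);

        // Hashtable: the key is hashed and mapped to a slot, no scan over all entries.
        Console.WriteLine(table[999]);               // value999
        Console.WriteLine(table.ContainsKey(5000));  // False

        // ArrayList: IndexOf compares elements one by one from the start.
        Console.WriteLine(list.IndexOf(999));        // 999
    }
}
```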

Since Hashtable is also backed by an array and its lookups are fast, what is ArrayList still good for? Why not store everything in a Hashtable? As the saying goes, when a door is opened, many windows must be closed. Let's look at the windows that Hashtable closes.

First, a hash table cannot contain duplicate keys.

Second, a hash table occupies more memory.

The first point is obvious, so let's analyze the second.
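The first point is easy to confirm with a small sketch. DuplicateKeyDemo and AddRejectsDuplicate are names chosen for this example; Hashtable.Add throws an ArgumentException for a key that already exists, while the indexer overwrites the existing value.

```csharp
using System;
using System.Collections;

public static class DuplicateKeyDemo
{
    // Returns true when Add rejects a key that is already present.
    public static bool AddRejectsDuplicate()
    {
        var table = new Hashtable();
        table.Add("id", 1);
        try
        {
            table.Add("id", 2);   // same key again
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddRejectsDuplicate());  // True

        var table = new Hashtable();
        table["id"] = 1;
        table["id"] = 2;                           // the indexer replaces the value
        Console.WriteLine(table["id"]);            // 2

        var list = new ArrayList { 1, 1 };         // an ArrayList accepts duplicates
        Console.WriteLine(list.Count);             // 2
    }
}
```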

Hash table-internal expansion when adding to a Hashtable

For reference, the bucket struct and the fields of Hashtable look like this:

private struct bucket
{
    public object key;
    public object val;
    public int hash_coll;
}

public class Hashtable : // related interfaces
{
    private IEqualityComparer _keycomparer;
    private object _syncRoot;
    private bucket[] buckets;
    //....
}

 

We know that when a value is added to an ArrayList or List<T> and the capacity is insufficient, a new array with twice the capacity is created and the existing data is copied into it. For this reason, we recommend specifying the capacity in the constructor when using ArrayList or List<T>.

What happens if the internal array of a Hashtable runs out of capacity during an add operation?

In fact, the handling is similar to that of ArrayList: the capacity is roughly doubled. There are two differences:

First, the new size is slightly more than double. As mentioned in the Hashtable instantiation section, the array length is always a prime number because of the hash algorithm. On expansion, the length of the new array is therefore the first prime in the table that is not smaller than twice the length of the original array (for example, 11 grows to 23).

Second, the array is expanded before it is full. Hashtable has a load factor of 0.72f, and the number of entries it actually accepts is the internal array length * 0.72f, rounded down. In other words, an array of length 11 accepts only seven entries; adding one more triggers an expansion. The code is as follows:

 
public Hashtable(int capacity, float loadFactor)
{
    ...
    // The loadFactor argument is 1f in the initial state
    this.loadFactor = 0.72f * loadFactor;
    ......
    int num2 = (num > 11.0) ? HashHelpers.GetPrime((int) num) : 11;
    this.buckets = new bucket[num2];
    // Actual number of entries that can be added
    this.loadsize = (int) (this.loadFactor * num2);
    ......
}

// The Add method of Hashtable calls the Insert method
private void Insert(object key, object nvalue, bool add)
{
    ......
    if (this.count >= this.loadsize)
    {
        // Internal capacity expansion method
        this.expand();
    }
    ......
}

 

In other words, for every 100 slots in the bucket array of a Hashtable, only 72 can be used; the other 28 stay empty but still occupy memory.

From this we can see that, for the same amount of stored data, Hashtable occupies more memory than ArrayList.
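The arithmetic behind the load size is small enough to check directly. This sketch only reproduces the expression loadsize = (int)(loadFactor * bucket count) from the excerpt; LoadSizeSketch and LoadSize are hypothetical names.

```csharp
using System;

public static class LoadSizeSketch
{
    // Number of entries a bucket array accepts before an expansion is triggered,
    // following loadsize = (int)(loadFactor * bucketCount).
    public static int LoadSize(int bucketCount, float loadFactor = 0.72f)
    {
        return (int)(loadFactor * bucketCount);
    }

    public static void Main()
    {
        Console.WriteLine(LoadSize(11));  // 7:  4 of 11 slots stay empty
        Console.WriteLine(LoadSize(23));  // 16: 7 of 23 slots stay empty
        Console.WriteLine(LoadSize(47));  // 33: 14 of 47 slots stay empty
    }
}
```

The bucket counts 11, 23 and 47 are the sizes a default Hashtable would pass through if each expansion picks the first prime not smaller than twice the previous length.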

Hash table-comparison between generic Dictionary <TKey, TValue> and Hashtable

.NET 2.0 and later support generics, which avoid the cost of boxing and unboxing. In most cases a generic collection performs better than its non-generic counterpart. List<T> corresponds to ArrayList, and Dictionary<TKey, TValue> corresponds to Hashtable.

Besides the advantages of generics, Dictionary<TKey, TValue> improves the hash algorithm and the handling of hash collisions, and it does not leave part of its slots unusable the way Hashtable does in the 100-slot example. Therefore, when a hash table is needed, Dictionary<TKey, TValue> is preferred.
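The difference in day-to-day use can be seen in a short side-by-side sketch. The class and method names are invented for this example, and it only illustrates the typing and boxing difference, not the performance claim.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericVsNonGenericDemo
{
    public static int SumWithHashtable()
    {
        var table = new Hashtable();
        for (int i = 0; i < 3; i++) table.Add(i, i * 10);  // keys and values are boxed to object
        int sum = 0;
        for (int i = 0; i < 3; i++) sum += (int)table[i];  // a cast (unboxing) is required
        return sum;
    }

    public static int SumWithDictionary()
    {
        var dict = new Dictionary<int, int>();
        for (int i = 0; i < 3; i++) dict.Add(i, i * 10);   // stored as int, no boxing
        int sum = 0;
        for (int i = 0; i < 3; i++)
        {
            if (dict.TryGetValue(i, out int value)) sum += value;  // typed result, no cast
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithHashtable());   // 30
        Console.WriteLine(SumWithDictionary());  // 30
    }
}
```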

Conclusion: hash tables have a clear advantage for lookups but use somewhat more memory. When a collection in your business logic is searched frequently, prefer a hash table, and once you have decided on a hash table, prefer the generic one.

 

 
