Interpreting the .NET source code: the mystery of Hashtable directory expansion

Source: Internet
Author: User
Tags: dotnet, rehash

Abstract: To explore the directory structure of Hashtable in .NET and the algorithm behind directory expansion, this article reads and analyzes the relevant source code. It concludes that the Hashtable directory is organized as an array whose elements each represent a single data node (one key/value pair), not a data bucket. On expansion, the new directory length is a prime number no smaller than twice the current length. Expansion is triggered by the load factor, while accumulated "clutter" (slots marked as collided) triggers a rehash at the same size. Both operations traverse every element in the original directory. The query process is similar to probing in an open-addressing hash.

Keywords: .NET, Hashtable, directory expansion method, directory expansion trigger condition

I. Directory structure

This article originated from: http://blog.csdn.net/daliaojie/article/details/26366795

First, let's take a look at the main member variables of this class:

        private bucket[] buckets;
        private int count;
        private const int InitialSize = 3;
        private float loadFactor;
        private int loadsize;
        private int occupancy;

buckets is the directory; it is maintained as an array.

Now let's see what a bucket is:

        private struct bucket
        {
            public object key;
            public object val;
            public int hash_coll;
        }

It turns out to be a struct, a value type. In other words, each directory slot in Hashtable is not a data bucket holding several entries, but a single key/value pair: just one data node.


count is the number of elements currently stored, while loadFactor and loadsize are the load factor and the resulting expansion threshold, respectively. occupancy is explained later.
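To make the relationship between the load factor and the threshold concrete, here is a minimal sketch. The formula is the one used at the end of the rehash(int) method quoted later in this article; the helper name LoadSize and the factor 0.72 used in the note below are assumptions made for illustration, not values taken from the quoted source.

```csharp
public static class LoadThresholdSketch
{
    // Hypothetical helper: computes the expansion threshold the same way
    // the quoted rehash(int) does, i.e. (int)(loadFactor * directory length).
    public static int LoadSize(float loadFactor, int bucketCount)
    {
        return (int)(loadFactor * bucketCount);
    }
}
```

Under these assumptions, a three-slot directory with a factor of 0.72 has a threshold of 2, so the table would expand when a third element is added.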


II. Insert operation

1. Hash conflict resolution method

        public virtual void Add(object key, object value)
        {
            this.Insert(key, value, true);
        }

Let's trace the Insert method:

        private void Insert(object key, object nvalue, bool add)
        {
            uint num;
            uint num2;
            if (key == null)
            {
                throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
            }
            if (this.count >= this.loadsize)
            {
                this.expand();
            }
            else if ((this.occupancy > this.loadsize) && (this.count > 100))
            {
                this.rehash();
            }
            uint num3 = this.InitHash(key, this.buckets.Length, out num, out num2);
            int num4 = 0;
            int index = -1;
            int num6 = (int) (num % this.buckets.Length);
        Label_0071:
            if (((index == -1) && (this.buckets[num6].key == this.buckets)) && (this.buckets[num6].hash_coll < 0))
            {
                index = num6;
            }
            if ((this.buckets[num6].key == null) || ((this.buckets[num6].key == this.buckets) && ((this.buckets[num6].hash_coll & 0x80000000L) == 0L)))
            {
                if (index != -1)
                {
                    num6 = index;
                }
                Thread.BeginCriticalRegion();
                this.isWriterInProgress = true;
                this.buckets[num6].val = nvalue;
                this.buckets[num6].key = key;
                this.buckets[num6].hash_coll |= (int) num3;
                this.count++;
                this.UpdateVersion();
                this.isWriterInProgress = false;
                Thread.EndCriticalRegion();
            }
            else if (((this.buckets[num6].hash_coll & 0x7fffffff) == num3) && this.KeyEquals(this.buckets[num6].key, key))
            {
                if (add)
                {
                    throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", new object[] { this.buckets[num6].key, key }));
                }
                Thread.BeginCriticalRegion();
                this.isWriterInProgress = true;
                this.buckets[num6].val = nvalue;
                this.UpdateVersion();
                this.isWriterInProgress = false;
                Thread.EndCriticalRegion();
            }
            else
            {
                if ((index == -1) && (this.buckets[num6].hash_coll >= 0))
                {
                    this.buckets[num6].hash_coll |= -2147483648;
                    this.occupancy++;
                }
                num6 = (int) ((num6 + num2) % ((ulong) this.buckets.Length));
                if (++num4 < this.buckets.Length)
                {
                    goto Label_0071;
                }
                if (index == -1)
                {
                    throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));
                }
                Thread.BeginCriticalRegion();
                this.isWriterInProgress = true;
                this.buckets[index].val = nvalue;
                this.buckets[index].key = key;
                this.buckets[index].hash_coll |= (int) num3;
                this.count++;
                this.UpdateVersion();
                this.isWriterInProgress = false;
                Thread.EndCriticalRegion();
            }
        }

The insert operation begins by checking whether the directory needs to be expanded or rehashed. The trigger conditions and implementations of expansion (expand) and rehashing (rehash) are traced later.

For now, assume that neither expansion nor rehashing is needed.

Insert then computes the directory index for the key. If that slot is empty, the entry is stored there. If the slot is already occupied by an equal key, the outcome depends on the add flag: Add passes true, so a duplicate key raises an ArgumentException, whereas a caller that passes false (such as the indexer setter) replaces the stored value. Otherwise the slot is occupied by a different key, and Insert probes another slot, inserting there if it is suitable. The next slot is not simply the neighboring one; instead a key-dependent increment (num2) is added to the index, which helps find a free slot faster. Clearly, Hashtable resolves conflicts by open addressing.
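The probing order can be illustrated with a short sketch. The body of InitHash is not shown in this article, so the sketch assumes the increment is computed the same way as in the putEntry method quoted later; the names ProbeSketch and ProbeSequence are hypothetical.

```csharp
using System.Collections.Generic;

public static class ProbeSketch
{
    // Returns the slots visited for a hash code, in probe order.
    // The step formula mirrors the quoted putEntry; this is an illustration,
    // not the framework's own code.
    public static List<int> ProbeSequence(int hashCode, int size)
    {
        uint seed = (uint)(hashCode & 0x7fffffff);   // keep the low 31 bits
        uint step = (uint)(1 + (((seed >> 5) + 1) % (uint)(size - 1)));
        int index = (int)(seed % (uint)size);        // home slot

        var visited = new List<int>();
        for (int i = 0; i < size; i++)
        {
            visited.Add(index);
            index = (int)(((long)index + step) % size); // jump by step, wrap around
        }
        return visited;
    }
}
```

Because the step lies between 1 and size - 1, a prime directory length guarantees that the sequence visits every slot exactly once before repeating. For example, with hash code 42 and 11 slots the step is 3 and the sequence starts 9, 1, 4, 7.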

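Here occupancy should be understood as the number of slots whose collision bit (the sign bit of hash_coll) is set: slots that an insertion found occupied by another key and had to probe past. It is therefore a measure of how many elements could not be stored at the index computed from their own key.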
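The way hash_coll packs two pieces of information into one int can be sketched as follows. The bit operations are the ones that appear in the quoted Insert method; the class and method names are hypothetical.

```csharp
public static class HashCollSketch
{
    // Low 31 bits of hash_coll: the key's hash code.
    public static int HashOf(int hashColl) => hashColl & 0x7fffffff;

    // Sign bit of hash_coll: set once an insertion has probed past this slot.
    public static bool HasCollided(int hashColl) => hashColl < 0;

    // Sets the collision bit; -2147483648 in the quoted source equals int.MinValue.
    public static int MarkCollided(int hashColl) => hashColl | int.MinValue;
}
```

Marking a slot leaves the stored hash intact, which is why the source can still compare (hash_coll & 0x7fffffff) against the hash of the key being looked up.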

2. Rehashing and expansion methods

Rehash trigger condition:

        (this.occupancy > this.loadsize) && (this.count > 100)

That is, a rehash is triggered once the number of collision-marked slots exceeds the threshold and more than 100 elements are stored.

Trigger condition for expansion:

        this.count >= this.loadsize

That is, expansion is triggered when the number of stored elements reaches the threshold loadsize, which is derived from the load factor.
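The two conditions, checked in the order used by Insert, can be summarized in a small sketch. The enum and method names are hypothetical; only the comparisons come from the quoted source.

```csharp
public enum ResizeAction { None, Expand, Rehash }

public static class TriggerSketch
{
    // Mirrors the checks at the top of the quoted Insert method:
    // expansion is tested first, rehashing only if expansion is not needed.
    public static ResizeAction Decide(int count, int occupancy, int loadsize)
    {
        if (count >= loadsize)
        {
            return ResizeAction.Expand;
        }
        if (occupancy > loadsize && count > 100)
        {
            return ResizeAction.Rehash;
        }
        return ResizeAction.None;
    }
}
```

Note that a table with 100 or fewer elements is never rehashed in place, however many slots are marked as collided.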


Let's look at the corresponding source code.

        private void rehash()
        {
            this.rehash(this.buckets.Length);
        }

        private void expand()
        {
            int prime = HashHelpers.GetPrime(this.buckets.Length * 2);
            this.rehash(prime);
        }

Both of them call the same method:

        private void rehash(int newsize)
        {
            this.occupancy = 0;
            Hashtable.bucket[] newBuckets = new Hashtable.bucket[newsize];
            for (int i = 0; i < this.buckets.Length; i++)
            {
                Hashtable.bucket bucket = this.buckets[i];
                if ((bucket.key != null) && (bucket.key != this.buckets))
                {
                    this.putEntry(newBuckets, bucket.key, bucket.val, bucket.hash_coll & 0x7fffffff);
                }
            }
            Thread.BeginCriticalRegion();
            this.isWriterInProgress = true;
            this.buckets = newBuckets;
            this.loadsize = (int) (this.loadFactor * newsize);
            this.UpdateVersion();
            this.isWriterInProgress = false;
            Thread.EndCriticalRegion();
        }

The rehash operation passes in the current directory length, whereas expansion passes in a prime number no smaller than twice the current directory length, as returned by HashHelpers.GetPrime. The designers' assumption is that hashing modulo a prime reduces the probability of conflicts.
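How the expanded length might be chosen can be sketched as below. The source of HashHelpers.GetPrime is not shown in this article, so NextPrime is a hypothetical stand-in that simply searches upward for the smallest prime; the framework's helper may instead consult a precomputed table of primes, in which case its results can differ from this sketch.

```csharp
public static class PrimeSketch
{
    // Trial division; adequate for the small sizes used in this illustration.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Hypothetical stand-in for HashHelpers.GetPrime: smallest prime >= min.
    public static int NextPrime(int min)
    {
        int candidate = min;
        while (!IsPrime(candidate))
        {
            candidate++;
        }
        return candidate;
    }

    // Mirrors the quoted expand(): double the length, then round up to a prime.
    public static int ExpandedSize(int currentLength) => NextPrime(currentLength * 2);
}
```

Under this assumption, a directory of length 3 would grow to 7, then to 17, then to 37.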

Here are the strategies for prime numbers, refer to the article:

Let's look at what rehash(int newsize) does:

It resets occupancy, the count of collision-marked slots, to zero.

It allocates a new bucket array of the specified length.

It iterates over every existing element in the old bucket array.

It puts each of them into the new directory by calling putEntry:

        private void putEntry(bucket[] newBuckets, object key, object nvalue, int hashcode)
        {
            uint num = (uint) hashcode;
            uint num2 = (uint) (1 + (((num >> 5) + 1) % (newBuckets.Length - 1)));
            int index = (int) (num % newBuckets.Length);
        Label_0017:
            if ((newBuckets[index].key == null) || (newBuckets[index].key == this.buckets))
            {
                newBuckets[index].val = nvalue;
                newBuckets[index].key = key;
                newBuckets[index].hash_coll |= hashcode;
            }
            else
            {
                if (newBuckets[index].hash_coll >= 0)
                {
                    newBuckets[index].hash_coll |= -2147483648;
                    this.occupancy++;
                }
                index = (int) ((index + num2) % ((ulong) newBuckets.Length));
                goto Label_0017;
            }
        }

This operation is similar to Insert, and it uses the same conflict resolution method.
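The whole rehash pass can be condensed into a simplified, self-contained sketch. It follows the quoted rehash(int) and putEntry, but it is an illustration under stated assumptions rather than the framework's code: the Slot type and method names are hypothetical, the sentinel for removed entries (key == this.buckets) and the thread-safety calls are omitted, and the caller is assumed to pass a prime newSize larger than the number of live entries.

```csharp
public static class RehashSketch
{
    // Simplified slot: Key == null means empty. HashColl packs the 31-bit hash
    // plus a collision flag in the sign bit, like the quoted bucket struct.
    public struct Slot
    {
        public object Key;
        public object Val;
        public int HashColl;
    }

    // Places one entry using the probing scheme of the quoted putEntry.
    // Returns how many slots were newly flagged as collided along the way.
    public static int PutEntry(Slot[] newSlots, object key, object val, int hash)
    {
        uint seed = (uint)hash;
        uint step = (uint)(1 + (((seed >> 5) + 1) % (uint)(newSlots.Length - 1)));
        int index = (int)(seed % (uint)newSlots.Length);
        int flagged = 0;
        while (newSlots[index].Key != null)
        {
            if (newSlots[index].HashColl >= 0)
            {
                newSlots[index].HashColl |= int.MinValue; // mark as probed past
                flagged++;
            }
            index = (int)(((long)index + step) % newSlots.Length);
        }
        newSlots[index].Key = key;
        newSlots[index].Val = val;
        newSlots[index].HashColl |= hash;
        return flagged;
    }

    // Copies every live entry of oldSlots into a fresh array of newSize slots
    // and reports the recomputed occupancy.
    public static Slot[] Rehash(Slot[] oldSlots, int newSize, out int occupancy)
    {
        occupancy = 0;
        var newSlots = new Slot[newSize];
        foreach (Slot s in oldSlots)
        {
            if (s.Key != null)
            {
                occupancy += PutEntry(newSlots, s.Key, s.Val, s.HashColl & 0x7fffffff);
            }
        }
        return newSlots;
    }
}
```

Because occupancy is recomputed from scratch, collision marks left behind by earlier insertions and removals disappear, which is what makes a same-size rehash worthwhile.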

From this we know how the directory is expanded: its new length is a prime number that is at least twice the current directory length.

Conclusion:

Through the analysis and interpretation of the Hashtable source code in .NET, we conclude that the Hashtable directory is organized as an array whose elements each represent a single data node, not a data bucket. On expansion, the new directory length is a prime number no smaller than twice the current length. Expansion is triggered by the load factor, while accumulated "clutter" (slots marked as collided) triggers a rehash at the same size. Both operations traverse every element in the original directory. The query process is similar to probing in an open-addressing hash.
