Searching for Objects in a Hash Table with C#
Author: Bill Wagner
Most containers in the .NET Framework are sequential containers: they store objects in order. This kind of container is very flexible: you can store any number of objects in any particular order.
However, this flexibility comes at a cost in performance. The time needed to find a particular object in a sequence depends on the number of objects in the container. If the elements are not sorted, search time grows linearly with the number of elements: if the number of elements doubles, the time to find a particular element also doubles. If the elements are sorted, search time grows with the logarithm of the number of elements: to double the search time, you must square the number of elements (a binary search needs about 10 comparisons for 1,000 elements and about 20 for 1,000,000). If you search for objects by a key, there is a better way to store them than a sequential container: a hash table.
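The growth rates above can be made concrete with a short sketch (written for this explanation, not taken from the article's sample). It counts how many elements a linear scan and a binary search examine when looking for the last of 1,000 and of 1,000,000 integers; the helper names `LinearComparisons` and `BinaryComparisons` are hypothetical.

```csharp
using System;
using System.Linq;

// Count how many elements a linear scan examines before it finds target.
static int LinearComparisons(int[] items, int target)
{
    int count = 0;
    foreach (int item in items)
    {
        count++;
        if (item == target) break;
    }
    return count;
}

// Count how many elements a binary search examines in a sorted array.
static int BinaryComparisons(int[] sorted, int target)
{
    int count = 0, lo = 0, hi = sorted.Length - 1;
    while (lo <= hi)
    {
        count++;
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] == target) break;
        if (sorted[mid] < target) lo = mid + 1; else hi = mid - 1;
    }
    return count;
}

int[] small = Enumerable.Range(0, 1_000).ToArray();
int[] large = Enumerable.Range(0, 1_000_000).ToArray();

// Search for the last element: the worst case for the linear scan.
int linearSmall = LinearComparisons(small, 999);
int linearLarge = LinearComparisons(large, 999_999);
int binarySmall = BinaryComparisons(small, 999);
int binaryLarge = BinaryComparisons(large, 999_999);

Console.WriteLine($"1,000 items:     linear {linearSmall}, binary {binarySmall}");
Console.WriteLine($"1,000,000 items: linear {linearLarge}, binary {binaryLarge}");
```

The linear counts equal the number of elements, while the binary counts stay at or below 10 and 20 respectively.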
A hash table stores objects in buckets based on a key. A hash value is a number calculated from the value of an object (here, the key). Each distinct hash value identifies a bucket. To find an object, you only need to compute the hash value of its key and search the corresponding bucket. Because that bucket is located directly, you search far fewer objects.
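A minimal sketch of the bucket idea, assuming string keys and a fixed 16 buckets (both arbitrary choices for illustration). `BucketFor`, `Add` and `Find` are hypothetical helpers, not the `Hashtable` API; the real `System.Collections.Hashtable` also grows its bucket array and resolves collisions differently.

```csharp
using System;
using System.Collections.Generic;

// A deliberately tiny hash table: an array of buckets, each a list of
// key/value pairs. It never resizes; it only shows the bucket idea.
const int BucketCount = 16;
var buckets = new List<KeyValuePair<string, string>>[BucketCount];
for (int i = 0; i < BucketCount; i++)
    buckets[i] = new List<KeyValuePair<string, string>>();

// Map a key's hash value to a bucket index. Masking off the sign bit
// keeps the index non-negative even when GetHashCode() is negative.
int BucketFor(string key) => (key.GetHashCode() & 0x7FFFFFFF) % BucketCount;

void Add(string key, string value) =>
    buckets[BucketFor(key)].Add(new KeyValuePair<string, string>(key, value));

// Only the one bucket selected by the hash value is searched.
string? Find(string key)
{
    foreach (var pair in buckets[BucketFor(key)])
        if (pair.Key == key) return pair.Value;
    return null;
}

Add("alpha", "first");
Add("beta", "second");
Add("gamma", "third");

Console.WriteLine(Find("beta") ?? "not found");   // prints "second"
Console.WriteLine(Find("delta") ?? "not found");  // prints "not found"
```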
For example, suppose you keep customer records in a data structure and want to find them by credit card number. A simple hash function could use the last two digits of the card number, giving 100 buckets numbered 00 through 99. (Similarly, the last three digits would give 1,000 buckets.) To find a record, you search only one bucket instead of all of them.
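The credit-card scheme can be sketched with LINQ's `ToLookup`, which groups records by a key selector. The card numbers and names below are made up, and using `ToLookup` as the bucket store is an illustrative choice rather than how `Hashtable` works internally.

```csharp
using System;
using System.Linq;

// Hypothetical customer records; the card numbers are invented.
var customers = new[]
{
    (Card: "4000123412340042", Name: "Ada"),
    (Card: "4000987698760017", Name: "Grace"),
    (Card: "4000555544440042", Name: "Linus"),
};

// The hash function from the text: the last two digits, 00 through 99.
static int LastTwoDigits(string card) => int.Parse(card.Substring(card.Length - 2));

// ToLookup groups the records into buckets keyed by that hash value.
var buckets = customers.ToLookup(c => LastTwoDigits(c.Card));

// To find a card, examine only the bucket its last two digits select,
// then compare full card numbers within that small bucket.
string target = "4000555544440042";
var bucket = buckets[LastTwoDigits(target)];
var match = bucket.FirstOrDefault(c => c.Card == target);

Console.WriteLine($"bucket size: {bucket.Count()}, found: {match.Name}");
// prints "bucket size: 2, found: Linus"
```

Two of the three cards end in 42, so that bucket holds two records, and the final comparison of full card numbers picks out the right one.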
However, nothing is ever quite that simple. If the hash function is built from the credit card number and you want to find a customer by name, you have to search the entire hash table, which takes a long time, because the table uses a different field as its key. Furthermore, when you traverse the whole hash table, the elements do not come back in any order you would want: they are arranged by hash value, not by key.
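A small sketch of both problems, assuming a `Hashtable` keyed by invented card numbers with names as values: a search by name has to walk every entry, and visiting the entries in key order requires sorting the keys separately.

```csharp
using System;
using System.Collections;

// Hashtable keyed by (hypothetical) card number; the values are names.
var byCard = new Hashtable
{
    { "4000123412340042", "Ada" },
    { "4000987698760017", "Grace" },
    { "4000555544440011", "Linus" },
};

// Searching by the key is a single hashed lookup.
Console.WriteLine(byCard["4000987698760017"]);   // prints "Grace"

// Searching by name cannot use the hash: entries are examined one by one.
int examined = 0;
object? foundCard = null;
foreach (DictionaryEntry entry in byCard)
{
    examined++;
    if ((string?)entry.Value == "Linus")
    {
        foundCard = entry.Key;
        break;
    }
}
Console.WriteLine($"found {foundCard} after examining {examined} entries");

// Enumeration order follows the internal bucket layout, not the keys.
// To visit entries in key order, copy the keys and sort them yourself.
var keys = new ArrayList(byCard.Keys);
keys.Sort();
foreach (string key in keys)
    Console.WriteLine($"{key}: {byCard[key]}");
```

How many entries the name search examines depends on the table's internal layout, which is also why the unsorted enumeration order should not be relied on.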
In this article, I extend the example from my previous article ("Create a class for a better set"), which lets you modify employee records. Suppose a large company has thousands of employees and you want to find a record as quickly as possible. Storing all the employees in a hash table gives you the fastest search.
A hash function must have certain properties. First, it must be consistent: the same key must always produce the same hash value. Once an object has been added to the table, its hash value must not change. If the hash value changes, you can no longer find that object in the hash table.
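A quick check of the consistency rule with `string` keys: two distinct string objects with the same characters are equal, so they must hash to the same value. The SSN-like string is only illustrative.

```csharp
using System;

// Two distinct string objects with the same contents...
string a = "111223333";
string b = new string("111223333".ToCharArray());

// ...are equal, so they must (and do) produce the same hash value.
Console.WriteLine(ReferenceEquals(a, b));                 // prints "False"
Console.WriteLine(a.Equals(b));                           // prints "True"
Console.WriteLine(a.GetHashCode() == b.GetHashCode());    // prints "True"

// Repeated calls on the same, unchanged object give the same value.
Console.WriteLine(a.GetHashCode() == a.GetHashCode());    // prints "True"

// On .NET Core and .NET 5+, string hash codes are randomized per process,
// so they are stable only for the life of the process. Do not persist
// GetHashCode() values or send them to another process.
Console.WriteLine(a.GetHashCode());
```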
Second, the hash function must distribute objects evenly across the buckets. If every object produces the same hash value, they all end up in one bucket, and finding a particular object takes as long as a sequential search.
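To see why distribution matters, this sketch drops 1,000 consecutive nine-digit numbers into 101 buckets using two hypothetical hash functions, a constant one and one that returns the number itself, and reports the size of the fullest bucket. The bucket count of 101 is an arbitrary choice.

```csharp
using System;
using System.Linq;

// Count how many keys land in the fullest bucket for a given hash function.
static int LargestBucket(int[] keys, int bucketCount, Func<int, int> hash)
{
    var sizes = new int[bucketCount];
    foreach (int key in keys)
        sizes[(hash(key) & 0x7FFFFFFF) % bucketCount]++;
    return sizes.Max();
}

int[] ssns = Enumerable.Range(111_223_333, 1_000).ToArray();

// A constant hash puts every key in one bucket, so a lookup degrades
// to a sequential search of all 1,000 entries.
int worst = LargestBucket(ssns, 101, key => 42);

// Using the number itself spreads consecutive keys across the buckets.
int good = LargestBucket(ssns, 101, key => key);

Console.WriteLine($"constant hash: {worst}, identity hash: {good}");
// prints "constant hash: 1000, identity hash: 10"
```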
In practice, both rules are easy to follow. The .NET Framework contains 178 classes that override GetHashCode() to provide better hash values, and the classes in the .NET FCL (Framework Class Library) produce well-distributed hash values and obey the invariance rule. You should decide whether your own classes and structs need to override GetHashCode(). The simplest (and usually the best) approach is to choose a member of the key that never changes and return that member's hash value.
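A minimal sketch of that advice, assuming a hypothetical `Employee` class whose Social Security number is fixed at construction. `Equals` and `GetHashCode` both use only that member, so other members can change without affecting lookups.

```csharp
using System;
using System.Collections;

var first = new Employee(111_223_333, "Person0");
var table = new Hashtable { { first, "record for Person0" } };

// A different instance with the same SSN is equal, so it finds the entry.
var probe = new Employee(111_223_333, "Someone else");
Console.WriteLine(table[probe]);   // prints "record for Person0"

// Changing a member that is not part of the hash is harmless.
first.LastName = "Renamed";
Console.WriteLine(table.ContainsKey(first));   // prints "True"

// Hypothetical key class: equality and hashing use only the immutable SSN.
// Always override Equals and GetHashCode together so they agree.
public sealed class Employee
{
    public Employee(int ssn, string lastName)
    {
        Ssn = ssn;
        LastName = lastName;
    }

    public int Ssn { get; }              // never changes after construction
    public string LastName { get; set; } // may change; not used for hashing

    public override bool Equals(object? obj) =>
        obj is Employee other && other.Ssn == Ssn;

    // Delegate to the hash value of the single constant member.
    public override int GetHashCode() => Ssn.GetHashCode();
}
```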
An obvious hash key for the employee database is the Social Security number. Not only does it never change, but a nine-digit number can be used directly as the key and still give the expected performance. The downloadable sample shows the difference between searching with a hash key and searching a sequential container.
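The article's downloadable sample is not reproduced here. As a rough stand-in, this sketch stores the same 100,000 hypothetical SSNs in a `List<int>` and in a `Hashtable` and times repeated searches for the last one with `Stopwatch`; the sizes are arbitrary and timings vary by machine. It also checks that, in current .NET implementations, `Int32.GetHashCode()` returns the value itself, so an SSN stored as an `int` is its own hash value.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

// 100,000 hypothetical SSNs stored two ways (the sizes are arbitrary).
const int Count = 100_000;
const int FirstSsn = 111_223_333;
var list = new List<int>(Count);
var table = new Hashtable(Count);
for (int i = 0; i < Count; i++)
{
    list.Add(FirstSsn + i);
    table.Add(FirstSsn + i, "Person" + i);
}

// The last one added is the worst case for the list's front-to-back scan.
int target = FirstSsn + Count - 1;
Console.WriteLine(target.GetHashCode() == target);   // prints "True"

const int Repeats = 1_000;

var watch = Stopwatch.StartNew();
bool inList = false;
for (int r = 0; r < Repeats; r++)
    inList = list.Contains(target);        // sequential search
long listTicks = watch.ElapsedTicks;

watch.Restart();
bool inTable = false;
for (int r = 0; r < Repeats; r++)
    inTable = table.ContainsKey(target);   // hashed lookup
long tableTicks = watch.ElapsedTicks;

// Exact numbers vary by machine; the list search should be far slower.
Console.WriteLine($"List: {listTicks} ticks, Hashtable: {tableTicks} ticks");
```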
To add employees to a hash table, you can create a nine-digit number and use it as the key:
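```csharp
// members is a System.Collections.Hashtable field on the form; Employee comes
// from the previous article and appears to take (firstName, lastName, salary).
int ssn = 111223333;   // first hypothetical nine-digit SSN
for (int i = 0; i < 100; i++)
{
    string lastName = "Person" + i.ToString();
    Employee e = new Employee("Employee", lastName, (200 - i) * 200);
    members.Add(ssn++, e);   // the SSN, not the Employee, is the key
}
```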
The Social Security number meets the requirements of a good hash key: it never changes, its values are well distributed, and it is compared by value rather than by reference. (Use value-based keys rather than reference-based keys to avoid the problems I described earlier.) Searching for objects with this key is also easy:
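```csharp
// SSN, LastName, FirstName and Salary are text boxes on the form;
// currentEmp is an Employee field and members is the Hashtable.
if (int.TryParse(this.SSN.Text, out int ssn))
    currentEmp = (Employee)members[ssn];   // Hashtable returns null for a missing key
else
    currentEmp = null;                     // not a number, so it cannot be a key

if (currentEmp != null)
{
    LastName.Text = currentEmp.LastName;
    FirstName.Text = currentEmp.FirstName;
    Salary.Text = currentEmp.Salary.ToString();
}
else
{
    LastName.Text = "Not Found";
}
```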
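The advice to prefer value-based keys can be illustrated with a sketch (not from the original sample) that uses an `int` key, which compares by value, and a `StringBuilder` key, which does not override `Equals(object)` or `GetHashCode` and therefore compares by reference.

```csharp
using System;
using System.Collections;
using System.Text;

var table = new Hashtable();

// Value-based key: an int compares by value, so any equal value finds
// the entry, even though each boxed copy is a different object.
table[111_223_333] = "found by value";
int typedAgain = int.Parse("111223333");
Console.WriteLine(table[typedAgain]);   // prints "found by value"

// Reference-based key: StringBuilder keeps Object's reference equality,
// so only the very same instance finds the entry.
var original = new StringBuilder("111223333");
table[original] = "found by reference";
var lookalike = new StringBuilder("111223333");
Console.WriteLine(table[lookalike] ?? "not found");   // prints "not found"
Console.WriteLine(table[original]);                    // prints "found by reference"
```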
In C#, you can use array syntax to look up objects in a hash table. This syntax emphasizes the idea of constant-time lookup: you can think of it as a fast array access rather than an expensive function call.
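The null check in the employee lookup code depends on the `Hashtable` indexer returning `null` for a missing key. For comparison, this sketch also shows the generic `Dictionary<TKey, TValue>`, whose indexer throws `KeyNotFoundException` instead, and its `TryGetValue` method. The SSN values are illustrative.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var hashtable = new Hashtable { { 111_223_333, "Person0" } };
var dictionary = new Dictionary<int, string> { { 111_223_333, "Person0" } };

// Hashtable's indexer returns null for a key that is not present.
object? missing = hashtable[999_999_999];
Console.WriteLine(missing == null);   // prints "True"

// Dictionary<TKey, TValue>'s indexer throws for a missing key instead...
bool threw = false;
try
{
    _ = dictionary[999_999_999];
}
catch (KeyNotFoundException)
{
    threw = true;
}
Console.WriteLine(threw);   // prints "True"

// ...so TryGetValue is the usual way to test for a key and fetch it at once.
if (dictionary.TryGetValue(111_223_333, out string? found))
    Console.WriteLine(found);   // prints "Person0"
```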
A final point about hash tables: like the other collections, they store references. You don't need any extra work to update the objects in a hash table; once you have retrieved a reference to an object, you can modify it freely. Remember, though, that the same does not apply to keys. You can write code that changes a key, but if the change alters its hash value, you lose the object in your collection.
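A sketch of both halves of that rule, assuming a hypothetical `Employee` class whose `Ssn` is deliberately left mutable so the example can break the rule. Changing `Salary` through a retrieved reference is visible in the table; changing `Ssn` while the object is a key makes the entry unreachable.

```csharp
using System;
using System.Collections;

var emp = new Employee(111_223_333, "Person0", 40_000);

// Values: the table stores a reference to emp, so a change made through
// a retrieved reference is visible on the next lookup; no re-insert needed.
var bySsn = new Hashtable { { emp.Ssn, emp } };
var fetched = (Employee)bySsn[111_223_333]!;
fetched.Salary = 45_000;
Console.WriteLine(((Employee)bySsn[111_223_333]!).Salary);   // prints "45000"

// Keys: byEmployee uses emp itself as the key, and its hash comes from Ssn.
var byEmployee = new Hashtable { { emp, "badge 17" } };
emp.Ssn = 111_223_334;   // changes emp's hash value while it is a key

// A lookup with emp now computes a hash that no longer matches the stored
// one, and an object with the old SSN is no longer equal to emp.
Console.WriteLine(byEmployee.ContainsKey(emp));                                     // prints "False"
Console.WriteLine(byEmployee.ContainsKey(new Employee(111_223_333, "Person0", 0)));  // prints "False"
Console.WriteLine(byEmployee.Count);   // prints "1": the entry is still there, just unreachable

// Hypothetical class for illustration. Ssn is deliberately mutable so the
// example can violate the rule; real key members should be read-only.
public sealed class Employee
{
    public Employee(int ssn, string lastName, int salary)
    {
        Ssn = ssn;
        LastName = lastName;
        Salary = salary;
    }

    public int Ssn { get; set; }
    public string LastName { get; set; }
    public int Salary { get; set; }

    public override bool Equals(object? obj) =>
        obj is Employee other && other.Ssn == Ssn;

    public override int GetHashCode() => Ssn.GetHashCode();
}
```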
A hash table is a useful and efficient container, but to use it effectively you need to understand how the container depends on the state of the objects it holds. A hash table is useful when you always search for objects by the same value, the one the hash is computed from. If you need to search for objects in different ways (by name, Social Security number, or age), a hash table is much less useful.