Searching objects with hash tables in C#

Source: Internet
Author: User
Tags: hash, sort

Most containers in the .NET Framework are sequence containers: they store objects sequentially. This type of container is versatile: you can store any number of objects in any order you choose.

However, this versatility comes at a cost in performance. The time it takes to find a particular object in a sequence depends on the number of objects in the container. If the elements are not sorted, search time grows linearly with the number of elements: double the number of elements and you double the time needed to find a particular one. If the elements are sorted, lookup time grows with the logarithm of the number of elements: doubling the number of elements adds only one more comparison, and to double the lookup time you must square the number of elements. If you search for objects by key, there is a better option than a sequential container: a hash table.
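The difference between linear and logarithmic growth can be made concrete with a small sketch. The class and method names below (SearchCost, LinearSteps, BinarySteps) are made up for illustration; the code only counts worst-case comparisons and does not search real data.

```csharp
using System;

public static class SearchCost
{
    // Worst-case comparisons for a linear scan of n unsorted elements.
    public static int LinearSteps(int n)
    {
        return n;
    }

    // Worst-case probes for a binary search over n sorted elements.
    public static int BinarySteps(int n)
    {
        int steps = 0;
        while (n > 0)
        {
            n /= 2;      // each probe halves the remaining range
            steps++;
        }
        return steps;
    }

    public static void Main()
    {
        foreach (int n in new[] { 1000, 2000, 1000000 })
        {
            Console.WriteLine(n + ": linear " + LinearSteps(n) + ", binary " + BinarySteps(n));
        }
    }
}
```

Going from 1,000 to 2,000 elements doubles the linear count but adds a single binary-search probe. The binary count only doubles (from 10 to 20) once the collection reaches 1,000,000 elements, which is 1,000 squared.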

A hash table stores objects in buckets based on a numeric key called a hash value. The hash value is a number computed from a value in the object. Each distinct hash value identifies its own bucket. To find an object, you compute its hash value and search only the corresponding bucket. Because the right bucket can be located quickly, far fewer objects need to be examined.

For example, imagine a data structure holding customer records that you want to search by credit card number. A simple hash function could use the last two digits of the credit card number, which gives 100 buckets, one for each two-digit value from 00 to 99. (Likewise, using the last three digits gives 1000 buckets.) To find any record you only need to search one bucket, not all of them.
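The credit card example can be sketched as a hand-rolled table of 100 buckets. This is only an illustration of the idea, not how the .NET collections are implemented; the CardBuckets class, its methods, and the card numbers are all invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class CardBuckets
{
    // Toy hash function from the text: the last two digits of the card number.
    public static int BucketOf(string cardNumber)
    {
        return int.Parse(cardNumber.Substring(cardNumber.Length - 2));
    }

    // 100 buckets, 00 through 99; each holds (card number, customer name) pairs.
    public static List<KeyValuePair<string, string>>[] CreateTable()
    {
        var table = new List<KeyValuePair<string, string>>[100];
        for (int i = 0; i < table.Length; i++)
        {
            table[i] = new List<KeyValuePair<string, string>>();
        }
        return table;
    }

    public static void Add(List<KeyValuePair<string, string>>[] table, string card, string name)
    {
        table[BucketOf(card)].Add(new KeyValuePair<string, string>(card, name));
    }

    // Searches only the one bucket selected by the hash value.
    public static string Find(List<KeyValuePair<string, string>>[] table, string card)
    {
        foreach (var entry in table[BucketOf(card)])
        {
            if (entry.Key == card) return entry.Value;
        }
        return null;
    }

    public static void Main()
    {
        var table = CreateTable();
        Add(table, "4000000000000042", "Alice");   // made-up card numbers
        Add(table, "4000000000001142", "Bob");     // same bucket (42) as Alice
        Add(table, "4000000000000007", "Carol");   // bucket 07
        Console.WriteLine(Find(table, "4000000000001142"));
    }
}
```

Two cards that end in the same digits share a bucket, so Find still compares the full card number within that bucket.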

However, nothing comes for free. If you build the hash function on the credit card number and then want to find a customer by name, you must search the entire hash table, which takes a long time, because the table is keyed on a different field. Also, if you walk the entire hash table, the elements do not necessarily come out in the order you want: they are arranged by hash value, not by key.
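A short sketch of that limitation, assuming customers are stored in a Dictionary<string, string> keyed by card number. The NameSearch class and the sample data are hypothetical. Looking up by name has no bucket to jump to, so every entry may have to be examined.

```csharp
using System;
using System.Collections.Generic;

public static class NameSearch
{
    // Finds a card number by customer name. The table is keyed by card number,
    // so this has to walk every entry; 'examined' reports how many were visited.
    public static string FindCardByName(
        Dictionary<string, string> customersByCard, string name, out int examined)
    {
        examined = 0;
        foreach (KeyValuePair<string, string> entry in customersByCard)
        {
            examined++;
            if (entry.Value == name) return entry.Key;
        }
        return null;
    }

    public static void Main()
    {
        var customers = new Dictionary<string, string>
        {
            { "4000000000000042", "Alice" },   // made-up card numbers
            { "4000000000001142", "Bob" },
            { "4000000000000007", "Carol" }
        };

        int examined;
        string card = FindCardByName(customers, "Nobody", out examined);
        Console.WriteLine("found: " + (card ?? "none") + ", entries examined: " + examined);
    }
}
```

A lookup by card number (customers["4000000000000042"]) goes straight to one bucket, while a failed search by name visits every entry in the table.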

In this article, I'll build on the example from my previous article ("Creating a class for a better collection"), adapting it to employee records. Suppose you run a large company with thousands of employees and want to find a record as quickly as possible. A hash table of all employees keeps the search time as short as possible.

A hash function must have certain properties. For starters, it must be invariant. This means that the same key must always generate the same hash value, and once an object has been created its hash value must not change. If the hash value changes, you can no longer find the corresponding object in the hash table.
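The effect of breaking invariance can be seen in a minimal sketch. MutableKey and InvarianceDemo are hypothetical names; the key deliberately derives its hash value from a field that can be changed after the object is stored.

```csharp
using System;
using System.Collections.Generic;

// A deliberately flawed key: its hash value depends on a mutable field.
public class MutableKey
{
    public int Id;

    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id; }
}

public static class InvarianceDemo
{
    // Returns true when the entry was reachable before the key changed
    // and unreachable afterwards.
    public static bool LostAfterMutation()
    {
        var table = new Dictionary<MutableKey, string>();
        var key = new MutableKey(1);
        table.Add(key, "Alice");

        bool foundBefore = table.ContainsKey(key);

        key.Id = 2;   // the hash value changes while the entry stays filed under the old one

        bool foundAfter = table.ContainsKey(key);
        return foundBefore && !foundAfter;
    }

    public static void Main()
    {
        Console.WriteLine("entry lost after mutation: " + LostAfterMutation());
    }
}
```

The entry is still inside the table (Count stays 1), but neither the mutated key nor a fresh MutableKey(1) will locate it, which is why the hash should be computed from something that never changes.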

The second property a hash function needs is an even distribution of objects across buckets. If all objects generate the same hash value, they all land in one bucket, and finding a particular object degrades to a sequential search.
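One rough way to observe the cost of a poor distribution is to count how often Equals is called during lookups. The sketch below uses hypothetical CountingKey and DistributionDemo classes; the exact counts depend on the Dictionary implementation, so only the relative difference is meaningful.

```csharp
using System;
using System.Collections.Generic;

public class CountingKey
{
    public static int EqualsCalls;   // global counter, for demonstration only

    private readonly int id;
    private readonly bool constantHash;

    public CountingKey(int id, bool constantHash)
    {
        this.id = id;
        this.constantHash = constantHash;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        CountingKey other = obj as CountingKey;
        return other != null && other.id == id;
    }

    // Either every key hashes to 0 (worst case) or each key gets its own value.
    public override int GetHashCode() { return constantHash ? 0 : id; }
}

public static class DistributionDemo
{
    // Fills a table with 'count' keys, then counts Equals calls made while
    // looking every key up again.
    public static int EqualsCallsFor(bool constantHash, int count)
    {
        var table = new Dictionary<CountingKey, int>();
        for (int i = 0; i < count; i++)
        {
            table.Add(new CountingKey(i, constantHash), i);
        }

        CountingKey.EqualsCalls = 0;
        for (int i = 0; i < count; i++)
        {
            table.ContainsKey(new CountingKey(i, constantHash));
        }
        return CountingKey.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("distinct hashes: " + EqualsCallsFor(false, 200));
        Console.WriteLine("constant hash:   " + EqualsCallsFor(true, 200));
    }
}
```

With a constant hash value every key lands in the same bucket, so each lookup has to compare against many stored keys instead of roughly one.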

In practice, these two principles are easy to follow. There are 178 classes in the .NET Framework that override GetHashCode() so that they work well as hash keys. The classes in the .NET FCL (Framework Class Library) are implemented to give a good distribution of hash values and to follow the invariance principle. You should decide whether your own classes and structs need to override the GetHashCode() method. The simplest (and usually best) approach is to pick an immutable member of the key and use the hash value generated by that member.
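A minimal sketch of that approach, assuming a hypothetical EmployeeRecord class whose key is a read-only integer field. The class and its members are invented for illustration and are not the Employee class from the article's sample.

```csharp
using System;
using System.Collections.Generic;

public class EmployeeRecord
{
    private readonly int ssn;   // immutable: set once in the constructor
    public string LastName;     // mutable: deliberately left out of the hash

    public EmployeeRecord(int ssn, string lastName)
    {
        this.ssn = ssn;
        LastName = lastName;
    }

    public int Ssn { get { return ssn; } }

    // Equality and hashing both rely on the immutable member only,
    // so equal objects always produce equal hash values.
    public override bool Equals(object obj)
    {
        EmployeeRecord other = obj as EmployeeRecord;
        return other != null && other.ssn == ssn;
    }

    public override int GetHashCode()
    {
        return ssn.GetHashCode();   // reuse the hash of the immutable member
    }
}

public static class HashCodeDemo
{
    public static void Main()
    {
        var set = new HashSet<EmployeeRecord>();
        var record = new EmployeeRecord(111223333, "Smith");
        set.Add(record);

        record.LastName = "Jones";   // does not affect the hash value
        Console.WriteLine(set.Contains(record));
    }
}
```

Because GetHashCode() ignores LastName, changing the name leaves the record findable. Overriding Equals alongside GetHashCode keeps the two consistent.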

An obvious hash key for an employee database is the Social Security Number (SSN). Not only does it never change, but a nine-digit number also gives you enough range to create as many buckets as you need for the performance you expect. You can download the sample to see the difference between searching with hash keys and searching a sequential container.

To add an employee to a hash table, you can create a nine-digit number and use it as a key:

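// Assumed here: "using System.Collections;" and the Employee class from the previous article.
Hashtable members = new Hashtable();   // assumed declaration of the table
int hash = 111223333;                  // nine-digit number used as the key
for (int i = 0; i < 100; i++)
{
   string lastname = "Person" + i.ToString();
   Employee e = new Employee("Employee", lastname, (200 - i) * 200);
   members.Add(hash++, e);             // each employee gets the next key
}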
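For comparison, here is a self-contained sketch of looking one employee up by key versus scanning a sequential container. SampleEmployee is a hypothetical stand-in for the article's Employee class (its third constructor argument is assumed to be a salary), and Dictionary<int, SampleEmployee> is used in place of the table above. This is not the downloadable sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Employee class.
public class SampleEmployee
{
    public readonly int Ssn;
    public readonly string FirstName;
    public readonly string LastName;
    public readonly int Salary;   // assumed meaning of the third constructor argument

    public SampleEmployee(int ssn, string firstName, string lastName, int salary)
    {
        Ssn = ssn;
        FirstName = firstName;
        LastName = lastName;
        Salary = salary;
    }
}

public static class EmployeeLookup
{
    // Builds the same 100 records both as a hash table and as a list.
    public static void Build(Dictionary<int, SampleEmployee> byKey, List<SampleEmployee> sequential)
    {
        int ssn = 111223333;
        for (int i = 0; i < 100; i++)
        {
            var e = new SampleEmployee(ssn, "Employee", "Person" + i, (200 - i) * 200);
            byKey.Add(ssn, e);
            sequential.Add(e);
            ssn++;
        }
    }

    // Hash lookup: goes straight to the bucket for the key.
    public static SampleEmployee FindByKey(Dictionary<int, SampleEmployee> byKey, int ssn)
    {
        SampleEmployee e;
        return byKey.TryGetValue(ssn, out e) ? e : null;
    }

    // Sequential lookup: checks elements one by one until a match is found.
    public static SampleEmployee FindByScan(List<SampleEmployee> sequential, int ssn)
    {
        foreach (SampleEmployee e in sequential)
        {
            if (e.Ssn == ssn) return e;
        }
        return null;
    }

    public static void Main()
    {
        var byKey = new Dictionary<int, SampleEmployee>();
        var sequential = new List<SampleEmployee>();
        Build(byKey, sequential);

        int wanted = 111223333 + 42;
        Console.WriteLine(FindByKey(byKey, wanted).LastName);
        Console.WriteLine(FindByScan(sequential, wanted).LastName);
    }
}
```

Both lookups return the same record. The hash lookup does not depend on where the record sits, while the scan has to pass every element before it.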
