Data Structures: Hashing (Part 1)

Source: Internet
Author: User

A binary search tree supports a wide range of operations on a set of data and is especially convenient when the data is ordered. Take FindMax and FindMin: in a list, both operations take O(N) time, while in a binary search tree they take only O(log N) time on average. In many applications, however, the order of the data does not matter. Such applications need only the basic operations of insertion, deletion, and lookup. A binary search tree still takes O(log N) time for each of these, which becomes costly when the volume of data is very large. The hash table addresses this problem: it is a data structure that performs insertion, deletion, and lookup in constant time on average. It does not support order-related operations, but the time its insert, delete, and find operations take is independent of the amount of data, so it remains efficient even for large data sets.

General idea of hashing

Ideally, a hash table is a fixed-size array of TableSize cells, indexed from 0 to TableSize-1. TableSize is stored as part of the hash table data structure rather than as a global variable. Each key (the data to be stored) is mapped to a number in that range and stored in the corresponding cell. This mapping is called hashing, and it is carried out by a hash function. Ideally, a hash function would map different keys to different cells, but in practice this is impossible because the number of possible keys is generally far larger than the number of cells. A hash table must therefore handle the case where different keys map to the same cell, which is known as a collision.
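As a rough illustration of the layout just described, the sketch below keeps the table size inside the structure itself. The names HashTable, table_create, table_destroy, and the EMPTY sentinel are assumptions made for this example only. Cells hold plain non-negative int keys, and no collision handling is included.

```c
#include <stdlib.h>

#define EMPTY (-1)  /* assumed sentinel: valid keys are non-negative ints */

/* Hypothetical layout: the size travels with the table. */
typedef struct {
    int table_size;   /* number of cells, indexed 0 .. table_size-1 */
    int *cells;       /* the fixed-size array of keys */
} HashTable;

/* Allocate a table with every cell marked empty; returns NULL on failure. */
HashTable *table_create(int table_size)
{
    if (table_size <= 0)
        return NULL;
    HashTable *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->cells = malloc((size_t)table_size * sizeof *t->cells);
    if (t->cells == NULL) {
        free(t);
        return NULL;
    }
    t->table_size = table_size;
    for (int i = 0; i < table_size; i++)
        t->cells[i] = EMPTY;
    return t;
}

/* Release the cell array and the table itself. */
void table_destroy(HashTable *t)
{
    if (t != NULL) {
        free(t->cells);
        free(t);
    }
}
```

Because the size is a field of the structure, any function that receives a HashTable pointer can compute a valid index without consulting global state.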

Hash function

If the keys are integers, a reasonable hash function is key mod TableSize. TableSize should preferably be prime, which helps keep the hash values evenly distributed even when the keys share a common pattern. Usually, though, the keys are strings, and the hash function then needs more careful thought. The string hash functions that follow are discussed for a TableSize of 10007, which is prime.

A simple idea is to add up the character values in the string (a character type can also be treated as an integer type):
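Before turning to strings, the integer case can be sketched briefly. The helper names hash_int and cells_used are hypothetical, and the tiny table sizes 10 and 11 are chosen only to make the effect of a prime size visible; this is an illustration, not a benchmark.

```c
/* Hypothetical helper: hash a non-negative integer key by remainder. */
unsigned int hash_int(unsigned int key, unsigned int table_size)
{
    return key % table_size;
}

/* Count how many distinct cells the keys 0, step, 2*step, ... occupy.
   Sketch only: supports table sizes from 1 to 64, returns 0 otherwise. */
unsigned int cells_used(unsigned int step, unsigned int count,
                        unsigned int table_size)
{
    unsigned char used[64] = {0};
    unsigned int distinct = 0;

    if (table_size == 0 || table_size > 64)
        return 0;
    for (unsigned int i = 0; i < count; i++) {
        unsigned int h = hash_int(i * step, table_size);
        if (!used[h]) {
            used[h] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With the keys 0, 10, 20, ..., 90, cells_used(10, 10, 10) returns 1 because every key is a multiple of the table size, while cells_used(10, 10, 11) returns 10 because the prime size 11 shares no factor with the step.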

typedef unsigned int Index;

Index Hash(const char *key, int tablesize)
{
    unsigned int hashval = 0;

    /* Add up the character codes until the terminating null. */
    while (*key != '\0')
    {
        hashval += *key++;
    }
    return hashval % tablesize;
}

Suppose the keys are at most 8 characters long. Since a character code is at most 127, the sum can be no larger than 127 * 8 = 1016. With a TableSize of 10007, only a small part of the table can ever be used, so the distribution is far from even. A second approach uses only the first three characters of the string. If those characters were random letters, there would be 26*26*26 = 17576 possible combinations, which would spread reasonably evenly over a table of 10007 cells. The implementation is as follows:

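Index Hash(const char *key, int tablesize)
{
    /* Uses only the first three characters (729 = 27 * 27);
       assumes the key has at least two characters. */
    return (key[0] + 27 * key[1] + 729 * key[2]) % tablesize;
}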

In practice, however, the first three characters of real strings are not random, and only 2851 distinct three-character combinations actually occur. Even with no collisions among them, fewer than 30 percent of the 10007 cells could ever be used, so this hash function does not distribute keys evenly either. A third approach uses every character in the string to compute the hash value:

Index Hash(const char *key, int tablesize)
{
    unsigned int hashval = 0;

    /* Multiply the running value by 32 (shift left by 5), then add the next character. */
    while (*key != '\0')
        hashval = (hashval << 5) + *key++;
    return hashval % tablesize;
}

This function computes a polynomial in powers of 32 over the characters of the string. The multiplier 32 is chosen because multiplying by 32 is the same as shifting left by 5 bits, which is faster. It takes the place of 27, which stood for the 26 English letters plus the blank. In practice, this hash function distributes strings over the table more evenly than the two previous attempts.
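To make the link between the shift and the powers of 32 concrete, the sketch below computes the same value two ways. hash_shift repeats the shift-based loop, and hash_poly (a hypothetical name introduced here) writes out the sum of key[n-1-i] * 32^i explicitly. Both cast each character to unsigned char, which is an assumption added so that the two versions agree for any byte value.

```c
#include <string.h>

typedef unsigned int Index;

/* Shift-based form: multiply the running value by 32, add the next character. */
Index hash_shift(const char *key, int table_size)
{
    unsigned int hashval = 0;

    while (*key != '\0')
        hashval = (hashval << 5) + (unsigned char)*key++;
    return hashval % (unsigned int)table_size;
}

/* Explicit polynomial: the character i places from the end is weighted by 32^i. */
Index hash_poly(const char *key, int table_size)
{
    size_t n = strlen(key);
    unsigned int hashval = 0;
    unsigned int power = 1;   /* holds 32^i; unsigned arithmetic wraps on overflow */

    for (size_t i = 0; i < n; i++) {
        hashval += (unsigned char)key[n - 1 - i] * power;
        power *= 32;
    }
    return hashval % (unsigned int)table_size;
}
```

For the key "ab" both functions compute 97 * 32 + 98 = 3202. Since unsigned arithmetic wraps around, the two forms also agree on long keys. One consequence worth noting: with a 32-bit unsigned int, characters more than six positions from the end are shifted out entirely and no longer affect the result.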

Resolving collisions

When keys are mapped into the hash table, different keys will inevitably end up at the same position. There are two common ways to handle such a collision. The first is separate chaining: each table position holds a linked list, and every key that hashes to that position is added to its list, so one position can hold many keys. The second is open addressing: the first key that hashes to a position is stored there, and any later key that hashes to the same position is placed in a different position chosen by a fixed probing rule. Because every key must occupy its own cell, open addressing requires the table capacity to be larger than the number of items stored. The next article describes both approaches in detail.
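As a preview of the first approach, here is a minimal sketch of a chained table for int keys. All names (Node, ChainTable, chain_hash, chain_create, chain_find, chain_insert, chain_destroy) are hypothetical, and choices such as inserting at the front of the list and ignoring duplicates are assumptions of this example rather than requirements of the method.

```c
#include <stdlib.h>

/* Hypothetical node and table types for a chained hash table of int keys. */
typedef struct Node {
    int key;
    struct Node *next;
} Node;

typedef struct {
    int table_size;
    Node **lists;      /* lists[i] is the head of the chain for position i */
} ChainTable;

unsigned int chain_hash(int key, int table_size)
{
    return (unsigned int)key % (unsigned int)table_size;
}

/* Create a table whose chains all start out empty; NULL on failure. */
ChainTable *chain_create(int table_size)
{
    if (table_size <= 0)
        return NULL;
    ChainTable *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->lists = malloc((size_t)table_size * sizeof *t->lists);
    if (t->lists == NULL) {
        free(t);
        return NULL;
    }
    t->table_size = table_size;
    for (int i = 0; i < table_size; i++)
        t->lists[i] = NULL;
    return t;
}

/* Return the node holding key, or NULL if the key is absent. */
Node *chain_find(const ChainTable *t, int key)
{
    Node *p = t->lists[chain_hash(key, t->table_size)];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Insert key at the front of its chain; a key already present is left alone.
   Returns 1 on success or if already present, 0 if allocation fails. */
int chain_insert(ChainTable *t, int key)
{
    if (chain_find(t, key) != NULL)
        return 1;
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    unsigned int h = chain_hash(key, t->table_size);
    n->key = key;
    n->next = t->lists[h];
    t->lists[h] = n;
    return 1;
}

/* Free every chain, then the table. */
void chain_destroy(ChainTable *t)
{
    if (t == NULL)
        return;
    for (int i = 0; i < t->table_size; i++) {
        Node *p = t->lists[i];
        while (p != NULL) {
            Node *next = p->next;
            free(p);
            p = next;
        }
    }
    free(t->lists);
    free(t);
}
```

With a table of size 11, the keys 5 and 16 both hash to position 5 and simply share one list, which is the sense in which a single position can hold many keys.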
