Comprehensible Data Structures, C Language Version (14) -- Hash Tables


We know that, thanks to the shape of a binary search tree (in the ideal case each comparison rules out half of the remaining data), lookup in it is fairly fast, with O(log n) time complexity. But is there a data structure that supports lookup in constant time? There is: the hash table. A hash table supports O(1) insertion and, in the ideal case, O(1) lookup and deletion as well.

  The basic idea of a hash table is simple:

1. Design a hash function whose input is a data item's key and whose output is a hash value n (a non-negative integer). Different keys must produce different hash values, i.e. the hash function should be injective.

2. Create an array hashtable (this is the hash table itself) and store each inserted item at hashtable[n], where n is the item's hash value; n must be smaller than the size of the table.

With this setup, inserting an item only requires computing its hash value n and storing the item at hashtable[n]. To look an item up, compute its hash value n and check whether hashtable[n] holds data (in the ideal case hashtable can even be an array of bool); deletion works the same way. All of these operations are O(1).
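The ideal case just described can be sketched in a few lines. This is an illustration, not code from the original article: it assumes the keys are small non-negative integers, so the key itself can serve as the hash value.

```c
#include <stdbool.h>

#define TABLE_SIZE 100   /* assumed: every hash value is below 100 */

/* Ideal case: the table is just an array of flags indexed by the hash value. */
static bool hashTable[TABLE_SIZE];

/* With injective hashing, each operation is a single O(1) array access. */
void insertKey(int key)   { hashTable[key] = true; }
bool containsKey(int key) { return hashTable[key]; }
void deleteKey(int key)   { hashTable[key] = false; }
```

Note that a bool per slot suffices only because the key doubles as the index; as soon as collisions are possible, the slot must store the data itself.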

But a moment's reflection shows that this scheme cannot always be realized. First, an injective hash function does not always exist: if the key can be any integer, how should the keys -a and a be mapped to distinct non-negative hash values? Second, even when the hash function is injective, the table cannot always be made larger than every possible hash value. Suppose the keys are positive integers; then the hash function can simply return the key itself, which guarantees injectivity. But if there are only 1000 items while the largest possible key is 10000000, do we really create a hash table of size 10000000?

In other words, any real implementation of a hash table must face these two issues:

1. How to implement a hash function that is as close to injective as possible

2. How to resolve the conflict when different keys hash to the same value

The first problem clearly depends on the situation: only when the data type and some characteristics of the data are known can a good hash function be written. For example, when the keys are random positive integers, the simple hash function that returns key % tableSize works reasonably well. But if tableSize is 100 and every key's ones and tens digits happen to be 0, that hash function maps every key to the same slot and must be revised.
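To see why, note that with tableSize = 100 the modulo hash keeps only a key's last two decimal digits. A small sketch (the sample keys are illustrative, not from the original):

```c
/* The simple modulo hash: with tableSize = 100 it keeps only the last two
   decimal digits of the key. */
int simpleHash(int key, int tableSize) { return key % tableSize; }

/* If every key's ones and tens digits are 0 (e.g. 300, 1500, 72600), then
   simpleHash(key, 100) is always 0 and the whole table collapses to slot 0. */
```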

In other words, the first problem has no universal solution, and designing a good hash function is an algorithm-design topic in its own right, so we will not pursue it further here. In the discussion below we assume the input data (the key) is a string of at most 20 characters, with the following hash function:

// A simple hash function: add up the ASCII values of the characters in the
// string and return the sum modulo tableSize
int Hash(const char *target, int tableSize)
{
    int hashVal = 0;
    while (*target != '\0')
        hashVal += *target++;
    return hashVal % tableSize;
}

What about the second problem? Is there a universal way to resolve the conflict when different data items map to the same hash value? There is, and in fact there are several (here we give code for only one of them and merely sketch the ideas behind the others).

There are three common ways of handling hash collisions: separate chaining, open addressing, and double hashing. We will give the code for separate chaining and briefly discuss the other two.

The idea of separate chaining is simple: if several items map to the hash value n, let all of them live at hashtable[n].

Obviously, for hashtable[n] to hold several items, the element type of hashtable cannot be the data type itself (otherwise hashtable[n] could hold only one item); it should be a linked list.

For example, suppose tableSize is 7. Under the hash function given above, the keys "ac" and "bb" both hash to 0 ('a' + 'c' = 'b' + 'b' = 196, and 196 % 7 = 0), so after inserting "ac" and then "bb" the hash table looks like this:

hashtable[0] -> "bb" -> "ac"
hashtable[1] through hashtable[6]: empty
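We can verify the collision with the article's Hash function, reproduced here so the check is self-contained:

```c
/* The article's hash function: sum the ASCII values, then take the remainder. */
int Hash(const char *target, int tableSize)
{
    int hashVal = 0;
    while (*target != '\0')
        hashVal += *target++;
    return hashVal % tableSize;
}

/* 'a' + 'c' = 97 + 99 = 196 and 'b' + 'b' = 98 + 98 = 196;
   196 % 7 = 0, so Hash("ac", 7) and Hash("bb", 7) are both 0. */
```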

Lookup in a separate-chaining hash table works like this: compute the hash value n of the given item, go to hashtable[n] (a linked list), and traverse that list to see whether the item is present. Deletion is implemented on top of lookup, as an ordinary linked-list delete.

It is easy to see how important a good hash function is: if the hash function always returned the same hash value, a separate-chaining hash table would degenerate into a single linked list (every item would map to the same hash value n, so every item would end up in the list at hashtable[n]).

Now we can start implementing the hash table. Its hash function was given above, and it resolves collisions by separate chaining.

First, since each element of the hash table is a linked list, we need the definition of a list node:

#define StrSize 21    // assumed value: room for 20 characters plus the '\0' terminator

struct ListNode {
    char str[StrSize];
    struct ListNode *next;
};
typedef struct ListNode *List;
typedef List Position;    // Position is used for finding and deleting

The next step is to design hashtable itself, that is, to decide its element type. The simplest choice would be to use struct ListNode directly as the element type:

struct ListNode Hashtable[tablesize];

But that raises a problem: how do we tell whether hashtable[n] is empty or holds exactly one element? So instead we make List the element type of hashtable, i.e. each element of hashtable is a pointer to the first node of a list. Then, if the list at hashtable[n] is empty, hashtable[n] is simply NULL.

List Hashtable[tablesize];

To make the hash table more adaptable, however, we want tableSize to be a variable, so that the size of the table can be chosen as the program requires. We therefore design the hash table as follows and access it through a pointer. The initialization code is also given:

struct HashTbl {
    unsigned int size;
    List *table;    // table is the real hash table
};
typedef struct HashTbl *HashTable;
// We access the hash table through a pointer: a function such as Find needs
// the hash table as a parameter, and passing a struct HashTbl * is better
// than passing a struct HashTbl by value.

// Create a hash table of the given size
HashTable Initialize(unsigned int tableSize)
{
    // Create the table header, then, based on tableSize, the table itself
    HashTable h = (HashTable)malloc(sizeof(struct HashTbl));
    h->size = tableSize;
    h->table = (List *)malloc(sizeof(List) * tableSize);

    // Initialize every element (a pointer to the first node of a list) to NULL
    for (unsigned int i = 0; i < tableSize; ++i)
        h->table[i] = NULL;

    return h;
}

Next is the code for the insert operation

// Insert the string source into the hash table h
void Insert(HashTable h, const char *source)
{
    // This is really the Find() operation, but we inline it here so that we
    // can reuse the hash value of source below.
    // If source is already in the hash table, return immediately.
    unsigned int hashVal = Hash(source, h->size);
    Position p = h->table[hashVal];
    while (p != NULL && strcmp(p->str, source))
        p = p->next;
    if (p != NULL)
        return;

    // source is not in the table: insert it at the head of the list sitting
    // at the position given by its hash value
    Position newNode = (Position)malloc(sizeof(struct ListNode));
    strcpy_s(newNode->str, StrSize, source);
    newNode->next = h->table[hashVal];
    h->table[hashVal] = newNode;
}

Lookup and deletion are easy as well (the lookup is already implemented inside Insert above), so we will not spell them out here.
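For completeness, here is a sketch of what Find and Delete might look like. The names follow the article's conventions, but this exact code is not from the original; the struct definitions and hash function are repeated so the sketch stands alone.

```c
#include <stdlib.h>
#include <string.h>

#define StrSize 21   /* assumed: room for 20 characters plus '\0' */

struct ListNode { char str[StrSize]; struct ListNode *next; };
typedef struct ListNode *List;
typedef List Position;

struct HashTbl { unsigned int size; List *table; };
typedef struct HashTbl *HashTable;

static int Hash(const char *target, int tableSize)
{
    int hashVal = 0;
    while (*target != '\0')
        hashVal += *target++;
    return hashVal % tableSize;
}

/* Return the node holding source, or NULL if source is not in the table. */
Position Find(HashTable h, const char *source)
{
    Position p = h->table[Hash(source, h->size)];
    while (p != NULL && strcmp(p->str, source) != 0)
        p = p->next;
    return p;
}

/* Remove source from the table: a plain linked-list delete in its bucket. */
void Delete(HashTable h, const char *source)
{
    unsigned int hashVal = Hash(source, h->size);
    Position p = h->table[hashVal];
    if (p == NULL)
        return;
    if (strcmp(p->str, source) == 0) {      /* the first node matches */
        h->table[hashVal] = p->next;
        free(p);
        return;
    }
    while (p->next != NULL && strcmp(p->next->str, source) != 0)
        p = p->next;
    if (p->next != NULL) {                  /* unlink and free the match */
        Position doomed = p->next;
        p->next = doomed->next;
        free(doomed);
    }
}
```

Delete keeps a pointer to the predecessor so the matching node can be unlinked, which is exactly the linked-list delete the article alludes to.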

Next, let's talk about open addressing.

First, under the basic idea of a hash table, if an item's hash value is n, it should be "addressed" to hashtable[n]. That is also the root of separate chaining: since your hash value is n, you must stay at hashtable[n].

Open addressing, as the name suggests, gives up this fixed addressing: an item whose key hashes to n is not necessarily stored at hashtable[n].

Open addressing works like this: if a key's hash value is n, try to insert the item at hashtable[n]; if hashtable[n] already holds data, try hashtable[(n+1) % tableSize]; if that slot is also occupied, try hashtable[(n+2) % tableSize], and so on, until an empty slot is found and the item is inserted there. If the table has no empty slot left, the insertion fails. This kind of insertion is called "linear probing".

The lookup operation: compute the hash value n and compare hashtable[n] with the item; if they match, the item is found; otherwise compare hashtable[(n+1) % tableSize] with the item, and so on, until an empty slot is reached, which means the item is not in the table.

Deletion must be lazy deletion: if slots were actually emptied, open-addressing insertion and lookup would be thrown off, because an empty slot cuts a probe sequence short. This means the element type of hashtable must be a new structure that wraps the data type and carries a status field recording whether the slot's data exists (or how many identical items it represents).
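The insert, find, and lazy-delete steps just described can be sketched as follows. The names (Cell, EntryStatus, and so on) are illustrative, not from the original; the status field implements the lazy deletion.

```c
#include <stdbool.h>
#include <string.h>

#define TABLE_SIZE 7
#define STR_SIZE 21

typedef enum { EMPTY, OCCUPIED, DELETED } EntryStatus;

typedef struct {
    char str[STR_SIZE];
    EntryStatus status;   /* DELETED slots stay "in the way" of later probes */
} Cell;

static Cell table[TABLE_SIZE];   /* statically zeroed, so every status is EMPTY */

static int hash(const char *key)
{
    int h = 0;
    while (*key) h += *key++;
    return h % TABLE_SIZE;
}

/* Linear probing: try n, n+1, n+2, ... (mod TABLE_SIZE); fail after a full lap. */
bool insert(const char *key)
{
    int n = hash(key);
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = (n + i) % TABLE_SIZE;
        if (table[slot].status != OCCUPIED) {
            strcpy(table[slot].str, key);
            table[slot].status = OCCUPIED;
            return true;
        }
    }
    return false;   /* table full: the insertion fails */
}

/* Probe until the key or an EMPTY slot; DELETED slots do not stop the search. */
bool find(const char *key)
{
    int n = hash(key);
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = (n + i) % TABLE_SIZE;
        if (table[slot].status == EMPTY)
            return false;
        if (table[slot].status == OCCUPIED && strcmp(table[slot].str, key) == 0)
            return true;
    }
    return false;
}

/* Lazy deletion: only mark the slot; never make it EMPTY again. */
void erase(const char *key)
{
    int n = hash(key);
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int slot = (n + i) % TABLE_SIZE;
        if (table[slot].status == EMPTY)
            return;
        if (table[slot].status == OCCUPIED && strcmp(table[slot].str, key) == 0) {
            table[slot].status = DELETED;
            return;
        }
    }
}
```

Notice how erase leaves the slot marked DELETED rather than EMPTY: a later find for a key that probed past this slot must keep going, which is exactly what a truly emptied slot would break.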

  

Compared with separate chaining, open addressing saves the space of the pointers, but it brings two problems of its own:

1. If insertion always searches for an empty slot in the form n = n+1, the data tends to "cluster". (For example, insert three items whose hash value is 80 and then two items with hash values 81 and 83: all five will be squeezed into hashtable[80] through hashtable[84].)

2. Define the load factor λ = (number of items inserted) / tableSize. The closer λ gets to 1, the slower every open-addressing operation becomes, and insertions become increasingly likely to fail.

For the first problem there are two improvements. One is "quadratic probing": on the i-th probe the offset is i*i rather than i (i.e. try n, n+1, n+4, n+9, ...), which reduces this primary clustering, although items sharing the same hash value still exhibit "secondary clustering". The other is double hashing: on a collision, probe with a step determined by a second hash function, i.e. n = (hash(key) + i * hash2(key)) % tableSize. In essence, linear probing, quadratic probing, and double hashing are alike: when a collision occurs they look for another place to store the data, and of course that other place must be found reproducibly.
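To make the three probe sequences concrete, here is a sketch of the slot each strategy would try on the i-th probe. The hash2 shown (R - key % R for a prime R) is a common illustrative choice, not taken from the original:

```c
#define TABLE_SIZE 11

/* i-th probe position under each strategy, for an initial hash value n. */
int linearProbe(int n, int i)              { return (n + i) % TABLE_SIZE; }
int quadraticProbe(int n, int i)           { return (n + i * i) % TABLE_SIZE; }

/* Illustrative second hash; it must never return 0, or probing would stall. */
int hash2(int key)                         { return 7 - (key % 7); }
int doubleHashProbe(int n, int i, int key) { return (n + i * hash2(key)) % TABLE_SIZE; }

/* For key = 80 (n = 80 % 11 = 3) the first probes are:
   linear:    3, 4, 5, 6   (adjacent slots: primary clustering)
   quadratic: 3, 4, 7, 1   (offsets 0, 1, 4, 9)
   double:    3, 7, 0, 4   (step hash2(80) = 4, which depends on the key) */
```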

For the second problem, the remedy is rehashing: when λ grows beyond a certain threshold, create a new, larger hash table and move all the data into it. That is, "hash again".
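A rehash might look like the following sketch. To keep it short it works on a table of non-negative int keys with linear probing (-1 marks an empty slot); the IntTable struct and names are mine, not from the original, and real code would pick the next prime at or above twice the old size rather than simply doubling.

```c
#include <stdlib.h>

/* A minimal open-addressing table of non-negative int keys; -1 = empty slot. */
typedef struct {
    int *slots;
    int size;
    int count;
} IntTable;

/* Build a larger table and re-insert every key; the old table is freed. */
IntTable *rehash(IntTable *old)
{
    int newSize = old->size * 2;   /* real code: next prime >= 2 * old size */
    IntTable *fresh = malloc(sizeof(IntTable));
    fresh->size = newSize;
    fresh->count = 0;
    fresh->slots = malloc(sizeof(int) * newSize);
    for (int i = 0; i < newSize; ++i)
        fresh->slots[i] = -1;

    /* Every key must be re-inserted, because its slot is its hash value
       taken modulo the new, larger size. */
    for (int i = 0; i < old->size; ++i) {
        if (old->slots[i] != -1) {
            int key = old->slots[i];
            int n = key % newSize;
            while (fresh->slots[n] != -1)   /* linear probing */
                n = (n + 1) % newSize;
            fresh->slots[n] = key;
            fresh->count++;
        }
    }
    free(old->slots);
    free(old);
    return fresh;
}
```

Because every key is re-inserted under the new modulus, the load factor drops to roughly half its old value and probe sequences shorten accordingly.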

Sample code for a hash table using separate chaining:

Https://github.com/nchuXieWei/ForBlog-----HashTable

  
