I recently looked into the underlying structure of PHP arrays, which is a hash table, so I went back and reviewed that data structure properly. This post is a summary.

1. Definition of a hash table

A hash table is a data structure that maps keys to values: it finds a value by computing, from the key, the location where that value is stored. That may sound abstract, so here is an example. The most typical one is a dictionary. Most of us used the Xinhua Dictionary a lot in primary school. If I want the entry for the character 按 ("press"), I go to the pinyin index (the radical index would also work) and look up its pinyin, "an". The index tells me where "an" is in the dictionary, and turning to that page I first land on 安 ("Ann"). This process is key mapping; as a formula, it computes f(key) from the key. Here the character being looked up is the key, f() is the dictionary index, which plays the role of the hash function, and the page number, 4, is the hash value.

2. Hash conflicts

But here is the problem: we were looking for 按 ("press"), not 安 ("Ann"), yet their pinyin is the same. In other words, the key "press" and the key "Ann" both map to page 4 of the dictionary. This is a hash conflict (also known as a hash collision). As a formula: key1 ≠ key2, but f(key1) = f(key2). A collision makes lookup harder: you were looking for "press" but landed on "Ann", so you have to flip forward a page or two. The same is true in a computer.
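The condition key1 ≠ key2 with f(key1) = f(key2) can be checked directly in code. The following is only a sketch: the multiply-by-33 string hash and the names slotOf, collides and tableSize are assumptions chosen for illustration, not part of the dictionary example.

```cpp
#include <cstring>

// Multiply-by-33 string hash reduced to a slot index.
// tableSize is an illustrative parameter, not something the text defines.
unsigned slotOf(const char* key, unsigned tableSize) {
    unsigned hash = 0;
    for (; *key; ++key) {
        hash = hash * 33 + static_cast<unsigned char>(*key);
    }
    return hash % tableSize;
}

// True when two different keys land in the same slot, i.e. a hash collision.
bool collides(const char* key1, const char* key2, unsigned tableSize) {
    return std::strcmp(key1, key2) != 0 &&
           slotOf(key1, tableSize) == slotOf(key2, tableSize);
}
```

With a table of 10 slots, the keys "a" (character code 97) and "k" (character code 107) both land in slot 7, so collides("a", "k", 10) is true.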

But hash collisions are unavoidable. Why? To avoid them completely, the dictionary would have to give every word its own page, with the index listing a separate page number for each word. That avoids collisions, but it makes the space grow enormously (one page per word).

Since collisions are unavoidable, all we can do is minimize the damage they cause. A good hash function should have the following properties:

1. It should distribute the records for the keys as evenly as possible across the hash table. For example, a developer sells 30 houses spread over three areas, A, B and C. Divided evenly, each area has 10 houses. If instead area A has 1 house, area B has 1 house and area C has 28, then someone looking for a house in area C may have to check 28 houses in the worst case.

2. A minimal change in the key should cause a significant change in the hash value.
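The house example in point 1 comes down to one number: the size of the fullest bucket, since that is how many records a lookup may have to scan. A minimal sketch, where worstCaseScans is a hypothetical helper name:

```cpp
#include <algorithm>
#include <vector>

// Worst-case number of records to scan = size of the fullest bucket.
int worstCaseScans(const std::vector<int>& bucketSizes) {
    if (bucketSizes.empty()) return 0;
    return *std::max_element(bucketSizes.begin(), bucketSizes.end());
}
```

For the even split {10, 10, 10} the worst case is 10 scans; for the skewed split {1, 1, 28} it is 28.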

One of the better hash functions is the time33 algorithm, which multiplies the running hash by 33 and adds the next character. PHP arrays use it as their hash function.

The core algorithm is as follows:

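    // time33: multiply the running hash by 33, then add the next character.
    // unsigned long is used so that overflow wraps around instead of being undefined.
    unsigned long Hash(const char* key) {
        unsigned long hash = 0;
        for (int i = 0; key[i] != '\0'; i++) {
            hash = hash * 33 + key[i];
        }
        return hash;
    }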
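Many implementations write the multiplication by 33 as a shift and an add, because (hash << 5) + hash equals hash * 33. The sketch below shows that form. The seed parameter is a hypothetical addition for illustration: the version in the text starts from 0, while DJB-style variants commonly start from 5381.

```cpp
// Times-33 hash written with shift-and-add: (hash << 5) + hash equals hash * 33.
// The seed parameter is an illustrative assumption; the version in the text uses 0.
unsigned long time33(const char* key, unsigned long seed = 0) {
    unsigned long hash = seed;
    for (; *key; ++key) {
        hash = (hash << 5) + hash + static_cast<unsigned char>(*key);
    }
    return hash;
}
```

For instance, time33("ab") evaluates to 97 * 33 + 98 = 3299, while time33("ba") gives 98 * 33 + 97 = 3331, so the order of the characters matters.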

3. Hash conflict resolution

When a collision occurs, how does a hash table generally resolve it? There are many methods, and a search on Baidu will turn up plenty of them. The two most commonly used are open addressing and chaining (the chain address method).

1. Open addressing

What happens when a collision occurs? Search the rest of the hash table for a free slot and insert the record there. It is like going to a shop and finding the item sold out: you go on to the next shop that sells it.

I have not experimented with this method myself, so for the details I rely on the textbook explanation.

2. Chaining (chain address method)

The principle of open addressing, described above, is that when a collision occurs, the table probes onward from the original hash address until it finds a free slot and inserts the record there. The problem is that if the table runs out of space, it can no longer resolve the collision and the insert fails. The table therefore needs at least as many slots as records, that is, the fill factor (number of slots / number of records inserted) must be >= 1.
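The limitation just described shows up clearly in a small sketch of open addressing. This version uses linear probing (try the next slot, wrapping around) on non-negative int keys. The names ProbeTable and kEmpty, and the key % size hash, are assumptions made for illustration.

```cpp
#include <vector>

const int kEmpty = -1;  // marker for a free slot (illustrative choice)

// Minimal open-addressing table for non-negative int keys, using linear probing.
struct ProbeTable {
    std::vector<int> slots;

    explicit ProbeTable(int size) : slots(size, kEmpty) {}

    // Returns the slot index used, or -1 when every slot is taken.
    int insert(int key) {
        int size = static_cast<int>(slots.size());
        int start = key % size;
        for (int step = 0; step < size; ++step) {
            int index = (start + step) % size;
            if (slots[index] == kEmpty || slots[index] == key) {
                slots[index] = key;
                return index;
            }
        }
        return -1;  // table full: the collision cannot be resolved
    }

    // Returns the slot index holding key, or -1 when it is absent.
    int find(int key) const {
        int size = static_cast<int>(slots.size());
        int start = key % size;
        for (int step = 0; step < size; ++step) {
            int index = (start + step) % size;
            if (slots[index] == kEmpty) return -1;
            if (slots[index] == key) return index;
        }
        return -1;
    }
};
```

In a table of 2 slots, inserting 0 and then 2 fills slots 0 and 1; a third insert of 4 finds no free slot and returns -1, which is exactly the failure the fill-factor condition rules out.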

Is there a way around this problem? Chaining can do it. With chaining, when a collision occurs, the table allocates a new node at the original address and links it into that slot's list as a linked-list node. My impression is that chaining is the method most widely used in practice. An example makes the structure clear. Suppose I have the data {1, 12, 26, 337, 353, ...} and my hash function is H(key) = key mod 16. The first value, 1, has hash H(1) = 1 and is inserted after node 1. The second, 12, has H(12) = 12 and goes after node 12. The third, 26, has H(26) = 10 and goes after node 10. The fourth, 337, also hashes to 1, which is a collision, but all that is needed is to find the last node in the chain at slot 1 and append it there. The same goes for 353.
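The walk-through with H(key) = key mod 16 can be reproduced in a few lines. This is only a sketch using std::list for the chains; the name ChainedTable is an assumption, and new keys are appended at the end of the chain, as in the description above.

```cpp
#include <list>
#include <vector>

// Chained table for the example data: H(key) = key mod 16.
struct ChainedTable {
    std::vector<std::list<int> > buckets;

    ChainedTable() : buckets(16) {}

    // Appends key to the end of its chain and returns the bucket index.
    int insert(int key) {
        int index = key % 16;
        buckets[index].push_back(key);
        return index;
    }
};
```

Inserting 1, 12, 26, 337 and 353 in that order returns the bucket indexes 1, 12, 10, 1 and 1, so the chain at slot 1 ends up holding 1, 337 and 353.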

The following is an analysis of how to implement the chain address method in C++.

The first step.

First, build the hash table.

First define the linked-list node, shown as struct Node below. A node has three fields: the key, the value, and a pointer to the next node in the list. The hash table itself is a class.

    #include <cstdlib>   // malloc, NULL
    #include <cstring>   // strcmp

    #define HASHSIZE 10
    typedef unsigned int uint;

    typedef struct Node {
        const char* key;
        const char* value;
        Node* next;
    } Node;

    class HashTable {
    private:
        Node* node[HASHSIZE];   // head pointer of each chain
    public:
        HashTable();
        uint hash(const char* key);
        Node* lookup(const char* key);
        bool install(const char* key, const char* value);
        const char* get(const char* key);
        void display();
    };

Then define the constructor, which sets the head pointer of every chain to NULL:

    HashTable::HashTable() {
        for (int i = 0; i < HASHSIZE; ++i) {
            node[i] = NULL;   // every chain starts empty
        }
    }

The second step.

Define the hash function for the hash table. Here I use the time33 algorithm.

    uint HashTable::hash(const char* key) {
        uint hash = 0;
        for (; *key; ++key) {
            hash = hash * 33 + *key;
        }
        return hash % HASHSIZE;   // reduce to a bucket index
    }

The third step.

Define a method that looks up a node by key. First use the hash function to compute the index of the chain's head, then walk down the chain node by node from that head. If a node's key equals the key being searched for, the match succeeds.

    Node* HashTable::lookup(const char* key) {
        Node* np;
        uint index;
        index = hash(key);
        for (np = node[index]; np; np = np->next) {
            if (!strcmp(key, np->key))
                return np;
        }
        return NULL;   // key not found
    }

The fourth step.

Define a method for inserting a node. First check whether a node with that key already exists. If it does, just update its value; if it does not, insert a new node (the code links the new node in at the head of the chain).

    bool HashTable::install(const char* key, const char* value) {
        uint index;
        Node* np;
        if (!(np = lookup(key))) {
            index = hash(key);
            np = (Node*)malloc(sizeof(Node));
            if (!np) return false;
            np->key = key;            // stores the pointer; the caller must keep the string alive
            np->next = node[index];   // link in at the head of the chain
            node[index] = np;
        }
        np->value = value;
        return true;
    }
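The class declares get and display, but this section does not show their bodies. The sketch below is one way they might look, placed in a compact, self-contained version of the table so that it compiles on its own. The behavior of get (return NULL for a missing key), the output format of display, and the destructor are all assumptions, not taken from the original code.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

#define HASHSIZE 10

struct Node {
    const char* key;
    const char* value;
    Node* next;
};

class HashTable {
private:
    Node* node[HASHSIZE];

public:
    HashTable() {
        for (int i = 0; i < HASHSIZE; ++i) node[i] = NULL;
    }

    // Assumed addition: free the nodes that install allocated.
    ~HashTable() {
        for (int i = 0; i < HASHSIZE; ++i) {
            while (node[i]) {
                Node* next = node[i]->next;
                free(node[i]);
                node[i] = next;
            }
        }
    }

    unsigned int hash(const char* key) const {
        unsigned int h = 0;
        for (; *key; ++key) h = h * 33 + static_cast<unsigned char>(*key);
        return h % HASHSIZE;
    }

    Node* lookup(const char* key) const {
        for (Node* np = node[hash(key)]; np; np = np->next) {
            if (!strcmp(key, np->key)) return np;
        }
        return NULL;
    }

    bool install(const char* key, const char* value) {
        Node* np = lookup(key);
        if (!np) {
            np = static_cast<Node*>(malloc(sizeof(Node)));
            if (!np) return false;
            unsigned int index = hash(key);
            np->key = key;
            np->next = node[index];
            node[index] = np;
        }
        np->value = value;
        return true;
    }

    // Assumed behavior: return the stored value, or NULL when the key is absent.
    const char* get(const char* key) const {
        Node* np = lookup(key);
        return np ? np->value : NULL;
    }

    // Assumed behavior: print every chain, one bucket per line.
    void display() const {
        for (int i = 0; i < HASHSIZE; ++i) {
            printf("%d:", i);
            for (Node* np = node[i]; np; np = np->next) {
                printf(" (%s => %s)", np->key, np->value);
            }
            printf("\n");
        }
    }
};
```

After install("a", "1") and install("k", "2"), both keys sit in the same chain (bucket 7), yet get("a") and get("k") still return their own values, because lookup compares the full key with strcmp. Calling install("a", "3") afterwards overwrites the value for "a" rather than adding a second node.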

Finally, the complete code is attached.
