If you reprint this article, please credit the source and include the author's contact information: http://blog.csdn.net/mimepp
Contact information: Yu Tao <yut616 at Sohu dot com>
Hashing sounds like a complicated thing, but understanding it is not as hard as it is made out to be. Here are some notes.
Hash: the Chinese translation means roughly "a jumbled mix". Some people simply use a transliteration of the word "hash".
To put it simply, hashing converts a complex string into a simple value (usually a number).
For example, take the string "ABCD": add up the values of its characters and take the remainder modulo 10, (A + B + C + D) % 10, to get a single number. Suppose the result is 5. In a sense, this 5 represents the string ABCD. In other words, the 5 is a label for the string, and a condensed one at that, which is why some people call it the string's digest or fingerprint.
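The additive hash described above can be sketched in a few lines of C. The names simple_hash and TABLE_SIZE are made up for illustration and are not from the original notes. Note that with ASCII character codes the sum for "ABCD" is 266, so this function actually returns 6; the 5 used in the text is only an illustrative figure.

```c
#define TABLE_SIZE 10  /* the "remainder of 10" from the text */

/* Additive hash: sum the character codes, then take the remainder. */
unsigned simple_hash(const char *s)
{
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s;  /* cast avoids negative values for chars above 127 */
        s++;
    }
    return sum % TABLE_SIZE;
}
```

Calling simple_hash("ABCD") yields a value between 0 and 9 that can stand in for the string wherever a small number is needed.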
This 5 can be used as an array subscript. For example, if I declare a pointer array void *hash_array[10], I can store a pointer at position 5, say one that points to the string ABCD.
Then, to check whether a string exists, I no longer need to loop over an array comparing strings one by one, which is slow. Instead I compute the string's hash value first and use it directly as the array subscript. This is much faster, especially when there is a large amount of data.
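A minimal sketch of this direct lookup follows. It assumes the table stores const char * rather than the void * of the text, to keep the example type-safe, and it deliberately ignores the case where two strings land in the same slot. The function names table_put and table_contains are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 10

/* File-scope array, so every slot starts out NULL (empty). */
const char *hash_array[TABLE_SIZE];

/* Additive hash: sum of character codes modulo the table size. */
unsigned simple_hash(const char *s)
{
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s;
        s++;
    }
    return sum % TABLE_SIZE;
}

/* Store the pointer in the slot chosen by the hash.
   A string already in that slot is simply overwritten. */
void table_put(const char *s)
{
    hash_array[simple_hash(s)] = s;
}

/* Return 1 if s is stored in its hash slot, 0 otherwise.
   One hash computation and at most one strcmp; no loop over the table. */
int table_contains(const char *s)
{
    const char *slot = hash_array[simple_hash(s)];
    return slot != NULL && strcmp(slot, s) == 0;
}
```

After table_put("ABCD"), a call to table_contains("ABCD") goes straight to one slot instead of scanning every stored string. The strcmp is still needed because a different string may hash to the same slot.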
Notice that the hash values computed this way do not fill the array in order starting from 0; the first one might be 5, for instance. In other words, 5 is an unpredictable position in the array, or you could call it a scattered position, and other positions may stay empty. This scattering is why such an array or table is called a hash table.
But there is a problem. The method above simply adds the characters and takes a remainder, so when the string changes to ABDC the result is still 5. This is the weakness of the algorithm: it cannot guarantee uniqueness, and two different strings can collide on the same value. No hash with a fixed-size output can truly guarantee uniqueness, but many hash algorithms, such as MD4, MD5, and SHA-1, were designed to make accidental collisions extremely unlikely.
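The collision is easy to reproduce. The sketch below puts the additive hash next to an order-sensitive variant, weighted_hash, which multiplies the running value by 33 before adding each character. That variant and its multiplier are assumptions added here for comparison; they are not part of the original notes, and the variant is nowhere near as strong as MD5 or SHA-1.

```c
#define TABLE_SIZE 10

/* Additive hash from the text: character order does not matter. */
unsigned simple_hash(const char *s)
{
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s;
        s++;
    }
    return sum % TABLE_SIZE;
}

/* Order-sensitive variant (illustrative assumption, not from the text):
   scaling the running value by 33 makes each character's contribution
   depend on its position in the string. */
unsigned weighted_hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = h * 33u + (unsigned char)*s;
        s++;
    }
    return h % TABLE_SIZE;
}
```

With ASCII codes, simple_hash returns the same value for "ABCD" and "ABDC", while weighted_hash returns 8 and 0 respectively. With only 10 possible results, though, any function must still collide once more than 10 distinct strings are hashed.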
Even so, the simple algorithm is still usable. After ABDC hashes to 5, check whether slot 5 is already occupied. If it is, add 1 to get 6; if slot 6 is free, store the entry there. If a later string also hashes to 6 but 6 is now occupied, add 1 again and store it in the next free slot.
When retrieving data, first compute the hash value, then check whether the entry in that slot is the one you want. If not, add 1 and check the next slot, and so on until you find it.
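The store-and-retrieve procedure just described (often called linear probing) might look like the sketch below. The names probe_insert and probe_find are hypothetical. The wrap-around from the last slot back to slot 0 and the "table full" result are assumptions the text does not spell out, and removing entries is not handled.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 10

const char *hash_array[TABLE_SIZE];  /* NULL marks an empty slot */

unsigned simple_hash(const char *s)
{
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s;
        s++;
    }
    return sum % TABLE_SIZE;
}

/* Insert s, starting at its hash slot and stepping forward by 1 while
   slots are occupied. Returns the slot used, or -1 if the table is full. */
int probe_insert(const char *s)
{
    unsigned start = simple_hash(s);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned slot = (start + i) % TABLE_SIZE;  /* wrap around at the end */
        if (hash_array[slot] == NULL) {
            hash_array[slot] = s;
            return (int)slot;
        }
        if (strcmp(hash_array[slot], s) == 0)
            return (int)slot;  /* already stored */
    }
    return -1;
}

/* Look up s along the same probe sequence.
   Returns its slot, or -1 if it is not in the table. */
int probe_find(const char *s)
{
    unsigned start = simple_hash(s);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned slot = (start + i) % TABLE_SIZE;
        if (hash_array[slot] == NULL)
            return -1;  /* an empty slot ends the search: s was never stored */
        if (strcmp(hash_array[slot], s) == 0)
            return (int)slot;
    }
    return -1;
}
```

With ASCII codes, inserting "ABCD" and then "ABDC" places them in slots 6 and 7: both hash to 6, and the second one moves forward by one. The lookup follows the same steps, so it stops either at the matching string or at the first empty slot.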
So the contents of the hash table are not laid out in order from the start of the array; the slots are filled in gradually as entries are added.
What a hash table stores is generally a pointer, which can point to a large structure. That structure can hold the key and value information.
The hash table also does not have to be an array: you can organize it as a linked list. The node structure in the linked list can contain a numeric field, hash_value, for quick lookup.
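One possible reading of this linked-list layout is sketched below: each node keeps its key, a value, and the cached hash_value, and a lookup compares the cheap integer before falling back to strcmp. Apart from hash_value, every name here (struct node, list_put, list_find, the int value field) is an assumption made for the example.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    unsigned hash_value;  /* cached hash of key, checked before strcmp */
    const char *key;
    int value;
    struct node *next;
};

/* Additive hash from the text: sum of character codes modulo 10. */
unsigned simple_hash(const char *s)
{
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s;
        s++;
    }
    return sum % 10;
}

/* Find the node for key, or return NULL if it is not in the list. */
struct node *list_find(struct node *head, const char *key)
{
    unsigned h = simple_hash(key);
    for (struct node *n = head; n != NULL; n = n->next) {
        /* Integer comparison first; strcmp runs only on hash matches. */
        if (n->hash_value == h && strcmp(n->key, key) == 0)
            return n;
    }
    return NULL;
}

/* Insert key or update its value. Returns 0 on success, -1 if malloc fails.
   The key pointer is stored as-is, so it must outlive the list. */
int list_put(struct node **head, const char *key, int value)
{
    struct node *n = list_find(*head, key);
    if (n != NULL) {
        n->value = value;
        return 0;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->hash_value = simple_hash(key);
    n->key = key;
    n->value = value;
    n->next = *head;  /* push onto the front of the list */
    *head = n;
    return 0;
}
```

A caller starts with struct node *head = NULL and passes &head to list_put. The list is still walked node by node, so the saving comes only from skipping most string comparisons; a hash with more than 10 possible values would filter far better.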
Although hashing is most often associated with cryptography and similar uses, it can also be used in ordinary application code to store simple data, which can make the code more efficient.