After several years of development work, I had never really understood the principle behind hash algorithms. Today I found an explanation on Baidu that matches my own understanding:
This question is a bit hard to explain, so let me use a metaphor. Suppose we have many piglets, each with a different weight, and the weights are evenly distributed (measured to the nearest kilogram). We build 100 pens, each covering a range of weights, drive every pig into the pen for its weight, and record it. Now, how do we find a particular pig? Do we need to check every pen and every pig? Of course not. We first look at the weight of the pig we are looking for and go to the corresponding pen; since that pen holds only a few piglets, we can quickly find the one we want. In terms of a hash algorithm, pens are allocated according to the hashcode, and pigs with the same hashcode go into the same pen. To search, we first find the pen that corresponds to the hashcode and then compare the piglets in it one by one. So the crux of the problem is how many pens to build. If every pig's weight is different (measured down to the milligram) and we build one pen per pig, we can find any pig as quickly as possible, but building that many pens costs far too much. If instead each pen covers a 10-kilogram range, there are only a few pens and each holds many pigs: we find the right pen quickly, but checking its pigs one by one is tiring. A good hashcode design therefore depends on the actual situation and the specific requirements, and strikes a balance between time cost (more pens, faster lookups) and space cost (fewer pens, lower space requirements).
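The metaphor maps directly onto code. Below is a minimal C sketch, assuming weights are stored in whole grams from 0 to 99,999 and each pen covers 1 kg (so there are 100 pens). The names `Pen`, `pen_of`, `add_pig`, and `find_pig` and the per-pen capacity are hypothetical, chosen only for illustration. A lookup jumps straight to one pen and counts how many pigs it has to compare.

```c
/* Assumed setup (not from the original): weights in grams, 0..99,999 g,
   each pen covering 1 kg, which gives 100 pens. */
#define MAX_WEIGHT_G 100000
#define PEN_WIDTH_G  1000
#define NUM_PENS     (MAX_WEIGHT_G / PEN_WIDTH_G)
#define PEN_CAP      64   /* arbitrary per-pen capacity for this sketch */

typedef struct {
    int weights_g[PEN_CAP];
    int count;
} Pen;

/* The "hash function": map a weight to the index of its pen. */
static int pen_of(int weight_g) {
    return weight_g / PEN_WIDTH_G;
}

/* Put a pig in its pen. Returns 0 on success, -1 if the weight is out of
   range or the pen is full. */
static int add_pig(Pen pens[], int weight_g) {
    if (weight_g < 0 || weight_g >= MAX_WEIGHT_G) return -1;
    Pen *pen = &pens[pen_of(weight_g)];
    if (pen->count == PEN_CAP) return -1;
    pen->weights_g[pen->count++] = weight_g;
    return 0;
}

/* Look up a pig: go straight to its pen, then compare only the pigs in it.
   Returns 1 if found, 0 otherwise; *comparisons counts the pigs examined. */
static int find_pig(const Pen pens[], int weight_g, int *comparisons) {
    *comparisons = 0;
    if (weight_g < 0 || weight_g >= MAX_WEIGHT_G) return 0;
    const Pen *pen = &pens[pen_of(weight_g)];
    for (int i = 0; i < pen->count; i++) {
        (*comparisons)++;
        if (pen->weights_g[i] == weight_g) return 1;
    }
    return 0;
}
```

Setting `PEN_WIDTH_G` to 10000 gives 10 pens of 10 kg each: fewer pens, but more pigs per pen and therefore more comparisons per lookup (and `PEN_CAP` would need to grow). Shrinking it toward 1 g approaches one pen per pig, which makes lookups cheapest but needs 100,000 pens.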
This example is apt and easy to understand, but it does not tell the whole story.
In Chinese, "hash" is usually rendered by a word meaning "scatter", and is sometimes simply transliterated. A hash algorithm takes input of arbitrary length (also called the pre-image) and transforms it into output of fixed length; that output is the hash value. This transformation is a compression mapping: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, a hash is a function that compresses a message of any length into a fixed-length message digest.
In information security, hash functions are widely used as building blocks of cryptographic algorithms. MD5, for example, converts messages of any length into a seemingly random 128-bit code; these codes are called hash values. Hashing can also be described as finding a mapping between a piece of data's content and the address where it is stored.
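To make "fixed-length output" and "mapping content to a storage address" concrete, here is a small C sketch. It uses FNV-1a (32-bit), which is my choice for illustration only: it is a simple non-cryptographic hash, not the 128-bit MD5 mentioned above. The helper `slot_of` is a hypothetical name for reducing a hash value to a table index.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit variant (chosen for illustration): compresses a string
   of any length into a fixed 32-bit hash value. */
static uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;               /* mix in one byte */
        h *= 16777619u;                 /* FNV prime; wraps modulo 2^32 */
    }
    return h;
}

/* Hypothetical helper: map content to a "storage address", i.e. an index
   into a table with `capacity` slots. */
static size_t slot_of(const char *s, size_t capacity) {
    return fnv1a32(s) % capacity;
}
```

Whatever the input length, `fnv1a32` returns 32 bits, so there are at most 2^32 distinct hash values; with `capacity` slots, any `capacity + 1` distinct strings must share a slot somewhere, which is exactly the compression described above.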
For example, a simple hash of the string "hello" combines its characters as a polynomial with base 27:
const char *value = "hello";
int key = (((27 * value[0] + value[1]) * 27 + value[2]) * 27 + value[3]) * 27 + value[4];  /* ((('h'*27 + 'e')*27 + 'l')*27 + 'l')*27 + 'o' */
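The "hello" expression only handles five characters. Here is a minimal sketch that applies a base-27 polynomial, in the spirit of that example, to strings of any length; the function name `poly_hash27` is hypothetical. It uses Horner's rule and unsigned arithmetic, so long strings wrap around modulo 2^N instead of overflowing a signed `int`, which is undefined behavior in C.

```c
/* Hypothetical helper: base-27 polynomial hash of a NUL-terminated string,
   computed with Horner's rule (key = key * 27 + c for each character).
   Unsigned arithmetic wraps on overflow, so long inputs are well defined. */
static unsigned int poly_hash27(const char *s) {
    unsigned int key = 0;
    for (; *s != '\0'; s++)
        key = key * 27u + (unsigned char)*s;
    return key;
}
```

For "hello" this yields 57339606. Because each character is weighted by its position, anagrams such as "ab" and "ba" get different values.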
Next, let's look at the hash table. As a data structure, a hash table offers fast lookup and convenient insertion and deletion, which sets it apart from arrays and linked lists. Arrays are easy to address but hard to insert into and delete from; linked lists are hard to address but easy to insert into and delete from. A hash table combines the strengths of both. Hash tables can be implemented in several ways; to keep things easy to understand, we describe the most common one in detail, the chaining method (called the "zipper" method in Chinese). It can be understood as a linear array whose elements are each a linked list: the array provides fast addressing, and each linked list holds a collection of elements and supports easy insertion and deletion, so addressing, insertion, and deletion are all fast. The following figure illustrates these operations:
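Below is a minimal C sketch of the chaining method, assuming string keys, `int` values, and a fixed table of 8 buckets with no resizing; all names (`HashTable`, `bucket_of`, `ht_put`, `ht_get`, `ht_remove`, `ht_free`) are hypothetical. The array index gives fast addressing, and each bucket's linked list absorbs collisions and makes insertion and deletion cheap.

```c
#include <stdlib.h>
#include <string.h>

/* Separate chaining: an array of buckets, each the head of a singly linked
   list. The bucket count is an arbitrary choice for this sketch. */
#define NUM_BUCKETS 8

typedef struct Node {
    char *key;
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[NUM_BUCKETS];
} HashTable;

/* Base-27 polynomial hash, reduced to a bucket index. */
static size_t bucket_of(const char *key) {
    unsigned int h = 0;
    for (; *key != '\0'; key++)
        h = h * 27u + (unsigned char)*key;
    return h % NUM_BUCKETS;
}

/* Insert or update. Returns 0 on success, -1 if allocation fails. */
static int ht_put(HashTable *t, const char *key, int value) {
    size_t b = bucket_of(key);
    for (Node *n = t->buckets[b]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) { n->value = value; return 0; }
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->key = malloc(strlen(key) + 1);
    if (n->key == NULL) { free(n); return -1; }
    strcpy(n->key, key);
    n->value = value;
    n->next = t->buckets[b];   /* O(1) insertion at the head of the chain */
    t->buckets[b] = n;
    return 0;
}

/* Lookup: index the array, then walk only that chain. Returns 1 if found. */
static int ht_get(const HashTable *t, const char *key, int *out) {
    for (const Node *n = t->buckets[bucket_of(key)]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) { *out = n->value; return 1; }
    }
    return 0;
}

/* Delete: unlink the matching node from its chain. Returns 1 if removed. */
static int ht_remove(HashTable *t, const char *key) {
    Node **link = &t->buckets[bucket_of(key)];
    while (*link != NULL) {
        Node *n = *link;
        if (strcmp(n->key, key) == 0) {
            *link = n->next;
            free(n->key);
            free(n);
            return 1;
        }
        link = &n->next;
    }
    return 0;
}

/* Free every node in every chain and reset the buckets. */
static void ht_free(HashTable *t) {
    for (size_t b = 0; b < NUM_BUCKETS; b++) {
        Node *n = t->buckets[b];
        while (n != NULL) {
            Node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
        t->buckets[b] = NULL;
    }
}
```

With 8 buckets, the keys "a" and "i" (character codes 97 and 105) both land in bucket 1, so they share one chain and are told apart by `strcmp`. A production table would also grow its bucket array as chains get long, which this sketch deliberately omits.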
This article is only a brief introduction, meant to make it easier to understand what a hash algorithm is. If anything here is wrong, please let me know.
Hash Algorithm details