From: http://hi.baidu.com/bangongquan/blog/item/62560623a49b87569922ed07.html
Data structures special topic: hash functions 2007/01/13
Author: rushed out of the universe from hour41 (www.hour41.com)
In computation theory there is no notion of a hash function, only of a one-way function. The formal definition of a one-way function is complicated; see a text on computation theory or cryptography. In plain language: a function is one-way if, given an input, the result is easy to compute, but given a result, the input is hard to recover. Various encryption functions can be regarded as approximations of one-way functions, and a hash function can likewise be regarded as an approximation of a one-way function, that is, something close to satisfying the definition.
Hash functions also have a second, practical meaning: a hash function maps a large range onto a small range. The purpose of mapping a large range to a small one is usually to save space and make data easier to store. Hash functions are also often used for lookup. Before using a hash function, you therefore need to understand the following restrictions:
1. The main principle of a hash is to map a large range onto a small range, so the number of distinct values you actually input must be equal to or smaller than the size of the small range. Otherwise there will be many collisions.
2. Because a hash approximates a one-way function, it can be used in data encryption.
3. Different applications place different requirements on the hash function. A hash function used for encryption is judged mainly by how closely it approximates a one-way function, whereas a hash function used for lookup is judged mainly by its collision rate when mapping to the small range.
Hash functions used in encryption have already been discussed extensively, and the author's blog gives a more detailed introduction. This article therefore discusses only the hash functions used for lookup.
The input to a hash function is typically an array (such as a string), and its output is generally an int. The descriptions below follow this form.
Generally, hash functions can be divided into the following categories:
1. Additive hash;
2. Bitwise operation hash;
3. Multiplication hash;
4. Division hash;
5. Table lookup hash;
6. Hybrid hash.
The following sections describe in detail how each of these methods is used in practice.
1. Additive hash
An additive hash adds the input elements one by one to form the final result. The structure of the standard additive hash is as follows:
static int additiveHash(String key, int prime)
{
    int hash, i;
    // Start from the key length, then add every character.
    for (hash = key.length(), i = 0; i < key.length(); i++)
        hash += key.charAt(i);
    return (hash % prime);
}
Here prime is any prime number. The result therefore lies in the range [0, prime-1].
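As a small illustration of how the additive hash behaves, the sketch below repeats the function and applies it to two sample keys. The class name, the prime 31, and the sample words are assumptions chosen for the example, not part of the original article. Because addition ignores order, keys made of the same characters in a different order receive the same value.

```java
class AdditiveHashDemo {
    // Same structure as the additive hash above: start from the length,
    // add every character, then reduce modulo a prime.
    static int additiveHash(String key, int prime) {
        int hash = key.length();
        for (int i = 0; i < key.length(); i++) {
            hash += key.charAt(i);
        }
        return hash % prime;
    }

    public static void main(String[] args) {
        int prime = 31; // hypothetical table size, chosen only for illustration
        System.out.println(additiveHash("listen", prime));
        // Same value as "listen": the sum does not depend on character order.
        System.out.println(additiveHash("silent", prime));
    }
}
```

This is one reason the additive form is usually combined with shifts or multiplications, as in the following sections.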
2. Bitwise operation hash
This type of hash function uses various bitwise operations (usually shifts and XOR) to mix the input elements thoroughly. For example, the structure of the standard rotating hash is as follows:
static int rotatingHash(String key, int prime)
{
    int hash, i;
    // Rotate the running value by 4 bits, then XOR in the next character.
    for (hash = key.length(), i = 0; i < key.length(); ++i)
        hash = (hash << 4) ^ (hash >> 28) ^ key.charAt(i);
    return (hash % prime);
}
Shifting first and then applying various bitwise operations is the main characteristic of this type of hash function. For example, the hash update line in the code above can also take the following forms:
1. hash = (hash << 5) ^ (hash >> 27) ^ key.charAt(i);
2. hash += key.charAt(i);
   hash += (hash << 10);
   hash ^= (hash >> 6);
3. if ((i & 1) == 0)
   {
       hash ^= (hash << 7) ^ key.charAt(i) ^ (hash >> 3);
   }
   else
   {
       hash ^= ~((hash << 11) ^ key.charAt(i) ^ (hash >> 5));
   }
4. hash += (hash << 5) + key.charAt(i);
5. hash = key.charAt(i) + (hash << 6) + (hash >> 16) - hash;
6. hash ^= ((hash << 5) + key.charAt(i) + (hash >> 2));
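One point worth noting when writing these shift-based hashes in Java: int is signed, the >> operator sign-extends, and the % operator can return a negative value for a negative hash. The following is a minimal sketch of one possible adaptation of the rotating hash, not the author's code. It uses Integer.rotateLeft for a true rotation and Math.floorMod to keep the result in [0, prime-1]. The class name and the prime 31 are assumptions for the example.

```java
class RotatingHashDemo {
    static int rotatingHash(String key, int prime) {
        int hash = key.length();
        for (int i = 0; i < key.length(); i++) {
            // rotateLeft(hash, 4) equals (hash << 4) | (hash >>> 28),
            // so no sign bits leak into the low bits.
            hash = Integer.rotateLeft(hash, 4) ^ key.charAt(i);
        }
        // floorMod returns a value in [0, prime-1] even when hash is negative.
        return Math.floorMod(hash, prime);
    }

    public static void main(String[] args) {
        int prime = 31; // hypothetical table size
        System.out.println(rotatingHash("a", prime));
        System.out.println(rotatingHash("a considerably longer key", prime));
    }
}
```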
3. Multiplication hash
This type of hash function exploits the decorrelating effect of multiplication (the best-known use of this property is in a random number generation algorithm, although that algorithm does not work well). For example:
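static int bernstein(String key)
{
    int hash = 0;
    int i;
    // Multiply the running value by 33 and add the next character.
    for (i = 0; i < key.length(); ++i)
        hash = 33 * hash + key.charAt(i);
    return hash;
}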
The hashCode() method of the String class in JDK 5.0 also uses a multiplication hash, but with a multiplier of 31. Other recommended multipliers include 131, 1313, 13131, 131313, and so on.
Well-known hash functions that use this approach include:
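The relationship with String.hashCode() can be checked directly. The sketch below uses a generic multiplication hash with the multiplier passed as a parameter; the class and method names are made up for this example. With a multiplier of 31 it produces the same value as String.hashCode().

```java
class MultiplicativeHashDemo {
    // Generic multiplication hash: hash = multiplier * hash + next character.
    static int multiplicativeHash(String key, int multiplier) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = multiplier * hash + key.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        String key = "hello";
        // With multiplier 31 the two values agree.
        System.out.println(multiplicativeHash(key, 31) == key.hashCode());
        // Other multipliers mentioned in the text give different values.
        System.out.println(multiplicativeHash(key, 33));
        System.out.println(multiplicativeHash(key, 131));
    }
}
```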
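// 32-bit FNV algorithm
int m_shift = 0;
int m_mask = 0xffffffff; // m_mask is not defined in the listing; an all-ones mask is assumed here
public int fnvHash(byte[] data)
{
    int hash = (int) 2166136261L;
    for (byte b : data)
        hash = (hash * 16777619) ^ b;
    if (m_shift == 0)
        return hash;
    // Fold the high bits into the low bits and mask to the desired width.
    return (hash ^ (hash >> m_shift)) & m_mask;
}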
The improved FNV algorithm is:
public static int fnvHash1(String data)
{
    final int p = 16777619;
    int hash = (int) 2166136261L;
    for (int i = 0; i < data.length(); i++)
        hash = (hash ^ data.charAt(i)) * p;
    // Final mixing steps applied once after the loop.
    hash += hash << 13;
    hash ^= hash >> 7;
    hash += hash << 3;
    hash ^= hash >> 17;
    hash += hash << 5;
    return hash;
}
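The two listings above differ mainly in whether the multiplication or the XOR comes first. In the FNV authors' terminology these two orders are known as FNV-1 and FNV-1a. The sketch below shows both side by side over UTF-8 bytes, without the extra folding or final mixing steps. The class name and the choice to mask each byte with 0xff are assumptions of this example.

```java
import java.nio.charset.StandardCharsets;

class FnvDemo {
    static final int FNV_OFFSET_BASIS = (int) 2166136261L;
    static final int FNV_PRIME = 16777619;

    // FNV-1: multiply first, then XOR in the next byte.
    static int fnv1(byte[] data) {
        int hash = FNV_OFFSET_BASIS;
        for (byte b : data) {
            hash *= FNV_PRIME;
            hash ^= (b & 0xff);
        }
        return hash;
    }

    // FNV-1a: XOR in the next byte first, then multiply.
    static int fnv1a(byte[] data) {
        int hash = FNV_OFFSET_BASIS;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] data = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(fnv1(data)));
        System.out.println(Integer.toHexString(fnv1a(data)));
    }
}
```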
Besides multiplying by a fixed number, it is also common to multiply by a number that changes on every step, for example:
static int rsHash(String str)
{
    int b = 378551;
    int a = 63689;
    int hash = 0;
    for (int i = 0; i < str.length(); i++)
    {
        hash = hash * a + str.charAt(i);
        a = a * b; // the multiplier itself changes on every step
    }
    return (hash & 0x7FFFFFFF); // clear the sign bit
}
Although the Adler-32 algorithm is not as widely used as CRC32, it may be the best-known of the multiplication hashes. For more information, see RFC 1950.
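For readers who want to experiment with Adler-32, the following sketch computes it by hand, following the definition in RFC 1950 (two running sums modulo 65521), and compares the result with java.util.zip.Adler32 from the Java standard library. The class name and sample input are assumptions of this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

class Adler32Demo {
    static final int MOD_ADLER = 65521; // largest prime below 2^16

    // a is 1 plus the sum of all bytes; b is the sum of all intermediate a values.
    static long adler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte x : data) {
            a = (a + (x & 0xff)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.UTF_8);
        Adler32 library = new Adler32();
        library.update(data);
        System.out.println(Long.toHexString(adler32(data)));
        System.out.println(adler32(data) == library.getValue());
    }
}
```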
4. Division hash
Like multiplication, division also appears to decorrelate its inputs. However, division is too slow, so this approach has almost no real applications. Note that the earlier hash functions end with "hash % prime" only to keep the result within a range. If you do not need to restrict the range, you can replace "hash % prime" with: hash = hash ^ (hash >> 10) ^ (hash >> 20).
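The replacement for "hash % prime" can be wrapped in a small helper. The sketch below is one way to do that. The bucket method, which keeps only the low bits for a table whose size is a power of two, is an assumption added for illustration and is not described in the article.

```java
class FoldDemo {
    // Final mixing step from the text: fold higher bits into the lower bits.
    static int fold(int hash) {
        return hash ^ (hash >> 10) ^ (hash >> 20);
    }

    // Hypothetical helper: index into a table with 2^bits slots (0 < bits < 31).
    static int bucket(int hash, int bits) {
        return fold(hash) & ((1 << bits) - 1);
    }

    public static void main(String[] args) {
        int hash = 1 << 20; // only bit 20 set; the low bits alone would carry no information
        System.out.println(Integer.toHexString(fold(hash)));
        System.out.println(bucket(hash, 8));
    }
}
```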
5. Table lookup hash
The best-known example of a table lookup hash is the CRC family of algorithms. Although CRC algorithms are not inherently table-based, table lookup is the fastest way to implement them. An implementation of CRC32 is as follows:
static final int[] crctab = new int[256];
static
{
    // Standard CRC-32 table (reflected polynomial 0xEDB88320); its first
    // entries are 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, ...
    for (int n = 0; n < 256; n++)
    {
        int c = n;
        for (int k = 0; k < 8; k++)
            c = ((c & 1) != 0) ? (c >>> 1) ^ 0xEDB88320 : (c >>> 1);
        crctab[n] = c;
    }
}
static int crc32(String key, int hash)
{
    int i;
    // Only the low 8 bits of each character select a table entry.
    for (hash = key.length(), i = 0; i < key.length(); ++i)
        hash = (hash >>> 8) ^ crctab[(hash ^ key.charAt(i)) & 0xff];
    return hash;
}
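The function above starts from the key length and does not invert the final value, so its results differ from the standard CRC-32 checksum. For comparison, here is a sketch of the standard table-driven variant (initial value 0xFFFFFFFF, final bitwise inversion) checked against java.util.zip.CRC32. The class name and the generated table are assumptions of this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Demo {
    static final int[] TABLE = new int[256];

    static {
        // Build the 256-entry table for the reflected polynomial 0xEDB88320.
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = ((c & 1) != 0) ? (c >>> 1) ^ 0xEDB88320 : (c >>> 1);
            }
            TABLE[n] = c;
        }
    }

    static int crc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            // One table lookup per input byte.
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xff];
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 library = new CRC32();
        library.update(data);
        System.out.println(Integer.toHexString(crc32(data)));
        System.out.println(crc32(data) == (int) library.getValue());
    }
}
```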
Other examples of table lookup hashes are universal hashing and Zobrist hashing. Their tables are randomly generated.
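To make the idea of a randomly generated table concrete, here is a minimal sketch in the spirit of Zobrist hashing: one random int per (position, byte value) pair, combined with XOR. The class name, the fixed maximum key length, and the seeded java.util.Random are all assumptions of this example rather than a description of any particular implementation.

```java
import java.util.Random;

class RandomTableHashDemo {
    final int[][] table; // table[position][byte value]

    RandomTableHashDemo(int maxLength, long seed) {
        Random rng = new Random(seed);
        table = new int[maxLength][256];
        for (int[] row : table) {
            for (int v = 0; v < 256; v++) {
                row[v] = rng.nextInt();
            }
        }
    }

    // XOR together the table entries selected by each byte and its position.
    int hash(byte[] key) {
        if (key.length > table.length) {
            throw new IllegalArgumentException("key longer than table");
        }
        int h = 0;
        for (int i = 0; i < key.length; i++) {
            h ^= table[i][key[i] & 0xff];
        }
        return h;
    }

    public static void main(String[] args) {
        RandomTableHashDemo hasher = new RandomTableHashDemo(8, 42L);
        System.out.println(hasher.hash(new byte[] {1, 2, 3}));
    }
}
```

Because the combination is XOR, changing one byte only requires XORing out the old table entry and XORing in the new one, which is why this style is popular for incrementally updated keys.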
6. Hybrid hash
A hybrid hash algorithm combines the methods above. Common hash algorithms such as MD5 and Tiger fall into this category. They are rarely used as lookup-oriented hash functions.
7. Evaluation of hash algorithms
The page http://www.burtleburtle.net/bob/hash/doobs.html evaluates several popular hash algorithms. Our suggestions for choosing a hash function are as follows:
1. String hashing. The simplest approach is the basic multiplication hash. With a multiplier of 33 it hashes English words well (there are no collisions for strings of fewer than 6 lowercase letters). For something more sophisticated, use the FNV algorithm (or its improved form), which offers good speed and quality for long strings.
2. Hashing long arrays. You can use the algorithm at http://burtleburtle.net/bob/c/lookup3.c, which processes several bytes at a time and is reasonably fast.
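The claim in suggestion 1 about the multiplier 33 can be tested by brute force. The sketch below enumerates every lowercase string up to a given length and counts the distinct hash values. It is an illustrative check with made-up class and method names. The default length of 3 keeps it quick, and the parameter can be raised to 5 to cover the full claim.

```java
import java.util.HashSet;
import java.util.Set;

class Times33CollisionCheck {
    static int times33(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = 33 * hash + key.charAt(i);
        }
        return hash;
    }

    // Adds the hash of every non-empty lowercase string extending prefix, up to maxLen characters.
    static void collect(StringBuilder prefix, int maxLen, Set<Integer> seen) {
        if (prefix.length() > 0) {
            seen.add(times33(prefix.toString()));
        }
        if (prefix.length() == maxLen) {
            return;
        }
        for (char c = 'a'; c <= 'z'; c++) {
            prefix.append(c);
            collect(prefix, maxLen, seen);
            prefix.setLength(prefix.length() - 1);
        }
    }

    static int countDistinct(int maxLen) {
        Set<Integer> seen = new HashSet<>();
        collect(new StringBuilder(), maxLen, seen);
        return seen.size();
    }

    public static void main(String[] args) {
        // If there are no collisions, this equals 26 + 26^2 + 26^3.
        System.out.println(countDistinct(3));
    }
}
```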