String Hash Function

Source: Internet
Author: User

Basic Concepts
A perfect hash function is a hash function without collisions: for any key1 != key2, h(key1) != h(key2).
Let the domain be X and the range be Y, with n = |X| and m = |Y|; then m >= n must hold. If h(key1) != h(key2) for every pair of distinct keys key1, key2 in X, then h is called a perfect hash function. When m = n, h is called a minimal perfect hash function (a one-to-one mapping).
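
To make the definition concrete, here is a minimal sketch in C that checks whether a hash function is perfect on a small fixed key set. The names `toy_hash` and `is_perfect` are hypothetical and only serve the illustration; `toy_hash` is deliberately trivial and is not one of the functions discussed in this article.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy hash: the first character modulo the range size m. */
static unsigned int toy_hash(const char *key, unsigned int m)
{
    return (unsigned char)key[0] % m;
}

/* Returns true if toy_hash maps the n keys to n distinct values in [0, m).
   Also returns false if m is 0 or the scratch array cannot be allocated. */
static bool is_perfect(const char *const keys[], size_t n, unsigned int m)
{
    if (m == 0)
        return false;

    bool *used = calloc(m, sizeof *used);
    if (used == NULL)
        return false;

    bool perfect = true;
    for (size_t i = 0; i < n && perfect; i++) {
        unsigned int v = toy_hash(keys[i], m);
        if (used[v])
            perfect = false;    /* two different keys share a hash value */
        used[v] = true;
    }
    free(used);
    return perfect;
}
```

For the keys "apple", "banana", and "cherry", `is_perfect(keys, 3, 3)` holds, and because m = n = 3 the toy hash is a minimal perfect hash on that set. With m = 2 the check must fail, since m < n forces at least one collision.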

When processing large-scale string data, it is often necessary to assign an integer ID to each string. This requires a string hash function. How can we find a perfect string hash function?
Common string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and ELFHash. All of them are classics.

The following is a reprinted analysis of several common string hash functions:
http://www.cnblogs.com/atlantis13579/archive/2010/02/06/1664792.html

Common string hash functions, such as ELFHash and APHash, are simple and effective. They use bitwise operations so that every character affects the final function value. There are also cryptographic hash functions, represented by MD5 and SHA1, for which a collision is extremely unlikely to occur by chance.

Common string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and ELFHash. I have made a small evaluation of the above hash functions.

Hash Function  Data 1  Data 2  Data 3  Data 4  Data 1 score  Data 2 score  Data 3 score  Data 4 score  Average score
BKDRHash            2       0    4774     481         96.55           100         90.95         82.05          92.64
APHash              2       3    4754     493         96.55         88.46           100         51.28          86.28
DJBHash             2       2    4975     474         96.55         92.31             0           100          83.43
JSHash              1       4    4761     506           100         84.62         96.83         17.95          81.94
RSHash              1       0    4861     505           100           100         51.58         20.51          75.96
SDBMHash            3       2    4849     504          93.1         92.31         57.01         23.08          72.41
PJWHash            30      26    4878     513             0             0         43.89             0          21.95
ELFHash            30      26    4878     513             0             0         43.89             0          21.95

Data 1 is the number of hash collisions among 100000 random strings composed of letters and digits. Data 2 is the number of hash collisions among 100000 meaningful English sentences. Data 3 is the number of collisions when the hash values of data 1 are taken modulo 1000003 (a large prime) and stored in a linear table. Data 4 is the number of collisions when the hash values of data 1 are taken modulo 10000019 (a larger prime) and stored in a linear table.
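
The article does not show how the collisions were counted. The following C sketch shows one plausible way to do it, assuming that a collision among raw hash values means a repeated value, and that a collision in the linear table means a value landing in a slot that is already occupied. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort over unsigned int values. */
static int cmp_uint(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/* Collisions among raw hash values: n minus the number of distinct values.
   Sorts the array in place. */
static size_t count_value_collisions(unsigned int *hashes, size_t n)
{
    size_t collisions = 0;
    qsort(hashes, n, sizeof *hashes, cmp_uint);
    for (size_t i = 1; i < n; i++)
        if (hashes[i] == hashes[i - 1])
            collisions++;
    return collisions;
}

/* Collisions after reducing each hash modulo table_size: every value that
   lands in an already occupied slot counts once.
   Returns (size_t)-1 if table_size is 0 or allocation fails. */
static size_t count_slot_collisions(const unsigned int *hashes, size_t n,
                                    size_t table_size)
{
    if (table_size == 0)
        return (size_t)-1;

    unsigned char *occupied = calloc(table_size, 1);
    if (occupied == NULL)
        return (size_t)-1;

    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = hashes[i] % table_size;
        if (occupied[slot])
            collisions++;
        else
            occupied[slot] = 1;
    }
    free(occupied);
    return collisions;
}
```

Under these assumptions, data 1 and data 2 would correspond to `count_value_collisions`, and data 3 and data 4 to `count_slot_collisions` with table sizes 1000003 and 10000019.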

After comparison, the average scores above were obtained; the average used is the quadratic mean (root mean square). BKDRHash stands out both in measured results and in ease of implementation. APHash is also an excellent algorithm. DJBHash, JSHash, RSHash, and SDBMHash each have their own merits. PJWHash and ELFHash perform worst; their scores are identical because the two algorithms are essentially the same.
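
The article does not state how the per-column scores were derived. The numbers in the table are consistent with a linear scaling in which the lowest collision count in a column scores 100 and the highest scores 0, followed by a quadratic mean across the four columns. The sketch below works under that assumption; the function names are hypothetical, and a small Newton iteration stands in for `sqrt` so that the example does not depend on the math library.

```c
#include <stddef.h>

/* Assumed scoring rule: the best (lowest) count maps to 100, the worst to 0. */
static double column_score(double value, double best, double worst)
{
    if (best == worst)
        return 100.0;           /* all functions tied in this column */
    return 100.0 * (worst - value) / (worst - best);
}

/* Square root by Newton's method, used here instead of sqrt from <math.h>. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Quadratic mean (root mean square) of n scores. */
static double quadratic_mean(const double *scores, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += scores[i] * scores[i];
    return newton_sqrt(sum / (double)n);
}
```

For example, BKDRHash has 2 collisions on data 1, where the best count is 1 and the worst is 30, so `column_score(2, 1, 30)` is about 96.55. The quadratic mean of its four scores (96.55, 100, 90.95, 82.05) is about 92.64, which matches the table.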

// SDBM Hash Function
unsigned int SDBMHash(char *str)
{
    unsigned int hash = 0;

    while (*str)
    {
        // equivalent to: hash = 65599*hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }

    return (hash & 0x7FFFFFFF);
}
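
The comment in SDBMHash says the shift form equals multiplication by 65599. This follows from 2^6 + 2^16 - 1 = 64 + 65536 - 1 = 65599. As a small sketch, the two forms can be written side by side and compared. The names `sdbm_shift` and `sdbm_mul` are hypothetical. Unlike the listing, this sketch reads each character as `unsigned char`, so bytes above 127 behave the same on every platform.

```c
/* SDBM hash written with shifts, as in the listing. */
static unsigned int sdbm_shift(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = (unsigned char)*str++ + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

/* The same hash written as a multiplication by 65599. */
static unsigned int sdbm_mul(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = 65599u * hash + (unsigned char)*str++;
    return hash & 0x7FFFFFFF;
}
```

Because unsigned arithmetic in C wraps around, both functions return the same value for every input string, including strings long enough to overflow the accumulator.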

// RS Hash Function
unsigned int RSHash(char *str)
{
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * a + (*str++);
        a *= b;
    }

    return (hash & 0x7FFFFFFF);
}

// JS Hash Function
unsigned int JSHash(char *str)
{
    unsigned int hash = 1315423911;

    while (*str)
    {
        hash ^= ((hash << 5) + (*str++) + (hash >> 2));
    }

    return (hash & 0x7FFFFFFF);
}

// P. J. Weinberger Hash Function
unsigned int PJWHash(char *str)
{
    unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    unsigned int OneEighth = (unsigned int)(BitsInUnsignedInt / 8);
    unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;

    while (*str)
    {
        hash = (hash << OneEighth) + (*str++);
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }

    return (hash & 0x7FFFFFFF);
}

// ELF Hash Function
unsigned int ELFHash(char *str)
{
    unsigned int hash = 0;
    unsigned int x = 0;

    while (*str)
    {
        hash = (hash << 4) + (*str++);
        if ((x = hash & 0xF0000000L) != 0)
        {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }

    return (hash & 0x7FFFFFFF);
}
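
The evaluation reports the same results for PJWHash and ELFHash in every column. A likely reason is that, when `unsigned int` is 32 bits wide, the PJW constants reduce to a shift of 4, a shift of 24, and a mask of 0xF0000000, which are exactly the constants ELFHash uses. The sketch below writes both variants with those constants fixed so they can be compared directly. The names `pjw_hash32` and `elf_hash32` are hypothetical, and the 32-bit width is an assumption.

```c
/* PJW hash with the word size fixed at 32 bits (assumption). */
static unsigned int pjw_hash32(const char *str)
{
    const unsigned int high_bits = 0xF0000000u;   /* top 32/8 = 4 bits */
    unsigned int hash = 0, test;

    while (*str) {
        hash = (hash << 4) + (unsigned char)*str++;
        if ((test = hash & high_bits) != 0)
            hash = (hash ^ (test >> 24)) & ~high_bits;
    }
    return hash & 0x7FFFFFFF;
}

/* ELF hash with the same structure as the listing. */
static unsigned int elf_hash32(const char *str)
{
    unsigned int hash = 0, x;

    while (*str) {
        hash = (hash << 4) + (unsigned char)*str++;
        if ((x = hash & 0xF0000000u) != 0) {
            hash ^= x >> 24;    /* fold the top nibble into bits 4..7 */
            hash &= ~x;         /* clear the top-nibble bits that were set */
        }
    }
    return hash & 0x7FFFFFFF;
}
```

The only difference is how the top four bits are cleared: PJW masks out the whole nibble, while ELF clears only the bits that were set. Since the unset bits are already zero, both give the same result on a 32-bit `unsigned int`.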

// BKDR Hash Function
unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * seed + (*str++);
    }

    return (hash & 0x7FFFFFFF);
}
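
BKDRHash treats the string as the coefficients of a polynomial in the seed and evaluates it by repeated multiply-and-add. The comment in the listing names several alternative seeds; as a sketch, the seed can be made a parameter so that they can be tried side by side. The name `bkdr_hash` and its two-argument signature are hypothetical.

```c
/* BKDR hash with the multiplier passed in, e.g. 31, 131, 1313, 13131. */
static unsigned int bkdr_hash(const char *str, unsigned int seed)
{
    unsigned int hash = 0;

    while (*str)
        hash = hash * seed + (unsigned char)*str++;

    return hash & 0x7FFFFFFF;
}
```

In ASCII, `bkdr_hash("ab", 131)` is 97 * 131 + 98 = 12805, and `bkdr_hash("ab", 31)` is 97 * 31 + 98 = 3105.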

// DJB Hash Function
unsigned int DJBHash(char *str)
{
    unsigned int hash = 5381;

    while (*str)
    {
        hash += (hash << 5) + (*str++);
    }

    return (hash & 0x7FFFFFFF);
}

// AP Hash Function
unsigned int APHash(char *str)
{
    unsigned int hash = 0;
    int i;

    for (i = 0; *str; i++)
    {
        if ((i & 1) == 0)
        {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        }
        else
        {
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }

    return (hash & 0x7FFFFFFF);
}

The hash function from Programming Pearls

// Use the prime number closest to the number of elements as the size of the hash table
#define NHASH 29989
#define MULT 31

unsigned int hash(char *p)
{
    unsigned int h = 0;
    for (; *p; p++)
        h = MULT * h + *p;
    return h % NHASH;
}
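
Since this hash is not perfect, assigning an integer ID to each string still needs collision handling. One common approach is separate chaining, sketched below. The `Node` type, the `bins` array, and the `get_id` function are hypothetical names for this illustration, and the sketch reads characters as `unsigned char`.

```c
#include <stdlib.h>
#include <string.h>

#define NHASH 29989     /* table size: a prime near the expected element count */
#define MULT  31

typedef struct Node {
    char *key;
    int id;
    struct Node *next;
} Node;

static Node *bins[NHASH];   /* one chain of nodes per hash value */
static int next_id = 0;

static unsigned int hash(const char *p)
{
    unsigned int h = 0;
    for (; *p; p++)
        h = MULT * h + (unsigned char)*p;
    return h % NHASH;
}

/* Returns the ID of key, assigning the next free ID the first time the key
   is seen. Returns -1 if memory allocation fails. */
static int get_id(const char *key)
{
    unsigned int b = hash(key);

    for (Node *n = bins[b]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->id;

    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;

    size_t len = strlen(key) + 1;
    n->key = malloc(len);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->key, key, len);

    n->id = next_id++;
    n->next = bins[b];      /* push onto the front of the chain */
    bins[b] = n;
    return n->id;
}
```

Calling `get_id("alpha")`, then `get_id("beta")`, then `get_id("alpha")` again returns 0, 1, and 0. Two strings that hash to the same bin still get different IDs, because the chain is searched with `strcmp`.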

 
