Blizzard's hash algorithm for string matching

Source: Internet
Author: User
Tags: comparison, hash, blizzard

Using the technique described in this article, I successfully solved parallel MD5 multi-hash matching in CUDA.


This article is reposted from http://blog.csdn.net/shanzhizi/article/details/7736526


Let's start with a simple question: you have a large array of strings, and you are given a single string and asked whether it is in the array and, if so, where. What would you do? The simplest way is to go honestly from head to tail, comparing one by one until the string is found. Anyone who has studied programming can write such a routine, but if a programmer shipped a program like that to users, I would be left speechless. Maybe it really works, but that is about all that can be said for it.

The most suitable approach is a hash table. First, some background. A hash is generally an integer: an algorithm "compresses" a string into an integer, and that number is called the hash. Of course, a 32-bit integer cannot be mapped back to the string, but within a program the probability that two different strings produce the same hash value is very small. Here is the hash algorithm used in MPQ:

    #include <ctype.h>  /* toupper */

    /* cryptTable is a precomputed lookup table defined elsewhere. */
    unsigned long HashString(char *lpszFileName, unsigned long dwHashType)
    {
        unsigned char *key = (unsigned char *)lpszFileName;
        unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
        int ch;

        while (*key != 0)
        {
            /* Upper-case each byte so the hash ignores case, then advance. */
            ch = toupper(*key++);

            seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
            seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
        }
        return seed1;
    }

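The listing above reads from `cryptTable`, which this article never shows. The following is a minimal, self-contained sketch that fills such a table with the seed-driven generator commonly published for the MPQ format. The generator constants, the names `prepare_crypt_table` and `hash_string`, and the switch to `uint32_t` (to keep 32-bit wrap-around where `unsigned long` is 64 bits wide) are assumptions, not taken from this article.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* 5 hash types x 256 byte values. */
static uint32_t crypt_table[0x500];

/* Fill crypt_table with pseudo-random values (assumed generator, see above). */
static void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;

    for (uint32_t index1 = 0; index1 < 0x100; index1++) {
        uint32_t index2 = index1;
        for (int i = 0; i < 5; i++, index2 += 0x100) {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t hi = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t lo = seed & 0xFFFF;
            crypt_table[index2] = hi | lo;
        }
    }
}

/* Mirrors the article's hash loop, written with fixed-width types. */
static uint32_t hash_string(const char *name, uint32_t hash_type)
{
    const unsigned char *key = (const unsigned char *)name;
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;

    assert(hash_type < 5);  /* the table only has 5 rows of 256 entries */
    while (*key != 0) {
        uint32_t ch = (uint32_t)toupper(*key++);
        seed1 = crypt_table[(hash_type << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}
```

`prepare_crypt_table()` has to run once before the first `hash_string` call. Because of the `toupper`, names that differ only in case hash to the same value, and an empty string simply returns the initial `seed1`.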
Blizzard's algorithm is very efficient and is known as a "One-Way Hash". For example, the string "unit\neutral\acritter.grp" hashes to 0xA26067F3 with this algorithm. Is it enough, then, to improve the first approach by comparing the strings' hash values one by one instead of the strings themselves? The answer is: far from it. To get the fastest algorithm, you cannot compare one by one at all. The usual solution is to build a hash table. A hash table is a large array whose capacity is chosen according to the program's needs, for example 1024. Each hash value is mapped to a position in the array by a modulo operation (mod), so you only need to check whether the position corresponding to the string's hash value is occupied to get the final answer. Think about how fast that is. Yes, it is the fastest possible: O(1). Now take a closer look at the algorithm:


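    int GetHashTablePos(char *lpszString, SOMESTRUCTURE *lpTable, int nTableSize)
    {
        /* Hash type 0 is assumed here; the source call passes no type.
           The hash stays unsigned so the modulo cannot yield a negative index. */
        unsigned long nHash = HashString(lpszString, 0);
        int nHashPos = (int)(nHash % nTableSize);

        /* Found only if the slot is occupied and holds exactly this string. */
        if (lpTable[nHashPos].bExists && !strcmp(lpTable[nHashPos].pString, lpszString))
            return nHashPos;
        else
            return -1; // Error value
    }
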
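To see why a single-probe lookup like the one above is not enough on its own, here is a small sketch with a deliberately weak stand-in hash. `toy_hash` (a plain byte sum), the `Slot` type, and the helper names are all hypothetical and are not Blizzard's code. Two different strings land in the same slot, so the second one can be neither stored nor found.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 16

typedef struct {
    int         exists;   /* non-zero when the slot holds a string */
    const char *string;   /* the stored key */
} Slot;

/* Hypothetical stand-in hash: sum of bytes, so anagrams always collide. */
static unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* Store key in its slot; returns the slot index, or -1 if the slot is taken. */
static int table_insert(Slot *table, int size, const char *key)
{
    assert(size > 0);
    int pos = (int)(toy_hash(key) % (unsigned)size);
    if (table[pos].exists)
        return -1;
    table[pos].exists = 1;
    table[pos].string = key;
    return pos;
}

/* Single-probe lookup: slot check plus strcmp, with no collision handling. */
static int table_find(const Slot *table, int size, const char *key)
{
    assert(size > 0);
    int pos = (int)(toy_hash(key) % (unsigned)size);
    if (table[pos].exists && strcmp(table[pos].string, key) == 0)
        return pos;
    return -1;
}
```

With 16 slots, "ab" and "ba" both map to slot 3 (97 + 98 = 195, and 195 mod 16 = 3). After "ab" is inserted, `table_insert` for "ba" returns -1 and so does `table_find`, even though nothing is wrong with the key itself.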
Seeing this, I think everyone will have thought of a serious problem: "What if two strings land at the same position in the hash table?" An array's capacity is limited, after all, so this is quite likely. There are many ways to solve it. My first thought was a linked list. Thanks to the data structures course at university for teaching me this never-failing trick: many algorithms I have run into can be solved by turning them into a linked list. Just hang a linked list off each entry of the hash table and store every string that maps there. That seems like a perfect ending, and if the problem were left to me alone, at this point I might start defining data structures and writing code.

However, the method used by Blizzard's programmers is more sophisticated. The basic idea is that they verify a string with three hash values rather than one. There is an old Chinese saying to the effect that something may happen once or twice, but not a third and fourth time, and Blizzard seems to have grasped its essence: it is possible for two different strings to get the same entry point from one hash algorithm, but for three different hash algorithms to all agree is almost certainly impossible. The odds are about 1 in 1.88894659314786 × 10^22, that is, roughly 1 in 10^22.3, which is safe enough for a game program.

Now back to the data structure. The hash table used by Blizzard does not use linked lists. Instead it resolves collisions by "moving on" to the next slot. Look at this algorithm:

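    int GetHashTablePos(char *lpszString, MPQHASHTABLE *lpTable, int nTableSize)
    {
        const int HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2;
        /* Kept unsigned so the modulo below cannot yield a negative index. */
        unsigned long nHash  = HashString(lpszString, HASH_OFFSET);
        unsigned long nHashA = HashString(lpszString, HASH_A);
        unsigned long nHashB = HashString(lpszString, HASH_B);
        int nHashStart = (int)(nHash % nTableSize), nHashPos = nHashStart;

        /* Probe forward from the start slot until an empty slot is reached. */
        while (lpTable[nHashPos].bExists)
        {
            if (lpTable[nHashPos].nHashA == nHashA && lpTable[nHashPos].nHashB == nHashB)
                return nHashPos;
            else
                nHashPos = (nHashPos + 1) % nTableSize;

            /* The source listing breaks off after the line above. The lines
               from here on are a reconstruction: stop after a full lap and
               return the same error value as the earlier version. */
            if (nHashPos == nHashStart)
                break;
        }

        return -1; // Error value
    }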
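The lookup above only covers reading. A matching insert has to probe in the same way, so that a key displaced by a collision can still be found later. The sketch below pairs the two. `Entry`, `table_insert`, `table_find`, and the seeded FNV-1a style `seeded_hash` are hypothetical stand-ins (a real table would call Blizzard's hash with three hash types), and deletion and duplicate checks are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

enum { HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2 };

typedef struct {
    int      exists;  /* non-zero when the slot is occupied */
    uint32_t hash_a;  /* first verification hash */
    uint32_t hash_b;  /* second verification hash */
} Entry;

/* Stand-in for the article's hash: FNV-1a with the type mixed into the seed. */
static uint32_t seeded_hash(const char *s, uint32_t type)
{
    uint32_t h = 2166136261u ^ (type * 0x9E3779B9u);
    for (; *s; ++s) {
        h ^= (uint32_t)toupper((unsigned char)*s);
        h *= 16777619u;
    }
    return h;
}

/* Insert key with linear probing; returns its slot, or -1 if the table is full. */
static int table_insert(Entry *table, int size, const char *key)
{
    assert(size > 0);
    int start = (int)(seeded_hash(key, HASH_OFFSET) % (uint32_t)size);
    int pos = start;

    while (table[pos].exists) {
        pos = (pos + 1) % size;
        if (pos == start)
            return -1;            /* wrapped around: no free slot */
    }
    table[pos].exists = 1;
    table[pos].hash_a = seeded_hash(key, HASH_A);
    table[pos].hash_b = seeded_hash(key, HASH_B);
    return pos;
}

/* Find key by comparing the two verification hashes instead of the string. */
static int table_find(const Entry *table, int size, const char *key)
{
    assert(size > 0);
    uint32_t a = seeded_hash(key, HASH_A);
    uint32_t b = seeded_hash(key, HASH_B);
    int start = (int)(seeded_hash(key, HASH_OFFSET) % (uint32_t)size);
    int pos = start;

    while (table[pos].exists) {
        if (table[pos].hash_a == a && table[pos].hash_b == b)
            return pos;
        pos = (pos + 1) % size;
        if (pos == start)
            break;                /* visited every slot */
    }
    return -1;
}
```

Note that the table stores no strings at all: an entry is recognised purely by its two verification hashes, which is what makes the three-hash scheme compact. The trade-off is that a false match is possible in principle, just extremely unlikely.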
