Blizzard has a classic string hash formula.
First, a simple question. Suppose there is a huge array of strings, and I give you one more string and ask you to determine whether it exists in the array and, if so, where. What would you do?
The easiest method is to plod through the array from beginning to end, comparing one element at a time until you find a match. Anyone who has learned programming can write such a program. But if a programmer delivered such a program to users, I would be left speechless. It may well work, but that is about all it can do.
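For contrast with what follows, the brute-force approach can be sketched like this. `linear_find` is a hypothetical helper name, not part of any library; the point is only that the cost grows with the size of the array.

```c
#include <stddef.h>
#include <string.h>

/* Brute-force lookup: compare the target against every element in turn.
   Returns the index of the first match, or -1 if the string is absent.
   Worst case: one strcmp per array element (O(n) comparisons). */
int linear_find(const char *const *array, size_t count, const char *target)
{
    for (size_t i = 0; i < count; i++)
    {
        if (strcmp(array[i], target) == 0)
            return (int)i;
    }
    return -1;
}
```

For an array of a million file names, a miss costs a million string comparisons.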
The most suitable approach is a hash table. First, some basics. A hash is generally an integer: you compress a string into an integer, and that integer is called its hash. Of course, a 32-bit integer cannot uniquely represent every possible string, but in practice the chance that two different strings in a program produce the same hash value is very small. Let's look at the hash algorithm used in MPQ (Blizzard's game archive format).
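#include <ctype.h>
/* Table of 0x500 pseudo-random values; it must be filled once before hashing. */
extern unsigned long crypttable[0x500];
/* unsigned long is 32 bits on the original Windows target */
unsigned long hashstring(const char *lpszfilename, unsigned long dwhashtype)
{
    const unsigned char *key = (const unsigned char *)lpszfilename;
    unsigned long seed1 = 0x7fed7fed, seed2 = 0xeeeeeeee;
    int ch;
    while (*key != 0)
    {
        ch = toupper(*key++);   /* case-insensitive: fold to upper case, then advance */
        seed1 = crypttable[(dwhashtype << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}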
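The listing above reads from `crypttable` without showing how it is filled. Below is a self-contained sketch that pairs the hash with the crypt-table generator commonly circulated alongside it (seed 0x00100001 and the recurrence `seed = (seed * 125 + 3) % 0x2AAAAB`, as used by open-source MPQ readers). Treat it as a sketch rather than a reference implementation: `prepare_crypt_table` is my own name, and `uint32_t` replaces `unsigned long` because the latter is 64 bits on many modern platforms.

```c
#include <ctype.h>
#include <stdint.h>

/* 0x500 entries: five slices of 256, one slice per hash type (0..4). */
static uint32_t crypttable[0x500];

/* Fill crypttable once, before any call to hashstring. */
void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;
    for (uint32_t index1 = 0; index1 < 0x100; index1++)
    {
        for (uint32_t i = 0, index2 = index1; i < 5; i++, index2 += 0x100)
        {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t temp1 = (seed & 0xFFFF) << 16;   /* high half */
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t temp2 = seed & 0xFFFF;           /* low half */
            crypttable[index2] = temp1 | temp2;
        }
    }
}

/* Case-insensitive string hash; dwhashtype (0..4) selects a table slice. */
uint32_t hashstring(const char *str, uint32_t dwhashtype)
{
    const unsigned char *key = (const unsigned char *)str;
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    while (*key != 0)
    {
        uint32_t ch = (uint32_t)toupper(*key++);
        seed1 = crypttable[(dwhashtype << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}
```

Call `prepare_crypt_table()` once at startup. Because of the `toupper` call, `hashstring("abc", 0)` and `hashstring("ABC", 0)` return the same value, which suits case-insensitive file names.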
This Blizzard algorithm is very efficient and is known as a "one-way hash" (its input cannot feasibly be recovered from its output). For example, the string "unit\neutral\acritter.grp" hashes to 0xA26067F3 under this algorithm.
Could we improve the first algorithm by comparing the strings' hash values one by one instead? The answer is that this is far from enough. If you want the fastest algorithm, you cannot compare values one by one; instead, you build a hash table. A hash table is a large array whose size is chosen according to the program's needs, for example 1024. Each hash value is mapped to a position in the array by taking it modulo the array size, so you only need to check the single position that corresponds to the string's hash value to get the final answer. How fast is that? It is the fastest possible: constant time, O(1). Now let's take a closer look at this algorithm.
#include <string.h>
int gethashtablepos(char *lpszstring, somestructure *lptable, int ntablesize)
{
    /* one hash value picks the slot (hash type 0 assumed here) */
    unsigned long nhash = hashstring(lpszstring, 0);
    int nhashpos = (int)(nhash % ntablesize);
    if (lptable[nhashpos].bexists && !strcmp(lptable[nhashpos].pstring, lpszstring))
        return nhashpos;
    else
        return -1; // error value
}
At this point, I expect everyone is thinking about a serious problem: "What if two strings map to the same position in the hash table?" Since the array is finite, this is quite likely. There are many ways to solve this problem. The first one that comes to my mind is a linked list. I owe thanks to the data-structures course I took in college for teaching me this tried-and-true tool; many problems I have run into can be solved by turning them into linked lists. Simply attach a linked list to each entry of the hash table and store in it every string that maps to that entry.
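The linked-list idea (usually called separate chaining) can be sketched as follows. This is not Blizzard's method; the hash is a stand-in (the well-known djb2 function) chosen only to keep the example self-contained, and every name here is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Each bucket holds a singly linked list of the strings that map to it. */
typedef struct node
{
    char *str;
    struct node *next;
} node;

typedef struct
{
    node **buckets;
    size_t nbuckets;
} chained_table;

/* Stand-in hash (djb2): h = h * 33 + c, starting from 5381. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Allocates nbuckets empty lists; returns 0 on allocation failure. */
int chained_init(chained_table *t, size_t nbuckets)
{
    t->buckets = calloc(nbuckets, sizeof *t->buckets);
    t->nbuckets = nbuckets;
    return t->buckets != NULL;
}

/* Walks the bucket's list; returns 1 if the string is present, 0 otherwise. */
int chained_contains(const chained_table *t, const char *s)
{
    for (const node *n = t->buckets[djb2(s) % t->nbuckets]; n; n = n->next)
        if (strcmp(n->str, s) == 0)
            return 1;
    return 0;
}

/* Prepends a copy of s to its bucket; returns 0 only on allocation failure. */
int chained_insert(chained_table *t, const char *s)
{
    if (chained_contains(t, s))
        return 1;
    node *n = malloc(sizeof *n);
    if (!n)
        return 0;
    n->str = malloc(strlen(s) + 1);
    if (!n->str)
    {
        free(n);
        return 0;
    }
    strcpy(n->str, s);
    size_t b = djb2(s) % t->nbuckets;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return 1;
}
```

With a single bucket every string collides, yet lookups still succeed because the whole list is searched with `strcmp`. The price is a list walk plus string comparisons per lookup, and extra allocations per insert.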
That seems to wrap things up nicely. If the problem were left to me alone, I would now start defining the data structures and writing the code. Blizzard's programmers, however, used a more ingenious method. The basic idea: instead of a single hash value, they use three hash values to verify a string in the hash table.
There is an old Chinese saying: "once or twice, but never a third or fourth time." Blizzard seems to have grasped its essence. It is possible for two different strings to land on the same entry point under one hash algorithm, but for them to agree under three different hash algorithms is almost impossible. The probability is 1 in 18889465931478580854784, roughly 1 in 10^22.3, which is safe enough for a game.
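The quoted figure is exactly 2^74. One way to arrive at it, assuming the 1024-slot table mentioned earlier and treating the three hash values as independent, uniformly distributed 32-bit numbers (an idealization, not a proven property of this hash):

$$
P = \frac{1}{2^{10}} \cdot \frac{1}{2^{32}} \cdot \frac{1}{2^{32}} = \frac{1}{2^{74}} = \frac{1}{18\,889\,465\,931\,478\,580\,854\,784} \approx 10^{-22.3}
$$

Here 2^10 accounts for landing in the same slot and each 2^32 for one matching check hash.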
Now back to the data structure. Blizzard's hash table does not use linked lists; instead, it resolves collisions by "moving on", that is, trying the next slot (linear probing). Let's look at the algorithm:
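/* Slot layout assumed by the lookup below */
typedef struct
{
    int bexists;            /* slot occupied? */
    unsigned long nhasha;   /* first check hash  (hash type 1) */
    unsigned long nhashb;   /* second check hash (hash type 2) */
} mpqhashtable;
int gethashtablepos(char *lpszstring, mpqhashtable *lptable, int ntablesize)
{
    const int hash_offset = 0, hash_a = 1, hash_b = 2;
    unsigned long nhash = hashstring(lpszstring, hash_offset);
    unsigned long nhasha = hashstring(lpszstring, hash_a);
    unsigned long nhashb = hashstring(lpszstring, hash_b);
    int nhashstart = (int)(nhash % ntablesize), nhashpos = nhashstart;
    while (lptable[nhashpos].bexists)
    {
        if (lptable[nhashpos].nhasha == nhasha && lptable[nhashpos].nhashb == nhashb)
            return nhashpos;
        nhashpos = (nhashpos + 1) % ntablesize;   /* try the next slot, wrapping around */
        if (nhashpos == nhashstart)               /* back at the start: whole table searched */
            break;
    }
    return -1; // error value
}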
1. Compute the string's three hash values (one determines the position; the other two are used for verification).
2. Look at that position in the hash table.
3. Is the slot empty? If so, the string does not exist; return "not found".
4. If the slot is occupied, check whether the other two hash values match. If they do, the string has been found; return its position.
5. Move to the next position, wrapping around to the beginning of the table after the last slot.
6. Check whether you are back at the starting position. If so, the whole table has been searched; return "not found".
7. Go back to step 3.
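To see lookup and insertion working together, here is a self-contained sketch. The original shows only the lookup; the insert routine `inserthashtable` is a hypothetical counterpart that claims the first free slot along the same probe path, and `prepare_crypt_table` is likewise my own name for the commonly circulated crypt-table generator. `uint32_t` is used so all arithmetic is 32-bit.

```c
#include <ctype.h>
#include <stdint.h>

/* HASH_OFFSET picks the slot; HASH_A and HASH_B verify the string. */
enum { HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2 };

static uint32_t crypttable[0x500];

/* Fill the table once before hashing (MPQ-style generator). */
void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;
    for (uint32_t index1 = 0; index1 < 0x100; index1++)
        for (uint32_t i = 0, index2 = index1; i < 5; i++, index2 += 0x100)
        {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t temp1 = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            crypttable[index2] = temp1 | (seed & 0xFFFF);
        }
}

/* Case-insensitive string hash; dwhashtype selects a 256-entry table slice. */
uint32_t hashstring(const char *str, uint32_t dwhashtype)
{
    const unsigned char *key = (const unsigned char *)str;
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    while (*key != 0)
    {
        uint32_t ch = (uint32_t)toupper(*key++);
        seed1 = crypttable[(dwhashtype << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

/* A slot stores only the two check hashes, never the string itself. */
typedef struct
{
    int bexists;
    uint32_t nhasha, nhashb;
} mpqhashtable;

/* Linear probing from the HASH_OFFSET slot. Returns the slot index, or -1
   when an empty slot is reached or the probe wraps back to its start. */
int gethashtablepos(const char *s, const mpqhashtable *table, int ntablesize)
{
    uint32_t nhasha = hashstring(s, HASH_A), nhashb = hashstring(s, HASH_B);
    int start = (int)(hashstring(s, HASH_OFFSET) % (uint32_t)ntablesize);
    int pos = start;
    while (table[pos].bexists)
    {
        if (table[pos].nhasha == nhasha && table[pos].nhashb == nhashb)
            return pos;
        pos = (pos + 1) % ntablesize;
        if (pos == start)
            break;
    }
    return -1;
}

/* Hypothetical insert: reuse an existing slot, else take the first free slot
   on the probe path. Returns the slot index, or -1 if the table is full. */
int inserthashtable(const char *s, mpqhashtable *table, int ntablesize)
{
    int pos = gethashtablepos(s, table, ntablesize);
    if (pos >= 0)
        return pos;
    int start = (int)(hashstring(s, HASH_OFFSET) % (uint32_t)ntablesize);
    pos = start;
    do
    {
        if (!table[pos].bexists)
        {
            table[pos].bexists = 1;
            table[pos].nhasha = hashstring(s, HASH_A);
            table[pos].nhashb = hashstring(s, HASH_B);
            return pos;
        }
        pos = (pos + 1) % ntablesize;
    } while (pos != start);
    return -1;
}
```

Because only the two check hashes are stored, a lookup could in principle report a string that was never inserted if all three of its hashes coincide with a stored entry; that is exactly the risk the probability estimate earlier addresses. A real MPQ hash-table entry also carries fields such as a locale and a block index, which are omitted here.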
How about that? It is a simple algorithm, yet a genuinely brilliant idea. In fact, the best algorithms are usually the simple, effective ones.