Create the fastest hash table (dialog with Blizzard)



Kaiyuan recently studied Blizzard's MPQ file format and learned a few things from it. One of them is a better understanding of hash tables, which he would like to share here. Most of the material comes from Justin Olbrantz's article "Inside MoPaQ", with thanks to its author.

Let me start with a simple question. Suppose there is a huge array of strings, and you are given a single string and asked to find out whether it exists in the array and, if so, where. What would you do?

The easiest method is to go honestly from beginning to end, comparing one by one until the string is found. Anyone who has learned programming can write such a program, but if a programmer handed it to a user, I would be left speechless. It may well work, but that is about all that can be said for it.
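
As a baseline, the linear scan described above might look like the following minimal sketch. The function name `find_linear` and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of `needle` in `strings`,
 * or -1 if it is not present. The cost grows linearly with `count`,
 * because every element may have to be compared. */
long find_linear(const char *strings[], size_t count, const char *needle)
{
    assert(strings != NULL && needle != NULL);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(strings[i], needle) == 0)
            return (long)i;
    }
    return -1;
}
```

With a million strings, a miss costs a million calls to `strcmp`, which is the cost the rest of the article sets out to avoid.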

A more suitable approach is to use a hash table. First, some basics. A hash is generally an integer: a hash algorithm compresses a string into an integer. Of course, a 32-bit integer cannot correspond one-to-one to every possible string, but in practice the probability that two different strings produce the same hash value is very small. Let's look at the hash algorithm used in MPQ.

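#include <ctype.h>  /* toupper */

/* crypttable is a precomputed lookup table, defined elsewhere.
   The arithmetic assumes a 32-bit unsigned long. */
unsigned long hashstring(char *lpszfilename, unsigned long dwhashtype)
{
    unsigned char *key = (unsigned char *)lpszfilename;
    unsigned long seed1 = 0x7fed7fed, seed2 = 0xeeeeeeee;
    int ch;

    while (*key != 0)
    {
        /* Upper-case each character so the hash is case-insensitive. */
        ch = toupper(*key++);

        seed1 = crypttable[(dwhashtype << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}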

This Blizzard algorithm is very efficient and is called a "one-way hash". For example, the string "unit\neutral\acritter.grp" hashes to 0xa26067f3 with this algorithm.
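
The listing above relies on a lookup table that the article does not show. The sketch below fills in one with the table-generation routine commonly published for MPQ. That routine, its constants, and the names `crypt_table`, `prepare_crypt_table` and `hash_string32` are assumptions of this sketch rather than something stated in this article. Fixed-width `uint32_t` is used so the arithmetic wraps at 32 bits on every platform.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* 0x500 entries: 0x100 per hash type, room for 5 types (assumption taken
 * from the commonly published MPQ routine, not from this article). */
static uint32_t crypt_table[0x500];
static int crypt_table_ready = 0;

/* Fills crypt_table with pseudo-random 32-bit values. */
static void prepare_crypt_table(void)
{
    uint32_t seed = 0x00100001;

    for (uint32_t index1 = 0; index1 < 0x100; index1++) {
        uint32_t index2 = index1;
        for (int i = 0; i < 5; i++, index2 += 0x100) {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t hi = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint32_t lo = seed & 0xFFFF;
            crypt_table[index2] = hi | lo;
        }
    }
    crypt_table_ready = 1;
}

/* Same loop as hashstring above, with fixed-width 32-bit arithmetic. */
uint32_t hash_string32(const char *name, uint32_t hash_type)
{
    const unsigned char *key = (const unsigned char *)name;
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;

    assert(hash_type < 5);          /* keeps the table index below 0x500 */
    if (!crypt_table_ready)
        prepare_crypt_table();

    while (*key != 0) {
        uint32_t ch = (uint32_t)toupper(*key++);
        seed1 = crypt_table[(hash_type << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}
```

Because every character passes through `toupper`, `hash_string32("abc", 0)` and `hash_string32("ABC", 0)` return the same value, and different `hash_type` values select different 0x100-entry slices of the table.
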
Can we improve on the first approach simply by comparing the hash values of the strings one by one? The answer is that this is far from enough. To get the fastest lookup, you cannot compare entries one by one at all. Instead, a hash table is constructed. A hash table is a large array whose size is chosen according to the program's needs, for example 1024. Each hash value is mapped to a position in the array with a modulo operation, so you only need to check whether the position corresponding to the string's hash value is occupied to get the final answer. How fast is that? It is the fastest possible: O(1). Now let's take a closer look at this algorithm.
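#include <string.h>  /* strcmp */

/* Returns the table position of lpszstring, or -1 if it is not stored there. */
int gethashtablepos(char *lpszstring, somestructure *lptable, int ntablesize)
{
    /* Unsigned, so the modulo can never produce a negative index. */
    unsigned long nhash = hashstring(lpszstring, 0); /* hash type 0 */
    int nhashpos = (int)(nhash % ntablesize);

    if (lptable[nhashpos].bexists && !strcmp(lptable[nhashpos].pstring, lpszstring))
        return nhashpos;
    else
        return -1; // error value
}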

Seeing this, I think everyone will ask a very serious question: "What if two strings land on the same position in the hash table?" After all, the size of the array is limited, so this is quite likely. There are many ways to solve this problem. The first that comes to my mind is a linked list. Thanks to the data structures course I took in college, which gave me this all-purpose tool, many of the problems I have run into can be reduced to linked lists. Hang a linked list off each entry of the hash table, store every string that maps to that entry in it, and the problem is solved.
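
The linked-list idea can be sketched as follows. This is a minimal illustration, not code from MPQ: the type `chain_node`, the functions `chain_insert` and `chain_contains`, and the placeholder hash `toy_hash` are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained table: every bucket heads a singly linked list. */
typedef struct chain_node {
    char *key;
    struct chain_node *next;
} chain_node;

/* Placeholder hash for the sketch (djb2-style); not the MPQ hash. */
static unsigned long toy_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 1 if `key` is stored in the chain of its bucket, else 0. */
int chain_contains(chain_node **buckets, size_t nbuckets, const char *key)
{
    assert(nbuckets > 0);
    for (chain_node *n = buckets[toy_hash(key) % nbuckets]; n; n = n->next)
        if (strcmp(n->key, key) == 0)
            return 1;
    return 0;
}

/* Inserts a copy of `key` at the head of its bucket's chain.
 * Returns 0 on success (or if already present), -1 if allocation fails. */
int chain_insert(chain_node **buckets, size_t nbuckets, const char *key)
{
    assert(nbuckets > 0);
    if (chain_contains(buckets, nbuckets, key))
        return 0;

    chain_node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->key = malloc(strlen(key) + 1);
    if (!n->key) {
        free(n);
        return -1;
    }
    strcpy(n->key, key);

    size_t pos = toy_hash(key) % nbuckets;
    n->next = buckets[pos];
    buckets[pos] = n;
    return 0;
}
```

Colliding strings simply share a chain, at the cost of one allocation per entry and a full `strcmp` for every node visited during lookup.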

That would seem to be a perfect ending. If the problem were left to me alone, I would probably start defining data structures and writing code. However, Blizzard's programmers use a more elegant method. The basic idea is that they do not rely on a single hash value per string, but use three: one to pick the position in the table and two more to verify the string.

There is an old Chinese saying: "once or twice is acceptable, but not a third or fourth time." Blizzard seems to have grasped its essence. It is quite possible for two different strings to get the same entry point from one hash algorithm, but for them to agree under three different hash algorithms is almost impossible. The probability is 1 in 18889465931478580854784, which is roughly 1 in 10^22.3, safe enough for a game program.

Now back to the data structure. The hash table used by Blizzard does not use linked lists; it resolves collisions by moving on to the next slot (the technique commonly called open addressing with linear probing). Let's look at the algorithm:
int gethashtablepos(char *lpszstring, mpqhashtable *lptable, int ntablesize)
{
    const int hash_offset = 0, hash_a = 1, hash_b = 2;
    /* Unsigned, so the modulo can never produce a negative index. */
    unsigned long nhash = hashstring(lpszstring, hash_offset);
    unsigned long nhasha = hashstring(lpszstring, hash_a);
    unsigned long nhashb = hashstring(lpszstring, hash_b);
    int nhashstart = (int)(nhash % ntablesize), nhashpos = nhashstart;

    while (lptable[nhashpos].bexists)
    {
        /* Both verification hashes must match for a hit. */
        if (lptable[nhashpos].nhasha == nhasha && lptable[nhashpos].nhashb == nhashb)
            return nhashpos;
        else
            nhashpos = (nhashpos + 1) % ntablesize;

        /* Back at the starting slot: the whole table has been scanned. */
        if (nhashpos == nhashstart)
            break;
    }

    return -1; // error value
}

1. Compute three hash values of the string (one determines the position, the other two are used for verification).
2. Look at the corresponding position in the hash table.
3. Is the position empty? If it is, the string does not exist; return.
4. If it is occupied, check whether the other two hash values both match. If they do, the string has been found; return its position.
5. Move to the next position, wrapping around to the start of the table if the end is passed.
6. Check whether we are back at the original position. If so, the string was not found; return.
7. Go back to step 3.
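
The article shows only the lookup side. As a minimal sketch of how the same probing could serve both insertion and lookup, the code below stores just the two verification hashes per slot. The names `probe_entry`, `probe_insert` and `probe_find` are hypothetical, and `toy_hash` is a stand-in (FNV-1a with the hash type mixed into the starting value) used only to keep the example self-contained; it is not the MPQ hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t hash_a;   /* first verification hash */
    uint32_t hash_b;   /* second verification hash */
    int exists;        /* nonzero when the slot is occupied */
} probe_entry;

/* Stand-in for hashstring(name, type); an assumption of this sketch. */
static uint32_t toy_hash(const char *s, uint32_t type)
{
    uint32_t h = 2166136261u ^ (type * 0x9E3779B9u);
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Stores the verification hashes of `name` in the first free slot at or
 * after its home position. Returns the slot, or -1 if the table is full. */
long probe_insert(probe_entry *table, size_t size, const char *name)
{
    assert(size > 0);
    size_t start = toy_hash(name, 0) % size, pos = start;
    uint32_t a = toy_hash(name, 1), b = toy_hash(name, 2);

    while (table[pos].exists) {
        if (table[pos].hash_a == a && table[pos].hash_b == b)
            return (long)pos;           /* already present */
        pos = (pos + 1) % size;         /* next slot, wrapping around */
        if (pos == start)
            return -1;                  /* scanned every slot: full */
    }
    table[pos].hash_a = a;
    table[pos].hash_b = b;
    table[pos].exists = 1;
    return (long)pos;
}

/* Follows steps 1 to 7 above. Returns the slot, or -1 if not found. */
long probe_find(const probe_entry *table, size_t size, const char *name)
{
    assert(size > 0);
    size_t start = toy_hash(name, 0) % size, pos = start;
    uint32_t a = toy_hash(name, 1), b = toy_hash(name, 2);

    while (table[pos].exists) {
        if (table[pos].hash_a == a && table[pos].hash_b == b)
            return (long)pos;
        pos = (pos + 1) % size;
        if (pos == start)
            break;
    }
    return -1;
}
```

Note that no string is stored or compared: a slot is accepted purely because both verification hashes match, which is why the collision probability discussed earlier matters.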

How about that? It is a simple algorithm, but a truly ingenious idea. In fact, the best algorithms are often the simple and effective ones. No wonder Blizzard is called the best game company.

Original: http://blog.blogchina.com/article_85296.361466.html. See also: http://www.googlechinablog.com/atom.xml (Beautiful Math Series 21: Bloom Filter)

 
