Repost: Blizzard MPQ file search algorithm and MPQ file operation interface

Source: Internet
Author: User
Tags blizzard

I recently studied Blizzard's MPQ file format. Here I would like to share my understanding of one part of it, the hash table. I would like to thank Justin Olbrantz for his article "Inside MoPaQ"; most of what follows comes from it.

First, a simple question. Suppose there is a huge array of strings, and you are given a single string and asked to find out whether it exists in the array and, if so, where. What do you do? The easiest method is to compare honestly from beginning to end, one by one, until the string is found. Anyone who has learned programming can write such a program, but if a programmer handed such a program to users, words would fail to describe it. It might really work, but that is about all that can be said for it.

The most suitable approach is to use a hash table. First, some basics. A hash is generally an integer: through some algorithm, a string can be "compressed" into an integer, and that number is called its hash. Of course, 32-bit integers cannot correspond one-to-one with strings, but in practice the probability that two strings produce the same hash value is very small. Let's look at the hash algorithm in MPQ.
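#include <ctype.h>

// cryptTable is a precomputed table of 0x500 32-bit values (not shown here).
// The arithmetic assumes unsigned long is 32 bits wide, as on Win32 compilers.
unsigned long HashString(char *lpszFileName, unsigned long dwHashType)
{
    unsigned char *key = (unsigned char *)lpszFileName;
    unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    int ch;

    while (*key != 0)
    {
        // File names are hashed case-insensitively.
        ch = toupper(*key++);
        seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}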
This algorithm of Blizzard's is very efficient and is known as a "one-way hash". For example, the string "unit\neutral\acritter.grp" hashes to 0xA26067F3 with this algorithm.
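The function above depends on cryptTable, which the article does not show. As a minimal sketch, the table can be generated with the routine commonly published in descriptions of the format; the names InitCryptTable and HashString32 are my own, and fixed-width uint32_t is used so that the arithmetic wraps at 32 bits on any platform.

```c
#include <ctype.h>
#include <stdint.h>

/* 5 hash types x 256 possible byte values. */
static uint32_t cryptTable[0x500];

/* Fill the table with a fixed pseudo-random sequence.
   Must be called once before HashString32 is used. */
void InitCryptTable(void)
{
    uint32_t seed = 0x00100001;

    for (uint32_t index1 = 0; index1 < 0x100; index1++) {
        uint32_t index2 = index1;
        for (int i = 0; i < 5; i++, index2 += 0x100) {
            uint32_t hi, lo;

            /* Two generator steps per entry: one for each 16-bit half. */
            seed = (seed * 125 + 3) % 0x2AAAAB;
            hi = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            lo = seed & 0xFFFF;

            cryptTable[index2] = hi | lo;
        }
    }
}

/* Same logic as HashString above, with explicit 32-bit types. */
uint32_t HashString32(const char *name, uint32_t hashType)
{
    const unsigned char *key = (const unsigned char *)name;
    uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;

    while (*key != 0) {
        uint32_t ch = (uint32_t)toupper(*key++);
        seed1 = cryptTable[(hashType << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}
```

Because every character goes through toupper, "ABC.txt" and "abc.TXT" produce the same hash for a given hash type.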
Is it enough, then, to improve the first algorithm by comparing the hash values of the strings one by one instead of the strings themselves? The answer is: far from enough. To get the fastest algorithm you cannot compare one by one at all. Instead, a hash table is usually built to solve the problem. A hash table is a large array whose size is chosen according to the program's needs, say 1024. Each hash value is mapped to a position in the array by taking it modulo the table size, so you only need to check whether the position corresponding to the string's hash value is occupied to get the final answer. Think about the speed: yes, it is the fastest possible, O(1). Now let's take a closer look at this algorithm.
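int GetHashTablePos(char *lpszString, SOMESTRUCTURE *lpTable, int nTableSize)
{
    // HashString takes a hash type as its second argument; 0 is used here.
    // The hash stays unsigned so the modulo can never give a negative index.
    unsigned long nHash = HashString(lpszString, 0);
    int nHashPos = (int)(nHash % (unsigned long)nTableSize);

    // The slot must be occupied and hold exactly this string.
    if (lpTable[nHashPos].bExists && !strcmp(lpTable[nHashPos].pString, lpszString))
        return nHashPos;
    else
        return -1; // Error value
}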
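To make the limitation of this single-slot lookup concrete, here is a small sketch that also stores strings. It assumes SOMESTRUCTURE has just the two fields used above, and it swaps in a deliberately weak hash (the sum of the bytes, not the MPQ hash) so that two strings landing in the same slot are easy to produce. StoreString, FindString and WeakHash are hypothetical names.

```c
#include <string.h>

/* Assumed layout: only the two fields the lookup above touches. */
typedef struct {
    int bExists;
    const char *pString;
} SOMESTRUCTURE;

/* Stand-in hash: sum of the bytes. Weak on purpose, NOT the MPQ hash. */
static unsigned long WeakHash(const char *s)
{
    unsigned long h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* Store a string in its home slot.
   Returns the slot, or -1 if the slot is already taken. */
int StoreString(const char *s, SOMESTRUCTURE *table, int tableSize)
{
    int pos = (int)(WeakHash(s) % (unsigned long)tableSize);

    if (table[pos].bExists)
        return -1;
    table[pos].bExists = 1;
    table[pos].pString = s;
    return pos;
}

/* Same check as GetHashTablePos above: one slot, one comparison. */
int FindString(const char *s, const SOMESTRUCTURE *table, int tableSize)
{
    int pos = (int)(WeakHash(s) % (unsigned long)tableSize);

    if (table[pos].bExists && strcmp(table[pos].pString, s) == 0)
        return pos;
    return -1;
}
```

With a 16-entry table, "ab" and "ba" both map to slot 3, so whichever is stored second is rejected and can never be found.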
At this point I think everyone is wondering about a very serious problem: "What if two strings end up at the same position in the hash table?" After all, the array has a limited size, so this is quite likely. There are many ways to solve this problem. The first that comes to my mind is the linked list. The data structures course I took in college taught me this all-purpose weapon, and many algorithms I have met can be turned into linked-list problems. Just hang a linked list off each entry of the hash table and store all the strings that map to that entry in it. This seems like a perfect ending. If the problem had been left to me alone, I would probably have started defining the data structure and writing the code.

However, the Blizzard programmers used a more sophisticated method. The basic principle is that they do not use one hash value for a string in the hash table, but three. Blizzard seems to have understood the point well: two different strings may well produce the same entry point under one hash algorithm, but it is almost impossible for them to agree under three different hash algorithms. The probability is 1 in 18889465931478580854784, or roughly 1 in 10^22.3, which is safe enough for a game program.

Now back to the data structure. The hash table used by Blizzard does not use linked lists; instead it resolves collisions by moving on to the next slot. Let's look at this algorithm:
int GetHashTablePos(char *lpszString, MPQHASHTABLE *lpTable, int nTableSize)
{
    const int HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2;
    // Kept unsigned so the modulo below can never yield a negative index.
    unsigned long nHash = HashString(lpszString, HASH_OFFSET);
    unsigned long nHashA = HashString(lpszString, HASH_A);
    unsigned long nHashB = HashString(lpszString, HASH_B);
    int nHashStart = (int)(nHash % (unsigned long)nTableSize), nHashPos = nHashStart;

    while (lpTable[nHashPos].bExists)
    {
        // Both verification hashes must match for the entry to be ours.
        if (lpTable[nHashPos].nHashA == nHashA && lpTable[nHashPos].nHashB == nHashB)
            return nHashPos;
        else
            nHashPos = (nHashPos + 1) % nTableSize;

        // Back at the starting slot: every entry has been checked.
        if (nHashPos == nHashStart)
            break;
    }
    return -1; // Error value
}
1. Compute three hash values for the string (one to determine the position, the other two for verification).
2. Look at that position in the hash table.
3. Is the position in the hash table empty? If so, the string does not exist; return.
4. If it is occupied, check whether the other two hash values match. If they do, the string is found; return its position.
5. Move to the next position, wrapping around to the start of the table if the end has been passed.
6. Check whether we are back at the original position. If so, the string was not found; return.
7. Go back to step 3.
How about that? It is a simple algorithm, but a truly ingenious idea. In fact, the best algorithms are usually the simple and effective ones.
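The lookup only makes sense next to the matching insertion, which the article does not show. The following is a rough sketch under some assumptions: MPQHASHTABLE is taken to hold only the three fields used in the lookup code (a real MPQ hash table entry contains more), the three hash values are passed in already computed, and InsertEntry and FindEntry are hypothetical names.

```c
#include <stdint.h>

/* Assumed layout: only the fields the lookup code refers to. */
typedef struct {
    uint32_t nHashA;
    uint32_t nHashB;
    int bExists;
} MPQHASHTABLE;

/* Insert by linear probing: start at hOffset % tableSize and walk forward
   until a free slot turns up. Returns the slot used, or -1 if the table is full. */
int InsertEntry(MPQHASHTABLE *table, int tableSize,
                uint32_t hOffset, uint32_t hA, uint32_t hB)
{
    int start = (int)(hOffset % (uint32_t)tableSize);
    int pos = start;

    while (table[pos].bExists) {
        if (table[pos].nHashA == hA && table[pos].nHashB == hB)
            return pos;                 /* already stored */
        pos = (pos + 1) % tableSize;    /* wrap around at the end */
        if (pos == start)
            return -1;                  /* visited every slot: full */
    }
    table[pos].nHashA = hA;
    table[pos].nHashB = hB;
    table[pos].bExists = 1;
    return pos;
}

/* Lookup following steps 1 to 7 above. */
int FindEntry(const MPQHASHTABLE *table, int tableSize,
              uint32_t hOffset, uint32_t hA, uint32_t hB)
{
    int start = (int)(hOffset % (uint32_t)tableSize);
    int pos = start;

    while (table[pos].bExists) {
        if (table[pos].nHashA == hA && table[pos].nHashB == hB)
            return pos;
        pos = (pos + 1) % tableSize;
        if (pos == start)
            break;
    }
    return -1;
}
```

In a 4-entry table, entries whose position hashes are 6, 10 and 2 all start at slot 2; they end up in slots 2, 3 and 0 respectively, and each is found again by walking the same path. Note that this sketch never deletes entries: simply clearing bExists would cut the probe path for entries stored further along.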

//////////////////////////////////////// //////////////////////////////////////// //////////////////////////////////////// /////////////////////////////////////////

1. There are many ways to resolve hash table collisions. The chaining mentioned at the beginning is one of them. It is not the "bad" method I originally took it for. On the contrary, it is the most common one, and the hash tables in the C++ standard library use it.

2. Blizzard does not use linked lists but linear probing, because it needs no dynamic memory allocation and makes it easier to serialize the table to a file on disk.

3. A better method is quadratic probing. On a collision, the offsets "+1, -1, +4, -4, +9, -9, ..., +n^2, -n^2" are tried in turn. With this scheme the table length should be a prime number for the probing to work efficiently.

4. None of these methods is a secret. They are covered in any reasonably detailed data structures course, so it certainly does no harm to listen carefully in class.

5. The last piece of good news is that hash tables are now part of the C++ standard (first in the TR1 library extensions, then in C++11), although the official name is "unordered" (std::unordered_map, std::unordered_set and so on).
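The remark about prime table lengths for quadratic probing can be checked with a few lines of code. The sketch below counts how many distinct slots the sequence "+1, -1, +4, -4, ..." can reach from a given start slot; count_reachable_slots is a hypothetical helper of my own, not part of any library.

```c
#include <stdlib.h>

/* Count the distinct slots reached by the probe sequence
   start, start+1, start-1, start+4, start-4, ..., start+i^2, start-i^2
   in a table of `size` slots. Expects 0 <= start < size.
   Returns -1 if memory cannot be allocated. */
int count_reachable_slots(int size, int start)
{
    char *seen = calloc((size_t)size, 1);
    int count = 0;

    if (seen == NULL)
        return -1;

    /* i^2 mod size repeats after `size` steps, so this covers everything. */
    for (int i = 0; i < size; i++) {
        long sq = ((long)i * i) % size;
        int plus = (int)((start + sq) % size);
        int minus = (int)(((start - sq) % size + size) % size);

        if (!seen[plus]) {
            seen[plus] = 1;
            count++;
        }
        if (!seen[minus]) {
            seen[minus] = 1;
            count++;
        }
    }
    free(seen);
    return count;
}
```

For a table of 11 slots the sequence reaches all 11, while for 16 slots it reaches only 7. A table of 13 slots also yields only 7, so being prime is not quite enough on its own: the full coverage shows up for primes of the form 4k + 3, such as 7 and 11.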

Interface:

http://download.csdn.net/source/354842

The resource files of Blizzard's games (Warcraft, World of Warcraft, Diablo, StarCraft) are packed in MPQ archives. With this MPQ API you can easily operate on MPQ files; it includes interfaces for both VC and VB.
Taking VC as an example: add #include "SFmpqapi.h" to your code and add SFmpq.lib to the linked libraries, and you can then call the functions exported in SFmpqapi.h.
Finally, put SFapi.dll in the same folder as the compiled .exe file.
In this interface, "Archive" refers to the MPQ archive itself and "File" refers to a file contained in the MPQ, with names of the form "XXX.XXX". To see which files an MPQ contains, use the list functions or a tool such as MPQMaster.
