MPQ technology insider


I have started translating some English articles. I recently became interested in MPQ and came across an article called Inside MPQ, so I decided to translate it as practice. The article is a little disappointing: it stops at the key points and shows no sign of being updated. Still, it is better than the handful of articles on MPQ written in China, which do not really study the format. After reading them I feel sorry about how far Chinese work on hacking and reverse engineering lags behind. We mostly just apply these techniques, and there are too few original articles.

LEGAL COPYRIGHTS

The MPQ Format: The copyrights to the MPQ format are held by Havas Interactive, Blizzard Entertainment's parent company, all rights reserved.

This Article: The copyrights to this document and content are held by Justin Olbrantz (Quantam), all rights reserved. You may freely distribute this document provided that you do not derive profit from the distribution, and that the document remains complete and unchanged. You may quote this document ONLY with my explicit permission. Contact me to obtain permission to quote. Also, although I would appreciate recognition for your use of this information, I will not be held legally responsible for anything you may do with it. Any way that you misuse this information is your problem, and I will not be responsible for it.

 

I will not translate the LEGAL COPYRIGHTS section above.

For my translation, the statement is as follows:

This translation may be reproduced, provided that Wang Yu is credited as its author and that the entire content, including the notice above, is kept intact. I am not responsible for any consequences.

MPQ technology insider
Author Justin Olbrantz (Quantam)
Translator Wang Yu

Chapter 1

MPQ Introduction

MPQ, or MoPaQ, is a proprietary archive file format created by Mike O'Brien, the man regarded as the genius behind Blizzard's multiplayer engine. He developed the format in 1996 for Diablo, and rather narcissistically named it after himself: MPQ stands for "Mike O'Brien PaCK". The copyright to the format, however, is owned by Havas Interactive (Blizzard's parent company), so even if Mike leaves Blizzard, Blizzard keeps the right to use the MPQ format. Games that use MPQ include Diablo, Starcraft, Warcraft, Diablo 2, BNE (translator's note: I don't know what game this is), and Lords of Magic (developed by Sierra, another company owned by Havas), among others.

An archive file is a file that contains other files, often in compressed form. Havas uses MPQ archives to hold almost everything in its games, from installation files to game data. The archives that hold game data are the most important: they contain images, sounds, levels, strings, and storyline information. Obviously, the potential for customization is astounding.

Before MPQ

Long before MPQ was invented, there was the WAR (Warcraft ARchive) format, used to store data in Warcraft 2 and possibly even Warcraft 1. This fledgling format was very simple and unoptimized, and hardly looked like a real file format at all. Files in the archive were addressed by number, and the only optimization was some compression. Simple as it was, it did the job it needed to do: it provided a quick and dirty way to compress and store a lot of files. But its disadvantages soon became apparent. Addressing by number meant that programmers had to keep a long table of entries in order to refer to particular files in the archive, and as the table grew, working with it took longer and longer. The simplicity of the format also meant that hackers could crack it within 15 minutes and then do whatever they liked with the files. These problems may not have seemed too bad at first, but the persistent characters required by Diablo (translator's note: I did not understand this part) and the growing popularity of network play made them unacceptable.

Why MPQ?

As mentioned above, the MPQ format was designed to make up for some very serious shortcomings of WAR, but it also adds many new capabilities. In general, MPQ was designed around the following goals:

Security: The last thing Blizzard wanted was to see its games hacked the way Warcraft 2 had been. Blizzard may already have had Starcraft in mind when it designed the MPQ format. In any case, security came first, as can be seen from the effort Blizzard puts into protecting the format.

Efficiency: MPQ has to handle a range of tasks, from simple preloading of data to complex real-time streaming. Preloading is not demanding, but for real-time streaming the data must be decompressed while the game is running at full speed, so speed is mandatory.

Multi-language: From the very beginning, Blizzard planned to sell its products worldwide, so it wanted translating its games to be as easy as possible. It therefore built multi-language support into the MPQ format in an innovative way.

Scalability: Obviously, it is silly to put all of a game's data into a single file. Doing so is inefficient and slow, and it makes post-release patching very troublesome. Blizzard certainly knew this, so the MPQ format was designed to make patching simple, effective, and elegant.

 

Storm

Programmers often put common code into a shared library to avoid duplicating it. A shared library provides common functions to every program that uses it, which reduces redundancy and program size. Blizzard uses a shared library called Storm (Storm.dll on the Microsoft platform, Storm.bin on the Apple platform). Current Blizzard games use this library for important functions such as MPQ reading, Battle.net networking, and even graphics routines. When Blizzard releases a new game it adds functions to Storm but does not modify the old ones, which means an old game can use a newer Storm library without any problems. As with any shared library, Storm's functions can be called by anyone, which makes it less secure. That is why Storm contains only the MPQ reading functions, while the MPQ writing functions are kept private by Blizzard and are not available for anyone else to use.

StarCraft Campaign Editor

We all know that the StarCraft Campaign Editor can edit missions. But StarCraft missions are MPQ archives! This means the Campaign Editor can create MPQs, so it must contain the MPQ creation functions. However, the Campaign Editor is not a shared library, so a series of strange hacking techniques is needed to get at those functions. That is how the MPQ API library came to be.

 

Chapter 2

The Basics

Most advances in the history of computing came about because a particular problem needed to be solved. In this chapter we will look at some of these problems, and their solutions, as they relate to the MPQ format.

Hash

Problem: You have a large array of strings, and another string, str, that you need to look up to determine whether it exists in the array. You could compare it with each string in the array in turn, but in practice you will find that this is far too slow. Some optimization is required. But how can you tell whether the string exists without comparing it against every string in the array?

Solution: hashes. A hash is a smaller data type (such as a number) that stands in for a larger data type (such as a string). In this case, you can store an array of hashes alongside the array of strings. You then compute the hash of str and compare it with each hash in the hash array. If a stored hash matches the hash of str, the string that hash represents is compared with str to confirm that the two really are the same. This method is called indexing. Depending on the array size and string length, it can make the search nearly 100 times faster.

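unsigned long HashString(char *lpszString)
{
    /* The sample values in the text assume a 32-bit unsigned long. */
    unsigned long ulHash = 0xf1e2d3c4;

    while (*lpszString != 0)
    {
        ulHash <<= 1;               /* shift the running hash left one bit */
        ulHash += *lpszString++;    /* then add the next character */
    }

    return ulHash;
}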

The code above shows a very simple hash algorithm. The function adds up the characters of the string, shifting the hash value left by one bit before each character is added. With this algorithm, the string "arr\units.dat" hashes to 0x5A858026 and "unit\neutral\acritter.grp" hashes to 0x694CD020. Admittedly, this algorithm is too simple to be of much use: its results are relatively predictable, and it produces many collisions. A collision is when more than one string hashes to the same value. The MPQ format, on the other hand, uses a much more complex hash algorithm (shown below) that generates completely unpredictable hash values. In fact, it is what is called a one-way hash: a hash algorithm that cannot be run backwards to recover the source string from the hash value. With the MPQ algorithm, the file name "arr\units.dat" hashes to 0xF4E6C69D, while "unit\neutral\acritter.grp" hashes to 0xA26067F3.

unsigned long HashString(char *lpszFileName, unsigned long dwHashType)
{
    unsigned char *key = (unsigned char *)lpszFileName;
    unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    int ch;

    while (*key != 0)
    {
        ch = toupper(*key++);   /* file names are hashed case-insensitively */

        /* dwHashType selects one 0x100-entry section of cryptTable */
        seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

 

Hash table

Problem: You try the indexing method from the previous example, but your program has very strict speed requirements and indexing turns out not to be fast enough. The way to make it faster is to avoid checking every hash in the array. Ideally, a single comparison against one element of the array would be enough to tell whether the string is in the array. Sounds too good to be true, doesn't it?

Solution: a hash table. A hash table is a special kind of array in which the position of each string is determined by the string's hash. That is, instead of the plain string array we build a fixed-size array for the hash table (the number of elements is usually a power of 2). To find out whether a string is in the hash table, you first calculate where it would be: compute the hash of the string, then take that hash modulo the table size (1024 in this example) to get the position. So, using the simple hash algorithm from before, "arr\units.dat" hashes to 0x5A858026 and the position is 0x26 (0x5A858026 divided by 0x400 gives quotient 0x16A160 and remainder 0x26). The string at position 0x26 (if there is one) is then compared with the target string. If the string at 0x26 does not match the target string, or there is no string at 0x26, the target string is not in the array. The following code illustrates this:

int GetHashTablePos(char *lpszString, SOMESTRUCTURE *lpTable, int nTableSize)
{
    /* Keep the hash unsigned so that the modulo can never be negative. */
    unsigned long nHash = HashString(lpszString);
    int nHashPos = (int)(nHash % (unsigned long)nTableSize);

    if (lpTable[nHashPos].bExists && !strcmp(lpTable[nHashPos].pString, lpszString))
        return nHashPos;
    else
        return -1; // Error value
}

However, this algorithm has a huge flaw. What do you think happens when a collision occurs (two strings hash to the same value)? Obviously, they cannot both occupy the same element of the hash table. Usually this is solved by making every element of the hash table a linked list, with each list holding all the strings that hash to that position. MPQ uses a hash table of file names to keep track of the files it contains, but the format of this table differs somewhat from an ordinary hash table. First, instead of using one hash as the index and storing the actual file name for verification, it does not store the file name at all. It uses three different hashes: one as the index into the hash table and two for verification. The two verification hashes take the place of the actual file name. Of course, it is still possible for two different file names to produce the same three hashes, but the odds of that happening are on average 1:18889465931478580854784, which should be small enough for anyone. Second, unlike ordinary hash tables, the MPQ hash table does not resolve collisions with linked lists. When a collision occurs, the entry is placed in the next free slot instead. The following code looks up a file name in an MPQ hash table:

int GetHashTablePos(char *lpszString, MPQHASHTABLE *lpTable, int nTableSize)
{
    const int HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2;
    /* Unsigned, so that the modulo can never produce a negative index.
       The nHashA and nHashB fields of the table are assumed to have the same type. */
    unsigned long nHash = HashString(lpszString, HASH_OFFSET),
                  nHashA = HashString(lpszString, HASH_A),
                  nHashB = HashString(lpszString, HASH_B);
    int nHashStart = (int)(nHash % (unsigned long)nTableSize),
        nHashPos = nHashStart;

    while (lpTable[nHashPos].bExists)
    {
        if (lpTable[nHashPos].nHashA == nHashA && lpTable[nHashPos].nHashB == nHashB)
            return nHashPos;
        else
            nHashPos = (nHashPos + 1) % nTableSize;

        if (nHashPos == nHashStart)
            break;
    }
    return -1; // Error value
}

Although this code may look confusing, the theory behind it is not complicated. Looking up a file follows these steps:

1. Calculate the 3 hashes (1 index hash and 2 check hashes) and store them in variables.
2. Move to the entry indicated by the index hash.
3. Is this entry in use? If not, stop searching and return "file not found".
4. Do the two check hashes of this entry match the check hashes of the file we are looking for? If they match, stop searching and return the current entry.
5. Move to the next entry, wrapping around to the first entry if we were at the last one.
6. Is the entry we just moved to the one we started from (have we searched the whole table)? If so, stop searching and return "file not found".
7. Go back to step 3.

If you were paying attention, you will have noticed from my explanation and the sample code that the MPQ hash table has to hold an entry for every file in the archive. But have you wondered what happens when every slot in the hash table is filled? The answer may surprise you: you cannot add any more files. People have asked me why there is a limit on the number of files in an MPQ, and whether there is any way around it. You now have the answer to the first question. As for the second: sorry, there is no way around the limit, because the hash table cannot be resized without rebuilding the entire archive. The position of every entry depends on the size of the hash table, so resizing changes all the positions, and we cannot work out the new positions because we do not have the file names.

Compression

Problem: You have a large program (say 50 MB) that you want to distribute over the Internet. But 50 MB is a very large download, and people may not want to wait several hours for it.

Solution: compression. Compression means representing a large amount of data in a much smaller form. There are many compression algorithms, each working in a different way. The one used by MPQ is the PKWare Data Compression Library. That library is far too complicated to explain here, so instead I will explain a relatively simple compression algorithm.
(This section was left unfinished by the original author.)

 

Encryption

Keeping information away from prying eyes has always been a concern. For centuries people have tried to send private messages to one another, from handwritten letters carried on foot by ancient Greek messengers, to the radio traffic of Nazi submarines in World War II, to today's online credit card transactions. Making sure that nobody else can read your information is essential, and the general method of protecting it is called encryption. We do not know who invented the first encryption algorithm, but we do know that there is a huge number of them in the world today, and everything from simple data encodings to elaborate ciphers has been used again and again. This article of course cannot explain, and does not try to explain, encryption in general, but some understanding of encryption is a must if you want to work with MPQ.

Let's first look at an encryption algorithm published on Basic Lab Notes:

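void EncryptBlock(void *lpvBlock, int nBlockLen, char *lpszPassword)
{
    unsigned char *lpBlock = (unsigned char *)lpvBlock;  /* a void * cannot be indexed */
    int nPWLen = strlen(lpszPassword), nCount = 0;
    /* Work on a copy, because the password bytes are modified as we go.
       _alloca is specific to Microsoft compilers. */
    char *lpsPassBuff = (char *)_alloca(nPWLen);
    memcpy(lpsPassBuff, lpszPassword, nPWLen);
    for (int nChar = 0; nChar < nBlockLen; nChar++)
    {
        char cPW = lpsPassBuff[nCount];
        lpBlock[nChar] ^= cPW;             /* XOR the data byte with the password byte */
        lpsPassBuff[nCount] = cPW + 13;    /* then advance that password byte by 13 */
        nCount = (nCount + 1) % nPWLen;    /* cycle through the password */
    }
    return;
}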

Like the hash code earlier, this code is very simple and should not be used in any real program that needs security. Mysterious as it may look, what it does is straightforward: it XORs the whole input block with the password, adding 13 to each password character after it has been used (13 was chosen because it is a prime number). This makes the cipher a little harder to recognize. With this code, the string "encryption" (65 6E 63 72 79 70 74 69 6F 6E) encrypted with the password "MPQ" (4D 50 51) becomes (28 3E 32 28 24 2E 13 03 04 1A). This code is symmetric, meaning that the key used to encrypt is the same as the key used to decrypt. In fact, the very same function can be used for decryption, because XOR is a symmetric operation. Note that most symmetric encryption algorithms are not completely symmetric in this way, so their encryption and decryption functions differ.

Now things get trickier. If you want to work with the MPQ format directly, you must know its encryption and decryption algorithms, and I will show you how to use them. The MPQ encryption algorithm is an interesting hybrid of several other encryption algorithms. It builds an encryption table (the same table used by the hash function), uses the encryption key of a file to pick numbers out of that table, and then XORs those numbers with the data. The way it goes about this is rather odd, so some of the code may look very complicated. The following code generates the encryption table, which has 0x500 entries.

 

void prepareCryptTable()
{
    unsigned long seed = 0x00100001, index1 = 0, index2 = 0, i;
    for (index1 = 0; index1 < 0x100; index1++)
    {
        /* Fill entry index1 of each of the five 0x100-entry sections. */
        for (index2 = index1, i = 0; i < 5; i++, index2 += 0x100)
        {
            unsigned long temp1, temp2;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            temp1 = (seed & 0xFFFF) << 0x10;    /* high 16 bits of the entry */
            seed = (seed * 125 + 3) % 0x2AAAAB;
            temp2 = (seed & 0xFFFF);            /* low 16 bits of the entry */
            cryptTable[index2] = (temp1 | temp2);
        }
    }
}

Doesn't it feel as though Blizzard hired some calculus professor to write this code? At least that is how it feels to me. If you cannot follow the code, that is not a big problem: if you want to work with MPQ directly you will need these functions, but you do not have to understand them fully. In any case, once the encryption table has been initialized, we can use the following function to decrypt MPQ data (don't expect me to explain this code to you, because I do not fully understand it either):

void DecryptBlock(void *block, long length, unsigned long key)
{
    unsigned long seed = 0xEEEEEEEE;
    unsigned long ch;
    unsigned long *castBlock = (unsigned long *)block;  /* assumes a 32-bit unsigned long */
    // Round to longs
    length >>= 2;
    while (length-- > 0)
    {
        /* the last 0x100-entry section of the encryption table */
        seed += cryptTable[0x400 + (key & 0xFF)];
        ch = *castBlock ^ (key + seed);
        key = ((~key << 0x15) + 0x11111111) | (key >> 0x0B);
        seed = ch + seed + (seed << 5) + 3;
        *castBlock++ = ch;
    }
}

Translator's Postscript:

I translated this simply because I had nothing better to do. I have not reviewed the translation carefully, and there are parts that I did not even understand myself, or where it is obvious that the translation is poor, but I wrote them down anyway. I could not even summon the interest to read the article from start to finish. Only the part up to Chapter 2 is translated, because Chapter 3 describes how to use Storm, the Starcraft Campaign Editor, and the MPQ API library, and is not worth translating. The really exciting Chapters 5 and 6 were never finished by the author, which is why I say the author is not being straight with us. Go and look at the English original. I hope my translation will at least encourage others to read it, because in many cases a great deal of information is lost in translation.
