In simple terms, an integer compression algorithm is a method of storing a 32-bit integer (normally 4 bytes) in as little space as possible (1, 2, or 4 bytes).
Integer compression algorithms are widely used in .NET/CLI PE files, for example in the various metadata signatures and in the #Blob and #US streams. In these places, an integer value records the number of entries or the size of a data block. Because most of these counts and sizes are small, storing them as plain 32-bit integers would fill a large number of bytes with meaningless zeros. Using a compression algorithm in these scenarios effectively reduces the disk space or network bandwidth that PE files occupy.
The following are some places in a PE file that use compressed integers:
At the beginning of each entry in the Blob heap (the storage format used by the #Blob and #US streams), a compressed unsigned integer indicates the size of the entry;
In the metadata signature of a method, a compressed unsigned integer is used to store the number of parameters;
An array subscript in a metadata signature is stored as a compressed signed integer.
Note that the compression and decompression algorithms described in this article apply to 32-bit integers. In addition, unless stated otherwise, the integers in this article are written in big-endian order (the most significant byte is placed on the left or at the top).
Compression and decompression of unsigned integers
Compressing unsigned integers is relatively simple: the range of unsigned integers is divided into sections, and a value is stored in 1, 2, or 4 bytes depending on the section it falls in. Table 1 lists the sections and the compression method used for each.
Section | Number of bytes | Mask | Binary form |
[00000000h, 0000007Fh] | 1 | 80h | 0bbbbbbb |
[00000080h, 00003FFFh] | 2 | C0h | 10bbbbbb bbbbbbbb |
[00004000h, 1FFFFFFFh] | 4 | E0h | 110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb |
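The scheme in Table 1 can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function names compress_uint and decompress_uint are hypothetical, and it assumes that the mask column gives the bits of the first byte that are tested to find the encoded length (80h, C0h, E0h), with the remaining bits holding the value in big-endian order.

```python
from typing import Tuple


def compress_uint(value: int) -> bytes:
    """Encode an unsigned integer in 1, 2, or 4 bytes (big-endian)."""
    if value < 0 or value > 0x1FFFFFFF:
        raise ValueError("value is outside the range [0, 1FFFFFFFh]")
    if value <= 0x7F:
        # 0bbbbbbb: the value fits in 7 bits, top bit stays 0.
        return bytes([value])
    if value <= 0x3FFF:
        # 10bbbbbb bbbbbbbb: set the top two bits to 10.
        return (value | 0x8000).to_bytes(2, "big")
    # 110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb: set the top three bits to 110.
    return (value | 0xC0000000).to_bytes(4, "big")


def decompress_uint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one compressed unsigned integer starting at offset.

    Returns (value, number_of_bytes_consumed).
    """
    first = data[offset]
    if first & 0x80 == 0x00:
        size, payload_mask = 1, 0x7F
    elif first & 0xC0 == 0x80:
        size, payload_mask = 2, 0x3FFF
    elif first & 0xE0 == 0xC0:
        size, payload_mask = 4, 0x1FFFFFFF
    else:
        raise ValueError("first byte does not start with 0, 10, or 110")
    chunk = data[offset:offset + size]
    if len(chunk) < size:
        raise ValueError("input ends before the encoded integer does")
    # Strip the length prefix bits and keep only the payload.
    return int.from_bytes(chunk, "big") & payload_mask, size


if __name__ == "__main__":
    for n in (0x03, 0x7F, 0x80, 0x3FFF, 0x4000, 0x1FFFFFFF):
        encoded = compress_uint(n)
        print(f"{n:08X}h -> {encoded.hex(' ').upper()}")
```

Returning the number of bytes consumed alongside the value is a design choice for this sketch: a caller reading a Blob heap entry, for example, needs to know where the size prefix ends and the entry data begins.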