Memo: Compressing and Decompressing Integers
Original: Anders Liu
This article is outdated. See "Compressed integers used in .NET/CLI metadata": http://www.cnblogs.com/andresliu/archive/2010/02/09/compressed-integer-in-metadata.html.
Abstract: .NET/CLI PE files make extensive use of an integer compression algorithm that stores a 32-bit unsigned integer in 1, 2, or 4 bytes depending on its value. This article describes the algorithm and provides a reference implementation to aid understanding.
References
ECMA-335 -- Common Language Infrastructure (CLI) 4th Edition, June 2006
Introduction
The integer compression algorithm described in this article applies to 32-bit unsigned integers between 0x00000000 and 0x1FFFFFFF. This range is divided into three intervals, [0x00000000 ~ 0x0000007F], [0x00000080 ~ 0x00003FFF], and [0x00004000 ~ 0x1FFFFFFF], whose values are stored in 1, 2, and 4 bytes respectively. 32-bit unsigned integers greater than 0x1FFFFFFF cannot be compressed with this algorithm.
This algorithm is widely used in .NET/CLI PE files, for example in the various metadata signatures and in the #Blob and #US streams. Its purpose is to reduce the size of files on disk and the bandwidth overhead, because .NET assemblies are intended to be deployed over the network. The algorithm is applied mainly to data sizes and item counts. Such values are usually small, so the number of bytes saved is considerable.
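The interval boundaries above can be turned into a small size check. The following is a minimal sketch, not code from the original article; the class name `CompressedIntegerSize`, the constants, and the method `GetEncodedLength` are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper (not part of the original article): maps a value
// to the number of bytes its compressed form occupies.
public static class CompressedIntegerSize
{
    public const uint MaxOneByte = 0x0000007F;   // upper bound of the 1-byte interval
    public const uint MaxTwoBytes = 0x00003FFF;  // upper bound of the 2-byte interval
    public const uint MaxFourBytes = 0x1FFFFFFF; // upper bound of the 4-byte interval

    // Returns 1, 2 or 4; throws for values the algorithm cannot represent.
    public static int GetEncodedLength(uint value)
    {
        if (value <= MaxOneByte) return 1;
        if (value <= MaxTwoBytes) return 2;
        if (value <= MaxFourBytes) return 4;
        throw new ArgumentOutOfRangeException(
            nameof(value), "Values greater than 0x1FFFFFFF cannot be compressed.");
    }
}
```

Note that there is no 3-byte form: a value just above 0x3FFF jumps straight from two bytes to four.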
Compression Algorithm Description
- When the integer is between 0x00000000 (00000000 00000000 00000000 00000000b) and 0x0000007F (00000000 00000000 00000000 01111111b), one byte is used to store it, and the highest bit of that byte is 0. The compressed value has the form [0bbbbbbb], where each b is a bit of the original integer.
- When the integer is between 0x00000080 (00000000 00000000 00000000 10000000b) and 0x00003FFF (00000000 00000000 00111111 11111111b), two bytes are used to store it. The highest bit of the first byte is 1 and the second-highest bit is 0. The compressed value has the form [10bbbbbb bbbbbbbb].
- When the integer is between 0x00004000 (00000000 00000000 01000000 00000000b) and 0x1FFFFFFF (00011111 11111111 11111111 11111111b), four bytes are used to store it. The highest and second-highest bits of the first byte are 1, and the third-highest bit is 0. The compressed value has the form [110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb].
- The compressed value is stored in big-endian order, that is, the first byte holds the most significant bits of the original integer.
Figure 1 shows the interval division.
Figure 1: Integer compression algorithm
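The original article only lists a decompression routine, so the compression rules above are sketched here for completeness. This is an illustrative sketch under the rules as described, not the author's code; the class name `CompressedIntegerWriter` and the method `Compress` are hypothetical.

```csharp
using System;

// Hypothetical sketch (not from the original article) of the compression rules.
public static class CompressedIntegerWriter
{
    // Encodes a value in the range 0x00000000..0x1FFFFFFF as 1, 2 or 4 big-endian bytes.
    public static byte[] Compress(uint value)
    {
        if (value <= 0x7F)
        {
            // 0bbbbbbb: the value fits in 7 bits, so the top bit is already 0.
            return new[] { (byte)value };
        }
        if (value <= 0x3FFF)
        {
            // 10bbbbbb bbbbbbbb: 14 payload bits, marker bits 10 in the first byte.
            return new[]
            {
                (byte)(0x80 | (value >> 8)),
                (byte)(value & 0xFF)
            };
        }
        if (value <= 0x1FFFFFFF)
        {
            // 110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb: 29 payload bits, marker bits 110.
            return new[]
            {
                (byte)(0xC0 | (value >> 24)),
                (byte)((value >> 16) & 0xFF),
                (byte)((value >> 8) & 0xFF),
                (byte)(value & 0xFF)
            };
        }
        throw new ArgumentOutOfRangeException(
            nameof(value), "Values greater than 0x1FFFFFFF cannot be compressed.");
    }
}
```

With this sketch, 0x03 encodes to the single byte 03, 0x2E57 to AE 57, and 0x4000 to C0 00 40 00, which matches the bit patterns listed above.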
Decompression Algorithm Description
- If the first byte read, b0, has the form 0bbbbbbb (b0 & 0x80 equals 0x00), the integer is stored in one byte. Original integer = b0.
- If the first byte read, b0, has the form 10bbbbbb (b0 & 0xC0 equals 0x80), the integer is stored in two bytes, so one more byte b1 must be read. Original integer = (b0 & 0x3F) << 8 | b1.
- If the first byte read, b0, has the form 110bbbbb (b0 & 0xE0 equals 0xC0), the integer is stored in four bytes, so three more bytes b1, b2, and b3 must be read. Original integer = (b0 & 0x1F) << 24 | b1 << 16 | b2 << 8 | b3.
Decompression algorithm reference implementation
Listing 1: Implementation of the decompression algorithm (C#)

    public static UInt32 Decompress(this Byte[] data)
    {
        if (data == null)
            throw new ArgumentNullException("data");

        if (data.Length == 0)
            throw new InvalidCompressedIntegerException();

        if ((data[0] & 0x80 /* (10000000b) */) == 0 // stored in one byte (0bbbbbbb)
            && data.Length == 1)
        {
            return (UInt32)data[0];
        }
        else if ((data[0] & 0xC0 /* (11000000b) */) == 0x80 /* (10000000b) */ // stored in two bytes (10bbbbbb bbbbbbbb)
            && data.Length == 2)
        {
            return (UInt32)((data[0] & 0x3F /* (00111111b) */) << 8 | data[1]);
        }
        else if ((data[0] & 0xE0 /* (11100000b) */) == 0xC0 /* (11000000b) */ // stored in four bytes (110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb)
            && data.Length == 4)
        {
            return (UInt32)((data[0] & 0x1F /* (00011111b) */) << 24 | data[1] << 16 | data[2] << 8 | data[3]);
        }
        else
        {
            throw new InvalidCompressedIntegerException();
        }
    }
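Listing 1 expects an array that contains exactly one compressed integer. In streams such as #Blob, a compressed integer is normally followed by other data, so a reader may also need to report how many bytes it consumed. The following is a minimal sketch of that variant, not the author's code: the class name `CompressedIntegerReader`, the method `Read`, and its `offset` and `bytesRead` parameters are hypothetical, and `FormatException` stands in for the article's `InvalidCompressedIntegerException`, whose definition is not shown.

```csharp
using System;

// Hypothetical sketch (not from the original article): decodes one compressed
// integer from the middle of a larger buffer.
public static class CompressedIntegerReader
{
    // Decodes the compressed integer that starts at data[offset] and
    // reports the number of bytes it occupies through bytesRead.
    public static uint Read(byte[] data, int offset, out int bytesRead)
    {
        if (data == null)
            throw new ArgumentNullException(nameof(data));
        if (offset < 0 || offset >= data.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        byte b0 = data[offset];

        // The leading bits of the first byte select the encoded length.
        int length;
        if ((b0 & 0x80) == 0x00) length = 1;       // 0bbbbbbb
        else if ((b0 & 0xC0) == 0x80) length = 2;  // 10bbbbbb
        else if ((b0 & 0xE0) == 0xC0) length = 4;  // 110bbbbb
        else throw new FormatException("Invalid leading byte for a compressed integer.");

        if (data.Length - offset < length)
            throw new FormatException("Compressed integer is truncated.");

        bytesRead = length;
        switch (length)
        {
            case 1:
                return b0;
            case 2:
                return (uint)(((b0 & 0x3F) << 8) | data[offset + 1]);
            default:
                return (uint)(((b0 & 0x1F) << 24)
                    | (data[offset + 1] << 16)
                    | (data[offset + 2] << 8)
                    | data[offset + 3]);
        }
    }
}
```

A caller would typically advance its position by `bytesRead` after each call, for example to skip past a length prefix and reach the data that follows it.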