Memo: Compressing and Decompressing Integers

Last Update:2018-12-08 Source: Internet

Author: User


Original: Anders Liu

This article is outdated. See "Compressed integers used in .NET/CLI metadata": http://www.cnblogs.com/andresliu/archive/2010/02/09/compressed-integer-in-metadata.html.

Abstract: .NET/CLI PE files make wide use of an integer compression algorithm that stores a 32-bit unsigned integer in 1, 2, or 4 bytes depending on its magnitude. This article describes the algorithm and provides a reference implementation of decompression.

References

ECMA-335 -- Common Language Infrastructure (CLI) 4th Edition, June 2006

Introduction

The integer compression algorithm described in this article applies to 32-bit unsigned integers between 0x00000000 and 0x1FFFFFFF. This range is divided into three intervals, [0x00000000 ~ 0x0000007F], [0x00000080 ~ 0x00003FFF], and [0x00004000 ~ 0x1FFFFFFF], whose values are stored in 1, 2, and 4 bytes respectively. 32-bit unsigned integers greater than 0x1FFFFFFF cannot be compressed with this algorithm.

The algorithm is widely used in .NET/CLI PE files, for example in the various metadata signatures and in the #Blob and #US streams. Its purpose is to reduce the size of files on disk and the bandwidth needed to transfer them, because .NET assemblies are intended to be usable over a network. It is applied mainly to values that express a data size or a number of entries. Such values are usually small, so the number of bytes the algorithm saves is considerable.
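The interval division can be sketched as a small size lookup. This is an illustrative sketch only: the class name `CompressedIntegerSize` and the method `GetCompressedSize` are hypothetical and do not come from the original article or from any library.

```csharp
using System;

public static class CompressedIntegerSize
{
    // Largest value the scheme can represent (assumption: taken directly
    // from the upper bound stated in the text, 0x1FFFFFFF).
    public const uint MaxCompressible = 0x1FFFFFFF;

    // Returns the number of bytes (1, 2, or 4) the compressed form occupies.
    public static int GetCompressedSize(uint value)
    {
        if (value <= 0x7F) return 1;             // [0x00000000 ~ 0x0000007F]
        if (value <= 0x3FFF) return 2;           // [0x00000080 ~ 0x00003FFF]
        if (value <= MaxCompressible) return 4;  // [0x00004000 ~ 0x1FFFFFFF]

        throw new ArgumentOutOfRangeException(
            nameof(value), "Values above 0x1FFFFFFF cannot be compressed.");
    }
}
```

Note that no value maps to three bytes; the scheme jumps from two bytes straight to four.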

Compression Algorithm Description

When the integer is between 0x00000000 (00000000 00000000 00000000 00000000b) and 0x0000007F (00000000 00000000 00000000 01111111b), one byte is used to store it. The highest bit of that byte is 0. The compressed value has the form [0bbbbbbb].
When the integer is between 0x00000080 (00000000 00000000 00000000 10000000b) and 0x00003FFF (00000000 00000000 00111111 11111111b), two bytes are used to store it. The highest bit of the first byte is 1 and the second-highest bit is 0. The compressed value has the form [10bbbbbb bbbbbbbb].
When the integer is between 0x00004000 (00000000 00000000 01000000 00000000b) and 0x1FFFFFFF (00011111 11111111 11111111 11111111b), four bytes are used to store it. The highest and second-highest bits of the first byte are 1 and the third-highest bit is 0. The compressed value has the form [110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb].
The compressed form is big-endian, that is, the first byte holds the most significant bits of the original integer.
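The three rules above can be sketched in C# as follows. The article itself lists only a decompression routine, so this compressor is an illustration written from the description; the names `CompressedIntegerWriter` and `Compress` are hypothetical.

```csharp
using System;

public static class CompressedIntegerWriter
{
    // Encodes value as 1, 2, or 4 big-endian bytes with a length prefix
    // in the top bits of the first byte (0, 10, or 110).
    public static byte[] Compress(uint value)
    {
        if (value <= 0x7F)
        {
            // 0bbbbbbb
            return new[] { (byte)value };
        }

        if (value <= 0x3FFF)
        {
            // 10bbbbbb bbbbbbbb
            return new[]
            {
                (byte)(0x80 | (value >> 8)),
                (byte)(value & 0xFF)
            };
        }

        if (value <= 0x1FFFFFFF)
        {
            // 110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb
            return new[]
            {
                (byte)(0xC0 | (value >> 24)),
                (byte)((value >> 16) & 0xFF),
                (byte)((value >> 8) & 0xFF),
                (byte)(value & 0xFF)
            };
        }

        throw new ArgumentOutOfRangeException(
            nameof(value), "Values above 0x1FFFFFFF cannot be compressed.");
    }
}
```

For instance, `Compress(0x2E57)` yields the two bytes AE 57, and `Compress(0x4000)` yields C0 00 40 00.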

Figure 1 intuitively shows the interval division.

Figure 1: Integer compression algorithm

Decompression Algorithm Description

If the first byte read, b0, has the form 0bbbbbbb (a bitwise AND with 0x80 gives 0x00), the integer is stored in one byte. Original integer = b0.
If the first byte read, b0, has the form 10bbbbbb (a bitwise AND with 0xC0 gives 0x80), the integer is stored in two bytes, and one more byte, b1, must be read. Original integer = (b0 & 0x3F) << 8 | b1.
If the first byte read, b0, has the form 110bbbbb (a bitwise AND with 0xE0 gives 0xC0), the integer is stored in four bytes, and three more bytes, b1, b2, and b3, must be read. Original integer = (b0 & 0x1F) << 24 | b1 << 16 | b2 << 8 | b3.
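When compressed integers sit inside a larger buffer, the length is not known in advance and has to be derived from the first byte, as the three steps describe. The following is a minimal sketch of that idea, not the article's own listing: the names `CompressedIntegerReader` and `Read`, the `offset`/`bytesRead` parameters, and the use of `FormatException` for malformed input are all assumptions made for illustration.

```csharp
using System;

public static class CompressedIntegerReader
{
    // Decodes one compressed integer starting at data[offset] and reports
    // through bytesRead how many bytes the encoding occupied (1, 2, or 4).
    public static uint Read(byte[] data, int offset, out int bytesRead)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (offset < 0 || offset >= data.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        byte b0 = data[offset];

        if ((b0 & 0x80) == 0x00)            // 0bbbbbbb
        {
            bytesRead = 1;
            return b0;
        }

        if ((b0 & 0xC0) == 0x80)            // 10bbbbbb
        {
            EnsureAvailable(data, offset, 2);
            bytesRead = 2;
            return (uint)((b0 & 0x3F) << 8 | data[offset + 1]);
        }

        if ((b0 & 0xE0) == 0xC0)            // 110bbbbb
        {
            EnsureAvailable(data, offset, 4);
            bytesRead = 4;
            return (uint)((b0 & 0x1F) << 24
                          | data[offset + 1] << 16
                          | data[offset + 2] << 8
                          | data[offset + 3]);
        }

        // A first byte of the form 111bbbbb matches none of the three patterns.
        throw new FormatException("First byte is not a valid compressed-integer prefix.");
    }

    private static void EnsureAvailable(byte[] data, int offset, int count)
    {
        if (offset + count > data.Length)
            throw new FormatException("Compressed integer is truncated.");
    }
}
```

Reading the buffer 03 AE 57 C0 00 40 00 from offset 0, and advancing by `bytesRead` each time, yields 0x03, 0x2E57, and 0x4000 in turn.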

Decompression algorithm reference implementation

Listing 1: Implementation of the decompression algorithm (C#)

    // Must be declared inside a static class, because it is an extension method.
    // InvalidCompressedIntegerException is a custom exception type that the
    // listing does not define.
    public static UInt32 Decompress(this Byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length == 0) throw new InvalidCompressedIntegerException();

        if ((data[0] & 0x80 /* 10000000b */) == 0   // stored in one byte (0bbbbbbb)
            && data.Length == 1)
        {
            return (UInt32)data[0];
        }
        else if ((data[0] & 0xC0 /* 11000000b */) == 0x80 /* 10000000b */   // stored in two bytes (10bbbbbb bbbbbbbb)
            && data.Length == 2)
        {
            return (UInt32)((data[0] & 0x3F /* 00111111b */) << 8 | data[1]);
        }
        else if ((data[0] & 0xE0 /* 11100000b */) == 0xC0 /* 11000000b */   // stored in four bytes (110bbbbb bbbbbbbb bbbbbbbb bbbbbbbb)
            && data.Length == 4)
        {
            return (UInt32)((data[0] & 0x1F /* 00011111b */) << 24 | data[1] << 16 | data[2] << 8 | data[3]);
        }
        else
        {
            // Invalid prefix bits, or the array length does not match the prefix.
            throw new InvalidCompressedIntegerException();
        }
    }
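Listing 1 throws `InvalidCompressedIntegerException`, a type the article does not show. To compile the listing, a definition along the following lines would be needed. This is an assumption about its shape, not the author's actual class; the message text in particular is invented for illustration.

```csharp
using System;

// Hypothetical stand-in for the exception type used by Listing 1.
public class InvalidCompressedIntegerException : Exception
{
    public InvalidCompressedIntegerException()
        : base("The byte sequence is not a valid compressed integer.")
    {
    }
}
```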

EOF.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
