Gzip was originally written by Jean-loup Gailly and Mark Adler for file compression on UNIX systems. The files with the .gz suffix that we often see on Linux are in gzip format. Today gzip is also one of the most widely used data compression formats on the Internet. Gzip content encoding in HTTP is a common technique for improving the performance of web applications, and high-traffic websites often use it so that pages load faster for users.
Gzip itself is only a file format. Internally it usually stores data in the deflate format (RFC 1951), and deflate in turn compresses data by combining the LZ77 algorithm with Huffman coding.
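To see this layering in practice, the sketch below uses Python's standard `zlib` module, whose `wbits` argument selects the container: a negative value produces a bare deflate stream, 15 produces a zlib stream, and 16 + 15 = 31 produces a gzip stream. It assumes CPython's bundled zlib library; the sample input is arbitrary. Because the compression level and window size are the same in all three calls, the same deflate payload should appear inside both wrappers.

```python
import zlib

def deflate_with_wrapper(data: bytes, wbits: int) -> bytes:
    """Compress data at level 9, choosing the container via wbits."""
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

data = b"gzip wraps deflate; deflate uses LZ77 plus Huffman coding. " * 20

raw = deflate_with_wrapper(data, -15)  # bare deflate stream, no header or trailer
zl = deflate_with_wrapper(data, 15)    # zlib: 2-byte header + deflate + 4-byte Adler-32
gz = deflate_with_wrapper(data, 31)    # gzip: 10-byte header + deflate + 8-byte trailer

# Strip each wrapper and compare the payloads with the bare deflate stream.
print(gz[10:-8] == raw, zl[2:-4] == raw)  # True True
print(len(raw), len(zl), len(gz))
```

The length difference between the three outputs is exactly the wrapper overhead: 6 bytes for zlib and 18 bytes for a gzip member without optional header fields.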
A gzip file consists of one or more "members" (compressed data sets); usually there is only one. Each member contains three parts: a header, the compressed data, and a trailer. The layout of a member is as follows:
+-----+-----+----+-----+-------+-----+----+==========================+=================+-------+-------+
| ID1 | ID2 | CM | FLG | MTIME | XFL | OS | additional header fields | compressed data | CRC32 | ISIZE |
+-----+-----+----+-----+-------+-----+----+==========================+=================+-------+-------+
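RFC 1952 calls each of these units a "member", and a gzip file is simply members laid end to end. The minimal sketch below, assuming Python 3.8 or later (for the `mtime` keyword of `gzip.compress`), builds two members independently and concatenates them; Python's `gzip.decompress` reads every member and returns the joined data.

```python
import gzip

# Two independently compressed members; mtime=0 keeps the output reproducible.
member1 = gzip.compress(b"first member\n", mtime=0)
member2 = gzip.compress(b"second member\n", mtime=0)

# Concatenating members yields a valid multi-member gzip file.
combined = member1 + member2
print(gzip.decompress(combined))  # b'first member\nsecond member\n'
```

The same property is why `cat a.gz b.gz > c.gz` produces a file that still decompresses correctly.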
1. Header
ID1 and ID2: 1 byte each. They have the fixed values ID1 = 31 (0x1f) and ID2 = 139 (0x8b), which identify the file as gzip.
CM: 1 byte. Compression method. Currently only one method is defined: CM = 8, which denotes deflate.
FLG: 1 byte. Flags:
Bit 0 FTEXT - the file is probably ASCII text
Bit 1 FHCRC - a CRC16 header checksum is present
Bit 2 FEXTRA - an extra field is present
Bit 3 FNAME - the original file name is present
Bit 4 FCOMMENT - a comment is present
Bits 5-7 reserved (must be zero)
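The flag byte can be decoded with simple bit masks. The helper below is a hypothetical sketch (the names `FLAG_BITS` and `decode_flg` are illustrative, not from any library); following RFC 1952, it treats a non-zero reserved bit as an error.

```python
from typing import List

# Bit masks for FLG bits 0-4, in bit order.
FLAG_BITS = {
    0x01: "FTEXT",
    0x02: "FHCRC",
    0x04: "FEXTRA",
    0x08: "FNAME",
    0x10: "FCOMMENT",
}

def decode_flg(flg: int) -> List[str]:
    """Return the names of the FLG bits that are set (bits 0-4)."""
    if flg & 0xE0:
        # Bits 5-7 are reserved and must be zero.
        raise ValueError(f"reserved FLG bits set: {flg:#04x}")
    return [name for mask, name in FLAG_BITS.items() if flg & mask]

print(decode_flg(0x08))  # ['FNAME']
print(decode_flg(0x1A))  # ['FHCRC', 'FNAME', 'FCOMMENT']
```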
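MTIME: 4 bytes. Modification time of the original file, in Unix format (seconds since 00:00:00 GMT, January 1, 1970). MTIME = 0 means no timestamp is available.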
XFL: 1 byte. Extra flags. When CM = 8: XFL = 2 means the compressor used maximum compression (slowest algorithm); XFL = 4 means it used the fastest algorithm (least compression).
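MTIME occupies header bytes 4-7 and XFL is byte 8. The sketch below assumes Python 3.8 or later, where `gzip.compress` accepts an explicit `mtime`; the timestamp 1071532800 is just an example value. It reads MTIME as a little-endian unsigned 32-bit integer and converts it to a UTC date.

```python
import gzip
import struct
from datetime import datetime, timezone

# Compress at the maximum level with a fixed, example modification time.
blob = gzip.compress(b"hello", compresslevel=9, mtime=1071532800)

# MTIME: bytes 4-7, little-endian ("<I"); XFL: byte 8.
(mtime,) = struct.unpack("<I", blob[4:8])
xfl = blob[8]

print(datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat())  # 2003-12-16T00:00:00+00:00
print(xfl)  # 2, i.e. maximum compression
```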
OS: 1 byte. Identifies the type of file system on which compression took place. The defined values are:
0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
1 - Amiga
2 - VMS (or OpenVMS)
3 - Unix
4 - VM/CMS
5 - Atari TOS
6 - HPFS filesystem (OS/2, NT)
7 - Macintosh
8 - Z-System
9 - CP/M
10 - TOPS-20
11 - NTFS filesystem (NT)
12 - QDOS
13 - Acorn RISCOS
255 - unknown
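A lookup table is enough to turn the OS byte (offset 9 of the fixed header) into a readable name. The dictionary and function names below are hypothetical; the values are those listed above, and any other value is reported as undefined.

```python
# OS byte values defined by RFC 1952.
OS_NAMES = {
    0: "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
    1: "Amiga",
    2: "VMS (or OpenVMS)",
    3: "Unix",
    4: "VM/CMS",
    5: "Atari TOS",
    6: "HPFS filesystem (OS/2, NT)",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
    10: "TOPS-20",
    11: "NTFS filesystem (NT)",
    12: "QDOS",
    13: "Acorn RISCOS",
    255: "unknown",
}

def os_name(header: bytes) -> str:
    """Map the OS byte (offset 9 of the gzip header) to a name."""
    return OS_NAMES.get(header[9], f"undefined value {header[9]}")

# A hand-built 10-byte header whose OS byte is 3.
header = bytes([0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 2, 3])
print(os_name(header))  # Unix
```

Note that the byte a given compressor writes varies: some tools always write 255 rather than detecting the platform.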
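Additional header fields:
(If FLG.FEXTRA = 1)
+---+---+===========================+
|  XLEN | XLEN bytes of extra field |
+---+---+===========================+
The extra field is a sequence of subfields, each laid out as:
+---+---+---+---+============================+
|SI1|SI2|  LEN  | LEN bytes of subfield data |
+---+---+---+---+============================+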
(If FLG.FNAME = 1)
+======================================+
| original file name (zero-terminated) |
+======================================+
(If FLG.FCOMMENT = 1)
+============================================+
| file comment (ISO-8859-1, zero-terminated) |
+============================================+
(If FLG.FHCRC = 1)
+---+---+
| CRC16 |
+---+---+
When the extra field is present, XLEN gives its total length in bytes. Each subfield inside it starts with a two-byte ID (SI1, SI2) followed by a 2-byte length LEN of the subfield data. For example, SI1 = 0x41 ('A') and SI2 = 0x50 ('P') identify a subfield carrying Apollo file type information. When FLG.FHCRC is set, CRC16 holds the two least significant bytes of the CRC32 of all header bytes that precede it.
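Because the optional fields appear in a fixed order (extra field, file name, comment, header CRC), a parser can walk them sequentially to find where the deflate data begins. The function below is a hypothetical sketch, not a library API: it assumes the whole header is already in memory, decodes names and comments as ISO-8859-1, and does not handle truncated input gracefully.

```python
import gzip
import io
import struct
import zlib

FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10

def parse_gzip_header(buf: bytes) -> dict:
    """Parse one gzip member header; return its fields and the data offset."""
    id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBIBB", buf[:10])
    if (id1, id2, cm) != (0x1F, 0x8B, 8):
        raise ValueError("not a deflate-compressed gzip member")
    pos = 10
    info = {"flg": flg, "mtime": mtime, "xfl": xfl, "os": os_byte}
    if flg & FEXTRA:
        (xlen,) = struct.unpack("<H", buf[pos:pos + 2])
        extra = buf[pos + 2:pos + 2 + xlen]
        pos += 2 + xlen
        # Split the extra field into (SI1 SI2, subfield data) pairs.
        subfields, i = [], 0
        while i + 4 <= len(extra):
            si1, si2, sublen = struct.unpack("<BBH", extra[i:i + 4])
            subfields.append((chr(si1) + chr(si2), extra[i + 4:i + 4 + sublen]))
            i += 4 + sublen
        info["extra"] = subfields
    if flg & FNAME:
        end = buf.index(b"\x00", pos)
        info["name"] = buf[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FCOMMENT:
        end = buf.index(b"\x00", pos)
        info["comment"] = buf[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FHCRC:
        (crc16,) = struct.unpack("<H", buf[pos:pos + 2])
        # CRC16 = low 16 bits of the CRC32 of every header byte before it.
        if crc16 != zlib.crc32(buf[:pos]) & 0xFFFF:
            raise ValueError("header CRC16 mismatch")
        pos += 2
    info["data_offset"] = pos
    return info

# GzipFile stores the base file name, so this header carries FNAME.
out = io.BytesIO()
with gzip.GzipFile(filename="hello.txt", mode="wb", fileobj=out, mtime=0) as f:
    f.write(b"hello, gzip\n")
member = out.getvalue()

hdr = parse_gzip_header(member)
print(hdr["name"], hdr["data_offset"])  # hello.txt 20
# The raw deflate stream runs from data_offset up to the 8-byte trailer.
print(zlib.decompress(member[hdr["data_offset"]:-8], -15))
```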
2. Data section
The deflate data stream consists of a series of deflate blocks. Each block begins with a 3-bit header, laid out as follows:
+--------+-------+======+
| BFINAL | BTYPE | data |
+--------+-------+======+
BFINAL: 1 bit. 0 - more blocks follow; 1 - this is the last block.
BTYPE: 2 bits. 00 - no compression (stored); 01 - compressed with static (fixed) Huffman codes; 10 - compressed with dynamic Huffman codes; 11 - reserved (error). For how each case is processed, refer to RFC 1951, listed below.
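Deflate packs bits starting from the least significant bit of each byte, so BFINAL is bit 0 of the first byte and BTYPE occupies bits 1-2. The sketch below (helper names are hypothetical) reads those three bits from a raw deflate stream produced by Python's `zlib`. It assumes that zlib's level 0 emits the whole short input as a single stored block, which is what it does for small inputs.

```python
import zlib

def raw_deflate(data: bytes, level: int) -> bytes:
    """Produce a bare deflate stream (no zlib or gzip wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def first_block_header(raw: bytes) -> tuple:
    """Return (BFINAL, BTYPE) for the first block of a raw deflate stream."""
    first = raw[0]
    return first & 0x01, (first >> 1) & 0x03

# Level 0 stores the data uncompressed in one final block: BFINAL=1, BTYPE=00.
print(first_block_header(raw_deflate(b"hello", 0)))  # (1, 0)
# At level 9 zlib picks a Huffman block type for the same tiny input.
print(first_block_header(raw_deflate(b"hello", 9)))
```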
3. Trailer
CRC32: 4 bytes. The 32-bit checksum of the original (uncompressed) data.
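ISIZE: 4 bytes. The size of the original (uncompressed) input data modulo 2^32. All multi-byte integers in gzip are stored least significant byte first (little-endian), which is the opposite of zlib (RFC 1950), where multi-byte values are stored most significant byte first.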
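The byte-order difference is easy to check with `struct`. The sketch below, assuming Python 3.8 or later for `mtime`, reads the gzip trailer little-endian and the zlib Adler-32 checksum big-endian, then verifies both against checksums recomputed with the `zlib` module.

```python
import gzip
import struct
import zlib

data = b"The quick brown fox jumps over the lazy dog\n" * 100
gz = gzip.compress(data, mtime=0)

# gzip trailer: CRC32 then ISIZE, both little-endian ("<").
crc32, isize = struct.unpack("<II", gz[-8:])
assert crc32 == zlib.crc32(data)
assert isize == len(data) % 2**32

# A zlib stream ends with its Adler-32 checksum stored big-endian (">").
zl = zlib.compress(data)
(adler,) = struct.unpack(">I", zl[-4:])
assert adler == zlib.adler32(data)

print(hex(crc32), isize, hex(adler))
```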
The following is a brief analysis of the format of the gzip file gzip-1.3.3.tar.gz:
Gzip is closely related to zlib. For details on zlib, gzip, and deflate, see RFC 1950-1952, which also point to further references.
Gzip is part of the GNU Project. Its official website is www.gzip.org, where the gzip source code can be downloaded. At the time of writing, the latest stable version is 1.2.4, and 1.3.3 is in beta.
[Related resources]
Gzip Official Website: www.gzip.org
RFC 1950 - ZLIB Compressed Data Format Specification version 3.3
RFC 1951 - DEFLATE Compressed Data Format Specification version 1.3
RFC 1952 - GZIP File Format Specification version 4.3
Kernel studio: www.kernelstudio.com
First Release: 2003-12-16
Last revised: 2003-12-16