GZIP, ZLIB and DEFLATE file formats (repost)

Reposted from: http://apps.hi.baidu.com/share/detail/19276079 (2009-12-07 19:29)

Gzip was created by Jean-loup Gailly and Mark Adler for file compression on UNIX systems. Files with the .gz suffix, common on Linux, are in gzip format. Gzip has since become a very popular data compression format (or file format) on the Internet. Gzip content encoding in HTTP is a technique for improving the performance of Web applications: high-traffic Web sites often compress their responses with gzip so that pages load faster for users.

Gzip itself is only a file format. The compressed data inside it normally uses the deflate data format, and deflate compresses data with a combination of the LZ77 algorithm and Huffman coding.
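To make the layering concrete, here is a small Python sketch that uses only the standard `zlib` and `gzip` modules. The helper name `wrap` is chosen for illustration. The sketch compresses the same input three ways and then strips the containers. It assumes identical settings (level 6, 32 KB window) for all three. Under those settings the zlib library emits the same deflate stream each time, but that is how zlib behaves, not something the formats require.

```python
import gzip
import zlib

data = b"gzip and zlib are containers around a deflate stream. " * 20

def wrap(payload: bytes, wbits: int) -> bytes:
    """Compress with zlib; wbits selects the container (zlib module convention)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(payload) + comp.flush()

raw = wrap(data, -15)      # raw deflate (RFC 1951): no header, no trailer
zl = wrap(data, 15)        # zlib (RFC 1950): 2-byte header + 4-byte Adler-32
gz = wrap(data, 16 + 15)   # gzip (RFC 1952): zlib writes a 10-byte header + 8-byte trailer

# Strip each container and compare what is left with the raw deflate stream.
print(zl[2:-4] == raw, gz[10:-8] == raw)

# Every form decompresses back to the original bytes.
assert zlib.decompress(raw, -15) == data
assert zlib.decompress(zl) == data
assert gzip.decompress(gz) == data
```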

A gzip file consists of one or more "members" (blocks), although in practice it usually contains only one. Each member has three parts: a header, the compressed data, and a trailer. The general layout of a member is as follows:

+---+---+---+---+---+---+---+---+---+---+=========//==========+=========//=========+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | extra header fields |  compressed data   |     CRC32     |     ISIZE     |
+---+---+---+---+---+---+---+---+---+---+=========//==========+=========//=========+---+---+---+---+---+---+---+---+
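The fixed part of this layout is easy to read with Python's `struct` module. This sketch is illustrative only. The names `FixedHeader`, `read_fixed_header` and `read_trailer` are made up. `read_trailer` assumes a file holding a single member, which is one header/data/trailer unit in RFC 1952 terms. Python 3.9 or later is assumed.

```python
import gzip
import struct
import zlib
from typing import NamedTuple

class FixedHeader(NamedTuple):
    id1: int
    id2: int
    cm: int
    flg: int
    mtime: int
    xfl: int
    os: int

def read_fixed_header(member: bytes) -> FixedHeader:
    """Unpack the first 10 bytes of a gzip member (ID1 through OS)."""
    # "<" = little-endian, no padding; B = 1 byte, I = 4 bytes (MTIME).
    hdr = FixedHeader(*struct.unpack_from("<BBBBIBB", member, 0))
    if (hdr.id1, hdr.id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip member: bad ID1/ID2")
    return hdr

def read_trailer(member: bytes) -> tuple[int, int]:
    """Return (CRC32, ISIZE) from the last 8 bytes of a single-member file."""
    return struct.unpack_from("<II", member, len(member) - 8)

data = b"hello, gzip\n"
blob = gzip.compress(data, mtime=0)   # mtime=0 keeps the output reproducible
hdr = read_fixed_header(blob)
crc, isize = read_trailer(blob)
print(hdr.cm, hdr.flg, crc == zlib.crc32(data), isize == len(data))
```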
1. The header

ID1, ID2: 1 byte each. Fixed values ID1 = 31 (0x1F) and ID2 = 139 (0x8B), which identify the gzip format.
CM: 1 byte. Compression method. Currently only one method is in use: CM = 8, meaning deflate.
FLG: 1 byte. Flags, one bit per option:

Bit 0 FTEXT - the file is probably ASCII text
Bit 1 FHCRC - a CRC16 header checksum is present
Bit 2 FEXTRA - an extra field is present
Bit 3 FNAME - the original file name is present
Bit 4 FCOMMENT - a comment is present
Bits 5-7 reserved (must be zero)

MTIME: 4 bytes. Modification time of the original file, in Unix format (seconds since 00:00:00 GMT, January 1, 1970).
XFL: 1 byte. Extra flags. When CM = 8, XFL = 2 means the compressor used maximum compression (the slowest algorithm), and XFL = 4 means it used the fastest algorithm (least compression).
OS: 1 byte. Operating system; more precisely, the type of file system on which compression took place. The following values are defined:

0 - FAT file system (MS-DOS, OS/2, NT/Win32)
1 - Amiga
2 - VMS (or OpenVMS)
3 - Unix
4 - VM/CMS
5 - Atari TOS
6 - HPFS file system (OS/2, NT)
7 - Macintosh
8 - Z-System
9 - CP/M
10 - TOPS-20
11 - NTFS file system (NT)
12 - QDOS
13 - Acorn RISCOS
255 - unknown

Optional header fields (present only when the corresponding FLG bit is set):

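(if FLG.FEXTRA = 1)

+---+---+=================================+
| XLEN  |...XLEN bytes of "extra field"...|
+---+---+=================================+

Each subfield inside the extra field:

+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+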

(if FLG.FNAME = 1)

+=======================//========================+
|      original file name (zero-terminated)       |
+=======================//========================+

(if FLG.FCOMMENT = 1)

+=======================//========================+
| file comment (ISO-8859-1 text, zero-terminated) |
+=======================//========================+

(if FLG.FHCRC = 1)

+---+---+
| CRC16 |
+---+---+
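The optional fields always appear in a fixed order (FEXTRA, FNAME, FCOMMENT, FHCRC), and each one is present only when its FLG bit is set. That lets a parser walk them with a running offset. The Python sketch below shows one way to do this; it is not taken from any particular gzip implementation. The function name, the returned dictionary and the `XY` test subfield are hypothetical. The sketch builds a small member by hand and checks that Python's own decompressors accept it.

```python
import gzip
import struct
import zlib

FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10

def parse_optional_fields(member: bytes) -> dict:
    """Walk the optional fields in on-disk order: FEXTRA, FNAME, FCOMMENT, FHCRC.

    Returns the decoded fields plus "data_offset", where the deflate data begins.
    """
    flg = member[3]
    pos = 10                                   # skip the fixed 10-byte header
    fields = {}
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", member, pos)
        extra = member[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
        subfields, i = [], 0
        while i + 4 <= len(extra):             # SI1, SI2, LEN (2 bytes), data
            si1, si2, length = struct.unpack_from("<BBH", extra, i)
            subfields.append((bytes([si1, si2]), extra[i + 4 : i + 4 + length]))
            i += 4 + length
        fields["extra"] = subfields
    if flg & FNAME:
        end = member.index(b"\x00", pos)       # zero-terminated ISO-8859-1 text
        fields["name"] = member[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FCOMMENT:
        end = member.index(b"\x00", pos)
        fields["comment"] = member[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FHCRC:
        (crc16,) = struct.unpack_from("<H", member, pos)
        # CRC16 = low 16 bits of the CRC-32 of every header byte before it.
        if crc16 != zlib.crc32(member[:pos]) & 0xFFFF:
            raise ValueError("header CRC16 mismatch")
        pos += 2
    fields["data_offset"] = pos
    return fields

# Build a member by hand with FEXTRA, FNAME and FHCRC set.
# The subfield ID b"XY" and its contents are made up for illustration.
payload = b"example payload"
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
body = comp.compress(payload) + comp.flush()
extra = b"XY" + struct.pack("<H", 3) + b"abc"
header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA | FNAME | FHCRC, 0, 2, 3)
header += struct.pack("<H", len(extra)) + extra + b"notes.txt\x00"
header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
member = header + body + struct.pack("<II", zlib.crc32(payload), len(payload))

print(parse_optional_fields(member))
print(gzip.decompress(member) == payload)
```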

When the extra field is present, XLEN gives its total length in bytes. The field holds one or more subfields. Each subfield is identified by the two bytes SI1 and SI2 and is followed by a 2-byte LEN and then LEN bytes of data. For example, SI1 = 0x41 ('A') and SI2 = 0x70 ('P') identify a subfield carrying Apollo file type information.

2. The data section

The compressed data is in deflate format and consists of a series of blocks. The general layout of each block is as follows:

+......+......+......+=============//============+
|BFINAL|    BTYPE    |           data            |
+......+......+......+=============//============+

BFINAL: 1 bit. 0 = more blocks follow; 1 = this is the last block.
BTYPE: 2 bits. 00 = no compression (stored); 01 = compressed with fixed (static) Huffman codes; 10 = compressed with dynamic Huffman codes; 11 = reserved (error).
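The three header bits can be read directly from the first byte of a raw deflate stream. Deflate packs data elements starting at the least significant bit of each byte (RFC 1951), so BFINAL is bit 0 and BTYPE is bits 1-2. The sketch below assumes Python 3.9+ and uses illustrative helper names. Which block type appears depends on the encoder's choices. Asking zlib for level 0, however, makes it store the data uncompressed.

```python
import zlib

BTYPE_NAMES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved"}

def first_block_header(raw_deflate: bytes) -> tuple[int, int]:
    """Return (BFINAL, BTYPE) for the first block of a raw deflate stream.

    Deflate fills each byte starting at its least significant bit, so
    BFINAL is bit 0 of the first byte and BTYPE is bits 1-2.
    """
    first = raw_deflate[0]
    return first & 0b1, (first >> 1) & 0b11

def deflate_raw(data: bytes, level: int) -> bytes:
    """Compress to a raw deflate stream (negative wbits = no zlib/gzip wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

for level in (0, 9):
    bfinal, btype = first_block_header(deflate_raw(b"hello world", level))
    print(f"level {level}: BFINAL={bfinal} BTYPE={btype:02b} ({BTYPE_NAMES[btype]})")
```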

For the details of each block type, refer to the RFC documents listed below.

3. The trailer

CRC32: 4 bytes. CRC-32 checksum of the original (uncompressed) data.
ISIZE: 4 bytes. Length of the original (uncompressed) data modulo 2^32, that is, its low 32 bits.
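The trailer can be checked by inflating the raw deflate data and comparing the result against CRC32 and ISIZE. This Python sketch is a minimal illustration, and the function name is hypothetical. `zlib.decompressobj` with a negative `wbits` decodes raw deflate. It also exposes any bytes that follow the end of the stream in `unused_data`, which is where the trailer sits. To stay short, the sketch only accepts members whose FLG byte is 0, such as those produced by Python's `gzip.compress`.

```python
import gzip
import struct
import zlib

def inflate_and_check(member: bytes) -> bytes:
    """Inflate one gzip member with FLG = 0 and verify its CRC32/ISIZE trailer."""
    if member[:3] != b"\x1f\x8b\x08" or member[3] != 0:
        raise ValueError("sketch only handles deflate members without optional fields")
    inflater = zlib.decompressobj(-15)         # raw deflate: no wrapper expected
    data = inflater.decompress(member[10:])
    if not inflater.eof or len(inflater.unused_data) < 8:
        raise ValueError("truncated member")
    # The 8 bytes right after the deflate stream are the trailer; a following
    # member, if there is one, would start after them.
    crc32, isize = struct.unpack("<II", inflater.unused_data[:8])
    if crc32 != zlib.crc32(data):
        raise ValueError("CRC32 mismatch")
    if isize != len(data) % 2**32:             # ISIZE holds the length mod 2^32
        raise ValueError("ISIZE mismatch")
    return data

payload = b"trailer check " * 100
print(inflate_and_check(gzip.compress(payload, mtime=0)) == payload)
```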

Multi-byte integers in gzip are stored least significant byte first (little-endian). This is the opposite of zlib, which stores them most significant byte first (big-endian).
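The difference in byte order shows up directly when the containers are unpacked with Python's `struct` module: `<` is used for gzip fields and `>` for zlib fields. This is a minimal illustration using only the standard library. The checks on the zlib side follow RFC 1950, which defines the 2-byte CMF/FLG header as a 16-bit MSB-order value that must be a multiple of 31.

```python
import gzip
import struct
import zlib

data = b"byte order demo"
gz = gzip.compress(data, mtime=0)
zl = zlib.compress(data)

# gzip stores CRC32 and ISIZE least significant byte first ("<").
gz_crc, gz_isize = struct.unpack("<II", gz[-8:])

# zlib stores its 2-byte header and Adler-32 trailer most significant byte first (">").
(cmf_flg,) = struct.unpack(">H", zl[:2])
(zl_adler,) = struct.unpack(">I", zl[-4:])

print(gz_crc == zlib.crc32(data), gz_isize == len(data))
print(cmf_flg % 31 == 0, zl_adler == zlib.adler32(data))
```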

Gzip and zlib are closely related. For the full specifications of zlib, deflate and gzip, see RFC 1950, RFC 1951 and RFC 1952 respectively; these documents also point to further references.

Gzip is part of the GNU project. Its official site is www.gzip.org, where the gzip source code can be downloaded. At the time of writing, the latest release was 1.2.4, with 1.3.3 available as a beta.
