Gzip file format

Source: Internet
Author: User
Tags: crc32

Gzip was originally created by Jean-loup Gailly and Mark Adler for file compression on UNIX systems. Files with the .gz suffix, common on Linux, are in gzip format. Today gzip is also widely used on the Internet, both as a file format and as a data compression format. Gzip content encoding in HTTP reduces the amount of data transferred, and high-traffic web sites often use it so that pages load faster for users.

Gzip itself is only a file format. Internally it usually carries data in the deflate format, and deflate compresses data with a combination of the LZ77 algorithm and Huffman coding.
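The layering described above can be checked with Python's standard library. This is a minimal sketch, not a general parser: it assumes the member was produced by gzip.compress, which writes no optional header fields, so the header is exactly 10 bytes and the trailer is the last 8 bytes. The names payload, member and raw_deflate are illustrative only.

```python
import gzip
import zlib

payload = b"hello gzip " * 100

# gzip.compress wraps a deflate stream in a gzip header and trailer.
member = gzip.compress(payload)

# Assumption: FLG is 0 (no optional header fields), so the fixed header
# is 10 bytes and the trailer (CRC32 + ISIZE) is the final 8 bytes.
assert member[3] == 0
raw_deflate = member[10:-8]

# wbits=-15 tells zlib to expect a raw deflate stream with no wrapper.
restored = zlib.decompress(raw_deflate, wbits=-15)
print(member[:3].hex(), restored == payload)
```

The first three bytes printed are the two magic bytes followed by the compression method byte.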

A gzip file consists of one or more "blocks" (RFC 1952 calls them members). Generally, there is only one. Each block contains three parts: a header, the compressed data, and a trailer. The block overview is as follows:

+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS |
+---+---+---+---+---+---+---+---+---+---+

+==========================+=================+---+---+---+---+---+---+---+---+
| additional header fields | compressed data |     CRC32     |     ISIZE     |
+==========================+=================+---+---+---+---+---+---+---+---+

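The "one or more blocks" rule can be illustrated by concatenating two independently compressed gzip outputs. This is a small sketch using Python's gzip module; the variable names are illustrative only.

```python
import gzip

# Two independently compressed blocks, concatenated byte for byte.
first = gzip.compress(b"first member\n")
second = gzip.compress(b"second member\n")
combined = first + second

# gzip.decompress reads each block in turn and joins the results.
restored = gzip.decompress(combined)
print(restored)
```

Each block keeps its own header and trailer, so the combined file is still a valid gzip file.
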
1. Header
  • ID1 and ID2: 1 byte each. Fixed values: ID1 = 31 (0x1f), ID2 = 139 (0x8b), identifying the gzip format.
  • CM: 1 byte. Compression method. Values 0-7 are reserved; CM = 8 indicates the deflate method, the only one in common use.
  • FLG: 1 byte. Flags, one per bit:

    Bit 0 FTEXT - the data is probably text
    Bit 1 FHCRC - a CRC16 check field for the header is present
    Bit 2 FEXTRA - optional extra fields are present
    Bit 3 FNAME - the original file name field is present
    Bit 4 FCOMMENT - a comment field is present
    Bits 5-7 - reserved

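  • MTIME: 4 bytes. Modification time of the original file, in Unix format (seconds since 00:00:00 GMT, January 1, 1970). A value of 0 means no timestamp is available.
  • XFL: 1 byte. Extra flags. When CM = 8: XFL = 2 means the compressor used maximum compression (slowest algorithm); XFL = 4 means it used the fastest algorithm (least compression).
  • OS: 1 byte. The operating system, or more precisely the type of file system, on which the compression took place. The defined values are: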

    0 - FAT file system (MS-DOS, OS/2, NT/Win32)
    1 - Amiga
    2 - VMS/OpenVMS
    3 - Unix
    4 - VM/CMS
    5 - Atari TOS
    6 - HPFS file system (OS/2, NT)
    7 - Macintosh
    8 - Z-System
    9 - CP/M
    10 - TOPS-20
    11 - NTFS file system (NT)
    12 - QDOS
    13 - Acorn RISCOS
    255 - Unknown

  • Additional header fields:

    (If FLG.FEXTRA = 1)

    +---+---+=================================+
    | XLEN  | XLEN bytes of extra field data  |
    +---+---+=================================+

    The extra field data is a sequence of subfields, each laid out as:

    +---+---+---+---+==================================+
    |SI1|SI2|  LEN  | LEN bytes of subfield data       |
    +---+---+---+---+==================================+

    (If FLG.FNAME = 1)

    +=========================================+
    | Original file name, terminated by null  |
    +=========================================+

    (If FLG.FCOMMENT = 1)

    +===============================================================+
    | Comment text (ISO 8859-1 characters only), terminated by null |
    +===============================================================+

    (If FLG.FHCRC = 1)

    +---+---+
    | CRC16 |
    +---+---+

    When extra fields are present, XLEN (2 bytes) gives the total number of bytes that follow. Within each subfield, SI1 and SI2 form the subfield ID and LEN (2 bytes) gives the length of its data. For example, SI1 = 0x41 ('A') and SI2 = 0x70 ('p') indicate that the subfield holds Apollo file type information.
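
    The header layout above can be walked in code. The following is a minimal sketch in Python, not a complete or hardened parser: parse_gzip_header and the dictionary keys are hypothetical names chosen for illustration, the extra field is returned as raw bytes without splitting it into subfields, and the CRC16 is read but not verified.

```python
import gzip
import io
import struct

# FLG bit masks, named as in the list above.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16

def parse_gzip_header(buf):
    """Return (fields, offset of the compressed data) for one gzip block."""
    # "<" selects little-endian with no padding: 4 single bytes, a 4-byte
    # MTIME, then XFL and OS, for a fixed 10-byte header.
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not gzip data")
    if cm != 8:
        raise ValueError("unsupported compression method")
    fields = {"flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code}
    pos = 10
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        fields["extra"] = buf[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    if flg & FNAME:
        end = buf.index(b"\x00", pos)  # name is null-terminated
        fields["name"] = buf[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FCOMMENT:
        end = buf.index(b"\x00", pos)  # comment is null-terminated
        fields["comment"] = buf[pos:end].decode("latin-1")
        pos = end + 1
    if flg & FHCRC:
        fields["hcrc"] = struct.unpack_from("<H", buf, pos)[0]
        pos += 2
    return fields, pos

# Usage: build a block that carries a file name, then parse it back.
out = io.BytesIO()
with gzip.GzipFile(filename="notes.txt", mode="wb", fileobj=out, mtime=0) as f:
    f.write(b"example")
fields, offset = parse_gzip_header(out.getvalue())
print(fields["name"], fields["mtime"], offset)
```

    The returned offset points at the first byte of the deflate data, which is where the next section starts.
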

2. Data Section

    The deflate data format consists of a series of blocks. Each deflate block is laid out as follows:

    +........+.......+=================//=================+
    | BFINAL | BTYPE |               data                 |
    +........+.......+=================//=================+
  • BFINAL: 1 bit. 0 - more blocks follow; 1 - this is the last block.
  • BTYPE: 2 bits. 00 - no compression (stored); 01 - compressed with fixed (static) Huffman codes; 10 - compressed with dynamic Huffman codes; 11 - reserved.

    For how each block type is decoded, refer to the RFC documents listed below.
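
    As a small illustration of the BFINAL and BTYPE fields, the sketch below reads them from the first byte of a raw deflate stream. It relies on the RFC 1951 rule that header bits are packed starting from the least significant bit of each byte, so BFINAL is bit 0 and BTYPE is bits 1-2. The function name first_block_header is hypothetical, and the example assumes that compression level 0 makes zlib emit stored (uncompressed) blocks.

```python
import zlib

BTYPE_NAMES = {
    0: "no compression",
    1: "fixed Huffman codes",
    2: "dynamic Huffman codes",
    3: "reserved",
}

def first_block_header(raw_deflate):
    """Return (BFINAL, BTYPE) of the first block in a raw deflate stream."""
    first = raw_deflate[0]
    bfinal = first & 1          # bit 0
    btype = (first >> 1) & 3    # bits 1-2
    return bfinal, btype

# wbits=-15 produces a raw deflate stream with no zlib or gzip wrapper.
compressor = zlib.compressobj(level=0, wbits=-15)
raw = compressor.compress(b"abc") + compressor.flush()

bfinal, btype = first_block_header(raw)
print(bfinal, BTYPE_NAMES[btype])
```
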

3. Trailer
  • CRC32: 4 bytes. The CRC-32 checksum of the original (uncompressed) data.
  • ISIZE: 4 bytes. The length of the original (uncompressed) data modulo 2^32, that is, its low 32 bits.

    In gzip, multi-byte fields are stored least significant byte first (little-endian). This is the opposite of zlib, which stores its checksum most significant byte first (big-endian).
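
    The trailer fields and the byte order can be checked together. This is a minimal sketch that assumes a file with a single block, so the trailer is simply the last 8 bytes; check_trailer is a hypothetical helper name. The "<" in the struct format selects little-endian.

```python
import gzip
import struct
import zlib

def check_trailer(member, original):
    """Compare the CRC32 and ISIZE trailer fields with the original data."""
    crc32, isize = struct.unpack("<II", member[-8:])
    return crc32 == zlib.crc32(original) and isize == len(original) % 2**32

data = b"trailer check"
member = gzip.compress(data)
print(check_trailer(member, data))
```
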

    The following is a brief analysis of GZIP file gzip-1.3.3.tar.gz format:

    Gzip is closely related to zlib. For more information about zlib, gzip, and deflate, see RFC 1950-1952. Further references can be found in those documents.

    Gzip is now part of the GNU project. Its official website is www.gzip.org, where the gzip source code can be downloaded. At the time of writing, the latest stable version is 1.2.4, and 1.3.3 is in beta.

    [Related resources]

  • Gzip official website: www.gzip.org
  • RFC 1950 - ZLIB Compressed Data Format Specification version 3.3
  • RFC 1951 - DEFLATE Compressed Data Format Specification version 1.3
  • RFC 1952 - GZIP file format specification version 4.3
  • Kernel studio: www.kernelstudio.com

    First Release: 2003-12-16
    Last revised: 2003-12-16

     
