Gzip and deflate: in-depth analysis of gzip algorithm principles



   

Posted by the original poster at 17:05:15

This post was last edited by verohan

The code for decompressing deflate and gzip is almost identical, and the two can be merged into a single piece of code.
The only differences are:

  • Deflate is initialized with inflateInit(), while gzip uses inflateInit2(). inflateInit2() takes one more parameter than inflateInit(): -MAX_WBITS, which says that the input is raw deflate data. This is needed because the compressed block inside gzip data does not carry the two-byte zlib header, so inflateInit2() is used to tell the zlib library not to expect one (the caller then handles the gzip header and trailer itself). The zlib manual requires windowBits to be in the range 8..15, but values outside that range have special meanings; see the comments in zlib.h. For example, a negative value indicates raw deflate.
  • The deflate variant sent by Apache may also lack the zlib header, so a fake header has to be prepended before processing. This is the incorrect deflate from MS (raw deflate). The first byte of a zlib header is usually 0x78, and the 16-bit value formed by the first byte followed by the second byte must be divisible by 31; see RFC 1950 for details. For example, the fake zlib header used in Firefox is 0x7801, and the output of Python's zlib.compress() starts with 0x789c.
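The two points above can be tried out with Python's zlib module instead of the C API. This is only a sketch: the sample data and the helper name header_ok are assumptions made for illustration, and the raw deflate stream is simulated by cutting the wrapper off a zlib stream.

```python
import zlib

data = b"gzip and deflate share the same compressed payload. " * 8

# A zlib stream: 2-byte header + raw deflate + 4-byte Adler-32 trailer.
zlib_stream = zlib.compress(data)

def header_ok(first: int, second: int) -> bool:
    """RFC 1950: the low nibble of the first byte is 8 (deflate), and the two
    header bytes, read as a big-endian 16-bit value, are a multiple of 31."""
    return (first & 0x0F) == 8 and ((first << 8) | second) % 31 == 0

print(hex(zlib_stream[0]), hex(zlib_stream[1]))
print(header_ok(zlib_stream[0], zlib_stream[1]))   # header written by zlib
print(header_ok(0x78, 0x01))                       # the fake header 0x7801

# Strip the wrapper to imitate the raw deflate data some servers send.
raw_stream = zlib_stream[2:-4]

# Option 1: a negative windowBits tells zlib not to expect a zlib header.
from_raw = zlib.decompress(raw_stream, -zlib.MAX_WBITS)

# Option 2: prepend a fake header. The Adler-32 trailer is still missing, so a
# streaming decompressor is used; it returns the data without reaching EOF.
inflater = zlib.decompressobj()
from_fake = inflater.decompress(b"\x78\x01" + raw_stream)

print(from_raw == data, from_fake == data)
```

Option 1 is the equivalent of calling inflateInit2() with -MAX_WBITS; option 2 mirrors the fake-header workaround and only works with a decoder that tolerates the missing checksum.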

Looking at the comments in zlib.h again, this passage appears at line 500 of zlib-1.2.3/zlib.h:

The windowBits parameter is the base two logarithm of the window size
(the size of the history buffer). It should be in the range 8..15 for this
version of the library. Larger values of this parameter result in better
compression at the expense of memory usage. The default value is 15 if
deflateInit is used instead.
windowBits can also be -8..-15 for raw deflate. In this case, -windowBits
determines the window size. deflate() will then generate raw deflate data
with no zlib header or trailer, and will not compute an adler32 check value.
windowBits can also be greater than 15 for optional gzip encoding. Add
16 to windowBits to write a simple gzip header and trailer around the
compressed data instead of a zlib wrapper. The gzip header will have no
file name, no extra data, no comment, no modification time (set to zero),
no header crc, and the operating system will be set to 255 (unknown). If a
gzip stream is being written, strm->adler is a crc32 instead of an adler32.
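To make the three windowBits ranges in this comment concrete, here is a small sketch using Python's zlib.compressobj, whose wbits argument is handed to deflateInit2. The helper name deflate_with and the sample data are assumptions for illustration.

```python
import zlib

data = b"The same input, wrapped three different ways. " * 10

def deflate_with(wbits: int) -> bytes:
    """Compress the sample data at level 6 with the given windowBits value."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

zlib_out = deflate_with(15)        # 8..15: zlib header and Adler-32 trailer
raw_out = deflate_with(-15)        # -8..-15: raw deflate, no wrapper
gzip_out = deflate_with(15 + 16)   # 16 added: gzip header and trailer

print(len(zlib_out) - len(raw_out))   # zlib wrapper overhead in bytes
print(len(gzip_out) - len(raw_out))   # gzip wrapper overhead in bytes
print(gzip_out[:3].hex())             # gzip magic 1f 8b, then method 08
print(gzip_out[9])                    # OS byte; its value depends on the zlib version
```

With a stock zlib build the two differences should come out as 6 and 18 bytes, since only the wrapper changes and the deflate payload stays the same.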

Now look at the implementations in nginx and Apache.
nginx-0.6.34/src/http/modules/ngx_http_gzip_filter_module.c, line 335:

rc = deflateInit2(&ctx->zstream, (int) conf->level, Z_DEFLATED,
                  -wbits, memlevel, Z_DEFAULT_STRATEGY);

httpd-2.0.63/modules/filters/mod_deflate.c, line 374:

zRC = deflateInit2(&ctx->stream, c->compressionlevel, Z_DEFLATED,
                   c->windowSize, c->memlevel,
                   Z_DEFAULT_STRATEGY);
(line 153: c->windowSize = i * -1;)

In other words, both nginx and Apache generate raw deflate data, and windowBits is negative in both. So why does Content-Encoding say gzip instead of deflate?
In Apache's mod_deflate.c, the following code turns up first:

/* RFC 1952 Section 2.3 dictates the gzip header:
 *
 * +---+---+---+---+---+---+---+---+---+---+
 * |ID1|ID2|CM |FLG|     MTIME     |XFL|OS |
 * +---+---+---+---+---+---+---+---+---+---+
 *
 * If we wish to populate in MTIME (as hinted in RFC 1952), do:
 * putLong(date_array, apr_time_now() / APR_USEC_PER_SEC);
 * where date_array is a char[4] and then print date_array in the
 * MTIME position.  WARNING: ENDIANNESS ISSUE HERE.
 */
buf = apr_psprintf(r->pool, "%c%c%c%c%c%c%c%c%c%c", deflate_magic[0],
                   deflate_magic[1], Z_DEFLATED, 0 /* flags */,
                   0, 0, 0, 0 /* 4 chars for mtime */,
                   0 /* xflags */, OS_CODE);

deflate_magic is defined as follows:

/* magic header */
static char deflate_magic[2] = { '\037', '\213' };

OS_CODE is defined in zutil.h: Amiga is 1, VAXC is 2, OS/2 is 6, Win32 is 11, and the default, Unix, is 3 (the ordering gives a hint of the history of operating systems).
Counting it up, that is 10 bytes. Remembering the 18 bytes the boss had mentioned, I looked more carefully and finally found the code that appends a trailer at line 462:

buf = apr_palloc(r->pool, 8);
putLong((unsigned char *)&buf[0], ctx->crc);
putLong((unsigned char *)&buf[4], ctx->stream.total_in);
b = apr_bucket_pool_create(buf, 8, r->pool, f->c->bucket_alloc);
APR_BRIGADE_INSERT_TAIL(ctx->bb, b);

Not much: 8 bytes. The 10-byte header plus the 8-byte trailer are the 18 extra bytes the boss mentioned. Apache calls the zlib interface to generate raw deflate data, and then adds the gzip header and trailer by hand.
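The same assembly can be sketched in Python: a 10-byte header, a raw deflate body, and an 8-byte trailer, checked against the standard gzip module. The function name gzip_by_hand, the default OS byte of 3, and the sample page are assumptions for illustration; this is not Apache's code.

```python
import gzip
import struct
import zlib

def gzip_by_hand(payload: bytes, os_code: int = 3) -> bytes:
    # 10-byte header: ID1, ID2, CM=8 (deflate), FLG, 4-byte MTIME, XFL, OS.
    header = bytes([0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 0, os_code])

    # Raw deflate body: negative windowBits, so zlib adds no wrapper.
    comp = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = comp.compress(payload) + comp.flush()

    # 8-byte trailer: CRC32 of the input, then the input length mod 2**32,
    # both stored little-endian.
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + body + trailer

page = b"<html><body>hello</body></html>" * 4
blob = gzip_by_hand(page)

print(gzip.decompress(blob) == page)   # a standard gzip reader accepts it
print(len(blob) - len(blob[10:-8]))    # header plus trailer overhead
```

Dropping the header and trailer from blob leaves exactly the raw deflate body, which is where the 18-byte saving comes from.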
Similarly, in nginx's ngx_http_gzip_filter_module.c, line 179 shows that Igor Sysoev rather carelessly defined the gzip header like this:

static u_char gzheader[10] = { 0x1f, 0x8b, Z_DEFLATED, 0, 0, 0, 0, 0, 0, 3 };

Take a closer look at the last byte: it is hard-coded to 3! Could this cause decompression problems on the client when an nginx built on Windows serves a gzip-compressed page? That calls for going back to the decompression code in zlib to see how it handles OS_CODE.
Searching further, at line 351 the author did leave a comment (although the more I read it, the less I understand what he is trying to say):

b->memory = 1;
b->pos = gzheader;
b->last = b->pos + 10;
out.buf = b;
out.next = NULL;
/*
 * We pass the gzheader to the next filter now to avoid its linking
 * to the ctx->busy chain.  zlib does not usually output the compressed
 * data in the initial iterations, so the gzheader that was linked
 * to the ctx->busy chain would be flushed by ngx_http_write_filter().
 */

Roughly speaking, the gzheader is handed to the next filter right away, while this filter itself only produces the raw deflate data and appends the trailer. At line 605:

#if (NGX_HAVE_LITTLE_ENDIAN && NGX_HAVE_NONALIGNED)
trailer->crc32 = ctx->crc32;
trailer->zlen = ctx->zin;
#else
trailer->crc32[0] = (u_char) (ctx->crc32 & 0xff);
trailer->crc32[1] = (u_char) ((ctx->crc32 >> 8) & 0xff);
trailer->crc32[2] = (u_char) ((ctx->crc32 >> 16) & 0xff);
trailer->crc32[3] = (u_char) ((ctx->crc32 >> 24) & 0xff);
trailer->zlen[0] = (u_char) (ctx->zin & 0xff);
trailer->zlen[1] = (u_char) ((ctx->zin >> 8) & 0xff);
trailer->zlen[2] = (u_char) ((ctx->zin >> 16) & 0xff);
trailer->zlen[3] = (u_char) ((ctx->zin >> 24) & 0xff);
#endif
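The shift-and-mask pattern in the #else branch is just a little-endian serialization of a 32-bit value. A short Python sketch, where the name put_long_le and the sample checksum are made up for illustration:

```python
import struct

def put_long_le(value: int) -> bytes:
    """Serialize a 32-bit value least significant byte first, using the
    same shift-and-mask steps as the #else branch."""
    return bytes([
        value & 0xFF,
        (value >> 8) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 24) & 0xFF,
    ])

crc32_value = 0x12345678   # hypothetical checksum, for illustration only
print(put_long_le(crc32_value).hex())                               # 78563412
print(put_long_le(crc32_value) == struct.pack("<I", crc32_value))   # True
```

On a little-endian machine that allows unaligned access, a plain 32-bit store produces the same four bytes, which is why the first branch can assign the fields directly.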

Fortunately, IBM and Motorola built big-endian machines: the byte-by-byte branch written for them makes the intent of this code easy to read, namely that both fields are stored least significant byte first.
It took about three hours of searching the web and reading code. The following points are now clear:

  • Deflate is the underlying algorithm, and zlib implements it.
  • Gzip puts a 10-byte header (the gzheader) in front of the raw deflate data and an 8-byte trailer after it: a 4-byte CRC32 checksum followed by a 4-byte length field. Gzip always uses CRC32 here; Adler-32 belongs to the zlib wrapper. The gzip magic number is 0x1f, 0x8b.
  • Zlib also adds its own header and trailing check data (an Adler-32 checksum) when deflateInit is used instead of deflateInit2, or when windowBits is set to 8..15.
  • When 16 is added to zlib's windowBits (that is, the bit with value 16 is set on top of the original value; thanks to Antonio for the correction), zlib generates a gzip header and trailer itself. In that case OS_CODE is set to 255 (unknown) and CRC32 is used for the trailer check. The question is: since zlib provides this feature itself, why do Apache and nginx not use it and instead add the header and trailer by hand?
  • To add deflate support to nginx, it is enough to drop the header and trailer from the output and change Content-Encoding to deflate. This saves 18 bytes.
  • Next step: adding deflate compression support to nginx. The company is actually running the latest development version, nginx 0.7.33. Although 502 Bad Gateway shows up from time to time, the boss does not agree to change it. Opening the 0.7.33 code shows that it is much cleaner than 0.6: adding the gzip header and adding the trailer have each been split out into a separate function, instead of one big function written from top to bottom. That is progress.
    At first I wanted to write a separate deflate module parallel to the gzip one, by taking the C file of the original gzip module (src/http/modules/ngx_http_gzip_filter_module.c) and doing a search and replace. It compiled, but the new module was never called. On reflection that makes sense: the HTTP request headers are not processed in this C file, so changing only this file cannot work.
    So the next attempt was to change the original gzip C file directly. The function that adds the header returns immediately, the function that adds the trailer no longer appends anything, and finally Content-Encoding is changed. One test later: it really does save 18 bytes!
    However, this way gzip is no longer supported. More seriously, if a client (however unlikely) supports only gzip and not deflate, it cannot parse the response. Looking at the ngx_http_gzip_ok function in src/http/ngx_http_core_module.c, I finally found where the Accept-Encoding header sent by the client is handled:
    ngx_strcasestrn(r->headers_in.accept_encoding->value.data,
                    "gzip", 4 - 1) == NULL

    The ngx_http_request_t *r pointer is available in almost every function. To keep the number of modified files and lines small, and to reduce the risk, I chose to add the same check at every change in src/http/modules/ngx_http_gzip_filter_module.c: does the client support deflate? If so, produce deflate output; if not, keep the original gzip format.
    Compiled and tested on Ubuntu 8.10 with Firefox 3.0.4 and HttpFox.
    Patch: http://code.google.com/p/fulin/s .... 7.33.deflate.patch
    Effect:
    Browser Accept-Encoding:
    gzip: use gzip
    deflate: use deflate
    gzip, deflate: use deflate
    none: do not compress
    nginx version: nginx-0.7.33
    Patch usage: place the patch file at the same level as the nginx-0.7.33 directory and run:
    patch -p0, then follow the normal process: configure, make, make install
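The Accept-Encoding behaviour listed above can be sketched as a small selection function. This is not the code from the patch: the name choose_encoding is hypothetical, and the sketch uses a plain case-insensitive substring search in the spirit of the ngx_strcasestrn() check, ignoring q-values.

```python
from typing import Optional

def choose_encoding(accept_encoding: Optional[str]) -> Optional[str]:
    """Return "deflate", "gzip", or None (do not compress)."""
    if not accept_encoding:
        return None
    value = accept_encoding.lower()
    # deflate is preferred when the client accepts both, because it
    # skips the 18 bytes of gzip header and trailer.
    if "deflate" in value:
        return "deflate"
    if "gzip" in value:
        return "gzip"
    return None

for header in ("gzip", "deflate", "gzip, deflate", None):
    print(repr(header), "->", choose_encoding(header))
```

A production implementation would also need to honour q-values such as "deflate;q=0", which a substring search cannot distinguish from plain "deflate".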
