HTTP protocol file compression

Last Update:2015-02-28 Source: Internet

Author: User

I. HTTP protocol headers
The server automatically sends the most appropriate version of a resource based on certain fields in the request headers sent by the client. Two kinds of request header fields can be used for this mechanism: the Accept field and the other Accept-* fields, such as Accept-Encoding and Accept-Language.

Request Header Field	Description	Response Header Field
Accept-Encoding	Tells the server which compression methods the client supports	Content-Encoding

For example, the client sends a request header:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

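Accept: the client prefers text/html, application/xhtml+xml, and image/webp (no q value, so the weight defaults to 1), then application/xml (q=0.9), then any other type (*/*, q=0.8).
Accept-Encoding: the client supports resources compressed with gzip, deflate, or SDCH.
Accept-Language: the client supports two languages, zh-CN and zh; q denotes a weight between 0 and 1.

Response headers received by the browser: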

Content-Encoding: gzip
Content-Type: text/html; charset=utf-8

The exact MIME type of this document is text/html and its content is gzip compressed. The response has no Content-Language field, and the language version returned is the one with the highest weight in the request's Accept-Language field.
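The q weights in the request headers above can be read with a few lines of Python. This is a minimal sketch rather than a full HTTP parser; the function name `parse_weighted_header` is made up for illustration, and it assumes well-formed input.

```python
def parse_weighted_header(value: str):
    """Split a header such as Accept-Language into (item, q) pairs,
    sorted from highest to lowest weight."""
    items = []
    for part in value.split(","):
        fields = [f.strip() for f in part.split(";")]
        name, q = fields[0], 1.0          # a missing q means weight 1
        for param in fields[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        if name:
            items.append((name, q))
    # sorted() is stable, so items with equal weight keep their order
    return sorted(items, key=lambda item: item[1], reverse=True)


languages = parse_weighted_header("zh-cn,zh;q=0.8")
accept = parse_weighted_header(
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)
print(languages)
print(accept)
```

Note that a q value applies only to the item it is attached to, which is why image/webp ranks above application/xml here.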
1) The full process of HTTP compression, step by step:
A. The browser sends an HTTP request to the web server; the request contains Accept-Encoding: gzip, deflate.
B. After receiving the request, the web server generates the original response, which has the original Content-Type and Content-Length.
C. The web server encodes the response with gzip. The resulting headers contain Content-Type and Content-Length (the compressed size), plus the added Content-Encoding: gzip. The server then sends the response to the browser.
D. After receiving the response, the browser decodes it according to Content-Encoding: gzip. Once it has recovered the original response, it displays the page.
[Figure: flowchart of the whole process]
2) Capturing the packets with Fiddler shows the response before decoding and after decoding.
[Figures: Fiddler capture before decoding; Fiddler capture after decoding]
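Steps A to D of the compression process can be imitated in memory with Python's standard gzip module. The sketch below is only an illustration: `serve` and `receive` are hypothetical helper names, headers are plain dictionaries, and no real network traffic is involved.

```python
import gzip


def serve(body: bytes, request_headers: dict):
    """Steps B and C: build the response, gzip it if the client allows."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    accepted = [
        token.split(";")[0].strip().lower()
        for token in request_headers.get("Accept-Encoding", "").split(",")
    ]
    if "gzip" in accepted:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length always describes the bytes actually sent
    headers["Content-Length"] = str(len(body))
    return headers, body


def receive(headers: dict, body: bytes) -> bytes:
    """Step D: decode according to Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body


original = b"<p>hello</p>" * 200
# Step A: the request advertises the supported encodings
resp_headers, wire_body = serve(original, {"Accept-Encoding": "gzip, deflate"})
page = receive(resp_headers, wire_body)
print(resp_headers, len(original), len(wire_body))
```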

II. Browsers support three HTTP transfer compression algorithms
1) SDCH compression algorithm
SDCH stands for Shared Dictionary Compression over HTTP. It uses a dictionary-based compression algorithm to compress the content that is the same across pages, reducing repeated transmission of that content. For example, the pages of a site usually share a header and footer, and sometimes a sidebar as well. Previously this common content was reloaded every time a page was opened; with SDCH compression, the common content is transmitted only once.
SDCH is divided into 3 main phases: first request, dictionary download, and other requests.
------------ First request ------------
Client: Accept-Encoding: sdch
Server: Get-Dictionary: /path/to/dict
------------ Download the dictionary ------------
The client downloads the dictionary from the path given in Get-Dictionary; this is an ordinary HTTP request.
------------ Other requests ------------
Client: Accept-Encoding: sdch and Avail-Dictionary: xxx
Server: applies SDCH encoding based on the value of Avail-Dictionary. If gzip is also listed in Accept-Encoding, the data is additionally compressed with gzip before being returned.
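The effect of a shared dictionary can be approximated with the preset-dictionary feature of Python's zlib module. To be clear, this is not SDCH itself (SDCH uses a different encoding and the header exchange described above); it is a stand-in technique that shows why content already known to both sides costs almost nothing to send. The markup in `SHARED` is invented for the example.

```python
import zlib

# Hypothetical content common to every page of a site
SHARED = (
    b'<html><head><title>Example site</title>'
    b'<link rel="stylesheet" href="/site.css"></head>'
    b'<body><div id="header">Home | Products | About us | Contact</div>'
    b'<div id="footer">Copyright Example Corp. All rights reserved.</div>'
    b'</body></html>'
)


def compress(page: bytes, dictionary: bytes = b"") -> bytes:
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(page) + c.flush()


def decompress(blob: bytes, dictionary: bytes = b"") -> bytes:
    # The receiver must hold exactly the same dictionary as the sender
    d = zlib.decompressobj(zdict=dictionary) if dictionary else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


page = SHARED.replace(
    b'<div id="footer">',
    b'<p>Unique article text for this page.</p><div id="footer">',
)
without_dict = compress(page)
with_dict = compress(page, SHARED)
print(len(page), len(without_dict), len(with_dict))
```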

SDCH and AJAX + pushState
SDCH compresses in order to reduce transmission of the same content, and AJAX + pushState serves the same purpose. SDCH was introduced by Google, whereas pushState is part of the HTML5 standard and is already supported by Chrome and Firefox; more browsers are expected to support it.
2) gzip compression algorithm
gzip means that the entity is encoded with GNU zip, a lossless compression algorithm that reduces the size of the transmitted message without losing information. It is efficient and the most widely used.
Compression works by finding similar strings in a text file and temporarily replacing them to make the whole file smaller. This works well for the web because HTML and CSS files often contain many repeated strings, such as whitespace and tags.
gzip lets you set the compression level, from 1 (lowest) to 9 (highest). Setting it too high is not recommended: the compression ratio is higher, but more CPU is consumed.
gzip performs relatively poorly on images.
a) Fundamentals of the gzip compression algorithm
gzip first compresses the file with a variant of the LZ77 algorithm and then compresses the result with Huffman coding (gzip chooses between static and dynamic Huffman coding). So once you understand how LZ77 and Huffman coding compress data, you also understand how gzip compresses data.
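The compression-level setting mentioned for gzip can be tried out with Python's standard gzip module before looking at LZ77 and Huffman coding in detail. This is only a sketch: the sample markup is made up, and the actual sizes and CPU cost depend on the data and on the zlib build in use.

```python
import gzip

# Made-up, repetitive markup similar to a generated HTML list
html = b"".join(
    b'<div class="item" id="%d"><span class="name">product</span></div>\n' % i
    for i in range(500)
)

fast = gzip.compress(html, compresslevel=1)   # least CPU, usually larger output
best = gzip.compress(html, compresslevel=9)   # most CPU, usually smaller output

print("original:", len(html))
print("level 1: ", len(fast))
print("level 9: ", len(best))
```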
A-1) LZ77 compression principle
If two pieces of content in a file are identical, then knowing the position and size of the earlier piece is enough to determine the content of the later piece. So the later piece can be replaced with a pair of values: (distance between the two, length of the identical content). Because this pair is smaller than the content it replaces, the file is compressed.
Example: a file a.conf has the following content (the two URLs are separated by a single space):
http://www.baidu.com/ http://m.baidu.com/
Some parts of the content have already appeared earlier; the parts enclosed in () below are the repeated parts:
http://www.baidu.com/ (http://)m(.baidu.com/)
Replacing each repeated part with a (distance, length) pair gives:
http://www.baidu.com/ (22, 7)m(20, 11)
In (22, 7), 22 is the distance between the identical block and the current position, and 7 is the length of the identical content. In (20, 11), 20 is the distance between the identical block and the current position, and 11 is the length of the identical content.
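The (distance, length) idea can be sketched with a toy encoder. This is a simplified greedy search written for illustration, not the LZ77 variant that gzip actually uses; the function names and the `min_match` threshold are assumptions, and the input assumes the two URLs are separated by one space.

```python
def lz77_encode(data: str, min_match: int = 3):
    """Emit literals and (distance, length) pairs using a greedy search."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(i):                      # every earlier start position
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])              # too short to be worth a pair
            i += 1
    return tokens


def lz77_decode(tokens) -> str:
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])      # copy from earlier output
        else:
            out.append(token)
    return "".join(out)


text = "http://www.baidu.com/ http://m.baidu.com/"
tokens = lz77_encode(text)
print(tokens[-3:])   # [(22, 7), 'm', (20, 11)]
```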
A-2) Huffman coding
Treat each fixed-length bit value in a file as a symbol. For example, with 8-bit values there are 256 possible values, so the 256 byte values are the symbols. These symbols are re-encoded according to how often they appear in the file: symbols that occur very often are represented with fewer bits, and symbols that occur rarely are represented with more bits. Some parts of the file therefore take fewer bits and some take more, but because the parts that shrink outweigh the parts that grow, the whole file becomes smaller, so the file is compressed.
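A small sketch of Huffman code construction shows how frequent symbols end up with shorter codes. It builds one code table from byte frequencies using a heap; the function name is made up, and the table layout is not the one deflate stores on disk.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Return {byte value: bit string}; frequent bytes get shorter codes."""
    freq = Counter(data)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                       # a single symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


data = b"aaaaaaaabbbc"
codes = huffman_codes(data)
encoded_bits = sum(len(codes[b]) for b in data)
print(codes, encoded_bits, "bits instead of", 8 * len(data))
```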
3) Deflate compression algorithm
Deflate is a lossless data compression algorithm that uses both the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951.
It is generally accepted that deflate is not subject to any patents. Before the patent associated with LZW (used in the GIF file format) expired, this led to the format being used in gzip compressed files and PNG image files, in addition to the ZIP file format.
Source code for deflate compression and decompression can be found in zlib, a free, general-purpose compression library.
A higher-compression implementation of deflate is provided by 7-Zip. AdvanceCOMP also uses this implementation; it can recompress gzip, PNG, MNG, and ZIP files to sizes smaller than zlib achieves. Ken Silverman's KZIP and PNGOUT use a more efficient deflate implementation that requires more user input.
Deflate is a compression algorithm that builds on Huffman coding.
The decompression code for deflate and gzip is almost identical, so the two can be handled by a single piece of code.
III. The difference between gzip and deflate
When decompressing with zlib, deflate data is initialized with inflateInit(), while gzip data is initialized with inflateInit2(), which takes one more parameter than inflateInit(): -MAX_WBITS, indicating that raw deflate data is to be processed. This is needed because the compressed data block inside gzip data does not carry the two-byte zlib header, so zlib must be told to skip the zlib header when inflateInit2() is used. The zlib manual requires windowBits to be in the range 8..15, but values outside that range have special meanings (see the comments in zlib.h); for example, a negative value indicates raw deflate.
Deflate is the most basic algorithm. gzip adds a 10-byte gzip header in front of the raw deflate data and appends 8 bytes at the end: a CRC32 checksum and the length of the original data. (Adler-32 is the checksum used by the zlib format, not by gzip.)
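The same framing can be observed from Python, whose zlib module exposes the windowBits parameter as `wbits`. The sketch below assumes the default minimal gzip header that zlib writes (10 bytes, no file name or extra fields); the helper name `deflate` and the sample data are chosen for illustration.

```python
import struct
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()


data = b"http://www.baidu.com/ http://m.baidu.com/ " * 20

raw = deflate(data, -zlib.MAX_WBITS)       # negative: raw deflate, no header
zl = deflate(data, zlib.MAX_WBITS)         # zlib format: 2-byte header, Adler-32 trailer
gz = deflate(data, zlib.MAX_WBITS + 16)    # +16: gzip header and trailer

# gzip = 10-byte header + raw deflate body + CRC32 + original length
body = gz[10:-8]
crc, size = struct.unpack("<II", gz[-8:])

print(gz[:2].hex())                                       # gzip magic number
print(zlib.decompress(body, wbits=-zlib.MAX_WBITS) == data)
print(crc == zlib.crc32(data), size == len(data))
```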

This article is an English version of an article originally written in Chinese and published on aliyun.com.
