Analysis of chunked encoding in HTTP/1.1

Source: Internet
Author: User


In ordinary HTTP communication, the Content-Length header tells the user agent (usually the browser) the length of the document body the server is sending. This header is defined in section 10.4 of RFC 1945, the HTTP/1.0 specification. After receiving it, the browser waits until it has received the number of bytes given in Content-Length before it starts parsing the page. If the server is slow to send, the browser shows a blank white screen, which makes for a poor user experience.

HTTP/1.1 solves this with the Transfer-Encoding: chunked header, defined in section 14.41 of RFC 2616. The chunked transfer coding itself is defined in section 3.6.1. All HTTP/1.1 applications must support chunked encoding, which supplies the length of the body dynamically, one chunk at a time. To send HTTP data with chunked encoding, the message header is set to Transfer-Encoding: chunked, which indicates that the body will be transmitted in chunks. With this encoding the browser does not have to wait until the whole body has been downloaded. It can start parsing the page as soon as a chunk arrives, and it can begin downloading the resources referenced in the HTML, such as JavaScript, CSS and images.

There are two ways to produce chunked output. One is to set the length of the server's IO buffer so that the server flushes the buffer automatically. The other is to call the IO flush function manually. The flush functions in various languages are:

• PHP: ob_flush(); flush();

• Perl: STDOUT->autoflush(1);

• Java: out.flush();

• Python: sys.stdout.flush()

• Ruby: STDOUT.flush

Using HTTP/1.1's Transfer-Encoding: chunked together with flushing the IO buffer lets the browser start downloading a page's associated resources earlier. Sometimes the length of the message body cannot be determined in advance. In that case the header cannot include a Content-Length field to state it, and the body length must be determined through the Transfer-Encoding field instead.
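Here is a minimal Python sketch of the manual-flush approach, built on the standard-library http.server module. The names ChunkedHandler, PARTS and fetch_once are hypothetical. The handler declares Transfer-Encoding: chunked instead of Content-Length, then writes each piece as a chunk and flushes it. The client side uses http.client, which decodes the chunks transparently.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ChunkedHandler(BaseHTTPRequestHandler):
    # Chunked transfer coding only exists in HTTP/1.1.
    protocol_version = "HTTP/1.1"

    # Hypothetical page fragments standing in for output produced over time.
    PARTS = [b"<html><head><link rel='stylesheet' href='/a.css'></head>",
             b"<body>slow part...",
             b"</body></html>"]

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Transfer-Encoding", "chunked")  # no Content-Length
        self.end_headers()
        for part in self.PARTS:
            # chunk-size in hex, CRLF, chunk-data, CRLF
            self.wfile.write(b"%x\r\n%s\r\n" % (len(part), part))
            self.wfile.flush()  # hand this chunk to the client now
        self.wfile.write(b"0\r\n\r\n")  # last-chunk, empty trailer, final CRLF
        self.wfile.flush()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Start the server on a free local port, fetch '/', then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        resp = conn.getresponse()
        te = resp.getheader("Transfer-Encoding")
        body = resp.read()  # http.client reassembles the chunks
        conn.close()
        return te, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

By default this handler's wfile is unbuffered, so flush() matters only if a buffered writer is configured. It is kept to mirror the flush calls listed above.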

A chunked body consists of a series of chunks and ends with a chunk whose length is 0. Each chunk has two parts, a header and a body. The header gives the length of the following body in bytes as a hexadecimal number, optionally followed by chunk extensions. The body is the actual content of that length. The two parts are separated by a carriage return and line feed (CRLF). After the final zero-length chunk comes the footer (trailer). The footer carries additional header fields and can usually be ignored.
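The layout described above can be sketched as a small Python encoder. This is an illustrative sketch, not a reference implementation. The name encode_chunked is hypothetical, and the trailer parameter (a list of name/value pairs) is an assumption added to show where footer fields go.

```python
def encode_chunked(parts, trailer=None):
    """Encode byte strings as an HTTP/1.1 chunked body.

    parts:   iterable of bytes; empty parts are skipped, because a
             zero-length chunk would be read as the last-chunk.
    trailer: optional list of (name, value) header pairs (hypothetical
             interface) written after the last-chunk.
    """
    out = bytearray()
    for part in parts:
        if not part:
            continue
        out += b"%x\r\n" % len(part)  # chunk-size in hex, then CRLF
        out += part + b"\r\n"         # chunk-data, then CRLF
    out += b"0\r\n"                   # last-chunk
    for name, value in trailer or []:
        out += f"{name}: {value}\r\n".encode("latin-1")
    out += b"\r\n"                    # final CRLF ends the chunked body
    return bytes(out)


if __name__ == "__main__":
    print(encode_chunked([b"hello", b"world!"]))
    # b'5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n'
```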

Put more simply, chunked encoding breaks a large block of data into a series of smaller chunks, each with its own declared length. The format, in BNF as given in RFC 2616 section 3.6.1, is:

    Chunked-Body   = *chunk            // zero or more chunks
                     last-chunk        // the last chunk
                     trailer           // trailer
                     CRLF              // end marker

    chunk          = chunk-size [ chunk-extension ] CRLF
                     chunk-data CRLF
    chunk-size     = 1*HEX
    last-chunk     = 1*("0") [ chunk-extension ] CRLF

    chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
    chunk-ext-name = token
    chunk-ext-val  = token | quoted-string
    chunk-data     = chunk-size(OCTET)
    trailer        = *(entity-header CRLF)
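The following Python sketch ties the grammar to code by parsing one chunk-size line, including any chunk-extension. The function name is hypothetical. The handling of quoted-string values is simplified: there are no backslash escapes, and a ";" inside quotes is not supported.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def parse_chunk_size_line(line):
    """Parse one 'chunk-size [chunk-extension] CRLF' line (bytes).

    Returns (size, extensions): size is an int, extensions is a list of
    (name, value) pairs, with value None for a bare chunk-ext-name.
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("chunk-size line must end with CRLF")
    size_text, *ext_items = line[:-2].decode("ascii").split(";")
    # 1*HEX: at least one hex digit and nothing else (rejects "0x1f", " 1f").
    if not size_text or not set(size_text) <= HEX_DIGITS:
        raise ValueError(f"chunk-size is not 1*HEX: {size_text!r}")
    size = int(size_text, 16)
    extensions = []
    for item in ext_items:
        name, sep, value = item.partition("=")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # quoted-string -> plain text (no escapes)
        extensions.append((name, value if sep else None))
    return size, extensions


if __name__ == "__main__":
    print(parse_chunk_size_line(b"1ffc\r\n"))            # (8188, [])
    print(parse_chunk_size_line(b"000\r\n"))             # (0, []), a last-chunk
    print(parse_chunk_size_line(b'a;flag;q="x y"\r\n'))  # (10, [('flag', None), ('q', 'x y')])
```

A size of 0 marks the last-chunk whether it is written as "0" or "000", because last-chunk is 1*("0").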

Explanation:

• Chunked-Body is the message body after chunked encoding. It has four parts: the chunks, the last-chunk, the trailer and the terminating CRLF. The body can contain zero or more chunks, with no upper limit.

• Each chunk states its own length. It begins with a string of hexadecimal digits giving the length, in bytes, of the chunk-data that follows. If this hexadecimal string consists only of "0" characters, chunk-size is 0. That chunk is the last-chunk and has no chunk-data part.

• The optional chunk-extension is defined by the two communicating parties. A recipient that does not understand an extension can ignore it.

• The trailer consists of additional header fields attached at the end. These usually carry metadata (data about the message). After decoding, they can be appended to the existing header fields.

Below is an analysis of a capture, made with Ethereal, of Firefox communicating with a website. The dump starts right after the end of the response headers:

Address 0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
000c0   .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. 31
000d0   66 66 63 0d 0a .. .. .. .. .. .. .. .. .. .. ..

ASCII: 1ffc\r\n; the chunk-data starts at address 000d5, and so on.

Clearly, "1ffc" is the chunk-size of the first chunk, which is 8188 in decimal. Since 1ffc is followed immediately by CRLF, there is no chunk-extension. The chunk-data starts at 000d5. The next chunk therefore begins at 000d5 + 1ffc + 2 = 020d3, where the 2 accounts for the CRLF after the chunk-data:

020d0   .. 0d 0a 31 66 66 63 0d 0a .. .. .. .. .. .. ..

ASCII: \r\n1ffc\r\n

The first 0d 0a ends the previous chunk. The second 0d 0a separates the new chunk-size from its chunk-data.

This chunk is also 8188 bytes long, and so on until the last chunk:

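100e0   .. .. .. .. .. .. .. .. .. .. .. .. .. 0d 0a 31
100f0   65 61 39 0d 0a .. .. .. .. .. .. .. .. .. .. ..

ASCII: \r\n1ea9\r\n; the chunk-data starts at 100f5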

This chunk's length is 0x1ea9 = 7849, so the next chunk starts at 100f5 + 1ea9 + 2 = 11fa0:

11fa0   30 0d 0a 0d 0a

ASCII: 0\r\n\r\n

"0" is the chunk-size of the last-chunk, and the first 0d 0a terminates that line. The second 0d 0a shows that there is no trailer, so the whole Chunked-Body ends here.

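The same walk can be automated. The Python sketch below uses a hypothetical helper, chunk_layout. It steps through a raw chunked body using the rule "next chunk = data start + chunk-size + 2" and reports where each chunk-size line and each chunk-data block begins. The captured bytes are not available, so the demo builds a synthetic body with the same chunk sizes, filled with dummy dots. With base=0xcf, the address of the first "1" above, it reports the following addresses:

- 000cf and 000d5 for the first chunk
- 100ef and 100f5 for the 0x1ea9 chunk
- 11fa0 for the last-chunk

```python
def chunk_layout(body, base=0):
    """Return (size_line_addr, data_addr, chunk_size) for each chunk.

    body: raw bytes beginning at the first chunk-size line.
    base: capture address of body[0], so results line up with a hex dump.
    The final entry is the zero-size last-chunk; the trailer is not parsed.
    """
    layout = []
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)                   # end of chunk-size line
        size = int(body[pos:eol].split(b";", 1)[0], 16)  # ignore chunk-extension
        data_start = eol + 2                             # skip the CRLF
        layout.append((base + pos, base + data_start, size))
        if size == 0:                                    # last-chunk reached
            return layout
        pos = data_start + size + 2                      # skip chunk-data and its CRLF


if __name__ == "__main__":
    # Synthetic body with the chunk sizes seen in the capture (dummy contents).
    sizes = [0x1FFC] * 8 + [0x1EA9]
    body = b"".join(b"%x\r\n" % n + b"." * n + b"\r\n" for n in sizes) + b"0\r\n\r\n"
    layout = chunk_layout(body, base=0xCF)
    for line_addr, data_addr, size in layout:
        print(f"{line_addr:05x}  {data_addr:05x}  size={size:#x}")
    print("total:", sum(size for _, _, size in layout))  # total: 73353
```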
Decoding process:

Decoding joins the chunk-data of all the chunks back into a single message body and records the total length.

RFC 2616 (section 19.4.6) gives the decoding process as the following pseudocode:

    length := 0                                  // length counter starts at 0
    read chunk-size, chunk-extension (if any) and CRLF
    while (chunk-size > 0) {                     // not the last-chunk
        read chunk-data and CRLF                 // read chunk-size bytes of chunk-data, skip the CRLF
        append chunk-data to entity-body         // append this chunk-data to the entity-body
        length := length + chunk-size
        read chunk-size and CRLF                 // read the next chunk's chunk-size and CRLF
    }
    read entity-header                           // format is NAME: VALUE CRLF; an empty one is just CRLF
    while (entity-header not empty) {            // i.e. not a bare CRLF line
        append entity-header to existing header fields
        read entity-header
    }
    Content-Length := length                     // the decoded body length becomes the Content-Length value
    Remove "chunked" from Transfer-Encoding      // also drop the chunked token from Transfer-Encoding
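A runnable Python version of this pseudocode might look like the following sketch. It rests on a few simple assumptions:

- The body arrives on a binary file-like object.
- Headers are held in a plain dict, so repeated header names are not handled.
- Chunk extensions are ignored.

The name decode_chunked is hypothetical.

```python
import io


def decode_chunked(stream, headers):
    """Decode a chunked body following the RFC 2616 pseudocode.

    stream:  binary file-like object positioned at the first chunk-size line.
    headers: dict of the message's header fields, mutated in place: trailer
             fields are added, Content-Length is set, and "chunked" is
             removed from Transfer-Encoding.
    Returns the decoded entity-body as bytes.
    """
    def read_line():
        line = stream.readline()
        if not line.endswith(b"\r\n"):
            raise ValueError("truncated or malformed chunked body")
        return line[:-2]

    def read_size():
        # chunk-size [; chunk-extension]; extensions are ignored here
        return int(read_line().split(b";", 1)[0], 16)

    body = bytearray()
    length = 0
    chunk_size = read_size()
    while chunk_size > 0:                      # not the last-chunk
        data = stream.read(chunk_size)
        if len(data) != chunk_size or stream.read(2) != b"\r\n":
            raise ValueError("chunk-data length or trailing CRLF is wrong")
        body += data                           # append chunk-data to entity-body
        length += chunk_size
        chunk_size = read_size()

    line = read_line()                         # trailer: entity-header lines
    while line:                                # stop at the empty line
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip()] = value.strip()
        line = read_line()

    headers["Content-Length"] = str(length)
    codings = [c.strip() for c in headers.get("Transfer-Encoding", "").split(",")]
    remaining = [c for c in codings if c and c.lower() != "chunked"]
    if remaining:
        headers["Transfer-Encoding"] = ", ".join(remaining)
    else:
        headers.pop("Transfer-Encoding", None)
    return bytes(body)


if __name__ == "__main__":
    hdrs = {"Transfer-Encoding": "chunked"}
    raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
    print(decode_chunked(io.BytesIO(raw), hdrs), hdrs)
    # b'Wikipedia' {'Content-Length': '9'}
```

The chunk-data is read with an exact byte count rather than line by line, because the data itself may contain CRLF.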

The final value of length is the sum of the chunk-size values of all chunks. The example above has eight chunks of size 0x1ffc (8188) and a final data chunk of size 0x1ea9 (7849), for a total of 73353 bytes.
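Written out, the total for the capture above is:

```latex
\text{length} = 8 \times \texttt{0x1FFC} + \texttt{0x1EA9}
              = 8 \times 8188 + 7849
              = 65504 + 7849
              = 73353 \ \text{bytes}
```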
Note: For the above example, the first few
