Transfer-Encoding chunked Encoding in HTTP 1.1 | haohtml's blog

Source: Internet
Author: User

Transfer-Encoding chunked Encoding in HTTP 1.1
Posted on 2010/07/24 by admin

Most sites send a Content-Length header in their HTTP responses. The header is defined in section 10.4 of RFC 1945 (HTTP/1.0) and tells the user agent, usually a browser, the length in bytes of the document content the server is sending. The browser parses the page once it has received the number of bytes given in Content-Length. If part of the data is delayed on the server side, the browser shows a white screen in the meantime, which makes for a poor user experience.

HTTP/1.1 offers a solution. RFC 2616 defines the Transfer-Encoding: chunked header in section 14.41 and the chunked encoding itself in section 3.6.1. With chunked encoding the browser does not need to wait until every byte of the content has been downloaded: as soon as one chunk has been received, it can start parsing the page and downloading the resources the HTML refers to, including JS, CSS, and images. There are two ways to produce chunked output. One is to set the server's I/O buffer size so that the buffer is flushed automatically when it fills; the other is to call the flush function of the I/O layer manually. The flush functions in different languages are:

PHP: ob_flush(); flush();
Perl: STDOUT->autoflush(1);
Java: out.flush();
Python: sys.stdout.flush()
Ruby: $stdout.flush
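
As a rough illustration of the manual-flush option, the sketch below frames each piece of output as one chunk and flushes it immediately. The helper names write_chunk and end_chunks are hypothetical, and an in-memory buffer stands in for the client connection; a real server or framework usually does this framing for you.

```python
import io

def write_chunk(wfile, data: bytes) -> None:
    """Write one chunk (hex size line, data, CRLF) and flush it at once."""
    if not data:
        return  # an empty chunk would be mistaken for the last-chunk
    wfile.write(b"%x\r\n" % len(data))
    wfile.write(data + b"\r\n")
    wfile.flush()  # push the chunk out without waiting for more output

def end_chunks(wfile) -> None:
    """Write the last-chunk with no trailer, which ends the body."""
    wfile.write(b"0\r\n\r\n")
    wfile.flush()

# Usage: an in-memory buffer stands in for the client socket (assumption).
out = io.BytesIO()
write_chunk(out, b"<html><head>")
write_chunk(out, b"</head><body>hello</body></html>")
end_chunks(out)
print(out.getvalue())
```

Flushing after the head of the page is what lets the browser start fetching CSS and JS while the server is still producing the rest of the body.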

The following two figures show a page served with HTTP/1.1 Transfer-Encoding: chunked and with the I/O buffer flushed, so that the browser can start downloading the page's supporting resources earlier.

================================================================
As described in the previous post, when the length of the message body cannot be determined in advance, the header cannot carry a Content-Length field to specify it. In that case the length of the message body has to be determined through the Transfer-Encoding field.
Generally, the value of the Transfer-Encoding field is chunked, which indicates that the message body is transmitted with chunked encoding. Chunked encoding is defined in the HTTP/1.1 RFC, so every HTTP/1.1 application should support it.
The basic idea of chunked encoding is to split a large block of data into several smaller blocks, each of which states its own length. The exact format is as follows (BNF syntax):
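
A receiver therefore has to check Transfer-Encoding before it looks at Content-Length. The sketch below shows one way to make that decision; the function name body_framing and the assumption that headers arrive as a dict with lowercase keys are illustrative only.

```python
def body_framing(headers: dict):
    """Return (method, length) describing how the body length is found.

    Assumes `headers` maps lowercase field names to their values.
    """
    codings = [c.strip().lower()
               for c in headers.get("transfer-encoding", "").split(",")
               if c.strip()]
    # Chunked framing takes precedence over any Content-Length field.
    if codings and codings[-1] == "chunked":
        return ("chunked", None)
    if "content-length" in headers:
        return ("content-length", int(headers["content-length"]))
    # Neither field present: the body ends when the connection closes.
    return ("until-close", None)

print(body_framing({"transfer-encoding": "chunked"}))
print(body_framing({"content-length": "73353"}))
```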
Chunked-Body    = *chunk                 // zero or more chunks
                  last-chunk             // the last chunk
                  trailer                // trailer
                  CRLF                   // end mark
chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF
chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *(entity-header CRLF)
Explanation:
Chunked-Body is the message body after chunked encoding. It consists of four parts: the chunks, the last-chunk, the trailer, and the terminating CRLF. The body may contain zero chunks, and there is no upper limit on their number. Each chunk states its own length: it begins with a string of hexadecimal digits that gives the length of its chunk-data in bytes. If that string consists only of "0" characters, the chunk-size is 0, the chunk is the last-chunk, and it carries no chunk-data. The optional chunk-extension is agreed on by the two communicating parties; a recipient that does not understand it can ignore it.
The trailer consists of additional header fields appended at the end. It usually carries metadata (information about the data) that can be appended to the existing header fields after decoding.
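
The first line of each chunk can be parsed as in the sketch below. The name parse_chunk_header is hypothetical, and the handling of extensions is simplified: a quoted-string value that itself contains ";" or "=" is not handled.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"

def parse_chunk_header(line: bytes):
    """Parse 'chunk-size [chunk-extension] CRLF' into (size, extensions)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("chunk header must end with CRLF")
    size_part, *ext_parts = line[:-2].split(b";")
    # chunk-size = 1*HEX: at least one digit, hexadecimal digits only.
    if not size_part or any(c not in HEX_DIGITS for c in size_part):
        raise ValueError("chunk-size must be one or more hex digits")
    size = int(size_part, 16)
    extensions = {}
    for ext in ext_parts:
        name, _, value = ext.partition(b"=")
        extensions[name.decode("ascii")] = value.strip(b'"').decode("ascii")
    return size, extensions

print(parse_chunk_header(b"1ffc\r\n"))         # (8188, {})
print(parse_chunk_header(b"0\r\n"))            # (0, {}), the last-chunk
print(parse_chunk_header(b"1a;name=value\r\n"))
```

A recipient that does not understand an extension can simply discard the second element of the returned pair.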
Example analysis:
The following analyzes a packet capture taken with Ethereal while Firefox was communicating with a website (starting from the end of the header fields):
Address 0 ... f
000c0 ... 31
000d0 66 66 63 0d 0a ...... // ASCII: 1ffc\r\n, chunk-data starts at address 000d5
"1ffc" is the chunk-size of the first chunk, which is 8188 in decimal.
It is followed directly by CRLF, so there is no chunk-extension. The chunk-data starts at address 000d5, so the next chunk
starts at 000d5 + 1ffc + 2 = 020d3, as follows:
020d0 .. 0d 0a 31 66 66 63 0d 0a ... // ASCII: \r\n1ffc\r\n
The first 0d 0a is the end mark of the previous chunk, and the second 0d 0a separates the chunk-size from the chunk-data.
This chunk is also 8188 bytes long, and so on until the last chunk that carries data:
100e0 ... 0d 0a 31
100f0 65 61 39 0d 0a ...... // ASCII: \r\n1ea9\r\n
This chunk is 0x1ea9 = 7849 bytes long, and the next chunk starts at 100f5 + 1ea9 + 2 = 11fa0, as follows:
11fa0 30 0d 0a 0d 0a // ASCII: 0\r\n\r\n
"0" indicates that this chunk is the last-chunk, and the first 0d 0a terminates it. The second 0d 0a indicates that there is no trailer and that the whole Chunked-Body ends here.
Decoding process:
The purpose of decoding is to join the chunk-data of all chunks into a single block that becomes the message body, and to record the length of that body.
RFC 2616 gives the following decoding process (pseudocode):
length := 0                                   // set the length counter to 0
read chunk-size, chunk-extension (if any) and CRLF
while (chunk-size > 0) {                      // not the last-chunk
    read chunk-data and CRLF                  // read chunk-size bytes of chunk-data, skip the CRLF
    append chunk-data to entity-body          // append this chunk-data to the entity-body
    length := length + chunk-size             // add this chunk's size to the counter
    read chunk-size and CRLF                  // read the chunk-size and CRLF of the next chunk
}
read entity-header                            // format is name: value CRLF; an empty one is just CRLF
while (entity-header not empty) {             // that is, not a blank line containing only CRLF
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                      // write the length of the decoded message body
                                              // into the message as the value of Content-Length
Remove "chunked" from Transfer-Encoding       // also remove chunked from the Transfer-Encoding field value
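
The pseudocode above translates fairly directly into Python. The following is a minimal sketch, assuming the encoded body is available as a binary stream with read() and readline(); the names decode_chunked and _read_chunk_size are hypothetical, and chunk-extensions are skipped rather than interpreted.

```python
import io

def _read_chunk_size(stream) -> int:
    """Read 'chunk-size [chunk-extension] CRLF' and return the size."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ValueError("chunk header must end with CRLF")
    return int(line[:-2].split(b";", 1)[0], 16)  # ignore any extension

def decode_chunked(stream):
    """Return (entity_body, trailer_fields, length) for a chunked body."""
    body = bytearray()
    length = 0
    size = _read_chunk_size(stream)
    while size > 0:                      # not the last-chunk
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += data
        length += size
        size = _read_chunk_size(stream)
    trailers = []
    line = stream.readline()
    while line != b"\r\n":               # a bare CRLF ends the trailer
        if line == b"":
            raise ValueError("missing final CRLF")
        name, _, value = line.rstrip(b"\r\n").partition(b":")
        trailers.append((name.strip().decode("latin-1"),
                         value.strip().decode("latin-1")))
        line = stream.readline()
    return bytes(body), trailers, length

encoded = b"5\r\nhello\r\n6;ext=1\r\n world\r\n0\r\nX-Checksum: abc\r\n\r\n"
body, trailers, length = decode_chunked(io.BytesIO(encoded))
print(body, trailers, length)
```

The caller would then set Content-Length to the returned length, merge the trailer fields into the existing headers, and remove chunked from Transfer-Encoding, as in the last two steps of the pseudocode.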
The final value of length is the sum of the chunk-size values of all chunks. In the packet capture above there are eight chunks with a chunk-size of 0x1ffc (8188) and one remaining chunk of 0x1ea9 (7849), for a total of 8 * 8188 + 7849 = 73353 bytes.
Note: the chunk-size of the first chunk in the example is 8188, probably for the following reason: "1ffc" takes 4 bytes, the "\r\n" after it 2 bytes, and the "\r\n" at the end of the chunk 2 more bytes, 8 bytes in total. The whole chunk is therefore 8196 bytes, which may be the size of the buffer the sender uses for one TCP send.
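
The addresses and totals quoted in the capture analysis can be rechecked with a few lines of arithmetic. In this sketch the function name walk_chunks is hypothetical, and the address of the first chunk header (000cf) is an assumption derived from the stated chunk-data address 000d5 minus the six bytes of "1ffc\r\n".

```python
CRLF = 2  # length of "\r\n" in bytes

def walk_chunks(first_header: int, sizes):
    """Yield (header_offset, data_offset, size) for each chunk in order."""
    offset = first_header
    for size in sizes:
        data = offset + len(format(size, "x")) + CRLF  # skip "<hex size>\r\n"
        yield offset, data, size
        offset = data + size + CRLF  # skip chunk-data and its trailing CRLF

# Eight chunks of 0x1ffc, one of 0x1ea9, then the last-chunk (size 0).
sizes = [0x1ffc] * 8 + [0x1ea9, 0]
for header, data, size in walk_chunks(0x000cf, sizes):
    print(f"header at {header:05x}, data at {data:05x}, size {size}")

total = sum(sizes)                           # decoded body length
framed = len("1ffc") + CRLF + 0x1ffc + CRLF  # one full chunk on the wire
print(total, framed)                         # 73353 8196
```

The second header lands at 020d3, the 0x1ea9 header at 100ef, and the last-chunk at 11fa0, which agrees with the sums worked out by hand in the analysis.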

