Transfer-Encoding in HTTP
When the length of the message body cannot be determined in advance, the Content-Length field cannot be included in the header to specify it. In this case, the Transfer-Encoding field must be used so that the recipient can determine where the message body ends.
Generally, the value of the Transfer-Encoding field should be chunked, indicating that chunked encoding is used to transmit the message body. Chunked encoding is defined in the HTTP/1.1 RFC, so all HTTP/1.1 applications should support it.
The basic idea of chunked encoding is to split a large block of data into multiple smaller blocks, each of which declares its own length. The format is as follows (BNF syntax):
Chunked-Body    = *chunk          // zero or more chunks
                  last-chunk      // the last chunk
                  trailer         // trailer
                  CRLF            // end marker
chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF
chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *(entity-header CRLF)
Explanation:
Chunked-Body is the message body after chunked encoding. It consists of four parts: the chunks, the last-chunk, the trailer, and the terminating CRLF. The body may contain zero or more chunks, with no upper limit. Each chunk declares its own length: it begins with a string of hexadecimal digits giving the length of its chunk-data in bytes. If that hexadecimal string consists only of "0" digits, the chunk-size is 0; the chunk is then the last-chunk and carries no chunk-data. The optional chunk-extension is agreed upon by the two communicating parties; a recipient that does not understand it may ignore it.
The trailer consists of additional header fields appended at the end. It usually carries metadata (information about the message), and these fields can be appended to the existing header fields after decoding.
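To make the grammar above concrete, here is a minimal Python sketch of an encoder. The function name encode_chunked, its parameters, and the default chunk size are assumptions chosen for illustration; the sketch emits no chunk-extension and performs no validation of trailer names or values.

```python
from typing import Dict, Optional


def encode_chunked(body: bytes, chunk_size: int = 8188,
                   trailers: Optional[Dict[str, str]] = None) -> bytes:
    """Encode body as a Chunked-Body (hypothetical helper, no chunk-extension)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        data = body[start:start + chunk_size]
        # chunk = chunk-size CRLF chunk-data CRLF, with chunk-size in hex
        out += b"%x\r\n" % len(data)
        out += data + b"\r\n"
    # last-chunk = "0" CRLF
    out += b"0\r\n"
    # trailer = *(entity-header CRLF)
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")
    # The final CRLF ends the whole Chunked-Body
    out += b"\r\n"
    return bytes(out)


print(encode_chunked(b"hello world", chunk_size=5))
# b'5\r\nhello\r\n5\r\n worl\r\n1\r\nd\r\n0\r\n\r\n'
```

An empty body encodes to just the last-chunk and the terminating CRLF, which matches the "zero or more chunks" rule.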
Example analysis:
The following analyzes a packet capture, taken with Ethereal, of Firefox communicating with a website (starting just after the end of the header fields):
Address 0...
000c0 31
000d0 66 66 63 0d 0a ...... // ASCII: 1ffc\r\n; chunk-data starts at address 000d5
Clearly, "1ffc" is the chunk-size of the first chunk, which is 8188 in decimal. Because "1ffc" is followed immediately by CRLF, there is no chunk-extension. The chunk-data starts at address 000d5, so the start address of the next chunk can be calculated as 000d5 + 1ffc + 2 = 020d3, as follows:
020d0... 0d 0a 31 66 66 63 0d 0a... // ASCII: \r\n1ffc\r\n
The first 0d 0a is the end marker of the previous chunk, and the second 0d 0a separates the chunk-size from the chunk-data.
This chunk is also 8188 bytes long, and so on until the final data chunk:
100e0 0d 0a 31
100f0 65 61 39 0d 0a ...... // ASCII: \r\n1ea9\r\n
This chunk's length is 0x1ea9 = 7849, and the next chunk starts at 100f5 + 1ea9 + 2 = 11fa0, as shown below:
11fa0 30 0d 0a 0d 0a // ASCII: 0\r\n\r\n
"0" indicates that the current chunk is the last-chunk, and the first 0d 0a terminates it. The second 0d 0a indicates that there is no trailer and that the entire Chunked-Body ends here.
Decoding process:
The purpose of decoding is to combine the chunk-data of all chunks into a single block that forms the message body, and to record the length of that body.
The decoding process given in RFC 2616 is as follows (pseudocode):
length := 0                                        // set the length counter to 0
read chunk-size, chunk-extension (if any) and CRLF // read the first chunk-size line
while (chunk-size > 0) {                           // not the last-chunk
    read chunk-data and CRLF                       // read chunk-size bytes of chunk-data, skip CRLF
    append chunk-data to entity-body               // append this chunk-data to the entity-body
    length := length + chunk-size                  // add this chunk's size to the counter
    read chunk-size and CRLF                       // read the chunk-size line of the next chunk
}
read entity-header                                 // an entity-header has the form name: value CRLF; an empty one is only CRLF
while (entity-header not empty) {                  // that is, not a blank line containing only CRLF
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                           // write the decoded body length as the value of the Content-Length field
Remove "chunked" from Transfer-Encoding            // also remove the chunked token from the Transfer-Encoding field value
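The pseudocode above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a production parser: the name decode_chunked is hypothetical, the whole encoded body is assumed to be in memory already, chunk-extensions are parsed past but ignored, and error handling is minimal.

```python
from typing import Dict, Tuple


def decode_chunked(raw: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decode a complete Chunked-Body into (entity-body, trailer fields)."""
    body = bytearray()
    pos = 0

    def read_line() -> bytes:
        # Return the bytes up to the next CRLF and advance past it.
        nonlocal pos
        end = raw.index(b"\r\n", pos)  # raises ValueError if input is truncated
        line = raw[pos:end]
        pos = end + 2
        return line

    while True:
        # chunk-size [chunk-extension] CRLF; any extension is ignored here
        size_field = read_line().split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:  # last-chunk
            break
        data = raw[pos:pos + size]
        if len(data) != size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += data
        pos += size + 2  # skip chunk-data and its trailing CRLF

    trailers: Dict[str, str] = {}
    while True:
        line = read_line()
        if not line:  # blank line: end of trailer and of the Chunked-Body
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii").strip()] = value.decode("ascii").strip()
    return bytes(body), trailers


# Synthetic input shaped like the capture: eight 0x1ffc chunks, one 0x1ea9 chunk.
wire = (b"1ffc\r\n" + b"a" * 0x1ffc + b"\r\n") * 8
wire += b"1ea9\r\n" + b"b" * 0x1ea9 + b"\r\n" + b"0\r\n\r\n"
decoded, extra_headers = decode_chunked(wire)
print(len(decoded))  # 73353
```

In this sketch the length counter of the pseudocode is simply len(decoded), which is the value a caller would write into Content-Length after removing chunked from Transfer-Encoding.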
The final value of length is the sum of the chunk-size values of all chunks. In the packet capture above, there are eight chunks whose chunk-size is 0x1ffc (8188) and one final chunk of 0x1ea9 (7849), for a total of 8 * 8188 + 7849 = 73353 bytes.
Note: the chunk-size in the example above is 8188 probably because each chunk carries 8 bytes of framing: 4 bytes for "1ffc", 2 bytes for the CRLF that follows it, and 2 bytes for the CRLF that ends the chunk. The whole chunk is therefore 8196 bytes, which may be the size of the buffer that the sender hands to TCP in a single send.
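The addresses and totals quoted in the capture analysis can be checked with a few lines of Python. The helper name next_chunk_offset and the constant names are assumptions introduced only for this check; the numeric values are the ones from the capture.

```python
CHUNK = 0x1ffc  # chunk-size of each full chunk in the capture (8188)
LAST = 0x1ea9   # chunk-size of the final data chunk (7849)


def next_chunk_offset(data_start: int, chunk_size: int) -> int:
    """Address of the next chunk-size line: skip chunk-data and its CRLF."""
    return data_start + chunk_size + 2


second_chunk = next_chunk_offset(0x000d5, CHUNK)
last_chunk = next_chunk_offset(0x100f5, LAST)
total_length = 8 * CHUNK + LAST
# "1ffc" + CRLF + chunk-data + CRLF
framed_chunk = len(b"1ffc") + 2 + CHUNK + 2

print(f"{second_chunk:05x}")  # 020d3
print(f"{last_chunk:05x}")    # 11fa0
print(total_length)           # 73353
print(framed_chunk)           # 8196
```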