Sometimes a Web server cannot determine the size of the message body at the time it generates the HTTP response headers. In that case the server generally omits the Content-Length header and uses chunked encoding to supply the length of the body content dynamically.
An HTTP response that uses chunked encoding sets the following message header:
Transfer-Encoding: chunked
This indicates that the content body is transmitted using chunked encoding.
A chunked body is a concatenation of several chunks, ending with a chunk whose length is 0. Each chunk has two parts, a head and a body. The head gives the number of bytes in the body that follows, written as a hexadecimal number, optionally followed by an extension (generally omitted). The body is the actual content of the specified length. The two parts are separated by a carriage return and line feed (CRLF). The content that follows the final zero-length chunk is called the footer and consists of some additional header fields (which can often simply be ignored). The chunk encoding format is as follows:
Chunked-Body   = *chunk
                 "0" CRLF
                 footer
                 CRLF
chunk          = chunk-size [ chunk-ext ] CRLF
                 chunk-data CRLF
hex-no-zero    = <HEX excluding "0">
chunk-size     = hex-no-zero *HEX
chunk-ext      = *( ";" chunk-ext-name [ "=" chunk-ext-value ] )
chunk-ext-name = token
chunk-ext-val  = token | quoted-string
chunk-data     = chunk-size(OCTET)
footer         = *entity-header
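To make the format concrete, here is a minimal Python sketch of the encoding direction. The function name `encode_chunked` and its `chunk_size` parameter are hypothetical names chosen for illustration; the sketch assumes no chunk extensions and an empty footer.

```python
def encode_chunked(data: bytes, chunk_size: int = 8188) -> bytes:
    """Encode data as a chunked body with no extensions and an empty footer."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        # Head: length of the body in hexadecimal, then CRLF.
        out += format(len(piece), "x").encode("ascii") + b"\r\n"
        # Body: the data itself, then CRLF.
        out += piece + b"\r\n"
    # Zero-length chunk, empty footer, final CRLF.
    out += b"0\r\n" + b"\r\n"
    return bytes(out)


print(encode_chunked(b"Hello, chunked world!", chunk_size=8))
# b'8\r\nHello, c\r\n8\r\nhunked w\r\n5\r\norld!\r\n0\r\n\r\n'
```

Empty input produces only the terminating sequence `0\r\n\r\n`, which matches the grammar's `*chunk` (zero or more chunks).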
The chunked decoding process given in the RFC is as follows:
length := 0
read chunk-size, chunk-ext (if any) and CRLF
while (chunk-size > 0) {
    read chunk-data and CRLF
    append chunk-data to entity-body
    length := length + chunk-size
    read chunk-size and CRLF
}
read entity-header
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length
Remove "chunked" from Transfer-Encoding
Finally, here is a PHP version of the chunked decoding code:
$bodyContent = '';
// Read the chunk-size line; trim() drops the trailing \r\n before the hex conversion.
$chunk_size = (integer) hexdec(trim(fgets($socket_fd, 4096)));
while (!feof($socket_fd) && $chunk_size > 0) {
    // Append this chunk's data to the body.
    $bodyContent .= fread($socket_fd, $chunk_size);
    fread($socket_fd, 2); // skip \r\n
    $chunk_size = (integer) hexdec(trim(fgets($socket_fd, 4096)));
}
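For comparison, the same body-only loop can be sketched in Python over any file-like byte stream. The names `read_exact` and `decode_chunked_stream` are hypothetical helpers, not part of any library. This sketch makes two choices of its own: it loops until a full chunk has been read, since a single read on a socket may return fewer bytes than requested, and it discards any chunk extension. Like the PHP snippet, it stops at the zero-length chunk and does not read the footer.

```python
import io


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a read may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            raise EOFError("stream ended inside a chunk")
        buf += part
    return bytes(buf)


def decode_chunked_stream(stream) -> bytes:
    """Return the concatenated chunk-data; the footer is left unread."""
    body = bytearray()
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream ended before the last chunk")
        # Drop any ";ext" part, then parse the hexadecimal size.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += read_exact(stream, size)
        read_exact(stream, 2)  # skip the CRLF after chunk-data
    return bytes(body)


sample = io.BytesIO(b"5\r\nHello\r\n7;note=x\r\n, world\r\n0\r\n\r\n")
print(decode_chunked_stream(sample))  # b'Hello, world'
```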
Parsing Transfer-Encoding: chunked in the HTTP protocol
In general, the value of the Transfer-Encoding field should be chunked, indicating that chunked encoding is used to transmit the message body. Chunked encoding is a coding method defined in the HTTP/1.1 RFC, so all HTTP/1.1 applications should support it.
The basic idea of chunked encoding is to split a large block of data into several smaller blocks, each of which states its own length, in the following format (BNF grammar):
Chunked-Body    = *chunk          // zero or more chunks
                  last-chunk      // the last chunk
                  trailer         // the trailer
                  CRLF            // end marker
chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF
chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *(entity-header CRLF)
Explanation:
Chunked-Body is the message body after chunked encoding. It can be divided into four parts: chunk, last-chunk, trailer, and the terminator. The body may contain zero or more chunks, with no upper limit. Each chunk states its own length: it must begin with a string of hexadecimal digits giving the length of the chunk-data that follows (in bytes). If this hexadecimal string consists only of "0" digits, chunk-size is 0, the chunk is the last-chunk, and it has no chunk-data part. The optional chunk-extension is agreed on by the communicating parties themselves and can be ignored if the receiver does not understand its meaning.
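As an illustration of the chunk-size line, the following Python sketch splits it into the size and any extensions. The function name `parse_chunk_header` is hypothetical, and the parsing is deliberately simplified: it assumes a quoted-string value never contains ";" or "=", which the grammar does permit.

```python
def parse_chunk_header(line: bytes):
    """Split a 'chunk-size [chunk-extension] CRLF' line into (size, extensions)."""
    text = line.rstrip(b"\r\n").decode("ascii")
    size_text, *ext_parts = text.split(";")
    size = int(size_text.strip(), 16)  # chunk-size is hexadecimal
    extensions = {}
    for part in ext_parts:
        name, _, value = part.partition("=")
        # Strip optional quotes around a quoted-string value.
        extensions[name.strip()] = value.strip().strip('"')
    return size, extensions


print(parse_chunk_header(b"1ffc\r\n"))       # (8188, {})
print(parse_chunk_header(b"0;foo=bar\r\n"))  # (0, {'foo': 'bar'})
```

A receiver that does not understand the extensions can simply use the first element of the returned pair and drop the second.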
The trailer consists of additional header fields appended at the end, usually carrying metadata (information about the data). After decoding, these fields can be appended to the existing header fields.
Example analysis:
The following analyzes a capture, taken with Ethereal, of Firefox communicating with a Web site (starting from the terminator of the header fields):
Address  0  .. .. .. .. .. .. .. .. .. .. .. .. .. ..  F
000c0   .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. 31
000d0   66 66 63 0d 0a ...   // ASCII: 1ffc\r\n; chunk-data starts at address 000d5
Clearly, "1ffc" is the chunk-size of the first chunk, which converts to the integer 8188. Since "1ffc" is immediately followed by CRLF, there is no chunk-extension. The chunk-data starts at address 000d5, so the start address of the next chunk is 000d5 + 1ffc + 2 = 020d3, as follows:
020d0   .. 0d 0a 31 66 66 63 0d 0a ...   // ASCII: \r\n1ffc\r\n
The first 0d 0a is the end marker of the previous chunk, and the second 0d 0a is the separator between chunk-size and chunk-data.
This chunk is also 8188 bytes long, and so on, until the last data chunk:
100e0   .. .. .. .. .. .. .. .. .. .. .. .. .. 0d 0a 31
100f0   65 61 39 0d 0a ...   // ASCII: \r\n1ea9\r\n
The length of this chunk is 0x1ea9 = 7849, so the next chunk starts at 100f5 + 1ea9 + 2 = 11fa0, as follows:
11fa0   30 0d 0a 0d 0a   // ASCII: 0\r\n\r\n
"0" indicates that the current chunk is the last-chunk, and the first 0d 0a is the chunk terminator. The second 0d 0a shows that there is no trailer part, and the entire chunked-body ends.
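The address arithmetic used in this walk-through can be checked with a few lines of Python. The helper name `next_chunk_offset` is hypothetical; the addresses and sizes are the ones quoted from the capture above.

```python
CRLF_LEN = 2  # each CRLF is the two bytes 0d 0a


def next_chunk_offset(data_start: int, size_hex: str) -> int:
    """Offset of the next chunk-size field, given where chunk-data starts."""
    return data_start + int(size_hex, 16) + CRLF_LEN


print(int("1ffc", 16))                          # 8188
print(int("1ea9", 16))                          # 7849
print(hex(next_chunk_offset(0x000D5, "1ffc")))  # 0x20d3
print(hex(next_chunk_offset(0x100F5, "1ea9")))  # 0x11fa0
```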
Decoding process:
The purpose of decoding chunked encoding is to reassemble the chunk-data of all the chunks into a single message body, while recording the total length of that body.
The decoding process given in RFC 2616 is as follows (pseudo-code):
length := 0    // reset the length counter to 0
read chunk-size, chunk-extension (if any) and CRLF    // read chunk-size, chunk-extension and CRLF
while (chunk-size > 0) {    // i.e. this is not the last-chunk
    read chunk-data and CRLF    // read chunk-size bytes of chunk-data, then skip the CRLF
    append chunk-data to entity-body    // append this chunk's chunk-data to the entity-body
    length := length + chunk-size    // add this chunk's size to the running total
    read chunk-size and CRLF    // read the next chunk's chunk-size and CRLF
}
read entity-header    // an entity-header has the form name:value CRLF; if it is empty, only CRLF is present
while (entity-header not empty) {    // i.e. not just an empty line containing only CRLF
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length    // write the body length computed during decoding into the Content-Length field of the message
Remove "chunked" from Transfer-Encoding    // also remove the chunked token from the Transfer-Encoding field value
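The pseudo-code can be turned into a runnable Python sketch fairly directly. The function name `decode_chunked_message` and the use of a plain dict for header fields are assumptions made for illustration; a real implementation would need case-insensitive header names and handling for malformed input, and chunk extensions are simply discarded here.

```python
def decode_chunked_message(headers: dict, raw: bytes):
    """Return (new_headers, entity_body) for a chunked message body in raw."""
    headers = dict(headers)  # work on a copy
    body = bytearray()
    pos = 0

    def read_line() -> bytes:
        """Return the bytes up to the next CRLF and advance past it."""
        nonlocal pos
        end = raw.index(b"\r\n", pos)
        line = raw[pos:end]
        pos = end + 2
        return line

    length = 0
    size = int(read_line().split(b";", 1)[0], 16)
    while size > 0:
        body += raw[pos:pos + size]
        pos += size + 2  # skip chunk-data and its CRLF
        length += size
        size = int(read_line().split(b";", 1)[0], 16)

    # Trailer: append each entity-header to the existing header fields.
    entity_header = read_line()
    while entity_header:
        name, _, value = entity_header.decode("ascii").partition(":")
        headers[name.strip()] = value.strip()
        entity_header = read_line()

    headers["Content-Length"] = str(length)
    # Remove "chunked" from Transfer-Encoding, keeping any other codings.
    codings = [c.strip() for c in headers.get("Transfer-Encoding", "").split(",")]
    remaining = [c for c in codings if c and c.lower() != "chunked"]
    if remaining:
        headers["Transfer-Encoding"] = ", ".join(remaining)
    else:
        headers.pop("Transfer-Encoding", None)
    return headers, bytes(body)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\nX-Checksum: abc\r\n\r\n"
print(decode_chunked_message({"Transfer-Encoding": "chunked"}, raw))
```

In this usage example `X-Checksum` is a made-up trailer field; after decoding it appears among the headers alongside the computed `Content-Length` of 9.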
The final value of length is the sum of the chunk-size values of all chunks. In the example above there are eight chunks with a chunk-size of 0x1ffc (8188) and one remaining chunk of 0x1ea9 (7849), which add up to 73353 bytes.
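The total can be confirmed with a short Python calculation over the chunk sizes listed in the example (the variable names are illustrative only):

```python
# Eight chunks of 0x1ffc bytes followed by one chunk of 0x1ea9 bytes.
chunk_sizes = ["1ffc"] * 8 + ["1ea9"]
length = sum(int(size, 16) for size in chunk_sizes)
print(length)  # 73353
```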
Note: in the example above, the first several chunks are each 8188 bytes long, probably for the following reason: "1ffc" takes 4 bytes, the "\r\n" after it takes 2 bytes, and the "\r\n" at the end of the chunk takes another 2 bytes, 8 bytes in total. A whole chunk is therefore 8196 bytes, which may be exactly the size of the buffer used when sending over TCP.
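The per-chunk size on the wire can be computed as follows. The helper name `wire_size` is hypothetical and the sketch assumes no chunk-extension. As a cross-check, eight full chunks after the first chunk-data address 000d5 should land on the chunk-data address 100f5 quoted for the last data chunk.

```python
def wire_size(chunk_size: int) -> int:
    """Bytes a chunk occupies on the wire, assuming no chunk-extension."""
    size_field = len(format(chunk_size, "x"))  # hex digits of chunk-size
    return size_field + 2 + chunk_size + 2     # size, CRLF, data, CRLF


print(wire_size(0x1FFC))                     # 8196
print(hex(0x000D5 + 8 * wire_size(0x1FFC)))  # 0x100f5
```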