Parsing chunked encoding in HTTP/1.1


Original article: http://blog.csdn.net/zhangboyj/article/details/6236780

---------

Analysis of chunked encoding in HTTP/1.1

In ordinary HTTP communication, the server uses the Content-Length header to tell the user agent (usually the browser) the length of the document body it is sending. This header is defined in section 10.4 of RFC 1945, the HTTP/1.0 specification. The browser reads the number of bytes given in Content-Length as the body. The drawback is that the server must know the full length before it sends the headers, so if some of the data is slow to produce, the browser is left showing a blank screen, which makes for a rather poor user experience.

The solution is the Transfer-Encoding: chunked header, defined in section 14.41 of RFC 2616 (HTTP/1.1); the chunked encoding itself is defined in section 3.6.1, and all HTTP/1.1 applications must be able to receive and decode it. Chunked encoding supplies the length of the body dynamically, piece by piece. A message sent this way carries the header Transfer-Encoding: chunked, which indicates that the body will be transmitted in chunks. As a result, the browser does not need to wait until the whole body has been downloaded: it can start parsing the page as soon as it receives a chunk, and can start downloading the resources referenced in the HTML, such as JS, CSS, and images.

There are two ways to take advantage of chunked encoding: set the length of the server's IO buffer so that the server flushes the buffer automatically, or call the IO library's flush function manually. The flush functions in various languages are:

• PHP: ob_flush(); flush();
• Perl: STDOUT->autoflush(1);
• Java: out.flush();
• Python: sys.stdout.flush()
• Ruby: STDOUT.flush
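In C the corresponding call is `fflush()`. Below is a minimal sketch of the manual-flush approach. It assumes `out` is a stream already connected to the client (for example a socket wrapped with POSIX `fdopen()`) and that the response headers, including `Transfer-Encoding: chunked`, have already been written. The helper names `write_chunk` and `end_chunks` are hypothetical.

```c
#include <stdio.h>

/* Write one chunk: hex chunk-size, CRLF, chunk-data, CRLF.
   fflush() pushes the chunk out immediately instead of waiting for
   the stdio buffer to fill. Returns 0 on success, -1 on error. */
int write_chunk(FILE *out, const char *data, size_t len)
{
    if (len == 0)                  /* a zero-size chunk would end the body */
        return 0;
    if (fprintf(out, "%zx\r\n", len) < 0)
        return -1;
    if (fwrite(data, 1, len, out) != len)
        return -1;
    if (fputs("\r\n", out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* End the body: last-chunk "0\r\n", an empty trailer, the final CRLF. */
int end_chunks(FILE *out)
{
    if (fputs("0\r\n\r\n", out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling `write_chunk` for each piece of the page as soon as it is ready, then `end_chunks` once, lets the client start parsing after the first flush rather than after the whole response.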

Using HTTP/1.1 Transfer-Encoding: chunked together with flushing the IO buffer lets the browser download the page's supporting resources earlier. When the length of the message body cannot be determined in advance, the headers cannot include a Content-Length field to give it; in that case the length of the body is determined through the Transfer-Encoding field instead.

A chunked body is a sequence of chunks ending with a chunk whose length is 0. Each chunk has a header and a body. The header gives the length of the body that follows as a non-zero hexadecimal number; the unit, bytes, is not written. The body is the actual content of that length. The header and the body are separated by a carriage return and line feed (CRLF). After the final zero-length chunk comes the footer (the trailer), which carries additional header fields and can usually be ignored.

That definition is rather formal. Put simply, chunked encoding splits a large block of data into smaller chunks, each of which declares its own length. The format, in BNF, is:

Chunked-Body   = *chunk            // zero or more chunks
                 last-chunk        // the last chunk
                 trailer           // the trailer
                 CRLF              // end marker

chunk          = chunk-size [ chunk-extension ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEX
last-chunk     = 1*("0") [ chunk-extension ] CRLF

chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name = token
chunk-ext-val  = token | quoted-string
chunk-data     = chunk-size(OCTET)
trailer        = *(entity-header CRLF)
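The grammar translates fairly directly into code. The following C sketch (the names `hex_value` and `parse_chunk_size_line` are hypothetical) parses a single `chunk-size [chunk-extension] CRLF` line from a byte buffer. It skips the extension rather than interpreting it, and rejects sizes that would overflow `unsigned long`.

```c
#include <limits.h>
#include <stddef.h>

/* Value of one HEX digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse chunk-size [chunk-extension] CRLF at the start of p (n bytes).
   On success stores the size in *size and returns the bytes consumed,
   CRLF included; returns -1 if the line is malformed or incomplete.
   The chunk-extension is skipped, not interpreted. */
long parse_chunk_size_line(const char *p, size_t n, unsigned long *size)
{
    size_t i = 0;
    unsigned long v = 0;
    int d;

    while (i < n && (d = hex_value(p[i])) >= 0) {    /* 1*HEX */
        if (v > (ULONG_MAX - (unsigned long)d) / 16)
            return -1;                               /* too large */
        v = v * 16 + (unsigned long)d;
        i++;
    }
    if (i == 0)
        return -1;                   /* at least one digit is required */
    if (i < n && p[i] == ';')        /* chunk-extension: skip to CR */
        while (i < n && p[i] != '\r')
            i++;
    if (i + 1 >= n || p[i] != '\r' || p[i + 1] != '\n')
        return -1;                   /* no CRLF (yet) */
    *size = v;
    return (long)(i + 2);
}
```

Returning the number of bytes consumed lets a caller step straight to the chunk-data that follows the line.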

Explanation:

• Chunked-Body describes the message body after chunked encoding. It consists of four parts: the chunks, the last-chunk, the trailer, and the terminating CRLF. There can be zero or more chunks, with no upper limit.

• Every chunk declares its own length: it must begin with a string of hexadecimal digits giving the length of the following chunk-data in bytes. If the chunk-size consists only of zeros, it is 0, the chunk is the last-chunk, and there is no chunk-data part.

• The optional chunk-extension is defined by the communicating parties themselves; a receiver that does not understand it can ignore it.

• The trailer holds additional header fields appended at the end, usually metadata (information about the message). After decoding, these fields can be appended to the existing header fields.
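As an illustration of the trailer rule, here is a hypothetical C helper (`parse_trailer`, with an assumed callback type `field_cb`) that reads the `entity-header CRLF` lines after the last-chunk up to the empty line, handing each field to a callback that could append it to the existing headers. It is only a sketch: it checks for a colon but does not validate field names or handle folded continuation lines.

```c
#include <stddef.h>

/* Called once per trailer field; name and value are not NUL-terminated. */
typedef void (*field_cb)(const char *name, size_t name_len,
                         const char *value, size_t value_len, void *ctx);

/* Parse the trailer after last-chunk: zero or more "name: value" CRLF
   lines, then an empty line (CRLF). cb may be NULL.
   Returns the bytes consumed, or -1 if malformed or incomplete. */
long parse_trailer(const char *p, size_t n, field_cb cb, void *ctx)
{
    size_t i = 0;
    for (;;) {
        size_t start = i, colon = 0;
        int have_colon = 0;
        while (i + 1 < n && !(p[i] == '\r' && p[i + 1] == '\n')) {
            if (p[i] == ':' && !have_colon) {
                colon = i;
                have_colon = 1;
            }
            i++;
        }
        if (i + 1 >= n)
            return -1;                  /* no CRLF yet: incomplete */
        if (i == start)
            return (long)(i + 2);       /* empty line ends the trailer */
        if (!have_colon || colon == start)
            return -1;                  /* not a "name: value" line */
        size_t v = colon + 1;
        while (v < i && (p[v] == ' ' || p[v] == '\t'))
            v++;                        /* skip whitespace before the value */
        if (cb)
            cb(p + start, colon - start, p + v, i - v, ctx);
        i += 2;                         /* step over this line's CRLF */
    }
}
```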

Below is an analysis of a session between Firefox and a web site, captured with Ethereal (starting right after the blank line that ends the header fields):

Address  0 ... f

000c0    ...                31

000d0    66 66 63 0d 0a ..  // ASCII: 1ffc\r\n; chunk-data starts at address 000d5

Clearly "1ffc" is the chunk-size of the first chunk; converted to decimal it is 8188. Because "1ffc" is followed immediately by CRLF, there is no chunk-extension. The chunk-data starts at address 000d5.

The next chunk therefore begins at 000d5 + 1ffc + 2 = 020d3 (start of the chunk-data, plus its length, plus the 2-byte CRLF):

020d0    .. 0d 0a 31 66 66 63 0d 0a ..  // ASCII: \r\n1ffc\r\n

The first 0d 0a ends the previous chunk; the second separates this chunk's chunk-size from its chunk-data.

This chunk is also 8188 bytes long, and so on, up to the last chunk:

100e0    ...          0d 0a 31

100f0    65 61 39 0d 0a ..  // ASCII: \r\n1ea9\r\n

This chunk is 0x1ea9 = 7849 bytes long, so the next one starts at 100f5 + 1ea9 + 2 = 11fa0:

11fa0    30 0d 0a 0d 0a  // ASCII: 0\r\n\r\n

The "0" marks this chunk as the last-chunk, and the first 0d 0a ends its size line. The second 0d 0a shows that there is no trailer, so the entire Chunked-Body ends here.
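The offset arithmetic in this walkthrough can be checked with a tiny C sketch. The helper names are made up for illustration, chunk-extensions are assumed absent, and the start of the first size line, 000cf, is inferred from the dump (the "1" at the end of row 000c0, with "ffc" continuing on row 000d0); likewise 100ef for the "1ea9" line.

```c
/* A chunk-size line of `digits` hex digits (no chunk-extension) is
   followed by a 2-byte CRLF, after which the chunk-data begins. */
unsigned long data_offset(unsigned long size_line_start, unsigned long digits)
{
    return size_line_start + digits + 2;
}

/* chunk-data of `size` bytes is followed by a 2-byte CRLF, after
   which the next chunk-size line begins. */
unsigned long next_chunk_offset(unsigned long data_start, unsigned long size)
{
    return data_start + size + 2;
}
```

For example, `next_chunk_offset(0xd5, 0x1ffc)` gives 0x20d3 and `next_chunk_offset(0x100f5, 0x1ea9)` gives 0x11fa0, the addresses used above.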

Decoding process:

Decoding chunked data means joining the chunk-data of all the chunks back into a single message body, while keeping a running count of its length.

RFC 2616 gives the following decoding process (pseudocode):

length := 0                                 // length counter starts at 0
read chunk-size, chunk-extension (if any) and CRLF
while (chunk-size > 0) {                    // not yet the last-chunk
    read chunk-data and CRLF                // read chunk-size bytes of chunk-data, then skip the CRLF
    append chunk-data to entity-body        // append this chunk's data to the entity-body
    length := length + chunk-size
    read chunk-size and CRLF                // size line of the next chunk
}
read entity-header                          // a trailer line "name: value" CRLF, or just CRLF if there is none
while (entity-header not empty) {           // i.e. not the bare CRLF line
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                    // the decoded body length becomes the Content-Length value
Remove "chunked" from Transfer-Encoding     // the message is no longer chunked
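The pseudocode (section 19.4.6 of RFC 2616) maps onto C roughly as follows. This is a minimal sketch with a hypothetical function name, assuming the message arrives on a stdio stream `in` (a socket could be wrapped with POSIX `fdopen()`), the body fits into a caller-supplied buffer, size and trailer lines are shorter than 256 bytes, and trailer fields are read and discarded rather than merged into the headers. `strtoul` is lenient (it accepts leading whitespace or a `0x` prefix), which a strict parser would reject.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode a chunked body from `in` into body (capacity cap).
   Returns the decoded length, i.e. the new Content-Length, or -1. */
long decode_chunked_stream(FILE *in, char *body, size_t cap)
{
    char line[256];
    char *end;
    size_t length = 0;                      /* length := 0 */
    unsigned long chunk_size;

    if (!fgets(line, sizeof line, in))      /* read chunk-size line */
        return -1;
    chunk_size = strtoul(line, &end, 16);   /* stops at ';' or CR */
    if (end == line)
        return -1;
    while (chunk_size > 0) {
        if (chunk_size > cap - length)
            return -1;                      /* body buffer too small */
        if (fread(body + length, 1, chunk_size, in) != chunk_size)
            return -1;                      /* read chunk-data */
        if (fgetc(in) != '\r' || fgetc(in) != '\n')
            return -1;                      /* CRLF after chunk-data */
        length += chunk_size;               /* length := length + chunk-size */
        if (!fgets(line, sizeof line, in))  /* read next chunk-size line */
            return -1;
        chunk_size = strtoul(line, &end, 16);
        if (end == line)
            return -1;
    }
    do {                                    /* trailer: read until empty line */
        if (!fgets(line, sizeof line, in))
            return -1;
    } while (strcmp(line, "\r\n") != 0);
    return (long)length;
}
```

The return value plays the role of `Content-Length := length`; removing "chunked" from Transfer-Encoding is left to the caller's header handling.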

The final value of length is the sum of the chunk-size values of all chunks. In the example above there are eight chunks of size 0x1ffc (8188) and one of 0x1ea9 (7849), for a total of 8 * 8188 + 7849 = 73353 bytes.
Note: in the example above, the first several chunks are 8188 bytes long, probably for this reason: the chunk-size "1ffc" takes 4 bytes, the CRLF after it 2 bytes, and the CRLF ending the chunk another 2 bytes, 8 bytes of overhead in all, so a whole chunk occupies 8196 bytes. This may match the size of the buffer the server sends over TCP each time.
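A hedged sketch of the arithmetic in this note: `hex_digits` and `chunk_wire_size` are hypothetical helpers, and chunk-extensions are assumed absent. It reproduces the 8196-byte wire size of a 0x1ffc-byte chunk.

```c
#include <stddef.h>

/* Number of hex digits needed to write n (at least one, for 0). */
static size_t hex_digits(size_t n)
{
    size_t d = 1;
    while (n >= 16) {
        n /= 16;
        d++;
    }
    return d;
}

/* Bytes one chunk occupies on the wire, without chunk-extension:
   chunk-size digits + CRLF + chunk-data + CRLF. */
size_t chunk_wire_size(size_t data_len)
{
    return hex_digits(data_len) + 2 + data_len + 2;
}
```

As a consistency check, eight such chunks starting at 000cf put the last size line at 000cf + 8 * 8196 = 100ef, which is where "1ea9" appears in the capture.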

Finally, here is a PHP version of the chunked decoding code:

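$bodyContent = '';
// Read "chunk-size [; chunk-extension] CRLF" and keep only the hex digits
$chunk_size = (int) hexdec(trim(explode(';', fgets($socket_fd, 4096))[0]));
while (!feof($socket_fd) && $chunk_size > 0) {
    // fread() on a socket may return fewer bytes than requested, so loop
    $remaining = $chunk_size;
    while ($remaining > 0 && !feof($socket_fd)) {
        $data = fread($socket_fd, $remaining);
        if ($data === false) {
            break 2;
        }
        $bodyContent .= $data;
        $remaining -= strlen($data);
    }
    fread($socket_fd, 2); // skip the CRLF after chunk-data
    $chunk_size = (int) hexdec(trim(explode(';', fgets($socket_fd, 4096))[0]));
}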

 

The C version of the decoder follows; a Java version would use the same logic.

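#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* a:   buffer holding the data to be decoded; strstr() needs it to be
        NUL-terminated after the len encoded bytes
   fp:  file that receives the decoded entity-body
   As in the original, a is freed and fp is closed before returning.
   Returns 0 after the last chunk, -1 on malformed or truncated input. */
int decode_chunked(char *a, size_t len, FILE *fp)
{
    char *pstart = a;              /* start of the next chunk-size line */
    char *end = a + len;
    char *ptemp;
    char *num_end;
    long nbytes;                   /* length of one chunk */
    int ret = -1;

    for (;;) {                     /* the original looped with goto */
        ptemp = strstr(pstart, "\r\n");
        if (ptemp == NULL)
            break;                 /* no complete chunk-size line */
        /* Parse the hex length; strtol() stops at ';' or '\r',
           so a chunk-extension is ignored. */
        nbytes = strtol(pstart, &num_end, 16);
        if (num_end == pstart)
            break;                 /* no hex digits */
        pstart = ptemp + 2;        /* skip the CRLF after chunk-size */
        if (nbytes == 0) {         /* length 0 marks the last chunk */
            ret = 0;
            break;
        }
        if (nbytes < 0 || nbytes > end - pstart - 2)
            break;                 /* chunk would run past the buffer */
        fwrite(pstart, sizeof(char), (size_t)nbytes, fp); /* write nbytes of data to the file */
        fflush(fp);
        pstart += nbytes + 2;      /* skip the data and its 2-byte CRLF */
    }
    free(a);
    fclose(fp);
    return ret;
}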

How to convert a number to a hexadecimal string (chunk-size is written in hex):

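#include <stdlib.h>

static const char hex_chars[] = "0123456789abcdef";

/* Return a malloc'd lowercase hex string for value (caller frees);
   e.g. to_hex(123445677) returns "75ba1ad". */
char *to_hex(unsigned long value)
{
    char *buf = malloc(2 * sizeof value + 1);  /* max digits plus NUL */
    char *d = buf;                 /* d keeps the start of the string */
    int shift = 0;
    unsigned long copy = value;

    if (buf == NULL)
        return NULL;
    while (copy) {                 /* first count the hex digits */
        copy >>= 4;
        shift++;
    }
    if (shift == 0)                /* 0 still needs one digit */
        shift++;
    shift <<= 2;                   /* digits * 4: with two digits, shift becomes 8 */
    while (shift > 0) {
        shift -= 4;                /* bit position of the next digit */
        *buf++ = hex_chars[(value >> shift) & 0x0f];
    }
    *buf = '\0';
    return d;
}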
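The standard library can produce the same string: `snprintf` with the `%lx` conversion writes an `unsigned long` in lowercase hex. A minimal sketch, with a hypothetical wrapper name:

```c
#include <stdio.h>

/* Format value as lowercase hex into buf (size n) using %lx.
   Returns the number of characters written, or -1 if buf is too small. */
int ulong_to_hex(unsigned long value, char *buf, size_t n)
{
    int len = snprintf(buf, n, "%lx", value);
    return (len < 0 || (size_t)len >= n) ? -1 : len;
}
```

On the sending side the same conversion can go straight into the stream, for example `fprintf(out, "%lx\r\n", size)` to write a chunk-size line.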

