After extracting the chunked, gzip-encoded message-body data from an HTTP message, the next step is to process it. My approach is to save the data to a file and then decode that file.
Much of the object data sent over HTTP is compressed before transmission, so what we receive is not a plain HTML text file but compressed or encoded data, and it has to be decoded first. HTTP involves two kinds of encoding: content (compression) encoding, which compresses the object data to reduce its size, and transfer encoding, which concerns how the data is transferred safely and reliably. Both are announced in the message headers. In an HTTP response, Transfer-Encoding: chunked means the body is sent in chunks, and Content-Encoding: gzip means the object data is compressed according to the gzip specification. See the HTTP specification at http://www.w3.org/Protocols/rfc2616/rfc2616.html
Another point to note is the order in which gzip and chunked are applied (that is, how the two work together). The sender 1. first compresses the original object data (here, an HTML text file) with gzip, and 2. then splits the compressed data into chunks. Decoding therefore runs in the reverse order: remove the chunking first, then decompress.
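To illustrate the header check that decides which decoding steps are needed, here is a minimal sketch in C. The helper names header_has and starts_with_nocase are hypothetical and not part of the original code, and the sketch assumes the response header block is available as a NUL-terminated string with CRLF line endings.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s begins with prefix, ignoring case. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the header block contains a line
   "name: ...value...", comparing both name and value case-insensitively. */
int header_has(const char *headers, const char *name, const char *value)
{
    const char *line = headers;
    size_t name_len = strlen(name);

    while (*line != '\0') {
        const char *end = strstr(line, "\r\n");
        if (end == NULL)
            end = line + strlen(line);

        if (starts_with_nocase(line, name) && line[name_len] == ':') {
            const char *p;
            /* scan the header value for the requested token */
            for (p = line + name_len + 1; p < end; p++) {
                if (starts_with_nocase(p, value))
                    return 1;
            }
        }
        line = (*end == '\0') ? end : end + 2;   /* move to the next line */
    }
    return 0;
}
```

A receiver could call header_has(headers, "Transfer-Encoding", "chunked") and header_has(headers, "Content-Encoding", "gzip"), then undo the chunking first and decompress second, mirroring the sender's order in reverse.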
The following shows how to implement chunked and gzip decoding, quoted from http://nblive99.spaces.live.com/blog/cns!74a0072781b23dfb!130.entry
I ran into some encoding problems when reassembling packets from the TCP/IP protocol stack, mainly chunked and gzip encoding. First, chunked. RFC 2616 defines it as follows:
Chunked-Body    = *chunk
                  last-chunk
                  trailer
                  CRLF
chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF
chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *(entity-header CRLF)
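As a concrete reading of the chunk-size and chunk-extension rules above, the following is a minimal sketch in C that parses one chunk header line. The function name parse_chunk_size is hypothetical, and the sketch makes a simplifying assumption: a chunk-extension is skipped by scanning to the next CR, so quoted-string extension values containing a CR are not handled.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: parses a chunk header such as "1a3;name=val\r\n".
   Returns the chunk size, or -1 if the line does not start with a hex
   digit, overflows a long, or is not terminated by CRLF. */
long parse_chunk_size(const char *line)
{
    const char *p = line;
    long size = 0;

    if (!isxdigit((unsigned char)*p))
        return -1;

    /* chunk-size = 1*HEX */
    while (isxdigit((unsigned char)*p)) {
        int c = tolower((unsigned char)*p);
        int d = isdigit(c) ? c - '0' : c - 'a' + 10;
        if (size > (LONG_MAX - d) / 16)
            return -1;                  /* would overflow */
        size = size * 16 + d;
        p++;
    }

    /* optional chunk-extension: skipped, not interpreted */
    if (*p == ';') {
        while (*p != '\0' && *p != '\r')
            p++;
    }

    /* the header line must end with CRLF */
    if (p[0] != '\r' || p[1] != '\n')
        return -1;
    return size;
}
```

A return value of 0 corresponds to last-chunk, which is the signal to stop reading chunk data.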
The decoding process in pseudocode is as follows:
length := 0                                           // length of the decoded data body
read chunk-size, chunk-extension (if any) and CRLF    // size of the first chunk
while (chunk-size > 0) {                              // repeat until a chunk of size 0 is read
    read chunk-data and CRLF                          // read the chunk body and its trailing CRLF
    append chunk-data to entity-body                  // add the chunk body to the decoded object data
    length := length + chunk-size                     // update the decoded length
    read chunk-size and CRLF                          // read the size of the next chunk
}
read entity-header                                    // the loop below reads all trailer headers
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                              // add the content length to the headers
Remove "chunked" from Transfer-Encoding               // remove the chunked flag from the headers

The pseudocode logic is a little confusing at first. After working through it, I wrote the following C decoding code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *unchunk(char *filename)
{
    /* Disabled in the original: rename the input to "<name>.trunk" if it lacks the suffix.
    if (strstr(filename, ".trunk") == 0)
    {
        strcat(filename, ".trunk");
        memset(cmdbuf, 0x0, sizeof(cmdbuf));
        sprintf(cmdbuf, "mv %s %s", chunkfile, filename);
        system(cmdbuf);
    } */
    static char newfile[128];   /* static, because the name is returned to the caller */
    char chunk_head[32];        /* holds one "<hex size>\r\n" line */
    char *chunk_data;
    char *ptr;
    long chunk_size;
    FILE *fp, *fp_unchunk;

    /* The output name is the input name with ".trunk" stripped. */
    if (strlen(filename) >= sizeof(newfile))
        return filename;
    strcpy(newfile, filename);
    ptr = strstr(newfile, ".trunk");
    if (ptr == NULL)
        return filename;
    *ptr = '\0';
    printf("%s\n", newfile);

    fp = fopen(filename, "rb");
    if (fp == NULL)
        return filename;
    fp_unchunk = fopen(newfile, "wb");
    if (fp_unchunk == NULL)
    {
        fclose(fp);
        return filename;
    }

    /* Read a chunk head, copy that many bytes, skip the CRLF, and repeat. */
    while (fgets(chunk_head, sizeof(chunk_head), fp) != NULL
           && strstr(chunk_head, "\r\n") != NULL)
    {
        chunk_size = strtol(chunk_head, NULL, 16);
        if (chunk_size <= 0)
            break;                      /* last-chunk: size 0 ends the body */
        chunk_data = (char *)malloc(chunk_size);
        if (chunk_data == NULL)
            break;
        if (fread(chunk_data, 1, (size_t)chunk_size, fp) != (size_t)chunk_size)
        {
            free(chunk_data);           /* truncated input */
            break;
        }
        fwrite(chunk_data, 1, (size_t)chunk_size, fp_unchunk);
        free(chunk_data);
        fseek(fp, 2, SEEK_CUR);         /* skip the CRLF after chunk-data */
    }

    fclose(fp_unchunk);
    fclose(fp);
    remove(filename);                   /* remove the old chunked file */
    return newfile;
}

Next, let's look at gzip decoding, which is comparatively simple. There are two ways to do it: call the system gzip command directly, which takes no real effort, or use the zlib library, which is more portable but takes slightly more development work. The C code for decompressing a gzip file follows.

/* Method 1: call the system gzip command (trivial) */
void ungzip(char *filename)
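{
    char cmdbuf[1024];
    char gzname[512];

    /* gzip -d expects a .gz suffix, so rename the file first if it lacks one. */
    if (strstr(filename, ".gz") == 0)
    {
        snprintf(gzname, sizeof(gzname), "%s.gz", filename);
        snprintf(cmdbuf, sizeof(cmdbuf), "mv %s %s", filename, gzname);
        system(cmdbuf);
    }
    else
    {
        snprintf(gzname, sizeof(gzname), "%s", filename);
    }
    /* Note: the file name is passed to the shell unescaped. */
    snprintf(cmdbuf, sizeof(cmdbuf), "gzip -d %s", gzname);
    system(cmdbuf);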
}

/* Method 2: use the zlib library */
#include <stdio.h>
#include "zlib/zlib.h"

void uncompresstorrent(char *src, char *dst)
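{
    enum { CHUNK = 16384 };     /* read buffer size; CHUNK is undefined in the original, 16384 is an assumed value */
    char in[CHUNK];
    int retlen;
    gzFile gzfp = gzopen(src, "rb");
    FILE *fp = fopen(dst, "wb");

    if (gzfp != NULL && fp != NULL)
    {
        /* gzread returns the number of uncompressed bytes read, 0 at end of file, -1 on error */
        while ((retlen = gzread(gzfp, in, CHUNK)) > 0)
            fwrite(in, 1, retlen, fp);
    }
    if (gzfp != NULL) gzclose(gzfp);
    if (fp != NULL) fclose(fp);
}

(Link against zlib when compiling, for example with the -lz option.)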
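Before handing the dechunked data to either gzip method, it can help to confirm that it really is gzip data. The following is a minimal sketch in C, not part of the original code. The function name looks_like_gzip is hypothetical, and the check relies only on the fixed header bytes defined by the gzip file format (RFC 1952): ID1 = 0x1f, ID2 = 0x8b, and compression method 8 (deflate).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf starts with a gzip member header
   (ID1 = 0x1f, ID2 = 0x8b, CM = 8 for deflate), otherwise 0. */
int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8;
}
```

If the check fails right after dechunking, the likely causes are a response that was never compressed or a chunk boundary that was decoded incorrectly.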