Tcpip packet encoding parsing (Chunk and gzip) _ space of jialy _ Baidu Space

Source: Internet
Author: User

Once the message-body has been extracted from the HTTP message, the next step is to process that data. My approach is to save the data to a file first and then decode the file.

Most object data sent over HTTP is compressed before transmission, so what we capture is not a plain HTML text file but compressed/encoded data, and a decoding step is required. HTTP involves two kinds of encoding. The first is content encoding (compression), whose purpose is to reduce the size of the object data. The second is transfer encoding, which changes how the body is framed so that it can be carried safely and reliably across the network. Both are declared in the message header. In the header of an HTTP response, Transfer-Encoding: chunked means the body is sent in chunks, and Content-Encoding: gzip means the object data is compressed according to the gzip specification. See the HTTP protocol description at http://www.w3.org/Protocols/rfc2616/rfc2616.html
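
To make the header check concrete, here is a minimal sketch in C of testing a raw header block for the two fields mentioned above. The function name header_has_token and its helpers are hypothetical, not from the original post. It assumes the headers are a NUL-terminated string with CRLF line endings, and it does a loose substring match on the field value rather than a full parse of the value list.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b. */
static int ci_equal(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Case-insensitive substring search within the len bytes starting at s. */
static int ci_contains(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++) {
        if (ci_equal(s + i, needle, n))
            return 1;
    }
    return 0;
}

/*
 * Returns 1 if the header block has a field called `name` whose value
 * mentions `token`, for example ("Transfer-Encoding", "chunked").
 * Scanning stops at the empty line that ends the header block.
 */
int header_has_token(const char *headers, const char *name, const char *token)
{
    size_t name_len = strlen(name);
    const char *line = headers;

    while (*line != '\0') {
        const char *eol = strstr(line, "\r\n");
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        if (line_len == 0)
            break;                      /* empty line: end of headers */

        /* Field names are case-insensitive and end at the first colon. */
        if (line_len > name_len && line[name_len] == ':' &&
            ci_equal(line, name, name_len) &&
            ci_contains(line + name_len + 1, line_len - name_len - 1, token))
            return 1;

        if (eol == NULL)
            break;
        line = eol + 2;
    }
    return 0;
}
```

A caller could test header_has_token(headers, "Transfer-Encoding", "chunked") and header_has_token(headers, "Content-Encoding", "gzip") to decide which decoding steps the saved body needs.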

One more point worth noting: when both gzip and chunked are used, which one is applied first (that is, how do they work together)? The answer is: 1. gzip first compresses the original object data (here, an HTML text file); 2. chunked then splits the compressed data into chunks. Decoding therefore runs in the opposite order: remove the chunk framing first, then decompress.
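
Since chunking is layered on top of the gzip output, a receiver would undo the layers in reverse: strip the chunk framing, then decompress. One cheap sanity check on that order, sketched here in C, is that the bytes left after dechunking should begin with the gzip magic bytes 0x1f 0x8b followed by compression method 8 (deflate), as laid out in RFC 1952. A body that is still chunked begins with ASCII hex digits instead. The name looks_like_gzip is hypothetical, not from the original post.

```c
#include <stddef.h>

/*
 * Returns 1 if buf starts with a gzip member header:
 * ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate). Returns 0 otherwise.
 */
int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}
```

If this returns 0 on data that the headers declared as gzip, the chunk framing has probably not been removed yet, or the capture is incomplete.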

The chunked and gzip decoding implementation below is taken from http://nblive99.spaces.live.com/blog/cns!74a0072781b23dfb!130.entry

I ran into some encoding problems while reassembling packet data from the TCP/IP protocol stack, mainly chunked and gzip encoding. First, chunked. RFC 2616 defines it as follows:

    Chunked-Body    = *chunk
                      last-chunk
                      trailer
                      CRLF

    chunk           = chunk-size [ chunk-extension ] CRLF
                      chunk-data CRLF
    chunk-size      = 1*HEX
    last-chunk      = 1*("0") [ chunk-extension ] CRLF

    chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
    chunk-ext-name  = token
    chunk-ext-val   = token | quoted-string
    chunk-data      = chunk-size(OCTET)
    trailer         = *(entity-header CRLF)
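
As a worked illustration of the grammar above, the following C sketch walks a complete chunked body held in memory. It is not the original author's code: the names dechunk and hex_value are hypothetical, chunk extensions are skipped, and trailer fields after the last chunk are ignored rather than merged into the headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decodes the chunked body in `in` (in_len bytes) into `out`.
 * Returns the number of decoded bytes, or -1 if the input is malformed,
 * truncated, or larger than out_cap.
 */
long dechunk(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* chunk-size = 1*HEX */
        while (pos < in_len && (v = hex_value(in[pos])) >= 0) {
            if (size > (SIZE_MAX >> 4))
                return -1;              /* size would overflow */
            size = (size << 4) | (size_t)v;
            pos++;
            digits++;
        }
        if (digits == 0)
            return -1;

        /* Skip an optional chunk-extension, then the CRLF. */
        while (pos + 1 < in_len && !(in[pos] == '\r' && in[pos + 1] == '\n'))
            pos++;
        if (pos + 1 >= in_len)
            return -1;
        pos += 2;

        if (size == 0)
            break;                      /* last-chunk reached */

        /* chunk-data is exactly `size` octets followed by CRLF. */
        if (size > in_len - pos || in_len - pos - size < 2)
            return -1;
        if (size > out_cap - written)
            return -1;
        if (in[pos + size] != '\r' || in[pos + size + 1] != '\n')
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size + 2;
    }
    return (long)written;
}
```

For the input "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" this yields the 9 bytes "Wikipedia". Working on a buffer instead of a file makes the bounds checks explicit, at the cost of holding the whole body in memory.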

The decoding process in pseudocode:
length := 0                                          // length of the decoded body
read chunk-size, chunk-extension (if any) and CRLF   // size of the first chunk
while (chunk-size > 0) {                             // repeat until a chunk of size 0 is read
    read chunk-data and CRLF                         // read the chunk data and its trailing CRLF
    append chunk-data to entity-body                 // add the chunk data to the decoded object data
    length := length + chunk-size                    // update the decoded length
    read chunk-size and CRLF                         // read the size of the next chunk
}
read entity-header                                   // the code below reads all trailer header fields
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length                             // add the content length to the header
Remove "chunked" from Transfer-Encoding              // drop "chunked" from the Transfer-Encoding header

The pseudocode is a little confusing at first. After working through its logic, I wrote the following C decoding code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decodes the chunked body saved in "<name>.trunk" into "<name>".
 * The caller is expected to have renamed the input to "<name>.trunk".
 * Returns the decoded file name (a static buffer) on success, or the
 * original file name if the input could not be decoded.
 * Chunk extensions and trailer headers are ignored.
 */
char *unchunk(char *filename)
{
    static char newfile[128];   /* static, because the name is returned to the caller */
    char chunk_head[64];        /* one chunk-size line: hex digits, optional extension, CRLF */
    char *ptr;
    long chunk_size;
    FILE *fp, *fp_unchunk;

    /* Build the output name by stripping the ".trunk" suffix. */
    if (strlen(filename) >= sizeof(newfile))
        return filename;
    strcpy(newfile, filename);
    ptr = strstr(newfile, ".trunk");
    if (ptr == NULL)
        return filename;
    *ptr = '\0';
    printf("%s\n", newfile);

    fp = fopen(filename, "rb");
    if (fp == NULL)
        return filename;
    fp_unchunk = fopen(newfile, "wb");
    if (fp_unchunk == NULL) {
        fclose(fp);
        return filename;
    }

    /* Read the first chunk-size line; it must end with CRLF. */
    if (fgets(chunk_head, sizeof(chunk_head), fp) == NULL ||
        strstr(chunk_head, "\r\n") == NULL) {
        fclose(fp_unchunk);
        fclose(fp);
        remove(newfile);        /* nothing was decoded */
        return filename;
    }
    chunk_size = strtol(chunk_head, NULL, 16);

    while (chunk_size > 0) {
        char *chunk_data = malloc((size_t)chunk_size);
        if (chunk_data == NULL)
            break;
        if (fread(chunk_data, 1, (size_t)chunk_size, fp) != (size_t)chunk_size) {
            free(chunk_data);   /* truncated input */
            break;
        }
        fwrite(chunk_data, 1, (size_t)chunk_size, fp_unchunk);
        free(chunk_data);

        fseek(fp, 2, SEEK_CUR); /* skip the CRLF that follows chunk-data */

        /* Read the next chunk-size line. */
        if (fgets(chunk_head, sizeof(chunk_head), fp) == NULL ||
            strstr(chunk_head, "\r\n") == NULL)
            break;
        chunk_size = strtol(chunk_head, NULL, 16);
    }

    fclose(fp_unchunk);
    fclose(fp);
    remove(filename);           /* remove the old, still chunked file */
    return newfile;
}

Next, gzip decoding, which is comparatively simple. There are two ways to do it: one is to call the system gzip command directly, which takes no real technique; the other is to use the zlib library, which is more general but a little more involved to develop with. The C code for decompressing a gzip file follows.

/* Calls the system gzip command (nothing difficult here). */
void ungzip(char *filename)
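{
    char cmdbuf[1024];

    if (strstr(filename, ".gz") == NULL) {
        /* gzip -d expects a ".gz" suffix, so rename the file first. */
        snprintf(cmdbuf, sizeof(cmdbuf), "mv %s %s.gz", filename, filename);
        system(cmdbuf);
        snprintf(cmdbuf, sizeof(cmdbuf), "gzip -d %s.gz", filename);
    } else {
        snprintf(cmdbuf, sizeof(cmdbuf), "gzip -d %s", filename);
    }
    system(cmdbuf);
}

/* Uses the zlib library. */
#include <stdio.h>
#include "zlib/zlib.h"

#define CHUNK 16384   /* read buffer size; not given in the original, 16384 is an assumed value */

void uncompresstorrent(char *src, char *dst)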
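{
    gzFile gzfp = gzopen(src, "rb");
    FILE *fp = fopen(dst, "wb");
    char in[CHUNK];
    int retlen;

    if (gzfp == NULL || fp == NULL) {
        if (gzfp != NULL) gzclose(gzfp);
        if (fp != NULL) fclose(fp);
        return;
    }

    /* gzread returns 0 at end of file and -1 on error, so stop on either. */
    while ((retlen = gzread(gzfp, in, CHUNK)) > 0) {
        fwrite(in, 1, (size_t)retlen, fp);
    }
    gzclose(gzfp);
    fclose(fp);
}

(Compile with the -lz option to link against zlib.)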
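
Whichever of the two routes is used, it can be useful to check the result. As a hedged sketch: RFC 1952 stores the uncompressed size modulo 2^32 in the last four bytes of a gzip member (the ISIZE field, little-endian), so reading it before decompressing gives a value to compare against the size of the output file. The function name gzip_isize is hypothetical, and the check is only meaningful for single-member files smaller than 4 GiB.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Reads the ISIZE field (last 4 bytes, little-endian) of the gzip file
 * at `path`. Returns 0 and stores the value in *isize on success,
 * or -1 if the file cannot be opened or is shorter than 4 bytes.
 */
int gzip_isize(const char *path, uint32_t *isize)
{
    unsigned char t[4];
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fseek(fp, -4L, SEEK_END) != 0 || fread(t, 1, 4, fp) != 4) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    *isize = (uint32_t)t[0] | ((uint32_t)t[1] << 8) |
             ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
    return 0;
}
```

Call it on the .gz file before decompression, since gzip -d removes the compressed file once it succeeds.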
