Problem:
The answer first. These are the references to read:
GZIP format: RFC 1952, http://www.ietf.org/rfc/rfc1952.txt
Deflate format: RFC 1951, http://www.ietf.org/rfc/rfc1951.txt
zlib manual: http://www.zlib.net/manual.html
Searching the forums for gzip turns up questions like these:
The encoding type returned when retrieving webpage data is gzip. How can I decompress it?
HTTP header?
How to use VB to fetch XML files over the network and parse the content
About gzip decoding
How to decompress gzip
Can WinINet handle gzip over HTTP, especially for POST? How can this be done?
Why do I have to save the data to a file before zlib can decompress HTTP packets in gzip format? Decompressing in memory gives an error.
Unable to get $_SERVER["HTTP_REFERER"]
A gzip problem, high points for a solution!!
.....
And so on.
Problem:
Extract the gzip content from HTTP traffic and decompress it.
Key points:
1. Extract the content of the HTTP packets, mainly the gzip-encoded body
2. Reassemble the data packets
3. Decompress the gzip data in memory
I have spent the past two weeks searching online for this, and I am very grateful for the help I received from several netizens. To keep this problem from plaguing other would-be gzip decompressors, I am writing it all up in this article.
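1. Extracting the data from the packets in memory:
The key is to find where the gzip data starts in memory and how large it is.
Start position: look for the header "Content-Encoding: gzip\r\n". The gzip data itself begins right after the blank line ("\r\n\r\n") that ends the headers, and starts with the magic bytes 0x1f 0x8b.
Gzip size: the number that follows "Content-Length: ".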
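The lookup described in step 1 can be sketched in C as follows. This is only a minimal illustration, not the code used for this article: the names http_body_info, find_ci and locate_gzip_body are made up for the example, and it assumes the whole header block is already in one buffer. Header names are matched case-insensitively by a plain substring search, which is good enough for a sketch but would also match a header such as "X-Content-Length:".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    const char *body;     /* first byte after the blank line, or NULL */
    long content_length;  /* -1 if no Content-Length header was found */
    int is_gzip;          /* 1 if Content-Encoding mentions gzip */
} http_body_info;

/* Case-insensitive search for needle in the first n bytes of hay. */
static const char *find_ci(const char *hay, size_t n, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0 || n < m)
        return NULL;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m && tolower((unsigned char)hay[i + j]) ==
                        tolower((unsigned char)needle[j]))
            j++;
        if (j == m)
            return hay + i;
    }
    return NULL;
}

/* Locate the body of an HTTP response held in buf[0..len). */
http_body_info locate_gzip_body(const char *buf, size_t len)
{
    http_body_info info = { NULL, -1, 0 };
    assert(buf != NULL);

    /* The headers end at the first blank line. */
    const char *end = find_ci(buf, len, "\r\n\r\n");
    if (end == NULL)
        return info;                          /* headers not complete yet */
    size_t hdr_len = (size_t)(end - buf) + 2; /* keep the last header's CRLF */

    const char *p = find_ci(buf, hdr_len, "content-encoding:");
    if (p != NULL) {
        const char *eol = find_ci(p, hdr_len - (size_t)(p - buf), "\r\n");
        if (eol != NULL && find_ci(p, (size_t)(eol - p), "gzip") != NULL)
            info.is_gzip = 1;
    }

    p = find_ci(buf, hdr_len, "content-length:");
    if (p != NULL)  /* strtol skips the spaces and stops at the CR */
        info.content_length = strtol(p + strlen("content-length:"), NULL, 10);

    info.body = end + 4;                      /* skip "\r\n\r\n" */
    return info;
}
```

If is_gzip is set, the bytes from body up to body + content_length are the gzip stream, which may continue in later packets.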
2. Reassembling the data packets:
The content of a webpage rarely fits into a single data packet. The server gzips the whole body first and then sends it across multiple packets, so they have to be put back together before decompressing.
The key observation is that the ack and seq of the GET request packet are closely tied to the ack and seq of the HTTP response packets.
Example:
GET request: ack = 0, seq = 0
http1: seq = 0, ack = 584
http2: seq = 1420, ack = 584
...
A simple analysis gives the algorithm:
First, take the ack of the GET request; the seq of the first response packet equals this value. Also record the ack of that response packet, because every following packet of the same response carries the same ack (584 above). This is one of the key points. The seq of each following packet is the previous seq plus the previous payload length (0 + 1420 = 1420 above), which gives the order. Combined with Content-Length, this is enough to collect all of the gzip content.
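The reassembly idea above can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the capture code behind this article: the segment struct and the reassemble function are hypothetical, the segments are assumed to have been filtered already (same connection, same ack), and first_seq is the seq of the first response packet. Each payload is copied to offset seq - first_seq, so out-of-order and retransmitted packets are handled, and copying stops at the first gap.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one captured TCP segment of the response. */
typedef struct {
    uint32_t seq;                  /* sequence number of payload[0] */
    const unsigned char *payload;
    size_t len;
} segment;

/* Rebuild the byte stream starting at first_seq into out.
 * Returns the number of contiguous bytes recovered from the start. */
size_t reassemble(const segment *segs, size_t n, uint32_t first_seq,
                  unsigned char *out, size_t out_cap)
{
    size_t filled = 0;
    int progress = 1;
    assert(segs != NULL && out != NULL);

    /* Repeat until a full pass adds nothing: simple, and fine for the
     * handful of packets that make up one web page. */
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            /* Unsigned subtraction keeps working if seq wraps around. */
            size_t off = (uint32_t)(segs[i].seq - first_seq);
            size_t end = off + segs[i].len;
            if (off > filled || end <= filled)
                continue;          /* gap before it, or nothing new in it */
            if (end > out_cap)
                end = out_cap;     /* never write past the output buffer */
            if (end <= filled)
                continue;
            memcpy(out + filled, segs[i].payload + (filled - off),
                   end - filled);
            filled = end;
            progress = 1;
        }
    }
    return filled;
}
```

The returned count can then be compared with the header length plus Content-Length to tell whether the whole response has arrived.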
At this point the raw data has been extracted. How do we decompress it?
3. Decompressing the gzip data
After extracting the data, I saved it to a file and opened it with the gzip command, which confirmed that the data was intact.
Then I used the uncompress function provided by zlib. Like most netizens, I made a fatal mistake: I did not read the zlib documentation carefully, which led to round after round of unnecessary trial and error!
In fact, the zlib and gzip formats are different, and uncompress only decompresses data in zlib format. That is why data compressed with the compress function can be decompressed in memory with uncompress, but gzip data cannot!
Later I tried the examples in the zlib package and came to understand zlib a little better: I should use the inflate family of functions to decompress.
Of course, the problem still occurred: the format was reported as incorrect!
Then I found a post online: gzip data cannot be decoded by inflate after a plain inflateInit. You must initialize with inflateInit2(&strm, 47)! (47 = 15 + 32: per the zlib manual, adding 32 to windowBits enables automatic detection of a zlib or gzip header.)
Problem solved!
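The difference between the two formats is visible in the first bytes of the data, which makes it easy to check what you actually captured before calling zlib. The following is a small sketch; the names stream_format and sniff_format are invented for this example. It relies only on the header layouts in the RFCs: a gzip stream starts with 0x1f 0x8b followed by the compression method 8 (RFC 1952), while a zlib stream starts with two bytes CMF and FLG, where the low nibble of CMF is 8, the high nibble is at most 7, and CMF * 256 + FLG is a multiple of 31 (RFC 1950).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch. */
typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_ZLIB } stream_format;

/* Guess the container format from the first bytes of a buffer. */
stream_format sniff_format(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);

    /* gzip: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate) */
    if (n >= 3 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8)
        return FMT_GZIP;

    /* zlib: CM = 8, CINFO <= 7, and the 16-bit header is divisible by 31 */
    if (n >= 2 && (p[0] & 0x0f) == 8 && (p[0] >> 4) <= 7 &&
        ((p[0] << 8) | p[1]) % 31 == 0)
        return FMT_ZLIB;

    return FMT_UNKNOWN;
}
```

Data reported as FMT_GZIP is what uncompress rejects. With windowBits set to 47, zlib performs this detection itself, so a check like this is mainly useful for debugging the extraction steps.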
Here I borrow a netizen's source code, with thanks to him!
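#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define CHUNK 16384   /* size of the temporary output buffer */

/* Decompress len bytes at source into a heap buffer returned through *dest.
 * *dest must be NULL or a malloc'ed pointer on entry; the caller frees it.
 * Pass gzip != 0 for gzip data, 0 for zlib-format data.
 * Returns Z_OK on success. Note that the decompressed length (totalsize)
 * is not handed back to the caller. */
int inflate_read(char *source, int len, char **dest, int gzip)
{
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char out[CHUNK];
    int totalsize = 0;
    char *tmp;

    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    if (gzip)
        ret = inflateInit2(&strm, 47);  /* 15 + 32: auto-detect zlib/gzip header */
    else
        ret = inflateInit(&strm);       /* expects a zlib header */
    if (ret != Z_OK)
        return ret;

    strm.avail_in = len;
    strm.next_in = (Bytef *)source;

    /* run inflate() on input until the output buffer is not full */
    do {
        strm.avail_out = CHUNK;
        strm.next_out = out;
        ret = inflate(&strm, Z_NO_FLUSH);
        assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
        switch (ret) {
        case Z_NEED_DICT:
            ret = Z_DATA_ERROR;         /* and fall through */
        case Z_DATA_ERROR:
        case Z_MEM_ERROR:
            inflateEnd(&strm);
            return ret;
        }
        have = CHUNK - strm.avail_out;
        totalsize += have;
        /* grow the result; never ask realloc for 0 bytes */
        tmp = realloc(*dest, totalsize ? totalsize : 1);
        if (tmp == NULL) {
            inflateEnd(&strm);
            return Z_MEM_ERROR;
        }
        *dest = tmp;
        memcpy(*dest + totalsize - have, out, have);
    } while (strm.avail_out == 0);

    /* clean up and return */
    (void)inflateEnd(&strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}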
The process was painful, but happy in the end!
Here is a small appeal to all netizens who care about gzip decompression:
We have all run into the same problem: the zlib documentation is entirely in English. Some netizens have translated a small part of the beginning, but that is not enough!
So I hope anyone who is interested will help translate the zlib documentation into Chinese!
If you are interested, add me as a contact!