The principle behind repairing a corrupted gzip file is easiest to follow with the gzip structure diagram in mind:
[Figure: gzip file structure diagram]
As explained in the previous article, the key to repairing a corrupted gzip file is to find where the next intact compressed block starts. As the structure diagram shows, each compressed block begins with the last-block flag, the Huffman coding type, and the element counts of the three Huffman trees. If a bad sector sits in the middle of a gzip file, you can find a valid starting point after it by moving forward one bit at a time and trying to decompress at each offset; the first offset that decompresses correctly is likely the true start of a compressed block. Because gzip compresses with a 32 KB window, the search should not need to go further than 64 KB. A tight loop in memory finds the point quickly, but it needs a clear rule for deciding when an attempt has failed.
First, the last-block flag (BFINAL) should be 0, because we are searching forward from the damage point and the block we want is normally not the final one. The Huffman type is almost always dynamic Huffman (BTYPE = 2). The number of literal/length codes (CL1) must be between 257 and 286 inclusive, the number of distance codes (CL2) must be at most 30, and the number of code-length codes (CCL), stored as a 4-bit field plus 4, lies between 4 and 19 inclusive.
Other checks are possible too, for example whether the Huffman trees decode without error, or whether decoding ends with the end-of-block symbol 256, but these are more cumbersome. Checking several consecutive compressed blocks with the rules above is sufficient.
In practice, I modified the gzip source code to perform the search. For lack of time, I did not turn it into a general tool and only made some quick changes. The main modifications are:
Step one: locate the damage point.
In unzip.c, just before the line
error("invalid compressed data--format violated");
record the current decoded byte position.
Step two: search onward from the damage point for the next valid block.
1. In inflate.c, change
if (nl > 286 || nd > 30)
to
if (nl > 286 || nd > 30 || nl < 257 || nd < 1)
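2. In inflate.c, in the function int inflate_block(e), just before the following code
bb = b;
bk = k;
insert a check that rejects anything other than a non-final dynamic block with the error value 2:
if ((t != 2) || (*e != 0))
    return 2;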
3. In inflate.c, at the end of int inflate_block(e), change the if (t == 0) and if (t == 1) branches so that they return the error value 2 directly.
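4. In inflate.c, in the function int inflate(), change
if ((r = inflate_block(&e)) != 0)
as follows. First add these locals (b and k are the local bit buffer that gzip's NEEDBITS/DUMPBITS macros work on):
unsigned t;            /* block type */
register ulg b;        /* bit buffer */
register unsigned k;   /* number of bits in bit buffer */
Then wrap the call in a retry loop:
while (inptr <= insize)
{
    /* save the decoder state before trying this bit offset */
    unsigned int tptr = inptr;
    unsigned int tbk = bk;
    unsigned long tbb = bb;
    unsigned int twp = wp;
    long long tstart = *(long long *)(inbuf + tptr);  /* first 8 input bytes here, for printing */
    if ((r = inflate_block(&e)) != 0)
    {
        /* the block failed to decode: restore the saved state */
        inptr = tptr;
        bb = tbb;
        bk = tbk;
        wp = twp;
        b = bb;
        k = bk;
        /* ... advance one bit and try the next offset ... */
    }
}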
printf("Get by www.sjhf.net!");
At this point you can also print the values of tstart, bb and bk. Reprint notice: please retain the copyright information: www.sjhf.net Yu
With these four changes in place, run (or debug) the modified gzip on the damaged .gz file. You can also add a seek right after the header structure has been parsed, to jump straight to the damaged location.
Typically, once the line printf("Get by www.sjhf.net!") prints its message, the correct starting bit has been found.
Once the starting bit is found, you can construct or copy a normal gzip file header and append the recovered bit stream after it; the result can then be decompressed. (If the bit stream is not byte-aligned, shift all of the data by the bit offset.) After splicing, many compressed files can be opened or even fully decompressed. Errors may still be reported, mainly checksum and size errors at the tail, but in practice these can be ignored.
If you do the splicing on Linux, do not decompress directly with gzip -d: because of the CRC error, it fails at 99% and then deletes the output file. Use a pipe instead:
gunzip < damaged.gz > damaged
Methods for repairing corrupted GZ or tar.gz compressed files