PHP decompression can sometimes fail

Source: Internet
Author: User
Tags: crc32, checksum, rfc, unpack
I'm scraping a website whose responses are chunked-encoded and gzip-compressed; the server appears to be IIS.

Decoding the chunked encoding works fine, but decompressing the gzip data occasionally fails, which prevents me from extracting the links for the next round of requests.

After about 10 batches, a decompression failure shows up.
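A truncated response can pass through a chunked decoder that never checks for the terminating zero-size chunk, and the damage only surfaces later as a gzip failure. Below is a minimal sketch of a chunked decoder that reports truncation instead of silently returning partial data. decode_chunked() is a hypothetical name; chunk extensions are skipped and trailer headers are not parsed.

```php
<?php
/**
 * Sketch: decode an HTTP/1.1 chunked body. Returns the payload, or false
 * if the body is malformed or truncated (no terminating zero-size chunk).
 * Hypothetical helper; chunk extensions are skipped, trailers ignored.
 */
function decode_chunked($raw)
{
    $out = '';
    $pos = 0;
    $len = strlen($raw);
    while (true) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            return false; // size line missing: body was cut off
        }
        // Chunk size is hex, optionally followed by ";extension".
        $sizeLine = substr($raw, $pos, $eol - $pos);
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);
        }
        $sizeLine = trim($sizeLine);
        if (!preg_match('/^[0-9A-Fa-f]+$/', $sizeLine)) {
            return false; // not a valid chunk-size line
        }
        $size = hexdec($sizeLine);
        $pos = $eol + 2;
        if ($size == 0) {
            return $out; // last chunk seen: body is complete
        }
        if ($pos + $size + 2 > $len
            || substr($raw, $pos + $size, 2) !== "\r\n") {
            return false; // chunk data or its trailing CRLF is missing
        }
        $out .= substr($raw, $pos, $size);
        $pos += $size + 2;
    }
}
```

Passing only bodies for which this returns a string on to the gzip decoder keeps truncated downloads from ever reaching it.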

This is the data before decompression:



The decompressed data:


In the last batch, decompression clearly failed.

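Here are the three methods I have tried:

private function _decompressdata()
{
    if ($this->is_gzip) {
        // Only one of these alternatives should be active at a time.

        // Skip the fixed 10-byte gzip header and inflate the raw deflate data
        $this->response_body = gzinflate(substr($this->response_body, 10));

        // Alternative: built-in gzdecode(), falling back to the custom function
        // if ($temp = gzdecode($this->response_body)) {
        //     $this->response_body = $temp;
        // } else {
        //     $this->response_body = $this->mygzdecode($this->response_body);
        // }

        // Alternative: custom function only
        // $this->response_body = $this->mygzdecode($this->response_body);

        // Alternative: built-in gzdecode() only
        // $this->response_body = gzdecode($this->response_body);
    }
}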
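In _decompressdata(), each attempt assigns its result straight back to $this->response_body, so a failed call replaces the body with false and the original bytes are lost. A minimal sketch of a helper that leaves its input untouched and reports failure explicitly; try_gunzip() is a hypothetical name, and it relies on PHP's built-in gzdecode() (available since PHP 5.4).

```php
<?php
/**
 * Sketch: gunzip a response body without overwriting it on failure.
 * Returns true and fills $plain on success; returns false otherwise.
 */
function try_gunzip($body, &$plain)
{
    // Anything shorter than a 10-byte header plus 8-byte trailer, or
    // without the 1f 8b magic bytes, cannot be a gzip member.
    if (strlen($body) < 18 || substr($body, 0, 2) !== "\x1f\x8b") {
        return false;
    }
    $result = @gzdecode($body); // @ hides the "data error" warning
    if ($result === false) {
        return false; // corrupt or truncated stream
    }
    $plain = $result;
    return true;
}
```

A caller would then assign $plain to the body only when try_gunzip() returns true, and re-request the page otherwise.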


The mygzdecode function is as follows:

/**
 * @desc Custom decompression function
 */
function mygzdecode($data, &$filename = '', &$error = '', $maxlength = 0)
{
    $len = strlen($data);
    if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
        $error = "Not in GZIP format.";
        return null; // not GZIP format (see RFC 1952)
    }
    $method = ord(substr($data, 2, 1)); // compression method
    $flags  = ord(substr($data, 3, 1)); // flags
    if (($flags & 31) != $flags) {
        $error = "Reserved bits not allowed.";
        return null;
    }
    // Note: $mtime may be negative (PHP integer limitations)
    $mtime = unpack("V", substr($data, 4, 4));
    $mtime = $mtime[1];
    $xfl = substr($data, 8, 1);
    $os  = substr($data, 9, 1);
    $headerlen = 10;
    $extralen  = 0;
    $extra     = "";
    if ($flags & 4) {
        // 2-byte length-prefixed EXTRA data in header
        if ($len - $headerlen - 2 < 8) {
            return false; // invalid
        }
        $extralen = unpack("v", substr($data, 10, 2));
        $extralen = $extralen[1];
        if ($len - $headerlen - 2 - $extralen < 8) {
            return false; // invalid
        }
        $extra = substr($data, 12, $extralen);
        $headerlen += 2 + $extralen;
    }
    $filenamelen = 0;
    $filename    = "";
    if ($flags & 8) {
        // C-style (NUL-terminated) file NAME in header
        if ($len - $headerlen - 1 < 8) {
            return false; // invalid
        }
        $filenamelen = strpos(substr($data, $headerlen), chr(0));
        if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
            return false; // invalid
        }
        $filename = substr($data, $headerlen, $filenamelen);
        $headerlen += $filenamelen + 1;
    }
    $commentlen = 0;
    $comment    = "";
    if ($flags & 16) {
        // C-style COMMENT data in header
        if ($len - $headerlen - 1 < 8) {
            return false; // invalid
        }
        $commentlen = strpos(substr($data, $headerlen), chr(0));
        if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
            return false; // invalid header format
        }
        $comment = substr($data, $headerlen, $commentlen);
        $headerlen += $commentlen + 1;
    }
    $headercrc = "";
    if ($flags & 2) {
        // 2 bytes (lowest order) of CRC32 on header present
        if ($len - $headerlen - 2 < 8) {
            return false; // invalid
        }
        $calccrc   = crc32(substr($data, 0, $headerlen)) & 0xFFFF;
        $headercrc = unpack("v", substr($data, $headerlen, 2));
        $headercrc = $headercrc[1];
        if ($headercrc != $calccrc) {
            $error = "Header checksum failed.";
            return false; // bad header CRC
        }
        $headerlen += 2;
    }
    // GZIP footer
    $datacrc = unpack("V", substr($data, -8, 4));
    $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF);
    $isize   = unpack("V", substr($data, -4));
    $isize   = $isize[1];
    // Decompression
    $bodylen = $len - $headerlen - 8;
    if ($bodylen < 1) {
        return null; // implementation bug!
    }
    $body = substr($data, $headerlen, $bodylen);
    $data = "";
    if ($bodylen > 0) {
        switch ($method) {
            case 8: // currently the only supported compression method
                $data = gzinflate($body, $maxlength);
                if ($data === false) {
                    $error = "Inflate failed (corrupt or truncated data).";
                    return false;
                }
                break;
            default:
                $error = "Unknown compression method.";
                return false;
        }
    } // zero-byte body content is allowed
    // Verify CRC32 and length
    $crc   = sprintf("%u", crc32($data));
    $crcOK = $crc == $datacrc;
    $lenOK = $isize == strlen($data);
    if (!$lenOK || !$crcOK) {
        $error = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
        return false;
    }
    return $data;
}
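The header-parsing half of mygzdecode() follows RFC 1952: a fixed 10-byte header, then optional fields selected by the FLG byte (FEXTRA = 4, FNAME = 8, FCOMMENT = 16, FHCRC = 2). This is also why gzinflate(substr($this->response_body, 10)) only works when none of those optional fields are present. A condensed sketch of just the header walk; gzip_header_length() is a hypothetical name, and it returns the offset where the raw deflate data starts.

```php
<?php
/**
 * Sketch: length of a gzip member header per RFC 1952, i.e. the offset at
 * which the raw deflate data starts. Returns false if malformed/truncated.
 */
function gzip_header_length($data)
{
    $len = strlen($data);
    if ($len < 10 || substr($data, 0, 2) !== "\x1f\x8b") {
        return false;
    }
    $flags = ord($data[3]);
    if ($flags & 0xE0) {
        return false; // bits 5-7 are reserved and must be zero
    }
    $pos = 10; // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if ($flags & 4) { // FEXTRA: 2-byte little-endian XLEN, then XLEN bytes
        if ($pos + 2 > $len) {
            return false;
        }
        $xlen = unpack('v', substr($data, $pos, 2));
        $pos += 2 + $xlen[1];
        if ($pos > $len) {
            return false;
        }
    }
    foreach (array(8, 16) as $bit) { // FNAME, then FCOMMENT: NUL-terminated
        if ($flags & $bit) {
            $nul = strpos($data, "\0", $pos);
            if ($nul === false) {
                return false;
            }
            $pos = $nul + 1;
        }
    }
    if ($flags & 2) { // FHCRC: low 16 bits of the header's CRC32
        $pos += 2;
    }
    return $pos <= $len ? $pos : false;
}
```

For a stream produced by gzencode() this yields 10; for a member carrying a file name it is larger, and skipping only 10 bytes would feed the name bytes to the inflater.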



In other words: when decompressing continuously, some of the decompressions fail.


Reply to discussion (solution)

PHP already provides a gzdecode() function (since PHP 5.4).
If your PHP version is so old that gzdecode() is missing,
here is a userland gzdecode() written in PHP:

if (!function_exists('gzdecode')) {
    function gzdecode($data)
    {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return $data; // not GZIP format (see RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // compression method
        $flags  = ord(substr($data, 3, 1)); // flags
        if (($flags & 31) != $flags) {
            return $data; // reserved bits are set: not allowed by RFC 1952
        }
        // Note: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen  = 0;
        $extra     = "";
        if ($flags & 4) {
            // 2-byte length-prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false; // invalid format
            }
            $extralen = unpack("v", substr($data, 10, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // invalid format
            }
            $extra = substr($data, 12, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename    = "";
        if ($flags & 8) {
            // C-style (NUL-terminated) file NAME in header
            if ($len - $headerlen - 1 < 8) {
                return false; // invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment    = "";
        if ($flags & 16) {
            // C-style COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // FHCRC (value 2; value 1 is FTEXT): low 2 bytes of the header's CRC32
            if ($len - $headerlen - 2 < 8) {
                return false; // invalid format
            }
            $calccrc   = crc32(substr($data, 0, $headerlen)) & 0xFFFF;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP footer: these may be negative due to PHP's integer limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize   = unpack("V", substr($data, -4));
        $isize   = $isize[1];
        // Perform the decompression
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            return null; // this should never happen: implementation bug!
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8: // currently the only supported compression method
                    $data = gzinflate($body);
                    break;
                default: // unknown compression method
                    return false;
            }
        } else {
            // Not sure whether a zero-byte body is allowed; accept it for now.
        }
        // Verify decompressed size and CRC32.
        // Note: this may fail for very large data, because $isize may be
        // negative for large sizes due to PHP's integer limitations.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            return false; // bad format: length or CRC doesn't match
        }
        return $data;
    }
}
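Whatever mask is used, a reserved-bit test written without parentheses, such as $flags & 31 != $flags, does not do what it appears to in PHP: != binds more tightly than &, so it evaluates as $flags & (31 != $flags), which is $flags & 1. It therefore misses real reserved bits and rejects valid headers whose FTEXT bit (value 1) is set. A small demonstration; both helper names are hypothetical.

```php
<?php
// Unparenthesised form: PHP parses this as $flags & (31 != $flags).
function reserved_bits_set_unparenthesised($flags)
{
    return (bool) ($flags & 31 != $flags);
}

// Intended form: any bit outside the defined FLG bits 0-4 (mask 31) is reserved.
function reserved_bits_set($flags)
{
    return ($flags & 31) != $flags;
}

foreach (array(0x20, 0x09) as $flags) {
    printf("FLG=0x%02x  unparenthesised=%d  parenthesised=%d\n",
        $flags,
        reserved_bits_set_unparenthesised($flags),
        reserved_bits_set($flags));
}
```

For 0x20 (reserved bit 5 set) the unparenthesised form reports no problem, and for 0x09 (FTEXT | FNAME, a valid header) it reports one; the parenthesised form gets both right.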

Compare it with your version to see whether you copied something incorrectly.

Since the function returns false when the length or CRC32 check fails, you should check the return value before deciding what to do next.
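When acting on the return value, test its type rather than its truthiness: a valid gzip stream with an empty payload decodes to "", which is falsy, so a check like if (!$plain) would misreport it as a failure. mygzdecode() above also returns null for some errors and false for others, and is_string() covers both. A minimal illustration using the built-in gzdecode():

```php
<?php
$emptyOk = gzdecode(gzencode(''));       // valid stream, empty payload
$broken  = @gzdecode('not a gzip body'); // invalid input: returns false

foreach (array($emptyOk, $broken) as $plain) {
    if (!is_string($plain)) {
        // false (or null from mygzdecode()): discard and re-request
        echo "decode failed\n";
    } else {
        echo 'decoded ', strlen($plain), " bytes\n";
    }
}
```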

I'm on PHP 5.6. I have tried all three of these:
gzinflate(substr($this->response_body, 10));

gzdecode($this->response_body);

$this->mygzdecode($this->response_body);

All three work, but they all hit the same problem: when decompressing continuously, decompression occasionally fails.


Happy New Year, by the way!

Good.

Data transmitted over a network will occasionally be corrupted; it's unavoidable, though it doesn't happen often.
Usually you just fetch it again.

In short, you need a fault-tolerance strategy.
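One possible fault-tolerance strategy is to re-request the page whenever decompression fails, pausing briefly between attempts and giving up after a few tries. A sketch under stated assumptions: $fetch is a hypothetical callable standing in for whatever HTTP client the crawler uses (URL in, raw gzip body or false out), and the attempt count and delay are arbitrary defaults.

```php
<?php
/**
 * Sketch of a retry loop: re-request $url until its gzip body decodes,
 * at most $maxAttempts times. $fetch is a hypothetical callable
 * (url => raw gzip body string, or false on a network error).
 */
function fetch_and_decode($url, $fetch, $maxAttempts = 3, $delayMs = 500)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = call_user_func($fetch, $url);
        if (is_string($body)) {
            $plain = @gzdecode($body); // false on truncated/corrupt data
            if ($plain !== false) {
                return $plain;
            }
        }
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000); // brief back-off before retrying
        }
    }
    return false; // give up; the caller can log the URL and move on
}
```

Returning false after the last attempt, rather than throwing, lets a crawler record the failed URL and continue with the rest of its queue.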

// Verify CRC32
$crc   = sprintf("%u", crc32($data));
$crcOK = $crc == $datacrc;
$lenOK = $isize == strlen($data);
if (!$lenOK || !$crcOK) {
    $this->status = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
    return false;
}
return $data;

I traced it, and this is the check that fails:


Initiating a request for link http://www.cnu.cc/works/111706
Length Check FAILED. Checksum FAILED.
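When a body has been truncated, the 8 bytes read as the trailer are typically not the real trailer, so the CRC32 and length checks tend to fail together, as in the log above. A standalone version of the same trailer check, as a sketch; gzip_trailer_ok() is a hypothetical name, and it assumes payloads under 4 GiB (ISIZE stores the length modulo 2^32).

```php
<?php
/**
 * Sketch: verify decompressed data against a gzip trailer (RFC 1952).
 * The last 8 bytes hold CRC32 then ISIZE, both little-endian uint32.
 * Values are compared as unsigned decimal strings so the check also
 * works on 32-bit PHP, where they may come back negative.
 */
function gzip_trailer_ok($gz, $plain)
{
    if (strlen($gz) < 18) {
        return false; // too short to contain header + trailer
    }
    $t = unpack('Vcrc/Visize', substr($gz, -8));
    $crcOk = sprintf('%u', crc32($plain)) === sprintf('%u', $t['crc']);
    $lenOk = sprintf('%u', strlen($plain)) === sprintf('%u', $t['isize']);
    return $crcOk && $lenOk;
}
```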

Right... this part really needs hardening. I only reset the connection and never verified that the received data was complete.

OK, I've been collecting continuously for 10 minutes with no problems. Thanks, guru!

Something went wrong during transmission, so part of the data was missing and decompression failed.

Put the files that need decompressing into a queue. Every 5 to 10 seconds, check whether each file has changed; if it hasn't, decompress it. If decompression fails, mark the file as failed and move on to the next one.
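The queue approach described above could look roughly like the following sketch. Assumptions: queued items are paths to gzip files on disk, "unchanged" means the same size and mtime across the wait, and all function and key names are hypothetical.

```php
<?php
/**
 * Sketch of the queue idea: a file is decompressed only if its size and
 * mtime are unchanged across $waitSeconds; failures are recorded and the
 * loop moves on. Names and the result layout are illustrative.
 */
function file_is_stable($path, $waitSeconds)
{
    clearstatcache(true, $path);
    $before = array(filesize($path), filemtime($path));
    sleep($waitSeconds);
    clearstatcache(true, $path);
    return $before === array(filesize($path), filemtime($path));
}

function drain_queue(array $queue, $waitSeconds = 5)
{
    $results = array('done' => array(), 'failed' => array(), 'pending' => array());
    foreach ($queue as $path) {
        if (!file_is_stable($path, $waitSeconds)) {
            $results['pending'][] = $path; // still changing: try next round
            continue;
        }
        $plain = @gzdecode((string) file_get_contents($path));
        if ($plain === false) {
            $results['failed'][] = $path; // mark it and continue
        } else {
            $results['done'][$path] = $plain;
        }
    }
    return $results;
}
```

Sleeping once per file keeps the sketch short; a real queue would more likely record sizes for all files in one pass and compare them on the next pass.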
