PHP decompression sometimes fails when collecting data from a website. The returned data is chunked-encoded and gzip-compressed, and the server identifies itself as IIS...
Decoding the chunked encoding works, but gunzipping occasionally fails, which breaks extracting the links for the next batch of requests...
The package decompresses into about 10 groups.
Here is the data before decompression:
Decompressed data:
Clearly, decompression failed on the last group.
Here are the three methods I have tried:
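For reference, de-chunking that refuses truncated input can be sketched as below. This is an illustrative helper (`dechunk_strict` is a hypothetical name, not a PHP function); it follows the chunked transfer coding of RFC 7230 section 4.1 and returns false when a chunk is cut short or the terminating zero-length chunk never arrives, which is one way to catch an incomplete response before handing it to gzdecode().

```php
<?php
// Hypothetical helper: decode an HTTP/1.1 chunked body strictly.
// Returns the joined chunk data, or false if the body is malformed or
// truncated (e.g. the final zero-length chunk is missing).
function dechunk_strict($raw)
{
    $out = '';
    $pos = 0;
    $len = strlen($raw);
    while (true) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            return false; // chunk-size line missing: response was cut off
        }
        // Chunk size is hexadecimal; drop any ";name=value" extensions.
        $sizeLine = substr($raw, $pos, $eol - $pos);
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);
        }
        $sizeLine = trim($sizeLine);
        if (!preg_match('/^[0-9a-fA-F]+$/', $sizeLine)) {
            return false; // not a valid chunk size
        }
        $size = hexdec($sizeLine);
        $pos = $eol + 2;
        if ($size == 0) {
            return $out; // last-chunk reached; trailer fields are ignored
        }
        // The chunk data must be complete and followed by CRLF.
        if ($pos + $size + 2 > $len || substr($raw, $pos + $size, 2) !== "\r\n") {
            return false;
        }
        $out .= substr($raw, $pos, $size);
        $pos += $size + 2;
    }
}
```

If `dechunk_strict($raw)` returns false, the response was incomplete and is better re-requested than gunzipped.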
private function _deCompressData()
{
    if ($this->is_gzip) {
        $this->response_body = gzinflate(substr($this->response_body, 10));

        // if ($temp = gzdecode($this->response_body)) {
        //     $this->response_body = $temp;
        // } else {
        //     $this->response_body = $this->mygzdecode($this->response_body);
        // }
        // $this->response_body = $this->mygzdecode($this->response_body);
        // $this->response_body = gzdecode($this->response_body);
    }
}
The mygzdecode function is:
/**
 * @desc Custom decompression function
 */
function mygzdecode($data, &$filename = '', &$error = '', $maxlength = null)
{
    $len = strlen($data);
    if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
        $error = "Not in GZIP format.";
        return null; // Not GZIP format (see RFC 1952)
    }
    $method = ord(substr($data, 2, 1)); // Compression method
    $flags  = ord(substr($data, 3, 1)); // Flags
    if (($flags & 31) != $flags) {
        $error = "Reserved bits not allowed.";
        return null;
    }
    // NOTE: $mtime may be negative (PHP integer limitations)
    $mtime = unpack("V", substr($data, 4, 4));
    $mtime = $mtime[1];
    $xfl = substr($data, 8, 1);
    $os  = substr($data, 9, 1);
    $headerlen = 10;
    $extralen = 0;
    $extra = "";
    if ($flags & 4) {
        // 2-byte length-prefixed EXTRA data in header
        if ($len - $headerlen - 2 < 8) {
            return false; // Invalid
        }
        $extralen = unpack("v", substr($data, $headerlen, 2));
        $extralen = $extralen[1];
        if ($len - $headerlen - 2 - $extralen < 8) {
            return false; // Invalid
        }
        $extra = substr($data, $headerlen + 2, $extralen);
        $headerlen += 2 + $extralen;
    }
    $filenamelen = 0;
    $filename = "";
    if ($flags & 8) {
        // C-style string file NAME data in header
        if ($len - $headerlen - 1 < 8) {
            return false; // Invalid
        }
        $filenamelen = strpos(substr($data, $headerlen), chr(0));
        if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
            return false; // Invalid
        }
        $filename = substr($data, $headerlen, $filenamelen);
        $headerlen += $filenamelen + 1;
    }
    $commentlen = 0;
    $comment = "";
    if ($flags & 16) {
        // C-style string COMMENT data in header
        if ($len - $headerlen - 1 < 8) {
            return false; // Invalid
        }
        $commentlen = strpos(substr($data, $headerlen), chr(0));
        if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
            return false; // Invalid header format
        }
        $comment = substr($data, $headerlen, $commentlen);
        $headerlen += $commentlen + 1;
    }
    $headercrc = "";
    if ($flags & 2) {
        // 2 bytes (lowest order) of CRC32 on header present
        if ($len - $headerlen - 2 < 8) {
            return false; // Invalid
        }
        $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
        $headercrc = unpack("v", substr($data, $headerlen, 2));
        $headercrc = $headercrc[1];
        if ($headercrc != $calccrc) {
            $error = "Header checksum failed.";
            return false; // Bad header CRC
        }
        $headerlen += 2;
    }
    // gzip footer
    $datacrc = unpack("V", substr($data, -8, 4));
    $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF);
    $isize = unpack("V", substr($data, -4));
    $isize = $isize[1];
    // Decompression:
    $bodylen = $len - $headerlen - 8;
    if ($bodylen < 1) {
        // Implementation bug!
        return null;
    }
    $body = substr($data, $headerlen, $bodylen);
    $data = "";
    if ($bodylen > 0) {
        switch ($method) {
            case 8:
                // Currently the only supported compression method
                $data = gzinflate($body, (int) $maxlength); // 0 = no limit
                break;
            default:
                $error = "Unknown compression method.";
                return false;
        }
    } // zero-byte body content is allowed
    if ($data === false) {
        $error = "Inflate failed.";
        return false;
    }
    // Verify CRC32 and length
    $crc = sprintf("%u", crc32($data));
    $crcOK = $crc == $datacrc;
    $lenOK = $isize == strlen($data);
    if (!$lenOK || !$crcOK) {
        $error = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
        return false;
    }
    return $data;
}
In other words, when decompressing one response after another, a decompression occasionally fails.
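A note on the first method: `gzinflate(substr($this->response_body, 10))` assumes the gzip header is exactly 10 bytes, which only holds when none of the optional FEXTRA, FNAME, FCOMMENT or FHCRC fields from RFC 1952 are present. The sketch below (hypothetical helper `gzip_header_length`, not part of PHP) computes the real header length instead. PHP's own gzdecode() already handles these fields, so this mostly matters if you keep the gzinflate() approach.

```php
<?php
// Hypothetical helper: find where the raw DEFLATE data starts in a gzip
// member (RFC 1952) instead of assuming a fixed 10-byte header.
// Returns the header length, or false if the header is malformed/truncated.
function gzip_header_length($data)
{
    $len = strlen($data);
    // ID1 ID2 must be 1f 8b and CM must be 8 (deflate); 18 = header + footer
    if ($len < 18 || substr($data, 0, 2) !== "\x1f\x8b" || ord($data[2]) !== 8) {
        return false;
    }
    $flags = ord($data[3]);
    $pos = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS
    if ($flags & 4) { // FEXTRA: 2-byte little-endian XLEN, then XLEN bytes
        if ($pos + 2 > $len) {
            return false;
        }
        $xlen = unpack('v', substr($data, $pos, 2));
        $pos += 2 + $xlen[1];
    }
    foreach (array(8, 16) as $bit) { // FNAME, then FCOMMENT: NUL-terminated
        if ($flags & $bit) {
            if ($pos >= $len) {
                return false;
            }
            $nul = strpos($data, "\0", $pos);
            if ($nul === false) {
                return false;
            }
            $pos = $nul + 1;
        }
    }
    if ($flags & 2) { // FHCRC: 2-byte CRC16 of the header
        $pos += 2;
    }
    // The 8-byte footer (CRC32 + ISIZE) must still follow the header.
    return ($pos + 8 <= $len) ? $pos : false;
}
```

With it, the first method becomes `gzinflate(substr($body, $h, -8))` where `$h = gzip_header_length($body)`, after checking `$h !== false`.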
Reply to discussion (solution)
PHP already provides a gzdecode() function (it was added in PHP 5.4).
If your PHP version is so old that it has no gzdecode() function,
here is a pure-PHP implementation:
if (!function_exists('gzdecode')) {
    function gzdecode($data)
    {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return $data; // Not GZIP format (see RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags  = ord(substr($data, 3, 1)); // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            return $data;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // 2-byte length-prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, $headerlen, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // Invalid format
            }
            $extra = substr($data, $headerlen + 2, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // 2 bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - these may be negative due to PHP's limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            return null;
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now... Do nothing...
        }
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        // PHP's integer limitations affect strlen() since $isize
        // may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            return false;
        }
        return $data;
    }
}
Compare it with yours to check whether you copied anything wrong.
Since the function returns false when the length or CRC32 check fails, check its return value before moving on to the next step.
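As a minimal sketch of that advice (assuming PHP 5.4 or later, so the built-in gzdecode() exists; `try_gzdecode` is a hypothetical name), the decode step can report failure explicitly so the crawler never parses garbage:

```php
<?php
// Hypothetical wrapper: decode a gzip body, returning null (and setting
// $error) instead of letting false flow into the link-extraction step.
function try_gzdecode($body, &$error = null)
{
    $error = null;
    $plain = @gzdecode($body); // @: corrupt input raises a warning
    if ($plain === false) {
        $error = 'gzdecode failed (corrupt or truncated data)';
        return null;
    }
    return $plain;
}
```

A truncated body, for example only the first half of a valid gzip stream, makes gzdecode() return false, so the caller can skip that URL or fetch it again.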
My PHP version is 5.6. I tried all three:
gzinflate(substr($this->response_body, 10));
gzdecode($this->response_body);
mygzdecode($this->response_body);
All three work, but they all run into the same problem: during continuous decompression, a decompression occasionally fails.
Happy new year
Okay.
Errors in data transmitted over the network are unavoidable, although they are rare.
Just retry the request; that usually works.
You need a fault-tolerance strategy.
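One way to express that fault-tolerance policy is a bounded retry around fetch-and-decode. This is a sketch under assumptions: `$fetch` stands for whatever request code you already have (hypothetical here) and returns the raw gzip body or false, and the attempt count and delay are arbitrary defaults.

```php
<?php
// Hypothetical retry policy: re-fetch when the body fails to decode.
// $fetch is your own request code; it returns the raw gzip body or false.
function fetch_and_decode(callable $fetch, $maxAttempts = 3, $delayMs = 500)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch();
        if ($body !== false) {
            $plain = @gzdecode($body); // false on corrupt/truncated data
            if ($plain !== false) {
                return $plain;
            }
        }
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000); // wait a little before retrying
        }
    }
    return false; // give up: log the URL and continue with the next one
}
```

Giving up after a fixed number of attempts keeps one bad URL from stalling the whole crawl.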
// Verify CRC32
$crc = sprintf("%u", crc32($data));
$crcOK = $crc == $datacrc;
$lenOK = $isize == strlen($data);
if (!$lenOK || !$crcOK) {
    $this->status = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
    return false;
}
return $data;
The verification failed for this request:
http://www.cnu.cc/works/111706
Length check FAILED. Checksum FAILED.
Yes... this part really needs strengthening. So far I only reset the connection; I never verify the integrity of the received data.
OK, I have been crawling continuously for 10 minutes with no problems. Thanks a lot!
If something goes wrong during transmission, part of the data is lost, and decompression then fails.
Put the files to be decompressed on a pending list and check every 5 to 10 seconds whether they have changed. If a file has not changed, decompress it, mark it as decompressed, and move on to the next one.
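The polling idea above could look roughly like this. It is a sketch, not a tested recipe: `file_is_stable` and `process_pending` are hypothetical names, "unchanged" is taken to mean the same size and mtime across two checks, and a file is assumed to hold complete gzip data once it stops changing.

```php
<?php
// Hypothetical check: treat a file as finished once its size and mtime
// are identical across two looks taken $intervalSec seconds apart.
function file_is_stable($path, $intervalSec = 5)
{
    clearstatcache(true, $path);
    if (!is_file($path)) {
        return false;
    }
    $before = array(filesize($path), filemtime($path));
    sleep($intervalSec);
    clearstatcache(true, $path); // otherwise PHP returns cached stat data
    if (!is_file($path)) {
        return false;
    }
    $after = array(filesize($path), filemtime($path));
    return $before === $after;
}

// Decompress each pending file that has stopped changing. Returns
// path => decoded data; the caller marks those paths as done.
function process_pending(array $pending, $intervalSec = 5)
{
    $decoded = array();
    foreach ($pending as $path) {
        if (!file_is_stable($path, $intervalSec)) {
            continue; // still changing (or gone): check again next round
        }
        $plain = @gzdecode(file_get_contents($path));
        if ($plain !== false) {
            $decoded[$path] = $plain;
        }
    }
    return $decoded;
}
```

This checks files one at a time and sleeps per file; with many pending files it would be cheaper to record sizes in one pass and compare them on the next.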