PHP gzip decompression sometimes fails

Source: Internet
Author: User
Tags: rfc
I use PHP to collect data from a website, and decompression sometimes fails. The returned data is chunked-encoded and gzip-compressed. The website's server identifies itself as IIS.

Decoding the chunked encoding works fine, but decompressing the gzip data occasionally fails, which affects the extraction of the next group of request links.

The failure shows up after roughly 10 groups of responses have been decompressed.

Here is the data before decompression:



Decompressed data:


Clearly, decompression failed for the last group.

Here are the three methods I have tried:
private function _deCompressData()
{
    if ($this->is_gzip) {
        // Method 1: skip the 10-byte fixed gzip header and inflate the rest
        $this->response_body = gzinflate(substr($this->response_body, 10));

        // Method 2: built-in gzdecode(), falling back to the custom decoder
        // if ($temp = gzdecode($this->response_body)) {
        //     $this->response_body = $temp;
        // } else {
        //     $this->response_body = $this->mygzdecode($this->response_body);
        // }

        // Method 3: custom decoder only, or built-in only
        // $this->response_body = $this->mygzdecode($this->response_body);
        // $this->response_body = gzdecode($this->response_body);
    }
}


The mygzdecode function is

/**
 * @desc Custom decompression function
 */
function mygzdecode($data, &$filename = '', &$error = '', $maxlength = null)
{
    $len = strlen($data);
    if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
        $error = "Not in GZIP format.";
        return null; // Not GZIP format (see RFC 1952)
    }
    $method = ord(substr($data, 2, 1)); // Compression method
    $flags  = ord(substr($data, 3, 1)); // Flags
    // Parentheses are required: != binds tighter than &
    if (($flags & 31) != $flags) {
        $error = "Reserved bits not allowed.";
        return null;
    }
    // NOTE: $mtime may be negative (PHP integer limitations)
    $mtime = unpack("V", substr($data, 4, 4));
    $mtime = $mtime[1];
    $xfl = substr($data, 8, 1);
    $os  = substr($data, 9, 1); // OS byte sits at offset 9
    $headerlen = 10;
    $extralen  = 0;
    $extra     = "";
    if ($flags & 4) {
        // 2-byte length-prefixed EXTRA data in header
        if ($len - $headerlen - 2 < 8) {
            return false; // Invalid
        }
        // XLEN follows the 10-byte fixed header
        $extralen = unpack("v", substr($data, $headerlen, 2));
        $extralen = $extralen[1];
        if ($len - $headerlen - 2 - $extralen < 8) {
            return false; // Invalid
        }
        $extra = substr($data, $headerlen + 2, $extralen);
        $headerlen += 2 + $extralen;
    }
    $filenamelen = 0;
    $filename = "";
    if ($flags & 8) {
        // C-style string file NAME data in header
        if ($len - $headerlen - 1 < 8) {
            return false; // Invalid
        }
        $filenamelen = strpos(substr($data, $headerlen), chr(0));
        if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
            return false; // Invalid
        }
        $filename = substr($data, $headerlen, $filenamelen);
        $headerlen += $filenamelen + 1;
    }
    $commentlen = 0;
    $comment = "";
    if ($flags & 16) {
        // C-style string COMMENT data in header
        if ($len - $headerlen - 1 < 8) {
            return false; // Invalid
        }
        $commentlen = strpos(substr($data, $headerlen), chr(0));
        if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
            return false; // Invalid header format
        }
        $comment = substr($data, $headerlen, $commentlen);
        $headerlen += $commentlen + 1;
    }
    $headercrc = "";
    if ($flags & 2) {
        // 2 bytes (lowest order) of CRC32 on header present
        if ($len - $headerlen - 2 < 8) {
            return false; // Invalid
        }
        $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
        $headercrc = unpack("v", substr($data, $headerlen, 2));
        $headercrc = $headercrc[1];
        if ($headercrc != $calccrc) {
            $error = "Header checksum failed.";
            return false; // Bad header CRC
        }
        $headerlen += 2;
    }
    // GZIP footer: CRC32 and ISIZE of the uncompressed data
    $datacrc = unpack("V", substr($data, -8, 4));
    $datacrc = sprintf('%u', $datacrc[1] & 0xFFFFFFFF);
    $isize = unpack("V", substr($data, -4));
    $isize = $isize[1];
    // Decompression:
    $bodylen = $len - $headerlen - 8;
    if ($bodylen < 1) {
        // Implementation bug!
        return null;
    }
    $body = substr($data, $headerlen, $bodylen);
    $data = "";
    if ($bodylen > 0) {
        switch ($method) {
            case 8:
                // Currently the only supported compression method:
                $data = gzinflate($body, (int) $maxlength);
                break;
            default:
                $error = "Unknown compression method.";
                return false;
        }
    } // zero-byte body content is allowed
    // Verify CRC32 and length
    $crc   = sprintf("%u", crc32($data));
    $crcOK = $crc == $datacrc;
    $lenOK = $isize == strlen($data);
    if (!$lenOK || !$crcOK) {
        $error = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
        return false;
    }
    return $data;
}



In other words, when many responses are decompressed one after another, some of them fail.


Reply to discussion (solution)

PHP already provides the gzdecode function (since PHP 5.4).
If your PHP version is too old, the function does not exist.
In that case, a PHP-level implementation of gzdecode looks like this:

if (!function_exists('gzdecode')) {
    function gzdecode($data)
    {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return $data; // Not GZIP format (see RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags  = ord(substr($data, 3, 1)); // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            return $data;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen  = 0;
        $extra     = "";
        if ($flags & 4) {
            // 2-byte length-prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, $headerlen, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // Invalid format
            }
            $extra = substr($data, $headerlen + 2, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // FHCRC (bit 1): 2 bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - these may be negative due to PHP's integer limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            return null;
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        switch ($method) {
            case 8:
                // Currently the only supported compression method:
                $data = gzinflate($body);
                break;
            default:
                // Unknown compression method
                return false;
        }
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        //       PHP's integer limitations affect strlen(), since $isize
        //       may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            return false;
        }
        return $data;
    }
}

Compare it with your own version to see whether you copied something wrong.

Since the function returns false when the length or CRC32 check on the input fails, you should test the return value before proceeding to the next step.
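A minimal sketch of that check, assuming the built-in gzdecode() is available (PHP 5.4 or later). The wrapper name safeGzdecode is hypothetical and not from the thread. The comparison uses === so that an empty but valid body is not mistaken for a failure.

```php
<?php
// Hypothetical wrapper (not from the thread): returns the decoded body,
// or null when the gzip stream is corrupt or truncated.
function safeGzdecode($raw)
{
    // gzdecode() raises a warning and returns false on bad input;
    // @ silences the warning so the caller decides what to do.
    $decoded = @gzdecode($raw);
    return $decoded === false ? null : $decoded;
}

$good = gzencode('hello');
$bad  = substr($good, 0, -6); // simulate a response that was cut off

foreach (array($good, $bad) as $raw) {
    $body = safeGzdecode($raw);
    if ($body === null) {
        echo "decompression failed, skip or re-request\n";
        continue; // do not extract links from a broken body
    }
    echo "decoded ", strlen($body), " bytes\n";
}
```

With a check like this, a failed response is skipped or re-requested instead of being passed on to link extraction.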

I'm on PHP 5.6. I have tried all three:

gzinflate(substr($this->response_body, 10));
gzdecode($this->response_body);
mygzdecode($this->response_body);

All three methods work, but all of them run into the same problem: when many responses are decompressed in a row, a decompression failure occasionally occurs.


Happy new year

Okay.

Errors in data transmitted over the network are inevitable, although the probability is low.
Just retry the request. That usually works.

You must have a fault-tolerance policy.
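One way such a policy could look is a bounded retry loop. This is only a sketch: fetchWithRetry and the $fetch callback are hypothetical names, and the callback stands in for whatever code actually performs the HTTP request and dechunking.

```php
<?php
// Hypothetical helper: call $fetch until its result decodes as gzip,
// giving up after $maxAttempts tries. Returns the decoded body or false.
function fetchWithRetry($fetch, $maxAttempts = 3)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $raw = call_user_func($fetch);   // raw gzip bytes from the transport
        $decoded = @gzdecode($raw);      // false on corrupt or truncated data
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return false; // every attempt failed; let the caller log and move on
}

// Simulated transport: the first response is truncated, the second is intact.
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    $full = gzencode('payload');
    return $calls === 1 ? substr($full, 0, -6) : $full;
};

$result = fetchWithRetry($fetch);
```

Capping the number of attempts keeps a persistently broken URL from stalling the whole collection run.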

// Verify CRC32
$crc   = sprintf("%u", crc32($data));
$crcOK = $crc == $datacrc;
$lenOK = $isize == strlen($data);
if (!$lenOK || !$crcOK) {
    $this->status = ($lenOK ? '' : 'Length check FAILED. ') . ($crcOK ? '' : 'Checksum FAILED.');
    return false;
}
return $data;

This is where the verification fails...


Request: http://www.cnu.cc/works/111706
Length check FAILED. Checksum FAILED.
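Both messages come from comparing the decoded data with the 8-byte gzip trailer, which per RFC 1952 holds the CRC32 and the uncompressed size (ISIZE, modulo 2^32) as little-endian 32-bit values. The sketch below isolates that comparison. The name gzipTrailerReport is hypothetical, and it assumes a 64-bit PHP build (so unpack("V") and crc32() both return non-negative integers) and input of at least 8 bytes.

```php
<?php
// Hypothetical diagnostic helper: compares the gzip trailer with decoded data.
// Assumes 64-bit PHP and strlen($gz) >= 8.
function gzipTrailerReport($gz, $decoded)
{
    // Last 8 bytes: CRC32, then ISIZE, both 32-bit little-endian.
    $t = unpack('Vcrc/Visize', substr($gz, -8));
    return array(
        'crc_ok' => $t['crc'] === crc32($decoded),
        'len_ok' => $t['isize'] === (strlen($decoded) & 0xFFFFFFFF),
    );
}

$gz = gzencode('abc');
$matching   = gzipTrailerReport($gz, 'abc');   // trailer describes this data
$mismatched = gzipTrailerReport($gz, 'abcd');  // different length and content
```

When both checks fail at once, as in the message above, the last 8 bytes received are probably not the real trailer, which would be consistent with a body that was cut short.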

Yes... This part really needs to be strengthened. Right now I only handle connection resets, and the integrity of the received data is not verified.
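Because the response is chunked, one integrity check is available before gzip is even involved: a complete chunked body ends with a zero-size chunk. A minimal sketch, assuming the raw body (headers already stripped) is in a string. The function name decodeChunked is hypothetical, and trailer headers after the last chunk are ignored.

```php
<?php
// Hypothetical helper: decode a chunked body, or return false if it is incomplete.
function decodeChunked($body)
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while (true) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            return false; // size line missing: the body was cut off
        }
        // The size line may carry ";extension" parameters after the hex size.
        $parts = explode(';', substr($body, $pos, $eol - $pos), 2);
        $hex = trim($parts[0]);
        if ($hex === '' || !ctype_xdigit($hex)) {
            return false; // not a valid chunk-size line
        }
        $size = hexdec($hex);
        $pos = $eol + 2;
        if ($size === 0) {
            return $out; // terminating zero-size chunk seen: body is complete
        }
        if ($pos + $size + 2 > $len) {
            return false; // fewer bytes received than the chunk announced
        }
        if (substr($body, $pos + $size, 2) !== "\r\n") {
            return false; // chunk data must be followed by CRLF
        }
        $out .= substr($body, $pos, $size);
        $pos += $size + 2;
    }
}

$complete  = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
$truncated = "5\r\nhello\r\n6\r\n wor";
$decoded   = decodeChunked($complete);
$broken    = decodeChunked($truncated);
```

Only a body that passes this check is worth handing to gzdecode(). A false result means the read stopped early and the request should be repeated.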

OK, I ran continuous collection for 10 minutes with no problems. Thanks a lot!

If a problem occurs during transmission and some data is lost, decompression fails.

Add the files to be decompressed to a pending list and check every 5 to 10 seconds whether each file is still changing. Once a file has stopped changing, decompress it, mark it as decompressed, and continue with the next file.
