Today, while crawling a Sohu web page with curl_init, I found that the fetched content was garbled. After some analysis it turned out the server had gzip compression enabled. Adding the CURLOPT_ENCODING option with curl_setopt is enough to make curl decode the gzip response correctly.
Also, if the crawled page is GBK-encoded but the script works in UTF-8, the fetched content has to be converted once more with mb_convert_encoding.
<?php
$tmp = sys_get_temp_dir();
$cookieDump = tempnam($tmp, 'cookies');
$url = 'http://tv.sohu.com';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);                // include the returned HTTP headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);        // follow redirects automatically
curl_setopt($ch, CURLOPT_TIMEOUT, 10);              // overall timeout limit
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);       // connection timeout limit
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip, deflate')); // set HTTP request headers
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // let curl decode gzip; harmless even if the page is not gzipped
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieDump);   // file used to store cookie information
$content = curl_exec($ch);
curl_close($ch);

// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
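If the source encoding is not always GBK, one option is to detect it before converting, so that pages already in UTF-8 are not mangled. The snippet below is only a sketch; the candidate encoding list is an assumption you would adjust to the sites you crawl.

<?php
// Sketch: detect the source encoding instead of hard-coding GBK.
// The candidate list ('UTF-8', 'GBK', 'GB2312') is an assumption.
$charset = mb_detect_encoding($content, array('UTF-8', 'GBK', 'GB2312'), true);
if ($charset !== false && $charset !== 'UTF-8') {
    $content = mb_convert_encoding($content, 'UTF-8', $charset);
}
?>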
<?php
$url = 'http://tv.sohu.com';

// With the compress.zlib:// stream wrapper, the content is decoded correctly
// even if the server has gzip compression enabled.
$content = file_get_contents("compress.zlib://" . $url);

// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
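A related variation, in case you prefer not to rely on the compress.zlib:// wrapper: ask the server for gzip explicitly through a stream context and decode the body yourself with gzdecode(). This is only a sketch; the Accept-Encoding header and the gzip magic-byte check are assumptions added here, not part of the original article.

<?php
$url = 'http://tv.sohu.com';

// Sketch: request a gzip-compressed response explicitly (assumed header).
$context = stream_context_create(array(
    'http' => array('header' => "Accept-Encoding: gzip\r\n"),
));
$raw = file_get_contents($url, false, $context);

// gzdecode() handles gzip-encoded bodies; fall back to the raw body otherwise
// (0x1f 0x8b are the gzip magic bytes).
$content = (substr($raw, 0, 2) === "\x1f\x8b") ? gzdecode($raw) : $raw;

// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>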
Original: http://woqilin.blogspot.com/2014/05/curl-filegetcontents.html
The above describes how to fix garbled output when crawling web pages with curl and file_get_contents. I hope it is helpful to readers interested in PHP.