Today with the Curl_init function crawl Sohu Web page, found the collection of Web pages garbled, after analysis found that the original server opened gzip compression function. Just add multiple options to the function curl_setopt curlopt_encoding resolve gzip to decode correctly.
And if the page is crawled GBK encoded, but the script is Utf-8 encoding, but also to the crawl of the Web page again with function mb_convert_encoding conversion.
!--? php $tmp = Sys_get_temp_dir (); $cookieDump = Tempnam ($tmp, & #39; cookies& #39;); $url = & #39; http://tv.sohu.com& #39;; $ch = Curl_init (); curl_setopt ($ch, Curlopt_url, $url); curl_setopt ($ch, Curlopt_header, 1);//Displays the contents of the HEADER area returned curl_setopt ($ch, curlopt_followlocation, 1); Use automatic jump curl_setopt ($ch, Curlopt_timeout, 10);//Set timeout limit curl_setopt ($ch, Curlopt_returntransfer, 1); The information obtained is returned as a file stream curl_setopt ($ch, curlopt_connecttimeout,10);//link time-out limit curl_setopt ($ch, Curlopt_httpheader,array ( & #39; Accept-encoding:gzip, deflate& #39;));//Set HTTP header information curl_setopt ($ch, curlopt_encoding, & #39; gzip,deflate& #39;); /Add gzip decoding option, even if the webpage does not have gzip enabled curl_setopt ($ch, Curlopt_cookiejar, $cookieDump); The file name that holds the cookie information $content = curl_exec ($ch); The crawled pages are converted from GBK to UTF-8 $content = mb_convert_encoding ($content, "UTF-8", "GBK"); -->