Fixing garbled pages when crawling with curl and file_get_contents

Source: Internet
Author: User
Today I used curl_init to crawl a Sohu web page and found that the collected page was garbled. On closer inspection, the server had gzip compression enabled, so it was sending compressed bytes rather than plain HTML. Adding the CURLOPT_ENCODING option with curl_setopt makes curl decode the gzip response correctly.
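To see why the raw response looked garbled, here is a small offline sketch. It assumes PHP 7+ with the zlib extension. It compresses a sample HTML string with gzencode (standing in for what a gzip-enabled server sends) and restores it with gzdecode, which is roughly the decoding CURLOPT_ENCODING asks curl to do for you. The helper decode_if_gzip and the sample string are hypothetical, and unlike curl it only handles gzip, not deflate.

```php
<?php
// Hypothetical helper (illustration only): gunzip a body if it starts with the gzip magic bytes.
function decode_if_gzip(string $body): string
{
    // Every gzip stream begins with the two bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException('Body looks like gzip but could not be decoded');
        }
        return $decoded;
    }
    return $body; // Not compressed: return it unchanged.
}

$html = '<html><body>Hello, Sohu</body></html>';
$compressed = gzencode($html); // What a gzip-enabled server puts on the wire.

var_dump($compressed === $html);           // bool(false): the raw bytes are not the HTML
echo decode_if_gzip($compressed), PHP_EOL; // Prints the original HTML again.
echo decode_if_gzip($html), PHP_EOL;       // Uncompressed input passes through as-is.
```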


If the crawled page is GBK-encoded but the script is UTF-8, you also need to convert the crawled page with mb_convert_encoding.
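A minimal sketch of the conversion step, assuming the mbstring extension is installed and the script file itself is saved as UTF-8. Since it runs offline, it first manufactures GBK bytes from a UTF-8 literal; in a real crawl the bytes would come from curl_exec. The helper to_utf8 is hypothetical, and its "already valid UTF-8" check is only a heuristic guard against converting twice, not reliable charset detection. Note the argument order of mb_convert_encoding: target charset first, source charset second.

```php
<?php
// Hypothetical helper (illustration only): convert crawled bytes to UTF-8.
function to_utf8(string $bytes, string $fromCharset = 'GBK'): string
{
    // Heuristic: if the bytes already form valid UTF-8, leave them alone
    // so a page that is really UTF-8 is not mangled by a second conversion.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // mb_convert_encoding(string, to, from)
    return mb_convert_encoding($bytes, 'UTF-8', $fromCharset);
}

// Simulate a GBK-encoded page body.
$gbk = mb_convert_encoding('中文', 'GBK', 'UTF-8');

echo to_utf8($gbk), PHP_EOL; // Prints 中文 as UTF-8.
```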

```php
<?php
$tmp = sys_get_temp_dir();
$cookieDump = tempnam($tmp, 'cookies'); // Temporary file that will hold the cookies
$url = 'http://tv.sohu.com';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);          // Include the response headers in the returned content
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // Follow redirects automatically
curl_setopt($ch, CURLOPT_TIMEOUT, 10);        // Overall time limit, in seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // Connection time limit, in seconds
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip, deflate')); // Set HTTP request headers
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // Decode gzip/deflate responses; harmless if the page is not compressed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieDump);   // File name that stores the cookie information

$content = curl_exec($ch);
if ($content === false) {
    die('curl error: ' . curl_error($ch));
}

// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, 'UTF-8', 'GBK');
```
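The title also mentions file_get_contents, which the steps above do not cover. The following is only a sketch under assumptions: file_get_contents does not decompress HTTP responses on its own, so if you request gzip (or the server sends it anyway) you must check the Content-Encoding response header and call gzdecode yourself. The helper decode_http_body is hypothetical; it takes the header lines as a plain array so it can be tested offline, and in a live request you would pass $http_response_header, which PHP's HTTP stream wrapper fills in after the call.

```php
<?php
// Hypothetical helper (illustration only): gunzip a body when the final response says it is gzip.
function decode_http_body(string $body, array $headers): string
{
    $gzip = false;
    foreach ($headers as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            // A new status line starts a new response (e.g. after a redirect), so reset.
            $gzip = false;
        } elseif (preg_match('/^Content-Encoding:\s*gzip\b/i', $line)) {
            $gzip = true;
        }
    }
    if (!$gzip) {
        return $body;
    }
    $decoded = gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Content-Encoding says gzip but the body could not be decoded');
    }
    return $decoded;
}

// Live usage (assumes allow_url_fopen is enabled and network access is available):
// $context = stream_context_create(['http' => [
//     'header'  => "Accept-Encoding: gzip\r\n",
//     'timeout' => 10,
// ]]);
// $raw  = file_get_contents('http://tv.sohu.com', false, $context);
// $html = decode_http_body($raw, $http_response_header);
// $html = mb_convert_encoding($html, 'UTF-8', 'GBK');
```

Passing the headers in explicitly, rather than reading $http_response_header inside the function, keeps the helper independent of where the request was made.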