Fixing garbled web pages when crawling with cURL and file_get_contents

Source: Internet
Author: User
Today, while crawling a Sohu web page with curl_init, I found that the fetched page came back garbled. After some analysis, the cause turned out to be that the server had gzip compression enabled. Setting the CURLOPT_ENCODING option with curl_setopt makes cURL decode the gzip response correctly.
If the crawled page is GBK-encoded but the script works in UTF-8, the page also has to be converted with mb_convert_encoding.
<?php
$tmp = sys_get_temp_dir();
$cookieDump = tempnam($tmp, 'cookies');
$url = 'http://tv.sohu.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1); // Include the response headers in the returned content
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow redirects automatically
curl_setopt($ch, CURLOPT_TIMEOUT, 10); // Overall timeout, in seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // Connection timeout, in seconds
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip, deflate')); // Set HTTP request headers
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // Enable gzip decoding; harmless if the page is not gzip-compressed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieDump); // File in which to store cookies
$content = curl_exec($ch);
// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
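Because CURLOPT_HEADER is enabled in the snippet above, the string returned by curl_exec holds the response headers followed by the body. The following is a minimal sketch of separating the two, assuming the header size is taken from curl_getinfo($ch, CURLINFO_HEADER_SIZE) after the request; the function name splitResponse is hypothetical and not part of the original article.

```php
<?php
// Hypothetical helper (an illustration, not the original author's code):
// split a raw cURL response into its header block and its body.
// $headerSize is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize), // status line and header fields
        'body'    => substr($raw, $headerSize),    // the page content itself
    ];
}
```

With this, only the 'body' part would be passed on to mb_convert_encoding, so the header lines are not mixed into the converted page.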
<?php
$url = 'http://tv.sohu.com';
// With the compress.zlib:// wrapper, the response is decoded even when the server has gzip compression enabled
$content = file_get_contents("compress.zlib://" . $url);
// Convert the crawled page from GBK to UTF-8
$content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
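Both snippets above hard-code GBK as the source encoding. As a rough sketch, the conversion step could instead read the charset the page declares in a meta tag and fall back to GBK when none is found. The function name convertToUtf8 and its fallback parameter are hypothetical, and the sketch assumes the declared charset is a name that mbstring recognizes.

```php
<?php
// Hypothetical helper (an illustration, not the original author's code):
// convert crawled HTML to UTF-8 using the charset declared in the page.
// Assumption: the declared charset is an encoding name mbstring supports.
function convertToUtf8(string $html, string $fallback = 'GBK'): string
{
    $charset = $fallback;
    // Matches <meta charset="gbk"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=GBK">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        $charset = strtoupper($m[1]);
    }
    // Nothing to do when the page is already UTF-8.
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $charset);
}
```

Called as convertToUtf8($content), it would replace the fixed mb_convert_encoding($content, "UTF-8", "GBK") call. Note that the meta tag in the converted page still names the old charset.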
Original: http://woqilin.blogspot.com/2014/05/curl-filegetcontents.html

