Crawling Web Pages with PHP

Source: Internet
Author: User
Fetching the content of a web page with PHP is very useful in real-world development. Typical uses are a simple content collector, or extracting certain parts of a page. Once the content has been fetched, you can filter it with regular expressions to get exactly what you want. Below are several commonly used PHP methods for fetching web page content.
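The example below illustrates the regular-expression filtering step. It is a minimal sketch that assumes the page has already been fetched into a string.

- The helper names extract_title and extract_links are hypothetical and are not part of PHP.
- They rely only on the built-in preg_match and preg_match_all functions.
- Regular expressions are brittle on real-world HTML, so for more complex extraction PHP's DOMDocument class is usually a safer choice.

```php
<?php
// Hypothetical helper: return the text inside <title>...</title>, or null if absent.
function extract_title(string $html): ?string
{
    // /i: case-insensitive tag names; /s: let "." also match newlines
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        // Decode entities such as &amp; so the caller gets plain text
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

// Hypothetical helper: collect the href values of all <a> tags, in page order.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    return $m[1]; // $m[1] holds the first capture group of every match
}

$html = '<html><head><title>Demo &amp; Test</title></head>'
      . '<body><a href="/a.html">A</a> <A HREF=\'http://example.com/b\'>B</A></body></html>';

echo extract_title($html), "\n";                 // Demo & Test
echo implode(', ', extract_links($html)), "\n";  // /a.html, http://example.com/b
```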
1. file_get_contents
PHP code
<?php
$url = "http://www.phpzixue.cn";
$contents = file_get_contents($url);
if ($contents === false) {
    die("Failed to fetch $url");
}
// If Chinese text appears garbled, convert it from GB2312 to UTF-8:
$contents = iconv("GB2312", "UTF-8", $contents);
echo $contents;
?>
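The one-line call above has no timeout, and on failure it returns false and emits a warning. A slightly more defensive sketch passes a stream context created with stream_context_create. It uses the real http context options timeout and user_agent. The function name fetch_with_fgc and the User-Agent string are hypothetical placeholders.

```php
<?php
// Hypothetical helper: file_get_contents with a timeout, a User-Agent header,
// and null (instead of false plus a warning) when the fetch fails.
function fetch_with_fgc(string $url, float $timeout = 5.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,                                     // seconds
            'user_agent' => 'Mozilla/5.0 (compatible; SimpleCrawler/1.0)', // placeholder UA
        ],
    ]);
    // "@" hides the warning PHP emits on failure; the return value is checked instead
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage (makes a real network request):
// $page = fetch_with_fgc('http://www.phpzixue.cn');
// echo $page ?? "Fetch failed\n";
```

Plain local paths ignore the http context options. The same function can therefore read a local file, which is handy for testing without network access.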

2. curl
PHP code
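<?php
$url = "http://www.phpzixue.cn";
$timeout = 5; // connection timeout, in seconds
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require HTTP authentication, uncomment the following two lines
// (US_NAME and US_PWD are constants you define with the user name and password):
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
// curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
$contents = curl_exec($ch);
if ($contents === false) {
    echo "cURL error: " . curl_error($ch);
}
curl_close($ch);
echo $contents;
?>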
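curl_exec returns false when the request itself fails. An HTTP error page (404, 500 and so on), however, still comes back as an ordinary string.

The sketch below is built around a hypothetical fetch_with_curl function. It does three things:

- checks a failed request with curl_error;
- checks the HTTP status with curl_getinfo;
- follows redirects.

Allowing the whole transfer twice the connect timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: fetch a URL with cURL; return null on transport errors
// or HTTP status >= 400. Requires the curl extension.
function fetch_with_curl(string $url, int $timeout = 5): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_CONNECTTIMEOUT => $timeout,      // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2,  // seconds for the whole transfer (assumed)
        CURLOPT_FOLLOWLOCATION => true,          // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,             // but not endlessly
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no HTTP response arrived
    if ($body === false) {
        error_log('cURL error: ' . curl_error($ch));
        $body = null;
    } elseif ($status >= 400) {
        error_log("HTTP status $status for $url");
        $body = null;
    }
    curl_close($ch);
    return $body;
}

// Usage (makes a real network request):
// $page = fetch_with_curl('http://www.phpzixue.cn');
```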

3. fopen -> fread -> fclose
PHP code
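<?php
$handle = fopen("http://www.phpzixue.cn", "rb");
if ($handle === false) {
    die("Failed to open the URL");
}
$contents = "";
do {
    $data = fread($handle, 1024); // read up to 1 KB at a time
    if ($data === false || strlen($data) === 0) {
        break; // end of stream (or read error)
    }
    $contents .= $data;
} while (true);
fclose($handle);
echo $contents;
?>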
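PHP also provides stream_get_contents, which reads an open stream to the end in a single call. It can replace the manual fread loop.

The sketch below wraps this in a hypothetical fetch_with_fopen function. When the URL cannot be opened, it returns null instead of stopping the script.

```php
<?php
// Hypothetical helper: fopen/fclose with stream_get_contents() in place of the fread loop.
function fetch_with_fopen(string $url): ?string
{
    $handle = @fopen($url, 'rb'); // "b" = binary mode, no line-ending translation
    if ($handle === false) {
        return null;
    }
    try {
        $contents = stream_get_contents($handle); // reads until EOF
    } finally {
        fclose($handle); // release the handle even if reading throws
    }
    return $contents === false ? null : $contents;
}

// Usage (makes a real network request):
// $page = fetch_with_fopen('http://www.phpzixue.cn');
```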

Note:
1. To fetch remote pages with file_get_contents or fopen, allow_url_fopen must be enabled. To do this, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is Off, fopen and file_get_contents cannot open remote files.
2. To use curl, the cURL extension must be enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll, then copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the cURL extension for PHP.
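Both requirements can also be checked at runtime. extension_loaded('curl') reports whether the cURL extension is available, and ini_get('allow_url_fopen') returns the current setting.

The hypothetical choose_fetch_method function below takes these two facts as arguments, so the decision logic can be tested on its own. Preferring cURL when both are available is an assumption, not a rule.

```php
<?php
// Hypothetical helper: pick a fetching method the current PHP setup supports.
function choose_fetch_method(bool $curlLoaded, bool $urlFopenOn): ?string
{
    if ($curlLoaded) {
        return 'curl';
    }
    if ($urlFopenOn) {
        return 'file_get_contents'; // fopen needs the same setting
    }
    return null; // neither method can open remote URLs
}

// ini_get() returns a string such as "1" or ""; convert it to a real boolean
$method = choose_fetch_method(
    extension_loaded('curl'),
    filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)
);
echo $method ?? 'no remote fetching method available', "\n";
```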
