Fetching the content of a web page with PHP is very useful in real-world development, for example to build a simple content collector or to extract part of a page. The fetched content can then be filtered with regular expressions to get what you want. How to write those regular expressions is not covered here. Below are several common ways to fetch the content of a web page with PHP.
1.file_get_contents
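<?php
$url = "http://www.jb51.net";
// Fetch the whole page as a string; file_get_contents returns false on failure
$contents = file_get_contents($url);
if ($contents === false) {
    die("Could not fetch " . $url);
}
// If Chinese text comes out garbled, uncomment the next line to convert GB2312 to UTF-8
// $contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>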
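The encoding conversion mentioned in the comment above can be wrapped in a small helper so that it only runs when it is needed. This is a minimal sketch, not part of the original code: the function name `to_utf8` and the default source charset `GB2312` are assumptions, and the "already UTF-8" check is a heuristic (a few GB2312 byte sequences can also happen to be valid UTF-8).

```php
<?php
// Hypothetical helper: convert fetched bytes to UTF-8 only when needed.
function to_utf8(string $contents, string $fromCharset = 'GB2312'): string
{
    // preg_match() with the /u modifier fails on invalid UTF-8,
    // so a result of 1 means the string is already valid UTF-8.
    if (preg_match('//u', $contents) === 1) {
        return $contents;
    }
    // iconv() returns false if the input cannot be converted;
    // in that case the original bytes are returned unchanged.
    $converted = iconv($fromCharset, 'UTF-8', $contents);
    return $converted === false ? $contents : $converted;
}

// The two characters "中文" as GB2312 bytes, written as escapes
// so the encoding of this source file does not matter.
$gb2312 = "\xD6\xD0\xCE\xC4";
echo bin2hex(to_utf8($gb2312)), "\n";
```

In the example above, the call would become `$contents = to_utf8($contents);` right after `file_get_contents`.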
2.curl
<?php
$url = "http://www.jb51.net";
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require user authentication, uncomment the following two lines
// (US_NAME and US_PWD must be defined as constants holding the user name and password)
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
// curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
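The curl example above prints whatever `curl_exec` returns, including `false` when the request fails. A possible way to make failures visible is sketched below. The function name `fetch_with_curl`, the overall transfer timeout, and the choice to follow redirects are assumptions added for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: fetch a URL with curl and return the body,
// or null if the transfer failed.
function fetch_with_curl(string $url, int $timeout = 5): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout * 2, // assumed cap for the whole transfer
        CURLOPT_FOLLOWLOCATION => true,         // follow HTTP redirects
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong (DNS failure, timeout, ...)
        error_log('curl failed: ' . curl_error($ch));
        return null;
    }
    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

A caller would then write something like `$contents = fetch_with_curl($url);` and check for `null` before echoing the result.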
3.fopen->fread->fclose
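<?php
$handle = fopen("http://www.jb51.net", "rb");
if ($handle === false) {
    die("Could not open the URL");
}
$contents = "";
// Read 1024 bytes at a time until nothing more is returned
do {
    $data = fread($handle, 1024);
    if ($data === false || strlen($data) == 0) {
        break;
    }
    $contents .= $data;
} while (true);
fclose($handle);
echo $contents;
?>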
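The chunked read loop can also be factored into a function that works on any open stream, which makes it easy to try out without a network connection. The following is a sketch under that assumption. The helper name `read_all` is hypothetical, and it uses `feof` as the loop condition instead of the `do ... while (true)` form shown above.

```php
<?php
// Hypothetical helper: read everything from an already opened stream in chunks.
function read_all($handle, int $chunkSize = 1024): string
{
    $contents = '';
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false) {
            break; // read error: return what was collected so far
        }
        $contents .= $data;
    }
    return $contents;
}

// Demonstration on an in-memory stream, so no network access is needed.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('x', 3000));
rewind($handle);
echo strlen(read_all($handle)), "\n";
fclose($handle);
```

With a remote page, the same helper would be called on the handle returned by `fopen($url, "rb")`.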
Notes:
1. file_get_contents and fopen can only open remote URLs when allow_url_fopen is enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. Using curl requires a hosting environment with the curl extension enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll; you also need to copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
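Both requirements in the notes can be checked from PHP itself before choosing a method. The sketch below is only an illustration: the function name `available_fetch_methods` and the labels it returns are made up for this example.

```php
<?php
// Hypothetical helper: report which of the three approaches this PHP setup supports.
function available_fetch_methods(): array
{
    $methods = [];
    // ini_get() returns the setting as a string such as "1", "On", "0" or "".
    $allowUrlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($allowUrlFopen) {
        // Both functions depend on the same allow_url_fopen setting.
        $methods[] = 'file_get_contents';
        $methods[] = 'fopen';
    }
    // extension_loaded() tells whether the curl extension is enabled.
    if (extension_loaded('curl')) {
        $methods[] = 'curl';
    }
    return $methods;
}

echo implode(', ', available_fetch_methods()), "\n";
```

If the printed list is empty, neither php.ini setting described in the notes is in effect on that server.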