Fetching the content of a web page with PHP is very useful in practice. Typical uses are building a simple content scraper or extracting part of a page. After fetching a page, you can filter it with regular expressions to get exactly the content you want. Below are several commonly used PHP methods for fetching a web page.
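The regular-expression filtering step can be sketched with preg_match(). The helper extract_title() below is hypothetical. It assumes UTF-8 input and a single, well-formed <title> element. For more complex extraction, an HTML parser such as DOMDocument is usually more robust than a regex.

```php
<?php
// Hypothetical helper: return the text inside the first <title> element,
// or null if no title is found. Assumes UTF-8 input (the /u modifier).
function extract_title(string $html): ?string
{
    // /i: case-insensitive tag names; /s: "." also matches newlines; /u: UTF-8
    if (preg_match('/<title[^>]*>(.*?)<\/title>/isu', $html, $matches) === 1) {
        // Decode entities such as &amp; and strip surrounding whitespace
        return trim(html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

$html = "<html><head><title> Example &amp; Demo </title></head><body></body></html>";
echo extract_title($html), PHP_EOL; // prints "Example & Demo"
```

In practice, $html would be the string returned by one of the methods below.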
1. file_get_contents
PHP code
<?php
$url = "http://www.phpzixue.cn";
$contents = file_get_contents($url);
// If Chinese text comes out garbled, convert it from GB2312 to UTF-8:
$getcontent = iconv("gb2312", "utf-8", $contents);
echo $getcontent;
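The iconv() call above assumes the page is always GB2312. Here is a minimal sketch for pages that declare their encoding in a <meta> tag. The hypothetical helper to_utf8() reads that declaration and converts to UTF-8 only when needed. Real pages can also declare the charset in the HTTP Content-Type header, which this sketch ignores.

```php
<?php
// Hypothetical helper: convert fetched HTML to UTF-8 using the charset
// declared in its <meta> tag. Returns the input unchanged when no charset
// is declared, when it already claims UTF-8, or when conversion fails.
function to_utf8(string $html): string
{
    // Matches both <meta charset="gb2312"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    if (!preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return $html;
    }
    $charset = strtolower($m[1]);
    if ($charset === 'utf-8' || $charset === 'utf8') {
        return $html;
    }
    // //IGNORE asks iconv to drop bytes it cannot convert
    $converted = iconv($charset, 'UTF-8//IGNORE', $html);
    return $converted === false ? $html : $converted;
}
```

Usage: echo to_utf8(file_get_contents($url));. Many pages labelled gb2312 actually contain GBK characters. Since GBK and GB18030 are supersets of GB2312, decoding with one of them is often the safer choice.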
2. curl
PHP code
<?php
$url = "http://www.phpzixue.cn";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
// Return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Give up if no connection is made within $timeout seconds
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require user authentication, uncomment the next two lines
// (US_NAME and US_PWD are constants you define with your credentials):
// curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
// curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
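The snippet above does not check whether the request succeeded. This sketch adds error handling with curl_error() and the HTTP status from curl_getinfo(). The helper name fetch_with_curl() and the overall time limit (twice the connect timeout) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch $url with curl and return the body,
// or throw a RuntimeException describing what went wrong.
function fetch_with_curl(string $url, int $timeout = 5): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_CONNECTTIMEOUT => $timeout,      // seconds to wait for a connection
        CURLOPT_TIMEOUT        => $timeout * 2,  // assumption: overall time limit
        CURLOPT_FOLLOWLOCATION => true,          // follow HTTP redirects
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() gives a readable reason (DNS failure, timeout, ...)
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // HTTP status code; 0 when no HTTP response was received (e.g. file://)
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }
    // Since PHP 8.0 the handle is an object freed automatically when it
    // goes out of scope, so curl_close() is not called here.
    return $body;
}
```

Usage: wrap the call in try { echo fetch_with_curl("http://www.phpzixue.cn"); } catch (RuntimeException $e) { echo $e->getMessage(); }.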
3. fopen -> fread -> fclose
PHP code
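<?php
$handle = fopen("http://www.phpzixue.cn", "rb");
if ($handle === false) {
    die("Unable to open the URL");
}
$contents = "";
do {
    // Read up to 1024 bytes at a time; stop at end of data or on error
    $data = fread($handle, 1024);
    if ($data === false || strlen($data) == 0) {
        break;
    }
    $contents .= $data;
} while (true);
fclose($handle);
echo $contents;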
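The fopen loop above has no timeout. A stream context created with stream_context_create() can pass HTTP options such as a read timeout and a User-Agent to fopen(). In this sketch, the helper name fetch_with_fopen() and the User-Agent string are assumptions. The HTTP options are simply ignored when the argument is a local file path.

```php
<?php
// Hypothetical helper: read a URL (or local path) with fopen/fread/fclose,
// using a stream context to set an HTTP read timeout and User-Agent.
function fetch_with_fopen(string $url, float $timeout = 5.0): string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,  // read timeout in seconds
            'user_agent' => 'Mozilla/5.0 (compatible; SimpleCrawler)', // assumption
        ],
    ]);
    $handle = fopen($url, 'rb', false, $context);
    if ($handle === false) {
        throw new RuntimeException("Cannot open $url");
    }
    $contents = '';
    while (!feof($handle)) {
        $data = fread($handle, 8192);
        // Stop on error, or when nothing more arrives (end of data or timeout)
        if ($data === false || $data === '') {
            break;
        }
        $contents .= $data;
    }
    // 'timed_out' is true if the last read hit the context timeout
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    if ($meta['timed_out']) {
        throw new RuntimeException("Timed out reading $url");
    }
    return $contents;
}
```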
Notes:
1. file_get_contents and fopen can only open remote URLs when allow_url_fopen is enabled. To enable it, edit php.ini and set allow_url_fopen = On. When it is off, fopen and file_get_contents cannot open remote files.
2. curl requires the curl extension to be enabled. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
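allow_url_fopen cannot be changed with ini_set() at runtime. Because of that, it can help to check at runtime which of the methods above are usable. The following sketch uses ini_get() and extension_loaded(). The helper name available_fetch_methods() is hypothetical.

```php
<?php
// Hypothetical helper: list which of the three fetching methods can be
// used with the current PHP configuration.
function available_fetch_methods(): array
{
    $methods = [];
    // ini_get() returns e.g. "1" or "On" when enabled, "" or "0" when disabled
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $methods[] = 'file_get_contents';
        $methods[] = 'fopen';
    }
    // curl is only usable if the extension is loaded
    if (extension_loaded('curl')) {
        $methods[] = 'curl';
    }
    return $methods;
}

print_r(available_fetch_methods());
```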