This article gives a detailed introduction to several ways of fetching web pages with PHP, for readers who need a reference.
When writing a weather forecast or RSS subscription program, you often need to fetch non-local files. The usual approach is to have PHP simulate a browser: it sends an HTTP request to the URL and receives the HTML source code or XML data in response. This data usually cannot be output directly; the content often has to be extracted first and then formatted so that it is displayed in a friendlier way.
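The extract-and-format step described above can be sketched as follows. This is only a minimal illustration, assuming the fetched data is an RSS 2.0 document and that the SimpleXML extension is available; the sample feed and the helper name extract_rss_items() are hypothetical and stand in for data obtained by any of the fetching methods in this article.

```php
<?php
// Hypothetical sample feed; in practice this string would come from one of
// the fetching methods described in this article.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Weather Feed</title>
    <item><title>Monday: Sunny</title><link>http://example.com/mon</link></item>
    <item><title>Tuesday: Rain &amp; Wind</title><link>http://example.com/tue</link></item>
  </channel>
</rss>
XML;

/**
 * Extract item titles and links from an RSS 2.0 document.
 * Returns an empty array when the XML cannot be parsed.
 */
function extract_rss_items($xml)
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel->item)) {
        return array();
    }

    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

// Format the extracted data as a simple HTML list.
echo "<ul>\n";
foreach (extract_rss_items($xml) as $item) {
    printf(
        "<li><a href=\"%s\">%s</a></li>\n",
        htmlspecialchars($item['link']),
        htmlspecialchars($item['title'])
    );
}
echo "</ul>\n";
```

Keeping extraction in its own function means the same formatting code can be reused no matter which method fetched the feed.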
Here are some simple methods and principles for fetching pages with PHP:
First, the main methods for fetching a page with PHP:
1. file() function
2. file_get_contents() function
3. fopen()->fread()->fclose() mode
4. cURL mode
5. fsockopen() function socket mode
6. Using plug-ins (e.g. Snoopy: http://sourceforge.net/projects/snoopy/)
Second, the main ways PHP obtains HTML or XML code for parsing:
1. file() function
The code is as follows:
<?php
$url = 'http://t.qq.com';
// file() returns the page as an array of lines.
$lines_array = file($url);
$lines_string = implode('', $lines_array);
echo htmlspecialchars($lines_string);
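By default, file() keeps the newline at the end of each array element, which is why the lines can be joined with an empty string. The sketch below shows the optional flags that strip newlines and skip empty lines. It uses a temporary local file as a stand-in for the remote page (an assumption made so the example runs without network access); file() accepts a URL in the same position.

```php
<?php
// A temporary local file stands in for the remote page so the sketch runs
// without network access; file() accepts a URL in the same position.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, "<html>\n\n<body>Hello</body>\n</html>\n");

// Default behaviour: every element keeps its trailing newline.
$raw = file($path);

// With flags: strip newlines and drop empty lines.
$clean = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// file() returns false on failure, so check before using the result.
if ($clean === false) {
    echo "Could not read $path\n";
} else {
    foreach ($clean as $number => $line) {
        echo ($number + 1) . ': ' . htmlspecialchars($line) . "\n";
    }
}

unlink($path);
```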
2. file_get_contents() function
To use file_get_contents and fopen on remote files, allow_url_fopen must be enabled. Method: edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
The code is as follows:
<?php
$url = 'http://t.qq.com';
// Read the whole page into a single string.
$lines_string = file_get_contents($url);
echo htmlspecialchars($lines_string);
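Since remote access depends on allow_url_fopen, it can help to check the setting at run time and to set a timeout for the request. The following is a rough sketch of one way to do that, not the only one. The helper name fetch_with_timeout() and the 5-second default are assumptions made for illustration.

```php
<?php
/**
 * Fetch a URL or local path with file_get_contents().
 * Returns the content as a string, or null on failure.
 * fetch_with_timeout() is a hypothetical helper name.
 */
function fetch_with_timeout($url, $timeout = 5)
{
    // Remote URLs need allow_url_fopen; local paths do not.
    $is_remote = (bool) preg_match('#^(https?|ftp)://#i', $url);
    $enabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($is_remote && !$enabled) {
        return null;
    }

    // The "http" context options apply to http:// and https:// URLs.
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeout,
        ),
    ));

    // Suppress the warning and report failure through the return value.
    $content = @file_get_contents($url, false, $context);
    return $content === false ? null : $content;
}

// Usage with a local file so the sketch runs offline; pass a URL such as
// 'http://t.qq.com' to fetch a remote page instead.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<p>hello</p>');
$html = fetch_with_timeout($tmp);
echo $html === null ? "fetch failed\n" : htmlspecialchars($html) . "\n";
unlink($tmp);
```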
3. fopen()->fread()->fclose() mode
The code is as follows:
<?php
$url = 'http://t.qq.com';
$handle = fopen($url, "rb");
$lines_string = "";
// Read in 1024-byte chunks until no more data arrives.
do {
    $data = fread($handle, 1024);
    if (strlen($data) == 0) {
        break;
    }
    $lines_string .= $data;
} while (true);
fclose($handle);
echo htmlspecialchars($lines_string);
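The loop above assumes that fopen() succeeded. A slightly more defensive variant might check the handle first and stop on end of file or a failed read. This is a sketch only; the function name read_in_chunks() and the 8192-byte default chunk size are assumptions, and a local temporary file is used so the example runs without network access.

```php
<?php
/**
 * Read a URL or local path in chunks via fopen()/fread()/fclose().
 * Returns the content as a string, or null if the stream cannot be opened.
 * read_in_chunks() is a hypothetical helper name.
 */
function read_in_chunks($url, $chunk_size = 8192)
{
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        return null;
    }

    $content = '';
    while (!feof($handle)) {
        $data = fread($handle, $chunk_size);
        // Stop on a read error, or on an empty read (end of data or timeout).
        if ($data === false || $data === '') {
            break;
        }
        $content .= $data;
    }

    fclose($handle);
    return $content;
}

// Usage with a local file standing in for a remote page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, str_repeat('x', 20000));
$content = read_in_chunks($tmp, 1024);
echo strlen($content) . " bytes read\n";
unlink($tmp);
```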
4. cURL mode
Using cURL requires that the hosting space has the curl extension enabled. Method: on Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll; you also need to copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
The code is as follows:
<?php
$url = 'http://t.qq.com';
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$lines_string = curl_exec($ch);
curl_close($ch);
echo htmlspecialchars($lines_string);
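Because cURL depends on an extension that may not be loaded, and curl_exec() returns false on failure, a more cautious version might look like the sketch below. The helper name curl_fetch() and the shape of its return value are assumptions made for illustration; the cURL calls themselves are standard functions of the extension.

```php
<?php
/**
 * Fetch a URL with cURL and report errors instead of failing silently.
 * Returns an array with 'body' (string or null), 'status' (HTTP code,
 * 0 if none) and 'error' (empty string on success).
 * curl_fetch() is a hypothetical helper name.
 */
function curl_fetch($url, $timeout = 5)
{
    // The extension may be missing if it was not enabled in php.ini.
    if (!function_exists('curl_init')) {
        return array('body' => null, 'status' => 0, 'error' => 'curl extension not loaded');
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Also limit the total transfer time, not only the connect phase.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);

    $body = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : '';
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.

    return array(
        'body'   => $body === false ? null : $body,
        'status' => $status,
        'error'  => $error,
    );
}

// Usage: port 1 on the local machine is assumed to be closed, so this
// call demonstrates the failure path without needing network access.
$result = curl_fetch('http://127.0.0.1:1/', 2);
if ($result['body'] === null) {
    echo 'Fetch failed: ' . $result['error'] . "\n";
} else {
    echo htmlspecialchars($result['body']);
}
```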
5. fsockopen() function socket mode
Whether socket mode runs correctly also depends on the server settings. In particular, use phpinfo() to check which communication protocols the server has enabled. For example, my local PHP sockets did not have HTTP enabled, so I could only test with UDP.
The code is as follows:
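<?php
// The port and read length were lost from the original listing; port 13
// (the daytime service) and 26 bytes are assumed values.
$fp = fsockopen("udp://127.0.0.1", 13, $errno, $errstr);
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else {
    fwrite($fp, "\n");
    echo fread($fp, 26);
    fclose($fp);
}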
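Besides phpinfo(), the registered socket transports can also be listed from code with stream_get_transports(). When tcp is available, a page can be requested over a raw socket by writing the HTTP request by hand. The following is only a sketch under those assumptions: the helper names build_get_request() and socket_fetch() are hypothetical, and HTTP/1.0 is used to keep the response simple.

```php
<?php
// List the socket transports this PHP build registers (tcp, udp, ...).
$transports = stream_get_transports();
echo 'Transports: ' . implode(', ', $transports) . "\n";

/**
 * Build a minimal HTTP/1.0 GET request for the given host and path.
 * build_get_request() is a hypothetical helper name.
 */
function build_get_request($host, $path = '/')
{
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

/**
 * Fetch a page over a plain TCP socket. Returns the raw response
 * (headers and body) or null when the connection fails.
 * socket_fetch() is a hypothetical helper name.
 */
function socket_fetch($host, $path = '/', $port = 80, $timeout = 5)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return null;
    }

    // Limit how long reads may block once connected.
    stream_set_timeout($fp, $timeout);
    fwrite($fp, build_get_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);

    return $response === false ? null : $response;
}

// Usage: port 1 on the local machine is assumed to be closed, so this
// call demonstrates the failure path without needing network access.
$raw = socket_fetch('127.0.0.1', '/', 1, 1);
echo $raw === null ? "Connection failed\n" : htmlspecialchars($raw);
```

On success the response contains the headers and the body separated by a blank line, so the two parts can be split with explode("\r\n\r\n", $raw, 2).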
6. Plugins
There are many such plug-ins available online. Snoopy is one that was found on the Internet; interested readers can study it further.