Several methods for capturing web pages in PHP
I have recently been building a joke platform with both a web version and an installable version. Since I had no joke content of my own, I wrote a PHP background program that collects data from various joke websites every day. Below are the basic methods I found for fetching and parsing web content in PHP.
I. Main methods for fetching pages in PHP:
1. file() function
2. file_get_contents() function
3. fopen() -> fread() -> fclose() mode
4. curl
5. fsockopen() function (socket mode)
6. Plug-ins
II. Main methods for parsing HTML or XML in PHP:
1. Regular expressions
2. PHP DOMDocument object
3. Plug-ins (for example, PHP Simple HTML DOM Parser)
If any of the above is unfamiliar, the sections below go through each method in turn.
Fetching pages in PHP
1. file() function
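A minimal sketch of the file() approach, assuming allow_url_fopen is On when the source is a remote URL. The helper name fetch_lines is hypothetical and not part of the original program. file() returns the document as an array of lines, or false on failure.

```php
<?php
// Hypothetical helper: read a page (or a local file) into an array of lines.
function fetch_lines(string $source): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element;
    // @ hides the warning so that failure can be handled explicitly below.
    $lines = @file($source, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $lines;
}
```

For example, fetch_lines('http://example.com/') (a placeholder address) would return one array element per line of the page, which suits line-by-line processing.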
2. file_get_contents() function
Opening remote URLs with file_get_contents() or fopen() requires allow_url_fopen to be enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is disabled, neither fopen() nor file_get_contents() can open remote files.
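The allow_url_fopen requirement can be checked at run time before fetching. The sketch below is one possible way to do that; the helper name fetch_page, the timeout and the user agent string are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: fetch a whole page as one string.
function fetch_page(string $url, int $timeoutSeconds = 10): string
{
    $isRemote = preg_match('#^(https?|ftp)://#i', $url) === 1;
    // Remote URLs only work when allow_url_fopen is enabled in php.ini.
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled in php.ini');
    }
    // The "http" context options apply to http:// and https:// requests.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'demo-fetcher/0.1', // illustrative value
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}
```

Calling fetch_page('http://example.com/') (a placeholder address) returns the page source as a single string, which is usually the most convenient form for the parsing methods described later.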
3. fopen() -> fread() -> fclose() mode
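A sketch of the open, read, close sequence, again assuming allow_url_fopen is On for remote URLs. The helper name fetch_in_chunks and the chunk size are illustrative choices.

```php
<?php
// Hypothetical helper: read a stream in fixed-size chunks.
function fetch_in_chunks(string $url, int $chunkSize = 8192): string
{
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $url");
    }
    $body = '';
    // On network streams fread() may return fewer bytes than requested,
    // so keep reading until end-of-file is reached.
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($handle);
    return $body;
}
```

Compared with file_get_contents(), this form makes it possible to process or stop reading a large response part-way through.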
4. curl
Using curl requires the curl extension to be enabled. On Windows, edit php.ini and remove the semicolon before extension=php_curl.dll, then copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
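Once the extension is loaded, a fetch might look like the sketch below. The helper name fetch_with_curl and the timeout values are assumptions; the curl_* functions and CURLOPT_* constants are the standard ones from the extension.

```php
<?php
// Hypothetical helper: fetch a page with the curl extension.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): string
{
    if (!function_exists('curl_init')) {
        throw new RuntimeException('The curl extension is not loaded');
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}
```

curl does not depend on allow_url_fopen, so it is a common choice on hosts where that setting is disabled.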
5. fsockopen() function (socket mode)
Whether socket mode works depends on the server configuration. Use phpinfo() to check which transports are enabled on the server; they are listed under "Registered Stream Socket Transports". For example, http does not appear among the transports on my local PHP installation (fsockopen() expects a transport such as tcp:// or udp://, not http://), so the test below uses udp.
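<?php
// Assumed target: the daytime service (port 13) on the local machine over UDP.
$fp = fsockopen("udp://127.0.0.1", 13, $errno, $errstr);
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else {
    fwrite($fp, "\n");
    echo fread($fp, 26);
    fclose($fp);
}
?>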
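To fetch a web page rather than test a UDP service, the same function can open a TCP connection and send an HTTP request by hand. This is a rough sketch under the assumption that the tcp transport is available; build_get_request and fetch_via_socket are hypothetical helper names.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.0 GET request.
function build_get_request(string $host, string $path = '/'): string
{
    // With HTTP/1.0 and "Connection: close" the server ends the response
    // by closing the socket, so the reader can simply read until EOF.
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

// Hypothetical helper: send the request over TCP and return the raw
// response (status line, headers and body together).
function fetch_via_socket(string $host, string $path = '/', int $port = 80): string
{
    $fp = @fsockopen("tcp://$host", $port, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("ERROR: $errno - $errstr");
    }
    fwrite($fp, build_get_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? '' : $response;
}
```

fetch_via_socket('example.com') (a placeholder host) would return the headers and body in one string; the body starts after the first blank line. Redirects, chunked encoding and HTTPS are not handled here, which is why curl is usually easier for anything beyond a quick test.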
6. Plug-ins
Many plug-ins for this are available online; Snoopy is one that I came across. If you are interested, it is worth studying.
Parsing XML (HTML) in PHP
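1. Regular expressions:
<?php
// Assumption: $url is a placeholder address.
$url = 'http://example.com/';
$lines_string = file_get_contents($url);
// preg_match() is used here; the ereg/eregi functions were removed in PHP 7.
if (preg_match('/<title>(.*)<\/title>/i', $lines_string, $title)) {
    // $title[0] is the whole match, tags included, hence htmlspecialchars().
    echo htmlspecialchars($title[0]);
}
?>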
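A greedy (.*) can run past the first closing tag, and an unmatched pattern leaves the result array empty. The sketch below shows one way to handle both cases on an HTML string that is already in memory; extract_title_regex is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string.
function extract_title_regex(string $html): ?string
{
    // .*? is non-greedy, so matching stops at the first </title>;
    // the i flag ignores case and the s flag lets . span line breaks.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        // $m[1] is the captured text without the surrounding tags.
        return trim($m[1]);
    }
    return null;
}
```

Regular expressions work well for a single, predictable element like this one, but they become fragile on nested or irregular markup, which is where the DOM-based methods are more suitable.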
2. PHP DOMDocument object
If the remote HTML or XML file contains syntax errors, PHP reports them as errors while parsing the document into a DOM.
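<?php
// Assumption: $url is a placeholder address.
$url = 'http://example.com/';
$html = new DOMDocument();
$html->loadHTMLFile($url);
$title = $html->getElementsByTagName('title');
echo $title->item(0)->nodeValue;
?>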
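The parse errors mentioned above can be kept out of the output with libxml_use_internal_errors(). The following is a sketch that works on an HTML string already fetched by one of the earlier methods; extract_title_dom is a hypothetical helper name, and the input is assumed to be non-empty.

```php
<?php
// Hypothetical helper: read <title> from markup that may be malformed.
function extract_title_dom(string $html): ?string
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->nodeValue) : null;
}
```

If the errors themselves are of interest, libxml_get_errors() can be called before libxml_clear_errors() to inspect them.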
3. Plug-ins
This article uses PHP Simple HTML DOM Parser as a brief example. The simple_html_dom syntax resembles jQuery, so manipulating the DOM from PHP is as simple as doing it with jQuery.
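<?php
// Assumptions: simple_html_dom.php sits next to this script, and $url is a placeholder address.
include 'simple_html_dom.php';
$url = 'http://example.com/';
$html = file_get_html($url);
$title = $html->find('title');
echo $title[0]->plaintext;
?>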