Several ways to implement web page capturing in PHP

Source: Internet
Author: User



Recently I have been working on a joke platform, with a web version and an installable version. Because there was no joke content to start with, I wrote a background program in PHP that captures data from various joke websites on a schedule every day. Below are some basic ways to capture web page content in PHP.


I. Main methods for capturing pages in PHP:

1. file() function
2. file_get_contents() function
3. fopen() -> fread() -> fclose() mode
4. curl method
5. fsockopen() function (socket mode)
6. Plug-ins


II. Main methods for parsing HTML or XML code in PHP:

1. Regular expressions
2. PHP DOMDocument object
3. Plug-ins (for example, PHP Simple HTML DOM Parser)

The sections below go through each of these methods in turn.

Capturing pages in PHP

1. file() function
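file() reads a whole file or URL into an array with one element per line. The following is a minimal sketch, not the author's original code: fetch_lines is a hypothetical helper name, and reading a remote URL assumes allow_url_fopen is enabled.

```php
<?php
/**
 * Read a page (or a local file) into an array of lines using file().
 * fetch_lines() is a hypothetical helper name used for illustration.
 * Returns an empty array when the source cannot be read.
 */
function fetch_lines(string $source): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from every line.
    // The @ suppresses the warning file() raises on failure; the
    // false return value is checked explicitly instead.
    $lines = @file($source, FILE_IGNORE_NEW_LINES);

    return $lines === false ? [] : $lines;
}

// Usage (the URL is a placeholder):
// foreach (fetch_lines('http://example.com/') as $number => $line) {
//     echo $number, ': ', htmlspecialchars($line), "\n";
// }
```

Because the whole response is held in memory as an array, this fits small pages that are processed line by line.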


 


2. file_get_contents() function
file_get_contents() and fopen() can only open remote files when allow_url_fopen is enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen() nor file_get_contents() can open remote files.
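As a sketch of this method, the function below checks allow_url_fopen before requesting a remote page and passes a timeout through a stream context. The helper name fetch_page, the user agent string, and the 10-second default are assumptions made for illustration.

```php
<?php
/**
 * Fetch a page as one string with file_get_contents().
 * fetch_page() is a hypothetical helper; returns null on failure.
 */
function fetch_page(string $source, int $timeoutSeconds = 10): ?string
{
    // Remote URLs cannot be opened when allow_url_fopen is off.
    if (preg_match('#^https?://#i', $source) && !ini_get('allow_url_fopen')) {
        return null;
    }

    // HTTP context options: request method, read timeout, User-Agent header.
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleFetcher/1.0', // assumed value
        ],
    ]);

    $body = @file_get_contents($source, false, $context);

    return $body === false ? null : $body;
}

// Usage (the URL is a placeholder):
// $html = fetch_page('http://example.com/');
```

Returning null instead of false keeps the "could not fetch" case distinct from an empty page.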

 


3. fopen() -> fread() -> fclose() mode
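This mode opens a stream, reads it in chunks, and closes it. A minimal sketch under the same allow_url_fopen assumption for remote URLs; read_stream and the 8192-byte chunk size are illustrative choices, not part of the original article.

```php
<?php
/**
 * Read a URL or file with fopen() -> fread() -> fclose().
 * read_stream() is a hypothetical helper; returns null if the
 * stream cannot be opened.
 */
function read_stream(string $source, int $chunkSize = 8192): ?string
{
    $handle = @fopen($source, 'rb');
    if ($handle === false) {
        return null;
    }

    $body = '';
    // Network streams may return less than $chunkSize per call,
    // so keep reading until end of stream.
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }

    fclose($handle);

    return $body;
}

// Usage (the URL is a placeholder):
// $html = read_stream('http://example.com/');
```

Reading in chunks is mainly useful when the data should be processed or written out piece by piece rather than collected into one string as done here.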

 


4. curl method
To use curl, the curl extension must be enabled. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.
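A minimal curl sketch might look like the following. The helper name curl_fetch, the timeouts, and the user agent string are assumptions; the function returns null when the curl extension is not loaded or the transfer fails.

```php
<?php
/**
 * Fetch a URL with the curl extension.
 * curl_fetch() is a hypothetical helper; returns null on failure.
 */
function curl_fetch(string $url, int $timeoutSeconds = 10): ?string
{
    // The curl extension is optional, so check for it first.
    if (!function_exists('curl_init')) {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // assumed value
    ]);

    $body = curl_exec($ch);
    // The handle is released when $ch goes out of scope.

    return $body === false ? null : $body;
}

// Usage (the URL is a placeholder):
// $html = curl_fetch('http://example.com/');
```

Compared with file_get_contents(), curl gives finer control over redirects, timeouts, headers, and cookies, which tends to matter when capturing from many different sites.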

 


5. fsockopen() function (socket mode)
Whether socket mode works depends on the server configuration. Use phpinfo() to check which transports are registered on the server. For example, my local PHP setup did not have http available for sockets, so I could only test with udp.

<?php
// Assumed target: a local daytime service on UDP port 13.
$fp = fsockopen("udp://127.0.0.1", 13, $errno, $errstr);
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else {
    fwrite($fp, "\n");
    echo fread($fp, 26);
    fclose($fp);
}
?>
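Where the tcp transport is available, the same function can capture an ordinary web page by writing a raw HTTP request to the socket. The sketch below is an illustration rather than the author's code: build_get_request, split_body, and socket_fetch are hypothetical names, and HTTP/1.0 is chosen so the server does not reply with chunked encoding.

```php
<?php
/** Build a minimal HTTP/1.0 GET request (hypothetical helper). */
function build_get_request(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

/** Return the part of a raw HTTP response after the header block. */
function split_body(string $response): string
{
    $parts = explode("\r\n\r\n", $response, 2);

    return $parts[1] ?? '';
}

/**
 * Fetch a page over a plain TCP socket with fsockopen().
 * Returns null when the connection or the read fails.
 */
function socket_fetch(string $host, string $path = '/', int $port = 80, float $timeout = 10.0): ?string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }

    fwrite($fp, build_get_request($host, $path));
    // "Connection: close" makes the server end the stream after the body.
    $response = stream_get_contents($fp);
    fclose($fp);

    return $response === false ? null : split_body($response);
}

// Usage (the host is a placeholder):
// $html = socket_fetch('example.com', '/');
```

This approach leaves redirects, HTTPS, and compression to the caller, so it is mostly of interest when the higher-level methods are unavailable.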


6. Plug-ins
Many third-party plug-ins are available online; Snoopy is one that I found. If you are interested, it is worth studying.

Parsing XML (HTML) in PHP

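1. Regular expressions:

<?php
// Placeholder URL (assumed); substitute the page you want to capture.
$url = 'http://example.com/';
$lines_string = file_get_contents($url);
// $title[0] is the whole match including the tags; $title[1] is the text only.
if (preg_match('/<title>(.*)<\/title>/is', $lines_string, $title)) {
    echo htmlspecialchars($title[0]);
}
?>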
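The title match can also be wrapped in a small function that works on an HTML string that has already been fetched. This is a sketch under stated assumptions: extract_title is a hypothetical name, the pattern is non-greedy and tolerates attributes on the tag, and the input is assumed to be UTF-8.

```php
<?php
/**
 * Extract the text of the first <title> element with a regular expression.
 * extract_title() is a hypothetical helper; returns null if no title is found.
 */
function extract_title(string $html): ?string
{
    // i: case-insensitive, s: "." also matches newlines,
    // (.*?): non-greedy, so the match stops at the first closing tag.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches) !== 1) {
        return null;
    }

    // Decode entities such as &amp; and trim surrounding whitespace.
    return trim(html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

// Usage:
// echo htmlspecialchars((string) extract_title($lines_string));
```

Regular expressions are fine for a single well-known tag like this, but they become fragile for nested markup, which is where the DOM-based methods are a better fit.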


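2. PHP DOMDocument object
If the remote HTML or XML file contains syntax errors, PHP reports errors while parsing it into a DOM.

<?php
// Placeholder URL (assumed); substitute the page you want to capture.
$url = 'http://example.com/';
$html = new DOMDocument();
$html->loadHTMLFile($url);
$title = $html->getElementsByTagName('title');
echo $title->item(0)->nodeValue;
?>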
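One way to keep malformed markup from flooding the output with parser warnings is to have libxml collect its errors internally while the document loads. The sketch below assumes the HTML has already been fetched into a string; dom_title is a hypothetical helper name.

```php
<?php
/**
 * Read the <title> text from an HTML string with DOMDocument,
 * tolerating malformed markup.
 * dom_title() is a hypothetical helper; returns null if no title exists.
 */
function dom_title(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $nodes = $doc->getElementsByTagName('title');

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Usage:
// echo htmlspecialchars((string) dom_title($htmlString));
```

The parser still recovers what it can from broken pages, so the title is usually readable even when the rest of the markup is not well formed.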


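3. Plug-ins
This section uses PHP Simple HTML DOM Parser as a brief example. The simple_html_dom syntax is similar to jQuery, so working with the DOM in PHP becomes about as simple as it is with jQuery.

<?php
// Assumes simple_html_dom.php sits next to this script; the URL is a placeholder.
include 'simple_html_dom.php';
$url = 'http://example.com/';
$html = file_get_html($url);
$title = $html->find('title');
echo $title[0]->plaintext;
?>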



