Summary of common methods for crawling web pages and parsing HTML in PHP
This article summarizes the methods commonly used in PHP to fetch web pages and parse HTML. It covers only the methods that meet these two needs, and it lists them rather than walking through full implementations.
Overview
Crawling web pages is a task that comes up often in development. PHP has many open-source crawler tools, such as Snoopy. These tools usually cover most of what we need, but in some cases we have to implement a crawler ourselves. This article summarizes the ways PHP can do that.
Main methods for implementing a crawler in PHP
1. file() function
2. file_get_contents() function
3. fopen() -> fread() -> fclose() method
4. cURL method
5. fsockopen() function (socket mode)
6. Open-source tools, such as Snoopy
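As a rough illustration of the first four methods above, here is a minimal sketch. The function names, the default timeout, and the chunk size are assumptions made for this example and are not part of PHP itself. The stream-based variants need allow_url_fopen enabled to read remote URLs, and the last one needs the cURL extension.

```php
<?php
// Hypothetical helper names, chosen only for this sketch.

/** Method 1: file() returns the resource as an array of lines. */
function fetch_lines_with_file(string $url): ?array
{
    $lines = @file($url, FILE_IGNORE_NEW_LINES);
    return $lines === false ? null : $lines;
}

/** Method 2: file_get_contents() with a stream context for the timeout. */
function fetch_with_file_get_contents(string $url, int $timeout = 10): ?string
{
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'timeout' => $timeout],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

/** Method 3: fopen() -> fread() -> fclose(), reading in chunks. */
function fetch_with_fopen(string $url): ?string
{
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        return null;
    }
    $body = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192); // 8 KB per read, an arbitrary choice
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($handle);
    return $body;
}

/** Method 4: the cURL extension, returning the body as a string. */
function fetch_with_curl(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

Each helper returns null on failure, so a caller could write something like `$html = fetch_with_curl('https://example.com/');` and check for null before parsing. cURL is usually the choice when headers, cookies, or redirects need fine control, while file_get_contents() is enough for simple GET requests.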
Main methods for parsing XML or HTML in PHP
1. Regular expressions
2. PHP DOMDocument object
3. Third-party libraries, such as PHP Simple HTML DOM Parser
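The first two parsing methods above can be sketched as follows. The sample markup and the function names are made up for this example. The DOMDocument part assumes the DOM and libxml extensions, which are enabled in most PHP builds.

```php
<?php
// Sample markup used only for illustration.
$html = '<html><head><title>Example page</title></head><body>'
      . '<a href="/a">First</a> <a href="/b">Second</a></body></html>';

/** Regular expression: workable for simple, predictable markup such as <title>. */
function title_with_regex(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

/** DOMDocument: collect the href attribute of every <a> element. */
function links_with_dom(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

echo title_with_regex($html), "\n";         // Example page
echo implode(', ', links_with_dom($html)), "\n"; // /a, /b
```

Regular expressions are quick for pulling out a single, well-delimited value, but they become fragile on nested or irregular markup. For anything structural, a DOM-based approach is generally the safer choice.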
Summary
This is a brief summary of the ways to implement a crawler in PHP. The topic involves a lot more than is covered here. I will summarize the methods for parsing HTML and XML in PHP in more detail later.