Summary of common methods for crawling web pages and parsing HTML in PHP

Last Update: 2015-07-02 Source: Internet

Author: User


This article introduces the methods commonly used in PHP to fetch web pages and parse HTML. It only lists the methods that meet these two requirements and does not describe how to implement them in detail. For more information, see

Overview

Crawling is a feature we often need when developing programs. PHP has many open-source crawler tools, such as Snoopy. These tools usually cover most of the required functionality, but in some cases we need to implement a crawler ourselves. This article summarizes the ways PHP can do that.

Main methods for implementing a crawler in PHP

1. file() function

2. file_get_contents() function

3. fopen() -> fread() -> fclose() method

4. cURL method

5. fsockopen() function, socket mode

6. Open-source tools, such as Snoopy
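
As a rough illustration of methods 2, 3, and 4 from the list above, the following is a minimal sketch rather than a complete crawler. The function names, the `ExampleCrawler/0.1` user-agent string, and the timeout values are assumptions made for this example. Fetching `http://` URLs with the first two functions assumes `allow_url_fopen` is enabled, and the third assumes the curl extension is installed.

```php
<?php
// Method 2: file_get_contents() with a stream context for timeout and headers.
function fetch_with_stream(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            'header'  => "User-Agent: ExampleCrawler/0.1\r\n", // hypothetical UA
        ],
    ]);
    // Suppress the warning and report failure as null instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Method 3: fopen() -> fread() -> fclose(), reading the response in chunks.
function fetch_with_fopen(string $url, int $chunkBytes = 8192): ?string
{
    $stream = @fopen($url, 'rb');
    if ($stream === false) {
        return null;
    }
    $body = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkBytes);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($stream);
    return $body;
}

// Method 4: cURL, which gives finer control over redirects and timeouts.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): ?string
{
    $handle = curl_init($url);
    if ($handle === false) {
        return null;
    }
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleCrawler/0.1', // hypothetical UA
    ]);
    $body = curl_exec($handle);
    return $body === false ? null : $body;
}
```

All three functions return the response body as a string, or `null` on failure, so a caller could swap one for another. The file() function (method 1) works much like file_get_contents() but returns the content as an array of lines.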

Main methods for parsing XML or HTML in PHP

1. Regular expressions

2. PHP DOMDocument object

3. Third-party libraries, such as PHP Simple HTML DOM Parser
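
To make methods 1 and 2 from the list above more concrete, here is a small sketch, not a definitive implementation. The function names `extract_links` and `extract_title` are hypothetical, and the example assumes the DOM and libxml extensions are available (they are in most PHP builds).

```php
<?php
// Method 2: DOMDocument plus XPath. Returns the href of every <a> that has one.
function extract_links(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string on some PHP versions
    }
    $document = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($document);
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Method 1: a regular expression, reasonable only for small, predictable fragments.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches) === 1) {
        return trim(html_entity_decode($matches[1]));
    }
    return null;
}
```

A regular expression is quick for pulling out a single well-known tag, but it breaks easily on nested or malformed markup, which is why a DOM-based approach is usually the safer choice for anything beyond simple patterns.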

Summary

This is a brief summary of the methods for implementing a crawler in PHP. The topics touched on here cover a lot of ground. I will summarize the methods for parsing HTML and XML in PHP in more detail later.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
