A brief introduction to PHP's techniques for crawling web elements

Last Update:2014-05-25 Source: Internet

Author: User


When crawling web page content with PHP, the hardest part is probably parsing the DOM. Here are a few recommended techniques; which one to use is a matter of personal preference.

1. PHP's built-in XPath parsing

You can look up the details of XPath syntax yourself (for example, on Baidu); I will only give a simple example here. Without further ado, the code is as follows:

<?php

error_reporting(0);
$url = 'http://www.baidu.com'; // URL of the page to crawl; an arbitrary example
$html = file_get_contents($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom);
// Call SimpleXML's xpath() method and pass in a string that follows XPath syntax.
// This expression selects every <p> element whose id attribute equals "nv".
$nav = $xml->xpath('//p[@id="nv"]');
print_r($nav);
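
The same kind of query can also be run with the built-in DOMXPath class, without converting to SimpleXML. The following is only a minimal sketch: the inline markup is a made-up sample (so the example runs without network access), and libxml_use_internal_errors() is used here as an assumed alternative to silencing all errors with error_reporting(0).

```php
<?php
// Hypothetical sample markup so the example runs without fetching a page.
$html = <<<HTML
<html><body>
  <p id="nv"><a href="/news">News</a> <a href="/maps">Maps</a></p>
  <p id="other"><a href="/about">About</a></p>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of hiding every PHP error.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// DOMXPath evaluates XPath expressions directly against the DOMDocument.
$xpath = new DOMXPath($dom);

$links = [];
foreach ($xpath->query('//p[@id="nv"]/a') as $a) {
    // Map each link's text to its href attribute.
    $links[$a->textContent] = $a->getAttribute('href');
}

print_r($links);
```

Only the links inside the paragraph with id "nv" are collected; the link in the other paragraph is ignored.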

2. phpQuery

phpQuery is a DOM parser built around jQuery-style selectors. If you use jQuery often, you will like this tool. Here is how it is used:

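<?php
include 'phpQuery.php';
// Load the remote page as the current document.
phpQuery::newDocumentFile('http://job.blueidea.com');
// Select every div inside elements with class "coms" under #hotcoms.
$companies = pq('#hotcoms .coms')->find('div');
foreach ($companies as $company)
{
    // Print the text of each company's heading link.
    echo pq($company)->find('h3 a')->text() . "<br>";
}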

A brief explanation:

pq() works like $() in jQuery.
Almost every jQuery selector also works in phpQuery; just replace '.' with '->' when chaining methods.
phpQuery provides several ways to load a document: some take a string, others take a file (including a URL). Take care to choose the one that matches your input.
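
For comparison, the selector chain used above ('#hotcoms .coms', then 'div', then 'h3 a') can be approximated with PHP's built-in DOMXPath when phpQuery is not installed. This is only a sketch under assumptions: the markup is a made-up stand-in for the crawled page, and the XPath is a hand-written translation of the CSS selectors, not something phpQuery itself produces.

```php
<?php
// Hypothetical markup standing in for the crawled page.
$html = <<<HTML
<div id="hotcoms">
  <div class="coms list">
    <div><h3><a href="/c/1">Acme</a></h3></div>
    <div><h3><a href="/c/2">Globex</a></h3></div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// CSS ".coms" means "class list contains the word coms", which XPath 1.0
// has to spell out with a space-padded contains() test.
$classTest = "contains(concat(' ', normalize-space(@class), ' '), ' coms ')";

// "#hotcoms .coms" -> find('div') -> find('h3 a'), written as descendant steps.
$query = "//*[@id='hotcoms']//*[{$classTest}]//div//h3//a";

$names = [];
foreach ($xpath->query($query) as $a) {
    $names[] = trim($a->textContent);
}

print_r($names);
```

The XPath version is noticeably more verbose, which is the main reason selector-based libraries such as phpQuery are convenient.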

3. Simple HTML DOM

Official manual: http://www.ecartchina.com/php-simple-html-dom/manual.htm

Read it for yourself; it is quick to understand. It took me less than half an hour to become comfortable using it.

By the way, there is also a PHP crawling system, PHPCrawl. If you want to learn about building a search engine crawler in PHP, you can read its source:

Source

http://sourceforge.net/projects/phpcrawl/files/PHPCrawl/
