A brief introduction to PHP techniques for scraping web page elements

Source: Internet
Author: User

When scraping web page content with PHP, the hardest part is usually the DOM parsing. Here are a few techniques worth considering; which one you use is a matter of personal preference.


1. PHP's built-in XPath parsing

You can look up the details of XPath syntax online (on Baidu, for example); I will only give a simple example here. Without further ado, the code is as follows:

<?php

error_reporting(0); // suppress warnings caused by malformed HTML
$url = 'http://www.baidu.com'; // URL of the page to crawl; an arbitrary example
$html = file_get_contents($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom);
// Call SimpleXML's xpath() method with a string that follows XPath syntax.
// This expression selects every p element whose id attribute equals "nv".
$nav = $xml->xpath('//p[@id="nv"]');
print_r($nav);
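
The same idea can be tried without a network connection. The following is a minimal sketch, assuming PHP 7.3 or later with the DOM extension enabled; the sample markup and the helper name extractLinkTexts() are made up for illustration and are not from the original example. It uses DOMXPath directly rather than converting to SimpleXML, and collects libxml parse warnings instead of silencing all errors with error_reporting(0).

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = <<<HTML
<html><body>
  <p id="nv"><a href="/news">News</a><a href="/maps">Maps</a></p>
  <p id="footer"><a href="/about">About</a></p>
</body></html>
HTML;

/**
 * Return the text of every <a> directly inside the <p> with the given id.
 * extractLinkTexts() is an illustrative helper, not part of PHP.
 */
function extractLinkTexts(string $html, string $id): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // $id is interpolated as-is, so it must not contain double quotes.
    $nodes = $xpath->query('//p[@id="' . $id . '"]/a');

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

print_r(extractLinkTexts($html, 'nv'));
```

With the sample markup above, the call returns the two link texts inside the p element with id "nv" and ignores the link in the footer paragraph.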


2. phpQuery

phpQuery is a DOM parser built around jQuery-style selectors. If you use jQuery often, you will like this tool. Here is how it is used:

<?php
include 'phpQuery.php'; // adjust the path to wherever phpQuery is installed
phpQuery::newDocumentFile('http://job.blueidea.com');
$companies = pq('#hotcoms .coms')->find('div');
foreach ($companies as $company)
{
    // Print the text of the link inside each company's h3 heading
    echo pq($company)->find('h3 a')->text() . "<br>";
}


A brief explanation:

  • pq() works like $() in jQuery
  • Almost any jQuery selector can be used in phpQuery; just replace '.' with '->' when chaining calls
  • phpQuery provides several ways to load a document: some take strings and some take files (including URLs), so take care to choose the right one

3. simplehtmldom

Official manual: http://www.ecartchina.com/php-simple-html-dom/manual.htm

Take a look for yourself; it is quick to pick up. It took me less than half an hour to become proficient with it.


By the way, there is also a PHP crawler system, PHPCrawl. If you want to learn something about building a search engine in PHP, its source code is worth reading:

Source

http://sourceforge.net/projects/phpcrawl/files/PHPCrawl/

