When crawling web pages with PHP, the hardest part is often parsing the DOM. Here are a few techniques I recommend; which one you use comes down to personal preference.
1. PHP's built-in XPath parsing
You can look up the details of XPath syntax yourself (Baidu it); I will only give a simple example. Without further ado, the code is as follows:
<?php
// Suppress the warnings that loadHTML() emits for malformed real-world HTML
error_reporting(0);
// The URL of the page to crawl; picked arbitrarily here
$url = 'http://www.baidu.com';
$html = file_get_contents($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom);
// Call SimpleXML's xpath() method with a string that follows XPath syntax.
// This expression selects every <p> element whose id attribute is "nv".
$nav = $xml->xpath('//p[@id="nv"]');
print_r($nav);
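As a variation on the example above, the same kind of query can also be run with DOMXPath directly on the DOMDocument, without converting to SimpleXML. This is only a minimal sketch, not the code I used: the inline HTML string and the $links array are made up for illustration so that it runs without network access, and libxml_use_internal_errors() is swapped in for error_reporting(0) so that only parser warnings are silenced.

```php
<?php
// Hypothetical markup standing in for a fetched page (an assumption for illustration).
$html = '<html><body>'
      . '<p id="nv"><a href="/news">News</a> <a href="/map">Map</a></p>'
      . '<p id="other">Ignored</p>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select every <a> inside the <p> whose id attribute equals "nv",
// and map each link text to its href.
$links = [];
foreach ($xpath->query('//p[@id="nv"]/a') as $a) {
    $links[$a->textContent] = $a->getAttribute('href');
}

print_r($links);
```

For a real page, replace the inline string with the result of file_get_contents($url).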
2. phpQuery
phpQuery is a DOM parser based on jQuery selectors. If you use jQuery often, you will like this tool. Here is how to use it:
<?php
include 'phpQuery.php';
// Load the page to parse directly from a URL
phpQuery::newDocumentFile('http://job.blueidea.com');
$companies = pq('#hotcoms .coms')->find('div');
foreach ($companies as $company)
{
    // Wrap each DOM element with pq() again to use selector methods on it
    echo pq($company)->find('h3 a')->text() . "<br>";
}
A brief explanation:
- pq() is the equivalent of $() in jQuery
- Almost all jQuery selectors work in phpQuery; just replace the '.' used for method chaining with '->'
- phpQuery provides several ways to load a document: some take a string, others take a file (including URLs), so pay attention to which one you choose
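The points above can be sketched as follows, this time loading the document from a string rather than a URL. This is only an illustration under some assumptions: phpQuery.php is assumed to be on the include path, and the markup and the $names array are made up for the example.

```php
<?php
// Assumes phpQuery.php is available on the include path (hypothetical location).
require 'phpQuery.php';

// Hypothetical markup standing in for a downloaded page.
$markup = '<div id="hotcoms"><div class="coms">'
        . '<div><h3><a href="/a">Company A</a></h3></div>'
        . '<div><h3><a href="/b">Company B</a></h3></div>'
        . '</div></div>';

// Load from a string; newDocumentFile() would load from a path or URL instead.
phpQuery::newDocumentHTML($markup);

// jQuery:   $('#hotcoms .coms').find('h3 a')
// phpQuery: pq('#hotcoms .coms')->find('h3 a')
$names = [];
foreach (pq('#hotcoms .coms')->find('h3 a') as $a) {
    // Iteration yields plain DOM elements, so wrap each one with pq() again.
    $names[] = pq($a)->text();
}

print_r($names);
```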
3. simplehtmldom
Official manual: http://www.ecartchina.com/php-simple-html-dom/manual.htm
Have a look for yourself; it is quick to pick up. It took me less than half an hour to get comfortable with it.
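To give a feel for it, here is a minimal sketch of typical usage. It assumes simple_html_dom.php from the Simple HTML DOM project is on the include path; the markup and the $jobs array are invented for illustration.

```php
<?php
// Assumes simple_html_dom.php is available on the include path (hypothetical location).
require 'simple_html_dom.php';

// Hypothetical markup standing in for a downloaded page.
$markup = '<ul id="jobs">'
        . '<li><a href="/job/1">PHP developer</a></li>'
        . '<li><a href="/job/2">Front-end developer</a></li>'
        . '</ul>';

// Parse from a string; file_get_html() would load from a file or URL instead.
$html = str_get_html($markup);

// find() takes a CSS-style selector and returns an array of matching elements.
$jobs = [];
foreach ($html->find('#jobs li a') as $a) {
    $jobs[$a->href] = $a->plaintext;
}

print_r($jobs);

// Free the parsed DOM once it is no longer needed.
$html->clear();
```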
By the way, there is also a PHP crawling system, PHPCrawl. If you want to learn something about building a search engine in PHP, take a look at its source code:
Source
http://sourceforge.net/projects/phpcrawl/files/PHPCrawl/