Web Scraping: A Summary of PHP Web Crawler Implementations
Source: http://www.ido321.com/1158.html
To scrape the content of a page, you need to parse the DOM tree, find the target nodes, and then extract what you need, which is a somewhat tedious process. Here I summarize several commonly used, easy-to-implement ways of scraping web pages. If you are familiar with jQuery selectors, these libraries will feel quite simple.
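The three steps just described (parse the DOM tree, find the target nodes, extract the content) can be sketched with PHP's built-in DOM extension before turning to any third-party library. This is only a minimal illustration: the inline HTML string and the "focus" class name are assumptions made for the example, not taken from a real page.

```php
<?php
// Sketch only: the HTML string and the "focus" class are illustrative assumptions.
$source = '<html><body>'
    . '<div class="focus"><h2>First post</h2></div>'
    . '<div class="sidebar"><h2>Ignored</h2></div>'
    . '<div class="focus"><h2>Second post</h2></div>'
    . '</body></html>';

// 1. Parse the markup into a DOM tree (collect, rather than print, parser warnings).
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($source);
libxml_clear_errors();

// 2. Locate the target nodes with an XPath query.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="focus"]/h2');

// 3. Pull out the content we need.
$titles = [];
foreach ($nodes as $node) {
    $titles[] = trim($node->textContent);
}

echo implode("\n", $titles), "\n";
```

The libraries below wrap these same steps behind CSS-style selectors, which is why they feel familiar to jQuery users.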
First, Ganon
Project Address: http://code.google.com/p/ganon/
Document: http://code.google.com/p/ganon/w/list
Test: Fetch my site's home page, find all div elements whose class attribute is focus, and output their class values
<?php
include 'ganon.php';
// Load the page and build the DOM
$html = file_get_dom('http://www.ido321.com/');
// Select every div whose class attribute is "focus"
foreach ($html('div[class="focus"]') as $element) {
    echo $element->class, "<br>\n";
}
?>
Second, phpQuery
Project Address: http://code.google.com/p/phpquery/
Document: https://code.google.com/p/phpquery/wiki/Manual
Test: Fetch the article elements of my site's home page, then output the HTML content of the h2 tag under each one
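<?php
include 'phpQuery/phpQuery.php';
// Load the page as the current document
phpQuery::newDocumentFile('http://www.ido321.com/');
$artlist = pq("article");
foreach ($artlist as $title) {
    // Output the HTML content of the h2 inside each article
    echo pq($title)->find('h2')->html() . "<br/>";
}
?>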
Third, Simple-html-dom
Project Address: http://simplehtmldom.sourceforge.net/
Document: http://simplehtmldom.sourceforge.net/manual.htm
Test: Crawl all links on my site's homepage
<?php
include 'simple_html_dom.php';
// The DOM can be created from either a URL or a file
$html = file_get_html('http://www.ido321.com/');
// Find all images
// foreach ($html->find('img') as $element)
//     echo $element->src . '<br>';
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
?>
Fourth, Snoopy
Project Address: http://sourceforge.net/projects/snoopy/
Document: http://snoopy.sourceforge.net/
Test: Fetch my site's home page
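<?php
include("Snoopy.class.php");
$url = "http://www.ido321.com";
$snoopy = new Snoopy;
$snoopy->fetch($url);       // Get all content
echo $snoopy->results;      // Display result
// $snoopy->fetchtext($url);  // Get text content only (HTML removed); read it from $snoopy->results
// $snoopy->fetchlinks($url); // Get links; read them from $snoopy->results
?>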
Fifth, Writing a Crawler by Hand
If your programming skills are up to it, you can write a web crawler yourself to fetch pages. There are already plenty of articles online introducing this approach, so I will not repeat them here. If you are interested, search Baidu for "PHP web crawling".
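As a rough idea of what a hand-written crawler involves, here is a minimal breadth-first sketch built on PHP's DOM extension. The function names extractLinks and crawl, the example.test URLs, and the in-memory page table are all hypothetical and exist only for illustration. The fetch step is passed in as a callable so the sketch runs offline; a real run might pass 'file_get_contents' instead. Link resolution is deliberately simplified to absolute and root-relative URLs.

```php
<?php
// Hypothetical helper: collect the href of every <a> element in an HTML string.
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Simplification: the origin is scheme plus host, ignoring any port.
    $parts = parse_url($baseUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;             // already absolute
        } elseif (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            $links[] = $origin . $href;   // root-relative path
        }
        // Other forms (relative paths, mailto:, #fragments) are skipped in this sketch.
    }
    return array_values(array_unique($links));
}

// Hypothetical helper: breadth-first crawl limited to $maxPages pages.
// $fetch is any callable that takes a URL and returns HTML, or false on failure.
function crawl(string $startUrl, callable $fetch, int $maxPages = 10): array
{
    $queue = [$startUrl];
    $visited = [];
    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $html = $fetch($url);
        $visited[$url] = true;
        if ($html === false) {
            continue;
        }
        foreach (extractLinks($html, $url) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($visited);
}

// Offline demonstration with an in-memory "site".
$pages = [
    'http://example.test/'  => '<a href="/a">A</a> <a href="http://example.test/b">B</a>',
    'http://example.test/a' => '<a href="/">home</a> <a href="/b">B</a>',
    'http://example.test/b' => '<p>no links</p>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? false;
};

$crawled = crawl('http://example.test/', $fetch);
print_r($crawled);
```

A crawler meant for real sites would also need politeness delays, robots.txt handling, and full relative-URL resolution, none of which this sketch attempts.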
PS: Resource sharing
For a list of common open-source crawler projects, see: http://blog.chinaunix.net/uid-22414998-id-3774291.html
Using a PHP web crawler to capture part of a site
OP, you can use the simple_html_dom class to do the scraping. As for how to use it, if you know jQuery, I believe you will understand it at a glance. Good luck.
Having a crawler extract keywords and a summary from web pages for search
strip_tags($string)
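To show how strip_tags might be used to build a search summary, here is a small sketch. The summarize function, its 60-byte default, and the sample HTML are assumptions made for the example. A space is inserted before each tag first, because strip_tags on its own would glue the text of adjacent elements together.

```php
<?php
// Hypothetical helper: reduce an HTML fragment to a short plain-text summary.
function summarize(string $html, int $maxLength = 60): string
{
    // Put a space before every tag so neighbouring elements do not run together.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace and trim the ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) <= $maxLength) {
        return $text;
    }
    // substr counts bytes; for multibyte text, mb_substr would be the safer choice.
    return substr($text, 0, $maxLength) . '...';
}

$html = '<h2>PHP crawlers</h2><p>Ganon &amp; phpQuery   parse <b>HTML</b> pages.</p>';
echo summarize($html), "\n";
echo summarize($html, 12), "\n";
```

Keyword extraction would need a further step, such as word-frequency counting, on top of the cleaned text.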