Source: http://www.ido321.com/1158.html
To capture content from a web page, you normally have to parse the DOM tree, locate the node you want, and then extract its content, which is a somewhat tedious process. This post summarizes several common, easy-to-use PHP frameworks for scraping web pages. If you are familiar with jQuery selectors, these frameworks will feel quite natural.
1. Ganon
Project address: http://code.google.com/p/ganon/
Document: http://code.google.com/p/ganon/w/list
Test: grab every p element on my site's homepage and print the value of its class attribute.

<?php
include 'ganon.php';
// Parse the remote page into a DOM tree (the URL is the author's site).
$html = file_get_dom('http://www.ido321.com/');
// A Ganon document object is callable with a CSS-style selector.
foreach ($html('p') as $element) {
    echo $element->class, "<br />\n";
}
?>
Result: (output screenshot omitted)
2. phpQuery
Project address: http://code.google.com/p/phpquery/
Document: https://code.google.com/p/phpquery/wiki/Manual
Test: grab the article titles (h2 elements) on my site's homepage.

<?php
include 'phpQuery/phpQuery.php';
phpQuery::newDocumentFile('http://www.ido321.com/');
// The '.article' selector is a guess at the page's markup; adjust to taste.
foreach (pq('.article') as $item) {
    echo pq($item)->find('h2')->html() . "<br />\n";
}
?>
Result: (output screenshot omitted)
3. Simple-Html-Dom
Project address: http://simplehtmldom.sourceforge.net/
Document: http://simplehtmldom.sourceforge.net/manual.htm
Test: grab all links on my site's homepage.
<?php
include 'simple_html_dom.php';
$html = file_get_html('http://www.ido321.com/');
// Find all images:
// foreach ($html->find('img') as $element)
//     echo $element->src . '<br />';
// Find all links:
foreach ($html->find('a') as $element)
    echo $element->href . '<br />';
?>
Result (partial; output screenshot omitted)
4. Snoopy
Project address: https://sourceforge.net/projects/snoopy/
Test: capture the homepage of my website
<?php
include 'Snoopy.class.php';
$snoopy = new Snoopy;
$url = 'http://www.ido321.com/';
$snoopy->fetch($url);          // fetch the whole page
echo $snoopy->results;         // print the fetched content
// $snoopy->fetchtext($url);   // fetch text only (HTML stripped)
// $snoopy->fetchlinks($url);  // fetch only the links
// $snoopy->fetchform($url);   // fetch only the form elements
?>
Result: (output screenshot omitted)
5. Writing your own crawler
If your coding skills are up to it, you can write a web crawler yourself to capture pages. There are already plenty of articles covering this, so it will not be repeated here; if you are interested, search for "php web crawler" to find them.
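The hand-written crawler mentioned above can be sketched with nothing but PHP's built-in DOM extension, no third-party framework required. The helper name `fetchLinks` is mine, not from any library; pair it with `file_get_contents($url)` to fetch a live page:

```php
<?php
// Minimal link extractor using PHP's built-in DOMDocument.
// Takes raw HTML and returns the href attribute of every <a> tag.
function fetchLinks(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings triggered by imperfect real-world markup.
    @$doc->loadHTML($html);
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Example usage against a live page:
// $html = file_get_contents('http://www.ido321.com/');
// print_r(fetchLinks($html));
?>
```

A real crawler would then queue the returned links, fetch each in turn, and deduplicate visited URLs, but the extraction step is essentially the loop above.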
PS: Resource sharing
For a roundup of common open-source crawler projects, see: http://blog.chinaunix.net/uid-22414998-id-3774291.html