MagpieRSS uses Snoopy internally, which made me curious enough to take a closer look at it. So I went to SourceForge (SF) and found the source code. It is really just a single class, but don't laugh: its functions are very powerful.
Here is the official introduction, which I have translated below (sweat... I always seem to end up playing translator lately).
Snoopy is a PHP class that simulates a web browser. It automates the tasks of retrieving web page content and submitting forms.
Here are some of its features:
1. Easily fetches the content of a web page
2. Easily fetches the text of a page (with the HTML tags stripped)
3. Easily fetches the links on a page
4. Supports proxy hosts
5. Supports basic user/password authentication
6. Supports custom user agent, referer, cookies and header content
7. Supports browser redirects, and lets you control the redirect depth
8. Expands the links found in a page into fully qualified URLs (the default)
9. Makes it easy to submit data and get the returned result
10. Supports following HTML frames (added in v0.92)
11. Supports passing cookies along on redirects
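Several of the features above are switched on simply by setting properties on the Snoopy object before fetching. Here is a minimal sketch of that; the property names are the ones I remember from the Snoopy README, while the helper function configure_snoopy and every value in it (agent string, referer, cookie, header) are made up for illustration.

```php
<?php
// Hypothetical helper: applies browser-like settings to a Snoopy object.
// All values are invented for illustration; adjust them to your own case.
function configure_snoopy($snoopy)
{
    $snoopy->agent = "Mozilla/5.0 (compatible; ExampleBot/1.0)"; // custom User-Agent
    $snoopy->referer = "http://www.example.com/";                // custom Referer
    $snoopy->cookies["session_id"] = "abc123";                   // cookie sent with the request
    $snoopy->rawheaders["Pragma"] = "no-cache";                  // extra request header
    $snoopy->maxredirs = 2;                                      // follow at most 2 redirects

    // Optional: basic authentication and a proxy (placeholder values).
    // $snoopy->user = "joe";
    // $snoopy->pass = "secret";
    // $snoopy->proxy_host = "proxy.example.com";
    // $snoopy->proxy_port = "8080";

    return $snoopy;
}

// Only try a real fetch when the Snoopy class file is actually present.
if (file_exists("Snoopy.class.php")) {
    include "Snoopy.class.php";
    $snoopy = configure_snoopy(new Snoopy);
    if ($snoopy->fetch("http://www.phpobject.net/blog")) {
        echo $snoopy->results;
    } else {
        echo "Fetch failed: " . $snoopy->error . "\n";
    }
}
?>
```

Keeping the settings in one small function is just a convenience so the same configuration can be reused for several requests.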
Here are a few simple examples. For instance, to fetch the text of my blog:
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
// Fetch the page and strip the HTML, leaving only the text
$snoopy->fetchtext("http://www.phpobject.net/blog");
echo $snoopy->results;
?>
Nice, isn't it? Next, let's try fetching the links:
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
// Fetch the page and collect its links into an array
$snoopy->fetchlinks("http://www.phpobject.net/blog");
print_r($snoopy->results);
?>
Hey, the result is good: everything comes back as the full URL we need, with none of those relative paths like /blog/read.php/85.htm.
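To make the idea of expanding a relative link into a full URL concrete, here is a small sketch of my own. It is not Snoopy's actual code, just a simplified illustration built on PHP's parse_url; the function name expand_link is hypothetical, and it deliberately ignores cases like "../", query-only links and fragments.

```php
<?php
// Simplified illustration only; this is NOT how Snoopy itself is implemented.
// Expands a link found on a page into a full URL, given the page's own URL.
function expand_link($link, $base)
{
    // Link already has a scheme (http:, mailto:, ...): leave it alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }

    $parts = parse_url($base);
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Root-relative link: attach it directly to scheme and host.
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }

    // Document-relative link: resolve against the directory of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo expand_link("/blog/read.php/85.htm", "http://www.phpobject.net/blog"), "\n";
// prints http://www.phpobject.net/blog/read.php/85.htm
?>
```

With Snoopy you get this behaviour for free, since link expansion is on by default.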
As for submitting data and the other features, I will test those later ...
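For reference, a form submission would probably look something like the sketch below. I have not run it against a real form: the URL and the field names are placeholders, and I am assuming the submit method takes the target URL and an array of form variables, as described in the Snoopy README.

```php
<?php
// Placeholder target and form fields; replace with a real form of your own.
$submit_url = "http://www.example.com/search.php";
$submit_vars = array(
    "q"    => "snoopy",
    "lang" => "php",
);

if (!file_exists("Snoopy.class.php")) {
    echo "Snoopy.class.php not found, nothing submitted\n";
} else {
    include "Snoopy.class.php";
    $snoopy = new Snoopy;
    // Send the form data to the target URL; the response body ends up in $snoopy->results
    if ($snoopy->submit($submit_url, $submit_vars)) {
        echo $snoopy->results;
    } else {
        echo "Submit failed: " . $snoopy->error . "\n";
    }
}
?>
```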