Using PHP to send HTTP requests and capture webpage data
If you have done J2EE or Android development, you have probably used Apache's HttpClient class library at some point. It provides powerful server-side HTTP request operations and is very convenient to use in development.
Recently I needed to do the same thing in PHP: send HTTP requests from the server, process the responses, and return the results to the client. Working with raw sockets is possible, but I wanted to see whether PHP has a class library like HttpClient.
A quick Google search showed that PHP really does have such a class library, and it is even called httpclient, which got me quite excited. On its official website, however, I found that it has not been updated for many years and its features seem limited. Then I found another class library, Snoopy. I had not heard of it before, but the feedback online was good, so I decided to use it. Its API is very different from Apache's HttpClient, but it is still very easy to use. It also provides many special-purpose methods, such as fetching only the forms on a page or only its links.
include 'Snoopy.class.php';
$snoopy = new Snoopy();
// Fetch the page; the response body is stored in $snoopy->results
$snoopy->fetch("http://www.baidu.com");
echo $snoopy->results;
The code above is all it takes to fetch Baidu's home page.
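As a rough sketch of one of the special-purpose methods mentioned earlier, the snippet below uses Snoopy's fetchlinks() to collect only the links on a page. The listLinks() helper is hypothetical (it is not part of Snoopy), and the location of Snoopy.class.php next to the script is an assumption.

```php
<?php
// Hypothetical helper (not part of Snoopy): returns the links found on a page.
// $client is expected to be a Snoopy instance.
function listLinks($client, $url)
{
    // fetchlinks() leaves an array of link URLs in $client->results;
    // a falsy return value signals failure
    if (!$client->fetchlinks($url)) {
        throw new RuntimeException('fetchlinks failed: ' . $client->error);
    }
    return (array) $client->results;
}

// Run the demo only when Snoopy is present (file location is an assumption).
$snoopyFile = __DIR__ . '/Snoopy.class.php';
if (is_file($snoopyFile)) {
    include_once $snoopyFile;
    foreach (listLinks(new Snoopy(), 'http://www.baidu.com') as $link) {
        echo $link, "\n";
    }
}
```

Wrapping the call in a small function keeps the error handling in one place, so the calling code only has to deal with a plain array.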
When you need to post a form, you can use the submit method to send the data.
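A minimal sketch of posting a form with submit() might look like the following. The postForm() helper, the example.com URL, and the field names are all placeholders made up for illustration; only submit(), results, and error come from Snoopy itself.

```php
<?php
// Hypothetical helper (not part of Snoopy): posts form fields and returns the response body.
// $client is expected to be a Snoopy instance.
function postForm($client, $url, array $fields)
{
    // submit() sends $fields as form variables; a falsy return value signals failure
    if (!$client->submit($url, $fields)) {
        throw new RuntimeException('submit failed: ' . $client->error);
    }
    return $client->results;
}

// Run the demo only when Snoopy is present (file location is an assumption).
$snoopyFile = __DIR__ . '/Snoopy.class.php';
if (is_file($snoopyFile)) {
    include_once $snoopyFile;
    // Placeholder URL and field names, for illustration only
    $fields = ['username' => 'joe', 'password' => 'secret'];
    echo postForm(new Snoopy(), 'http://example.com/login.php', $fields);
}
```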
Snoopy also provides functions for working with request headers, response headers, and cookies, which makes it quite powerful.
include "Snoopy.class.php";
$snoopy = new Snoopy();
// Proxy settings (placeholder values; proxy_host takes a host name, not a URL)
$snoopy->proxy_host = "www.baidu.cn";
$snoopy->proxy_port = "80";
// Request headers: user agent and referer
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.4wei.cn";
// Cookies sent with the request
$snoopy->cookies["SessionID"] = '238472834723489';
$snoopy->cookies["favoriteColor"] = "RED";
// Extra raw header
$snoopy->rawheaders["Pragma"] = "no-cache";
// Redirect and link handling
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
// HTTP basic authentication credentials
$snoopy->user = "joe";
$snoopy->pass = "bloe";
// fetchtext() fetches the page and strips the HTML, leaving only the text
if ($snoopy->fetchtext("http://www.baidu.cn")) {
    echo "<pre>" . htmlspecialchars($snoopy->results) . "</pre>\n";
} else {
    echo "error fetching document: " . $snoopy->error . "\n";
}
For more operations, see Snoopy's official documentation or read the source code directly.
Snoopy only fetches the page. It does not help much when you want to extract data from the page you fetched. For that I found another good PHP tool for parsing HTML: phpQuery. It offers almost the same operations as jQuery, plus some PHP-specific features, so anyone familiar with jQuery should find phpQuery easy to use, perhaps without even reading its documentation.
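To give an idea of the jQuery-like style, here is a small sketch that pulls link text and href values out of an HTML string with phpQuery. The extractLinks() helper and the sample markup are made up for illustration, and the path phpQuery/phpQuery.php is an assumption about where the library is installed.

```php
<?php
// Assumed location of the phpQuery library; adjust to your install.
$phpQueryFile = __DIR__ . '/phpQuery/phpQuery.php';
if (is_file($phpQueryFile)) {
    require_once $phpQueryFile;
}

// Hypothetical helper: returns the text and href of every link inside #news.
function extractLinks($html)
{
    $doc = phpQuery::newDocumentHTML($html);
    $links = [];
    // find() takes a CSS selector, much like jQuery's $('#news a')
    foreach ($doc->find('#news a') as $a) {
        // Iteration yields plain DOM nodes; pq() wraps them again
        $links[] = [
            'text' => trim(pq($a)->text()),
            'href' => pq($a)->attr('href'),
        ];
    }
    return $links;
}

// Sample markup standing in for a page fetched with Snoopy
$html = '<ul id="news"><li><a href="/a.html">First</a></li>'
      . '<li><a href="/b.html">Second</a></li></ul>';

// Run the demo only when phpQuery was actually loaded.
if (class_exists('phpQuery')) {
    foreach (extractLinks($html) as $link) {
        echo $link['text'], ' => ', $link['href'], "\n";
    }
}
```

In practice the $html string would be $snoopy->results from an earlier fetch() call.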
With Snoopy and phpQuery together it is easy to fetch webpages and parse their data, which is very useful. I only discovered these two class libraries recently, and they showed me that PHP can do many of the things Java can do.
If you are interested, you can also try using them to build a simple web crawler.
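As a starting point, a very small breadth-first crawler could be sketched like this. To keep the sketch runnable without any libraries, it deliberately swaps in PHP's built-in DOMDocument for link extraction instead of phpQuery, and it takes the page fetcher as a callable instead of calling Snoopy directly. The names extractHrefs() and crawl(), and the a.test URLs, are all made up for illustration.

```php
<?php
// Extracts absolute http(s) links from an HTML string using the built-in DOM extension.
function extractHrefs($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Relative links are skipped to keep the sketch short
        if (preg_match('#^https?://#i', $href)) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Breadth-first crawl. $fetch is any callable mapping a URL to HTML, or false on failure.
function crawl(callable $fetch, $startUrl, $maxPages = 10)
{
    $queue = [$startUrl];
    $visited = [];
    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // already crawled
        }
        $visited[$url] = true;
        $html = $fetch($url);
        if ($html === false || $html === '') {
            continue; // nothing to parse
        }
        foreach (extractHrefs($html) as $href) {
            if (!isset($visited[$href])) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited);
}

// Demo with an in-memory "site" so the sketch runs without network access.
$site = [
    'http://a.test/'  => '<a href="http://a.test/b">b</a>',
    'http://a.test/b' => '<a href="http://a.test/">home</a>',
];
$fetchFromArray = function ($url) use ($site) {
    return isset($site[$url]) ? $site[$url] : false;
};
print_r(crawl($fetchFromArray, 'http://a.test/'));
```

To crawl real pages, the fetcher could wrap Snoopy, for example a closure that calls $snoopy->fetch($url) and returns $snoopy->results on success or false otherwise. The $maxPages limit is there so an experiment cannot run away across the whole web.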