Scraping content that a page loads asynchronously through Ajax is not very different from scraping an ordinary page. Ajax simply sends an asynchronous HTTP request in the background. Once you use a tool such as Firebug to find the backend service URL being requested and the parameters passed to it, you can request that URL with the same parameters yourself.
Finding the request with Firebug's network tools (the Net panel)
If you fetch the page itself, the data you want is not in the returned content; all you get is a bunch of JS code.
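The idea above (request the backend URL with the parameters captured in Firebug) can be wrapped in a small reusable function. This is only a sketch: `build_post_body()` and `ajax_post()` are hypothetical names, and it assumes the backend accepts the parameters as a raw `key=value&key=value` POST body, which is how the example below sends them.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.

/**
 * Join captured parameters into a raw "key=value&key=value" body.
 * Values are kept exactly as copied from Firebug (no percent-encoding),
 * so "string:10000201" stays "string:10000201".
 */
function build_post_body(array $params): string
{
    $pairs = [];
    foreach ($params as $key => $value) {
        $pairs[] = $key . '=' . $value;
    }
    return implode('&', $pairs);
}

/**
 * POST the parameters to the backend URL found in Firebug and return the raw response.
 * $cookieFile is optional: pass a cookie jar written by an earlier request to reuse that session.
 * Returns the response body as a string, or false if the request failed.
 */
function ajax_post(string $url, array $params, ?string $cookieFile = null)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post_body($params));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    if ($cookieFile !== null) {
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send cookies saved earlier
    }
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

The values are joined by hand rather than with `http_build_query()`, which would percent-encode characters such as `:` and `/`. Most servers decode that without trouble, but sending exactly the bytes Firebug showed removes one variable when a request does not behave as expected.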
Code
<?php
// Step 1: load the ordinary page once so the server issues a session cookie.
$cookie_file = tempnam('./temp', 'cookie'); // temporary file used as the cookie jar
$ch = curl_init();
$url1 = "http://www.cdut.edu.cn/default.html";
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // accept and decode gzip-compressed responses
// Write the cookies received on this connection to $cookie_file when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
$content = curl_exec($ch);
curl_close($ch);

// Step 2: replay the Ajax (DWR) request found with Firebug, sending the saved cookie.
$ch3 = curl_init();
$url3 = "http://www.cdut.edu.cn/xww/dwr/call/plaincall/portalAjax.getNewsXml.dwr";
$curlPost = "callCount=1&page=/xww/type/1000020118.html&httpSessionId=12a9b726e6a2d4d3b09de7952b2f282c"
    . "&scriptSessionId=295315b4b4141b09da888d3a3adb8faa658&c0-scriptName=portalAjax&c0-methodName=getNewsXml"
    . "&c0-id=0&c0-param0=string:10000201&c0-param1=string:1000020118&c0-param2=string:news_"
    . "&c0-param3=number:5969&c0-param4=number:1&c0-param5=null:null&c0-param6=null:null&batchId=0";
curl_setopt($ch3, CURLOPT_URL, $url3);
curl_setopt($ch3, CURLOPT_POST, 1);
curl_setopt($ch3, CURLOPT_POSTFIELDS, $curlPost);
curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1); // store the response in $content1 instead of printing it
// Read the cookies saved in step 1 so the server sees the same session
curl_setopt($ch3, CURLOPT_COOKIEFILE, $cookie_file);
$content1 = curl_exec($ch3);
curl_close($ch3);
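To fetch more than one batch of results, it is easier to keep the captured parameters in an array than in one long string. A sketch, assuming (unverified) that `c0-param4` is the page index; confirm this by paging through the list on the site while watching Firebug. The session IDs are left out for brevity, and `rebuild_body()` is a hypothetical helper.

```php
<?php
// ASSUMPTION: c0-param4 is the page index. Verify in Firebug before relying on it.
// The real body also carries httpSessionId, scriptSessionId and the other c0-param* fields.
$captured = 'callCount=1&page=/xww/type/1000020118.html&c0-scriptName=portalAjax'
    . '&c0-methodName=getNewsXml&c0-id=0&c0-param3=number:5969&c0-param4=number:1';

// Split the captured string into ['callCount' => '1', 'page' => '/xww/...', ...]
parse_str($captured, $params);

// Hypothetical helper: join the array back into a raw body without percent-encoding values.
function rebuild_body(array $params): string
{
    return implode('&', array_map(
        function ($key, $value) {
            return $key . '=' . $value;
        },
        array_keys($params),
        array_values($params)
    ));
}

// Build one POST body per page; parameter order is preserved when a key is overwritten.
$bodies = [];
foreach ([1, 2, 3] as $page) {
    $params['c0-param4'] = 'number:' . $page;
    $bodies[] = rebuild_body($params);
}
```

Each body in `$bodies` could then be passed to `CURLOPT_POSTFIELDS` in the same way `$curlPost` is in the code above.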
PHP cURL: crawling Ajax asynchronous content