Why can't curl or file_get_contents fetch this page's data?
Take a Baidu Experience (Baidu Jingyan) article such as http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html.
Viewing the page source directly shows the article text.
However, neither curl nor file_get_contents returns the article content.
Why? I have already spoofed the IP address and the referer, but still can't fetch it. What does Baidu use to block scraping?
Here is the code I'm using:
function fcontents($url, $timeout = 5, $referer = "") {
    $ch = curl_init();
    $header = array(
        'User-Agent: mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36',
        'X-FORWARDED-FOR: 154.125.25.15',
        'client-IP: 154.125.25.15'
    );
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);              // spoof the client IP
    curl_setopt($ch, CURLOPT_REFERER, "http://www.baidu.com/"); // spoof the referer
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
$html = fcontents('http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html');
echo $html;
curl only fetches the page's raw HTML. Much of the dynamic content on this page is filled in afterwards, so fetching the HTML alone won't capture it.
The article text shouldn't be dynamic: it appears in the page source, and anything visible in the source should be fetchable with curl. The page is viewable without logging in, and search-engine spiders can crawl it, so why can't I fetch it with curl?
You aren't sending a cookie. Get the cookie first:
$url = "http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html";
$cookie_jar = dirname(__FILE__) . "/jy.cookie";
/* Get the cookie */
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar); // write received cookies to this file when the handle is closed
curl_exec($ch);
curl_close($ch);
Then make the request again, sending that cookie:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar); // send the cookies saved by the first request
curl_setopt($ch, CURLOPT_HEADER, 0);
$res = curl_exec($ch);
curl_close($ch);
echo $res;
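For reuse, the two requests above can be folded into a single function. This is a minimal sketch, assuming the curl extension is loaded and that the server only needs the cookies it sets on the first response (not verified against Baidu). `fetch_with_cookies` is a hypothetical name, and instead of a cookie file it uses cURL's in-memory cookie engine, enabled by passing an empty string to `CURLOPT_COOKIEFILE`, so cookies received on the first request are sent on the second request made with the same handle.

```php
<?php
// Hypothetical helper (not from the thread): the two-step cookie fetch,
// done with one cURL handle. Assumes the curl extension is loaded and that
// the server accepts the cookies it set on the first response.
function fetch_with_cookies(string $url, int $timeout = 5)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    // An empty string enables cURL's in-memory cookie engine without reading
    // a file; cookies received on this handle are sent on later requests.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    // First request: only used to collect the cookies the server sets.
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }

    // Second request on the same handle: now carries those cookies.
    $html = curl_exec($ch);
    curl_close($ch);
    return $html; // page body as a string, or false on failure
}
```

Usage: `echo fetch_with_cookies('http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html');`. Unlike the cookie-file version, nothing is written to disk, so cookies are not kept between script runs.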
Adding the cookie didn't help. I tried it locally and on three servers with different IP addresses, and still couldn't fetch the page.
I fetched the Baidu Experience page with the code above. Could you post your code (with the cookie part added)?
Thanks a lot. My mistake: I had left out part of the code.
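The thread only resolves the curl case, although the question also asked about file_get_contents. The same two-step cookie idea can be sketched with file_get_contents: make one request, read the `Set-Cookie` lines from PHP's `$http_response_header`, and send them back in a `Cookie` header through a stream context. This is a minimal sketch, assuming the page only needs the cookies set on the first response (not verified against Baidu); `cookies_from_headers` and `fetch_with_cookies_fgc` are hypothetical names, and cookie attributes such as path, domain and expiry are simply ignored.

```php
<?php
// Hypothetical helpers, for illustration only.

// Turn the "Set-Cookie:" lines of a response header array (the format of
// PHP's $http_response_header) into a value for a "Cookie:" request header.
function cookies_from_headers(array $headers): string
{
    $pairs = [];
    foreach ($headers as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; drop attributes such as path or expires.
            $value = trim(substr($line, strlen('Set-Cookie:')));
            $pairs[] = trim(explode(';', $value, 2)[0]);
        }
    }
    return implode('; ', $pairs);
}

function fetch_with_cookies_fgc(string $url, int $timeout = 5)
{
    // First request: collect cookies. file_get_contents fills the local
    // $http_response_header variable with the response header lines.
    $ctx = stream_context_create(['http' => ['timeout' => $timeout]]);
    if (file_get_contents($url, false, $ctx) === false) {
        return false;
    }
    $cookie = cookies_from_headers($http_response_header);

    // Second request: send the collected cookies back, if there were any.
    $opts = ['timeout' => $timeout];
    if ($cookie !== '') {
        $opts['header'] = "Cookie: $cookie\r\n";
    }
    return file_get_contents($url, false, stream_context_create(['http' => $opts]));
}
```

Usage: `echo fetch_with_cookies_fgc('http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html');`. Parsing the headers by hand is the price of not having cURL's cookie engine; for anything beyond a simple case, the curl approach is easier to get right.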