Why can't data be fetched using curl or file_get_contents?

Source: Internet
Author: User

Take a Baidu Experience (Baidu Jingyan) page as an example, such as http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html.
Viewing the page source directly in a browser shows the article data.
However, neither curl nor file_get_contents returns the document content.
Why? I have forged the IP address and the referer, but the page still cannot be fetched. What does Baidu use to prevent scraping?

The following code is used:
function fcontents($url, $timeout = 5, $referer = "") {
    $ch = curl_init();
    $header = array(
        'User-Agent: Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36',
        'X-FORWARDED-FOR: 154.125.25.15',
        'CLIENT-IP: 154.125.25.15'
    );
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header); // Forge the user IP
    curl_setopt($ch, CURLOPT_REFERER, "http://www.baidu.com/"); // Forge the referer
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$html = fcontents('http://response');
echo $html;
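When a fetch like this comes back empty, it can help to look at what curl actually received before guessing at anti-scraping measures. The following is only a diagnostic sketch, not the poster's code: `fetch_with_diagnostics` and `summarize_fetch` are hypothetical names, and following redirects and accepting compressed responses are assumptions about common causes, not confirmed behavior of the site.

```php
<?php
// Hypothetical diagnostic helper (not part of the original post).
function fetch_with_diagnostics(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // assumption: the server may answer with a redirect
        CURLOPT_ENCODING       => '',   // accept every encoding this curl build supports
    ]);
    $body = curl_exec($ch);

    // Collect the facts needed to tell "empty page" apart from "request failed".
    return [
        'ok'        => $body !== false,
        'body'      => $body === false ? '' : $body,
        'status'    => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => (string) curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'     => curl_error($ch),
    ];
}

// Turn the result array into a one-line summary for logging.
function summarize_fetch(array $r): string
{
    if (!$r['ok']) {
        return 'curl error: ' . $r['error'];
    }
    return sprintf(
        'HTTP %d, %d bytes, final URL %s',
        $r['status'],
        strlen($r['body']),
        $r['final_url']
    );
}
```

Calling `echo summarize_fetch(fetch_with_diagnostics($url));` shows whether the request failed outright, returned a non-200 status, or was redirected somewhere else.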


Reply to discussion (solution)

curl only fetches the HTML of the page itself. Much of the content on this page is filled in dynamically, so it cannot be obtained just by fetching the page.
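One way to check this for a given page is to test whether a phrase you can see in the browser is present in the raw HTML that curl returns. The sketch below assumes the HTML is already available as a string; `static_text` and `appears_in_static_html` are hypothetical helper names, and the tag stripping is deliberately rough.

```php
<?php
// Hypothetical helper: reduce raw HTML to the text that is present without running scripts.
function static_text(string $html): string
{
    // Drop <script> and <style> blocks: their contents are code, not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Put a space before each tag so adjacent elements do not run together.
    $html = str_replace('<', ' <', $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

// True if the phrase is part of the static markup, false if it is missing
// (for example because a script inserts it after the page loads).
function appears_in_static_html(string $html, string $phrase): bool
{
    return $phrase !== '' && strpos(static_text($html), $phrase) !== false;
}
```

If a sentence from the article is found this way in the fetched HTML, the content is static and the problem lies in the request rather than in dynamic loading.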


The article data should not be dynamic. It is visible in the page source, and anything visible in the source should be fetchable with curl. The page can also be viewed without logging in, and search engine spiders can crawl it. So why can't I fetch it with curl?

Why is there no cookie? Get the cookie first:

$url = "http://jingyan.baidu.com/article/00a07f38441c3782d028dc04.html";
$cookie_jar = dirname(__FILE__) . "/jy.cookie";

/* get cookie */
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
curl_exec($ch);
curl_close($ch);


Then, the request carries the cookie:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar);
curl_setopt($ch, CURLOPT_HEADER, 0);
$res = curl_exec($ch);
curl_close($ch);
echo $res;
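The same two-step idea can be sketched without a cookie file: read the Set-Cookie headers from the first response and send them back through CURLOPT_COOKIE on the second request. This is an alternative sketch, not the replier's method. The function names `cookies_from_headers`, `cookie_header_value`, and `fetch_with_cookies` are hypothetical, and cookie attributes such as path, domain, and expiry are ignored.

```php
<?php
// Hypothetical helper: pull "name => value" pairs out of raw response headers.
function cookies_from_headers(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\r|\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive; attributes after ";" are ignored.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $cookies[$m[1]] = trim($m[2]);
        }
    }
    return $cookies;
}

// Hypothetical helper: format cookies as "a=1; b=2", the form CURLOPT_COOKIE takes.
function cookie_header_value(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Returns the page body as a string, or false if either request fails.
function fetch_with_cookies(string $url, int $timeout = 5)
{
    // Request 1: include response headers in the output so Set-Cookie can be read.
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        return false;
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $cookies = cookies_from_headers(substr($response, 0, $headerSize));

    // Request 2: send the cookies back and return only the body.
    $options = [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => false,
        CURLOPT_TIMEOUT        => $timeout,
    ];
    if ($cookies) {
        $options[CURLOPT_COOKIE] = cookie_header_value($cookies);
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    return curl_exec($ch);
}
```

A call such as `$res = fetch_with_cookies($url);` would stand in for both snippets above. The cookie-jar version remains the simpler choice when the cookies need to persist between script runs.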


Adding the cookie does not help. I tried it on my local environment and on three servers with different IP addresses, but I still could not fetch the content.



Using the code above, I was able to fetch the Baidu Experience page. Why don't you post your code (with the cookie code added)?




Thank you very much. It was my mistake: I had left out part of the code.
