Why does PHP content collection stop returning anything?

Source: Internet
Author: User

Today I decided to collect some content with a script. At first everything worked normally, but after a while nothing could be collected anymore. I don't know where the problem is. The code is below; could someone please take a look?
function getContent($url) {
    $url = trim($url);
    $content = '';
    if (extension_loaded('curl')) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
        curl_setopt($ch, CURLOPT_HEADER, 0);         // leave response headers out of the body
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            'Accept-Language: zh-cn',
            'Connection: Keep-Alive',
            'Cache-Control: no-cache'
        ));
        $user_agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        $content = curl_exec($ch);
        curl_close($ch);
    } else {
        $content = file_get_contents($url);
    }
    // Both calls return false on failure, so cast before trimming.
    return trim((string) $content);

} // End func getContent();


Reply to discussion (solution)

What is the url you collected?

For example, this address: http://movie.douban.com/subject/10604486?from=playing

It can be collected. Doesn't it work for you?

If you can, try collecting the content at this address: http://www.tianya.cn/42564769. I only wanted to collect a little funny content, but the later pages can no longer be fetched.

Yes, it may be because I collected too quickly. I feel a little sorry for them. The pages open normally in a browser, but collection does not work. Is there any way to continue collecting?

Actually, I still feel that my program is incomplete; otherwise why would the browser work when the program does not? Please help me improve it. I really don't know what is missing, so any advice is appreciated!

There is nothing wrong with the code here. Maybe you are calling it too frequently; I don't know how you call it.

I just pass the address in and call it. It's not that frequent. And what if it is too frequent?

I call it in a loop with no pause in between, fewer than one hundred iterations. I have now added pauses to the loop, but it still does not work. What should I do?

I don't know, but it will definitely put a lot of pressure on the server.

You have obviously been blocked. Usually you can resume collecting after a couple of days.

First, work out how long it takes before you get blocked, then set a pause accordingly. Use usleep or sleep to control the pace. Or use a proxy IP address. That is how I am collecting data now, although the collection efficiency is much lower.
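
The pacing advice in this reply could be sketched roughly as follows. This is only a minimal sketch: the helper names backoffDelay and collectAll, the half-second base pause, and the eight-second cap are assumptions for illustration, not values from the thread. The fetcher is passed in as a callable so that getContent from the question can be plugged in.

```php
<?php
// Hypothetical helper: pause (in microseconds) before the next request.
// The pause doubles after each consecutive failure and is capped at $maxMicros.
function backoffDelay(int $failures, int $baseMicros = 500000, int $maxMicros = 8000000): int
{
    $delay = $baseMicros * (2 ** min($failures, 10));
    return (int) min($delay, $maxMicros);
}

// Hypothetical helper: fetch each URL in turn, pausing between requests.
// $fetch is any callable that takes a URL and returns the body ('' on failure).
function collectAll(array $urls, callable $fetch, int $baseMicros = 500000): array
{
    $results = array();
    $failures = 0;
    foreach ($urls as $url) {
        $body = (string) $fetch($url);
        // An empty body is treated as a failure and lengthens the next pause.
        $failures = ($body === '') ? $failures + 1 : 0;
        $results[$url] = $body;
        usleep(backoffDelay($failures, $baseMicros));
    }
    return $results;
}
```

A call might look like collectAll($urls, 'getContent'). If a proxy is used instead, cURL accepts one through curl_setopt($ch, CURLOPT_PROXY, ...); which proxy address to use depends on what you have access to.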

If the browser can open the page, you have not been blocked. Capture the packets and simulate the browser's request completely.

Not necessarily. A few days ago I was collecting Baidu pages: the browser could open them normally, but my program could not collect any data. The other side presumably detects whether a request is a simulated one.
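
One way to test both views is to send headers closer to what a real browser sends and to inspect the HTTP status code rather than only the body. The sketch below is an illustration under assumptions: the helper names browserHeaders and fetchWithStatus, the header values, the user agent string, and the cookie file path are all hypothetical and should be replaced with whatever your own packet capture shows.

```php
<?php
// Hypothetical helper: request headers resembling those a browser sends.
function browserHeaders(string $referer = ''): array
{
    $headers = array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: zh-CN,zh;q=0.8',
        'Cache-Control: no-cache',
    );
    if ($referer !== '') {
        $headers[] = 'Referer: ' . $referer;
    }
    return $headers;
}

// Hypothetical helper: fetch a URL and report status and error with the body.
function fetchWithStatus(string $url, string $cookieFile = '/tmp/cookies.txt'): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HTTPHEADER     => browserHeaders(),
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)', // assumed value
        CURLOPT_ENCODING       => '',          // accept any encoding cURL supports
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile, // send saved cookies on later requests
        CURLOPT_TIMEOUT        => 15,
    ));
    $body = curl_exec($ch);
    $info = array(
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
        'body'   => ($body === false) ? '' : $body,
    );
    curl_close($ch);
    return $info;
}
```

A status such as 403 or 429, or a body that is a verification page, would suggest the site is refusing the client rather than the collection code being broken.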
