PHP example: get_html, a single-page fetch function for cURL data collection

Source: Internet
Author: User
Tags: curl

This is a series. It cannot be written in a day or two, so it is being published one article at a time.

General outline:

1. cURL data collection series: the single-page fetch function get_html

2. cURL data collection series: the multi-page parallel fetch function get_htmls

3. cURL data collection series: the regular-expression processing function get_matches

4. cURL data collection series: code separation

5. cURL data collection series: the parallel logic control function web_spider


Single-page fetching is the most commonly used function in data collection. Sometimes, when the server restricts access, it is the only method available: slow, but simple to control. That makes a general-purpose cURL wrapper function worth writing.

Baidu and NetEase are familiar to most readers, so the examples below fetch the home pages of these two sites.


The simplest version:


$url = 'http://www.baidu.com';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // give up after 5 seconds
$html = curl_exec($ch);
if ($html !== false) {
    echo $html;
}
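
When curl_exec returns false, the snippet above prints nothing, which makes failures hard to diagnose. A minimal sketch of reporting the cause with curl_errno and curl_error follows. The helper name fetch_or_error and its return shape are hypothetical, not part of this series.

```php
<?php
// Hypothetical helper (not from the original article).
// Returns array($body, $error); exactly one of the two is null.
function fetch_or_error($url, $timeout = 5) {
    $ch = curl_init($url);
    if ($ch === false) {
        return array(null, 'curl_init failed');
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $html = curl_exec($ch);
    if ($html === false) {
        // curl_errno() is the numeric code, curl_error() the readable message
        $error = sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch));
        return array(null, $error);
    }
    return array($html, null);
}
```

A caller would unpack the result with list($html, $error) = fetch_or_error($url); and print $error when it is not null.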

Because this is used so often, it can be wrapped in a function that uses curl_setopt_array:

function get_html($url, $options = array()) {
    // These two options are always set, overriding anything the caller passed
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    if ($html === false) {
        return false;
    }
    return $html;
}
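
Note that get_html always overwrites CURLOPT_TIMEOUT, so a caller cannot ask for a longer timeout. One possible variation, shown as a sketch rather than as the author's design, treats the values as defaults using the array union operator. The name build_options is hypothetical.

```php
<?php
// Hypothetical helper (not from the original article).
// Caller-supplied options win over the defaults. The + operator keeps the
// left-hand value for duplicate keys and preserves the integer CURLOPT_* keys
// (array_merge would renumber them).
function build_options($options = array()) {
    $defaults = array(
        CURLOPT_TIMEOUT => 5,
    );
    $merged = $options + $defaults;
    // The body must always be returned as a string, so this one stays forced
    $merged[CURLOPT_RETURNTRANSFER] = true;
    return $merged;
}
```

Inside get_html, the two assignments at the top could then be replaced by $options = build_options($options);, assuming overridable timeouts are wanted.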

get_html is then called like this:

$url = 'http://www.baidu.com';
echo get_html($url);

Sometimes specific options are needed to get the right page. For example, try fetching the NetEase home page:

$url = 'http://www.163.com';
echo get_html($url);

You'll see a blank page, nothing at all. So write a function that uses curl_getinfo to see what happened:

function get_info($url, $options = array()) {
    $options[CURLOPT_RETURNTRANSFER] = true;
    $options[CURLOPT_TIMEOUT] = 5;
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    $info = curl_getinfo($ch); // transfer details such as url and http_code
    curl_close($ch);
    return $info;
}
$url = 'http://www.163.com';
var_dump(get_info($url));
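
The var_dump output is long, and usually only a couple of fields matter. As a sketch, the array returned by curl_getinfo can be reduced to one line built from its http_code and redirect_url keys. The helper name summarize_info is hypothetical.

```php
<?php
// Hypothetical helper (not from the original article).
// Takes the array returned by curl_getinfo() and returns a one-line summary.
function summarize_info($info) {
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    $line = 'HTTP ' . $code;
    // 3xx codes are redirects; redirect_url holds the target
    // when cURL was not told to follow it
    if ($code >= 300 && $code < 400 && !empty($info['redirect_url'])) {
        $line .= ' -> ' . $info['redirect_url'];
    }
    return $line;
}
```

It would be used as echo summarize_info(get_info($url)); in place of the var_dump call.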

You can see that http_code is 302, a redirect, so an extra option has to be passed:

$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true; // follow Location: headers
echo get_html($url, $options);
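
When following redirects, a scraper may want an explicit cap on the number of hops so that a redirect loop cannot stall it. The default limit depends on the PHP and libcurl versions in use. A small sketch using CURLOPT_MAXREDIRS follows. The helper name redirect_options and the default of 5 hops are assumptions for illustration.

```php
<?php
// Hypothetical helper (not from the original article).
// Builds options that follow redirects, but at most $max of them.
function redirect_options($max = 5) {
    return array(
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $max,
    );
}
```

The result can be passed straight to the function above: echo get_html($url, redirect_options());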

Now a page comes back, but why is it different from what we see in a desktop browser?

It seems the options are still not enough. The server apparently cannot tell what kind of device our client is, so it does not return the normal version.

Looks like we need to send a User-Agent.


$url = 'http://www.163.com';
$options[CURLOPT_FOLLOWLOCATION] = true;
$options[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
echo get_html($url, $options);

OK, now the page comes out as expected. With the $options parameter, get_html can basically be extended to cover cases like this.

Of course there is another way: if you know the exact URL of the NetEase page, you can fetch it directly:

$url = 'http://www.163.com/index.html';
echo get_html($url);

This also fetches the page normally.
