Writing a parallel page-collection function get_htmls() with PHP's cURL multi interface (PHP tutorial)

In the previous article, get_html() gave us simple single-page collection. But because each call runs on its own, the total transfer time is the sum of all the page downloads: if one page takes 1 second, 10 pages take 10 seconds. Fortunately, cURL also provides parallel (multi) processing.
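For contrast, the sequential version built on get_html() from the previous article would look roughly like this (a sketch only; get_html() and the $urls list are assumed from that article):

```php
<?php
// Sequential collection: each call blocks until that page finishes downloading,
// so the total time is the sum of all the individual download times.
$htmls = array();
foreach ($urls as $key => $url) {
    $htmls[$key] = get_html($url); // get_html() is assumed from the previous article
}
```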

To write a parallel collection function, we first need to think about what kinds of pages we will collect and what kinds of requests they need; only then can we write a reasonably general function.


Functional Requirements Analysis:

What should it return?

Naturally, an array made up of each collected page's HTML.

What parameters should it take?

When we wrote get_html(), we let an options array pass extra cURL settings, so the multi-page collection function should keep that ability.

What are the parameter types?

Whether the parameters are passed by GET or by POST, each request still targets the same kind of page or interface, regardless of whether we are fetching HTML or calling a web API. So the call looks like:

get_htmls($urls, $options);

For GET, $urls is an array of URL strings, one per page; for POST, $urls is a single URL string.

$options, in the POST case, is a two-dimensional array: the parameters for each page form one inner array.

That seems to settle it. But after searching the cURL manual I could not find an option for passing GET parameters, so GET parameters stay in the URL itself; the array types are kept as above and a $method parameter is added.


The function prototype is therefore get_htmls($urls, $options = array(), $method = 'get'). The code is as follows:
function get_htmls($urls, $options = array(), $method = 'get'){
    $mh = curl_multi_init();
    $curls = array();
    if($method == 'get'){ // GET is the most common case
        foreach($urls as $key => $url){
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch, $options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    }elseif($method == 'post'){ // POST: one options array per request
        foreach($options as $key => $option){
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch, $option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    }else{
        exit("parameter error!\n");
    }
    do{
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh); // wait for activity instead of busy-looping; commenting this out makes CPU usage climb
    }while($active);
    $htmls = array();
    foreach($curls as $key => $ch){
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
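Note that the exec/select loop above is a simplification. If CPU usage is still a concern, the pattern shown in the PHP manual's curl_multi_exec examples also handles CURLM_CALL_MULTI_PERFORM and the case where curl_multi_select() returns -1 immediately. A sketch, assuming the same $mh and $active as in the function above:

```php
// Drive the transfers, blocking in curl_multi_select() while nothing is ready.
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) == -1) {
        usleep(100); // some libcurl builds return -1 right away; back off briefly
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
```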

Common GET requests are implemented by varying URL parameters, and since our function is meant for data collection it will usually fetch a whole series of pages, so the URLs look like this:

http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8

The five URLs above are completely regular; only the pn value changes. The code is as follows:
$urls = array();
for($i = 1; $i <= 5; $i++){
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
$option[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls, $option);
foreach($htmls as $html){
    echo $html; // process the fetched HTML here
}

Simulating a common POST request:

Write a post.php file as follows:

if(isset($_POST['username']) && isset($_POST['password'])){
    echo 'username is: ' . $_POST['username'] . ' password is: ' . $_POST['password'];
}else{
    echo 'request error!';
}

Then call it like this:

$url = 'http://localhost/yourpath/post.php'; // adjust to your own path
$options = array();
for($i = 1; $i <= 5; $i++){
    $option = array(); // start fresh so settings don't leak between requests
    $option[CURLOPT_POSTFIELDS] = 'username=user' . $i . '&password=pass' . $i;
    $options[] = $option;
}
$htmls = get_htmls($url, $options, 'post');
foreach($htmls as $html){
    echo $html; // process the returned HTML here
}
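One caveat: with CURLOPT_TIMEOUT set to 5 inside get_htmls(), a slow or failed page comes back as false or an empty string rather than raising an error, so it is worth filtering the results before processing them. A minimal sketch, using a hypothetical result array in place of a real get_htmls() call:

```php
<?php
// Hypothetical $htmls as get_htmls() might return it: entry 1 timed out.
$htmls = array(0 => '<html>page 0</html>', 1 => false, 2 => '<html>page 2</html>');
$good = array();
foreach ($htmls as $key => $html) {
    if ($html === false || $html === '') {
        echo "page $key failed or timed out\n";
        continue;
    }
    $good[$key] = $html; // only successful pages reach processing
}
```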

With that, the get_htmls() function can handle the basics of parallel data collection.

That's all for today's share. Where the explanation is unclear, corrections are welcome.
