The get_html() function from the first part collects data one page at a time, so because it runs serially, the total transfer time is the sum of all the page downloads: if one page takes 1 second, then 10 pages take 10 seconds. Fortunately, curl also provides parallel-processing capability.
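The serial cost is easy to see in a toy sketch, where usleep() stands in for one page download (the 0.1 s per page here is just an assumption for illustration):

```php
<?php
// Toy illustration of the serial cost: each "download" (usleep stands in
// for one page transfer) is paid for one after another, so total time
// grows linearly with the number of pages.
function fake_download_ms(int $ms): void
{
    usleep($ms * 1000); // pretend this is one page being fetched
}

$pages = 5;
$start = microtime(true);
for ($i = 0; $i < $pages; $i++) {
    fake_download_ms(100); // ~0.1 s per "page"
}
$elapsed = microtime(true) - $start;
printf("serial: %d pages took %.2f s\n", $pages, $elapsed);
```

With real pages at 1 second each, the same loop over 10 pages would take roughly 10 seconds, which is exactly the cost that curl's parallel mode avoids.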
To write a parallel collection function, we first need to be clear about what kind of pages it will collect and what kind of requests it will make, so that we can write a reasonably general function.
Functional requirements analysis:
What should it return?
An array, of course, whose elements are the HTML of each collected page.
What parameters should it take?
When we wrote get_html(), we saw that an options array could pass extra curl settings through, so the multi-page collection function must keep that.
What form do the parameters take?
Whether passed via GET or via POST, and whether we are requesting HTML or calling a web API, the parameters always target the same page or interface. So the parameter types are:
get_htmls($url, $options);
$url is a string;
$options is a two-dimensional array: the parameters for each page form one sub-array.
That seems to settle it. But looking through the curl manual, there is no option for passing GET parameters separately (they live in the URL itself), so only the array type is passed in, and a $method parameter is added.
The prototype of the function is thus get_htmls($urls, $options = array(), $method = 'get'), and the code is as follows:
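Since curl has no dedicated option for GET parameters, they simply become part of the URL string; PHP's http_build_query() is a convenient way to assemble them (the baidu-style parameters below are just this article's running example):

```php
<?php
// GET parameters travel in the URL itself, so building the request URL
// is plain string work; http_build_query() handles the encoding.
$base   = 'http://www.baidu.com/s';
$params = array('wd' => 'shili', 'pn' => 10, 'ie' => 'utf-8');
$url    = $base . '?' . http_build_query($params);
echo $url, "\n"; // http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
```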
function get_htmls($urls, $options = array(), $method = 'get') {
    $mh = curl_multi_init();
    $curls = array();
    if ($method == 'get') { // GET is the most common case
        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch, $options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } elseif ($method == 'post') { // POST: one set of fields per request
        foreach ($options as $key => $option) {
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch, $option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } else {
        exit("parameter error!\n");
    }
    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh); // wait for activity; commenting this out makes CPU usage climb
    } while ($active);
    $htmls = array();
    foreach ($curls as $key => $ch) {
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
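One caveat: a plain do/while around curl_multi_exec() can still spin. A more defensive version, following the pattern shown in the PHP manual's curl_multi_exec() examples, drains curl_multi_exec() while it reports CURLM_CALL_MULTI_PERFORM and backs off briefly when curl_multi_select() fails. This is a sketch of a drop-in replacement for the do/while loop in get_htmls(); the name run_multi() is my own:

```php
<?php
// A more defensive event loop for curl_multi, following the pattern in
// the PHP manual's curl_multi_exec() examples: drain curl_multi_exec()
// while it reports CURLM_CALL_MULTI_PERFORM, then wait in
// curl_multi_select() for socket activity, backing off briefly when
// select reports failure (-1) so the loop does not spin the CPU.
function run_multi($mh)
{
    $active = null;
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) == -1) {
            usleep(100); // select failed; brief back-off instead of busy-waiting
        }
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
```

After run_multi($mh) returns, the results are read with curl_multi_getcontent() exactly as before.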
Common GET requests pass their parameters by varying the URL, and since our function is built for data collection, the collection is usually paged, so the URLs look like this:
http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8
The five URLs above are very regular; only the value of pn changes.
The code is as follows:
$urls = array();
for ($i = 1; $i <= 5; $i++) {
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
$option = array();
$option[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls, $option);
foreach ($htmls as $html) {
    echo $html; // the HTML is available here for data processing
}
Simulating a common POST request:
Write a post.php file as follows:
if (isset($_POST['username']) && isset($_POST['password'])) {
    echo 'The username is: ' . $_POST['username'] . ' and the password is: ' . $_POST['password'];
} else {
    echo 'request error!';
}
Then call the following:
$url = 'http://localhost/yourpath/post.php'; // this is your own path
$options = array();
for ($i = 1; $i <= 5; $i++) {
    $option = array();
    $option[CURLOPT_POSTFIELDS] = 'username=user' . $i . '&password=pass' . $i;
    $options[] = $option;
}
$htmls = get_htmls($url, $options, 'post');
foreach ($htmls as $html) {
    echo $html; // the HTML is available here for data processing
}
This get_htmls() function can already handle the basics of data collection.
That is all for today's share; wherever I have not explained things clearly, corrections are welcome.