The get_html() function written earlier achieves simple data acquisition, but because it fetches one page at a time, the total time is the sum of every page's download time: if one page takes 1 second, 10 pages take 10 seconds. Fortunately, cURL also provides parallel processing.
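For context, that single-page get_html() is not reproduced in this article. Below is only a minimal sketch of what such a helper might look like, assuming it simply wraps one curl handle; the real function may differ.
// Sketch only: assumes get_html() wraps a single curl handle and returns the page body.
function get_html($url, $options = array()) {
    $ch = curl_init($url);
    $options[CURLOPT_RETURNTRANSFER] = true; // return the body instead of printing it
    $options[CURLOPT_TIMEOUT] = 5;           // give up after 5 seconds
    curl_setopt_array($ch, $options);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}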
To write a parallel collection function, we first need to understand what pages will be collected and what kind of requests they use, so that the function can be reasonably general.
Functional Requirements Analysis:
What should it return?
Obviously, an array combining the HTML of every page.
What parameters should it take?
When writing get_html(), we saw that an options array can pass extra curl parameters, so a function that fetches many pages at once must keep this feature.
What types are the parameters?
Whether requesting web pages or calling a web API, GET and POST requests usually target the same page or interface with different parameters each time. So the parameter types are:
get_htmls($url, $options);
$url is a string.
$options is a two-dimensional array: each page's parameters form one array.
That seems to solve the problem. However, the curl manual has no option for passing GET parameters separately, so for GET requests the URLs themselves are passed as an array, and a $method parameter is added.
The function prototype becomes get_htmls($urls, $options = array(), $method = 'get'), and the code is as follows:
function get_htmls($urls, $options = array(), $method = 'get') {
    $mh = curl_multi_init();
    $curls = array();
    if ($method == 'get') { // GET is the most common way to pass values
        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch, $options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } elseif ($method == 'post') { // POST requests
        foreach ($options as $key => $option) {
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch, $option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } else {
        exit("parameter error!\n");
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh); // reduces CPU load; without this call CPU usage climbs
    } while ($active);
    $htmls = array();
    foreach ($curls as $key => $ch) {
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
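A note on the do/while loop above: it works, but a commonly used variant from the PHP manual checks the return code of curl_multi_exec() and only waits in curl_multi_select() when nothing is ready yet. This is only a sketch of a drop-in replacement for that loop; $mh and $active are the same variables as in get_htmls() above.
// Sketch of a more defensive drive loop for the multi handle.
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh, 1.0) == -1) {
        usleep(100); // select failed; back off briefly to avoid a busy loop
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}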
Common GET requests are implemented simply by varying the URL parameters. Since our function is aimed at data acquisition, pages are usually collected by category or by page number, so the URLs look like this:
http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8
http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8
These five page URLs are completely regular; only the value of pn changes.
The code is as follows:
$urls = array();
for ($i = 1; $i <= 5; $i++) {
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
$options = array();
$options[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls, $options);
foreach ($htmls as $html) {
    echo $html; // the HTML can be processed here
}
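As one illustration of "the HTML can be processed here", the loop could pull the <title> out of each result. This is only an example of my own, not part of the original code; the pattern and output format are arbitrary choices.
foreach ($htmls as $key => $html) {
    // quick sanity check: print each page's <title>
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
        echo 'Page ' . $key . ' title: ' . trim($m[1]) . "\n";
    } else {
        echo 'Page ' . $key . ': no <title> found' . "\n";
    }
}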
Simulate a common POST request:
Write a post.php file as follows:
<?php
if (isset($_POST['username']) && isset($_POST['password'])) {
    echo 'The username is: ' . $_POST['username'] . ', the password is: ' . $_POST['password'];
} else {
    echo 'Request error!';
}
Then call it like this:
$url = 'http://localhost/yourpath/post.php'; // adjust to your own path
$options = array();
for ($i = 1; $i <= 5; $i++) {
    $option = array();
    $option[CURLOPT_POSTFIELDS] = 'username=user' . $i . '&password=pass' . $i;
    $options[] = $option;
}
$htmls = get_htmls($url, $options, 'post');
foreach ($htmls as $html) {
    echo $html; // the HTML can be processed here
}
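If post.php behaves as written above, each element of $htmls should contain a response roughly like the following (actual output depends on your own script):
The username is: user1, the password is: pass1
...
The username is: user5, the password is: pass5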
This get_htmls() function can already handle basic data-collection tasks.
That is all I have to share today. It may be poorly written or unclear in places; please feel free to point things out.