PHP example: get_htmls, a function for fetching multiple pages in parallel with cURL

Source: Internet
Author: User
Tags: curl, php, file

The get_html() function from the first article implements simple data collection, but it fetches pages one at a time, so the total transfer time is the sum of the download times of all pages: if one page takes 1 second, 10 pages take 10 seconds. Fortunately, cURL also supports parallel processing.

To write a parallel collection function that is reasonably general, we first need to know which pages are to be collected and what kind of request is used to fetch them.


Functional requirements analysis:

What does it return?

An array containing the HTML of each page.

What parameters are passed?

When writing get_html(), we saw that an options array can be used to pass additional cURL parameters. A function that fetches several pages at once must keep this feature.

What are the parameter types?

Whether we are requesting a web page or calling a web API, a batch of GET or POST requests usually targets the same page or interface and differs only in its parameters. So the signature would be:

get_htmls($url, $options);

$url is a string.

$options is a two-dimensional array in which each element is the array of options for one request.

That seems to solve the problem. However, I looked through the cURL manual and could not find an option for passing GET parameters, so for GET requests the URLs have to be passed in as an array instead, and a $method parameter is added to choose between the two modes.


The prototype of the function becomes get_htmls($urls, $options = array(), $method = 'get'), and the code is as follows:


function get_htmls($urls, $options = array(), $method = 'get') {
    $mh = curl_multi_init();
    $curls = array();
    $htmls = array();
    if ($method == 'get') { // GET is the most common way to pass values: $urls is an array, one handle per URL
        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            $options[CURLOPT_RETURNTRANSFER] = true;
            $options[CURLOPT_TIMEOUT] = 5;
            curl_setopt_array($ch, $options);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } elseif ($method == 'post') { // POST: $urls is a single URL string, one handle per option set
        foreach ($options as $key => $option) {
            $ch = curl_init($urls);
            $option[CURLOPT_RETURNTRANSFER] = true;
            $option[CURLOPT_TIMEOUT] = 5;
            $option[CURLOPT_POST] = true;
            curl_setopt_array($ch, $option);
            $curls[$key] = $ch;
            curl_multi_add_handle($mh, $curls[$key]);
        }
    } else {
        exit("parameter error!\n");
    }
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh); // wait for activity to reduce CPU load; without this call the loop busy-waits
    } while ($active);
    // Collect each response under the same key as its request, then release the handle.
    foreach ($curls as $key => $ch) {
        $html = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $htmls[$key] = $html;
    }
    curl_multi_close($mh);
    return $htmls;
}
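The function above returns whatever each handle produced and does not distinguish a failed transfer from an empty page. As a rough sketch only, the polling loop could be combined with curl_multi_info_read() to record a per-request result code. The helper name run_handles and the shape of its return value are assumptions made for illustration and are not part of the original article:

```php
<?php
// Hypothetical helper (not from the original article): runs already-configured
// cURL handles in parallel and reports an error message per handle.
function run_handles(array $curls) {
    $mh = curl_multi_init();
    foreach ($curls as $ch) {
        curl_multi_add_handle($mh, $ch);
    }

    do {
        $status = curl_multi_exec($mh, $active);
        if ($status !== CURLM_OK) {
            break; // the multi handle itself reported a problem; stop polling
        }
        // Wait up to 1 second for activity. A return value of -1 means the
        // wait failed, so sleep briefly instead of spinning.
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);
        }
    } while ($active);

    // Read the result code of every finished transfer.
    $codes = array();
    while (($info = curl_multi_info_read($mh)) !== false) {
        foreach ($curls as $key => $ch) {
            if ($ch === $info['handle']) {
                $codes[$key] = $info['result'];
            }
        }
    }

    $results = array();
    foreach ($curls as $key => $ch) {
        $ok = isset($codes[$key]) && $codes[$key] === CURLE_OK;
        $results[$key] = array(
            'html'  => $ok ? curl_multi_getcontent($ch) : null,
            'error' => $ok ? null : (isset($codes[$key]) ? curl_error($ch) : 'transfer did not complete'),
        );
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

The handles passed in are expected to have CURLOPT_RETURNTRANSFER set, as get_htmls does; the caller can then skip or retry any entry whose 'error' field is not null.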

Most GET requests vary only in their URL parameters, and since this function is meant for data collection, pages are normally collected by category or by page number. The URLs therefore look like this:

http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=10&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=20&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=30&ie=utf-8

http://www.baidu.com/s?wd=shili&pn=40&ie=utf-8

The five URLs above follow a regular pattern: only the value of pn changes, so they can be generated in a loop:

$urls = array();
for ($i = 1; $i <= 5; $i++) {
    $urls[] = 'http://www.baidu.com/s?wd=shili&pn=' . (($i - 1) * 10) . '&ie=utf-8';
}
$option = array();
$option[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0';
$htmls = get_htmls($urls, $option);
foreach ($htmls as $html) {
    echo $html; // The HTML can be processed here.
}
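The loop above builds each URL by string concatenation, which works for these simple values. When parameter values may contain spaces or other special characters, PHP's http_build_query() can produce the query string instead. The following is only a sketch; the helper name build_urls is hypothetical and not part of the original article:

```php
<?php
// Hypothetical helper (not part of the original article): builds one URL per
// parameter set by appending a URL-encoded query string to a base URL.
function build_urls($base, array $paramSets) {
    $urls = array();
    foreach ($paramSets as $key => $params) {
        // http_build_query() URL-encodes keys and values; '&' is passed
        // explicitly so the result does not depend on arg_separator.output.
        $urls[$key] = $base . '?' . http_build_query($params, '', '&');
    }
    return $urls;
}

$paramSets = array();
for ($i = 1; $i <= 5; $i++) {
    $paramSets[] = array('wd' => 'shili', 'pn' => ($i - 1) * 10, 'ie' => 'utf-8');
}
$urls = build_urls('http://www.baidu.com/s', $paramSets);
echo $urls[0], "\n"; // http://www.baidu.com/s?wd=shili&pn=0&ie=utf-8
```

The resulting $urls array can be passed to get_htmls() exactly like the hand-built one.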

Simulate a common POST request:

Write a post.php file as follows:


if (isset($_POST['username']) && isset($_POST['password'])) {
    echo 'username is: ' . $_POST['username'] . ' The password is: ' . $_POST['password'];
} else {
    echo 'request Error!';
}

Then call it as follows:

$url = 'http://localhost/yourpath/post.php'; // replace with your own path
$options = array();
for ($i = 1; $i <= 5; $i++) {
    $option = array();
    $option[CURLOPT_POSTFIELDS] = 'username=user' . $i . '&password=pass' . $i;
    $options[] = $option;
}
$htmls = get_htmls($url, $options, 'post');
foreach ($htmls as $html) {
    echo $html; // The HTML can be processed here.
}
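In the call above the POST body is assembled by hand, which is fine for values like user1 and pass1 but would break on characters such as & or =. A possible variation, shown purely as a sketch, encodes each field set with http_build_query(); the helper name build_post_options is hypothetical and not part of the original article:

```php
<?php
// Hypothetical helper (not part of the original article): turns a list of
// form-field arrays into the two-dimensional $options array that
// get_htmls() expects in 'post' mode.
function build_post_options(array $fieldSets) {
    $options = array();
    foreach ($fieldSets as $key => $fields) {
        // A URL-encoded string is sent as application/x-www-form-urlencoded;
        // passing the array itself would make cURL send multipart/form-data.
        $options[$key] = array(
            CURLOPT_POSTFIELDS => http_build_query($fields, '', '&'),
        );
    }
    return $options;
}

$options = build_post_options(array(
    array('username' => 'user 1', 'password' => 'p&ss=1'),
));
echo $options[0][CURLOPT_POSTFIELDS], "\n"; // username=user+1&password=p%26ss%3D1
```

The returned array can be given to get_htmls($url, $options, 'post') in place of the hand-built $options.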

This get_htmls function is enough to handle basic data collection tasks.

That is all for today. If anything here is poorly written or unclear, corrections are welcome.
