PHP curl vs. rolling curl: a comparison of two ways to make concurrent requests

Source: Internet
Author: User

In real projects, or when building small tools of your own (news aggregation, product price monitoring, price comparison and so on), you usually need to fetch data from third-party websites or API endpoints. When a queue of URLs has to be processed, performance can be improved by using cURL's curl_multi_* family of functions to implement simple concurrency.
This article explores two concrete implementations and gives a simple performance comparison between them.
1. The classic curl concurrency mechanism and its problems
The classic curl implementation is easy to find online; the following one, for example, is based on the example in the PHP online manual:


function classic_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // Create a cURL handle
        $ch = curl_init();

        // Set the URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // Add the handle to the multi queue
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // Execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active > 0 && $mrc == CURLM_OK) {
        // Wait up to 0.5 s for activity. Some libcurl versions make
        // curl_multi_select() return -1 immediately, so back off briefly
        // and keep calling curl_multi_exec() instead of looping forever.
        if (curl_multi_select($queue, 0.5) == -1) {
            usleep(100);
        }
        do {
            $mrc = curl_multi_exec($queue, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    // Only now, after every transfer has finished, are the responses processed
    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);
    return $responses;
}


All URLs are first pushed into the concurrent queue, the transfers are then run concurrently, and data parsing starts only after every request has returned. In practice, because of network conditions, some URLs return their content earlier than others, but the classic approach must wait for the slowest URL before it starts processing anything, and that waiting leaves the CPU idle. If the URL queue is short, this idle time is acceptable; if the queue is very long, the waiting and waste become unacceptable.
2. The improved rolling curl concurrency method
On closer analysis, the classic approach clearly leaves room for optimization: as soon as any URL's request completes, process it right away and keep waiting for the remaining URLs while doing so, instead of waiting for the slowest one before starting. This avoids leaving the CPU idle. Without further ado, here is the implementation:

function rolling_curl($urls, $delay)
{
    $queue = curl_multi_init();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);
        // Remember which URL this handle belongs to. Keying a map with
        // (string) $ch only works while handles are resources (PHP 5/7);
        // in PHP 8 they are CurlHandle objects and cannot be cast to string.
        curl_setopt($ch, CURLOPT_PRIVATE, $url);

        curl_multi_add_handle($queue, $ch);
    }

    $responses = array();

    do {
        // Let curl do as much work as it can right now
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);

        if ($code != CURLM_OK) {
            break;
        }

        // A request has just completed; find out which one
        while ($done = curl_multi_info_read($queue)) {
            $ch = $done['handle'];

            // Get the info and content returned by the request
            $info = curl_getinfo($ch);
            $error = curl_error($ch);
            $results = callback(curl_multi_getcontent($ch), $delay);
            $responses[curl_getinfo($ch, CURLINFO_PRIVATE)] = compact('info', 'error', 'results');

            // Remove the curl handle that just completed
            curl_multi_remove_handle($queue, $ch);
            curl_close($ch);
        }

        // Block until there is data to read or write; errors are handled by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }
    } while ($active);

    curl_multi_close($queue);
    return $responses;
}
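The gain from processing responses as they arrive can be estimated with a simple idealized model. The sketch below is an illustration under stated assumptions, not a measurement: each URL takes a fixed network latency, processing one response costs a fixed $work seconds, responses are processed one at a time in a single PHP process, and the remaining transfers are assumed to keep progressing while a response is being processed (in reality libcurl only moves data inside curl_multi_exec, although the kernel keeps buffering incoming bytes). The function names classic_time and rolling_time and the latency values are hypothetical.

```php
<?php
// Idealized timing model (assumption, not measured): latencies in seconds,
// $work = seconds needed to process one response, processing is serial.

// Classic: wait for the slowest transfer, then process every response.
function classic_time(array $latencies, float $work): float
{
    return max($latencies) + count($latencies) * $work;
}

// Rolling: process each response as soon as it arrives, in completion order,
// assuming the other transfers keep progressing in the meantime.
function rolling_time(array $latencies, float $work): float
{
    sort($latencies); // completion order (the array is a local copy)
    $t = 0.0;
    foreach ($latencies as $arrival) {
        // Start when the response arrives or when the previous one is done
        $t = max($t, $arrival) + $work;
    }
    return $t;
}

$latencies = [0.2, 0.4, 0.6, 1.0]; // hypothetical latencies in seconds
printf("classic: %.1f s, rolling: %.1f s\n",
    classic_time($latencies, 0.1), rolling_time($latencies, 0.1));
// prints "classic: 1.4 s, rolling: 1.1 s"
```

With $work set to 0 both models return max($latencies), so under this model rolling curl only pays off when the per-response processing is non-trivial.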


3. Performance comparison of the two concurrent implementations
The versions before and after the improvement were compared on a Linux host, using the following concurrent URL queue:

http://a.com/item.htm?id=14392877692
http://a.com/item.htm?id=16231676302
http://a.com/item.htm?id=5522416710
http://a.com/item.htm?id=16551116403
A brief note on the experimental design and the result format: to make the results reliable, each group of experiments was repeated 20 times. In a single run, given the same set of URLs, the time taken (in seconds) by each of the two mechanisms, classic (the classic concurrency mechanism) and rolling (the improved mechanism), was measured. The shorter time wins (Winner), and the time saved (Excellence, in seconds) and the relative improvement (Excel.%) were calculated. To stay close to real requests while keeping the experiment simple, the returned results were only matched against a simple regular expression, with no other complex processing. In addition, to determine how the result-processing callback affects the comparison, usleep (whose $delay argument is in microseconds) can be used to simulate real-world data processing logic such as extraction, word segmentation, or writing to a file or database.
The callback function used in the performance test is:
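The result fields described above can be reproduced with a small harness. This is a hypothetical sketch, not the author's test code: the helper names summarize and compare_runs are invented here, ties are counted as a win for classic, and Excel.% is assumed to be the time saved relative to the slower run, since the article does not state the base of the percentage.

```php
<?php
// Hypothetical helpers (not from the original article).

// Build the Winner / Excellence / Excel.% fields for one experiment.
// Assumption: Excel.% is the time saved relative to the slower run.
function summarize(float $classic, float $rolling): array
{
    $winner = $rolling < $classic ? 'rolling' : 'classic'; // ties go to classic
    $excellence = abs($classic - $rolling);                // seconds saved
    $slower = max($classic, $rolling);
    $percent = $slower > 0 ? 100 * $excellence / $slower : 0.0;

    return [
        'classic' => $classic,
        'rolling' => $rolling,
        'winner' => $winner,
        'excellence' => $excellence,
        'percent' => $percent,
    ];
}

// Time both implementations on the same URL list and summarize the run.
function compare_runs(callable $classic, callable $rolling, array $urls, int $delay): array
{
    $start = microtime(true);
    $classic($urls, $delay);
    $classicTime = microtime(true) - $start;

    $start = microtime(true);
    $rolling($urls, $delay);
    $rollingTime = microtime(true) - $start;

    return summarize($classicTime, $rollingTime);
}
```

With the functions from this article, one run would be compare_runs('classic_curl', 'rolling_curl', $urls, $delay), repeated 20 times per group.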


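function callback($data, $delay)
{
    // NOTE: the regular expression used in the original test was lost in
    // extraction; this <title> pattern is only a placeholder.
    preg_match_all('/<title>(.*?)<\/title>/is', $data, $matches);

    // Simulate further processing (extraction, word segmentation,
    // writing to a file or database, ...); $delay is in microseconds
    usleep($delay);

    return compact('data', 'matches');
}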


When the data-processing callback has no delay, rolling curl is slightly faster, but the improvement is not pronounced.
