PHP cURL and Rolling cURL concurrency method comparison

Last Update:2017-05-13 Source: Internet

Author: User

This article compares the classic PHP cURL and Rolling cURL concurrency methods. In real projects or when writing your own tools (such as news aggregation, product price monitoring, and price comparison), you usually need to fetch data from third-party websites or API interfaces. When you need to process a queue of URLs, you can improve performance by using the curl_multi_* family of functions provided by cURL to implement simple concurrency.
This article will discuss two specific implementation methods and compare the performance of different methods.
1. The classic cURL concurrency mechanism and its problems
The classic cURL concurrency implementation is easy to find online. For example, the following implementation is based on the PHP online manual:

The code is as follows:

function classic_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // Create cURL resources
        $ch = curl_init();

        // Set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // Add handle
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // Execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    // Keep driving the transfers until none are still active
    while ($active > 0 && $mrc == CURLM_OK) {
        if (curl_multi_select($queue, 0.5) != -1) {
            do {
                $mrc = curl_multi_exec($queue, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }

    // Only now, after every request has finished, process the responses
    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);

    return $responses;
}

First, all the URLs are pushed into the concurrent queue, and then the concurrent requests are executed. Only after all the responses have been received is the data parsed and processed. In practice, because of network transmission, some URLs return their content earlier than others, but the classic cURL concurrency must wait for the slowest URL to return before processing starts. Waiting means that the CPU sits idle. If the URL queue is short, this idle time is still acceptable, but if the queue is long, the waiting becomes unacceptable.
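The cost of waiting can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions that are not in the original article: all requests start at the same moment, each response arrives after a fixed latency, and the callback takes the same fixed time per response on a single thread. The function names classic_total_time and rolling_total_time are hypothetical helpers, not part of cURL.

```php
<?php
// Simplified model (assumption): every request starts at t = 0 and
// response i arrives at $latencies[i] seconds; the callback needs
// $delay seconds per response and runs on a single thread.

// Classic approach: nothing is processed until the slowest response is in.
function classic_total_time(array $latencies, float $delay): float
{
    return max($latencies) + count($latencies) * $delay;
}

// Processing each response as soon as it arrives: callback work
// overlaps with the time spent waiting for slower responses.
function rolling_total_time(array $latencies, float $delay): float
{
    sort($latencies);              // handle responses in arrival order
    $clock = 0.0;
    foreach ($latencies as $arrival) {
        // Start when the response is available and the CPU is free.
        $clock = max($clock, $arrival) + $delay;
    }
    return $clock;
}

// Illustrative latencies in seconds (made-up values, not measurements).
$latencies = [0.2, 0.4, 0.6, 0.8];
foreach ([0.0, 0.1] as $delay) {
    printf(
        "delay=%.1f classic=%.2f rolling=%.2f\n",
        $delay,
        classic_total_time($latencies, $delay),
        rolling_total_time($latencies, $delay)
    );
}
```

In this model the two totals are identical when the callback costs nothing, and the gap grows with the per-response processing time and with the spread of the latencies.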
2. The improved Rolling cURL concurrency
A closer look shows that the classic cURL concurrency still leaves room for optimization. The improvement is to process each URL's response as soon as its request completes, while the remaining URLs are still in flight, instead of waiting for the slowest interface to return before any processing starts. This avoids leaving the CPU idle. The implementation is shown below:

The code is as follows:

function rolling_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        curl_multi_add_handle($queue, $ch);

        // Map the handle back to its URL. The (string) cast works while
        // handles are resources (PHP 7 and earlier); on PHP 8+, where
        // handles are CurlHandle objects, use spl_object_id($ch) instead.
        $map[(string) $ch] = $url;
    }

    $responses = array();

    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);

        if ($code != CURLM_OK) { break; }

        // A request was just completed -- find out which one
        while ($done = curl_multi_info_read($queue)) {

            // Get the info and content returned on the request
            $info = curl_getinfo($done['handle']);
            $error = curl_error($done['handle']);
            $results = callback(curl_multi_getcontent($done['handle']), $delay);
            $responses[$map[(string) $done['handle']]] = compact('info', 'error', 'results');

            // Remove the curl handle that just completed
            curl_multi_remove_handle($queue, $done['handle']);
            curl_close($done['handle']);
        }

        // Block for data in/output; error handling is done by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }

    } while ($active);

    curl_multi_close($queue);

    return $responses;
}

3. Performance comparison of two concurrent implementations
The performance comparison before and after the improvement was run on a Linux host. The concurrent queue used during the test is as follows:

http://a.com/item.htm?id=14392877692
http://a.com/item.htm?id=16231676302
http://a.com/item.htm?id=5522416710
http://a.com/item.htm?id=16551116403
A brief note on the experiment design and the format of the performance results: to make the results reliable, each group of experiments is repeated 20 times. In a single experiment, both mechanisms are given the same set of interface URLs, and the time consumption (in seconds) is measured for Classic (the classic concurrency mechanism) and Rolling (the improved concurrency mechanism). From these, the time saved (Excel., in seconds) and the performance improvement ratio (Excel. %) are calculated. To stay close to real requests while keeping the experiment simple, the returned results are processed with only a simple regular expression match and no other complicated operations. In addition, to determine how the result-processing callback affects the comparison, usleep is used to simulate the heavier data processing logic found in practice (such as extraction, word segmentation, or writing to files or databases).
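The measurement procedure described above could be sketched roughly as follows. This is a minimal illustration, not the harness used for the article's tests: the helper names measure_runs and summarize are hypothetical, the improvement ratio is assumed to be the time saved divided by the Classic time, and the usleep closures are stand-in workloads so that the sketch runs without network access.

```php
<?php
// Hypothetical helper: run $fn $repeats times and return the
// wall-clock duration of each run in seconds.
function measure_runs(callable $fn, int $repeats): array
{
    $durations = [];
    for ($i = 0; $i < $repeats; $i++) {
        $start = microtime(true);
        $fn();
        $durations[] = microtime(true) - $start;
    }
    return $durations;
}

// Hypothetical helper: average both series, then derive the time saved
// and the improvement ratio (assumed here to be relative to Classic).
function summarize(array $classic, array $rolling): array
{
    $classicAvg = array_sum($classic) / count($classic);
    $rollingAvg = array_sum($rolling) / count($rolling);
    $saved = $classicAvg - $rollingAvg;
    $percent = $classicAvg > 0 ? $saved / $classicAvg * 100 : 0.0;

    return [
        'classic' => $classicAvg,
        'rolling' => $rollingAvg,
        'saved'   => $saved,
        'percent' => $percent,
    ];
}

// Stand-in workloads (assumption): replace the closures with calls such as
// classic_curl($urls, $delay) and rolling_curl($urls, $delay).
$classic = measure_runs(function () { usleep(2000); }, 3);
$rolling = measure_runs(function () { usleep(1000); }, 3);
print_r(summarize($classic, $rolling));
```

Averaging over repeated runs smooths out network jitter, which is why the article repeats each group 20 times rather than relying on a single measurement.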
The callback function used in the performance tests is:

The code is as follows:

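function callback($data, $delay)
{
    // Extract the text of every <h3> heading (ungreedy, case-insensitive)
    preg_match_all('/<h3>(.+)<\/h3>/iU', $data, $matches);

    // Simulate additional processing time, in microseconds
    usleep($delay);

    return compact('data', 'matches');
}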
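As a quick illustration of what such a callback returns, the sketch below runs a similar function on a small HTML string. It assumes the pattern is meant to capture the text of h3 headings; the function name callback_sketch and the sample HTML are made up for this example.

```php
<?php
// Sketch of a result-processing callback (assumption: the pattern
// targets <h3> headings; $delay is in microseconds, as for usleep).
function callback_sketch(string $data, int $delay): array
{
    // U makes the quantifier ungreedy, i makes the match case-insensitive.
    preg_match_all('/<h3>(.+)<\/h3>/iU', $data, $matches);
    usleep($delay);

    return compact('data', 'matches');
}

// Made-up sample input; note the upper-case tag in the second heading.
$html = '<h3>First item</h3><p>text</p><H3>Second item</H3>';
$result = callback_sketch($html, 0);

// $result['matches'][0] holds the full matches,
// $result['matches'][1] holds the captured heading texts.
print_r($result['matches'][1]);
```

Passing a delay of 0 skips the simulated processing cost, while a larger value stands in for heavier work on each response.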

When the data processing callback has no delay, Rolling cURL is slightly better, but the performance improvement is not significant.
