PHP cURL and Rolling cURL: a comparison of concurrency methods

In real projects or home-made tools (such as news aggregation, product price monitoring, and price comparison), data usually has to be fetched from third-party websites or APIs. When a whole queue of URLs has to be processed, performance can be improved by using the curl_multi_* family of functions provided by cURL to implement simple concurrency.
This article will discuss two specific implementation methods and compare the performance of different methods.
1. Typical cURL concurrency mechanism and its problems
The classic cURL concurrency implementation is easy to find online. For example, the PHP online manual suggests an implementation along the following lines:

The code is as follows:


function classic_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // Create cURL resources
        $ch = curl_init();

        // Set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // Add handle
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // Execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active > 0 && $mrc == CURLM_OK) {
        // curl_multi_select() returns -1 on a select failure; pause briefly
        // in that case rather than skipping curl_multi_exec(), which could
        // otherwise leave this loop spinning.
        if (curl_multi_select($queue, 0.5) == -1) {
            usleep(100);
        }
        do {
            $mrc = curl_multi_exec($queue, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    // Process the responses only after every transfer has finished
    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);
    return $responses;
}


First, all URLs are pushed into the concurrent queue and the concurrent transfers are executed; only after all responses have been received is the data parsed and processed. In practice, because of network transmission, some URLs return sooner than others, but the classic cURL concurrency must wait for the slowest URL to return before any processing begins. While it waits, the CPU sits idle. If the URL queue is short, this waste is still acceptable, but with a long queue it becomes unacceptable.
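The cost of waiting for the slowest URL can be illustrated with a small timing model. The sketch below is not taken from the original article: the function names, the latencies and the per-response processing cost are hypothetical, and the model assumes that all requests start at the same moment and that transfers keep progressing while a response is being processed.

```php
<?php
// Idealized timing model, in milliseconds. All names and numbers here
// are illustrative assumptions, not measurements.

/** Classic: nothing is processed until the slowest response has arrived. */
function classic_total(array $latencies, int $processing): int
{
    return max($latencies) + count($latencies) * $processing;
}

/** Rolling: each response is processed as soon as it is available. */
function rolling_total(array $latencies, int $processing): int
{
    sort($latencies);
    $clock = 0;
    foreach ($latencies as $arrival) {
        // Wait for the response if it has not arrived yet, then process it.
        $clock = max($clock, $arrival) + $processing;
    }
    return $clock;
}

$latencies = array(100, 200, 300, 900); // hypothetical response times
$processing = 50;                       // hypothetical callback cost

echo classic_total($latencies, $processing), "\n"; // 900 + 4 * 50 = 1100
echo rolling_total($latencies, $processing), "\n"; // 950
```

In this model the three fast responses are processed while the slow one is still in flight, so only the last callback adds to the total. The longer the queue and the more expensive the callback, the larger the gap becomes.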
2. Improved Rolling cURL concurrency
On closer analysis, the classic cURL concurrency still leaves room for optimization. The improvement is to process each URL's response as soon as that request completes, while the remaining URLs are still in flight, instead of waiting for the slowest interface to return before starting any processing. This avoids leaving the CPU idle. The implementation is shown below:

The code is as follows:


function rolling_curl($urls, $delay)
{
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        curl_multi_add_handle($queue, $ch);

        // Map each handle back to its URL. On PHP 8+ handles are objects,
        // so spl_object_id() is used as the key; on PHP 7 and earlier,
        // where handles are resources, use (string) $ch instead.
        $map[spl_object_id($ch)] = $url;
    }

    $responses = array();
    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);

        if ($code != CURLM_OK) {
            break;
        }

        // A request was just completed -- find out which one
        while ($done = curl_multi_info_read($queue)) {
            // Get the info and content returned on the request
            $info = curl_getinfo($done['handle']);
            $error = curl_error($done['handle']);
            $results = callback(curl_multi_getcontent($done['handle']), $delay);
            $responses[$map[spl_object_id($done['handle'])]] = compact('info', 'error', 'results');

            // Remove the curl handle that just completed
            curl_multi_remove_handle($queue, $done['handle']);
            curl_close($done['handle']);
        }

        // Block for data in/output; error handling is done by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }
    } while ($active);

    curl_multi_close($queue);
    return $responses;
}


3. Performance comparison of the two concurrent implementations
The before-and-after performance comparison was run on a Linux host. The concurrent queue used during the test was as follows:

http://a.com/item.htm?id=14392877692
http://a.com/item.htm?id=16231676302
http://a.com/item.htm?id=5522416710
http://a.com/item.htm?id=16551116403
A brief note on the experimental design and the format of the results: to ensure reliable results, each group of experiments is repeated 20 times. In a single experiment, given the same set of interface URLs, the time consumed (in seconds) by the two concurrency mechanisms, Classic (the classic concurrency mechanism) and Rolling (the improved concurrency mechanism), is measured, and the time saved (Excellence, in seconds) and the performance improvement ratio (Excel. %) are calculated. To stay close to real requests while keeping the experiment simple, the returned results are processed only with a simple regular-expression match, without other complicated operations. In addition, to determine how the result-processing callback affects the comparison, usleep is used to simulate the heavier data-processing logic found in practice (such as extraction, word segmentation, and writing to files or databases).
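The measurement procedure described above could be sketched roughly as follows. This is not the original test script: the helper names time_call, summarize and run_experiment are hypothetical, and the improvement ratio is assumed to be the time saved divided by the Classic time. The stand-in workloads at the end exist only so that the sketch runs without network access.

```php
<?php
// Hypothetical benchmark helpers; names and formulas are assumptions.

/** Return the wall-clock time of one call to $fn, in seconds. */
function time_call(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

/** Derive the time saved and the relative improvement from two timings. */
function summarize(float $classic, float $rolling): array
{
    $saved = $classic - $rolling;
    // Assumed definition: saving relative to the classic time, in percent.
    $percent = $classic > 0 ? $saved / $classic * 100 : 0.0;
    return compact('classic', 'rolling', 'saved', 'percent');
}

/** Repeat the comparison and return one result row per round. */
function run_experiment(callable $classic, callable $rolling, int $rounds = 20): array
{
    $rows = array();
    for ($i = 0; $i < $rounds; $i++) {
        $rows[] = summarize(time_call($classic), time_call($rolling));
    }
    return $rows;
}

// Stand-in workloads; in a real run these closures would call the two
// concurrency functions with the same URL list and delay.
$rows = run_experiment(
    function () { usleep(2000); },
    function () { usleep(1000); },
    3
);
printf("%d rounds, first round saved %.4f s\n", count($rows), $rows[0]['saved']);
```

Keeping one row per round, rather than only an average, makes it easier to spot rounds distorted by network jitter.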
The callback function used in the performance tests is:

The code is as follows:


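function callback($data, $delay)
{
    // Simple regular-expression match on the response body. The opening
    // <h3> tag is an assumption: only the closing tag survives in the
    // source listing.
    preg_match_all('/<h3>(.+)<\/h3>/iU', $data, $matches);

    // Simulate heavier processing; $delay is in microseconds
    usleep($delay);

    return compact('data', 'matches');
}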
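As a quick illustration of what such a callback returns, the sketch below runs a stand-in on a small HTML fragment. The function name demo_callback, the sample markup and the h3 pattern are assumptions made for this example, not part of the original test setup.

```php
<?php
// Hypothetical stand-in for the test callback; the <h3> pattern and the
// sample HTML are assumptions for illustration only.
function demo_callback(string $data, int $delay): array
{
    // /i ignores case, /U makes the quantifier ungreedy so that each
    // heading is captured separately.
    preg_match_all('/<h3>(.+)<\/h3>/iU', $data, $matches);

    // Simulated processing cost, in microseconds.
    usleep($delay);

    return compact('data', 'matches');
}

$html = '<h3>First</h3><p>text</p><H3>Second</H3>';
$result = demo_callback($html, 0);

// $result['matches'][1] holds the captured heading texts.
print_r($result['matches'][1]);
```

Passing a larger second argument, for example 100000 for roughly 0.1 seconds, is how heavier processing is simulated in the comparison.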


When the data-processing callback has no delay, Rolling cURL is slightly better, but the performance improvement is not significant.
