PHP cURL Concurrency Best Practices: Code Sharing


This article discusses two concrete implementations of cURL concurrency and compares their performance.

1. The Classic cURL Concurrency Mechanism and Its Problems

The classic cURL concurrency implementation is easy to find online. For example, the PHP online manual suggests the following approach:

function classic_curl($urls, $delay) {
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        // Create a cURL resource
        $ch = curl_init();

        // Set the URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        // Add the handle to the multi handle
        curl_multi_add_handle($queue, $ch);
        $map[$url] = $ch;
    }

    $active = null;

    // Execute the handles
    do {
        $mrc = curl_multi_exec($queue, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active > 0 && $mrc == CURLM_OK) {
        if (curl_multi_select($queue, 0.5) != -1) {
            do {
                $mrc = curl_multi_exec($queue, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }

    // All requests have finished; collect and process the results
    $responses = array();
    foreach ($map as $url => $ch) {
        $responses[$url] = callback(curl_multi_getcontent($ch), $delay);
        curl_multi_remove_handle($queue, $ch);
        curl_close($ch);
    }

    curl_multi_close($queue);
    return $responses;
}

First, all the URLs are pushed into the concurrent queue, then the concurrent requests are executed, and only after every request has returned is the data parsed and processed. In practice, because of network conditions, some URLs will return earlier than others, yet the classic cURL concurrency must wait for the slowest URL before any processing starts. Waiting means the CPU sits idle and is wasted. If the URL queue is short, this waste is still acceptable, but if the queue is long, it becomes unacceptable.
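For illustration, here is a minimal way to call classic_curl; the URLs and the zero delay are placeholders, and callback is the processing function defined later in this article:

$urls = array(
    'http://example.com/a', // placeholder URLs
    'http://example.com/b',
);
$responses = classic_curl($urls, 0); // 0 = no simulated processing delay
foreach ($responses as $url => $result) {
    // Each entry holds whatever callback() returned, here 'data' and 'matches'
    echo $url, ': ', strlen($result['data']), " bytes\n";
}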

2. The Improved Rolling cURL Concurrency Mechanism

After careful analysis, it is not hard to see that the classic cURL concurrency still leaves room for optimization. In the optimized approach, each URL is processed as soon as its request completes, while the remaining requests are still in flight, instead of waiting for the slowest URL before any processing starts. This avoids leaving the CPU idle. Without further ado, the specific implementation is shown below:

function rolling_curl($urls, $delay) {
    $queue = curl_multi_init();
    $map = array();

    foreach ($urls as $url) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);

        curl_multi_add_handle($queue, $ch);
        $map[(string) $ch] = $url;
    }

    $responses = array();
    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);

        if ($code != CURLM_OK) { break; }

        // A request was just completed -- find out which one
        while ($done = curl_multi_info_read($queue)) {

            // Get the info and content returned for the request
            $info = curl_getinfo($done['handle']);
            $error = curl_error($done['handle']);
            $results = callback(curl_multi_getcontent($done['handle']), $delay);
            $responses[$map[(string) $done['handle']]] = compact('info', 'error', 'results');

            // Remove the curl handle that just completed
            curl_multi_remove_handle($queue, $done['handle']);
            curl_close($done['handle']);
        }

        // Block until there is activity on any of the handles;
        // error handling is done by curl_multi_exec
        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }

    } while ($active);

    curl_multi_close($queue);
    return $responses;
}
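Note that rolling_curl also keys its return value by URL, but each entry is an array of info, error, and results (the value returned by callback), so reading the output differs slightly from the classic version. A minimal reading sketch, again with placeholder URLs:

$responses = rolling_curl(array('http://example.com/a', 'http://example.com/b'), 0);
foreach ($responses as $url => $r) {
    if ($r['error']) {
        echo $url, ' failed: ', $r['error'], "\n";
    } else {
        echo $url, ' -> HTTP ', $r['info']['http_code'], ', ',
             strlen($r['results']['data']), " bytes\n";
    }
}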

3. Performance Comparison of the Two Concurrency Implementations

The performance comparison before and after the improvement was carried out on a Linux host. The concurrent queue used during the test is as follows:

http://item.taobao.com/item.htm?id=14392877692
http://item.taobao.com/item.htm?id=16231676302
http://item.taobao.com/item.htm?id=17037160462
http://item.taobao.com/item.htm?id=5522416710
http://item.taobao.com/item.htm?id=16551116403
http://item.taobao.com/item.htm?id=14088310973

A brief description of the experiment design and the format of the performance results: to make the results reliable, each group of experiments was repeated 20 times. In each run, the same set of interface URLs was given to both concurrency mechanisms, Classic (the classic mechanism) and Rolling (the improved mechanism), and the time consumed by each was measured in seconds; from these, the time saved (Excellence, in seconds) and the performance improvement ratio (Excellence %) were calculated. To stay close to real-world requests while keeping the experiment simple, processing of the returned results is limited to a simple regular-expression match, with no other complicated operations. In addition, to assess how the result-processing callback affects the comparison, usleep is used to simulate the more realistic data-processing logic (such as extraction, word segmentation, or writing to files or databases).

The callback function used in the performance tests is as follows (the regular expression was garbled in the source; the pattern below is only a placeholder):

function callback($data, $delay) {
    // Placeholder pattern: the original regular expression was lost in transcription
    preg_match_all('/<title>(.*?)<\/title>/s', $data, $matches);
    usleep($delay); // simulate heavier data processing
    return compact('data', 'matches');
}
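For completeness, here is a sketch of the kind of timing loop such a comparison implies; the microtime-based measurement and the loop structure are assumptions about the setup described above, not the author's original test harness:

$urls = array(
    'http://item.taobao.com/item.htm?id=14392877692',
    // ... the other five URLs listed above
);
$delay = 5000; // 5 ms of simulated processing, in microseconds

foreach (array('classic_curl', 'rolling_curl') as $fn) {
    $total = 0;
    for ($i = 0; $i < 20; $i++) {
        $start = microtime(true);
        $fn($urls, $delay); // variable function call: classic_curl() or rolling_curl()
        $total += microtime(true) - $start;
    }
    printf("%s: %.3f seconds on average\n", $fn, $total / 20);
}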

With no delay in the data-processing callback: Rolling cURL is slightly faster, but the improvement is not significant.
With a 5 ms delay in the data-processing callback: Rolling cURL wins clearly, with a performance improvement of about 40%.
Based on the comparison above, Rolling cURL is the better choice for processing a queue of URLs concurrently. When the concurrency is very large (1000+), you can cap the length of the concurrent queue, for example at 20, and add one new, unrequested URL to the queue each time a URL has returned and been processed. Code written this way is more robust and will not hang or crash under heavy concurrency; a rough sketch of this idea follows below. For a detailed implementation, see: http://code.google.com/p/rolling-curl/
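The rolling-curl library linked above implements this queue cap properly; as a rough illustration of the idea only (not the library's actual API), a bounded variant might look like the sketch below. The names rolling_curl_limited and $window are hypothetical, and callback is the processing function defined earlier:

function rolling_curl_limited($urls, $delay, $window = 20) {
    $queue = curl_multi_init();
    $map = array();          // currently active handles => URL
    $responses = array();

    // Add one URL to the multi handle (helper for illustration only)
    $add = function ($url) use ($queue, &$map) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);
        curl_multi_add_handle($queue, $ch);
        $map[(string) $ch] = $url;
    };

    // Seed the queue with at most $window requests
    while (!empty($urls) && count($map) < $window) {
        $add(array_shift($urls));
    }

    do {
        while (($code = curl_multi_exec($queue, $active)) == CURLM_CALL_MULTI_PERFORM);
        if ($code != CURLM_OK) { break; }

        while ($done = curl_multi_info_read($queue)) {
            $handle = $done['handle'];
            $key    = (string) $handle;
            $responses[$map[$key]] = array(
                'info'    => curl_getinfo($handle),
                'error'   => curl_error($handle),
                'results' => callback(curl_multi_getcontent($handle), $delay),
            );
            curl_multi_remove_handle($queue, $handle);
            curl_close($handle);
            unset($map[$key]);

            // Refill the window with the next unrequested URL, if any
            if (!empty($urls)) {
                $add(array_shift($urls));
            }
        }

        if ($active > 0) {
            curl_multi_select($queue, 0.5);
        }
    } while ($active || !empty($map));

    curl_multi_close($queue);
    return $responses;
}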
