I wrote a simple curl-based image scraper, but when I ran it I found that the single-threaded (sequential) version was much faster than the multithreaded (curl_multi) version.
Is there something wrong with how I wrote it?
```php
$images = [
    "http://pic.91taojin.com.cn/data/attachment/image/20140415/20140415151923_73502.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140415/20140415151826_52170.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140415/20140415152035_59698.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140507/20140507143708_26688.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140417/20140417095153_61993.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140426/20140426094716_96396.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20130730/20130730160625_21437.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20130731/20130731170502_90104.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20130731/20130731165147_80414.jpg",
    "http://pic.91taojin.com.cn/data/attachment/image/20140415/20140415151923_73502.jpg",
];
```
This is a single-threaded function:
```php
function getimg($url = "", $filename = "")
{
    $ch = curl_init();
    $opt = [];
    $opt[CURLOPT_URL] = $url;
    $opt[CURLOPT_HEADER] = true;
    $opt[CURLOPT_CONNECTTIMEOUT] = 10;
    $opt[CURLOPT_TIMEOUT] = 60;
    $opt[CURLOPT_AUTOREFERER] = true;
    $opt[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11';
    $opt[CURLOPT_RETURNTRANSFER] = true;
    $opt[CURLOPT_FOLLOWLOCATION] = true; // follow redirects
    //$opt[CURLOPT_MAXREDIRS] = 10;
    curl_setopt_array($ch, $opt);
    $r = curl_exec($ch);
    if (false === $r) {
        $errno = curl_errno($ch);
        $err = curl_error($ch);
        curl_close($ch);
        return false;
    }
    // Check the header (expecting 200) before writing the file.
    // Limit 2 so a "\r\n\r\n" inside the binary body does not truncate it.
    $header = explode("\r\n\r\n", $r, 2);
    if (strpos($header[0], 'HTTP/1.1') === 0) {
        file_put_contents($filename, $header[1]);
    }
    curl_close($ch);
    return true;
}
```
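There is a separate problem from the single-vs-multi question: splitting the raw response with `explode("\r\n\r\n", ...)` is fragile. With `CURLOPT_FOLLOWLOCATION`, a redirect produces several header blocks. Unless `explode()` is given a limit, a binary JPEG body that happens to contain `\r\n\r\n` gets cut short. In addition, `strpos(..., 'HTTP/1.1') === 0` matches any status, not just 200.

The sketch below splits the response at `curl_getinfo($ch, CURLINFO_HEADER_SIZE)` instead, which covers all header blocks received, and reads the status of the last one. It assumes PHP 7.1+, and `split_response()` and `final_status_code()` are hypothetical helper names:

```php
<?php
// Split a raw response captured with CURLOPT_HEADER => true.
// $headerSize is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    $headers = substr($raw, 0, $headerSize);
    $body    = (string) substr($raw, $headerSize); // binary-safe: never searches the body
    return [$headers, $body];
}

// Status code of the LAST header block, i.e. the final response after redirects.
function final_status_code(string $headers): ?int
{
    // Each block starts with a status line such as "HTTP/1.1 200 OK".
    if (!preg_match_all('#^HTTP/\S+\s+(\d{3})#m', $headers, $m)) {
        return null;
    }
    return (int) end($m[1]);
}

// Hypothetical use inside getimg():
// $raw = curl_exec($ch);
// list($headers, $body) = split_response($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// if (final_status_code($headers) === 200) { file_put_contents($filename, $body); }
```

A simpler option is to drop `CURLOPT_HEADER` entirely and check `curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200`. In that case, `curl_exec()` returns only the body.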
I also tried the curl_multi_* functions, but I couldn't fully understand the manual. This is what I came up with:
```php
// Multi-threaded data collection
function getimgmulti($url = [], $filename = [])
{
    // Create the batch cURL handle
    $mh = curl_multi_init();
    $ch = [];
    // n = 10 "threads" can be added here
    foreach ($url as $k => $v) {
        $ch[$k] = curl_init();
        $opt = [];
        $opt[CURLOPT_URL] = $v;
        $opt[CURLOPT_HEADER] = true;
        $opt[CURLOPT_CONNECTTIMEOUT] = 10;
        $opt[CURLOPT_TIMEOUT] = 60;
        $opt[CURLOPT_AUTOREFERER] = true;
        $opt[CURLOPT_USERAGENT] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11';
        $opt[CURLOPT_RETURNTRANSFER] = true;
        $opt[CURLOPT_FOLLOWLOCATION] = true; // follow redirects
        //$opt[CURLOPT_MAXREDIRS] = 10;
        curl_setopt_array($ch[$k], $opt);
        // Add the handle to the batch
        curl_multi_add_handle($mh, $ch[$k]);
    }
    $running = null;
    // Run the batch until every transfer has finished
    // (calls curl_multi_exec() back-to-back without waiting in between)
    do {
        curl_multi_exec($mh, $running);
    } while ($running > 0);
    foreach ($ch as $i => $handle) {
        $r = curl_multi_getcontent($handle);
        // Check the header (expecting 200) before writing the file
        $header = explode("\r\n\r\n", $r, 2);
        if (strpos($header[0], 'HTTP/1.1') === 0) {
            file_put_contents('pics/' . $i . '.jpg', $header[1]);
        }
        // Detach and close each handle
        curl_multi_remove_handle($mh, $handle);
        curl_close($handle);
    }
    curl_multi_close($mh);
}
```
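Note that curl_multi does not start threads. It multiplexes all transfers inside one PHP thread using non-blocking sockets.

A loop of the form `do { curl_multi_exec($mh, $running); } while ($running > 0);` never waits. It calls `curl_multi_exec()` back-to-back and keeps a CPU core busy while the sockets are idle. The usual pattern instead blocks in `curl_multi_select()` between calls. This is a plausible contributor to the slowdown, not a confirmed diagnosis.

Below is a minimal sketch of that pattern. `fetch_all()` is a hypothetical helper. It omits `CURLOPT_HEADER` and returns only bodies, with `false` for failed transfers:

```php
<?php
// Fetch several URLs concurrently in ONE thread with curl_multi.
function fetch_all(array $urls, int $timeout = 60): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 10,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Sleep up to 1 s until a socket is ready instead of spinning.
            // It can return -1 (e.g. older PHP versions when libcurl has no
            // socket to wait on yet); back off briefly in that case.
            if (curl_multi_select($mh, 1.0) === -1) {
                usleep(10000);
            }
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Per-transfer results: 'result' is a CURLE_* code (CURLE_OK on success).
    $failed = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[array_search($info['handle'], $handles, true)] = true;
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = isset($failed[$key]) ? false : curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With the question's data, `fetch_all($images)` returns an array keyed like `$images`. Each element can then be written to `pics/<key>.jpg`.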
Results: calling the single-threaded function in a loop finishes in about 1.7 seconds, while the curl_multi version takes about 3.5 seconds.
Perhaps I'm using these functions incorrectly. Can anyone explain why?
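One run of network code is a noisy measurement. DNS lookups, TCP handshakes and server-side caching can shift a single result by seconds.

The sketch below is a fairer comparison that times each variant several times and keeps the median. It assumes PHP 7.3+ for `hrtime()`. `bench()` is a hypothetical helper, not part of any library:

```php
<?php
// Run $fn $runs times and return the MEDIAN wall-clock time in seconds,
// so one slow DNS lookup or connection does not decide the comparison.
function bench(callable $fn, int $runs = 3): float
{
    $runs = max(1, $runs);
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                    // monotonic clock, nanoseconds
        $fn();
        $times[] = (hrtime(true) - $start) / 1e9; // convert to seconds
    }
    sort($times);
    return $times[intdiv(count($times), 2)];
}

// Hypothetical use with the question's getimg(), getimgmulti() and $images:
// $seq = bench(function () use ($images) {
//     foreach ($images as $i => $u) { getimg($u, 'pics/' . $i . '.jpg'); }
// });
// $par = bench(function () use ($images) { getimgmulti($images); });
// printf("sequential %.2fs, curl_multi %.2fs\n", $seq, $par);
```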
---- Follow-up ----
I'm testing on Windows. Could this be caused by differences in how PHP handles multithreading on Windows?
I also tried a PHP class written by someone else:
Http://blog.eiodesign.com/archives/86
Scraping again with that library gives the same result; it is even slower:
```php
// Test scraping with the library
require("libs/class_curl_multi.php");
$mp = new MultiHttpRequest();
// Download the remote images to local files
$mp->set_urls($images);
$images_result = $mp->start();
foreach ((array)$images_result as $image_key => $image_value) {
    if (!empty($image_key)) {
        _flush("store image:" . $image_key . "\n");
        file_put_contents('pics/' . $image_key . '.jpg', $image_value);
    }
}
```
This took 4.05 seconds.
Is my understanding of multithreading in PHP wrong, or is the difference caused by something else? It seems that "multithreading" does not speed up the scraping at all; if anything, it makes it slower.
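One detail in the library-based loop is that `!empty($image_key)` is false for the integer key `0`, because PHP treats `0` as empty. So if the results are keyed from 0, the first image is never written.

The sketch below filters on the downloaded value instead. It assumes, without having checked the class from the linked post, that `start()` returns an array of key => body string with `false` or `''` for failures. `save_images()` is a hypothetical name:

```php
<?php
// Write each downloaded body to "$dir/<key>.jpg".
// Filters on the VALUE, so key 0 is kept; failed or empty downloads are skipped.
// Returns the number of files written.
function save_images(array $results, string $dir): int
{
    $written = 0;
    foreach ($results as $key => $body) {
        if (!is_string($body) || $body === '') {
            continue; // failed or empty download
        }
        if (file_put_contents($dir . '/' . $key . '.jpg', $body) !== false) {
            $written++;
        }
    }
    return $written;
}
```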