PHP multi-thread batch collection and download images

This approach uses curl's multi interface (the "multithreading" of the title; strictly speaking, curl_multi runs the transfers concurrently inside one PHP process rather than in separate threads). curl also lets you set a request timeout, so a very slow URL can be abandoned decisively instead of blocking everything else. Combined with parallel requests, the efficiency should be fairly high. Reference: "CURL learning and application [multithreading]". Let's test it again.
Core code:
/**
 * Curl "multithreading": fetch several URLs in parallel.
 * This is a method of the HttpImg class instantiated further down.
 *
 * @param array $array   URLs to request in parallel
 * @param int   $timeout per-request timeout in seconds
 * @return array response bodies, keyed like $array
 */
public function Curl_http($array, $timeout = 15)
{
    $data = array();
    $conn = array();
    $mh = curl_multi_init(); // create the curl multi handle

    foreach ($array as $k => $url) {
        $conn[$k] = curl_init($url); // initialize one handle per URL
        curl_setopt($conn[$k], CURLOPT_TIMEOUT, $timeout); // give up on slow resources
        curl_setopt($conn[$k], CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)');
        curl_setopt($conn[$k], CURLOPT_MAXREDIRS, 7); // follow at most 7 redirects
        curl_setopt($conn[$k], CURLOPT_HEADER, false); // leave headers out of the body, which is faster
        curl_setopt($conn[$k], CURLOPT_FOLLOWLOCATION, 1); // follow 302 redirects
        curl_setopt($conn[$k], CURLOPT_RETURNTRANSFER, 1); // return the result as a string instead of printing it
        curl_setopt($conn[$k], CURLOPT_HTTPGET, true);
        curl_multi_add_handle($mh, $conn[$k]);
    }

    // Avoid an endless loop that burns CPU; this part follows the usual online recipe.
    do {
        $mrc = curl_multi_exec($mh, $active); // $active stays true while transfers are running
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active and $mrc == CURLM_OK) {
        // Wait for activity on any handle; back off briefly if select fails.
        if (curl_multi_select($mh) == -1) {
            usleep(100);
        }
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    foreach ($array as $k => $url) {
        if (!curl_errno($conn[$k])) {
            $data[$k] = curl_multi_getcontent($conn[$k]); // response body as a string
            $header[$k] = curl_getinfo($conn[$k]); // transfer and HTTP header information
        }
        curl_multi_remove_handle($mh, $conn[$k]); // detach from the multi handle first
        curl_close($conn[$k]); // then close the handle and release resources
    }

    curl_multi_close($mh);
    return $data;
}

// Receive the request parameters
$callback = $_GET['callback'];
$hrefs = $_GET['hrefs'];
$urlarray = explode(',', trim($hrefs, ','));
$date = date('Ymd', time());

// Instantiate
$img = new HttpImg();
$stime = $img->getMicrotime(); // start time
$data = $img->Curl_http($urlarray, 20); // HTML of each listed page

if (!is_dir('./img/' . $date)) {
    mkdir('./img/' . $date, 0777, true);
}

foreach ((array) $data as $k => $v) {
    // Collect href/src values that end in an image extension
    preg_match_all("/(href|src)=([\"|']?)([^ \"'>]+\.(jpg|png|PNG|JPG|gif))\\2/i", $v, $matches[$k]);
    if (count($matches[$k][3]) > 0) {
        $dataimg = $img->Curl_http($matches[$k][3], 20); // binary data of all images
        $j = 0;
        foreach ((array) $dataimg as $kk => $vv) {
            if ($vv != '') {
                $rand = rand(1000, 9999);
                $basename = time() . "_" . $rand . "." . "jpg"; // every file is saved as .jpg
                $fname = './img/' . $date . "/" . $basename;
                file_put_contents($fname, $vv);
                $j++;
                echo "create " . $j . " images " . $fname . "\n";
            }
        }
    }
}

$etime = $img->getMicrotime(); // end time
echo "time in use " . ($etime - $stime) . " seconds";
exit;
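The image-matching step in the core code can be tried on its own, without any network access. The following is a minimal sketch, not the author's exact code: `extractImageUrls` is a hypothetical helper name, the pattern is a slightly simplified version of the one above (same jpg/png/gif extensions, case-insensitive), and removing duplicates is an added assumption so that the same image is not downloaded twice.

```php
<?php
/**
 * Hypothetical helper: return the image URLs found in href/src attributes.
 *
 * @param string $html page source to scan
 * @return array unique image URLs in order of first appearance
 */
function extractImageUrls(string $html): array
{
    // Group 1 captures the optional quote, group 2 the URL; \1 requires
    // the same quote (or none) to close the attribute value.
    $pattern = '/(?:href|src)\s*=\s*(["\']?)([^"\'>\s]+\.(?:jpg|png|gif))\1/i';
    if (!preg_match_all($pattern, $html, $matches)) {
        return array();
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($matches[2]));
}

$html = '<a href="http://example.com/a.JPG">a</a>'
      . "<img src='/img/b.png'>"
      . '<img src="http://example.com/a.JPG">'
      . '<a href="page.html">not an image</a>';

print_r(extractImageUrls($html));
```

Note that relative URLs such as `/img/b.png` are returned as they appear in the page; they would need to be resolved against the page URL before being passed to `Curl_http`.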
Test results
Collecting 260 images took about 337 seconds, a little over one second per image. The speed advantage becomes more obvious as the number of images grows.
Look at the file names: 10 images can carry the same timestamp prefix, which means they were generated within the same second.
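Since several files can share one timestamp, uniqueness in the core code rests on the four-digit random suffix, and every file gets a .jpg extension even when the source is a PNG or GIF. As a hedged alternative, the sketch below builds the name from a running index and keeps the extension found in the URL. `buildImageFilename` and its parameters are hypothetical names introduced for illustration only.

```php
<?php
/**
 * Hypothetical helper: build a file name from a timestamp and a running
 * index, keeping the extension of the image URL (falling back to "jpg").
 */
function buildImageFilename(string $url, int $index, int $timestamp): string
{
    // Only the path part matters; a query string such as ?size=large is ignored.
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($ext, array('jpg', 'png', 'gif'), true)) {
        $ext = 'jpg';
    }
    // The index is unique within one run, so names cannot collide
    // even when many images are saved in the same second.
    return sprintf('%d_%04d.%s', $timestamp, $index, $ext);
}

echo buildImageFilename('http://example.com/pics/cat.PNG?size=large', 3, 1700000000), "\n";
// prints "1700000000_0003.png"
```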
Because of the 20-second request timeout, some images are clearly incomplete after being saved; that is, the image resource could not be downloaded in full within 20 seconds. You can adjust this timeout yourself.
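One way to avoid saving truncated files is to compare the number of bytes received with the Content-Length reported by the server, which `curl_getinfo()` exposes as `download_content_length` (-1 when the server did not send one). The sketch below is an assumption about how such a check could look, not part of the original code; `isCompleteDownload` is a hypothetical helper, and it only works with the info array and body, so it can be tested without a network.

```php
<?php
/**
 * Hypothetical helper: decide whether a download looks complete.
 *
 * @param array  $info result of curl_getinfo() for the handle
 * @param string $body result of curl_multi_getcontent() for the handle
 */
function isCompleteDownload(array $info, string $body): bool
{
    if ($body === '' || (int) ($info['http_code'] ?? 0) !== 200) {
        return false;
    }
    $expected = (int) ($info['download_content_length'] ?? -1);
    if ($expected < 0) {
        // No Content-Length was sent, so there is nothing to compare against.
        return true;
    }
    return strlen($body) >= $expected;
}

// Synthetic examples: a full 4-byte body and one cut off by the timeout.
var_dump(isCompleteDownload(array('http_code' => 200, 'download_content_length' => 4.0), 'abcd'));
var_dump(isCompleteDownload(array('http_code' => 200, 'download_content_length' => 10.0), 'abcd'));
```

Inside `Curl_http`, such a check could run before a body is stored in `$data[$k]`, so that images cut off by the timeout are skipped or retried rather than written to disk.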