Most of the car content in the back end of our company's car site comes from Autohome, so every day our editorial colleagues have to go to Autohome and add cars by hand, which is painful. To change that, the task fell to me as the developer: build a feature that fills in our back-end form automatically once the corresponding Autohome URL is pasted in. The basic form filling is done, but the feature still cannot bring in the car's photo album.
I have built picture capture before, but most cars on Autohome have a lot of pictures. At first I planned to reuse my earlier approach: call file_get_contents on the URL to get the page content, match the image addresses in it, then call file_get_contents on each image URL and save the result locally. The code is as follows:
<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the page.
class Runtime
{
    var $StartTime = 0;
    var $StopTime = 0;

    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }
}

$runtime = new Runtime();
$runtime->start();

$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);
// echo $rs; exit;

// Collect the links to the individual picture pages.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);
$avalie = array_unique($urlArr[0]);

$count = array();
foreach ($avalie as $key => $ul) {
    // The image pattern is missing from the source; it must capture the image URL in group 1.
    $pattern = '/…/';
    preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn' . $ul), $imgSrc);
    $count = array_merge($count, $imgSrc[1]);
}

// Download the images one after another.
$data = array();
foreach ($count as $k => $v) {
    $data[$k] = file_get_contents($v);
}

// Save each image under a timestamp plus random suffix.
foreach ($data as $k => $v) {
    file_put_contents('./pic2/' . time() . '_' . rand(1, 10000) . '.jpg', $v);
}

$runtime->stop();
echo "Page Execution time: " . $runtime->spent() . " ms";
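The "match the image address" step can be tried on its own against a fixed HTML string. This is only a minimal sketch: the function name extractImageUrls and the <img src="..."> pattern are assumptions made for illustration, not the expression used in the post, and the real Autohome markup may need a different pattern.

```php
<?php
// Hypothetical helper: collect unique image URLs from an HTML string.
// The <img ... src="..."> pattern is an assumption about the page markup.
function extractImageUrls(string $html): array
{
    $pattern = '/<img[^>]+src=["\']([^"\']+)["\']/i';
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];   // no match (0) or regex failure (false)
    }
    // array_unique drops repeats; array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

// Sample markup with one repeated image.
$html = '<img src="http://example.com/a.jpg">'
      . '<img class="x" src="http://example.com/b.jpg">'
      . '<img src="http://example.com/a.jpg">';
print_r(extractImageUrls($html));
```

Removing duplicates before downloading matters here because the same thumbnail often appears several times on a gallery page, and each repeat would otherwise cost one more request.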
It turned out that this method is fine when there are few pictures, but with many pictures it becomes very sluggish. It is hard to get through even a local test run, let alone running it in production. After searching on Baidu I switched to downloading the pictures with curl. Testing showed a real improvement, but it still felt a bit slow. If only PHP had multiple threads...
After more digging I found that PHP's curl library can in fact simulate multithreading through the curl_multi_* family of functions, which run several transfers concurrently. After the rewrite, the code became this:
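The intermediate curl version is not shown in the post. A single download with curl might look roughly like the sketch below; the function name downloadWithCurl, its timeout parameter, and the choice to treat anything other than HTTP 200 as a failure are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: download one URL with curl and return the body,
// or null when the transfer fails or the server does not answer with 200.
function downloadWithCurl(string $url, int $timeoutSeconds = 30): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);          // body only, no response headers
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);                            // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 when no response was received

    return ($body !== false && $status === 200) ? $body : null;
}
```

A caller would check for null before writing the file, for example `if (($img = downloadWithCurl($v)) !== null) { file_put_contents($path, $img); }`, where $path is whatever target file name the script builds. Each call still blocks until its transfer finishes, which is why a long album stays slow with this approach.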
<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the page.
class Runtime
{
    var $StartTime = 0;
    var $StopTime = 0;

    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }
}

$runtime = new Runtime();
$runtime->start();

$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);

// Collect the links to the individual picture pages.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);
$avalie = array_unique($urlArr[0]);

$count = array();
foreach ($avalie as $key => $ul) {
    // The image pattern is missing from the source; it must capture the image URL in group 1.
    $pattern = '/…/';
    preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn' . $ul), $imgSrc);
    $count = array_merge($count, $imgSrc[1]);
}

// Register one curl handle per image on a shared multi handle.
$handle = curl_multi_init();
$curl = array();
foreach ($count as $k => $v) {
    $curl[$k] = curl_init($v);
    curl_setopt($curl[$k], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl[$k], CURLOPT_HEADER, 0);
    curl_setopt($curl[$k], CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($handle, $curl[$k]);
}

// Start the transfers.
$active = null;
do {
    $mrc = curl_multi_exec($handle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // This line is critical from PHP 5.3 on: without it, curl_multi_select may keep
    // returning -1 and the loop never ends.
    while (curl_multi_exec($handle, $active) === CURLM_CALL_MULTI_PERFORM);
    if (curl_multi_select($handle) != -1) {
        do {
            $mrc = curl_multi_exec($handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// Collect the bodies of the transfers that reported no error.
$data = array();
foreach ($curl as $k => $v) {
    if (curl_error($curl[$k]) == "") {
        $data[$k] = curl_multi_getcontent($curl[$k]);
    }
    curl_multi_remove_handle($handle, $curl[$k]);
    curl_close($curl[$k]);
}

foreach ($data as $k => $v) {
    $file = time() . '_' . rand(1000, 9999) . '.jpg';
    file_put_contents('./pic3/' . $file, $v);
}

curl_multi_close($handle);
$runtime->stop();
echo "Page Execution time: " . $runtime->spent() . " ms";
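The curl_multi_* part can also be pulled out into a small reusable function. The sketch below is one possible shape, not the code from the post: the name fetchAll and its parameters are hypothetical, it uses the shorter exec/select loop instead of the nested loops above, and it judges success by an HTTP 200 status, which is an assumption about what counts as a usable image response.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently and return the
// bodies keyed like the input array. Transfers without HTTP 200 are left out.
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handles[$key], CURLOPT_HEADER, false);
        curl_setopt($handles[$key], CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handles[$key]);
    }

    // Drive all transfers until none is active any more.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000);   // select could not wait; pause briefly instead of spinning
        }
    } while ($active && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $body = curl_multi_getcontent($handle);
        // Status is 0 when the server never answered.
        if ($body !== null && curl_getinfo($handle, CURLINFO_HTTP_CODE) === 200) {
            $bodies[$key] = $body;
        }
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);

    return $bodies;
}
```

With this in place the download step reduces to `$data = fetchAll($count);` followed by the loop that writes the files. Keeping the input keys in the result makes it easy to see which URLs were skipped.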
Well, multithreaded collection really does feel great. I then ran a series of tests and comparisons: in 4 out of 5 runs the curl_multi version was faster than the file_get_contents version, with run times differing by about a factor of three. In short, from now on I will try to use this method for collection to improve efficiency.
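A comparison over several runs like this can be scripted with a small timing helper. The following is a minimal sketch: timeRuns is a hypothetical name, and the usleep call is only a stand-in workload to be replaced by the download routine under test.

```php
<?php
// Hypothetical helper: run a callable several times and return each
// run's wall-clock duration in milliseconds.
function timeRuns(callable $task, int $runs = 5): array
{
    $durations = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);   // float seconds
        $task();
        $durations[] = round((microtime(true) - $start) * 1000, 1);
    }
    return $durations;
}

// Stand-in workload; swap in the file_get_contents or curl_multi routine.
$times = timeRuns(function () {
    usleep(2000);
}, 5);
printf("runs: %s ms, average: %.1f ms\n", implode(', ', $times), array_sum($times) / count($times));
```

Timing each variant over the same list of image URLs, and looking at all runs rather than one, helps separate a real difference from network noise.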
Performance comparison of capturing pictures with file_get_contents and with the curl functions