PHP's cURL functions can perform a variety of transfer operations, such as simulating a browser to send GET and POST requests. Because the PHP language itself does not support multi-threading, crawlers written in plain PHP are not very efficient. This is where the curl_multi functions come in: they allow concurrent access to multiple URL addresses from a single script (via multiplexed I/O rather than true threads). Since curl_multi is so powerful, can it be used to write a concurrent downloader for multiple files? Of course. My code follows:
<?php
// URLs of the pages to crawl
$urls = [
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/',
];

$save_to = '/test.txt';      // file that receives the fetched HTML
$st = fopen($save_to, 'a');
$mh = curl_multi_init();

// Initialize: one easy handle per URL, attached to the multi handle
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)');
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_FILE, $st);  // write the fetched HTML to the file
    curl_multi_add_handle($mh, $conn[$i]);
}

// Execute: drive all transfers until none are active
do {
    curl_multi_exec($mh, $active);
} while ($active);

// End: clean up every handle
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
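One caveat with the loop above: calling curl_multi_exec() in a tight do/while busy-spins the CPU while transfers are in flight. A gentler variant blocks in curl_multi_select() until a socket is ready, and collects each page body in memory with CURLOPT_RETURNTRANSFER instead of sharing one file handle. The sketch below assumes the same three URLs; the variable names are my own, not part of the original code.

```php
<?php
// Sketch: curl_multi event loop that waits on curl_multi_select()
// instead of spinning, and keeps each response separate.
$urls = [
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/',
];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // buffer the body in memory
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Drive all transfers; block until libcurl reports socket activity.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh, 1.0); // sleep up to 1s instead of busy-waiting
    }
} while ($active && $status === CURLM_OK);

// Collect one result per URL, then clean up.
$results = [];
foreach ($handles as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

Keeping the bodies separate also makes it easy to tell afterwards which URL produced which page, something the shared-file version cannot do.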
A simple example of concurrent web-page crawling with PHP.