Crawling web pages concurrently in PHP with curl_multi
PHP's curl functions can perform all kinds of transfer operations, such as simulating a browser to send GET and POST requests. However, because the PHP language itself does not support multi-threading, a crawler that fetches one page at a time is not very efficient. This is where the curl_multi functions come in: they let a script fetch multiple URLs concurrently. Since curl_multi is that capable, it can of course also be used to download files concurrently. My code follows:
Code Listing 1: write the fetched HTML directly to a file
<?php
// the URLs of the pages to crawl
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt'; // file that receives the fetched HTML
$st = fopen($save_to, 'a');

// initialize the multi handle and add one easy handle per URL
$mh = curl_multi_init();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)');
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    curl_setopt($conn[$i], CURLOPT_FILE, $st); // write the fetched HTML to the file
    curl_multi_add_handle($mh, $conn[$i]);
}

// run the transfers until all of them have finished
do {
    curl_multi_exec($mh, $active);
} while ($active);

// clean up
foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
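The loop above only tells you when all transfers have finished, not whether each one succeeded. A sketch of per-URL error checking with curl_multi_info_read() follows; the file:// URLs and temp-file names are placeholders of mine so the example runs without network access, and one of them deliberately points at a missing file to produce a failure:

```php
<?php
// Sketch: detect which transfers failed using curl_multi_info_read().
// The file:// URLs are stand-ins for real http:// pages.
$tmp = sys_get_temp_dir();
file_put_contents("$tmp/ok.html", "ok");
$urls = array(
    "file://$tmp/ok.html",       // exists: transfer succeeds
    "file://$tmp/missing.html"   // does not exist: transfer fails
);

$mh = curl_multi_init();
$conn = array();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $conn[$i]);
}

do {
    curl_multi_exec($mh, $active);
} while ($active);

// drain the result queue: one message per completed handle
$errors = 0;
while ($info = curl_multi_info_read($mh)) {
    if ($info['result'] !== CURLE_OK) {
        $errors++; // this handle's transfer failed
    }
}

foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
```

With real pages you would typically also inspect `curl_getinfo($conn[$i], CURLINFO_HTTP_CODE)` to catch HTTP-level errors such as 404s, which curl itself reports as successful transfers.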
Code Listing 2: collect the fetched HTML into a variable, then write it to a file
<?php
$urls = array(
    'http://www.sina.com.cn/',
    'http://www.sohu.com/',
    'http://www.163.com/'
);
$save_to = '/test.txt'; // file that receives the fetched HTML
$st = fopen($save_to, 'a');

$mh = curl_multi_init();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)');
    curl_setopt($conn[$i], CURLOPT_HEADER, 0);
    curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 60);
    // return the fetched HTML as a string instead of writing it to output
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $conn[$i]);
}

do {
    curl_multi_exec($mh, $active);
} while ($active);

// fetch each page's HTML into a variable and write it to the file
// (it could just as well go into a database instead)
foreach ($urls as $i => $url) {
    $data = curl_multi_getcontent($conn[$i]);
    fwrite($st, $data);
}

foreach ($urls as $i => $url) {
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
fclose($st);
?>
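One caveat about both listings: the tight `do { curl_multi_exec(...); } while ($active);` loop spins the CPU while transfers are still in flight. A gentler variant sleeps in curl_multi_select() until at least one handle has activity. The sketch below uses file:// URLs and temp files of my own invention so it runs without network access; in practice you would substitute real http:// URLs:

```php
<?php
// Sketch: the same multi-handle pattern, but blocking in curl_multi_select()
// between curl_multi_exec() calls instead of busy-looping.
$tmp = sys_get_temp_dir();
file_put_contents("$tmp/page1.html", "hello");
file_put_contents("$tmp/page2.html", "world");
$urls = array("file://$tmp/page1.html", "file://$tmp/page2.html");

$mh = curl_multi_init();
$conn = array();
foreach ($urls as $i => $url) {
    $conn[$i] = curl_init($url);
    curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $conn[$i]);
}

$active = null;
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // block (up to 1 second) until some handle is ready,
        // instead of spinning on curl_multi_exec()
        curl_multi_select($mh, 1.0);
    }
} while ($active && $status == CURLM_OK);

// collect the results and clean up
$results = array();
foreach ($urls as $i => $url) {
    $results[$i] = curl_multi_getcontent($conn[$i]);
    curl_multi_remove_handle($mh, $conn[$i]);
    curl_close($conn[$i]);
}
curl_multi_close($mh);
```

The select-based loop matters most with many slow URLs: while waiting on the network the script sleeps in the kernel rather than hammering curl_multi_exec().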
That is all for this article; I hope you find it useful.