Use PHP's curl to implement concurrent requests for remote files (crawl remote Web pages)


PHP's cURL extension is powerful. Its curl_multi_* family of functions, starting with curl_multi_init, processes a batch of transfers together. This lets a single PHP process fetch many URLs concurrently instead of one after another, which can speed up an ordinary web crawler.

A simple fetch function:

function http_get_multi($urls)
{
    $count = count($urls);
    $data = [];
    $chs = [];

    // Create a batch (multi) cURL handle
    $mh = curl_multi_init();

    // Create one cURL handle per URL and add it to the batch
    for ($i = 0; $i < $count; $i++) {
        $chs[$i] = curl_init();
        // Set the URL and the appropriate options
        curl_setopt($chs[$i], CURLOPT_RETURNTRANSFER, 1); // return the body, don't print it
        curl_setopt($chs[$i], CURLOPT_URL, $urls[$i]);
        curl_setopt($chs[$i], CURLOPT_HEADER, 0);
        curl_multi_add_handle($mh, $chs[$i]);
    }

    // Execute the batch handle
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active && $mrc == CURLM_OK) {
        // Wait for activity on any transfer; if select fails, pause briefly
        // rather than spinning at full CPU
        if (curl_multi_select($mh) == -1) {
            usleep(100);
        }
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    // Collect the results: the body on success, false on error
    for ($i = 0; $i < $count; $i++) {
        $content = curl_multi_getcontent($chs[$i]);
        $data[$i] = (curl_errno($chs[$i]) == 0) ? $content : false;
    }

    // Close all handles
    for ($i = 0; $i < $count; $i++) {
        curl_multi_remove_handle($mh, $chs[$i]);
    }
    curl_multi_close($mh);

    return $data;
}
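The function above reports a failed request by checking curl_errno on each handle. As a complementary sketch, PHP also provides curl_multi_info_read(), which returns one message per finished transfer, including that transfer's result code. The helper name multi_result_codes, the 5-second timeout, and the loopback test URL are assumptions made for illustration; they are not part of the original post.

```php
<?php
// Hypothetical helper, not from the original post: run a batch and return the
// cURL result code for each URL, as reported by curl_multi_info_read().
function multi_result_codes(array $urls, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $chs = [];
    foreach ($urls as $i => $url) {
        $chs[$i] = curl_init($url);
        curl_setopt($chs[$i], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($chs[$i], CURLOPT_TIMEOUT, $timeout); // assumed limit, in seconds
        curl_multi_add_handle($mh, $chs[$i]);
    }

    // Drive all transfers until none is still active
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0); // wait up to 1 second for activity
        }
    } while ($active && $status == CURLM_OK);

    // Each finished transfer yields one message with its handle and result code
    $codes = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        $i = array_search($info['handle'], $chs, true);
        $codes[$i] = $info['result']; // CURLE_OK (0) means success
    }

    foreach ($chs as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    ksort($codes);
    return $codes;
}

// Usage: a loopback address is used here so the sketch does not depend on
// any external site being reachable.
$codes = multi_result_codes(['http://127.0.0.1:1/']);
print_r($codes);
```

A non-zero code can be passed to curl_strerror() to get a readable message for the log.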

The following test calls it (the get() function used for comparison is from http://www.cnblogs.com/whatmiss/p/7114954.html):

// Build a list of URLs for a large number of pages
$url = [
    'http://www.baidu.com',
    'http://www.163.com',
    'http://www.sina.com.cn',
    'http://www.qq.com',
    'http://www.sohu.com',
    'http://www.douban.com',
    'http://www.cnblogs.com',
    'http://www.taobao.com',
    'http://www.php.net',
];
$urls = [];
for ($i = 0; $i < 10; $i++) {
    foreach ($url as $r) {
        $urls[] = $r . '/?v=' . rand();
    }
}

// Concurrent requests
$t1 = microtime(true);
$datas = http_get_multi($urls);
foreach ($datas as $key => $data) {
    file_put_contents('log/multi_' . $key . '.txt', $data); // Record the request result. Remember to create the log folder
}
$t2 = microtime(true);
echo $t2 - $t1;
echo '<br/>';

// Synchronous requests; the get() function is from http://www.cnblogs.com/whatmiss/p/7114954.html
$t1 = microtime(true);
foreach ($urls as $key => $url) {
    file_put_contents('log/get_' . $key . '.txt', get($url)); // Record the request result. Remember to create the log folder
}
$t2 = microtime(true);
echo $t2 - $t1;

The test results show a clear gap between the two approaches, and the gap widens as the number of requests grows:

Run   Concurrent (s)     Synchronous (s)
1     2.4481401443481    21.68923997879
2     8.925509929657     24.73141503334
3     3.243185043335     23.384337902069
4     3.2841880321503    24.754415035248
5     3.2091829776764    29.068662881851
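The timings come from bracketing each approach with microtime(true). A minimal sketch of a reusable version of that measurement follows. The helper name time_call is hypothetical, and usleep stands in for a slow request so the example runs without network access.

```php
<?php
// Hypothetical helper, not from the original post: run a callable and report
// how long it took, using the same microtime(true) bracketing as the test.
function time_call(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    $elapsed = microtime(true) - $start; // wall-clock seconds as a float
    return [$result, $elapsed];
}

// Usage: usleep() stands in for a slow request here.
[$value, $seconds] = time_call(function () {
    usleep(50000); // 50 ms
    return 'done';
});
printf("%s in %.3f s\n", $value, $seconds);
```

Wrapping both the concurrent and the synchronous call this way keeps the two measurements identical in form, so only the fetching strategy differs.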

References, with thanks to the original authors:

http://php.net/manual/zh/function.curl-multi-init.php

http://www.tuicool.com/articles/auiEBb

http://blog.csdn.net/liylboy/article/details/39669963 (this article describes a possible timeout problem)
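On the timeout point, one common precaution is to give every handle explicit time limits, so that a single slow host cannot hold up the batch for long. The sketch below is not the linked article's method, which is not reproduced here. The helper name make_handle_with_timeouts and the 5-second and 15-second defaults are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper, not from the original post or the linked article:
// build an easy handle with explicit time limits.
function make_handle_with_timeouts(string $url, int $connectTimeout = 5, int $totalTimeout = 15)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole transfer
    return $ch;
}

// Usage: add handles built this way to the multi handle instead of bare
// curl_init() handles. No request is sent in this snippet.
$mh = curl_multi_init();
$ch = make_handle_with_timeouts('http://www.php.net/');
curl_multi_add_handle($mh, $ch);
// ... run the batch as usual, then clean up:
curl_multi_remove_handle($mh, $ch);
curl_multi_close($mh);
```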

Also, the following article claims that multithreading is not faster, and is even slightly slower. I find that result strange and do not see how it could come about:

http://www.webkaka.com/tutorial/php/2013/102843/
