PHP: determining whether a visitor is a search engine spider or an ordinary user (code summary)

Source: Internet
Author: User
1. Recommended method: PHP code that tells a search engine spider from a human visitor, taken from Discuz! X3.2

<?php
function checkrobot($useragent = '') {
    // Keywords are lowercase because the user agent is lowercased before matching.
    static $kw_spiders = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');

    $useragent = strtolower(empty($useragent) ? $_SERVER['HTTP_USER_AGENT'] : $useragent);
    // A browser keyword with no 'http://' in the user agent is treated as a human visitor.
    if (strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers)) return false;
    // Otherwise any spider keyword marks the visitor as a robot.
    if (dstrpos($useragent, $kw_spiders)) return true;
    return false;
}

// Returns true (or the matched value when $returnvalue is set) if $string contains any entry of $arr.
function dstrpos($string, $arr, $returnvalue = false) {
    if (empty($string)) return false;
    foreach ((array)$arr as $v) {
        if (strpos($string, $v) !== false) {
            $return = $returnvalue ? $v : true;
            return $return;
        }
    }
    return false;
}

if (checkrobot()) {
    echo 'Robot crawler';
} else {
    echo 'People';
}
?>

In a real application you can use the check like this, performing an operation only when the visitor is not a search engine:

<?php
if (!checkrobot()) {
    // do something
}
?>

2. The second method:

Use PHP to log spider visits for access statistics

// The user agent is lowercased, so every keyword below must be lowercase too.
$useragent = addslashes(strtolower($_SERVER['HTTP_USER_AGENT']));
if (strpos($useragent, 'googlebot') !== false) { $bot = 'Google'; }
elseif (strpos($useragent, 'mediapartners-google') !== false) { $bot = 'Google Adsense'; }
elseif (strpos($useragent, 'baiduspider') !== false) { $bot = 'Baidu'; }
elseif (strpos($useragent, 'sogou spider') !== false) { $bot = 'Sogou'; }
elseif (strpos($useragent, 'sogou web') !== false) { $bot = 'Sogou web'; }
elseif (strpos($useragent, 'sosospider') !== false) { $bot = 'SOSO'; }
elseif (strpos($useragent, '360spider') !== false) { $bot = '360Spider'; }
elseif (strpos($useragent, 'yahoo') !== false) { $bot = 'Yahoo'; }
// 'msnbot' is checked before 'msn'; otherwise the shorter keyword always matches first.
elseif (strpos($useragent, 'msnbot') !== false) { $bot = 'msnbot'; }
elseif (strpos($useragent, 'msn') !== false) { $bot = 'MSN'; }
elseif (strpos($useragent, 'sohu') !== false) { $bot = 'Sohu'; }
elseif (strpos($useragent, 'yodaobot') !== false) { $bot = 'Yodao'; }
elseif (strpos($useragent, 'twiceler') !== false) { $bot = 'Twiceler'; }
elseif (strpos($useragent, 'ia_archiver') !== false) { $bot = 'Alexa_'; }
elseif (strpos($useragent, 'iaarchiver') !== false) { $bot = 'Alexa'; }
elseif (strpos($useragent, 'slurp') !== false) { $bot = 'Yahoo'; }
elseif (strpos($useragent, 'bot') !== false) { $bot = 'other spider'; }

if (isset($bot)) {
    // Append one tab-separated line per spider visit: time, IP, spider name, URL.
    $fp = @fopen('bot.txt', 'a');
    if ($fp) {
        fwrite($fp, date('Y-m-d H:i:s') . "\t" . $_SERVER['REMOTE_ADDR'] . "\t" . $bot . "\t" . 'http://' . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\r\n");
        fclose($fp);
    }
}
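
The log written above holds one tab-separated line per spider visit (time, IP, spider name, URL). As a minimal sketch of the statistics step, assuming that line layout, the helper below tallies visits per spider name. The function name count_bot_visits() and its default file name are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: tally visits per spider from a tab-separated log.
// Assumes each line is "time \t IP \t spider name \t URL".
function count_bot_visits($path = 'bot.txt') {
    $counts = array();
    if (!is_readable($path)) {
        return $counts; // no log yet: nothing to count
    }
    // Read the file into an array of lines, without line endings or blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r"));
        if (count($fields) < 3) {
            continue; // skip malformed lines
        }
        $bot = $fields[2];
        $counts[$bot] = isset($counts[$bot]) ? $counts[$bot] + 1 : 1;
    }
    arsort($counts); // most frequent spider first
    return $counts;
}
```

Looping over the returned array with foreach ($counts as $bot => $n) gives each spider name and its visit count, which is enough for a simple daily report.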

3. The third method:

We can tell whether a visitor is a spider from HTTP_USER_AGENT, because each search engine's spider sends its own identifying string. The list below covers some of them.

function is_crawler() {
    $userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
    $spiders = array(
        'Googlebot',    // Google crawler
        'Baiduspider',  // Baidu crawler
        'Yahoo! Slurp', // Yahoo crawler
        'YodaoBot',     // Youdao crawler
        'msnbot'        // Bing crawler
        // More crawler keywords
    );
    foreach ($spiders as $spider) {
        // Lowercase each keyword so it can match the lowercased user agent.
        $spider = strtolower($spider);
        if (strpos($userAgent, $spider) !== false) {
            return true;
        }
    }
    return false;
}

The following PHP code includes more spider identifiers:

function isCrawler() {
    $agent = strtolower($_SERVER['HTTP_USER_AGENT']);
    if (!empty($agent)) {
        $spiderSite = array(
            "TencentTraveler", "Baiduspider+", "BaiduGame", "Googlebot", "msnbot",
            "Sosospider+", "Sogou web spider", "ia_archiver", "Yahoo! Slurp",
            "YoudaoBot", "Yahoo Slurp", "MSNBot", "Java (Often spam bot)",
            "Baiduspider", "Voila", "Yandex bot", "BSpider", "twiceler",
            "Sogou spider", "Speedy Spider", "Google AdSense", "Heritrix",
            "Python-urllib", "Alexa (IA Archiver)", "Ask", "Exabot", "Custo",
            "OutfoxBot/YodaoBot", "yacy", "SurveyBot", "legs", "lwp-trivial",
            "Nutch", "StackRambler", "The web archive (IA Archiver)", "Perl tool",
            "MJ12bot", "Netcraft", "MSIECrawler", "WGet tools", "larbin",
            "Fish search",
        );
        foreach ($spiderSite as $val) {
            // Lowercase each identifier so it can match the lowercased user agent.
            $str = strtolower($val);
            if (strpos($agent, $str) !== false) {
                return true;
            }
        }
    }
    // Empty user agent, or no identifier matched.
    return false;
}

if (isCrawler()) {
    echo "Hello, spider!";
} else {
    echo "You are not a spider!";
}
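
The functions above read $_SERVER directly, which makes them awkward to try out from the command line. The following is a minimal sketch of the same keyword-matching idea with the user agent and the keyword list passed in as arguments. The name match_spider() and the sample user-agent strings are assumptions made for illustration, not part of the original code. It returns the first keyword that matches, or null.

```php
<?php
// Hypothetical helper: return the first spider keyword found in the
// user agent (case-insensitive), or null when none matches.
function match_spider($userAgent, array $keywords) {
    if ($userAgent === null || $userAgent === '') {
        return null; // no user agent sent: not a known spider
    }
    foreach ($keywords as $keyword) {
        if (stripos($userAgent, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

$keywords = array('Googlebot', 'Baiduspider', 'Yahoo! Slurp', 'YodaoBot', 'msnbot');

// Sample user-agent strings, for illustration only.
$samples = array(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
);
foreach ($samples as $ua) {
    $hit = match_spider($ua, $keywords);
    echo ($hit === null ? 'visitor' : 'spider: ' . $hit), "\n";
}
```

Using stripos() avoids lowercasing both sides by hand, and returning the matched keyword rather than a plain boolean lets the caller log which spider was seen.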

4. The fourth method:

<?php
$flag = false;
$tmp = $_SERVER['HTTP_USER_AGENT'];
// stripos() matches case-insensitively; comparing with !== false also catches a match at position 0.
if (stripos($tmp, 'Googlebot') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Baiduspider') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Yahoo! Slurp') !== false) {
    $flag = true;
} else if (stripos($tmp, 'msnbot') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Sosospider') !== false) {
    $flag = true;
} else if (stripos($tmp, 'YodaoBot') !== false || stripos($tmp, 'OutfoxBot') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Sogou web spider') !== false || stripos($tmp, 'Sogou Orion spider') !== false) {
    $flag = true;
} else if (stripos($tmp, 'fast-webcrawler') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Gaisbot') !== false) {
    $flag = true;
} else if (stripos($tmp, 'ia_archiver') !== false) {
    $flag = true;
} else if (stripos($tmp, 'altavista') !== false) {
    $flag = true;
} else if (stripos($tmp, 'lycos_spider') !== false) {
    $flag = true;
} else if (stripos($tmp, 'Inktomi slurp') !== false) {
    $flag = true;
}

if ($flag == false) {
    // Not a spider: redirect to the corresponding page on http://www.php.net.
    // $_SERVER['REQUEST_URI'] is the path that follows the domain name.
    header("Location: http://www.php.net" . $_SERVER['REQUEST_URI']);
    // Or redirect to a fixed page instead:
    // header("Location: http://www.php.net/abc/d.php");
    exit();
}
?>