1. Recommended method: PHP code that determines whether a visitor is a search engine spider (crawler) or a human, taken from Discuz! X3.2
<?php
function checkrobot($useragent = '')
{
    // Keywords are lowercase because the User-Agent is lowercased before matching
    static $kw_spiders = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');

    $useragent = strtolower(empty($useragent) ? ($_SERVER['HTTP_USER_AGENT'] ?? '') : $useragent);
    // A browser keyword without an "http://" link means a human visitor
    if (strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers)) return false;
    // Otherwise any spider keyword means a robot
    if (dstrpos($useragent, $kw_spiders)) return true;
    return false;
}

// Returns true (or the matched element when $returnvalue is true)
// if $string contains any element of $arr
function dstrpos($string, $arr, $returnvalue = false)
{
    if (empty($string)) return false;
    foreach ((array)$arr as $v) {
        if (strpos($string, $v) !== false) {
            $return = $returnvalue ? $v : true;
            return $return;
        }
    }
    return false;
}

if (checkrobot()) {
    echo 'Robot crawler';
} else {
    echo 'Human visitor';
}
?>
In practice, you can use this check to skip an operation for search engines and run it only for human visitors:
<?php
if (!checkrobot()) {
    // do something (human visitors only)
}
?>
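The subtle part of the Discuz check is the order of its two tests: most spider User-Agent strings also begin with "Mozilla/5.0", so a browser keyword alone would misclassify them. Discuz only trusts the browser keywords when the string contains no "http://" link, which many spiders include to point at their documentation. The sketch below restates that rule as a pure function so it can be tried on a few sample User-Agent strings; the name `is_robot_ua` and the sample strings are illustrative assumptions, not part of Discuz, and it assumes PHP 8 for `str_contains()`.

```php
<?php
// Hypothetical restatement of the Discuz X3.2 checkrobot() rule as a pure function.
// Assumes PHP 8+ (str_contains).
function is_robot_ua(string $ua): bool
{
    $spiders  = ['bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla'];
    $browsers = ['msie', 'netscape', 'opera', 'konqueror', 'mozilla'];
    $ua = strtolower($ua);

    // Step 1: a browser keyword with no "http://" link is treated as a human.
    if (!str_contains($ua, 'http://')) {
        foreach ($browsers as $kw) {
            if (str_contains($ua, $kw)) {
                return false;
            }
        }
    }
    // Step 2: otherwise, any spider keyword marks the visitor as a robot.
    foreach ($spiders as $kw) {
        if (str_contains($ua, $kw)) {
            return true;
        }
    }
    return false;
}

// Sample User-Agent strings (illustrative only).
$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (compatible; ExampleBot/1.0)',
];
foreach ($samples as $ua) {
    echo (is_robot_ua($ua) ? 'robot' : 'human'), "\t", $ua, "\n";
}
```

The third sample shows the trade-off of this ordering: a bot that calls itself Mozilla but includes no link is classified as human.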
2. The second method:
Using PHP to log spider visits for access statistics
$useragent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
// keyword => name to log. Keywords are lowercase because the User-Agent was
// lowercased. The first match wins, so specific keywords come first
// ('msnbot' before 'msn', the catch-all 'bot' last).
$bots = array(
    'googlebot' => 'Google',
    'mediapartners-google' => 'Google Adsense',
    'baiduspider' => 'Baidu',
    'sogou spider' => 'Sogou',
    'sogou web' => 'Sogou web',
    'sosospider' => 'SOSO',
    '360spider' => '360Spider',
    'yahoo' => 'Yahoo',
    'msnbot' => 'msnbot',
    'msn' => 'MSN',
    'sohu' => 'Sohu',
    'yodaobot' => 'Yodao',
    'twiceler' => 'Twiceler',
    'ia_archiver' => 'Alexa_',
    'iaarchiver' => 'Alexa',
    'slurp' => 'Yahoo',
    'bot' => 'other spider',
);
foreach ($bots as $keyword => $name) {
    if (strpos($useragent, $keyword) !== false) {
        $bot = $name;
        break;
    }
}
if (isset($bot)) {
    // Append one tab-separated line: time, IP address, spider, requested URL
    $fp = @fopen('bot.txt', 'a');
    if ($fp) {
        fwrite($fp, date('Y-m-d H:i:s') . "\t" . $_SERVER['REMOTE_ADDR'] . "\t" . $bot . "\t"
            . 'http://' . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\r\n");
        fclose($fp);
    }
}
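Each line this method appends to bot.txt has four tab-separated fields (time, IP address, spider name, URL), so the statistics themselves can be produced by counting the third field. The sketch below does that for an array of lines; the function name `count_spider_visits` is hypothetical, and reading the file with `file()` is only one possible way to obtain the lines.

```php
<?php
// Hypothetical helper: tally spider visits from lines in the bot.txt format
// "time<TAB>IP<TAB>spider name<TAB>URL".
function count_spider_visits(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r\n"));
        if (count($fields) < 4) {
            continue; // skip blank or malformed lines
        }
        $bot = $fields[2];
        $counts[$bot] = ($counts[$bot] ?? 0) + 1;
    }
    arsort($counts); // most frequent spider first
    return $counts;
}

// Usage with the log file written above:
// $lines = file('bot.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// print_r(count_spider_visits($lines ?: []));
```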
3. Third method:
We can determine whether a visitor is a spider from HTTP_USER_AGENT, because each search engine's spider identifies itself with its own distinctive string. Some of them are listed below.
function is_crawler()
{
    $userAgent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
    $spiders = array(
        'Googlebot',    // Google crawler
        'Baiduspider',  // Baidu crawler
        'Yahoo! Slurp', // Yahoo crawler
        'YodaoBot',     // Youdao crawler
        'msnbot',       // MSN/Bing crawler
        // more crawler keywords
    );
    foreach ($spiders as $spider) {
        $spider = strtolower($spider);
        if (strpos($userAgent, $spider) !== false) {
            return true;
        }
    }
    return false;
}
The following PHP code includes a longer list of spider identifiers:
function isCrawler()
{
    $agent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');
    if (empty($agent)) {
        return false;
    }
    // Matching is by substring, so descriptive entries such as
    // "Java (Often spam bot)" only match a User-Agent containing that exact text.
    $spiderSite = array(
        "TencentTraveler", "Baiduspider+", "BaiduGame", "Googlebot", "msnbot",
        "Sosospider+", "Sogou web spider", "ia_archiver", "Yahoo! Slurp", "YoudaoBot",
        "Yahoo Slurp", "MSNBot", "Java (Often spam bot)", "Baiduspider", "Voila",
        "Yandex bot", "BSpider", "Twiceler", "Sogou Spider", "Speedy Spider",
        "Google AdSense", "Heritrix", "Python-urllib", "Alexa (IA Archiver)", "Ask",
        "Exabot", "Custo", "OutfoxBot/YodaoBot", "YaCy", "SurveyBot",
        "Legs", "lwp-trivial", "Nutch", "StackRambler", "The Web Archive (IA Archiver)",
        "Perl tool", "MJ12bot", "Netcraft", "MSIECrawler", "WGet tools",
        "Larbin", "Fish search",
    );
    foreach ($spiderSite as $val) {
        $str = strtolower($val);
        if (strpos($agent, $str) !== false) {
            return true;
        }
    }
    return false;
}

if (isCrawler()) {
    echo "Hello, spider!";
} else {
    echo "You're not a spider!";
}
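Because both functions above use `strpos()`, a keyword matches anywhere inside the User-Agent string, so short entries such as "Ask" or "Legs" can also match unrelated words ("ask" occurs inside "task"). One possible refinement, sketched below as an assumption rather than part of the original code, is to require that a keyword is not directly preceded or followed by a letter or digit. The helper names `ua_has_keyword` and `ua_matches_any` are hypothetical, and the trade-off is that a generic fragment like "bot" no longer matches inside "Googlebot".

```php
<?php
// Hypothetical helper: case-insensitive keyword match that refuses to match
// inside a longer alphanumeric word (e.g. "ask" inside "task").
function ua_has_keyword(string $ua, string $keyword): bool
{
    $pattern = '/(?<![a-z0-9])' . preg_quote(strtolower($keyword), '/') . '(?![a-z0-9])/';
    return preg_match($pattern, strtolower($ua)) === 1;
}

// Hypothetical wrapper: true if any keyword in the list matches.
function ua_matches_any(string $ua, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        if (ua_has_keyword($ua, $keyword)) {
            return true;
        }
    }
    return false;
}
```

`preg_quote()` escapes characters such as "+" and "!" so entries like "Baiduspider+" and "Yahoo! Slurp" are matched literally.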
4. Fourth method: check a list of spider names and redirect every other visitor to another site:
<?php
$flag = false;
$tmp = $_SERVER['HTTP_USER_AGENT'] ?? '';
// stripos() is case-insensitive, so 'MSNBot' also matches the real "msnbot".
// Compare with !== false: a match at position 0 returns 0, which "> 0" would miss.
$spiders = array(
    'Googlebot', 'Baiduspider', 'Yahoo! Slurp', 'MSNBot', 'Sosospider',
    'YodaoBot', 'OutfoxBot', 'Sogou web spider', 'Sogou Orion spider',
    'FAST-WebCrawler', 'Gaisbot', 'ia_archiver', 'AltaVista',
    'Lycos_Spider', 'Inktomi Slurp',
);
foreach ($spiders as $spider) {
    if (stripos($tmp, $spider) !== false) {
        $flag = true;
        break;
    }
}
if ($flag == false) {
    // Not a spider: go to the corresponding page on http://www.php.net.
    // $_SERVER['REQUEST_URI'] is the path after the domain name; alternatively
    // redirect to a fixed page: header("Location: http://www.php.net/abc/d.php");
    header("Location: http://www.php.net" . $_SERVER['REQUEST_URI']);
    exit();
}
?>
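Because the fourth method calls `header()` and `exit()` directly, its decision is hard to test on its own. A minimal sketch, assuming you want to check the logic offline, moves the decision into a pure function that returns the redirect target for human visitors and `null` for spiders. The function name `redirect_target_for`, its parameters, and the shortened keyword list are hypothetical.

```php
<?php
// Hypothetical pure version of method 4's decision: returns the URL to
// redirect a human visitor to, or null when the User-Agent looks like a spider.
function redirect_target_for(string $ua, string $requestUri, array $keywords, string $base = 'http://www.php.net'): ?string
{
    foreach ($keywords as $keyword) {
        if (stripos($ua, $keyword) !== false) {
            return null; // spider: serve the page normally
        }
    }
    return $base . $requestUri; // human: same path on the other site
}

// Usage in a page (shortened keyword list for illustration):
// $target = redirect_target_for($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['REQUEST_URI'] ?? '/', ['Googlebot', 'Baiduspider']);
// if ($target !== null) { header('Location: ' . $target); exit(); }
```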