Determining search engine spider crawlers is actually very simple. You only need to determine the source useragent and then check whether there are any strings specified by search engine spider. Next let's take a look at the php Method for Determining search engine spider crawlers, I hope this tutorial will help you. Determining search engine spider crawlers is actually very simple. You only need to determine the source useragent and then check whether there are any strings specified by search engine spider. Next let's take a look at the php Method for Determining search engine spider crawlers, I hope this tutorial will help you.
Script ec (2); script
First, let's look at the spider list.
Search Engine |
User-agent (included) |
PTR or not |
Remarks |
Google |
Googlebot |
√ |
Host ip to get the Domain Name: googlebot.com main domain name |
Baidu |
Baidusp |
√ |
Host ip Address: * .baidu.com or * .baidu.jp |
Yahoo |
Yahoo! |
√ |
Host ip to get the Domain Name: inktomisearch.com main domain name |
Sogou |
Sogou |
× |
* Sogou web spider/3.0 (+ http://www.sogou.com/docs/help/webmasters.htm#07 ″) * Sogou Push Spider/3.0 (+ http://www.sogou.com/docs/help/webmasters.htm#07 ″) |
Netease |
YodaoBot |
× |
* Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider /";) |
MSN |
MSNBot |
√ |
Host ip to get the Domain Name: live.com main domain name |
360 |
360 Spider |
× |
Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: 1.8.0.11) Firefox/1.5.0.11; 360 Spider |
Soso |
Sosospider |
× |
Sosospider + (+ http://help.soso.com/webspider.htm) |
Bing |
Bingbot |
√ |
Host ip to get the Domain Name: msn.com main domain name |
Let's take a look at the example.
// Php Method for Determining search engine spider Crawlers
Function checkrobot ($ useragent = ''){
Static $ kw_spiders = array ('bot ', 'crawl', 'spider ', 'slurp', 'sohu-search', 'lycos', 'robozilla ');
Static $ kw_browsers = array ('msie ', 'netscape', 'Opera ', 'konqueror', 'mozilla ');
$ Useragent = strtolower (empty ($ useragent )? $ _ SERVER ['HTTP _ USER_AGENT ']: $ useragent );
If (strpos ($ useragent, 'HTTP: // ') ===false & dstrpos ($ useragent, $ kw_browsers ))
Return false;
If (dstrpos ($ useragent, $ kw_spiders ))
Return true;
Return false;
}
Function dstrpos ($ string, $ arr, $ returnvalue = false ){
If (empty ($ string ))
Return false;
Foreach (array) $ arr as $ v ){
If (strpos ($ string, $ v )! = False ){
$ Return = $ returnvalue? $ V: true;
Return $ return;
}
}
Return false;
}
If (checkrobot ()){
Echo 'spider ';
} Else {
Echo 'human ';
}
?>
Example
PHP anti-IP Resolution Method
/**
* Check IP address and spider authenticity
* (Check_spider ('66. 249.74.44 ', $ _ SERVER ['HTTP _ USER_AGENT']);
* @ Copyright http://blog.chacuo.net
* @ Author 8292669
* @ Param string $ ip Address
* @ Param string $ ua address
* @ Return false | spidername false detection failure is not in the specified list
*/
Function check_spider ($ ip, $ ua)
{
Static $ spider_list = array (
'Google '=> array ('googlebot', 'googlebot. com '),
'Baidu' => array ('baidider ider ','. baidu .'),
'Yahoo '=> array ('yahoo! ', 'Inktomisearch. com '),
'Msn '=> array ('msnbot', 'Live. com '),
'Bing' => array ('stringbot ', 'msn. com ')
);
If (! Preg_match ('/^ (\ d {1, 3} \.) {3} \ d {1, 3} $/', $ ip) return false;
If (empty ($ ua) return false;
Foreach ($ spider_list as $ k => $ v)
{
/// If you find
If (stripos ($ ua, $ v [0])! = False)
{
$ Domain = gethostbyaddr ($ ip );
If ($ domain & stripos ($ domain, $ v [1])! = False)
{
Return $ k;
}
}
}
Return false;
}
Currently, only a few search engines are added for detection, which can be used for anti-resolution queries. Anti-resolution queries are not supported. It is best to limit the speed. Users will use them to forge search engines to capture your resources.