How php judges search engine spider crawlers-PHP source code

Source: Internet
Author: User
Determining search engine spider crawlers is actually very simple. You only need to determine the source useragent and then check whether there are any strings specified by search engine spider. Next let's take a look at the php Method for Determining search engine spider crawlers, I hope this tutorial will help you. Determining search engine spider crawlers is actually very simple. You only need to determine the source useragent and then check whether there are any strings specified by search engine spider. Next let's take a look at the php Method for Determining search engine spider crawlers, I hope this tutorial will help you.

Script ec (2); script

First, let's look at the spider list.





















































Search Engine User-agent (included) PTR or not Remarks
Google Googlebot Host ip to get the Domain Name: googlebot.com main domain name
Baidu Baidusp Host ip Address: * .baidu.com or * .baidu.jp
Yahoo Yahoo! Host ip to get the Domain Name: inktomisearch.com main domain name
Sogou Sogou ×

* Sogou web spider/3.0 (+ http://www.sogou.com/docs/help/webmasters.htm#07 ″)
* Sogou Push Spider/3.0 (+ http://www.sogou.com/docs/help/webmasters.htm#07 ″)

Netease YodaoBot × * Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider /";)
MSN MSNBot Host ip to get the Domain Name: live.com main domain name
360 360 Spider × Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv: 1.8.0.11) Firefox/1.5.0.11; 360 Spider
Soso Sosospider × Sosospider + (+ http://help.soso.com/webspider.htm)
Bing Bingbot Host ip to get the Domain Name: msn.com main domain name

Let's take a look at the example.

// Php Method for Determining search engine spider Crawlers

Function checkrobot ($ useragent = ''){
Static $ kw_spiders = array ('bot ', 'crawl', 'spider ', 'slurp', 'sohu-search', 'lycos', 'robozilla ');
Static $ kw_browsers = array ('msie ', 'netscape', 'Opera ', 'konqueror', 'mozilla ');

$ Useragent = strtolower (empty ($ useragent )? $ _ SERVER ['HTTP _ USER_AGENT ']: $ useragent );
If (strpos ($ useragent, 'HTTP: // ') ===false & dstrpos ($ useragent, $ kw_browsers ))
Return false;
If (dstrpos ($ useragent, $ kw_spiders ))
Return true;
Return false;
}

Function dstrpos ($ string, $ arr, $ returnvalue = false ){
If (empty ($ string ))
Return false;
Foreach (array) $ arr as $ v ){
If (strpos ($ string, $ v )! = False ){
$ Return = $ returnvalue? $ V: true;
Return $ return;
}
}
Return false;
}

If (checkrobot ()){
Echo 'spider ';
} Else {
Echo 'human ';
}

?>

Example

PHP anti-IP Resolution Method
/**
* Check IP address and spider authenticity
* (Check_spider ('66. 249.74.44 ', $ _ SERVER ['HTTP _ USER_AGENT']);
* @ Copyright http://blog.chacuo.net
* @ Author 8292669
* @ Param string $ ip Address
* @ Param string $ ua address
* @ Return false | spidername false detection failure is not in the specified list
*/
Function check_spider ($ ip, $ ua)
{
Static $ spider_list = array (
'Google '=> array ('googlebot', 'googlebot. com '),
'Baidu' => array ('baidider ider ','. baidu .'),
'Yahoo '=> array ('yahoo! ', 'Inktomisearch. com '),
'Msn '=> array ('msnbot', 'Live. com '),
'Bing' => array ('stringbot ', 'msn. com ')
);

If (! Preg_match ('/^ (\ d {1, 3} \.) {3} \ d {1, 3} $/', $ ip) return false;
If (empty ($ ua) return false;

Foreach ($ spider_list as $ k => $ v)
{
/// If you find
If (stripos ($ ua, $ v [0])! = False)
{
$ Domain = gethostbyaddr ($ ip );

If ($ domain & stripos ($ domain, $ v [1])! = False)
{
Return $ k;
}
}
}
Return false;
}

Currently, only a few search engines are added for detection, which can be used for anti-resolution queries. Anti-resolution queries are not supported. It is best to limit the speed. Users will use them to forge search engines to capture your resources.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.