A collection of PHP methods for detecting search engine spiders (crawlers)

Source: Internet
Author: User

First, look at the list of spiders:

Search engine   User-agent (contains)   PTR supported   Note
Google          Googlebot               Yes             Host IP resolves to domain name: googlebot.com (primary domain)
Baidu           Baiduspider             Yes             Host IP resolves to domain name: *.baidu.com or *.baidu.jp
Yahoo           Yahoo!                  Yes             Host IP resolves to domain name: inktomisearch.com (primary domain)
Sogou           Sogou                   No              * Sogou web spider/3.0 (+http://www.sogou.com/docs/help/webmasters.htm#07)
                                                        * Sogou Push Spider/3.0 (+http://www.sogou.com/docs/help/webmasters.htm#07)
Netease         YodaoBot                No              * Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
MSN             MSNBot                  Yes             Host IP resolves to domain name: live.com (primary domain)
360             360Spider               No              * Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Firefox/1.5.0.11; 360Spider
Soso            Sosospider              No              * Sosospider+ (+http://help.soso.com/webspider.htm)
Bing            Bingbot                 Yes             Host IP resolves to domain name: msn.com (primary domain)

Next, look at an example:

<?php
// Detect search engine spiders (crawlers) from the User-Agent string

function checkrobot($useragent = '') {
    // Keywords are lowercase because the User-Agent is lowercased before matching
    static $kw_spiders = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');

    // Fall back to the current request's User-Agent when none is passed in
    $server_ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $useragent = strtolower(empty($useragent) ? $server_ua : $useragent);
    // A browser keyword with no URL in the string is treated as a regular browser
    if (strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers))
        return false;
    if (dstrpos($useragent, $kw_spiders))
        return true;
    return false;
}

// Return true (or the matched keyword when $returnvalue is true)
// if $string contains any element of $arr
function dstrpos($string, $arr, $returnvalue = false) {
    if (empty($string))
        return false;
    foreach ((array)$arr as $v) {
        if (strpos($string, $v) !== false) {
            $return = $returnvalue ? $v : true;
            return $return;
        }
    }
    return false;
}

if (checkrobot()) {
    echo 'Spider';
} else {
    echo 'Human';
}

?>

Example

PHP method using a reverse DNS (PTR) lookup of the IP
<?php
/**
 * Check an IP address and User-Agent to verify that a spider is genuine
 * (check_spider('66.249.74.44', $_SERVER['HTTP_USER_AGENT']);)
 * @copyright http://blog.chacuo.net
 * @author 8292669
 * @param string $ip IP address
 * @param string $ua User-Agent string
 * @return false|string Spider name; false if verification fails or the spider is not in the list
 */
function check_spider($ip, $ua)
{
    // name => array(User-Agent keyword, domain expected in the PTR hostname)
    static $spider_list = array(
        'Google' => array('Googlebot', 'googlebot.com'),
        'Baidu'  => array('Baiduspider', '.baidu.'),
        'Yahoo'  => array('Yahoo!', 'inktomisearch.com'),
        'MSN'    => array('msnbot', 'live.com'),
        'Bing'   => array('Bingbot', 'msn.com')
    );

    // Only dotted IPv4 addresses are accepted
    if (!preg_match('/^(\d{1,3}\.){3}\d{1,3}$/', $ip)) return false;
    if (empty($ua)) return false;

    foreach ($spider_list as $k => $v)
    {
        // The User-Agent contains this spider's keyword
        if (stripos($ua, $v[0]) !== false)
        {
            // Reverse (PTR) lookup of the IP
            $domain = gethostbyaddr($ip);

            if ($domain && stripos($domain, $v[1]) !== false)
            {
                return $k;
            }
        }
    }
    return false;
}
?>
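
The check_spider function above accepts any PTR hostname that merely contains the expected string, and it does not confirm that the hostname resolves back to the same IP. A stricter variant might look like the sketch below. The helper names host_in_domain and verify_spider_ip are hypothetical (they do not come from the original article), and the injectable $reverse and $forward callbacks are an assumption added only so the logic can be exercised without network access. By default they fall back to PHP's gethostbyaddr and gethostbynamel.

```php
<?php
/**
 * Hypothetical helper: true when $host equals $domain or is a subdomain of it.
 * A suffix match rejects names such as "googlebot.com.evil.example".
 */
function host_in_domain(string $host, string $domain): bool
{
    $host = strtolower(rtrim($host, '.'));
    $domain = strtolower(ltrim($domain, '.'));
    if ($host === $domain) {
        return true;
    }
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

/**
 * Hypothetical helper: reverse lookup, domain suffix check, then forward lookup.
 * Returns true only when the PTR hostname belongs to one of $domains AND
 * that hostname resolves back to the original IPv4 address.
 */
function verify_spider_ip(string $ip, array $domains, ?callable $reverse = null, ?callable $forward = null): bool
{
    // Assumption: callbacks are injectable for offline testing
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr returns the unmodified IP when no PTR record is found
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }

    $matched = false;
    foreach ($domains as $domain) {
        if (host_in_domain($host, $domain)) {
            $matched = true;
            break;
        }
    }
    if (!$matched) {
        return false;
    }

    // Forward confirmation: the hostname must map back to the same IP
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

With the defaults, a call such as verify_spider_ip('66.249.74.44', array('googlebot.com')) performs two live DNS queries, so its result depends on the resolver and on current DNS records. Caching the outcome per IP is probably worthwhile on a busy site.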

Currently only a few search engines are checked, namely those that support reverse DNS (PTR) lookups. For spiders that cannot be verified by a reverse lookup, it is best to apply rate limits, because users can forge a search engine's User-Agent to crawl your resources.
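
The rate limiting mentioned above could be approached in many ways. The following is only a minimal fixed-window sketch, not something taken from the original article: the class name FixedWindowLimiter is hypothetical, and the counters live in a plain array, so they are lost at the end of each request. A real deployment would need shared storage (a cache or database), which this sketch deliberately leaves out. The optional $now parameter is an assumption added so the behavior can be checked with fixed timestamps.

```php
<?php
/**
 * Hypothetical fixed-window rate limiter keyed by client IP.
 * Allows at most $limit requests per IP in each window of $windowSeconds.
 */
class FixedWindowLimiter
{
    private $limit;
    private $window;
    private $hits = array(); // ip => array(windowStart, count)

    public function __construct(int $limit, int $windowSeconds)
    {
        $this->limit = $limit;
        $this->window = $windowSeconds;
    }

    /** Returns true when the request is allowed, false when over the limit. */
    public function allow(string $ip, ?int $now = null): bool
    {
        $now = $now ?? time();
        // Align the timestamp to the start of its window
        $start = $now - ($now % $this->window);

        // Start a fresh counter on first sight or when a new window begins
        if (!isset($this->hits[$ip]) || $this->hits[$ip][0] !== $start) {
            $this->hits[$ip] = array($start, 0);
        }

        $this->hits[$ip][1]++;
        return $this->hits[$ip][1] <= $this->limit;
    }
}
```

One possible use is to apply the limiter only to requests whose User-Agent claims to be a spider but whose IP cannot be verified through a PTR lookup, and to respond with HTTP 429 when allow() returns false.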
