We can use the HTTP_USER_AGENT request header to determine whether a visitor is a search engine spider. Each search engine spider identifies itself with its own distinctive string in that header. The following function checks for a few of them.
- function is_crawler() {
-     // The header can be absent (for example on the command line), so guard against that.
-     if (empty($_SERVER['HTTP_USER_AGENT'])) {
-         return false;
-     }
-     $userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
-     $spiders = array(
-         'Googlebot',    // Google crawler
-         'Baiduspider',  // Baidu crawler
-         'Yahoo! Slurp', // Yahoo crawler
-         'YodaoBot',     // Youdao crawler
-         'msnbot'        // Bing crawler
-         // More crawler keywords
-     );
-     foreach ($spiders as $spider) {
-         $spider = strtolower($spider);
-         // Compare strictly: strpos() returns 0 for a match at the start of the string.
-         if (strpos($userAgent, $spider) !== false) {
-             return true;
-         }
-     }
-     return false;
- }
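One subtlety of this kind of substring check is that `strpos()` and `stripos()` return the integer offset of the match, and an offset of 0 compares loosely equal to `false`. The sketch below illustrates the difference. The helper name `ua_contains` and the sample User-Agent string are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original article): case-insensitive
// substring test that counts a match at offset 0 as a hit.
function ua_contains($userAgent, $keyword) {
    // stripos() returns the offset of the match, or false if there is none.
    // A loose comparison (!= false) would treat offset 0 as "not found".
    return stripos($userAgent, $keyword) !== false;
}

// Sample string chosen so that the keyword sits at offset 0.
$ua = 'Googlebot/2.1 (+http://www.google.com/bot.html)';

var_dump(stripos($ua, 'googlebot') != false); // bool(false): loose check misses the match
var_dump(ua_contains($ua, 'googlebot'));      // bool(true): strict check finds it
var_dump(ua_contains($ua, 'baiduspider'));    // bool(false): keyword is absent
```

Because `stripos()` ignores case, this variant also needs no `strtolower()` calls.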
The following PHP code checks against a longer list of spider identifiers:
- function isCrawler() {
-     // Treat a missing or empty User-Agent as a non-spider.
-     if (empty($_SERVER['HTTP_USER_AGENT'])) {
-         return false;
-     }
-     $agent = strtolower($_SERVER['HTTP_USER_AGENT']);
-     $spiderSite = array(
-         "TencentTraveler",
-         "Baiduspider+",
-         "BaiduGame",
-         "Googlebot",
-         "msnbot",
-         "Sosospider+",
-         "Sogou web spider",
-         "ia_archiver",
-         "Yahoo! Slurp",
-         "YoudaoBot",
-         "Yahoo Slurp",
-         "MSNBot",
-         "Java (Often spam bot)",
-         "Baidusp",
-         "Voila",
-         "Yandex bot",
-         "BSpider",
-         "Twiceler",
-         "Sogou Spider",
-         "Speedy Spider",
-         "Google AdSense",
-         "Heritrix",
-         "Python-urllib",
-         "Alexa (IA Archiver)",
-         "Ask",
-         "Exabot",
-         "Custo",
-         "OutfoxBot/YodaoBot",
-         "Yacy",
-         "SurveyBot",
-         "legs",
-         "lwp-trivial",
-         "Nutch",
-         "StackRambler",
-         "The web archive (IA Archiver)",
-         "Perl tool",
-         "MJ12bot",
-         "Netcraft",
-         "MSIECrawler",
-         "WGet tools",
-         "Larbin",
-         "Fish search",
-     );
-     foreach ($spiderSite as $val) {
-         $str = strtolower($val);
-         // Compare strictly: strpos() returns 0 for a match at the start of the string.
-         if (strpos($agent, $str) !== false) {
-             return true;
-         }
-     }
-     // No identifier matched.
-     return false;
- }
- if (isCrawler()) {
-     echo "Hello Spider!";
- } else {
-     echo "you are not a spider!";
- }
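With a long keyword list, the loop can also be expressed as a single case-insensitive regular expression. The following is a minimal sketch of that alternative, not the original author's method. The function name `is_crawler_ua` and its parameters are hypothetical, and the User-Agent is passed in as an argument so the function can be exercised without a real HTTP request.

```php
<?php
// Hypothetical variant (not from the original article): takes the User-Agent
// as a parameter and checks all keywords with one regular expression.
function is_crawler_ua($userAgent, array $keywords) {
    if ($userAgent === null || $userAgent === '' || count($keywords) === 0) {
        return false;
    }
    // Escape each keyword so characters such as "+" or "!" match literally.
    $escaped = array();
    foreach ($keywords as $keyword) {
        $escaped[] = preg_quote($keyword, '/');
    }
    // The "i" modifier makes the match case-insensitive, so strtolower() is not needed.
    $pattern = '/' . implode('|', $escaped) . '/i';
    return preg_match($pattern, $userAgent) === 1;
}

// Usage: a short keyword list taken from the examples above.
$keywords = array('Googlebot', 'Baiduspider', 'Yahoo! Slurp', 'msnbot');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
echo is_crawler_ua($ua, $keywords) ? "Hello Spider!" : "you are not a spider!";
```

Whether this is preferable to the loop depends on the situation. The regular expression keeps the matching in one call, while the loop is easier to extend with per-keyword handling.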