A sorted list of search engine spider names

Source: Internet
Author: User


Crawlers from the major search engines constantly visit and fetch the content of our sites, which consumes a certain amount of site traffic, so sometimes we need to block some spiders. In practice only a few search engines are commonly used: allow those few spiders in the robots file and prohibit all others with a wildcard. I searched Baidu for search engine spider names, but the results were depressing. Most pages are copied and pasted from one another, much of the material is outdated, even recently published articles are old documents that have never been revised, and the spelling and letter case of the spider names differ from page to page, so no accurate information could be found. I therefore decided to sort out the common spider names from the access logs of my own hosting space. The list does not aim to be the most comprehensive, only as current and accurate as possible (every spider name here was extracted by me from those logs).
The latest and most accurate search engine spider names:
1. Baidu Spider: Baiduspider
Information found online gives the Baidu spider name as BaiduSpider or baiduspider. Forget those; they are old news. The latest Baidu spider name is Baiduspider. The log also shows a Baidu spider called Baiduspider-image; as the name suggests, and as a quick check confirmed, it is the spider that captures images.
Other common Baidu crawlers of the same type are Baiduspider-image (images), Baiduspider-video (videos), Baiduspider-news (news) and Baiduspider-mobile (WAP pages).
Note: so far only Baiduspider and Baiduspider-image have been found on this site.

2. Google Spider: Googlebot
This one is rarely disputed, although some sources also write it as "Google bot". The latest Google spider name is Googlebot. The log also shows Googlebot-Mobile, which captures WAP content.

3. 360 Spider: 360Spider
4. SOSO Spider: Sosospider
5. Yahoo Spider: "Yahoo! Slurp China" or "Yahoo! Slurp". I have no precise data for the current name yet, because this spider has not crawled my site. After a lot of searching this seems to be the current form, and reliable information about the name is welcome. For practical purposes it does not matter much: in the robots file the name Slurp can be used.
6. Youdao Spider: YoudaoBot and YodaoBot (both appear)
7. Sogou Spider: Sogou News Spider
Sogou spider names include Sogou web spider, Sogou inst spider, Sogou spider2, Sogou blog, Sogou News Spider and Sogou Orion spider; I found only Sogou News Spider in my log. (Judging from other people's robots files, all Sogou spiders can apparently be covered by the name Sogou.)
8. MSN Spider: msnbot, msnbot-media (I have only seen msnbot-media crawling)

9. Bing Spider: bingbot
10. Yisou Spider: YisouSpider
11. Alexa Spider: ia_archiver
12. Easou Spider: EasouSpider (how is this one related to Yisou above? The two names are easy to confuse.)
13. Jike Spider: JikeSpider

From the spiders listed above, pick the few common ones you want to allow to crawl, and block the rest through the robots file.
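A minimal robots.txt sketch of that allow-a-few, block-the-rest approach is shown here. The three allowed spiders are only an example selection from the list above, not a recommendation; substitute whichever names you want to keep. An empty Disallow line permits everything for the named spiders, and the wildcard group turns all other crawlers away.

```text
# Example selection only: spiders that may crawl the whole site.
User-agent: Baiduspider
User-agent: Googlebot
User-agent: bingbot
Disallow:

# Every other crawler is asked to stay out.
User-agent: *
Disallow: /
```

Keep in mind that robots.txt is advisory: well-behaved spiders follow it, but a crawler that ignores the file has to be blocked at the server level instead.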

I also found YandexBot, AhrefsBot and ezooms.bot in the Apache log. These are said to be badly behaved crawlers.
My hosting traffic is sufficient for the time being. When it runs short, I will keep only the few frequently used spiders and block the others to save traffic.

Here is a PHP function I use to check whether a visitor is a search engine spider.

The code is as follows:

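function get_naps_bot()
{
    // Lowercase the User-Agent so the matching below is case-insensitive.
    $useragent = isset($_SERVER['HTTP_USER_AGENT'])
        ? strtolower($_SERVER['HTTP_USER_AGENT'])
        : '';
    if (empty($useragent)) {
        return false;
    }

    // strpos() returns 0 for a match at the start of the string, so the
    // result must be compared with !== false rather than != false.
    // Needles are lowercase because the User-Agent was lowercased above.
    if (strpos($useragent, 'google') !== false) {
        return true;
    }

    if (strpos($useragent, 'msnbot') !== false) {
        return true;
    }

    if (strpos($useragent, 'slurp') !== false) {
        return true;
    }

    if (strpos($useragent, 'baidu') !== false) {
        return true;
    }

    if (strpos($useragent, 'sohu-search') !== false) {
        return true;
    }

    if (strpos($useragent, 'lycos') !== false) {
        return true;
    }

    if (strpos($useragent, 'robozilla') !== false) {
        return true;
    }

    // No known spider token was found.
    return false;
}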
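The same check can also report which spider was seen, using the names collected in the list above. This is only a sketch: the function name get_spider_name and the exact selection of names are assumptions made for illustration, and it needs PHP 7.1 or later for the nullable return type. Since a User-Agent header can be forged, treat the result as a hint rather than proof.

```php
<?php
// Hypothetical helper (not part of the original function): returns the
// first matching spider name, or null when the visitor is not in the list.
function get_spider_name(string $userAgent): ?string
{
    // Names taken from the list in this article; extend or trim as needed.
    $spiders = [
        'Baiduspider', 'Googlebot', '360Spider', 'Sosospider', 'Slurp',
        'YoudaoBot', 'YodaoBot', 'Sogou', 'msnbot', 'bingbot',
        'YisouSpider', 'ia_archiver', 'EasouSpider', 'JikeSpider',
    ];

    foreach ($spiders as $name) {
        // stripos() is case-insensitive; compare with !== false because
        // a match at position 0 is still a valid hit.
        if (stripos($userAgent, $name) !== false) {
            return $name;
        }
    }

    return null;
}

// Usage with the current request (an empty string when the header is absent):
$spider = get_spider_name($_SERVER['HTTP_USER_AGENT'] ?? '');
```

Because the match is a substring test, variants such as Baiduspider-image or msnbot-media are reported under the base name (Baiduspider, msnbot).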

 
