How to accurately determine whether a request was sent by a search engine crawler (spider)

Source: Internet
Author: User


Websites are often visited by various crawlers. Some belong to search engines and some do not. These crawlers usually send a User-Agent, but a User-Agent can be forged: it is simply a header in the HTTP request, and a program can set it to any value.
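To make the point about forged User-Agents concrete, here is a minimal sketch using Python's standard urllib.request module. The URL and the User-Agent string are placeholders chosen for illustration, and the request is only constructed, never sent.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request object whose User-Agent header is whatever the caller chooses."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical URL and crawler-like User-Agent string; nothing is sent over the network.
fake = build_request(
    "http://example.com/",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
)

# urllib stores header names in capitalized form, hence "User-agent".
print(fake.get_header("User-agent"))
```

A server that receives such a request sees only the header text, so it cannot tell from the header alone who sent it.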

Therefore, using the User-Agent to decide whether a request comes from a search engine crawler (spider) is unreliable. A more reliable method is to check whether the host name that the requester's IP address resolves to belongs to the search engine's own domain.

To look up the host name for an IP address, run the nslookup command on Windows or the host command on Linux.

For example, running nslookup with an IP address on Windows returned the host name crawl-66-249-64-119.googlebot.com. This indicates that the IP address belongs to a Google crawler, because the host names of Google crawlers have the form xxx.googlebot.com.

We can also obtain the host name for an IP address from a Python program. The code is as follows:

    import socket

    def getHost(ip):
        # Return (host_name, None) on success, (None, error_message) on failure.
        try:
            result = socket.gethostbyaddr(ip)
            return result[0], None
        except socket.herror as e:
            # Raised when the reverse lookup fails.
            return None, str(e)

The code above uses the gethostbyaddr function of the socket module to obtain the host name of the IP address.

The host names of common spiders fall under domains related to the search engine's official site, for example:

  • Baidu spiders are generally under a subdomain of baidu.com or baidu.jp.
  • Google crawlers are generally under a subdomain of googlebot.com.
  • The Microsoft Bing crawler is under a subdomain of search.msn.com.
  • Sogou spiders are under a subdomain of crawl.sogou.com.
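The check described above could be sketched roughly as follows. The names SPIDER_DOMAINS, match_spider and identify_spider are hypothetical, and the suffixes are only the ones listed above. The forward confirmation step (resolving the host name back and checking that it yields the same IP address) is an extra safeguard assumed here, not something the original code does. The resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Domain suffixes taken from the list above; extend as needed.
SPIDER_DOMAINS = {
    "Baidu": (".baidu.com", ".baidu.jp"),
    "Google": (".googlebot.com",),
    "Bing": (".search.msn.com",),
    "Sogou": (".crawl.sogou.com",),
}


def match_spider(hostname):
    """Return the engine whose domain suffix the host name ends with, else None."""
    host = hostname.lower().rstrip(".")
    for engine, suffixes in SPIDER_DOMAINS.items():
        if host.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return engine
    return None


def identify_spider(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, match the host name, then confirm it resolves back to ip."""
    try:
        hostname = reverse(ip)[0]
        engine = match_spider(hostname)
        if engine is None:
            return None
        # Forward confirmation (an assumption added in this sketch).
        addresses = forward(hostname)[2]
    except (socket.herror, socket.gaierror):
        return None
    return engine if ip in addresses else None
```

Matching on a suffix that starts with a dot means a host such as evil-googlebot.com.example.org is rejected, because it ends with example.org rather than .googlebot.com.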

Based on this principle, I wrote a tool page that determines whether an IP address belongs to a real search engine crawler. The page also lists the IP addresses of common Google and Bing crawlers.

Page address: http://outofmemory.cn/tools/is-search-engine-spider-ip/

The code in this article is Python, but the same approach can also be implemented in C#. The principle is the same.

The IP ranges of common search engine spiders are listed below:

Spider name                   IP address
Baidusp                       202.108.11.*, 220.181.32.*, 58.51.95.*, 60.28.22.*, 61.135.162.*, 61.135.163.*, 61.135.168.*
YodaoBot                      202.108.7.215, 202.108.7.220, 202.108.7.221
Sogou web spider              219.234.81.*, 220.181.61.*
Googlebot                     203.208.60.*
Yahoo! Slurp                  2018.0.181.*, 72.30.215.*, 74.6.17.*, 74.6.22.*
Yahoo ContentMatch Crawler    119.42.226.*, 119.42.230.*
Sogou-Test-Spider             220.181.19.103, 220.181.26.122
Twiceler                      38.99.44.104, 64.34.251.9
Yahoo! Slurp China            2018.0.178.*
Sosospider                    124.115.0.*
CollapsarWEB qihoobot         221.194.136.18
NaverBot                      202.179.180.45
Sogou Orion spider            220.181.19.106, 220.181.19.74
Sogou head spider             220.181.19.107
SurveyBot                     216.145.5.42, 64.246.165.160
Yanga WorldSearch Bot v       77.91.224.19, 91.205.124.19
Baidusp-mobile-gate           220.181.5.34, 61.135.166.31
Discobot                      208.96.54.70
Ia_archiver                   209.234.171.42
Msnbot                        65.55.104.209, 65.55.209.86, 65.55.209.96
Sogou in spider               220.181.19.216
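
If such a list is used as a quick pre-check, matching an address against the wildcard entries might look like the sketch below. The SPIDER_IP_PATTERNS mapping and both function names are hypothetical, and only a few rows from the table are copied in. A static list like this can go out of date, so the host name check described earlier remains the more reliable method.

```python
# A few rows copied from the table above; the dictionary layout is an assumption.
SPIDER_IP_PATTERNS = {
    "Googlebot": ["203.208.60.*"],
    "Sogou web spider": ["219.234.81.*", "220.181.61.*"],
    "Msnbot": ["65.55.104.209", "65.55.209.86", "65.55.209.96"],
}


def ip_matches(ip, pattern):
    """Compare an IPv4 address with a pattern octet by octet; '*' matches any octet."""
    ip_parts = ip.split(".")
    pattern_parts = pattern.split(".")
    if len(ip_parts) != 4 or len(pattern_parts) != 4:
        return False
    return all(p == "*" or p == o for o, p in zip(ip_parts, pattern_parts))


def lookup_spider_by_ip(ip, patterns=SPIDER_IP_PATTERNS):
    """Return the first spider name whose pattern list matches ip, else None."""
    for name, pattern_list in patterns.items():
        if any(ip_matches(ip, pattern) for pattern in pattern_list):
            return name
    return None
```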
