How to identify SEO robots

Source: Internet
Author: User
Tags: website performance

Search engines send their robots (crawlers) to access and index website content, and website administrators generally welcome these visits. However, crawling affects website performance to some extent, and not all robots are harmless. Some illegitimate robots disguise themselves as mainstream search engine crawlers, traverse websites in bulk, and ignore the robots.txt rules, which can seriously degrade website performance while bringing no benefit. Website administrators therefore need to verify whether each robot's claimed identity is legitimate.
In your server log file you can see the path and IP address of each request, and the user-agent field shows the name of the robot, such as Googlebot or MSNBot. Each search engine has its own user-agent, but the user-agent alone does not prove that a robot is legitimate, because many spammers also name their robots Googlebot. They enter the website in disguise and scrape content aggressively.
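As a rough illustration of reading the IP address and user-agent from a log, the sketch below parses one line in the common "combined" access log format and reports which robot the request claims to be. The log format, the `claimed_bot` helper, and the sample line are assumptions made for this example; your server's log layout may differ. A match only shows what the visitor claims, not that the claim is true.

```python
import re

# Assumed layout: ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Robot names mentioned in this article, matched case-insensitively.
CLAIMED_BOTS = ("googlebot", "msnbot")


def claimed_bot(line):
    """Return (ip, bot_name) if the user-agent claims a known robot, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    agent = match.group("agent").lower()
    for name in CLAIMED_BOTS:
        if name in agent:
            return match.group("ip"), name
    return None


# Hypothetical log line for demonstration only.
sample = (
    '66.249.66.1 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"'
)
print(claimed_bot(sample))  # ('66.249.66.1', 'googlebot')
```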
Currently, mainstream search engines recommend that website administrators identify real robots in two steps: run a reverse DNS lookup to find the host name corresponding to the robot's IP address, then run a forward DNS lookup on that host name to confirm that it resolves back to the same IP address.
First, use a reverse DNS lookup to find the host name corresponding to the robot's IP address.
The host names of mainstream search engines usually look like this:
Google: the host name should be in the googlebot.com domain, for example crawl-66-249-66-1.googlebot.com;
MSN: the host name should be in the search.live.com domain, for example livebot-207-46-98-149.search.live.com;
Yahoo: the host name should be in the inktomisearch.com domain, for example ab1164.inktomisearch.com.
Finally, perform a forward DNS lookup, using the host name to find its IP address, and confirm that the result matches the robot's original IP address. If it does, the robot is legitimate.
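The two lookups can be sketched in Python with the standard `socket` module. This is a minimal sketch, not a definitive implementation: the `verify_robot` name, the injectable lookup parameters, and the stub data are assumptions made for illustration, and the suffix list is simply copied from the host-name examples in this article, so it may need updating for your own use.

```python
import socket

# Domain suffixes taken from the host-name examples in this article.
TRUSTED_SUFFIXES = (".googlebot.com", ".search.live.com", ".inktomisearch.com")


def verify_robot(ip, reverse_lookup=None, forward_lookup=None):
    """Return the verified host name for ip, or None if verification fails."""
    # Default to real DNS queries; callers may pass stubs for offline testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse lookup, IP address -> host name.
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return None  # no reverse record, or the lookup failed
    if not host.endswith(TRUSTED_SUFFIXES):
        return None  # host name is outside the expected domains

    # Step 2: forward lookup, host name -> IP addresses, must include ip.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return None
    return host if ip in addresses else None


# Offline demonstration with stub resolvers (hypothetical data, no network).
def stub_reverse(addr):
    return {
        "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
        # An impostor can set any reverse record for an address it controls.
        "198.51.100.7": "crawl-66-249-66-1.googlebot.com",
    }[addr]


def stub_forward(host):
    return {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}[host]


print(verify_robot("66.249.66.1", stub_reverse, stub_forward))
print(verify_robot("198.51.100.7", stub_reverse, stub_forward))
```

The second call returns None even though its reverse record looks genuine, which is why the forward lookup is needed. Calling `verify_robot(ip)` without the lookup arguments performs real DNS queries; `socket.gethostbyname_ex` handles IPv4 only, so this sketch does not cover IPv6 addresses.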
If you find a robot disguising itself as a legitimate search engine crawler, you can block it at the server.
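How the block is applied depends on your server. As one possible approach, assuming an nginx server (the article does not name one), the sketch below turns a list of impostor IP addresses into `deny` lines that could be placed in the server configuration. The `nginx_deny_rules` helper and the sample addresses are hypothetical.

```python
import ipaddress


def nginx_deny_rules(ips):
    """Return one nginx 'deny' line per valid, de-duplicated IP address."""
    rules = []
    seen = set()
    for raw in ips:
        try:
            address = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip anything that is not a valid IP address
        if address not in seen:
            seen.add(address)
            rules.append(f"deny {address};")
    return rules


# Hypothetical addresses that failed the DNS verification.
print("\n".join(nginx_deny_rules(["198.51.100.7", "198.51.100.7", "not-an-ip"])))
```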
