Anti-Search Engine

Source: Internet
Author: User

Recently, my colleagues have been working on a search engine; so far, they have developed an anti-search function. The main reference is as follows:

How to restrict robot access to web sites

A robot is an automated program that helps search engines collect web pages. When it accesses a web site, it follows the links in the pages to extract most of the content from the site, builds indexes for these pages, and stores them in the search engine's database. In some cases, the web administrator or page author may not want a robot to extract certain content from the site. In such cases, a few methods can be used to limit the robot's access.

There are two methods for restricting robot access to a web site. One is the robot restriction protocol, used by the site's web administrator; most robots currently comply with this protocol. The other is the robot meta tag, used by the web page author; currently, only a small portion of robots support this tag.

Robot Protocol

The key to the robot restriction protocol is placing a file named robots.txt in the root directory of the web site. When a robot accesses the site, it first reads this file, analyzes its content, and avoids accessing certain files according to the web administrator's rules. The following is an example of robots.txt:

# http://

User-Agent: *

Disallow: /tmp/  # these files will soon be deleted

User-Agent: infoseek robot 1.0

Disallow: /

The content after "#" is a comment. The User-Agent command specifies which robot the Disallow commands under it apply to. "*" means they apply to all robots; in the preceding example, the second User-Agent command indicates that its Disallow command applies only to version 1.0 of the infoseek robot. The Disallow command specifies which directories or files may not be accessed; if "/" is specified, no files may be accessed. Each Disallow command can hold only one directory or one file per row; multiple directories must be placed on separate rows.
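As a minimal sketch (not from the original article), the rules above can be checked with Python's standard-library `urllib.robotparser`, which implements the same robots.txt matching a well-behaved robot performs; the host `example.com` is only a placeholder:

```python
import urllib.robotparser

# The robots.txt example from the article, as a robot would download it.
robots_txt = """\
# http://
User-Agent: *
Disallow: /tmp/  # these files will soon be deleted

User-Agent: infoseek robot 1.0
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The "*" rules apply to any robot without a more specific entry:
print(rp.can_fetch("*", "http://example.com/index.html"))          # True
print(rp.can_fetch("*", "http://example.com/tmp/old.html"))        # False

# The infoseek 1.0 robot matches its own entry, which disallows everything:
print(rp.can_fetch("infoseek robot 1.0", "http://example.com/index.html"))  # False
```

Note that a specific User-Agent entry, when matched, replaces the "*" rules rather than adding to them.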

The robots.txt file currently in use follows an early version of the robot restriction protocol. An Internet Draft on how to restrict robots is under development; it extends the earlier version of the protocol but has not yet entered practical use.

Robot meta tag

Web page authors can instead use the robot meta tag.

A meta tag is a tag used to place invisible information in an HTML file; it must be placed in the head of the HTML file. The robot meta tag is a special meta tag. The following are some examples:

<meta name="robots" content="index, follow">

<meta name="robots" content="noindex, follow">

<meta name="robots" content="index, nofollow">

<meta name="robots" content="noindex, nofollow">

The name part of the robot meta tag is "robots", and the content part can be a combination of "index", "noindex", "follow", and "nofollow". "index" indicates that a search engine may index the HTML file; "follow" indicates that a search engine may follow the links in the HTML file to reach other files. "noindex" and "nofollow" are the opposites of "index" and "follow". When combining these directives, there must be no logical conflict: you cannot specify both "index" and "noindex", or both "follow" and "nofollow", at the same time. In addition, "index, follow" can be abbreviated as "all", and "noindex, nofollow" can be abbreviated as "none".

The disadvantage of the robot meta tag is that every HTML file must be modified, which is troublesome. In addition, many robots do not support this tag.
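To illustrate how a robot might interpret these tags, here is a minimal sketch using Python's standard `html.parser`; the class name `RobotsMetaParser` and its attributes are made up for this example, not part of any standard:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Reads a robots meta tag and records whether a robot may index
    the page and follow its links (both default to allowed)."""

    def __init__(self):
        super().__init__()
        self.may_index = True
        self.may_follow = True

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            directives = [d.strip().lower()
                          for d in a.get("content", "").split(",")]
            # "none" abbreviates "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.may_index = False
            if "nofollow" in directives or "none" in directives:
                self.may_follow = False

p = RobotsMetaParser()
p.feed('<html><head>'
       '<meta name="robots" content="noindex, follow">'
       '</head><body>...</body></html>')
print(p.may_index, p.may_follow)  # False True
```

A real robot would apply the same check before adding the page to its index or queueing the page's links.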
