Keeping robots.txt from disclosing the site's admin backend and private paths to hackers

Source: Internet
Author: User

To keep search engines from indexing a site's admin (backend) pages or other private pages, we disallow those paths in the robots.txt file. Paradoxically, robots.txt can be read by anyone, including hackers. In the act of shutting out search engines, we leak the private paths to hackers.

What does robots.txt do?

Nearly every site uses robots.txt. It is placed in the site's root directory, and anyone can open it and view its contents by entering its path directly, for example Http://www.cnblogs.com/robots.txt. The file tells search engines which pages may be crawled and which may not.

How to use robots.txt

Create a file named robots.txt in the root directory of the website. The file name must be exactly this, in lowercase. Then write the rules inside it.
For example, if I have a blog and do not want any search engine to index it, the following two lines in robots.txt are enough:

User-agent: *
Disallow: /

If you only want to keep search engines out of the site's admin directory, the rule changes to:

User-agent: *
Disallow: /admin/
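
To check what a rule like this actually blocks, here is a minimal Python sketch using the standard library's urllib.robotparser. The crawler name ExampleBot and the two test paths are made up for illustration; they are not from any real site.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the text directly, no network access


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) crawler may fetch the given path."""
    return parser.can_fetch(agent, path)


print(allowed("/admin/login.php"))   # False
print(allowed("/blog/post-1.html"))  # True
```

Keep in mind that this only shows what a well-behaved crawler would do. robots.txt is a request, not access control, so the admin directory still needs its own authentication.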

Further robots.txt rules are beyond the scope of this article.

Hardening robots.txt against hackers

In the example above, we added a restriction to robots.txt so that search engines would not index the admin pages. But anyone can view robots.txt, so it also gives hackers a clearer picture of the site's structure, such as the admin directory, the include directory, and so on.
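
To make the leak concrete, here is a rough Python sketch of how little effort it takes to turn a robots.txt file into a list of interesting paths. The function name disallowed_paths and the sample file contents are assumptions for illustration only.

```python
def disallowed_paths(robots_text):
    """Collect every non-empty Disallow value from robots.txt text."""
    paths = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents that list the sensitive directories by name.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /include/
"""

print(disallowed_paths(sample))  # ['/admin/', '/include/']
```

Every directory the site owner wanted hidden from search engines is handed over in a few lines, which is exactly the problem the rest of this article tries to reduce.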

Is there a way to keep robots.txt's ability to block search engines without revealing the backend address and private directories?
Yes: use an asterisk (*) as a wildcard. For example:

User-agent: *
Disallow: /a*/

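This rule tells all search engines not to index any directory under the root whose name starts with a. Note that it also blocks every other directory whose name starts with a, so choose the prefix with that in mind. Of course, if your admin directory is still named admin, it can be guessed anyway. But what if you rename admin to adoit? Who would know then?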
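
The wildcard rule can be sketched in Python as follows. As far as I know, the standard library's urllib.robotparser only does plain prefix matching and does not expand * in paths, so this sketch uses a small hand-written matcher instead. It is an approximation of how wildcard-aware crawlers treat the pattern, not any search engine's actual code, and the names rule_to_regex and is_blocked as well as the test paths are made up for illustration.

```python
import re


def rule_to_regex(pattern):
    """Turn a Disallow pattern into a regex: '*' matches any run of characters."""
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body)


def is_blocked(path, pattern="/a*/"):
    """Return True if the path matches the pattern from its beginning."""
    return rule_to_regex(pattern).match(path) is not None


# Hypothetical paths: a renamed admin directory, the default admin directory,
# an unrelated directory starting with "a", and a directory that does not.
for path in ["/adoit/login.php", "/admin/", "/about/team.html", "/blog/a/"]:
    print(path, is_blocked(path))
```

The first three paths come out as blocked and the last one does not. The /about/ result shows the trade-off: a short prefix hides the real directory name better, but it also keeps search engines away from every public directory that shares the prefix.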

In summary: to keep search engines from indexing the site's admin directory and other private directories, we disallow those paths in robots.txt. To keep robots.txt itself from revealing them, we use an asterisk (*) wildcard in the rules. Finally, so that hackers cannot guess the real paths, we give the sensitive directories unconventional names.

That is all on robots.txt and website privacy. I hope it helps everyone. Thank you!
