To keep search engines from indexing a site's back-end (admin) pages or other private pages, we usually disallow those paths in the robots.txt file. Paradoxically, robots.txt can be read by anyone, hackers included. So in the effort to turn search engines away, we end up revealing the private paths to hackers.
What does robots.txt do?
Almost every site uses robots.txt. It sits in the root directory of the site, and anyone can open it by entering its path directly, for example http://www.cnblogs.com/robots.txt. The file tells search engines which pages may be crawled and which may not.
How to use robots.txt
Create a file named robots.txt in the root directory of the website. The file must have exactly this name. Then put the rules inside it.
For example, if I have a blog and do not want any search engine to index it, the following two lines in robots.txt are enough:
User-agent: *
Disallow: /
To keep search engines out of the site's admin directory only, the rule becomes:
User-agent: *
Disallow: /admin/
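The effect of the two rule sets above can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the sample paths are assumptions made for illustration, not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# The admin-only rule set from the article.
ADMIN_RULES = """\
User-agent: *
Disallow: /admin/
"""

# The "block everything" rule set from the article.
BLOCK_ALL_RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given robots.txt text.

    `is_allowed` and "ExampleBot" are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from text, no network access
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed(ADMIN_RULES, "/admin/login"))  # blocked by /admin/
    print(is_allowed(ADMIN_RULES, "/blog/post"))    # not covered by any rule
    print(is_allowed(BLOCK_ALL_RULES, "/anything"))  # blocked by /
```

Parsing the rules from a string keeps the check offline, so it can be run before the file is uploaded to the site root.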
Other robots.txt rules are beyond the scope of this article.
Keeping robots.txt from helping hackers
In the example above, we added a restriction rule to robots.txt so that search engines would not index the admin pages. But anyone can read robots.txt, so the same rule gives hackers a clearer picture of the site's structure: the admin directory, the include directory, and so on.
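To make the leak concrete, here is a small sketch of how little effort it takes to turn a robots.txt file into a list of interesting paths. The function name disallowed_paths and the sample file contents are assumptions for illustration only.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value from robots.txt text.

    `disallowed_paths` is a hypothetical helper written for this sketch.
    """
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    sample = """\
User-agent: *
Disallow: /admin/
Disallow: /include/
"""
    print(disallowed_paths(sample))
```

Every path listed this way is one the site owner considered sensitive enough to hide from search engines, which is exactly what an attacker wants to know.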
Is there a way to use robots.txt to block search engines without revealing the back-end address and private directories?
Yes: use an asterisk (*) as a wildcard. For example:
User-agent: *
Disallow: /a*/
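This rule stops all search engines from indexing any directory under the root whose name begins with a. It also covers every other directory that starts with that letter, so make sure the pattern does not block pages you want indexed. Of course, if your back-end directory is called admin, it can still be guessed. But what if you rename admin to something like adoit? Who would know then?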
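To see which paths a pattern such as /a*/ actually covers, the matching can be sketched in a few lines. This is an illustrative matcher, assuming the common convention that * matches any run of characters, a trailing $ anchors the end, and patterns match from the start of the path. It is hand-written here because, as far as I know, Python's urllib.robotparser only does plain prefix matching; the names pattern_to_regex and is_blocked are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed convention: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if `path` falls under the Disallow `pattern`."""
    # re.match anchors at the start, which gives prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ["/admin/login", "/adoit/", "/about/us", "/blog/"]:
        print(path, is_blocked("/a*/", path))
```

Running it on a list of the site's real directories is a quick way to confirm that the renamed back-end directory is covered and that no public directory is caught by accident.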
In summary: to keep search engines from indexing the site's back-end directory or other private directories, we disallow those paths in robots.txt. To keep robots.txt itself from revealing the back end and private areas of the site, we write the rules with an asterisk (*) wildcard instead of the full path. Finally, so that hackers cannot guess the real path, we give these sensitive directories unconventional names.
That is all on robots.txt and website privacy. I hope it helps. Thank you!
Using robots.txt without disclosing the site's back end and private directories to hackers