# What is the robots.txt file?
Search engines use a program called a robot (also known as a spider) to automatically access webpages on the Internet and collect their content.
You can create a plain-text file named robots.txt on your website to declare which parts of the site you do not want robots to visit. This way, some or all of the site's content can be kept out of search engine indexes, or you can instruct a specific search engine to index only the content you designate.
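As a minimal illustration (the directives are standard; the file itself is hypothetical), a robots.txt that asks every robot to stay away from the whole site looks like this:

```
User-agent: *
Disallow: /
```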
# Where is the robots.txt file stored?
The robots.txt file must be placed in the root directory of the website. When a robot visits a site (say http://www.abc.com), it first checks whether http://www.abc.com/robots.txt exists; if it does, the robot determines the scope of its access permissions from the file's contents.
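As a sketch of that first request using only Python's standard library (http://www.abc.com is the placeholder domain from the example above):

```python
from urllib.request import urlopen
from urllib.error import HTTPError

# A robot's first stop: the robots.txt at the site root.
# http://www.abc.com is the placeholder domain from the text above.
try:
    with urlopen("http://www.abc.com/robots.txt") as resp:
        print(resp.read().decode("utf-8", errors="replace"))
except HTTPError:
    # No robots.txt found: the robot may crawl without restrictions.
    print("no robots.txt; crawling is unrestricted")
```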
# Robots.txt File Format
The "robots.txt" file contains one or more records separated by empty rows (with CR, CR/NL, or NL as the terminator). The format of each record is as follows:
"<Field >:< optionalspace> <value> <optionalspace> ".
Comments can be added to this file with "#", following the same convention as in UNIX. A record typically begins with one or more User-agent lines, followed by several Disallow lines, as described below:
User-agent:
The value of this field names the search engine robot that the record applies to. If there are multiple User-agent records in "robots.txt", multiple robots are restricted by the protocol; the file must contain at least one User-agent record. If the value is set to "*", the protocol applies to all robots, and only one "User-agent: *" record may appear in the file.
Disallow:
The value of this field describes a URL that robots should not visit. The value can be a full path or a prefix; any URL that begins with the Disallow value will not be retrieved by the robot. For example, "Disallow: /help" blocks search engine access to both /help.html and /help/index.html, while "Disallow: /help/" lets the robot access /help.html but not /help/index.html. An empty Disallow value means every part of the site may be visited, and the file must contain at least one Disallow record. If "/robots.txt" is an empty file, the site is open to all search engine robots.
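The /help versus /help/ distinction above can be checked with Python's built-in urllib.robotparser; this sketch reuses the illustrative paths from the paragraph:

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /help" blocks every URL whose path starts with /help.
rp1 = RobotFileParser()
rp1.parse(["User-agent: *", "Disallow: /help"])
print(rp1.can_fetch("*", "http://www.abc.com/help.html"))        # False
print(rp1.can_fetch("*", "http://www.abc.com/help/index.html"))  # False

# "Disallow: /help/" blocks only the /help/ directory.
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /help/"])
print(rp2.can_fetch("*", "http://www.abc.com/help.html"))        # True
print(rp2.can_fetch("*", "http://www.abc.com/help/index.html"))  # False
```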
For example, http://www.slenk.net/robots.txt can be retrieved. If the file exists, it might reveal something like the following:
User-agent: *
Disallow: /admini/
The Disallow line points straight at a directory the administrator does not want visitors to know about.
Once we have that directory, we can apply the same approach and point a scanner at it to look for the admin backend, as sketched below.
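A hedged sketch of that reconnaissance step (http://www.slenk.net is the example host from above, and the parsing is deliberately naive):

```python
from urllib.request import urlopen

# Example host from the text; substitute whatever target is in scope.
base = "http://www.slenk.net"
with urlopen(base + "/robots.txt") as resp:
    body = resp.read().decode("utf-8", errors="replace")

# Keep the path after every "Disallow:" line. These are exactly the
# directories the administrator asked robots to skip.
for line in body.splitlines():
    if line.strip().lower().startswith("disallow:"):
        path = line.split(":", 1)[1].strip()
        if path:
            print(base + path)  # e.g. http://www.slenk.net/admini/
```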
Of course, if the webmaster has renamed the backend files to something unguessable like "@ # $ D $ # $", you are out of luck.