The robots.txt file restricts the search engine robots (known as "bots") that crawl the web. These robots are automated, and before accessing a page on a site they check whether a robots.txt file restricts their access to that page. Robots.txt is a simple and effective tool if you want to keep certain content on your site out of search engine indexes. Here is a brief introduction to how to use it.
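To illustrate that check, here is a minimal sketch using Python's standard urllib.robotparser module. The domain www.example.com and the page path are placeholders, not rules from any real site.

# Minimal sketch of how a well-behaved robot consults robots.txt before
# fetching a page (Python 3 standard library; URLs are placeholders).
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # robots.txt lives at the domain root
rp.read()  # fetch and parse the file, just as a robot would

# Ask whether a given user agent may fetch a given URL.
url = "http://www.example.com/private/page.html"
if rp.can_fetch("*", url):
    print("Allowed: the page may be crawled")
else:
    print("Blocked: robots.txt disallows this page")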
Where to place the robots.txt file
Robots.txt is a plain text file. It must reside in the root directory of the domain and be named "robots.txt". A robots.txt file placed in a subdirectory has no effect, because robots only look for the file in the root directory of the domain. For example:
http://www.nikeyou.com/robots.txt is a valid location, while http://www.nikeyou.com/mysite/robots.txt is not.
Here is an example robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~name/
These entries tell all robots not to crawl anything under the /cgi-bin/, /tmp/, or /~name/ directories.
Block or remove an entire website using a robots.txt file
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server's root directory:
User-agent: *
Disallow: /
To remove your site from Google only, and prevent only Googlebot from crawling your site in the future, place the following robots.txt file in your server's root directory:
User-agent: Googlebot
Disallow: /
Each port must have its own robots.txt file. In particular, if you serve content over both HTTP and HTTPS, each protocol needs its own robots.txt file. For example, to let Googlebot index all of your HTTP pages but none of your HTTPS pages, use the following robots.txt files.
For the HTTP protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the HTTPS protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
Allow all robots to access your web pages
User-agent: *
Disallow:
(Alternative methods: create an empty "/robots.txt" file, or do not use a robots.txt file at all.)
Block or remove individual web pages using a robots.txt file
You can use a robots.txt file to prevent Googlebot from crawling pages on your site. For example, if you create a robots.txt file manually to prevent Googlebot from crawling all pages in a particular directory (for example, /private), you can use the following robots.txt entries:
User-agent: Googlebot
Disallow: /private
To prevent Googlebot from crawling all files of a particular file type (for example, .gif), use the following robots.txt entries:
User-agent: Googlebot
Disallow: /*.gif
To prevent Googlebot from crawling any URL that contains a ? (more specifically, any URL that begins with your domain name, followed by any string, then a question mark, and then any string), use the following entries:
User-agent: Googlebot
Disallow: /*?
Although Google does not crawl or index the content of pages blocked by robots.txt, it may still index the URLs if it finds them linked from other pages on the web. As a result, the URL of a page and other publicly available information, such as the anchor text of links pointing to it, can appear in Google search results. However, the content of the page itself will not be crawled, indexed, or displayed.
As part of Webmaster Tools, Google provides a robots.txt analysis tool. It reads your robots.txt file the same way Googlebot does and reports results for Google user agents such as Googlebot. We strongly recommend using it. Before creating a robots.txt file, think carefully about which of your content users should be able to find through search and which they should not. Used sensibly, robots.txt lets search engines bring users to your site while keeping private information out of their indexes.
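If you want to sanity-check your rules locally before uploading the file, the sketch below uses Python's standard urllib.robotparser module to test a set of rules against the Googlebot user agent. The rules and test URLs are hypothetical examples; this is not Google's analysis tool.

# Rough local check of robots.txt rules (Python 3 standard library).
# The rules and test URLs below are hypothetical examples.
from urllib import robotparser

rules = """User-agent: Googlebot
Disallow: /private
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())  # parse the rules directly instead of fetching them

for url in ("http://www.example.com/index.html",
            "http://www.example.com/private/report.html"):
    allowed = rp.can_fetch("Googlebot", url)
    print(url, "->", "allowed" if allowed else "blocked")

Note that this parser handles simple path prefixes like /private; wildcard patterns such as /*.gif are a search engine extension that it may not evaluate the same way Googlebot does, so check those with Google's own tool.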