Writing robots.txt: the Robots Exclusion Protocol

Source: Internet
Author: User
1. What is robots.txt?
Robots.txt is a plain text file that implements a protocol between a website and search engines. When a search engine spider visits a site, it first checks whether a robots.txt file exists in the site's root directory.
If the file exists, the spider limits its access according to the file's contents; if not, the spider simply crawls along the links. Robots.txt must be placed in the root directory of the site.
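
Because the file always sits at the site root, a crawler can derive its location from any page URL. The following is a minimal sketch of that step; the function name robots_url and the example.com address are illustrative assumptions, not part of any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_url("https://www.example.com/blog/2020/post.html?id=1"))
```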

2. Robots.txt Syntax

1) Allow all search engines to access all parts of the site
Robots.txt is written as follows:
User-agent: *
Disallow:
Or
User-agent: *
Allow: /

Note: Capitalize the first letter of each directive, use an ASCII (half-width) colon, and put a space after the colon. Take care to get these details right.

2) Block all search engines from accessing any part of the site
Robots.txt is written as follows:
User-agent: *
Disallow: /
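
The two policies above (allow everything, block everything) can be checked with Python's standard urllib.robotparser module. This is only a sketch: the helper name can_crawl and the example.com URL are assumptions made for illustration, and a real search engine may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path on the site

url = "https://www.example.com/index.html"  # hypothetical page
print(can_crawl(ALLOW_ALL, url))  # True
print(can_crawl(BLOCK_ALL, url))  # False
```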

3) Block spiders from specific directories only, for example to keep the admin, css, and images directories from being indexed
Robots.txt is written as follows:
User-agent: *
Disallow: /css/
Disallow: /admin/
Disallow: /images/
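
To see which paths such a file shuts out, the directory rules can be fed to urllib.robotparser and tested against a list of paths. A rough sketch; the function name blocked and the sample paths are invented for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /admin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked(paths):
    """Return the paths that a generic spider ("*") may not fetch."""
    return [p for p in paths if not parser.can_fetch("*", p)]


# Sample paths, invented for illustration.
sample = ["/index.html", "/admin/login.php", "/images/logo.png", "/news/1.html"]
print(blocked(sample))  # ['/admin/login.php', '/images/logo.png']
```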

Note: A trailing slash on the path matters. For example, "Disallow: /images/" (with the slash) blocks crawling of the entire images folder, while "Disallow: /images" (without the slash) blocks every path that begins with /images, such as /images.html.

4) Block a folder, /templets, while still allowing one path inside it to be crawled: /templets/main
Robots.txt is written as follows:
User-agent: *
Disallow: /templets
Allow: /templets/main
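
The trailing-slash note and the folder exception both come down to prefix matching. Below is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie) and assuming the exception is written with its full path, "Allow: /templets/main". Crawlers that apply rules in file order may behave differently. The function is_allowed is a hypothetical helper, not a library API.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled.

    rules is a list of (directive, prefix) pairs, for example
    ("Disallow", "/templets"). Plain prefix matching, no wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        # Longest prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            allowed = allow
    return allowed


RULES = [("Disallow", "/templets"), ("Allow", "/templets/main")]
print(is_allowed(RULES, "/templets/main/index.html"))  # True
print(is_allowed(RULES, "/templets/other.html"))       # False

# Trailing slash: "/images/" leaves /images.html alone, "/images" does not.
print(is_allowed([("Disallow", "/images/")], "/images.html"))  # True
print(is_allowed([("Disallow", "/images")], "/images.html"))   # False
```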

5) Block access to all URLs with the ".php" suffix under the /html/ directory (including subdirectories)
Robots.txt is written as follows:
User-agent: *
Disallow: /html/*.php

6) Allow access only to files with a particular suffix (here ".html"), using "$" to mark the end of the URL
Robots.txt is written as follows:
User-agent: *
Allow: /*.html$
Disallow: /
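
The wildcard patterns in items 5 and 6 can be read as simple regular expressions. The sketch below assumes the common wildcard extensions, where "*" matches any run of characters and a trailing "$" anchors the end of the URL, and it assumes patterns are written starting with "/". The helper names pattern_to_regex and matches are hypothetical.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters. A trailing "$" anchors the end of
    the URL; without it the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if the robots.txt pattern applies to path."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/html/*.php", "/html/sub/page.php"))    # True
print(matches("/html/*.php", "/css/page.php"))         # False
print(matches("/*.html$", "/news/1.html"))             # True
print(matches("/*.html$", "/news/1.html?page=2"))      # False
```

Note that without the "$", a pattern such as "/*.html" would also match "/news/1.html?page=2", because matching stops as soon as the pattern is satisfied.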
7) Block indexing of all dynamic pages on the site
For example, to block every URL that contains "?", such as index.php?id=1
Robots.txt is written as follows:
User-agent: *
Disallow: /*?*


8) Block search engines from crawling all images on the site (if your site uses other image suffixes, add them here)
To save server resources, you sometimes need to keep search engines from indexing the site's images. Besides blocking a folder directly with a rule such as "Disallow: /images/", you can also block by image file suffix.
Robots.txt is written as follows:
User-agent: *
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
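
When the list of suffixes grows, the rules can be generated rather than typed by hand. A small sketch, assuming each rule takes the form "Disallow: /*.ext$"; the function name image_block_rules and the extra "webp" suffix are illustrative only.

```python
def image_block_rules(extensions, user_agent="*"):
    """Build robots.txt text that blocks URLs ending in the given suffixes."""
    lines = [f"User-agent: {user_agent}"]
    for ext in extensions:
        ext = ext.lower().lstrip(".")  # accept "PNG", ".png", "png"
        lines.append(f"Disallow: /*.{ext}$")
    return "\n".join(lines) + "\n"


# "webp" is an extra suffix added here only as an example.
print(image_block_rules(["jpg", "jpeg", "gif", "png", "bmp", "webp"]))
```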


Points to watch when writing robots.txt
1. Capitalize the first letter of each directive, use an ASCII (half-width) colon, and put a space after the colon. Take care to get these details right.
2. A lone slash "/" stands for the entire site.
3. If an extra space is added after the "/", the entire website is blocked.
4. Do not block normal content that should be indexed.
5. Changes take effect after anywhere from a few days to two months.
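
The formatting points in item 1 are easy to check mechanically before a file is uploaded. The following is a hypothetical lint helper, not a standard tool; it only checks the conventions listed above (capitalized directive, ASCII colon, space after the colon) and says nothing about whether the rules themselves are sensible.

```python
def lint_line(line: str) -> list:
    """Return a list of formatting problems found in one robots.txt line."""
    problems = []
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return problems  # blank lines and comments need no checks
    if "\uff1a" in stripped:
        # U+FF1A is the full-width colon typed in Chinese input mode.
        problems.append("full-width colon; use an ASCII ':'")
        return problems
    if ":" not in stripped:
        problems.append("missing ':' separator")
        return problems
    field, value = stripped.split(":", 1)
    if not field[:1].isupper():
        problems.append("directive should start with a capital letter")
    if value and not value.startswith(" "):
        problems.append("add a space after the colon")
    return problems


print(lint_line("Disallow: /admin/"))  # []
print(lint_line("disallow:/admin/"))
```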



In the following case, the line of gray text in the search result shows that robots.txt is taking effect; only the site's address has been indexed:





