On writing robots.txt files for website optimization


Most friends have heard of robots.txt files, and some may have written their own. In fact, I have not written a robots.txt file so far, not because I cannot, but because there is nothing on this blog that needs to be hidden from spiders. And presumably everyone knows that the probability of dead links on a personal independent blog is very small, so it does not require much dead-link handling, and I have not felt the need. However, writing robots.txt files is one of the skills every individual webmaster must master, and its uses are very wide. Here is a detailed introduction, which also serves as a review for myself.

What is a robots.txt file

As the file name suggests, it has a .txt extension, so it is a plain text file, the kind you can edit in Notepad. "Robots", as anyone who knows a little English can tell, means robots; here the robot is the search engine robot, and from the name alone you can guess that this file is written specially for spiders to read. Its role is to tell spiders which directories or which pages they do not need to crawl; it can also block a particular spider's visits entirely. Note that this file must be placed in the root directory of the site to ensure that spiders can read its contents as soon as they arrive.
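As a minimal sketch (the www.example.com domain and the /private/ directory are placeholders for illustration, not from the original article), a robots.txt at the site root might read:

# This file must live at the site root, e.g. http://www.example.com/robots.txt
User-agent: *
Disallow: /private/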

The role of a robots.txt file

In practice, the most common use of a robots.txt file is to screen out dead links within a site. We should know that too many dead links will affect a site's weight. Cleaning up dead links is not hard, but it still takes a lot of time, especially when a site has many of them; this is where the file shows its usefulness, since we can block spiders from crawling those dead links directly in the file's format and do the actual cleanup slowly later. Some site content includes URLs or files the webmaster does not want spiders to crawl, and those can be blocked directly as well. Blocking spiders outright is used comparatively rarely.
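For example, hypothetically assuming two dead URLs (/old-news.html and /2012/deleted-post.html are made up for illustration), each one gets its own Disallow line until the cleanup is done:

User-agent: *
# hypothetical dead links, blocked until they can be removed properly
Disallow: /old-news.html
Disallow: /2012/deleted-post.html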

The writing of a robots.txt file

This part is the most important. If you write a rule wrong, the block fails; and if a page you wanted crawled gets blocked without you finding out in time, the loss can be big. First we need to know the two directives, Allow and Disallow: one permits crawling and one forbids it, and their roles are easy enough to understand.

User-agent: *
Disallow:

Or

User-agent: *
Allow: /

Both of these blocks mean that all crawling is allowed. In practice, blocking URLs and files relies on the Disallow directive; you would use Allow only if just a few parts of your site should be crawled, as in the sketch below.
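A minimal sketch of that allow-only case follows; the /public/ directory name is an assumption, not from the original article. Most major spiders, Google for example, give the more specific Allow rule precedence over the broad Disallow, so only /public/ stays crawlable:

User-agent: *
# /public/ is a placeholder: the one directory left open to crawling
Allow: /public/
Disallow: /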

The value after User-agent: is the name of a spider, and everyone should be fairly familiar with the names of the mainstream search engine spiders. Take the Soso spider, Sosospider, as an example. When we want to block it:

User-agent: Sosospider
Disallow: /

Notice that compared with the allow-everything block above, this one differs by only a single "/", yet its meaning is completely reversed. So pay attention when writing: one extra slash can block a spider without you even being aware of it. Also note that in User-agent:, the spider name can be replaced with "*", which means the rule applies to all spiders.
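As a sketch combining the two ideas, here is how to block only Sosospider while leaving every other spider unrestricted; each spider gets its own record group, separated by a blank line:

# block the Soso spider only
User-agent: Sosospider
Disallow: /

# all other spiders may crawl everything
User-agent: *
Disallow: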

To prevent a directory from being crawled by search engine spiders, set the code as follows:

User-agent: *
Disallow: /directory/

Note the "/" when blocking a directory: writing the path without the trailing "/" blocks the directory page itself as well as every page under the directory, while writing it with the trailing "/" blocks only the content pages inside the directory. Keep these two cases clearly separated. If you want to block multiple directories, you need to write

User-agent: *
Disallow: /directory1/
Disallow: /directory2/

with each directory on its own line; a combined form such as Disallow: /directory1/directory2/ cannot be used.
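To make the trailing-slash distinction concrete (the /seo directory name is just a placeholder, and a real file would normally need only one of these lines, since the first already covers the second):

User-agent: *
# without the slash: blocks /seo itself, /seo.html, and everything under /seo/
Disallow: /seo
# with the slash: blocks only URLs inside the /seo/ directory, e.g. /seo/page.html
Disallow: /seo/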

If you want to prevent spiders from accessing a certain type of file, for example blocking images in .jpg format, you can set the following (here * is a wildcard matching any characters and $ anchors the end of the URL; both are extensions supported by the mainstream spiders):

User-agent: *
Disallow: /*.jpg$
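Pulling these pieces together, a complete robots.txt might look like the following sketch. Sosospider is the spider discussed above, while the /admin/ and /tmp/ directories are placeholder assumptions for illustration:

# block the Soso spider entirely
User-agent: Sosospider
Disallow: /

# rules for all other spiders (placeholder directories)
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /*.jpg$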

The above is Shanghai SEO Pony's summary of writing robots.txt for an entire site, covering just the common rule types and points of attention; more targeted cases, such as blocking particular spiders, are only touched on briefly, but once you understand the meaning of Allow and Disallow you can derive many other variations. There is also a robots meta tag that can be written into individual web pages, but in general it is not used much.

Compiled by Shanghai SEO Pony, http://www.mjlseo.com/. Please credit the source when reprinting, thank you.


