The importance of the robots.txt file on a website

If you browse many websites over FTP, you will find a robots.txt file in the root directory. Many webmasters only know that this file restricts spider access. Does it have any other role? Let's study it together.

What is a robots file? It is a plain-text file with an agreed syntax that serves as a communication bridge between search engines and a website. Every time a search engine crawls a site, it first checks this file, as if it were the key that opens the door. If the file does not exist, the search engine crawls without restriction. If it does exist, the engine crawls according to the rules in the file. Some webmasters will ask: we build a website precisely so that search engines will include it, so why restrict their crawling? During a crawl, the search engine traverses the whole site, and your site may contain scraped content or pages with no substantive value. If the search engine crawls those, its evaluation of your site will drop sharply, defeating the purpose of SEO. The robots file can tell spiders which pages you do not want them to see, and it also indirectly reduces the load on the server.
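To make this handshake concrete, here is a minimal sketch, using Python's standard urllib.robotparser module and a hypothetical example domain, of how a well-behaved crawler consults robots.txt before fetching a page:

    from urllib.robotparser import RobotFileParser

    # Point the parser at the site's robots.txt (the domain is a placeholder).
    rp = RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()  # fetch and parse the file; if it is missing, everything is treated as allowed

    # Ask whether a given spider may fetch a given URL.
    print(rp.can_fetch("*", "http://www.example.com/admin/index.htm"))
    print(rp.can_fetch("Baiduspider", "http://www.example.com/index.htm"))

can_fetch returns False for any URL the file disallows for that user agent, which is exactly the check a spider performs before it "opens the door".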

There are several points to note about this file:

1. The file name must be written exactly and entirely in lowercase, and the extension must be .txt.

2. The file must be placed in the root directory of the site, for example http://www.taofengyu.com/robots.txt, so that it is directly accessible.

3. The syntax of the file must be correct. In general, the two most commonly used directives are User-agent and Disallow:

User-agent: * means all search engine spiders are allowed to crawl and index the site. If you do not want Baidu to include your site, replace the * with "Baiduspider"; whatever the Disallow lines restrict will then not be crawled or indexed by Baidu's spider. To block crawling of the whole site, write "Disallow: /". To block the files under a folder, write "Disallow: /admin/". To block files whose paths begin with admin, write "Disallow: /admin", and so on. To block a single file inside a folder, for example the index.htm file under the admin folder, write "Disallow: /admin/index.htm". If nothing follows "Disallow:", all pages of the site may be crawled and indexed. A complete example is shown after this list.

In general, there is no need to let spiders crawl a site's backend management files, program scripts, database files, style sheets, template files, and certain images.

4. At least one Disallow line must be present in the robots file; it is the key element of this file.
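Putting points 3 and 4 together, here is a sketch of a typical robots.txt; the folder names are illustrative assumptions, not taken from any particular site:

    User-agent: *
    Disallow: /admin/      # backend management folder (illustrative)
    Disallow: /templates/  # template files (illustrative)
    Disallow: /data/       # database and program files (illustrative)

    User-agent: Baiduspider
    Disallow: /

The first record hides a few non-public folders from all spiders; the second blocks Baiduspider from the whole site, as described in point 3. A spider follows the record that most specifically matches its user agent, so Baiduspider obeys only the second record.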

That is roughly all there is to the importance and use of this file. I hope it brings everyone a little benefit.

This article comes from the Taofengyu Student Supplies Network (http://www.taofengyu.com/). Please indicate the source when reprinting, and respect the author's work.
