Webmasters should understand search protocols: be careful not to stumble over the robots.txt file

Source: Internet
Author: User
Keywords: robots.txt


As the saying goes: knowing the technology does not necessarily mean you understand SEO, and understanding SEO does not necessarily mean you know the technology. Still, a webmaster must know the most basic technology. You are not required to master an advanced programming language, but you do need to know the basic search engine protocols. When communicating with webmasters, I have found that many of them cannot use the search protocols correctly, especially the robots.txt file. Beijing Part-time Bar would like to explain to everyone exactly where this protocol trips people up.

The robots.txt file is a plain text file placed in the root directory of the website. It is the first file a spider reads when it comes to index the site, and it specifies the range the spider may crawl on your site. When the site has content that does not need to be crawled, you can block it in the robots.txt file so that you do not make the site harder for spiders to crawl. But this magic file, which can help you optimize the site better, can also make some webmasters stumble over it.

First, write the file in the correct format. The file must be named robots.txt, must be placed in the root directory of the website, and must be accessible through http://www.jianzhi8.com/robots.txt. As soon as a spider arrives to index the site, it crawls this file first and then indexes the rest of the site according to the rules in it. To write the file correctly, you first need to understand its basic functions:

User-agent: writing * after it means any search engine is allowed to crawl the site; User-agent: Baiduspider means the rules that follow are addressed to Baidu's spider. If a site writes rules only for this agent, it is signalling that it only wants Baidu's spider to crawl it and that other spiders should not index it.
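A minimal sketch of how such a "Baidu only" file is usually written. The empty Disallow: line means nothing is blocked for that group, and the explicit catch-all group at the end is my own addition: under the robots exclusion standard, a spider that finds no group matching its name is allowed by default, so the other crawlers have to be blocked explicitly:

# Baidu's spider may crawl everything
User-agent: Baiduspider
Disallow:

# All other spiders are blocked from the whole site
User-agent: *
Disallow: /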

Disallow: this directive forbids spiders from accessing the site directory written after it. If the whole site is allowed to be indexed, write it with nothing after the colon: Disallow:. In Taobao's file, by contrast, Baidu's spider is banned from crawling; it is not allowed to crawl any location on the site:

User-agent: Baiduspider
Disallow: /
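Between "allow everything" (an empty Disallow:) and this full ban, the more common use of the directive is to block individual directories. A hypothetical sketch, with directory names that are only illustrative:

# Block only specific directories; everything else may be crawled
User-agent: *
Disallow: /admin/
Disallow: /tmp/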

A small difference in the Disallow directive has a huge impact on a site, so every webmaster must learn to write the file correctly, and at the very least be clear about the functions mentioned above. Now look at the following file:

User-agent: *
Disallow: / (spiders are not allowed to crawl any directory on the site)
Allow: / (allow crawling)

The file above is written with a serious error: the front says spiders are not allowed to crawl any directory of the site, while the line after it says all search engines are allowed to crawl the site's pages. With a file in this format, once the spider's database updates, the number of indexed pages will drop to zero. That is not because the site's optimization methods were wrong, nor because the site used cheating, but because the webmaster accidentally stumbled over the file, and that loss is a big one.
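If the intention behind that file was to let every search engine crawl the whole site, the safe way to write it is simply the following minimal sketch; mixing Disallow: / with Allow: / sends contradictory signals, and different spiders may resolve the conflict differently:

# Allow every spider to crawl the whole site
User-agent: *
Disallow: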

Using the robots file sensibly can help the site get indexed better, but writing it the wrong way can lead to serious losses. Here are a few ways webmasters can improve their optimization with the robots file (each rule is written out in full after the list):

1. Reduce duplicate indexing. If you check a site you will find that spiders index duplicate content: the same page can be included both as a .html URL and as a dynamic URL containing "?". In this case you can block the duplicates in the robots file to avoid repeated indexing: Disallow: /*?*.

2. Allow access only to URLs ending in .htm. Because of program issues, a site sometimes has more than one path leading to the same page, but to let spiders index and crawl only the .htm URLs, you can use "$" in the file to restrict access: Allow: /*.htm$ together with Disallow: /. This allows access only to URLs with the ".htm" suffix.

3. Some websites add a large number of dynamic images to improve the user experience, but these images are hard for spiders to crawl and contribute little to the site's optimization. In that case you can add this directive to the file: Disallow: /*.gif$, which forbids spiders from crawling images in .gif format.
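Written out as they would appear in a robots.txt file, the three tips look like this. Each group below is a separate case rather than one combined file, and support for the * and $ wildcards is an extension to the original standard, honored by Baidu and Google but not necessarily by every spider:

# Tip 1: block dynamic URLs containing "?" to avoid duplicate indexing
User-agent: *
Disallow: /*?*

# Tip 2: allow only URLs ending in .htm, block everything else
User-agent: *
Allow: /*.htm$
Disallow: /

# Tip 3: block .gif images that spiders struggle to use
User-agent: *
Disallow: /*.gif$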

The robots file has been around for more than 10 years, and many search engines have long recognized it. Used correctly, it helps webmasters optimize their sites better; these search protocols really are not a deeply technical matter. Whether you are a novice webmaster or a veteran, you should know these basics, otherwise misuse will have a big impact. Beijing Part-time Bar (http://bj.jianzhi8.com) hopes webmasters will spend some of their spare time learning a little technology, so they do not take a fall with nowhere to turn.
