Webmasters must not ignore the use of robots.txt


I have always emphasized the details of optimization. What Baidu now demands of a site is that its details be done well: code, tags, and so on are all details, and robots.txt is part of the site's details too. Doing it well is a great help to our website. Many new webmasters may not know what robots.txt is, so below I will say a few words about how to work with it.

I. The origin of robots.txt

First we must understand that robots.txt is not a command or directive. It is an agreement between a website and the search engines, and the content of that agreement is the content of robots.txt. In the early days of the web it was used for privacy protection, and it exists as a plain TXT file in the root directory of our site.

II. The role of robots.txt

After our website goes online, there will be many pages that, through factors beyond our control, get released into the index by the search engine anyway. This drags down the overall quality of our pages and worsens our site's image in the search engine. The role of robots.txt is to block these pages so the spider does not release them. So which pages, specifically, should we block?

1. Block certain content pages. To give some clear examples: the registration page, login page, shopping cart page, posting page, message board page, and on-site search results page; if you have made a 404 error page, it should be blocked as well (see the sketch after this list).

2. Block duplicate pages: if we find that our site has two pages with the same content but different paths, we block one of them. The spider will still crawl it but will not release it into the index, and we can see the number of blocked pages directly in Google Webmaster Tools.

3. Block some dead link pages

Note that we can only block pages whose paths follow a regular pattern. A spider failing to crawl a page does not mean it cannot discover the address; being able to reach an address and being able to crawl it are two different concepts. Of course, dead links that we can fix ourselves do not need to be blocked; dead links we cannot fix, such as those caused by a path change, are the ones we need to block.

4. Block some overly long paths: URLs that exceed the length of the URL input box can be blocked with robots.txt.
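To make the four cases above concrete (item 1 refers to this sketch), here is a minimal robots.txt example. Every path in it (/register/, /login/, /cart/ and so on) is a hypothetical placeholder, so substitute the real directories of your own site; lines starting with # are comments:

User-agent: *
# 1. pages with no search value
Disallow: /register/
Disallow: /login/
Disallow: /cart/
Disallow: /search/
Disallow: /404.html
# 2. one of two paths that serve the same content
Disallow: /index.php?id=
# 3. dead-link paths we cannot fix, e.g. left over from a path change
Disallow: /old-news/
# 4. overly long dynamic URLs; the * wildcard is supported by Google and Baidu, though not part of the original robots.txt standard
Disallow: /*?sessionid=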

III. Use of robots.txt

1. Creating robots.txt

Create a new Notepad file locally, name it robots.txt, and put the file in our root directory; with that, our robots.txt is complete. Some open-source programs, such as DedeCMS, come with one built in, and we can simply download it from the root directory when we want to modify it.
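One quick check that the file is in the right place: robots.txt only works from the domain root, so it should load in the browser at a URL like the first line below (example.com stands in for your own domain):

http://www.example.com/robots.txt          # correct: root directory
http://www.example.com/blog/robots.txt     # wrong: search engines will not look here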

2. Common syntax

User-agent: this directive names the search engine crawler the rules apply to. Disallow: means forbidden. Allow: means permitted.

Next, let us get to know the search engines' crawlers, also called spiders or robots.

For Baidu's spider we write Baiduspider in the robots file, and for Google's robot we write Googlebot.

Now let us cover how to write the file. The first line defines the search engine:

User-agent: Baiduspider (note that when writing the robots file there must be a space after the colon, and if we want to address all search engines we use * instead of Baiduspider)

Disallow: /admin/

This line tells the Baidu spider: do not index the pages inside my site's admin folder. If we remove the slash after admin, the meaning changes completely: it then tells the Baidu spider not to index any page whose path begins with /admin.
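A small sketch of that contrast, using hypothetical page names to show what each form matches:

Disallow: /admin/     # blocks /admin/login.php, but not /admin.html
Disallow: /admin      # blocks everything starting with /admin: /admin/, /admin.html, /administrator/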

Allow means permit, not prohibit. It is generally not used alone; it works together with Disallow. Combining them makes directory blocking flexible and also reduces the amount of code. For example, suppose our /seo/ folder contains 100,000 files and only two of them need to be crawled: we could hardly write tens of thousands of lines, but together Disallow and Allow need only a few.

User-agent: * (define all search engines)

Disallow: /seo/ (prohibit indexing of the seo folder)

Allow: /seo/ccc.php

Allow: /seo/ab.html

These two Allow lines permit those two files to be crawled and indexed, so four lines of code solve the whole problem. Some people ask whether Disallow or Allow should be placed first; putting Disallow first is the more standard practice.
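Putting it together, the complete file from this example reads as follows (the /seo/ paths are just the illustration above; replace them with the directory you actually want to restrict):

User-agent: *
Disallow: /seo/
Allow: /seo/ccc.php
Allow: /seo/ab.html

After uploading, it is worth verifying the rules, for example with the blocked-pages report in Google Webmaster Tools mentioned earlier.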

This article is an original piece from http://www.51diaoche.net. Reprints are welcome; please credit the original author.
