Writing Robots.txt is a minefield, and many are "poisoned very deeply" (Part 1)

Source: Internet
Author: User
Keywords: Robots.txt, writing


Everyone has their own privacy, and websites are no different: every site has things it does not want spiders to see, such as member passwords. To shield that private content, most webmasters think of Robots.txt first. And yes, a Robots.txt file really can direct the spider, telling it where it may go and where it may not; precisely for this reason, many webmasters are keen to use Robots.txt. But do we truly know how to use it? How much do we know about its rules? I am no master of it either, but I have gathered some experience in this area, which I share here as a reference. Old hands, of course, can skip ahead.

The order of rules in Robots.txt writing

Take the simplest example: if you want b.html inside the /a/ directory to be crawled while the rest of /a/ stays blocked, how do you write it?

Should it be this:

    Allow: /a/b.html
    Disallow: /a/

or this?

    Disallow: /a/
    Allow: /a/b.html

Robots.txt rules are not parsed as a tree structure; in other words, the rule for the parent folder should not simply be placed at the top. Instead, follow the principle of proximity: if the parent-folder Disallow comes first, the spider may treat the whole path as closed and inaccessible, while our actual goal is for b.html to be visited. Writing the Allow rule first keeps the goal and the result consistent.
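Putting it together, a minimal sketch of the recommended order, using the article's /a/b.html example (the User-agent line is added here for completeness):

    User-agent: *
    # The more specific Allow comes first, per the principle of proximity
    Allow: /a/b.html
    # The broader parent-folder rule follows it
    Disallow: /a/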

The "fully open" question in Robots.txt writing

Many webmasters, novices in particular, understand Robots.txt too one-sidedly. They reason that since Robots.txt can open access to the spider, why not open everything and set every file to be accessible, so the site's indexed pages will climb at once. In fact, the problem is far from that simple. We all know that some of a site's fixed files never need to be exposed to search engines; if we make the site "fully open", the consequence is a heavier load on the web server, slower access, and a slower crawl rate, none of which helps the site get indexed. So for fixed files that need no access, simply Disallow them.

In general, the files a website does not need to expose include back-end administration files, program scripts, attachments, database files, and so on.
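As a sketch, a robots.txt that blocks those fixed files might look like this (the directory names /admin/, /scripts/, /attachments/ and /db/ are hypothetical stand-ins, not paths from the article):

    User-agent: *
    Disallow: /admin/        # back-end administration files
    Disallow: /scripts/      # program scripts
    Disallow: /attachments/  # attachments
    Disallow: /db/           # database files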

Duplicate-content issues in Robots.txt writing

We write original content every day and update our sites with it; have we ever thought about why? Of course, to be treated well by the search engines. We all know that search engines value original content highly and index it quickly. Conversely, if your site is stuffed with large amounts of copied content, then I can only tell you with regret that its prospects are slim. This also tells us to actively use the robots file to block duplicate pages and reduce duplication, but when writing the file you must remember:

after User-agent, name the specific search engine spider, for example User-agent: Baiduspider followed by Disallow: /. If no spider is named and the rule is written as User-agent: * with Disallow: /, that form "shields" all of the site's content from every spider.
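The contrast, as a sketch (Baiduspider is the article's example; the second block is the catch-all form to be careful with):

    # Blocks the site only for Baidu's spider
    User-agent: Baiduspider
    Disallow: /

    # Blocks the whole site for every spider -- usually not what you want
    User-agent: *
    Disallow: /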

Robots.txt and the robots meta tag

Among Robots.txt rules there is a "strictest rule wins" principle: if a page is covered both by the robots.txt file and by a robots meta tag on the page, the search engine obeys the stricter of the two rules, that is, it forbids indexing of that page. Of course, if the robots.txt file and the meta tag do not both apply to the same file, the search engine follows the principle of proximity and indexes all the files that come before the meta tag.
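For reference, a page-level robots meta tag looks like this (a generic sketch using the standard noindex and nofollow values; the article does not give a concrete tag):

    <head>
      <!-- With this tag present, the stricter rule wins and the page is not indexed -->
      <meta name="robots" content="noindex, nofollow">
    </head>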

Details of Robots.txt Writing

1. Inserting the slash

Take Disallow: /a/b.html as the example again. When writing the statement, if you forget to add the slash, the rule opens everything to access, which contradicts the intent of the statement, because a slash by itself means the root directory.
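A sketch of the difference (my illustration of the point, not lines from the article):

    # An empty value blocks nothing -- the whole site stays open
    Disallow:

    # A lone slash means the root directory -- this blocks the whole site
    Disallow: /

    # This blocks only the one file
    Disallow: /a/b.html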

2. Stray spaces

Stray spaces are easy to understand: the search engine does not recognize this special character where it does not belong, and a misplaced space will simply rob the statement of its intended effect.
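A sketch of the kind of stray space meant here (my illustration; exact parsing behavior can vary by spider):

    # Wrong: the space inside the path keeps the rule from matching /a/
    Disallow: / a/

    # Right
    Disallow: /a/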

Now you understand why I say that many people know of the robots.txt file, yet few know how to write one correctly. There are in fact other details and pitfalls in writing robots.txt; next time I will continue with topics such as directory size and the number of User-agent entries in the file.

This article is from Mobile Wallpaper Net (http://www.sjbzz.com); reprints are welcome. Thank you.
