Modify robots.txt carelessly and search engines may stop indexing your site

Source: Internet
Author: User
Keywords: robots.txt


robots.txt tells bots which parts of a site they may access. The robots.txt protocol is not a formal specification, only an established convention, but most search engines recognize it: they will not index a disallowed page, nor the pages that page links out to. To use it, place a robots.txt file in the root directory of your site. For example, when a search engine visits a website (such as http://www.admin5.com), it usually first checks whether the site has a robots.txt file; if the robot finds one, it decides the scope of its access from the file's contents. Enough background; on to the point:
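A minimal sketch of the check described above, using Python's standard-library `urllib.robotparser`. The rules and the www.example.com URLs are hypothetical, and the rules are parsed from a string so the sketch runs offline; a real crawler would first download the file from the site root (for example with `set_url()` and `read()`). Python's parser does plain prefix matching and does not interpret `*` wildcards inside paths, which Baidu and Google do support.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real crawler downloads these from the site root.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /old-list/

User-agent: *
Disallow: /admin/
"""


def robots_url(page_url: str) -> str:
    """robots.txt always lives at the root of the host, whatever page we start from."""
    return urljoin(page_url, "/robots.txt")


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text offline instead of calling set_url() and read()."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(robots_url("http://www.example.com/article/1.html"))
    for agent in ("Baiduspider", "Googlebot"):
        for url in ("http://www.example.com/old-list/2.html",
                    "http://www.example.com/admin/login"):
            print(agent, url, rp.can_fetch(agent, url))
```

Note the Baiduspider result for /admin/: a crawler obeys only the most specific User-agent group that matches it, so the `*` group's rules do not apply to Baiduspider here. That is an easy detail to overlook when editing one group of a robots.txt file.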

From my testing: for spiders that comply with the robots.txt protocol, modifying robots.txt too often can get your site put straight into "cold storage"! My site has fallen into exactly that black hole. Earlier, because I did not want the Baidu spider to crawl the list pages of some old items, I simply blocked Baiduspider from those pages. After about three days the effect was obvious, and I was quietly pleased at how obedient the spider was. Then I remembered several pages the site used to have that are no longer usable, a great many of which the search engine had already indexed, so I blocked them all at once and modified robots.txt again. That change was no small matter: two days later I was no longer happy, because my indexed page count stopped rising, and the lovely Baidu spider was nowhere near as diligent as before (it used to crawl more than 20,000 pages on my site every day; now it falls short of 20,000). I was stunned.

Then I had to find the cause. As mentioned above, once you block a page, spiders will not index that page or the pages it links out to, and that is exactly the trap I fell into! Many people publish their own modified robots.txt files, but they all overlook how spiders actually crawl: spiders do not follow human logic and do not work through a list of articles in order. I have confirmed this on my own site:
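Because every edit to robots.txt can quietly widen what a spider is shut out of, one precaution is to compare the old and new versions against a list of URLs you care about before publishing. The sketch below does that with Python's `urllib.robotparser`; the two rule sets, the URL list, and the helper names `parser_from` and `newly_blocked` are hypothetical. It only shows which URLs lose access; it does not model how Baidu reacts to frequent changes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "before" and "after" versions of a site's robots.txt.
OLD_ROBOTS = """\
User-agent: Baiduspider
Disallow: /list/old/
"""

NEW_ROBOTS = """\
User-agent: Baiduspider
Disallow: /list/old/
Disallow: /tag/
Disallow: /forum.php?mod=redirect
"""

# Hypothetical URLs whose crawlability we want to keep an eye on.
URLS = [
    "http://www.example.com/list/old/1.html",  # blocked in both versions
    "http://www.example.com/tag/seo",  # newly blocked
    "http://www.example.com/forum.php?mod=redirect&tid=42&goto=lastpost",  # newly blocked
    "http://www.example.com/thread-42-1-1.html",  # allowed in both versions
]


def parser_from(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def newly_blocked(old_text: str, new_text: str, agent: str, urls: list[str]) -> list[str]:
    """URLs that `agent` may fetch under the old rules but not under the new ones."""
    old_rp, new_rp = parser_from(old_text), parser_from(new_text)
    return [u for u in urls if old_rp.can_fetch(agent, u) and not new_rp.can_fetch(agent, u)]


if __name__ == "__main__":
    for url in newly_blocked(OLD_ROBOTS, NEW_ROBOTS, "Baiduspider", URLS):
        print("newly blocked for Baiduspider:", url)
```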

For example, after crawling the first article in a list, the spider does not move on to the second; instead it continues from the first article's content page, following links such as "hot articles" (this requires good internal linking across the site!). You can see this in the URLs of the indexed pages of the official DZ (Discuz!) site: if you block paths such as /forum.php?mod=redirect* or /forum-redirect*, most of your pages may end up not being indexed.
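To illustrate why blocking redirect paths can cost you pages, here is a toy breadth-first crawl over a hypothetical Discuz!-style link graph that only follows links allowed by robots.txt. Each redirect URL is modeled as a page that links to the thread it redirects to. The URLs, the graph, and the rule are illustrative assumptions, and the article's trailing `*` is dropped because Python's `urllib.robotparser` matches by prefix anyway.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

SITE = "http://www.example.com"  # hypothetical site

ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /forum.php?mod=redirect
"""

# Hypothetical internal-link graph: page -> pages it links to.
# A redirect URL is modeled as a page linking to its redirect target.
LINKS = {
    "/": ["/forum-2-1.html"],
    "/forum-2-1.html": ["/thread-1-1-1.html", "/forum.php?mod=redirect&tid=2&goto=lastpost"],
    "/thread-1-1-1.html": ["/forum.php?mod=redirect&tid=3&goto=nextnewset"],
    "/forum.php?mod=redirect&tid=2&goto=lastpost": ["/thread-2-1-1.html"],
    "/forum.php?mod=redirect&tid=3&goto=nextnewset": ["/thread-3-1-1.html"],
    "/thread-2-1-1.html": [],
    "/thread-3-1-1.html": [],
}


def crawl(start: str, links: dict, rp: RobotFileParser, agent: str) -> set:
    """Breadth-first crawl from `start`, following only links that `rp` allows for `agent`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in seen:
                continue
            if not rp.can_fetch(agent, urljoin(SITE, target)):
                continue  # blocked: never fetched, so its outgoing links are never seen
            seen.add(target)
            queue.append(target)
    return seen


if __name__ == "__main__":
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    for agent in ("Baiduspider", "Googlebot"):
        reached = crawl("/", LINKS, rp, agent)
        print(agent, len(reached), "of", len(LINKS), "pages reached")
```

With the rule in place, Baiduspider reaches only three of the seven pages: threads 2 and 3 are linked only through redirect URLs, so they drop out even though they were never blocked themselves. Real spiders also discover pages through sitemaps, external links, and URLs they already know, so this is a simplified picture.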

Do not judge a spider's rules by human habits of thought. Many webmasters find, after putting up robots.txt, that Baidu rarely indexes their inner pages; this is probably the reason. Using robots.txt to solve duplicate indexing is therefore very dangerous; fixing the problem at its root in the site's own code is the best policy! I, Small Co, was fooled by this. I hope friends who use robots.txt will take it as a warning!
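One common way to fix duplicate indexing in the site's own code, rather than in robots.txt, is to compute a single canonical form for each URL and answer every other variant with a 301 redirect to it. This is a minimal sketch under stated assumptions: the parameter names in `IGNORED_PARAMS` are hypothetical stand-ins for parameters that create duplicates without changing content, and it assumes parameter order never changes what a page shows on this site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that produce duplicate URLs for the same content.
IGNORED_PARAMS = {"sid", "from", "utm_source"}


def canonical_url(url: str) -> str:
    """Lower-case scheme and host, drop ignored params, sort the rest, drop the fragment."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    kept.sort()  # assumes parameter order does not matter to this site
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # fragments are never sent to the server, so drop them
    ))


def redirect_target(url: str):
    """Return the canonical URL if a 301 redirect is needed, or None if `url` is canonical."""
    target = canonical_url(url)
    return None if target == url else target


if __name__ == "__main__":
    for u in ("HTTP://WWW.Example.com/thread.php?tid=42&sid=abc&from=list#reply",
              "http://www.example.com/thread.php?tid=42"):
        print(u, "->", redirect_target(u))
```

Because canonical URLs map to themselves, a page served at its canonical address never redirects again, so the scheme cannot loop.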

(First published on Webmaster Network; author: http://www.tok8.cn; please credit the source when reprinting.)


