Talking about the use of the Robots.txt robot protocol

Source: Internet
Author: User


I just wrote an article recalling the small mistakes that new webmasters easily make during optimization, and since it mentioned the Robots.txt robot protocol along the way, I could not help wanting to talk about techniques for using the Robots.txt robot protocol. Website optimization is long-term work, and it is also interactive work; working behind closed doors will not make a good webmaster, so I hope we can communicate often. Where this article falls short, please correct me.

No more nonsense; let's get to the point. As we all know, the Robots.txt robot protocol exists to standardize spider crawling. We generally use it to prohibit spiders from crawling directories such as data and tmp, and also to block modules such as members, orders, and inventory. But beyond these conventional uses, the Robots.txt robot protocol actually offers quite a few small techniques that can be combined in our optimization work to make the site perform better.
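As a minimal sketch of that conventional usage (the paths /data/, /tmp/, /member/, /order/ and /stock/ below are hypothetical and should be replaced with the site's real directories), such a robots.txt might read:

    User-agent: *
    Disallow: /data/
    Disallow: /tmp/
    Disallow: /member/
    Disallow: /order/
    Disallow: /stock/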

One: reduce duplicate indexing by prohibiting the crawling of dynamic pages and certain other pages

The first technique, one many people already know, is to forbid spiders from crawling dynamic pages, which reduces duplicate indexing across the whole site. The benefit is that weight is concentrated on the content pages instead of being scattered across repeatedly indexed copies of the same content. For an ordinary site this is just general housekeeping, but for large sites such as malls, information portals, and question-and-answer sites, this kind of standardization matters a great deal.
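For example, assuming the site's static URL rules are already in place and every dynamic page carries a query string, a single wildcard rule is enough to shut spiders out of them (a sketch only; test it in the search engines' robots checking tools before going live):

    User-agent: *
    Disallow: /*?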

1. Conditional filter pages on malls, B2B sites, and other large websites

Malls, B2B sites, and other large websites often run into the problem of conditional filtering: selecting by product specification, brand, and so on produces a large number of near-identical pages. If the problem is not handled effectively, a lot of similar content will be indexed over and over. In general it can be tackled with some URL blocking work, or by loading the filters with AJAX, but neither works as well as handling it directly with the Robots.txt robot protocol. The recommendation is to set up static URL rules first and then use robots.txt to forbid crawling of the remaining dynamic pages.
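A sketch of that setup, assuming the filter conditions are passed as query parameters named brand and spec (both parameter names are hypothetical) while normal product pages have already been rewritten to static URLs:

    User-agent: *
    # Static product pages such as /product/123.html stay crawlable
    Disallow: /*?brand=
    Disallow: /*&brand=
    Disallow: /*?spec=
    Disallow: /*&spec=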

2. Comment pages on information sites

Comment pages on information sites are similar to conditional filter pages: here too you need to pair robots.txt with the site's URL rules to screen out the dynamic pages and prevent duplicate indexing and related problems.
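For instance, if comments are reached through a query parameter or a separate path such as /comment/ (both hypothetical here), rules along these lines would keep them out of the crawl:

    User-agent: *
    Disallow: /*comment=
    Disallow: /comment/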

3. Other similar situations

B2B, recruitment, and Witkey (crowdsourcing) sites run into similar problems. In all of these situations robots.txt can be used to effectively standardize spider crawling and avoid duplicate indexing.

Two: induce spiders to crawl important pages and improve crawling efficiency

The trick here is to work together with site maps and aggregation-page tags to give important pages more entrances that spiders can reach. Site maps and aggregation pages are, in effect, list pages, which are the most convenient places for spiders to crawl; by using the Allow directive of the robots.txt protocol to let spiders crawl these pages first, indexing naturally improves.
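As a sketch, assuming the site map lives at /sitemap.xml and the aggregation pages sit under a /tags/ directory (both paths are assumptions for illustration), the relevant lines would be:

    User-agent: *
    Allow: /tags/
    Allow: /sitemap.xml
    Sitemap: http://www.example.com/sitemap.xml

Note that Allow only changes anything when a broader Disallow rule would otherwise cover those paths; the Sitemap line is what most directly points spiders at the important URLs.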

Three: adjust the site's weight distribution

The robots.txt protocol, used together with the nofollow tag, can effectively adjust the overall weight distribution of the site, guiding weight toward key columns or key pages and achieving a reasonable allocation of weight across the whole station.
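One hedged illustration: low-value pages such as the about, contact, or login pages (the paths below are hypothetical) can be kept out of the crawl so that more weight flows to the key columns, while nofollow handles the corresponding internal links in the templates:

    User-agent: *
    Disallow: /about/
    Disallow: /contact/
    Disallow: /login/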

Four: gray-area tactics

One early way to improve a website's keyword relevance used the robots.txt protocol: place a large number of TXT documents stuffed with keywords in the root directory, then use robots.txt to induce spiders to crawl those directories. Of course, this is not a suggestion that everyone do the same; black-hat methods never last, and what we are discussing here are safe techniques.

I do not know whether any webmasters here have had their sites hacked; those hit by the rampant "parasite" programs must feel especially indignant. But let's change the way of thinking: the parasite method can also suggest a good way to get our own pages indexed. That is, design a number of page templates, use a program to generate a large number of pages in batches, place those pages in a new directory according to a set of rules, and use robots.txt to induce spiders to crawl them; the effect is also very good. Of course, if you do this the page templates must be done very well, otherwise the user experience will suffer greatly, and I hope webmasters pay attention to that.
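A sketch of the robots.txt side of this, assuming the batch-generated pages are placed under a new /zhuanti/ directory and listed in their own site map (both names are hypothetical):

    User-agent: *
    Allow: /zhuanti/
    Sitemap: http://www.example.com/sitemap-zhuanti.xml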
