I just wrote an article recalling the small mistakes new webmasters easily make during optimization, and in it I mentioned the robots.txt robot protocol. That made me want to talk about some techniques for using robots.txt. Website optimization is long-term, interactive work; a webmaster who works behind closed doors will not get far, so I hope we can all communicate more. Where this article falls short, please correct me.
Enough small talk; let's get to the point. As we all know, robots.txt exists to regulate spider crawling. We generally use it to stop spiders from crawling directories such as data and tmp, and to keep them out of modules such as members, orders and inventory. Beyond these conventional uses, though, robots.txt offers quite a few small tricks that can make our optimization work go better.
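A minimal robots.txt along these lines might look like the following sketch (the directory names data, tmp, member, order and inventory are just examples; substitute your site's actual paths):

```
User-agent: *
# Keep spiders out of internal data and temporary files
Disallow: /data/
Disallow: /tmp/
# Block member, order and inventory modules that should not be indexed
Disallow: /member/
Disallow: /order/
Disallow: /inventory/
```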
I. Block dynamic pages to reduce duplicate indexing
The first trick, which many people already know, is to stop spiders from crawling dynamic pages, reducing duplicate indexing across the whole site. The benefit is that weight concentrates on the content pages instead of being scattered across duplicate copies of them. For an ordinary site this is just routine hygiene, but for large sites such as malls, information portals and Q&A sites, the significance of this standardization is very great.
1. Filter pages on malls, B2B sites and other large sites
Malls, B2B sites and other large sites often run into the conditional-filtering problem: filtering products by specification, brand and so on produces a large number of near-identical pages. If this is not handled effectively, the site will have a lot of similar content repeatedly indexed. In general you can do some URL-blocking work for these pages, or consider loading the filters via AJAX, but neither is as direct or effective as robots.txt. My recommendation is to set up static URL rewriting rules first, and then use robots.txt to forbid crawling of the remaining dynamic pages.
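As a sketch, assuming the filter pages carry their conditions in the query string while the rewritten canonical URLs do not, the dynamic copies can be blocked once the static rules are in place:

```
User-agent: *
# Block all parameterized (dynamic) URLs;
# the static rewritten URLs stay crawlable
Disallow: /*?*
```

Note that the * wildcard is an extension honored by the major spiders (Googlebot, Baiduspider and the like) rather than part of the original robots.txt standard, so check the documentation of the search engines you target.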
2. Comment pages on information sites
Comment pages on information sites are similar to conditional-filter pages: here too you need robots.txt, combined with URL rewriting rules, to screen out the dynamic pages and prevent duplicate indexing and related problems.
3. Other similar situations
B2B, recruitment and witkey (crowdsourcing) sites run into similar problems, and in all these cases robots.txt can effectively standardize spider crawling and avoid duplicate indexing.
II. Guide spiders to important pages and improve crawl efficiency
The trick here is to combine robots.txt with the sitemap and aggregation (tag) pages, giving spiders more entrances to these important pages. The sitemap and aggregation pages are in effect list pages, the most convenient places for a spider to crawl; by using the Allow directive in robots.txt to let spiders reach these pages first, the indexing of the pages they link to naturally improves.
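A sketch of this, assuming the HTML sitemap lives at /sitemap.html and the aggregation pages under /tags/ (both paths are examples), would be:

```
User-agent: *
# Explicitly open the entrances we want crawled first
Allow: /sitemap.html
Allow: /tags/
# Then block the dynamic duplicates as before
Disallow: /*?*

# Point spiders at the XML sitemap as well
Sitemap: http://www.example.com/sitemap.xml
```

Like the wildcard, Allow is an extension rather than part of the original robots.txt standard; the major spiders honor it and give the more specific rule precedence over a broader Disallow.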
III. Adjust the site's weight distribution
Combined with nofollow tags, robots.txt can effectively adjust the distribution of weight across the whole site, guiding weight toward the key columns or key pages so that the whole site's weight is allocated sensibly.
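For example, blocking low-value sections in robots.txt while marking the internal links to them with nofollow keeps both crawl budget and link weight flowing toward the key columns (the paths here are purely illustrative):

```
User-agent: *
# Low-value pages that should receive neither crawl budget nor weight
Disallow: /about/
Disallow: /help/
```

with the corresponding internal links written as `<a href="/help/" rel="nofollow">Help</a>` so that they pass no weight either.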
IV. Walking the line
One early trick for improving a site's relevance was to place a large number of TXT documents in the root directory, stuff them with keywords, and then use robots.txt to induce spiders to crawl those directories. Of course, I am not suggesting anyone do this today; black-hat tricks never last long, and what we are discussing here are safe techniques.
I wonder how many webmasters here have had their sites hacked; those plagued by rampant parasite programs must be especially indignant. But look at it from another angle: the parasite approach may also be a good way to get our own pages indexed. That is, design a number of page templates, generate a large batch of pages with a program, place those pages in a new directory according to your URL rules, and use robots.txt to induce spiders to crawl them; the effect can be very good. Of course, if you do this, the page templates must be done very well, otherwise the user experience will suffer greatly, so webmasters take note.