If a site runs on a virtual host (shared hosting), its traffic is usually capped, and if most of that allowance is eaten up by spiders, the webmaster has to spend extra money to buy more traffic. So when a large share of a site's traffic is being wasted on spider crawls, what techniques can limit the waste without hurting the SEO effect? The author believes the following approaches can be used:
1. Identify fake spiders and block their IPs. Log analysis shows that many visits claiming to be Baidu Spider or Googlebot are actually fake. Resolving and blocking the IPs of these fake spiders not only saves traffic but also reduces the risk of the site being scraped. The specific check is a reverse lookup to see whether an IP really belongs to a spider: click Start in the lower-left corner, choose Run, type cmd, then run the command nslookup followed by the IP and look at the result. A genuine spider IP reverse-resolves to a hostname under the search engine's own domain, while a fake spider's IP does not.
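The check above can also be scripted instead of run by hand. Below is a minimal Python sketch of the same idea, assuming the commonly cited hostname suffixes for Baidu and Google crawlers (the suffix list and the sample IP are illustrative, not authoritative): it does the reverse lookup and then a confirming forward lookup.

```python
import socket

# Hostname suffixes a genuine crawler IP is expected to reverse-resolve to.
# These are commonly cited values, not an official or exhaustive list.
SPIDER_SUFFIXES = {
    "Baiduspider": (".baidu.com", ".baidu.jp"),
    "Googlebot": (".googlebot.com", ".google.com"),
}

def is_genuine_spider(ip, claimed_agent):
    """Reverse-resolve the IP (the scripted equivalent of `nslookup <IP>`),
    check the hostname suffix, then forward-resolve the hostname to confirm
    it maps back to the same IP."""
    suffixes = SPIDER_SUFFIXES.get(claimed_agent, ())
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse DNS lookup
    except socket.herror:
        return False                                         # no PTR record: treat as fake
    if not hostname.endswith(suffixes):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)  # confirming forward lookup
    except socket.gaierror:
        return False
    return ip in addresses

# Hypothetical usage with an IP pulled from a log line that claims to be Baiduspider:
# print(is_genuine_spider("123.125.71.116", "Baiduspider"))
```

IPs that fail this check can then be added to the host's or firewall's deny list, which is what blocking the fake spiders amounts to in practice.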
2. Block spiders that are ineffective for SEO, or spiders from minor search engines. For example, Googlebot crawls very heavily, yet in many industries Google brings very little traffic and poor SEO results, so blocking Googlebot from crawling can save a great deal of traffic; Meilishuo, for instance, blocks Googlebot from its site. Besides Google, spiders such as Panguso and Bing bring very little traffic, or almost no SEO benefit, and can likewise be blocked.
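Blocking a particular engine is usually done in robots.txt. A minimal sketch is below; whether to do this is a business judgment, since blocking Googlebot will eventually drop the site from Google entirely.

```
# robots.txt - example of shutting out engines judged to bring too little value,
# while leaving Baiduspider unrestricted (illustrative only)
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: Baiduspider
Disallow:
```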
3. Use robots.txt to keep spiders away from invalid or duplicate pages. Some pages may have existed before but are now gone, or both dynamic and static URLs exist for the same content; because old backlinks still point to them or the links remain in the database, spiders keep crawling them. We can find the URLs that return 404 in the logs and block them, which both improves crawl efficiency and cuts down on wasted traffic.
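Once the dead or duplicate URL patterns have been pulled out of the logs, they can be declared in robots.txt. The paths in the sketch below are placeholders; the * wildcard used in such rules is supported by both Baidu and Google.

```
# robots.txt - keep spiders off URLs found to return 404 and off duplicate dynamic URLs
# (the paths are placeholders; substitute the patterns found in your own logs)
User-agent: *
Disallow: /old-column/    # a removed section that now returns 404
Disallow: /*?             # dynamic URLs, where static versions of the same pages exist
```

The dynamic-URL rule should only be added when the static versions are the ones meant to be indexed; otherwise it blocks the real content.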
4. Limit crawling of low-value page regions to improve crawl efficiency and speed and reduce crawl traffic. Almost every page has noisy areas that add nothing: the login and registration blocks, the copyright notice at the very bottom, some helper navigation links, or template modules that spiders cannot parse. These can be restricted or hidden from crawling by adding the nofollow attribute, or by loading them via Ajax or JS, so that the amount spiders fetch is reduced.
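As a small illustration of the nofollow part (the link targets here are made up), the boilerplate links in a template can be marked so that spiders are discouraged from following them:

```html
<!-- template links that carry no ranking value for the page -->
<a href="/login" rel="nofollow">Log in</a>
<a href="/register" rel="nofollow">Register</a>
<a href="/help/about-us" rel="nofollow">About us</a>
```

Note that nofollow is only a hint; for blocks that should not be fetched at all, loading them with Ajax/JS after the page renders, as mentioned above, or a robots.txt rule is the firmer option.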
5. Use external calls or a CDN to speed up spider crawling and reduce server response time and traffic waste. Most sites today display a large number of images, videos and other media, and these require a lot of download traffic; if the images are referenced from an external host, a great deal of spider crawl traffic can be saved. At present a good approach is to put images on another server or upload them to an image host or network drive.
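Serving media from elsewhere is simply a matter of where the resource URLs point. In the fragment below, cdn.example.com is a placeholder for whatever separate image server, CDN, or network drive is used:

```html
<!-- image referenced from a separate host, so fetching it does not
     count against the main site's virtual-host traffic quota -->
<img src="https://cdn.example.com/uploads/product-banner.jpg" alt="Product banner">
```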
6. Use webmaster tools to limit or tune spider crawling, or to restrict when spiders crawl. Both Baidu Webmaster Platform and Google's webmaster platform provide crawl tools that can limit crawl timing and volume; we can configure them as needed to get the best result.
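The crawl-rate controls themselves live in each engine's webmaster console rather than in any file on the site. A related option at the robots.txt level, a different mechanism from the console settings described above, is the Crawl-delay directive; engines such as Bing honor it, while Google ignores it and expects the rate to be set in its webmaster tools:

```
# robots.txt - ask engines that honor Crawl-delay (e.g. Bing) to wait
# roughly this many seconds between requests; Google ignores this directive
User-agent: bingbot
Crawl-delay: 10
```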
The above summarizes some methods for improving spider crawl efficiency and reducing crawl volume. In practice we can also adapt them to our own needs: for example, set a lower crawl frequency in the sitemap for sections that are crawled heavily, or add more external and internal links to important content that is poorly indexed so that it gets crawled more. The methods themselves are fixed, but they can be applied flexibly to the specific situation to achieve less crawling and higher crawl efficiency. This article was contributed by the webmaster of the wocao SEO forum, http://www.wocaoseo.com/; thanks to A5 for providing the publishing platform.