Anyone who does SEO knows that getting a site indexed by search engines is essential: if your site is never indexed, there is no SEO to speak of. For a site to be discovered by search engines and have its pages indexed normally, the site must first be easy for spiders to crawl. The programs search engines use to crawl web pages are called spiders or crawlers (in English, robots). Spiders follow hyperlinks to reach our pages, but some pages never get crawled, often because something in the site's own construction hinders crawling and makes the corresponding pages hard to index. These problems are known as "spider traps": site-building techniques that are unfriendly to search engines and get in the way of spiders crawling our pages. To help you avoid them, I have summarized the following spider traps.
1: Session IDs in URLs. Some e-commerce sites track visitor sessions in order to analyze user behavior, appending a session ID to the URL for every visitor. A spider is treated as a brand-new user on every visit, so each time it requests the same page it gets a URL with a different session ID attached. The result is many different URLs for the same page, and therefore large numbers of duplicate-content pages; this is one of the most common spider traps. Pop-up session windows used to boost sales ("Hello, friend from xxx") create the same kind of problem. A crawler-friendly fix is sketched below.
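As a minimal sketch of the fix (the parameter names here are assumptions; substitute the ones your platform actually uses), you can normalize away session parameters so every variant collapses to one canonical URL:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query parameters assumed to carry session state (hypothetical names).
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def canonical_url(url: str) -> str:
    """Return the URL with session-tracking parameters stripped."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

# Two "different" URLs for the same page collapse to one address.
print(canonical_url("http://example.com/item?id=42&sid=abc123"))
print(canonical_url("http://example.com/item?id=42&sid=xyz789"))
# Both print: http://example.com/item?id=42
```

Serving that canonical address in a <link rel="canonical"> tag tells the spider which copy of the page to index.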
2: A common-sense spider trap: forcing registration or login before content can be viewed. This is a dead end for spiders, because a spider cannot submit a registration form or enter a username and password to log in. As far as a spider is concerned, the only content it can see is the content we can reach by clicking directly.
3: Flash-heavy sites. Many small and medium-sized enterprises love Flash because it can produce all kinds of effects, and its visual impact in navigation is especially strong, so companies reach for Flash to show off their strength, culture and products. Some sites even make the entire homepage one big Flash movie, or show a Flash intro that jumps to another page after a while, or bury their links inside Flash for users to click through. But spiders have great difficulty reading the content inside Flash, and just as much difficulty following the links inside it.
4: Dynamic URLs, that is, URLs stuffed with too many symbols or query parameters. I have already covered this trap in my post on URL optimization. As search engine technology has developed, dynamic URLs have become less and less of a problem for crawling, but in terms of search engine friendliness, static or even pseudo-static URLs are still relatively better than dynamic ones, and you can see this in how many SEO peers handle their URLs. The sketch below shows the idea.
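As an illustration of the pseudo-static idea (a sketch using Flask; the routes and names are invented for this example), the same content can be exposed at a clean path instead of a parameter-laden query string:

```python
from flask import Flask

app = Flask(__name__)

# Spider-unfriendly style: /product?id=42&cat=5&ref=home
# Pseudo-static alternative: the same page at a clean, parameter-free path.
@app.route("/product/<int:product_id>")
def product(product_id: int):
    return f"Product page for item {product_id}"

if __name__ == "__main__":
    app.run()
```

On Apache or Nginx the same effect is usually achieved with rewrite rules; the principle is identical: one stable, readable URL per page.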
5: Frames. Frames were used everywhere in the early days, but very few sites use them now. One reason is that with the development of the major CMS platforms, maintaining a site has become simpler and simpler: early frame-based sites offered some convenience for maintaining pages, but that convenience is no longer needed. The other reason frames are used less and less is that they are unfriendly to search engines.
6: JS. Search engines can now follow JavaScript links and even try to dissect and analyze them, but we should not expect search engines to overcome every obstacle on their own. Good navigation can be built with JS, but CSS can do the job just as well, so to make pages friendlier and easier for spiders to crawl, try not to use JS for navigation. JS does have one benefit in SEO: pages or links that a webmaster does not want indexed can be placed behind JS. Another way to defuse the JavaScript spider trap is the <noscript> tag, which provides alternative code for browsers that do not support JavaScript; spider programs do not execute JavaScript, so they process the <noscript> content instead, as the sketch below demonstrates.
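A minimal demonstration of why <noscript> works (the page snippet is invented): a crawler that does not execute JavaScript only sees markup, so a link written by script is invisible to it, while a <noscript> fallback parses like any ordinary anchor:

```python
from html.parser import HTMLParser

# A page whose link is written by JavaScript, plus a <noscript> fallback.
PAGE = """
<script>document.write('<a href="/js-only">Catalog</a>');</script>
<noscript><a href="/catalog">Catalog</a></noscript>
"""

class LinkCollector(HTMLParser):
    """Collects hrefs the way a non-JS crawler would: from markup only."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['/catalog'] -- the script-written link never appears
```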
7: Deep pages. Some pages have no entry links at all, or sit a long way from the site's homepage, and such pages are relatively hard for spiders to reach (sites with high weight may be another matter). For a page to be indexed it first needs a baseline amount of weight. The homepage generally carries the highest weight, and that weight is passed down to the inner pages; once an inner page's weight crosses the indexing threshold, the page gets indexed. By this reasoning, weight diminishes as it passes from page to page, so the closer an inner page sits to the homepage, the more homepage weight it receives. A good site structure lets more of your pages get indexed. The sketch below makes "distance from the homepage" concrete.
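To measure that distance, here is a small sketch (the link graph is invented) that computes each page's click depth with a breadth-first search, which is essentially how a spider discovers pages:

```python
from collections import deque

# A made-up internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": [],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": [],
}

def click_depths(home: str) -> dict:
    """Breadth-first search from the homepage; depth = clicks to reach a page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

for page, depth in sorted(click_depths("/").items(), key=lambda kv: kv[1]):
    print(depth, page)
# Pages at depth 3 or more are the ones spiders (and homepage weight)
# have the hardest time reaching.
```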
8: Mandatory use of cookies. To a search engine spider this is equivalent to having cookies disabled outright. Some sites force cookies in order to implement certain features, such as tracking the visitor's path, remembering user information, or even prying into user privacy. If a visitor accesses such a site without cookies enabled, the page displays abnormally, and the same page is therefore inaccessible to spiders, which do not accept cookies. The sketch below shows one way to test for this.
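One rough way to test a page for this trap (a sketch; http://example.com/shop is a placeholder URL, and comparing raw page text is only a heuristic) is to fetch it once like a cookie-refusing spider and once like a normal browser, then compare:

```python
import requests
from http.cookiejar import DefaultCookiePolicy

URL = "http://example.com/shop"  # placeholder; use the page you want to audit

# An empty allow-list makes the policy reject every cookie, like a spider.
spider = requests.Session()
spider.cookies.set_policy(DefaultCookiePolicy(allowed_domains=[]))

browser = requests.Session()  # accepts cookies normally

spider_page = spider.get(URL, timeout=10)
browser_page = browser.get(URL, timeout=10)

if spider_page.text != browser_page.text:
    print("Page changes when cookies are refused -- possible spider trap.")
```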
9: Jumps of all kinds. Most SEO folks are very familiar with the 301 redirect by now, but spiders strongly dislike the other kinds: 302 redirects, meta refresh, JavaScript jumps, Flash jumps and so on, and even the 301 should not be used except as a last resort. Every jump puts some degree of obstacle in the spider's way, so use them sparingly. The sketch below traces a page's redirect chain.
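To see what a spider actually walks through, this sketch (the URL is a placeholder) fetches a page and prints every redirect hop, then checks the body for a meta refresh, which a plain HTTP client will not follow:

```python
import requests

URL = "http://example.com/old-page"  # placeholder; audit your own URLs

response = requests.get(URL, allow_redirects=True, timeout=10)

# response.history lists every hop the client was bounced through.
for hop in response.history:
    print(hop.status_code, hop.url)   # 301 is tolerable; 302 is suspect
print(response.status_code, response.url)  # final landing page

# Meta-refresh jumps never appear in the HTTP history; look in the markup.
if 'http-equiv="refresh"' in response.text.lower():
    print("Meta refresh detected -- another jump spiders dislike.")
```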
10: robots.txt writing errors and the various cheating techniques: hidden text, hidden links, cloaked pages that check whether the visitor is a spider or an ordinary browser and serve each a different page, and error pages that return default content instead of a proper 404 status. All of these put crawling obstacles in the spider's way; two quick checks are sketched below.
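As a sketch of those two checks (the domain is a placeholder): first verify that robots.txt still allows the pages you want crawled, then confirm that a missing page really answers with a 404 status rather than a "soft 404" that spiders would index:

```python
from urllib import robotparser
import requests

SITE = "http://example.com"  # placeholder domain

# 1) Verify robots.txt actually allows the pages you want crawled.
rp = robotparser.RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()
print(rp.can_fetch("Baiduspider", SITE + "/products/"))  # expect True

# 2) A missing page must answer with status 404, not 200, or spiders
#    will index the error page itself.
missing = requests.get(SITE + "/no-such-page-xyz", timeout=10)
print(missing.status_code)  # should be 404
```

Source: Shenzhen SEO, http://www.zhsem.com/. Please respect the original and credit the source when reprinting. Thank you!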