As is widely known, Twitter's PageRank (PR) recently dropped from 9 to 0, and many of its pages were not being crawled by Googlebot, causing an uproar. Although it has since been restored, what actually happened? Solitary According to the Wind, editor at the Beijing Website Optimization Research Center, draws on reporting from Tanioku to explain the truth behind Twitter's turmoil.
First, the PR drop and the uncrawled pages were caused by Twitter's own technical problems and have nothing to do with Google's search mechanism.
Five major technical issues led to Twitter's turmoil:
(1) The robots.txt problem
Twitter served different robots.txt files on the www and non-www hosts, as follows (A is the file for the host without www, B is the file for the host with www):
A: The file at twitter.com/robots.txt looks as follows:
#Google Search Engine Robot
User-agent:googlebot
# crawl-delay:10--Googlebot ignores Crawl-delay FTL
Allow:/*?*_escaped_fragment_
Disallow:/*?
Disallow:/*/with_friends
#Yahoo! Search Engine Robot
User-agent:slurp
Crawl-delay:1
Disallow:/*?
Disallow:/*/with_friends
#Microsoft Search Engine Robot
User-agent:msnbot
Disallow:/*?
Disallow:/*/with_friends
# Every bot that might possibly read and respect this file.
User-agent: *
Disallow:/*?
Disallow:/*/with_friends
Disallow:/oauth
Disallow:/1/oauth
B: The file at www.twitter.com/robots.txt looks as follows:
User-agent: *
Disallow:/
Because Twitter applies different robots.txt rules to the www and non-www hosts, we can see that:
1. Search engines follow each host's robots.txt separately. As a result, the www and non-www versions of the site return different search results, which is non-standard.
2. Twitter blocks search engines from crawling the www host entirely.
3. Because the www host is blocked, even a 301 redirect from it to the non-www host is futile. Crawlers are not allowed to request the www URLs, so they never see the redirect.
4. External links point to both the www and non-www versions. Because crawling of the www version is blocked, however, the value of the links pointing to it does not strengthen the overall weight of the Twitter site.
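The effect of the two files can be checked with a small matcher. The sketch below is a simplified reading of Google's documented robots.txt rules: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. It selects a group by exact user-agent name and falls back to `*`, which is cruder than what real crawlers do. ROBOTS_A and ROBOTS_B are abbreviated copies of files A and B, and all function names are illustrative rather than taken from any library.

```python
import re

# Abbreviated copies of file A (twitter.com) and file B (www.twitter.com).
ROBOTS_A = """\
User-agent: googlebot
Allow: /*?*_escaped_fragment_
Disallow: /*?
Disallow: /*/with_friends

User-agent: *
Disallow: /*?
Disallow: /*/with_friends
Disallow: /oauth
Disallow: /1/oauth
"""

ROBOTS_B = """\
User-agent: *
Disallow: /
"""


def parse_groups(text):
    """Map each lowercase user-agent token to its list of (directive, pattern) rules."""
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional '$') to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(robots_text, agent, path):
    """True if `agent` may fetch `path` under the simplified rules described above."""
    groups = parse_groups(robots_text)
    rules = groups.get(agent.lower(), groups.get("*", []))
    best = None  # (pattern length, is_allow) of the longest matching rule so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:  # longer wins; Allow wins ties
                best = candidate
    return True if best is None else best[1]
```

For example, is_allowed(ROBOTS_A, "Googlebot", "/vanessafox") is True, but is_allowed(ROBOTS_B, "Googlebot", "/vanessafox") is False. Because file B disallows everything, a crawler never requests any www URL and therefore never sees a 301 that the www host might return.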
(2) The 302 redirect problem
twitter.com/vanessafox was 302-redirected to twitter.com/#!/vanessafox. A 302 redirect signals a temporary move. Search engines crawl the new content but keep the old URL, and the original URL's link value is not fully passed on.
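The only difference between the two kinds of redirect is the status code the server sends alongside the Location header. The following is a minimal sketch that assumes a plain WSGI server. The MOVED table and make_redirect_app are hypothetical names and do not come from Twitter's code.

```python
# Hypothetical table of old paths and the locations they have moved to.
MOVED = {"/vanessafox": "https://twitter.com/#!/vanessafox"}


def make_redirect_app(moved, permanent=True):
    """Return a WSGI app that redirects known paths: 301 if permanent, else 302."""
    status = "301 Moved Permanently" if permanent else "302 Found"

    def app(environ, start_response):
        target = moved.get(environ.get("PATH_INFO", ""))
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        # The status line tells crawlers whether the move is permanent (301)
        # or temporary (302); the Location header carries the new URL.
        start_response(status, [("Location", target)])
        return [b""]

    return app
```

To try it locally, wsgiref.simple_server.make_server("", 8000, make_redirect_app(MOVED)).serve_forever() serves the app. Passing permanent=False reproduces the 302 behaviour described above.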
(3) Failure to comply with Google's AJAX crawling standard
Twitter's URLs are AJAX-based and use #! to tell Google to fetch the _escaped_fragment_ version of each URL from the server. Because 301 redirects were not used, many related pages were lost. In short, AJAX and redirection were not combined properly.
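Under Google's AJAX crawling scheme, which Google has since deprecated, a crawler that encounters a #! URL requests the same URL with the fragment moved into an _escaped_fragment_ query parameter. The server is then expected to answer that URL with an HTML snapshot of the page. The sketch below shows the URL mapping. It assumes the escaping rules from Google's published specification, and the function names are hypothetical.

```python
# Bytes the AJAX crawling scheme requires to be percent-escaped in the
# fragment: 0x00-0x20, '#', '%', '&', '+', and 0x7F-0xFF.
_MUST_ESCAPE = set(range(0x00, 0x21)) | {0x23, 0x25, 0x26, 0x2B} | set(range(0x7F, 0x100))


def escape_fragment(fragment):
    """Percent-escape only the bytes listed above; all other bytes are kept as-is."""
    return "".join(
        "%{:02X}".format(b) if b in _MUST_ESCAPE else chr(b)
        for b in fragment.encode("utf-8")
    )


def to_escaped_fragment_url(pretty_url):
    """Map a '#!' URL to the '_escaped_fragment_' URL a crawler requests instead."""
    if "#!" not in pretty_url:
        return pretty_url  # not an AJAX-crawlable URL; nothing to rewrite
    base, fragment = pretty_url.split("#!", 1)
    separator = "&" if "?" in base else "?"  # append to an existing query string
    return base + separator + "_escaped_fragment_=" + escape_fragment(fragment)
```

For example, https://twitter.com/#!/vanessafox maps to https://twitter.com/?_escaped_fragment_=/vanessafox. The rewritten URL contains both "?" and "_escaped_fragment_", so the longer Allow: /*?*_escaped_fragment_ line in file A takes precedence over Disallow: /*? for Googlebot.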
(4) Rate limiting
The rate limit is visible in the HTTP response headers (X-RateLimit-Limit: 1000). A crawler that exceeds it can be refused pages.
HTTP/1.1 200 OK
Date: Mon 20:48:44 GMT
Server: hi
Status: 200 OK
X-Transaction: 1311022124-32783-45463
X-RateLimit-Limit: 1000
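A header such as X-RateLimit-Limit: 1000 typically advertises a per-client request budget. The sketch below is a hypothetical fixed-window limiter that produces headers of this kind. The one-hour window and the X-RateLimit-Remaining header are assumptions and are not taken from Twitter's response.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window.

    The default window of 3600 seconds is an assumption for illustration.
    """

    def __init__(self, limit=1000, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._counts = {}  # client id -> (window start time, requests counted)

    def check(self, client):
        """Return (allowed, headers) for one request from `client`."""
        now = self.clock()
        start, count = self._counts.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired; begin a new one
        allowed = count < self.limit
        if allowed:
            count += 1
        self._counts[client] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),  # hypothetical header
        }
        return allowed, headers
```

When requests are counted against a single client identifier, a crawler that fetches more pages than the limit within one window is refused until the window resets. This is why the article advises against applying such limits to search engine crawlers.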
(5) URLs are not canonicalized
twitter.com/vanessafox appears in search results alongside variants of the same URL that differ only in capitalization, and all of them lead to the same page. This causes PageRank dilution, duplicate content, and canonicalization problems. The best fix is to choose one canonical form for the URLs (the simplest choice is all lowercase) and 301-redirect every variant to it. In addition, Twitter could simply add the rel="canonical" attribute to specify the canonical version of every page.
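Here is a minimal sketch of the normalize-then-301 approach. It assumes lowercase paths and treats the non-www host as the canonical form. CANONICAL_HOST and the function names are hypothetical. Lowercasing paths is only safe where the server treats paths case-insensitively, as it does for Twitter usernames.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical host: the non-www host, which file A lets crawlers access.
CANONICAL_HOST = "twitter.com"


def canonical_url(url):
    """Lowercase scheme, host and path; fold www onto CANONICAL_HOST; keep query and fragment."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    return urlunsplit(
        (parts.scheme.lower(), host, parts.path.lower(), parts.query, parts.fragment)
    )


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    target = canonical_url(url)
    return None if target == url else target


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return '<link rel="canonical" href="{}">'.format(escape(canonical_url(url), quote=True))
```

With this approach, a request for a mixed-case variant such as the hypothetical twitter.com/VanessaFox would receive a 301 to redirect_target(url), and every page would carry canonical_link_tag(url) in its head.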
As the above shows, Twitter's trouble lay in robots.txt, HTTP status codes and URL canonicalization, and many large websites share these problems. The refresh of the Google Toolbar PR value, however, is Google's own issue.
Twitter's technical mistakes offer these lessons for webmasters:
(1) Keep robots.txt rules consistent across hosts so that link weight is concentrated rather than lost.
(2) Use 301 redirects rather than 302 redirects for permanent moves.
(3) Comply with Google's AJAX crawling standard.
(4) Do not let the rate limits advertised in your HTTP headers block search engine crawlers.
(5) Canonicalize URLs into a single, consistent form instead of mixing variants at random.
First published by the According to the Wind SEO Center (www.seo0359.com); original article by webmaster Solitary According to the Wind. If you reproduce it, please keep the link to the original.