In fact, I (Guteng, a Quanzhou SEO) had read teacher Shanhui's "SEO Actual Combat Password" before, and its treatment of robots.txt struck me as quite detailed, but I had never studied how large websites actually set up their robots files. It occurred to me today to analyze the robots.txt files of the four big domestic microblogging platforms, Sina, Tencent, Sohu, and NetEase, and see how each one writes its robots file.
1. Sina Weibo
Description: Allow all search engines to crawl
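For reference, a robots.txt that lets every crawler fetch everything is usually just the minimal sketch below; this is an illustrative reconstruction, not a copy of Sina's actual file:

```
# An empty Disallow value blocks nothing, so all crawlers may fetch all URLs
User-agent: *
Disallow:
```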
2. Tencent Weibo
Description: allows all search engines to crawl, apart from some system files, and adds two sitemaps: one lists the personal microblog homepage addresses of verified members, the other lists the addresses of microblog posts. XML sitemaps have limits: a single map file may list at most 50,000 URLs and may not exceed 10 MB; if there are more URLs than that, a new sitemap file has to be created. Guteng deliberately checked Tencent Weibo's first XML map: the file holds roughly 41,000 URLs and is a little over 2 MB. I will check back after a while to see whether Tencent also handles the excess URLs by adding new sitemap files.
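A hedged sketch of a robots.txt in that style is shown below; the blocked paths and sitemap filenames are placeholders invented for illustration, not Tencent's actual entries:

```
# All crawlers may fetch everything except a few system paths (paths hypothetical)
User-agent: *
Disallow: /cgi-bin/
Disallow: /login

# Two XML sitemaps: verified members' homepages and microblog posts (filenames hypothetical)
Sitemap: http://t.qq.com/sitemap_vip_home.xml
Sitemap: http://t.qq.com/sitemap_messages.xml
```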
3. Sohu Weibo
Sohu Weibo is the most interesting, because in its first few months fast keyword rankings could be obtained by riding on Sohu Weibo's own high weight, and later there was talk that Sohu Weibo had blocked Baidu's spider, so let's take a look at its robots.txt file. The first block of statements allows Baiduspider to crawl, the second block allows Sogou's spider to crawl, and the third block disallows all search engines.
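The structure described would look roughly like the sketch below; this is an illustrative reconstruction, and the exact user-agent tokens and paths in Sohu's real file may differ:

```
# Block 1: Baidu's spider may crawl everything
User-agent: Baiduspider
Allow: /

# Block 2: Sogou's spider may crawl everything (user-agent token assumed)
User-agent: Sogou web spider
Allow: /

# Block 3: all other crawlers are blocked from the entire site
User-agent: *
Disallow: /
```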
According to Baidu's official documentation, it is important to note that the order of Disallow and Allow lines is meaningful: the robot decides whether to access a URL based on the first Allow or Disallow line that successfully matches it.
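To illustrate that first-match rule, here is a generic example (not taken from any of these sites) where the same URL is covered by both an Allow and a Disallow line, so whichever comes first decides:

```
User-agent: *
Allow: /blog/page.html
Disallow: /blog/
# Under the first-match rule, /blog/page.html hits the Allow line first
# and may be crawled, while everything else under /blog/ is blocked.
# Swap the two lines and /blog/page.html would be blocked as well.
```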
So the last block has no effect on Baidu and Sogou; in other words, Sohu Weibo only allows Baidu and Sogou to crawl its pages.
One more point: Guteng found that Sohu Weibo's robots.txt was revised around June to block all search engines other than Baidu and Sogou, yet the other engines still index it and the volume keeps growing. The difference is that Google, Youdao, and Bing merely index the URLs without fully including the pages. Soso does not seem to respect the robots file, or something along those lines, since it still shows snapshots and extracts description text. Yahoo also includes the pages; I only took a quick look and cannot tell whether it is merely indexing them.
4. NetEase Weibo
A robots.txt file cannot be found for NetEase Weibo.
Now take a look at how the four microblogging platforms are included by Baidu:
| Platform | Baidu total included | Baidu daily included (half day) | Note |
| --- | --- | --- | --- |
| Sina Weibo | 8.7 million | 6,400 | PR8, allows all search engines to crawl |
| Tencent Weibo | 1.22 million | 10,500 | PR6, allows all search engines to crawl |
| Sohu Weibo | 25.8 million | 1,580 | PR6, allows only Baidu and Sogou to crawl |
| NetEase Weibo | 537,000 | 792 | PR6, no limits set |
From the table above you can see that Tencent Weibo's daily inclusion outstrips the other microblogs; ranked by daily inclusion, the order is Tencent Weibo > Sina Weibo > Sohu Weibo > NetEase Weibo.
Tonight's look at these microblogging platforms was a whim; time to rest, sleep a few hours, and get up early tomorrow. This article is the personal, humble opinion of Quanzhou SEO Guteng (www.gutengseo.com); feedback and criticism are welcome.