The deeper the Internet penetrates, the smoother the world becomes; the more transparent information is, the more important integrity becomes. --- Epigraph

Everyone who cares about search engines knows that the search rate and the search accuracy rate (that is, recall and precision) are the two basic indicators for measuring a search engine system. Articles that evaluate search engine technology mention them frequently.

Search accuracy is a difficult indicator to quantify. There is no settled way to judge how well a search engine's results match the user's intent. Many search engines billed as intelligent, social, or meta-search are working in this direction. Intriguingly, most companies claim that their search engines are based on these technologies, yet none of them offers an acceptable measurement. Of course, they are not to blame for this.

The search rate can be quantified. The simplest measure is the index volume. There was a debate between Google and Yahoo over this, with headlines such as "The test proves that Google is still the king of search!" and accusations of bragging ("cowhide"). How did Google fight back in this debate? As KESO said about Google's success in anti-commercial operations, Google soon revealed that it would reduce the size of the index library. Clearly, searching the entire Internet is still an arduous goal.

Similar scenes are playing out in China. Articles published by Yahoo China often appear in the "100 Internet news articles that IT people must read every day", with themes such as creating a blue ocean, blind search tests, and bug-catching campaigns. Such news is rarely heard from Baidu. Baidu's low profile may bring its image a step closer to Google's.

Who is the best Chinese search engine? Besides the search rate and the search accuracy rate, another debate has started recently: "Who understands Chinese best?" Then, in the middle of that debate, Sogou came up with the following line: "Sogou understands the Internet better". I think Yahoo's marketing staff may want to learn something from Pepsi's success.

As a search engine enthusiast, I also care a great deal about the search rate and the search accuracy rate, so I am going to test the index databases of Yahoo China and Baidu. This is not an easy task, but I decided to try it.

Test method for the search rate. Basic method: sampling test.
It is carried out in two phases:
1. Test the index volume of the specified sites (analyzed in this article)
2. Test the indexing of basic keywords (analyzed in the next article)

This article is mainly a statistical analysis of the index volume of the specified sites.

The basic information is divided into three parts:
1. Source of the specified sites. For fairness, the sites are taken from both http://site.baidu.com and http://site.yahoo.com.cn, 4784 in total; the list can be downloaded from the link below.
2. Compute the index volume of each of these sites, that is, use a site:domain query to obtain the number of pages each search engine has indexed for the site. To make the comparison fairer, sites with an index volume of 0 are removed (such zeros may be caused by network errors; the original data is included in the attachment).
(Some time ago the site: counts reported by Baidu fluctuated abnormally; they are now basically back to normal. Another article will analyze this phenomenon.)
3. Related Analysis.
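
The filtering in step 2 can be sketched roughly as follows. This is only a minimal illustration, not the code used for this article: the example.* domains, the sample counts, and the function name comparable_sites are hypothetical. It assumes that the site: counts have already been collected into two dictionaries, and that a site is dropped when either engine reports 0 for it.

```python
def comparable_sites(baidu_counts, yahoo_counts):
    """Return sites that have a non-zero index count in both engines."""
    # Only sites present in both result sets can be compared.
    common = baidu_counts.keys() & yahoo_counts.keys()
    # Assumption: a zero from either engine removes the site.
    return sorted(
        site for site in common
        if baidu_counts[site] > 0 and yahoo_counts[site] > 0
    )


# Hypothetical site: counts, for illustration only.
baidu_counts = {"example.com": 1200, "example.org": 0, "example.net": 45}
yahoo_counts = {"example.com": 800, "example.org": 30, "example.net": 60}

sites = comparable_sites(baidu_counts, yahoo_counts)
baidu_total = sum(baidu_counts[s] for s in sites)
yahoo_total = sum(yahoo_counts[s] for s in sites)
print(sites, baidu_total, yahoo_total)
```

Summing only over the sites that survive the filter keeps the two totals comparable, since both are computed over exactly the same set of sites.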

The results are as follows:



(The analysis process is omitted; the detailed data can be downloaded here: http://www.search-analysis.com/baiduVSyahoochina-01.rar)

1. Baidu's index database is larger than Yahoo China's.
Among the 3793 most important sites that both parties considered, Baidu's index volume is 1626829061 and Yahoo China's is 1018594668. Baidu's is higher by 608234393, or about 0.6 billion.
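
The arithmetic behind this comparison is easy to check. The short sketch below uses only the two totals quoted above; the variable names are my own.

```python
baidu_total = 1_626_829_061   # Baidu index volume quoted above
yahoo_total = 1_018_594_668   # Yahoo China index volume quoted above

difference = baidu_total - yahoo_total   # absolute gap between the two
ratio = baidu_total / yahoo_total        # relative size of Baidu's total

print(f"difference: {difference:,}")
print(f"Baidu / Yahoo China: {ratio:.2f}")
```

The difference matches the 608234393 given above, and the ratio comes out at about 1.6, so on this sample Baidu's total is roughly 60% larger than Yahoo China's.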

2. Baidu's index volume distribution chart is close to a "long tail", and the long-tail curve is nearly perfect. The closer the chart is to a long tail, the better the architecture of the search engine's index library.
(The reason is: "Copernicus's aesthetic objection to the equant was an important reason for his rejection of the Ptolemaic system..." - Thomas Kuhn, The Copernican Revolution)
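
One simple way to put a number on "close to a long tail" is to rank the sites by index volume and see how much of the total the top few sites hold. The sketch below is my own illustration, not the method used to draw the chart: the function name head_share and the sample counts are hypothetical and are not taken from the attached data.

```python
def head_share(counts, head_fraction=0.1):
    """Share of the total index volume held by the top `head_fraction` of sites."""
    ranked = sorted(counts, reverse=True)               # largest sites first
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Hypothetical per-site index counts, for illustration only.
counts = [5000, 1200, 400, 150, 90, 60, 40, 30, 20, 10]
print(f"top 10% of sites hold {head_share(counts):.0%} of the index volume")
```

In a long-tail distribution a small head of sites holds a large share of the total while most sites contribute only a little each, so a high head share together with many small sites is what the chart should show.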

Obviously, the above analysis does not cover every aspect, so I am going to continue the research in more depth along the following lines:

2: [Baidu vs Yahoo China] Correlation between index volume and PR (PageRank);
3: [Baidu vs Yahoo China] Correlation between index volume and Alexa ranking;
4: [Baidu vs Yahoo China] How to test the expansion rate of the search index database.

