Not reliable.
Steamed buns are not reliable, and the report about the bun stuffing is even less reliable.
The Korean team is not reliable, the Japanese team is not reliable, and the Chinese team dropping the ball is the one thing you can rely on.
17tech said Lu Bowang's report is not reliable, Lu Bowang said the CNNIC report is not reliable, and Yahoo feels Eric's report is not reliable.
Sogou thinks the reports are not reliable, and Baidu said: none of you are reliable, I will do the report myself.
That was the ending of my last article, "whose words are not reliable".
There have been a lot of unreliable things lately, and too many unreliable people. Still, Lu Bowang's report was challenged over a 0.1% rounding error, and its author seems to feel rather wronged; his earlier piece on IT Longmen pointing out defects in the CNNIC report did have some merit. Now that several search reports (Eric and Yi) have had their survey results questioned, it seems search reports really are not very reliable.
Baidu really has started producing reports. It recently opened a new second-level channel, the Baidu Data Research Center (data.baidu.com), which offers six industry reports for download: banking, funds, online games, automobiles, cosmetics, and household appliances.
The online survey advantage of large websites
I have some confidence in consulting work done by large websites: they can at least guarantee the sample size of their data sources, which should in principle make the results more reliable. Lu Bowang, answering my question about survey costs, said that although a telephone survey costs significantly less than a door-to-door survey, it also introduces some errors; the survey was limited by funding, and that produced some errors. In my opinion, as long as CNNIC surveys the total number of Internet users in China, more in-depth surveys of those users can be done through online questionnaires. An online survey costs very little, yet the sample can be made much larger, which also helps data accuracy. As a result, large websites have a certain advantage in data collection.
Is Baidu's report reliable?
Although Baidu compiled six reports all at once, nobody has seen any related online questionnaire, so how did Baidu obtain its data? I opened one of the reports to see how the survey method is described:
Search engines can capture the text in which users actively express their needs (that is, keyword queries), so they have an edge over portals in grasping users' real needs. By tracking cookies, keywords can be tied to specific needs. We assume that each cookie represents one potential consumer (in the technical and statistical sense; the technical back end can block cookies used by multiple users, such as Internet café cookies), so the set of keywords collected under a cookie in a given period can fully reflect the information he or she cares about. We systematically code, clean, and analyze this information, cross-analyze the keywords searched under different cookies, discover the group behavior of netizens' searching, and finally integrate the results into industry reports.
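To make the quoted method concrete, here is a minimal sketch of grouping keywords by cookie and dropping cookies that look shared. It is only an illustration: the log entries, the function names, and the cutoff `MAX_KEYWORDS_PER_COOKIE` are all assumptions of mine, and the report does not say how Baidu actually detects multi-user cookies.

```python
from collections import defaultdict

# Hypothetical search log: (cookie_id, keyword). All values are made up.
SEARCH_LOG = [
    ("c1", "mortgage rates"),
    ("c1", "fund returns"),
    ("c2", "online game guide"),
    ("c3", "lipstick"),
    ("c3", "air conditioner price"),
    ("c3", "used car"),
    ("c3", "game top-up card"),
    ("c3", "bank card fees"),
]

# Assumed cutoff: a cookie with more distinct keywords than this in the
# period is treated as shared (e.g. an Internet cafe) and dropped.
MAX_KEYWORDS_PER_COOKIE = 4


def keywords_by_cookie(log):
    """Group the distinct keywords searched under each cookie ID."""
    grouped = defaultdict(set)
    for cookie_id, keyword in log:
        grouped[cookie_id].add(keyword)
    return dict(grouped)


def drop_shared_cookies(grouped, max_keywords=MAX_KEYWORDS_PER_COOKIE):
    """Keep only cookies that look like a single 'potential consumer'."""
    return {cid: kws for cid, kws in grouped.items() if len(kws) <= max_keywords}


grouped = keywords_by_cookie(SEARCH_LOG)
sample = drop_shared_cookies(grouped)
print(sorted(sample))  # cookie IDs that remain in the sample
```

In this toy log the cookie `c3` exceeds the assumed cutoff and is removed, so whatever those users searched for disappears from the sample entirely.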
So Baidu's reports are produced by collating and analyzing users' cookies and keywords. Is Baidu's report reliable, then?
In terms of sample size, Baidu is the highest-traffic site in China, with hundreds of millions of visits per day, and Baidu's report says its sample is on the order of millions. Clearly, the number of users surveyed is indeed very considerable.
But in terms of survey method, I think there are some loopholes.
First, Baidu's back end can "block cookies used by multiple users," which means the statistics of Internet café users are removed. But Internet café users make up a good third of domestic netizens, a huge number in absolute terms. Removing this information can bias the results, most notably for online games.
Second, it is also common for domestic Internet users to share one machine within a household, such as a couple or a whole family sharing a computer (I think these cases are fairly common), so a single cookie may mix the searches of several people.
Third, some users use more than one computer; I, for example, use different computers at the office and at home. Under Baidu's statistical method, such users are counted more than once, which is a statistical taboo.
Fourth, all of Baidu's survey results are based on Baidu search users. People who do not use search engines, or do not use Baidu, are hard to count, and a conservative estimate puts this group at around one third as well (according to CNNIC search engine data).
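The second and third points can be illustrated with a toy example. This is a sketch on invented data, not a measurement: the people, the cookie names, and the `USAGE` table are all hypothetical. It only shows that counting cookies is not the same as counting people.

```python
# Hypothetical ground truth: which person used which cookie (browser).
# All names and cookie IDs are invented for illustration.
USAGE = [
    ("husband", "home_pc"),
    ("wife", "home_pc"),        # shared family computer: one cookie
    ("child", "home_pc"),
    ("author", "office_pc"),
    ("author", "author_home"),  # one person, two computers: two cookies
]

people = {person for person, _ in USAGE}
cookies = {cookie for _, cookie in USAGE}

# Cookies that carry the searches of more than one person.
users_per_cookie = {}
for person, cookie in USAGE:
    users_per_cookie.setdefault(cookie, set()).add(person)
shared = sorted(c for c, users in users_per_cookie.items() if len(users) > 1)

# People who appear under more than one cookie (counted more than once).
cookies_per_person = {}
for person, cookie in USAGE:
    cookies_per_person.setdefault(person, set()).add(cookie)
duplicated = sorted(p for p, cs in cookies_per_person.items() if len(cs) > 1)

print(len(people), len(cookies))  # real people vs. "potential consumers"
print(shared, duplicated)
```

Here four people appear as three cookies: the family is merged into one "consumer" while the author is split into two. Even when the totals happen to be close, the individual profiles behind them are wrong.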
As a result, Baidu's report, like CNNIC's report, still has some loopholes in its statistical method, although the large sample can compensate for some of these deficiencies. I think Baidu's report has only some reference value and cannot fully reflect the whole picture of an industry.
Of course, what I care about more is how Baidu uses our cookies to produce its reports: has Baidu touched our cookies, touched our cake?
Baidu, please don't touch my cake.
In English, "cookies" are small snacks eaten with milk; I translate the word directly as "cake." On the Internet, the word "cookie" has a completely different meaning. A "cookie" is a small piece of information that a web server sends to be stored by a web browser, so that the next time the same visitor returns to that server, it can be read back from the browser.
Cookies can carry login information over to the user's next session with the server. In other words, the next time they visit the same site, users will find they are already logged in without entering a username and password. We often see this when logging in to forums.
In the introduction to Baidu's report, cookies are specifically mentioned:
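For readers who want to see the round trip, here is a small sketch using Python's standard `http.cookies` module. The cookie name `session_id` and its value are made up; the point is only to show the server issuing a cookie and reading it back on the next visit.

```python
from http.cookies import SimpleCookie

# --- First visit: the server decides to remember this browser. ---
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"          # made-up identifier
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["max-age"] = 3600   # keep for one hour

# This is the header line the server would send with its response.
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# --- Next visit: the browser sends the stored value back. ---
request_cookie_header = "session_id=abc123"
incoming = SimpleCookie()
incoming.load(request_cookie_header)

# The server now recognises the returning visitor by this value.
returning_id = incoming["session_id"].value
print(returning_id)
```

The server never learns who the visitor is from the cookie alone; it only learns that the same browser has come back.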
Using "keyword combinations" we infer the personal attributes that each cookie represents, such as gender, income range, and industry-related preferences, and from these we obtain the various needs of a "group," for example the overall consumption characteristics of newly graduated women (aged 21 to 23).
But Baidu does not take any specific user's personal information as the object of analysis. A cookie is just a bridge for collecting needs (keywords); the person it represents is really a virtual person associated with actual needs. We have no idea who he or she is and have no way to contact them, but through their search trail we can know their needs.
It seems that cookies are a crucial link in Baidu's reports. They let Baidu skip the traditional market survey questionnaire: by analyzing users' cookies, it can collect various kinds of personal information, derive users' needs from their search trails, and finally compile a report.
The cookie data Baidu uses should be of the following two kinds:
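What might "keyword combinations" look like in practice? The sketch below is pure guesswork on my part: the rules in `RULES`, the tags, and the sample keywords are all hypothetical, and the report does not disclose how Baidu actually maps keywords to attributes.

```python
# Hypothetical rules: if a cookie's keywords include every term in the
# rule, attach the tag. Real inference would be far less clear-cut.
RULES = [
    ({"graduation thesis", "job interview"}, "new graduate"),
    ({"lipstick", "foundation"}, "cosmetics interest"),
    ({"fund returns"}, "fund interest"),
]


def infer_tags(keywords):
    """Return the tags whose required keywords all appear in `keywords`."""
    keywords = set(keywords)
    return [tag for required, tag in RULES if required <= keywords]


# Keywords collected under two made-up cookies.
cookie_keywords = {
    "c1": ["graduation thesis", "job interview", "lipstick", "foundation"],
    "c2": ["fund returns", "used car"],
}

profiles = {cid: infer_tags(kws) for cid, kws in cookie_keywords.items()}
print(profiles)
```

Even in this toy form, the "virtual person" behind `c1` ends up with a fairly personal profile, although no name or contact detail was ever collected.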
The first kind covers most cases, where people search on Baidu without logging in. Because no account name or password was entered, Baidu cannot use a user ID to tell the individuals surveyed (the so-called virtual persons) apart. To distinguish them, Baidu has to record some information about the user's computer, such as intranet IP, operating system information, browser information, and so on; otherwise it could not tell different users apart.
The second kind covers users who use the search engine while logged in (to Tieba, Zhidao, blogs, and other products). Here Baidu can distinguish individuals through the user ID and other information recorded in the cookie. Baidu's database may hold more detailed information on such users, including the age, gender, occupation, and other fairly comprehensive details they gave the site.
Only after this step is done can Baidu classify the search trails it collects and analyze them to produce results; otherwise it has just a pile of keywords, which means nothing.
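The two kinds of data described above amount to choosing an identity key for each visitor. The following is a minimal sketch of that idea, not Baidu's implementation: the function `visitor_key`, the field names, and the sample values are all assumptions, and hashing the machine details is simply one plausible way to turn them into an identifier.

```python
import hashlib


def visitor_key(user_id=None, ip="", os_name="", browser=""):
    """Return a stable key for one 'virtual person'.

    Logged-in visitors are keyed by account ID; anonymous visitors by a
    hash of the machine details the article lists (IP, OS, browser).
    """
    if user_id:
        return f"user:{user_id}"
    raw = "|".join([ip, os_name, browser])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    return f"anon:{digest}"


# Same machine details give the same key; a different IP gives a new one.
a = visitor_key(ip="192.168.0.5", os_name="Windows XP", browser="IE 6")
b = visitor_key(ip="192.168.0.5", os_name="Windows XP", browser="IE 6")
c = visitor_key(ip="192.168.0.6", os_name="Windows XP", browser="IE 6")
d = visitor_key(user_id="tieba_user_42")

print(a == b, a == c, d)
```

Note that a key built this way inherits the problems listed earlier: everyone on a shared machine gets the same anonymous key, and one person on two machines gets two.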