The age of big data has arrived, but big data is not omnipotent. Its essence lies not in sheer size but in the changes in computation and ways of thinking that it brings; an overly optimistic and simplistic understanding of it can breed "big data superstition."
Thanks to the development of the mobile Internet, smartphones, and smart wearable devices, data such as behavior, location, and physical characteristics can now be recorded easily, making it possible to collect big data.
The value of this new form of data, such as its seemingly magical ability to predict trends, has been widely discussed by the public and has become a selling point for many marketers. From automobiles and cosmetics to sports, it seems that every industry can use big data to position itself precisely, find consumers, forecast trends, and win the future.
In the view of its supporters, the power of big data lies in the fact that every data point can be captured. Analysis of big data can yield astonishingly accurate results, so classical sampling-based statistical methods face obsolescence. At the same time, the data is big enough to speak for itself: "the reasons behind the data no longer matter; people only need to know that a statistical correlation exists between the data." Theory itself may come to an end.
There is no doubt that bigger, newer, and faster big data holds profound insight and value, but the belief that simply having big data is enough is too optimistic and simplistic.
First, hundreds of years of statistical history have taught us that understanding the real world through statistical data is never perfect. Sampling errors, biases, and other "traps" that arise in practice cannot be solved simply by relying on bigger, newer, and faster data.
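The point about sample bias can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the 30% true support rate, and the posting probabilities (0.5 and 0.1) are hypothetical numbers chosen for illustration, and `compare_samples` is a made-up helper name. It compares a very large but self-selected data set with a small uniform random sample.

```python
import random


def compare_samples(seed=42):
    """Compare a large self-selected data set with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population of 200,000 people:
    # 30% support some position (1), 70% do not (0).
    population = [1] * 60_000 + [0] * 140_000
    true_rate = sum(population) / len(population)

    # "Big" but biased data: only people who speak up are observed.
    # Assumption: supporters speak up with probability 0.5, others with 0.1.
    big_biased = [
        x for x in population
        if rng.random() < (0.5 if x == 1 else 0.1)
    ]

    # Small classical sample: 2,000 people drawn uniformly at random.
    small_random = rng.sample(population, 2_000)

    return {
        "true_rate": true_rate,
        "big_biased_n": len(big_biased),
        "big_biased_rate": sum(big_biased) / len(big_biased),
        "small_random_n": len(small_random),
        "small_random_rate": sum(small_random) / len(small_random),
    }


result = compare_samples()
for key, value in result.items():
    print(f"{key}: {value}")
```

Under these assumptions the self-selected data set is roughly twenty times larger than the random sample, yet its estimate lands far from the true rate, while the small random sample stays close to it. More data does not remove the bias; it only makes the biased estimate more precise.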
Second, big data has a low value density and mixed content, so finding the "genuine" information is not easy. Moreover, analytical methods that "know that it is so, but not why," which consider only pure correlation and ignore the causal relationship between the data and the conclusions, often cannot withstand scrutiny in practice. In theory, for example, public sentiment about an event can be inferred by analyzing every statement made about it, but it should not be overlooked that active microblog users represent only themselves, not a wider group.
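The weakness of relying on pure correlation can also be sketched in code. The example below is illustrative only: the series are random noise with no relationship to one another, the counts (1,000 candidate series of 10 points each) are arbitrary assumptions, and `pearson` and `strongest_chance_correlation` are hypothetical helper names. It searches many unrelated series for the one that correlates best with a target.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=1000, length=10, seed=7):
    """Return the strongest correlation found among unrelated noise series."""
    rng = random.Random(seed)
    # The target and every candidate are independent Gaussian noise,
    # so any correlation between them arises purely by chance.
    target = [rng.gauss(0, 1) for _ in range(length)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(length)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


best_r = strongest_chance_correlation()
print(f"strongest correlation found among pure noise: {best_r:.2f}")
```

With enough candidate variables, a strong-looking correlation turns up even though nothing is causally connected. The more data dimensions are searched, the more such coincidences appear, which is why a correlation without an explanation deserves scrutiny.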
In particular, much data still sits in isolated "islands." Big data confined to a single or narrow field is not only of limited value but also carries the risk of one-sidedness. Only when data crosses industry boundaries does its accuracy increase as its interconnections strengthen. Breaking down these data "islands" and integrating the data still has a long way to go. In addition, although collecting, storing, and processing data is becoming more and more convenient, from a technical perspective powerful tools for extracting valuable information from massive data are still lacking.
There is no doubt that the age of big data has arrived, but big data is not omnipotent. The core of big data is not size but the transformation in computation and ways of thinking that it brings; an overly optimistic and simplistic understanding can breed "big data superstition." A more pragmatic approach might be to respect traditional statistical experience: neither be superstitious about big data nor belittle it as "old wine in new bottles," but make good use of it. Otherwise, clinging to the fallacy that "the data alone can yield the conclusion," one may fall into the data "trap" and let big data produce "big mistakes."