The big data era is approaching, but big data is not omnipotent. Its core is not sheer scale but a transformation in how we compute and how we think. An overly optimistic, simplistic understanding can feed a "big data superstition".
Thanks to the spread of the mobile Internet, smartphones, and wearable devices, data on people's behavior, location, and even physical characteristics can now be recorded easily, which makes large-scale data collection possible.
The value of this new form of data, such as its seemingly magical ability to predict trends, has been widely discussed and has become a selling point in many vendors' marketing. From automobiles and cosmetics to sports, it seems that every industry can use big data to pinpoint consumers, predict trends, and win the future.
In the view of its supporters, the power of big data lies in capturing every data point. By analyzing it we can derive remarkably accurate results, and classical sampling statistics faces obsolescence. At the same time, they argue, the data is large enough to speak for itself: "The reasons behind the data no longer matter. People only need to know that a statistical correlation exists." On this view, theory itself may be coming to an end.
Without a doubt, data that is larger in scale and updated faster can yield deep insights and real value. However, it is overly optimistic and simplistic to believe that big data is omnipotent.
First, several hundred years of statistical history have taught us that statistical data has never captured the real world perfectly. Real data is full of "traps" such as sampling error and bias, and the remedy is not simply to rely on larger, newer, and faster data.
Second, the value density of big data is low and its content is mixed, so finding the genuinely useful information is not easy. Moreover, an analytical approach that "knows what, but not why", considering only pure correlation and ignoring the causal relationship between the data and the conclusion, often breaks down in practice. For example, in theory we could infer the public sentiment triggered by an event by analyzing every post on Weibo. But we cannot ignore the fact that active Weibo users represent only themselves, not the wider population.
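The point about unrepresentative users can be illustrated with a small simulation. The sketch below is not drawn from any real data: the population size, the 20% share of "active" users, and the support rates of 80% and 40% are all invented assumptions, and `simulate` is a hypothetical helper. It compares an estimate built from every active user (a large but biased sample) with one built from a small random sample of the whole population.

```python
import random


def simulate(seed=0, population_size=200_000, small_n=1_000):
    """Compare a large biased sample with a small random one.

    All numbers here are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)

    # Hypothetical population: 20% are "active" users who support a
    # position at a rate of 80%; everyone else supports it at 40%.
    population = []
    for _ in range(population_size):
        active = rng.random() < 0.2
        p_support = 0.8 if active else 0.4
        population.append((active, rng.random() < p_support))

    true_rate = sum(s for _, s in population) / population_size

    # "Big data": every post by every active user, and nobody else.
    big = [s for a, s in population if a]
    big_est = sum(big) / len(big)

    # Classical approach: a small simple random sample of everyone.
    small = rng.sample(population, small_n)
    small_est = sum(s for _, s in small) / small_n

    return true_rate, big_est, len(big), small_est


if __name__ == "__main__":
    true_rate, big_est, big_n, small_est = simulate()
    print(f"true support rate:           {true_rate:.3f}")
    print(f"all {big_n} active users:     {big_est:.3f}")
    print(f"random sample of 1000:       {small_est:.3f}")
```

Under these assumptions the biased sample is tens of times larger than the random one, yet its estimate stays close to the active users' own rate rather than the population's. Adding more data of the same kind shrinks the noise but not the bias.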
In particular, much data still sits in isolated "data islands". Big data drawn from one or a few fields is not only of limited value but also risks being one-sided. Accuracy improves only when data crosses industry boundaries and the links between datasets are strengthened, and integrating these isolated islands still has a long way to go. In addition, although collecting, storing, and processing data is increasingly convenient, we still lack powerful tools for finding valuable information in massive datasets.
There is no doubt that the big data era is approaching, but big data is not omnipotent. Its core is not sheer scale but a transformation in how we compute and how we think. An overly optimistic, simplistic understanding can feed a "big data superstition". A more practical attitude is to treat big data as "new wine in old bottles": respect the accumulated experience of traditional statistics, make good use of big data, and do not trust it blindly. Otherwise, the mistaken belief that "data can draw conclusions by itself" may lead us into the "traps" of data, so that big data produces "big errors".
Big data must not become a "big error"