In recent years, big data has attracted broad attention in China: industries of every kind have taken note of it, creating an upsurge in big data applications. This rapid embrace of new things reflects the gratifying openness of public awareness fostered by China's reform, opening-up, and modernization. However, because big data technology has developed so rapidly, some mistaken, simplistic ideas are also in circulation; if they are not corrected in time, they will lead to misunderstandings of big data and harm economic and social development. The currently very popular book The Age of Big Data (Viktor Mayer-Schönberger et al., Zhejiang People's Publishing House, 2013; hereinafter "the book") puts forward three seriously fallacious views, which are pointed out here in order to draw attention to them.
"Not causation, but mutual relationship"?
One of the book's main ideas is that the big data age is about "not causation, but correlation." In fact, as early as the 18th century the British skeptic Hume pointed out: "Not only can our reason not help us discover the ultimate connection between causes and effects; even after experience has shown us their constant conjunction, we cannot, by reason alone, explain why we should extend that experience beyond the particular instances we have observed. We merely suppose, but can never prove, that the things we have experienced must resemble those we have not yet encountered."
Thus the book presents as a new concept of the big data age a point that was raised centuries ago; the claim is not only stale but wrong. To say flatly that the big data age is "not about causation but correlation" shows that the authors fail to understand that causation is itself a kind of correlation, namely the relationship between cause and effect. The claim that correlation matters more than causation is therefore no more meaningful than causation itself; indeed, it verges on tautology.
The right approach is to define what kind of relationship causation is, a question that twentieth-century natural science and the philosophy of mathematics have come to understand more deeply. The invention of the computer led people to reconsider the starting point of knowledge from the perspective of how computer languages express and transmit information, and the arrival of the big data age has been similarly illuminating.
Before the Chinese edition of the book was published, the economist Professor Li Dewei had already proposed that the big data age neither abandons causation nor simply subsumes it under correlation. Rather, he pointed out precisely that there is an isomorphism among sequences of object motion, and in particular a correspondence and isomorphism between human cognition and external objective things. The expression, transmission, and storage of information are isomorphic relationships: the movement of external objective things and human subjective cognition are both phenomena of the objective world, coordinated and in one-to-one correspondence, and the subjective cognitive image is merely a symbolic system that carries and transmits external objective phenomena. Whether viewed from the side of human cognition or abstracted from external experience, these are isomorphic, corresponding relationships.
"Not random samples, but all data"?
"Time" a book that the Big Data era "is not a random sample, but the whole data", the understanding of things is no longer from the random sampling of some samples, but from all data. This statement ignores all and part of the dialectical relationship. It is impossible for man to complete all things in a limited time, and absolute truth can only be realized in the process of cognition which is successive and never ceases. The development of any thing always has the past, present and future, now is now, the future has not appeared, all cases can not be achieved in a limited time, the understanding will never end. The future is infinitely larger than the past and the present. Because of this, Popper put forward, "the full name proposition is not verifiable, can only be falsified." ”
In fact, comparing the sampling methods of the small data era with today's big data methods, one can say only that the big data age can use more accurate and comprehensive data, and simulation models incorporating more factors, to track and analyze reality and obtain more accurate results than before. Compared with the whole, however, what is known is always the lesser part, and error can never be completely eliminated. Consider a census: what are the characteristics of the Chinese people today? Not only is it impossible to know all the attributes of the existing 1.3 billion people (things have infinitely many levels of attributes); even if one fully understood all the attributes of the existing 1.3 billion, that would not mean the Chinese of the past and the future were fully known. The Chinese of the future are infinitely more numerous than the Chinese we already know. Big data therefore differs from small data only in using vastly larger, more comprehensive, real-time data to understand things; mastering all the data in finite time remains forever impossible.
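The argument above can be illustrated with a small simulation (a hypothetical sketch, not from the book; the population, its size, and the attribute are invented for illustration): larger samples shrink estimation error, yet even a complete census of the current population tells us nothing certain about members not yet observed.

```python
import random
import statistics

random.seed(42)

# A hypothetical "current population" of 100,000 individuals with one
# numeric attribute (say, height in cm).
population = [random.gauss(170.0, 8.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Larger random samples estimate the current population more accurately...
for n in (100, 10_000, 100_000):
    sample = random.sample(population, n)
    err = abs(statistics.fmean(sample) - true_mean)
    print(f"n={n:>7}: error = {err:.4f}")

# ...but even the full census (n = 100,000, error effectively zero) says
# nothing certain about a future cohort that has not yet been observed.
future = [random.gauss(171.0, 8.0) for _ in range(100_000)]
gap = abs(true_mean - statistics.fmean(future))
print(f"census mean vs. future-cohort mean differ by {gap:.4f}")
```

The sample error falls roughly as 1/sqrt(n) and vanishes only for the already-observed whole, while the gap to the unobserved future cohort remains, which is the article's point about "all the data" being unattainable.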
"Not precision, but confounding"?
The book's authors say the big data age is about "not precision, but messiness," meaning that the small data age pursued precision, whereas the big data age, having vast quantities of data in hand, need no longer insist on accuracy and can instead act on larger but messier data. This is clearly wrong. In the small data age, the small amount of data in hand could indeed be precise, but because most data was omitted or discarded, the resulting understanding could not be accurate or comprehensive: the boundary between truth and error was unclear, and understanding was therefore vague and biased. In the big data age, because the data is more comprehensive, things can be understood over a wider scope and thus more accurately and quantitatively; formerly fuzzy middle ground can be understood more precisely, and accuracy, ambiguity, and error can themselves be quantified more exactly. For example, in computer information systems, releasing more information and applying repeated comparison and error-correction mechanisms reduces noise and achieves accuracy. Human cognition has always worked this way, doubting hearsay, trusting what is seen, and correcting errors: with small data, one or two rounds of comparison cannot correct the errors, but with large amounts of data, repeated comparison achieves ever higher precision and an ever smaller error rate.
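The claim that "repeated comparison achieves ever higher precision" can be sketched with a minimal simulation (a hypothetical illustration, not from the book; the true value and noise level are invented): averaging many noisy readings drives the error down, which is precisely why more data yields more, not less, precision.

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 42.0

def noisy_reading():
    # Each individual reading carries random noise (standard deviation 5).
    return TRUE_VALUE + random.gauss(0.0, 5.0)

# Averaging k repeated readings shrinks the error roughly as 1/sqrt(k):
for k in (1, 100, 10_000):
    estimate = statistics.fmean(noisy_reading() for _ in range(k))
    print(f"k={k:>6}: estimate = {estimate:.3f}, "
          f"error = {abs(estimate - TRUE_VALUE):.3f}")
```

One or two readings can be far off the mark, matching the article's point that "one or two rounds of comparison cannot correct the errors," while ten thousand readings pin the value down to a small fraction of the single-reading noise.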
China has the world's largest population and information-industry market, and the greatest opportunities to develop the information, big data, and intelligent industries. Yet in its understanding of big data, China now shows a tendency to follow foreign fashions blindly. Foreign big data theories should be examined soberly, with a critical eye.