The most insightful point in Viktor Mayer-Schönberger's *Big Data: A Revolution That Will Transform How We Live, Work and Think* is his claim that the biggest shift of the big data age is to abandon the pursuit of causation and focus instead on correlation: it is enough to know *what* is, without knowing *why*. This overturns thinking habits humans have practiced for thousands of years, and poses a new challenge to human cognition and to the way we engage with the world.
Gartner, the well-known IT research firm, evaluates professional IT markets with its "Magic Quadrant", a two-dimensional matrix whose horizontal axis is completeness of vision and whose vertical axis is ability to execute. If the same method were used to evaluate this book, it would probably land in the lower-right quadrant, near the middle of the vertical axis: strong vision, modest execution.
In 2012 I read three books about big data, this one among them. Compared with the other two, Mayer-Schönberger's book takes the perspective of impact analysis, and is enlightening on the change of thinking the big data era demands. One could also say the book is most valuable to corporate executives and CIOs: it does not discuss technology much, but rather a shift in mindset (a paradigm shift).
In short, the value of the book can be summed up as two "threes" and one "one". The first "three" is the three ideas about big data, emphasizing the change of values and mindset in the big data era; the second "three" concerns the three elements through which big data drives business change, namely the interplay of data, technology, and innovative thinking; the "one" concerns governance and privacy as big data becomes pervasive.
There is no need to dwell on the value of big data itself; here we focus on the three changes in thinking: 1. not random samples, but all the data; 2. not exactitude, but messiness, in particular the claim that simple algorithms on big data beat complex algorithms on small data; 3. not causation, but correlation.
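The first change is easy to see in a toy sketch (the data and numbers below are invented for illustration, not taken from the book): a classic random sample estimates the average well, but rare events that matter are often visible only in the full data set.

```python
import random

random.seed(42)

# Hypothetical "full data": 100,000 routine transaction amounts,
# plus five rare high-value transactions (the interesting outliers).
population = [random.gauss(100, 10) for _ in range(100_000)]
population += [10_000.0] * 5

# Classic statistics: a 1% random sample.
sample = random.sample(population, 1_000)

pop_max = max(population)        # the rare events are always visible in full data
sample_max = max(sample)         # a small sample usually misses all five of them

print(f"population max: {pop_max:.0f}")
print(f"sample max:     {sample_max:.0f}")
```

The sample's mean would be fine, which is why sampling served the small-data era; it is the long tail, where big data claims its advantage, that sampling tends to lose.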
The book reminds readers that big data means all the data, or at least completeness along some dimension, which changes the angle from which things are observed and analyzed. In particular, compared with the data held in traditional IT systems, big data emphasizes external and real-time data; these two features are what make evidence-based analysis possible. The book, however, overlooks the analytical value of combining external data with an enterprise's internal data. For example, in a large-scale public health event, a government can use big data (such as microblogs) to detect the spread of an infectious disease faster, but to schedule resources it still needs precise decisions based on "small data".
The second core idea, that simple algorithms suffice on big data, comes from Google's insight and also underlies Hadoop (a distributed computing framework developed under the Apache Foundation). Simple algorithms over big data follow statistical logic: like a thermodynamic model, they are not concerned with the motion of individual molecules, but with the macroscopic relations among temperature, volume, and pressure. For an intuitive grasp of this idea, readers can turn to Wu Jun's *The Beauty of Mathematics*; only by truly understanding that big data rests on a statistical mode of thinking can one appreciate both its unique advantages and its limitations. The method can solve problems of scale, real-time response, and parallel processing that were previously intractable, and it brings new insights: it speaks in probabilities, not in the details of the individual. The mindset, inherited from Internet companies, is to solve the 80% trend problem first and refine gradually.
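The Hadoop pattern mentioned above can be sketched in a few lines. This is an illustrative map/reduce word count in plain Python, not the real Hadoop API: the point is that each step is a trivially simple operation, and the power comes from running the map phase in parallel over huge inputs.

```python
from collections import Counter
from itertools import chain

def map_phase(doc: str):
    # map: emit a (word, 1) pair for every word in one document
    return [(word, 1) for word in doc.lower().split()]

def reduce_phase(pairs):
    # reduce: sum the counts for each word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

docs = ["big data small data", "big data big insight"]

# In a real cluster, each document would be mapped on a different machine...
mapped = chain.from_iterable(map_phase(d) for d in docs)
# ...and the shuffled pairs summed by the reducers.
counts = reduce_phase(mapped)
print(counts)  # Counter({'big': 3, 'data': 3, 'small': 1, 'insight': 1})
```

Nothing here is algorithmically clever; that is exactly the "simple algorithm, big data" argument, since the same two functions scale to billions of documents when distributed.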
Third, big data focuses on "what" rather than "why", something frequent online shoppers readily experience. Many websites' recommendation engines can, when a customer buys a book, recommend other books that similar customers liked. The customer may not know "why", and in fact the site does not care ("why" can be left to academic experts). But by statistical analysis over tens of millions or even hundreds of millions of users, the site can discover correlations; big data is better at finding connections humans cannot perceive, and at prompting people to act on them. The shift in thinking is serious: the old "beer + diapers" data warehouse story required data collection, cleaning, transformation, and expert modeling and mining, whereas such purchase associations may now be discovered easily by algorithms running on Hadoop. With the threshold of analysis lowered, the method becomes a common tool and gives rise to cloud-based big data services that enterprises can buy as "Analytics as a Service"; in China, Alibaba is committed to building this model.
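A minimal sketch of the association analysis behind the "beer + diapers" story, using standard support/confidence measures over made-up baskets (the data and thresholds are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Toy market baskets; in practice these would be millions of transactions.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # count every unordered item pair that co-occurs in one basket
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

pair = frozenset({"beer", "diapers"})
support = pair_counts[pair] / len(baskets)            # share of baskets with both
confidence = pair_counts[pair] / item_counts["beer"]  # P(diapers | beer)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

The code only reports *that* beer buyers tend to buy diapers, never *why*; that is precisely the correlation-over-causation stance the book describes.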
The second part is about the business models of big data. Its most valuable contribution is the analysis of the big data business ecosystem: besides the familiar data companies and technology companies, the author argues there is a third type, big data "thinking" companies, including data brokers; this is a good reminder for those too focused on the trend of the technology itself. One interesting claim is that statistically minded data scientists will gradually replace industry experts, because the new, real connections that big data uncovers may overturn traditional expert knowledge, a topic academia may well take up. A telling example: a few years ago, machine translation based on statistical analysis of big data surpassed the linguistic school based on semantic understanding, and a statistical translation team mentioned in the book even joked that every time they fired a linguist, their translation accuracy improved.
The third part is about big data becoming the "Big Brother" of George Orwell's *1984*: ubiquitous surveillance through technical means, with privacy violation and abuse the most worrying problems. I find this topic too much a matter of public debate, already covered by many articles, and not the essence of the book. The rise of big data is a gradual process, and practical cases across industries are still emerging; the industry should focus on innovation, and leave the public debate to scholars, governments, and the future.
Among Western authors there is a class of concept advocates, the most famous being Kevin Kelly (KK), author of *Out of Control*. Readers regard such authors as missionaries: they love to preach disruptive concepts and to produce the shock of a before-and-after comparison. The author of this book is of the same kind, so disruptive that the mighty big data era seems already upon us. Such authors, however, are often accused of proposing ideas without taking responsibility for their feasibility. Returning to Gartner's Magic Quadrant mentioned earlier: step-by-step execution is the key to making big data blossom in every industry.
(Responsible editor: Lu Guang)