I find myself returning, again and again, to the principles behind big data. This is not about Hadoop versus relational databases, or Mahout versus Weka, but about a more fundamental kind of wisdom: the habit of treating data as "the currency of a new age." Then again, "the oil of a new era" may be the closer metaphor, or perhaps we need a new metaphor entirely to capture the full value of data and content.
Metaphors are neither facts nor proofs, but they open conversations that lead us toward the truth. A good metaphor makes a complex concept easier to grasp, as do the classic quotations in this article; they help explain the basic principles of big data. Below I list eight truths closely tied to big data, each of which you have probably heard at least in passing, sorted chronologically. At the end, I will venture a prediction of my own and share a "truth of the future."
1. "Correlation is not causation"
We have all heard this statement more than once. In a university philosophy class, I learned a version of the underlying fallacy called post hoc ergo propter hoc, which translates as "after this, therefore because of this." That sounds obscure, but the plain-language version is simple: "B happened after A, therefore A caused B."
The O'Reilly Radar blog is worth reading on this. In his post "The hidden costs of guessing," Alistair Croll observed that correlation is where big data's strength shows most clearly: parallel computing, algorithmic improvements, and the steady march of Moore's Law have dramatically lowered the cost of analyzing data sets, and the result can be a "data-driven society that is both smart and stupid." The conclusion: stay smart, and respect the difference between correlation and causation. A model is only a representation, not a conclusion.
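To make the point concrete, here is a minimal sketch (my own illustration, not from Croll's article) showing how two causally unrelated series can still correlate strongly, simply because both happen to trend upward over time:

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
# Two causally unrelated series that each drift upward over time
ice_cream_sales = [100 + 2.0 * t + random.gauss(0, 5) for t in range(50)]
drowning_cases  = [10 + 0.3 * t + random.gauss(0, 1) for t in range(50)]

r = pearson(ice_cream_sales, drowning_cases)
print(f"correlation: {r:.2f}")  # high, despite no causal link between them
```

The shared upward trend (a hidden common driver, here just time) produces the high correlation; neither series causes the other.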
2. "All models are wrong, but some are useful"
George E. P. Box, the eminent statistician, included this conclusion in his 1987 book Empirical Model-Building and Response Surfaces. Throughout his career, Box worked to turn his ideas into models that serve as useful approximations of reality, a lesson that fits big data analytics well. In December 1976, the Journal of the American Statistical Association published his article "Science and Statistics," which lays out the history and practical significance of this view of models.
3. Big data sees into (almost) everything
If you still cannot accept this conclusion, force yourself to do so as soon as possible. The statement traces back to a 1999 remark by Scott McNealy: "You have zero privacy anyway ... get over it." McNealy, it is worth noting, was co-founder and CEO of Sun Microsystems. Examples of big data reaching into personal life abound today: analysts can infer a speaker's gender from social media posts, or detect a pregnancy in a household from its buying habits; businesses such as Acxiom, which stockpile vast amounts of commercial information, are enjoying a boom; the integration of forecasting and disaster-prevention information is on the rise; and the US National Security Agency's PRISM affair has made headlines around the world.
4. "Eighty percent of business-relevant information originates in unstructured form, primarily text (but also video, images, and audio)"
This conclusion appeared in an article back in 2008, although, as was noted at the time, unstructured data may have played an important role as early as the early 1990s; it was simply hard to quantify precisely, so we did not realize it. All in all, the "more than 80%" figure is a rough notion that should not be taken too literally, because, as far as I know, no evaluation mechanism has systematically measured it. Nonetheless, anyone who shares Box's philosophy will find the 80% "unstructured" claim instructive even if it is not strictly true. Whatever the exact number, text and content analytics deserve a permanent place in everyone's toolkit.
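As a hint of what even the simplest text analytics can do with unstructured sources, here is a toy sketch (my own example; the documents and stopword list are invented) that surfaces recurring themes across tickets, emails, and chat logs by term frequency:

```python
import re
from collections import Counter

def top_terms(docs, stopwords, k=3):
    """Tiny term-frequency analysis over unstructured text documents."""
    counts = Counter()
    for doc in docs:
        for token in re.findall(r"[a-z']+", doc.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts.most_common(k)

docs = [
    "Support ticket: the billing page crashes on checkout.",
    "Email: checkout fails again; billing team notified.",
    "Chat log: customer reports checkout crash.",
]
stop = {"the", "on", "again", "a", "of"}
print(top_terms(docs, stop))  # 'checkout' and 'billing' emerge as themes
```

Even this crude counting pulls a signal ("checkout" and "billing" problems) out of free text that no structured schema captured.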
5. "It's not information overload, it's filter failure"
Clay Shirky made this assertion at the Web 2.0 Expo in New York in September 2008. Shirky's assessment of filters is, if anything, conservative; for example, "more data does not automatically lead to better conclusions," which happens to match my own view. But the premise should not be pushed too far: do not reduce the idea of filters to Eli Pariser's "filter bubble." Pariser's critique reaches only as far as automated filtering and stops short of a broader view of the future.
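A filter, in Shirky's sense, need not be elaborate. The sketch below (my own illustration; the message list and interest terms are invented) keeps only items that score at or above a relevance threshold:

```python
def relevance(msg, interests):
    """Score a message by how many interest terms it mentions."""
    words = set(msg.lower().split())
    return sum(1 for term in interests if term in words)

def filter_stream(messages, interests, threshold=1):
    """Keep only messages scoring at or above the threshold."""
    return [m for m in messages if relevance(m, interests) >= threshold]

inbox = [
    "quarterly revenue report attached",
    "win a free cruise now",
    "revenue forecast needs review",
]
kept = filter_stream(inbox, interests={"revenue", "forecast"})
print(kept)  # only the two revenue-related messages survive
```

The point is not the sophistication of the filter but its existence: the same inbox is "overload" without it and manageable with it.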
6. "The same meaning can be expressed in many different ways, and the same expression can carry many different meanings"
Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira made the statement above in "The Unreasonable Effectiveness of Data," published in the March 2009 issue of IEEE Intelligent Systems. Where does that unreasonable effectiveness show itself? Their answer: the semantic interpretation of "imprecise and ambiguous" natural language is the best example, and machine learning that infers relationships from large-scale aggregate content, making interpretation possible at scale, proves the point.
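The idea that meaning can be inferred from usage at scale can be shown in miniature (my own toy example, not from the Halevy, Norvig, and Pereira article): words that occur in similar contexts end up with similar context vectors.

```python
import math
from collections import Counter

def context_vector(corpus, word, window=2):
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sent in corpus:
        toks = sent.lower().split()
        for i, t in enumerate(toks):
            if t == word:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        vec[toks[j]] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the dog chased the ball",
]
doc = context_vector(corpus, "doctor")
phy = context_vector(corpus, "physician")
dog = context_vector(corpus, "dog")
# Words used in similar contexts get similar vectors:
# "physician" sits closer to "doctor" than "dog" does
print(cosine(doc, phy), cosine(doc, dog))
```

Scaled up to web-sized corpora, this distributional trick is one way many expressions of the same meaning come to cluster together.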
7. "The heart of big data is not the data! The value of big data lies in analytics"
Gary King, a professor at Harvard University, voiced this opinion in the same IEEE Intelligent Systems context as the Google researchers of truth 6. I do not fully agree with King, however. There is, of course, value in the process of establishing what data is needed and in designing the right scheme to collect and organize it. Analysis is what uncovers that value, so, standing on King's shoulders, I would put it more precisely: the value of big data is realized through analysis.
But that is just my opinion, and King himself might not endorse it. Readers interested in this topic can consult "Big Data, Analytics and the Path From Insights to Value" by Steve LaValle, Eric Lesser, Rebecca Shockley, Michael S. Hopkins, and Nina Kruschwitz, published in the December 2010 issue of MIT Sloan Management Review.
8. "The importance of intuition is undiminished"
The phrase comes from Phil Simon, author of the book Too Big to Ignore: The Business Case for Big Data, released earlier this year. (I contributed some material on text and sentiment analysis to the book.)
As Simon explains, "big data does not, at least for now, replace intuition; the latter merely complements the former, and the relationship between the two is a continuum, by no means black and white." Tim Leberecht made a similar argument in June of this year in a CNN piece on why big data will never replace business intuition.
Finally, these eight guiding truths need one last-minute supplement that points to the future but is not yet widely understood:
9. The future of big data lies in synthesis and context
The element missing from most solutions is the ability to integrate information from different sources in a way that accounts for the context that makes content relevant and conclusions accurate. Here I will borrow (admittedly somewhat out of context) from the argument of a thought-provoking paper by design strategist Jon Kolko. Kolko cites cognitive psychologists, who study the link between intuition and problem-solving, as an example: practitioners "understand people, places, and the connections between events in their actual context, establish when events occurred, and make judgments and take action on likely future situations."
Kolko treats design synthesis as the key element: "a way of organizing, manipulating, prioritizing, and filtering data in context, aiming to transform data into information and knowledge." Jeff Jonas, an IBM researcher, believes a "general purpose" context engine will help locate disparate data within a single data space. Such a solution would let us explore an ever-changing observation space in a scalable, real-time fashion without precedent.
Isn't that exactly the goal we have set for big data: moving from model testing to actionable conclusions? I hope the nine truths summarized here help you see that development path.