The limits of big data: algorithms cannot fully replace human judgment

Source: Internet
Author: User

Mathematical model

Stop me if you've heard this one: Three statisticians go rabbit hunting. They spot a rabbit. The first statistician fires and misses a foot in front of the rabbit's head. The second fires and misses a foot behind its tail. The third shouts, "We got him!"

Even if you don't find the joke funny, odds are you work with managers just like the ones it describes. Their math may be impeccable, but sadly, their results in the real world are worthless. Lies, damned lies, and statistics. What must large organizations master to improve the odds that their quantitative analysts produce real value rather than statistical illusions? And how can executives who are not mathematically fluent make sure they are not blinded by big data?

Some of the best answers to these questions can be found in Samuel Arbesman's The Half-Life of Facts and Nate Silver's The Signal and the Noise. These two independent but complementary books explore how "data" becomes "evidence," and why so many seemingly sophisticated mathematical models fail to tell the two apart. Both build on and extend the themes of uncertainty and numerical self-deception explored in Nassim Nicholas Taleb's popular and insightful Fooled by Randomness and The Black Swan, and in Nobel laureate Daniel Kahneman's brilliant Thinking, Fast and Slow. Like those predecessors, Arbesman and Silver have written books that are both entertaining and actionable.

Both authors cite the wry quip variously attributed to Mark Twain, Will Rogers, and Charles Kettering: "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." Both explore the means and mechanisms for distinguishing knowledge that is "true" from knowledge that is "less true." Arbesman and Silver argue that "less true" knowledge currently dominates, and the more data gets processed, the more this matters.

Arbesman, an applied mathematician and researcher at Harvard University's Institute for Quantitative Social Science, deconstructs the notion of a "fact." Mercifully for the reader, he does not sink into the mire of postmodern philosophy. Instead, he examines how serious scientists determine what they think they know about the things they study. This "scientometrics" (the science of how science measures its own processes and progress) proves very helpful in mapping the life cycle and ecosystem of what scientists call "facts." Along the way, Arbesman raises intriguing questions: How are "facts" born? How do they typically replicate, mutate, and evolve? How long does it take them to fade away?

Pathological defects

Arbesman's provocative central argument is that there is a kind of physics of facts: "facts" follow regular laws and trajectories, depending on how they are defined and measured. "When we read the news every day, we may be confronted with facts about our world that are completely different from what we thought we knew," he writes. "But it turns out that these changes, though they may look to us like sudden phase transitions, are neither unexpected nor random. By applying probability, we can understand their aggregate behavior, and we can even anticipate them by looking for the slower, more regular changes in how we perceive them. The rapid change of facts, like everything else around us, has its own rules, measurable and predictable."
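
The "half-life" metaphor borrows from radioactive decay: no one can say when a single atom will decay, but a large sample decays on a predictable curve. A minimal sketch, assuming simple exponential decay (an illustrative modeling assumption, not a formula quoted from the book), shows how a half-life turns individually unpredictable overturnings of facts into an aggregate, predictable trend. The function names and the 40-year half-life are hypothetical.

```python
import math

def fraction_still_valid(years: float, half_life: float) -> float:
    """Share of a field's 'facts' still accepted after `years`,
    assuming (for illustration) simple exponential decay."""
    return 0.5 ** (years / half_life)

def half_life_from_observation(years: float, fraction_remaining: float) -> float:
    """Invert the same model: if `fraction_remaining` of the facts survive
    after `years`, return the half-life that this implies."""
    return years * math.log(0.5) / math.log(fraction_remaining)

# Hypothetical field whose facts have a 40-year half-life (made-up figure)
for t in (10, 40, 80):
    print(f"after {t:2d} years: {fraction_still_valid(t, 40):.3f} still accepted")
```

Running the inverse on an observation works the same way: if a quarter of a field's facts survive 80 years, `half_life_from_observation(80, 0.25)` implies a half-life of 40 years, up to floating-point rounding.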

What do "measurable" and "predictable" mean here? Arbesman is very good at describing the institutional, personal, and probabilistic biases that distort how science and scientists evaluate, publish, and discard "facts."

"The most obvious example of this is in the realm of negative results," Arbesman writes. He cites the evolutionary biologist John Maynard Smith's quip: "Statistics is the science that lets you do twenty experiments a year and publish one false result in Nature." Conversely, if twenty independent scientists carry out the same experiment, nineteen of them will fail, and their efforts will naturally go no further. This is distressing, of course, but it is how science works: most ideas and experiments are unsuccessful. The most important point, though, is that failed results are rarely published.

The crux of the problem is not that statistical science, or the statistics of science, has pathological flaws; it is that these known pathologies should motivate us to rethink, revise, and redesign what we measure and test. We need facts to help us revise our understanding of "facts." Science, together with the increasingly digital technologies that drive and support it, offers a powerful model for businesses that struggle to make sense of, and extract value from, their growing volumes of data.

In this respect, The Half-Life of Facts is a primer on the epidemiology of epistemology: how the nature of knowledge and knowing plays out within a discipline, a profession, or a culture. Arbesman's work should prompt decision makers everywhere to rethink how their organizations turn interesting data into useful facts.

Driven by data

Statistician Nate Silver, who writes the FiveThirtyEight blog on the New York Times website, takes a completely different but Arbesman-compatible approach to knowledge, facts, and predictability. Through a wealth of detailed examples and anecdotes, Silver's book delivers a sobering set of warnings about the hubris of prediction. "This book is about the difference between what we know and what we think we know," Silver writes.

From weather, earthquakes, global warming, and baseball to subprime mortgages and the global financial crisis, Silver explains why modelers and forecasters struggle to turn yesterday's data into tomorrow's "you can bet on it" predictions. Although these miniature case studies are necessarily superficial, they don't shy away from the math, and they treat the most important assumptions consistently and fairly. A better editor might have pushed Silver to trade some breadth for deeper insight, but the sheer range of examples undeniably exposes a "pathology of prediction."

Where Arbesman's unit of analysis is the fact, Silver focuses on predictive effectiveness. Silver is gracious and self-aware, and he treats human fallibility as a design constraint. "We can never achieve perfect objectivity, rationality, or accuracy in our beliefs," Silver writes. "Instead, we can strive to be less subjective, less irrational, and less wrong. Making predictions based on our beliefs is the best (and perhaps even the only) way to test ourselves. If objectivity is concerned with a truth that exists beyond our own circumstances, then prediction is the best way to examine how closely our personal perceptions align with that truth, and the most objective among us will often be those who make the most accurate predictions."

What I wonder, however, is whether Silver fully appreciates the cumulative effect his cautionary tales of spectacular failure may have on readers. He offers example after example of flawed and biased modelers using flawed and biased methods to build flawed and biased models. He returns repeatedly to "overfitted" statistical models. As Silver explains, statisticians who tune their models ever more tightly to the available data end up fitting the noise along with the signal, which sharply reduces the models' accuracy and makes them unreliable for prediction.
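
Overfitting is easy to demonstrate. The sketch below swaps in a deliberately extreme case, a 1-nearest-neighbour "memorizer" that reproduces every training point exactly, and compares it with an ordinary least-squares line. The data are hypothetical, generated from y = 2x + 1 plus Gaussian noise; the seed, sample sizes, and function names are assumptions for illustration, not an example from the book.

```python
import random

def make_data(n, rng, noise=1.0):
    """Hypothetical signal y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, noise)) for x in xs]

def fit_line(data):
    """Ordinary least squares for y = a*x + b."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(data):
    """1-nearest-neighbour: returns the y of the closest training point,
    so it reproduces the noise as faithfully as the signal."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
train, test = make_data(30, rng), make_data(1000, rng)
for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
    model = fit(train)
    print(f"{name:9s} train MSE={mse(model, train):.2f}  test MSE={mse(model, test):.2f}")
```

The memorizer scores a perfect zero on the data it was tuned to, yet on fresh data its error should come out roughly double the line's, because each prediction inherits the noise of a neighbouring training point. That is the pattern the paragraph describes: a better fit to yesterday's data, a worse forecast of tomorrow's.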

Silver's stories offer a fair sample of today's model builders. Read that way, the book forecasts a brave new world of statistics-driven success stories that is neither happy nor brave: a world in which average performance may lie several standard deviations away from world class.

Silver cites Philip Tetlock's classic study of expert judgment. It found that "experts" in a disturbingly wide range of fields are often poor at predicting likely outcomes. Worse, experts tend to be overconfident about the quality of their predictions. In short, expert opinion often delivers the worst of both worlds: wrong answers given with arrogant certainty. That is not a recipe for success.
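
One common way to make "overconfident" measurable is to score probabilistic forecasts against what actually happened. The sketch below uses the Brier score (the mean squared difference between a stated probability and a 0/1 outcome) and a simple calibration table. The forecasting record is invented for illustration; it is not data from Tetlock's study, and the function names are hypothetical.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared gap between each stated probability and the 0/1 outcome.
    Lower is better; always answering 0.5 scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes):
    """Map each stated probability to how often the event actually occurred."""
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[p].append(o)
    return {p: sum(os) / len(os) for p, os in sorted(buckets.items())}

# Invented record: an "expert" who says 90% ten times but is right only six times
expert = [0.9] * 10
happened = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
hedger = [0.5] * 10  # someone who never commits either way

print(calibration_table(expert, happened))       # {0.9: 0.6}
print(round(brier_score(expert, happened), 3))   # 0.33
print(round(brier_score(hedger, happened), 3))   # 0.25
```

The confident expert scores worse than the fence-sitter who always says 50%, which is one concrete sense in which confident wrong answers are "the worst of both worlds."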

From IBM's Watson supercomputer to Google's search algorithms to Amazon's recommendation engine, data-driven computing systems can clearly achieve extraordinary success, especially when they emphasize real-world testing over abstract theory. "Companies that really 'get' big data, like Google, don't spend a lot of time building models," Silver writes. "They conduct hundreds of thousands of experiments a year and test their ideas on real customers."
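
In its simplest form, an experiment on real customers is an A/B test: show two variants to randomly split groups and check whether the difference in outcomes is larger than chance would explain. A minimal sketch, assuming a binary outcome such as a click and using a standard pooled two-proportion z-test; the counts and function names are hypothetical, and real experimentation platforms are far more elaborate.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate if A and B were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * (1 - normal_cdf(abs(z)))

# Hypothetical experiment: variant B vs. control A, 10,000 users each
z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Running experiments at scale brings back the multiple-comparisons problem: at a 0.05 threshold, roughly one in twenty tests of changes that do nothing will still look like a win by chance, so a single "significant" result deserves replication before it becomes a "fact."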

Read together, however, these two books point to an ironic conclusion: the more data and facts we have, and the more consequential our predictions, the more important human judgment becomes. The co-evolution of people, datasets, and algorithms will ultimately determine whether big data creates new wealth or destroys existing value.

(Responsible editor: The good of the Legacy)
