Lead: In the age of big data, we supposedly need to know only the "what," not the "why." But does big data really make a big difference to science? Is it as magical as claimed, or is that an illusion produced by hype? The author argues that the age of big data has not really arrived.
Setting aside what big data has achieved so far, will it revolutionize science? Will it help us build a better world?
Before answering these questions, let me give a little background. I recently spoke at "How the Light Gets In," a culture and arts festival in the town of Hay-on-Wye, UK. The festival organizers put me up in the beautiful Great Brampton House. There I met other invited speakers, including physicists George Ellis, Carlo Rovelli, Carlos Frenk and Tara Shears, biologist Rupert Sheldrake, psychiatrist David Nutt, and journalists Colin Tudge and David Malone. (I hope to be in touch with Ellis and Sheldrake again soon.)
One afternoon, I took part in a public debate on big data with journalists Kenneth Cukier and Angela Saini and sociologist Laurie Taylor. The festival brochure framed the debate like this: "In an age when we can gather data on a cosmic scale, will we replace neat theories with complex models of the real world? Does big data mean the end of theory?" These questions were raised by Cukier, data editor at The Economist, and Viktor Mayer-Schönberger, professor of Internet governance at Oxford University, in their 2013 bestseller, Big Data: A Revolution That Will Transform How We Live, Work, and Think.
In an article based on the book, they wrote: "Big data starts from the fact that far more information is circulating around us than ever before, and that it is being put to many unexpected uses. Although the web makes it much easier to collect and share data, big data is not synonymous with the Internet. Big data is about more than communication. The idea is that by analyzing large amounts of data, we can learn many things that we could not understand from small amounts of data."
The most interesting claim made by Cukier and Mayer-Schönberger is that big data will let us solve problems without having to understand them. Big data, they write, will shift researchers' attention "from causation to correlation." Chris Anderson, a former editor of Wired magazine, made similar claims in his 2008 essay "The End of Theory," that is, "this represents a change in thinking, from trying to understand the underlying causes of how the world works to simply learning about the associations among events and then using those associations to solve problems."
If big data simply means digital technology, then I love big data. Digital technology has changed the way journalists and scientists collect, analyze and disseminate information. For example, without even leaving home, I can use my computer to search Google for information on Cukier and instantly find what other readers have written about his work, including Michiko Kakutani, the quirky critic of The New York Times.
Cukier is also right that scientists can learn a great deal just by mining data for correlations. For example, more than half a century ago, epidemiological studies revealed a strong correlation between smoking and cancer. To this day, we still do not understand exactly how smoking causes cancer. Nevertheless, the discovery of this correlation spurred the anti-smoking campaigns of the past few decades. There is no dispute that these campaigns have done more to reduce cancer rates than all the advances we have made in detection and treatment (as I pointed out in a recent article).
I also agree with Cukier's other point, that theory can get in the way of solving problems. Suppose you are a judge trying to decide whether a convicted murderer is likely to commit another crime. You could consult psychiatrists or other so-called experts on the mind, who will make predictions based on their favorite school of psychological theory. But you might do better to borrow the method an insurance company uses to set premiums and look up the recidivism rate of offenders whose backgrounds resemble that of your murderer.
For several reasons, however, I do not share the enthusiasm of Cukier and other big data boosters, and I even find it tiresome. First of all, their rhetoric reminds me of the hype generated by researchers in chaos theory and its successor, complexity theory. In my 1996 book, The End of Science, I lumped chaos and complexity together under a single coined term, "chaoplexity." Both fields promised that, with faster computers and more sophisticated software, scientists would be able to crack problems that had resisted traditional reductionist methods. Some chaoplexity researchers hoped to discover a new theory explaining complex, "self-organizing" systems, or even an "anti-entropy" force.
Such discoveries never materialized, and neither has the practical progress that Cukier and Mayer-Schönberger envision. Take genetics. Thanks to advances in computing and other technologies, the Human Genome Project was completed ahead of schedule in 2003. The cost of extracting and analyzing genetic data from humans and other organisms has been falling ever since.
Disappointingly, all this progress has not produced much medical benefit. At the time of writing, no gene therapy had been approved for sale in the United States, and only one had been approved in Europe. Efforts to identify specific genes underlying complex behavioral traits and disorders have been fruitless, and the war on cancer has fared no better.
Like geneticists, neuroscientists are drowning in data. Although scanners and other tools keep getting more powerful, neuroscientists still cannot explain exactly how the brain produces thought, or why minds so often malfunction. Thomas Insel, director of the U.S. National Institute of Mental Health, recently proposed that we thoroughly rethink how we define and diagnose schizophrenia, depression and other psychiatric disorders. Our treatments for these disorders remain alarmingly primitive.
The economic collapse of 2008 provided a real-world test of big data. Wall Street bankers had the fastest computers, the most sophisticated software and the largest databases that money can buy, yet many of them failed to anticipate that year's collapse. So far, experience has shown that the hope that big data can make economics and other social sciences truly scientific, that is, precise and predictive, remains an illusion.
I hope, and firmly believe, that advances in information technology will one day truly revolutionize medicine, the social sciences and other fields. Until that day arrives, let us rein in the blind hype and excessive publicity about big data.