Stacy Snyder once dreamed of becoming a teacher. By the summer of 2006 she had completed the coursework required for a teaching certificate and passed all her exams, but her dream burst just as it was about to come true. A school official told her that she would not be granted a teaching certificate.
The official showed her a picture she had uploaded to her MySpace page, in which she wore a pirate hat and drank from a plastic cup. The photo was meant for her friends, and may have been nothing more than a joke, but the school did not consider such behavior consistent with the standards expected of a teacher. She offered to remove the picture from the Internet, but it was too late: it had already been indexed by search engines and archived by web crawlers. She wanted her photo to be forgotten, but the Internet would not allow it.
This is a case cited by Viktor Mayer-Schönberger, a professor at the University of Oxford's Oxford Internet Institute, in his 2009 book "Delete: The Virtue of Forgetting in the Digital Age" (published in Chinese as "Delete: Choices in the Big Data Age", hereinafter "Delete"). Forgetting is human nature, he tells readers, but as information technology develops, remembering becomes ever easier and forgetting ever harder. The inability to forget not only causes individuals needless distress but also creates trouble for enterprises: they store more and more data, much of which loses value over time. He therefore calls for introducing a forgetting mechanism, such as setting a retention period for data, so that we "remember to forget".
Although it proposed no practical method of "forgetting", the unique perspective of "Delete" drew wide attention from academia and the Internet industry. The book not only won a number of book awards but was also translated into German, Italian, Korean and other languages.
Now that "big data" has become the industry's new hot topic, Mayer-Schönberger, who has studied the field in depth, has published a new book, "Big Data Age: A Great Transformation in Life, Work and Thinking" (hereinafter "Big Data Age"), which uses lively cases to introduce readers to the value of big data and the changes it will bring.
On December 11, Mayer-Schönberger came to Beijing with the Chinese editions of both books and discussed big data with readers and industry insiders on several occasions.
Three major transformations define big data
Big data is one of the hottest concepts in the IT world this year, yet even many practitioners who talk about it constantly cannot give a precise answer to the question "what is big data?" For this reason, big data has also been dismissed as a hyped-up pseudo-proposition.
Mayer-Schönberger does not define big data directly in the introduction to "Big Data Age" either, and he states plainly that "big data is not an exact concept". Instead, he characterizes big data through three major transformations, which he explained in detail:
"First, in the big data age, we can get all the data associated with a phenomenon, not just a small sample," he said. "For example, a study of match rigging in sumo analyzed 64,000 bouts. That is not a big number, but because it covers all the bouts of the past decade, it is big data. Big data is relative rather than absolute."
"Second, with more data, we can accept messier, less accurate data. If we have only 50 data points about something, each data point must be very precise, because every one of them matters. But if we have 50 million, removing 10, or even 1,000, is not much of a problem."
"Third, we analyze big data mainly to predict what will happen, not why. We are concerned with revealing what is going to happen rather than the causal relationship behind it. Very often we think we have found the underlying reason when in fact we have not. Most of the time, knowing 'what' is enough. For example, knowing where the flu will spread is enough; I don't need to know why. Knowing when to buy a plane ticket online is enough; I don't need to know why the price is lowest at that moment."
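The second point, that a large data set tolerates the loss of some points far better than a small one, can be illustrated with a rough sketch. Everything here is an assumption made for illustration and does not come from the book: the data is synthetic (normally distributed around 100), the large set is scaled down to 500,000 points so the example runs quickly, and the function name `mean_shift_after_dropping` is hypothetical. To make the comparison a worst case, the dropped points are the largest values rather than random ones.

```python
import random
import statistics


def mean_shift_after_dropping(values, n_drop):
    """Absolute change in the mean after dropping the n_drop largest values."""
    full_mean = statistics.fmean(values)
    kept = sorted(values)[: len(values) - n_drop]
    return abs(statistics.fmean(kept) - full_mean)


# Synthetic data (assumption): same distribution, very different sample sizes.
rng = random.Random(0)
small = [rng.gauss(100, 15) for _ in range(50)]
large = [rng.gauss(100, 15) for _ in range(500_000)]

# Drop 10 points from the small set and 1,000 from the large set.
small_shift = mean_shift_after_dropping(small, 10)
large_shift = mean_shift_after_dropping(large, 1000)

print(f"small set, 10 of 50 dropped:        mean shifts by {small_shift:.3f}")
print(f"large set, 1,000 of 500,000 dropped: mean shifts by {large_shift:.3f}")
```

Under these assumptions the mean of the small set moves noticeably while the mean of the large set barely changes, even though a hundred times more points were discarded from it.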
Big data should not be oversold
On the face of it, the two books contradict each other: "Big Data Age" emphasizes the value of data, while "Delete" argues that data should be "forgotten". Mayer-Schönberger said that the two books in fact complement each other and that the two ideas fit together well.
"Big data can work well only if there is no noise and no useless data," he said. "In 'Delete', I argue that we need to be able to get rid of outdated data that is no longer relevant to us. If Amazon forgets a book record that has nothing to do with your current interests and preferences, its recommendations will work better. Only good data can produce good predictions."
Mayer-Schönberger also expressed concern that big data is being oversold: "It is exaggerated, as if everything has suddenly become big data and big data can solve every problem. In fact, people do not know what it is. Once they find that it is not omnipotent, they will feel discouraged, and then big data will be discarded." In his view, although big data is very powerful, "people need to understand that it is not omnipotent; we must not exaggerate it too much."
A conversation with Mayer-Schönberger: data retention should be decided by the user
Big companies such as Amazon and Google have accumulated a great deal of data, but small companies and startups do not have much. What should they do in the big data age?
Yes, it's interesting. For a long time, the strength of these big companies lay in their infrastructure, such as server clusters. With cloud computing, startups can buy computing and storage capacity as needed to make up for their lack of infrastructure, but they still have no data. The data belongs exclusively to a few large companies. Of course, if a small company chooses the right field, it can get data too. Take INRIX, for example. They develop navigation software and also provide real-time road conditions, telling you where the traffic jams are. How do they get that information? Their basic navigation service is free, but when you download the app you agree that it will send your speed and other information back to their servers, so you become a sensor on their platform. Millions of people use INRIX every day, so INRIX has millions of sensors and collects a great deal of data. They can record how fast people drive in particular weather and pass that on to insurance companies, or to the government as a reference for improving road safety.
Typically, businesses can only make predictions from the data they collect themselves, and that data is limited. For example, if I search Amazon for a book but end up buying it through another channel, Amazon does not know that and keeps recommending related books I no longer need. How can problems like this be solved?
In fact, some companies already share their data, for example in online advertising. But the point is whether you want companies to share your data. If you are willing to let them share it, you can get better recommendations. In Silicon Valley, some startups are now trying to build information-sharing platforms that are controlled by individuals and consumers.
Users may trust large companies more and be more willing to share information with them. How, then, can a small company get users to share more data?
Not necessarily. Put another way, many people are reluctant to share data with Google or Facebook, which they consider too powerful; they are more willing to share data with small companies and startups. Interestingly, big data can help big businesses and small businesses, but not midsize ones. Take enterprises of 200 to 500 people: they are not small enough to be as flexible as startups, and not as strong as Google, so they are squeezed in the middle with no advantage of their own.
In the future, will enterprises rely more on cloud computing or more on their own internal computing power to analyze big data?
It all depends on the company's size and capabilities and on its stage of development; there is no single answer. Now that computing and storage capacity can be obtained externally, a company should work out whether using its internal processing power or using cloud computing is more cost-effective.
Do you think privacy protection laws should be adjusted in the big data age?
Yes. Privacy lets individuals place more trust in the Internet and in e-commerce. Without privacy, I would be very cautious about what I do on the Internet, because once I tell someone something, I cannot take it back and I cannot control it. So we need privacy laws, but privacy laws also need innovation. European privacy law currently stipulates that companies may keep data only until its primary purpose has been fulfilled. That is a rule made in the age of small data, and it no longer fits the age of big data, because the value of data often lies not in its primary purpose but in its secondary or even tertiary uses, which may be unknown at the time the data is collected. So instead of allowing data to be kept only until its primary purpose is fulfilled, we should hand the decision to the individuals the data concerns and let them decide whether to delete it.
In "Delete" you said that enterprises should set a time limit on the data they keep. Won't that affect their use of big data?
What I mean is that how long data may be kept should be decided by the individuals the data concerns. For example, if I want my data to stay on Amazon for a long time, I should have that right and that choice, but I must also have the right to delete that data. Amazon benefits too: if I tell them to "delete the record of the book I bought 8 years ago because it has nothing to do with my current interests", that removes some of the noise, their recommendations become more accurate, and I may buy more books.
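The idea of letting the individual decide how long a record is kept, while also being able to delete it on request, can be sketched as follows. This is only an illustration under stated assumptions, not a description of how Amazon or any real service works: the `PurchaseRecord` type, its `keep_until` field, and the helper functions `visible_records` and `delete_record` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PurchaseRecord:
    user: str
    title: str
    purchased_on: date
    # Retention date chosen by the user; None means "keep until I say otherwise".
    keep_until: Optional[date] = None


def visible_records(records: List[PurchaseRecord], today: date) -> List[PurchaseRecord]:
    """Return only records whose user-chosen retention date has not passed."""
    return [r for r in records if r.keep_until is None or r.keep_until >= today]


def delete_record(records: List[PurchaseRecord], user: str, title: str) -> List[PurchaseRecord]:
    """Honour a user's explicit request to delete one of their own records."""
    return [r for r in records if not (r.user == user and r.title == title)]


# Hypothetical data: one record kept indefinitely, one still within its
# retention period, and one whose retention period has already expired.
today = date(2012, 12, 11)
records = [
    PurchaseRecord("alice", "Old Travel Guide", date(2004, 5, 1)),
    PurchaseRecord("alice", "Statistics Primer", date(2012, 6, 1), keep_until=date(2013, 6, 1)),
    PurchaseRecord("alice", "Expired Cookbook", date(2010, 1, 1), keep_until=date(2011, 1, 1)),
]

# The user asks for the 8-year-old purchase to be forgotten.
records = delete_record(records, "alice", "Old Travel Guide")
remaining = [r.title for r in visible_records(records, today)]
print(remaining)
```

In this sketch a recommender would only ever see `remaining`, so both expired and explicitly deleted purchases stop contributing noise.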
If big data can predict the future accurately, we can make the most reasonable choice whenever we face a decision. But much of our personality, and much of the joy in our lives, comes from irrational choices. Will big data make us lose these?
Only when we know the truth and are being rational can we choose to be irrational. We can face an unreasonable situation rationally; that is us actively choosing irrationality, which is in keeping with human habits. For example, the data tells me that smoking is bad for me, but I can still smoke. That is irrational, but the irrational decision rests on a rational choice: because I know the truth, the choice I make is a rational one. Without data, I do not know when I am being rational and when irrational; many times I think I have made the right decision when it is actually the wrong one. So our lives can still be fun once we have the data.
You say that forgetting is human nature, yet humans today have already lost some of the abilities our ancestors possessed. Can humanity's loss of the ability to forget in the digital age be seen as a kind of evolution?
You could say so, but evolution is a slow process, especially when it comes to reorganizing the brain. Evolution may, by chance, deprive humans of a certain capacity, but that takes many years to complete.
(Responsible editor: The good of the Legacy)