Big data has penetrated every industry and business function today and has become an important factor of production. The mining and use of massive data heralds a new wave of productivity growth and consumer surplus.
On October 28, 2013, three terrorists from Xinjiang drove a jeep into Tiananmen Square. All three burned to death at the scene, yet the police needed only a little more than 10 hours to capture five accomplices. How the police identified the suspects so quickly is a state secret, but from the clues in media reports we can still see that ubiquitous surveillance video and telecom tracking played a crucial role. Working backward from Tiananmen Square through a mass of information, investigators could use a number of fuzzy matching methods to filter the information quickly and finally establish the link between the terrorist act and the suspects. This is the power of big data.
There Is No Biggest, Only Bigger
Wikipedia defines big data as follows: big data, also called massive data or huge data, refers to data so large in scale that it cannot be captured, managed, processed, and organized into human-readable information within a reasonable time. For example, to have a computer beat Kasparov, the IBM team collected the games of 600,000 masters from nearly 100 years. That is big data; no human brain can remember all of those games and use them effectively. In 1997, chess grandmaster Kasparov lost a man-versus-machine match to IBM's Deep Blue computer, and it made headline news. The secret of the computer's ability to beat a human brain was the large body of game records stored in Deep Blue. Scientists developed game-playing AI software that could find the most suitable move from that huge number of games, something beyond the reach of the human brain.
Some people summarize the characteristics of big data as the 4Vs: volume (large scale), variety (many types), velocity (high speed), and value (low value density). Recall this past "Double 11" (Singles' Day), when Taobao Mall handled 188 million transactions with a record total turnover of 35.019 billion yuan. Those transactions make up the big data of a single day of frenzied online shopping.
Such records are reflected first in the magnitude of the data. A high-definition film takes up about 1 GB; 1024 GB make 1 TB, and 1024 TB make 1 PB. Big data often reaches the PB order of magnitude, a volume almost too large to imagine. Second is the variety of the data: the types of transactions, the seller's information, the buyer's information, the courier's information, and the payment information together form a diversified data chain for the industry. Third, data is produced very quickly and search results must also come back quickly: finding one class of goods among millions of items has to take only about 1 second, which traditional technology cannot achieve. Finally, although the content of big data is a real and complete reflection of the objective world, its value density is very low. Without study and mining, big data will not produce useful results on its own. In massive volumes of street surveillance video, for example, the criminals may appear for only a few seconds.
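The unit arithmetic above can be checked with a small sketch. It assumes, as the text does, binary units (a factor of 1024 per step) and a film size of about 1 GB; the function name is made up for illustration.

```python
# Binary storage units as described in the text: each step is a factor of 1024.
GB_PER_TB = 1024
TB_PER_PB = 1024

# Assumption taken from the text: one high-definition film is about 1 GB.
FILM_SIZE_GB = 1


def films_per_petabyte(film_size_gb: float = FILM_SIZE_GB) -> float:
    """Return how many films of the given size fit into one petabyte."""
    gb_per_pb = GB_PER_TB * TB_PER_PB
    return gb_per_pb / film_size_gb


print(films_per_petabyte())  # 1048576.0
```

In other words, a single petabyte holds roughly a million such films.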
The Big Data Age
Viktor Mayer-Schönberger, a UK-based authority on big data, wrote a book titled "The Big Data Age" and was among the first to assert that humanity has entered the era of big data and that there is no going back. In 2000, he estimates, only about one-fourth of the world's information was digitized, while the other three-fourths was still in the form of newspapers, books, films, and tapes. By 2007, however, humans had stored more than 300 exabytes, equivalent to 300 billion gigabytes of information. The big data age has brought great changes to how people live, work, and think.
First, the form of data has changed: data that used to be mostly relational (such as spreadsheets) is now increasingly non-relational (such as user comments). The storage model has also changed from centralized storage to distributed storage: big data has to be spread across storage servers in different locations and accessed over the Internet, which constitutes so-called cloud storage.
Secondly, the way data is processed has changed fundamentally. A single computer is no longer enough; handling big data effectively requires the cloud platforms behind the network, that is, cloud computing. In big data processing we can see three interesting changes. The first concerns sampling. In the small-data age, people were limited by how hard data was to obtain, so they could only draw random samples and then analyze and predict from the sample data. Once the sample is biased, the result can be badly wrong.
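A minimal sketch of why a biased sample misleads: the population, the two buyer groups, and the order amounts below are all invented for illustration and do not come from any real data.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: order amounts (yuan) from two invented buyer groups.
big_city = [random.uniform(150, 250) for _ in range(2000)]
small_town = [random.uniform(50, 150) for _ in range(8000)]
population = big_city + small_town

# The true average, computed from all of the data.
full_mean = mean(population)

# A random sample draws from everyone; a biased sample draws from one group only.
random_sample = random.sample(population, 500)
biased_sample = random.sample(big_city, 500)

random_error = abs(mean(random_sample) - full_mean)
biased_error = abs(mean(biased_sample) - full_mean)

print(f"full mean:           {full_mean:.1f}")
print(f"random-sample error: {random_error:.1f}")
print(f"biased-sample error: {biased_error:.1f}")
```

The random sample lands close to the true average, while the sample drawn only from the big-city group overestimates it by a wide margin.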
In the big data age, we can readily obtain all of the data and no longer need samples. Alibaba, for example, has data on every buyer, so it can easily total the transactions on "Singles' Day", work out which regions are the most active, and broadcast the figures in real time through the media. This is the full-data model of big data: the scope of processing is the whole population, not a sample. The second change is that we no longer pursue data accuracy blindly. Because big data is diverse, rich, and dynamic (new data is produced in large quantities even while existing data is being processed), there is no need to insist on the accuracy of every record. Complex data gets mixed together; some of it looks useless and some of it is even wrong, but that does not matter. This is the nature of big data: a seemingly unrelated, useless pile of data contains unlimited business opportunities.
Think about it: when more people than usual search Baidu for keywords such as "cold" and "fever", it often means an influenza outbreak is coming, and the searches can even help predict which kind of flu. This is the power of big data. The third change is a focus on correlation between data rather than on causation. For example, by mining Tmall transaction data, a merchant may find that a high proportion of buyers of Metro coffee machines also buy pet food, and will lose no time in recommending Royal dog food to them. There is no causal relationship between coffee machines and dog food, but there is an intrinsic correlation. Correlation between data is the value contained in big data, and it is also the business opportunity that merchants pursue. It tells us that, faced with intricate and complex data, we do not need to study "why"; knowing "what" is enough.
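One common way to quantify this kind of co-purchase correlation is "lift": how much more often two items appear in the same order than they would if purchases were independent. The sketch below uses ten invented orders; the item names and the `lift` helper are illustrative assumptions, not the method of any particular retailer.

```python
def lift(orders, item_a, item_b):
    """Lift of item_a and item_b: P(a and b) / (P(a) * P(b)).

    A value above 1 means the items are bought together more often
    than independence would predict; it says nothing about causation.
    """
    n = len(orders)
    count_a = sum(1 for order in orders if item_a in order)
    count_b = sum(1 for order in orders if item_b in order)
    count_ab = sum(1 for order in orders if item_a in order and item_b in order)
    if count_a == 0 or count_b == 0:
        return 0.0
    # Algebraically equal to (count_ab / n) / ((count_a / n) * (count_b / n)).
    return count_ab * n / (count_a * count_b)


# Hypothetical orders, each a set of purchased items.
orders = [
    {"coffee machine", "pet food"},
    {"coffee machine", "pet food"},
    {"coffee machine", "pet food"},
    {"coffee machine"},
    {"pet food"},
    {"phone case"},
    {"phone case"},
    {"socks"},
    {"socks", "phone case"},
    {"socks"},
]

print(lift(orders, "coffee machine", "pet food"))  # 1.875
print(lift(orders, "coffee machine", "socks"))     # 0.0
```

A lift of 1.875 only says the two items co-occur more than chance would suggest, which is exactly the "what" rather than the "why".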
Finally, the big data age will give rise to a data mining industry and a cohort of data scientists. Put simply, data mining is the process of analyzing and computing over data with certain algorithms to obtain the information and knowledge we need. Traditional statistical analysis classifies data according to known categories and then looks for valuable data. If the given classification is unreasonable or wrong, the statistical analysis will not produce the best results. Data mining instead uses a method called "clustering", which needs no manual classification: an algorithm analyzes the attributes of the data and automatically groups the data into "classes", so that similarity between classes is as small as possible and similarity within a class is as large as possible. For example, the insurance business covers all kinds of people and occupations, so designing a target group of potential customers requires a great deal of data mining to discover the different customer groups and the important factors behind them, none of which is set manually in advance. Only by "letting the data speak for itself" can insurers tailor marketing plans to actual conditions, calculate break-even points scientifically, and create more profit.
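The article does not name a specific clustering algorithm. As one illustration, here is a minimal k-means sketch in plain Python, with a deterministic farthest-point initialization. The customer records (age, annual premium in thousands of yuan) are invented, and a real analysis would normally scale the features first.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_centroids(points, k):
    """Start from the first point, then repeatedly add the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual premium in thousands of yuan).
customers = [(25, 2), (55, 10), (27, 3), (58, 12), (30, 2), (60, 11)]
labels, centroids = k_means(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No categories were supplied in advance; the younger, low-premium customers and the older, high-premium customers fall into separate groups purely from their attributes.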
The Big Data Dividend
It has been asserted that data will become an important asset for mankind, a reusable resource more important than oil and gold. I agree with this view. The media recently reported that the "three Mas" had joined forces to sell insurance, which is an example of reaping the big data dividend. Drawing on the big data held by Alibaba, Tencent, and Ping An Insurance, the "three Mas" established an online insurance company, Zhong An Online. This is a milestone in Internet financial innovation. Its purpose is to use big data to position insurance consumers accurately and market to them precisely, with the mass of consumers as the main target. Clearly, the use of big data technology will be a very important part of how insurance companies capture the market in the future.
Another useful application is using big data to guard against telecom fraud. Telecom fraud is a major ill of today's society. If the telecommunications, banking, Internet, and public security sectors set aside their entangled interests and shared their big data, it would be entirely possible to eliminate most telecom fraud. We would only need to analyze and mine the parties' data, identify the data factors correlated with telecom fraud, and then build a dynamic monitoring model. Once the relevant data appeared, the police could follow the data chain and find the fraudsters quickly.
Stock market experts, for their part, want to earn a dividend from big-data concept stocks. So who collects the big data dividend? The owners of big data, the big data technology companies, and the miners of big data value (that is, the data scientists who supply the thinking). Ma Yun said: the future of the world is the world of data. The big data age has shaken every area of society, from industry, agriculture, commerce, and technology to government, health care, education, and culture, and people's lives are increasingly being changed by data. It is fair to say that big data is a more precious resource than oil or gold: whoever holds enough data seizes the commanding heights, gains competitiveness, and holds the future.
The Negative List of Big Data
Big data is undoubtedly a treasure trove of resources; it contains great value waiting to be mined. But just as a coin has two sides, big data also has its negative list, which I group into three main items: data monopoly, invasion of privacy, and misleading data.
Data monopoly is the biggest hidden danger of big data. Big data technology makes it possible to turn human attitudes, emotions, behavior, and other things that used to be hard to measure into data for analysis and prediction. Once big data is in the hands of a handful of businesses or government departments, they block the flow of information to protect their own interests, which not only wastes data resources but also hinders data innovation and creates data monopolies. For example, if national property data could be shared, it would help the country understand the full and true state of the property market, and it would also make it easy to uncover people suspected of corruption. But such data is often held by local departments and cannot be shared effectively.
Invasion of privacy is the shadow of big data, and only the sunlight of the law can drive it away. The United States' PRISM program, revealed by Snowden, used the ability to access big data to monitor the databases of nine Internet, telecom, and other operators and to mine "useful information" for intelligence collection and secret surveillance. Almost all information, such as telephone calls, e-mails, documents, videos, photos, and chats, was exposed to PRISM. Big data opens the door to privacy violations. Without legal restrictions on the acquisition, access, and sharing of big data, personal privacy will cease to exist.
Recently, a photo-sharing app, Snapchat, has become popular in the United States because it meets young people's need to protect their privacy. If you share a photo with a friend on Snapchat, it is deleted automatically as soon as the other person has viewed it, and screenshots are blocked, so the photo is "burned after reading". Snapchat is therefore also suited to sending trade secrets or sensitive information, since burned photos leave no trace on the Internet. This is a case of people rebelling against big data. But in daily life, people cannot avoid using telecommunications, the Internet, Weibo, WeChat, QQ, and other services, and the big data recorded by these services reveals a person's social network almost transparently.
Misleading data is another side of big data risk: if the results of data mining are not evaluated, the use of big data can lead to wrong conclusions. Big data tolerates errors in the data, but if someone deliberately injects "dirty" data, the whole data set is artificially distorted with false information. For example, when we shop on Taobao we always pay close attention to the seller's credit rating, yet some sellers cheat, using fictitious transactions such as selling to themselves to "farm points" and "brush credit", and some even hire people to "brush" their way up to diamond and crown ratings. If a crown seller's credit was earned with dirty data, the buyer is very likely to be deceived.
If the big data dividend can be shared only among powerful enterprises or government departments, that is bad for the healthy and harmonious development of society as a whole. We call for legislation as soon as possible to set up big data sharing platforms, break data monopolies, close data gaps, protect personal privacy, and make big data the most important factor of production in the new economy, so that ordinary people can share in the big data dividend.
(Responsible editor: The good of the Legacy)