Baidu knows what concerns us; Taobao knows what we like. Should that please us or worry us? Big data has penetrated every industry and business function and has become an important factor of production. A recent article in the journal Nature suggested that the mining and use of massive data heralds a new wave of productivity growth and consumer surplus.
No biggest, only bigger
Wikipedia defines big data (also called massive data) as data sets so large in scale that they cannot, within a reasonable time, be captured, managed, processed, and organized into information humans can interpret.
To enable a computer to beat the world chess champion Garry Kasparov, the IBM team collected nearly a hundred years of games by some 600,000 masters, big data that no human brain could remember in full, let alone use effectively. In 1997, Kasparov lost a match to IBM's Deep Blue computer for the first time, which became headline news. The secret of the computer's victory over the human brain was precisely the game data stored inside Deep Blue: its software could search that vast collection of games for the most promising move, something far beyond the reach of any human brain.
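The idea of choosing a move by looking up how masters fared from the same position can be sketched very crudely. This is only a toy illustration of "learning from stored games"; the positions, results, and lookup scheme below are invented and bear no resemblance to Deep Blue's actual search.

```python
from collections import Counter, defaultdict

# Toy "opening book": map a position (represented here simply by the
# sequence of moves so far) to the replies masters played from it,
# counting only replies that led to a win. Purely illustrative.
games = [
    (("e4", "e5", "Nf3"), "win"),
    (("e4", "e5", "Nf3"), "win"),
    (("e4", "e5", "Bc4"), "loss"),
    (("e4", "c5", "Nf3"), "win"),
]

book = defaultdict(Counter)
for moves, result in games:
    position, reply = moves[:-1], moves[-1]
    if result == "win":
        book[position][reply] += 1

def best_reply(position):
    """Return the historically most successful reply, if any is known."""
    if book[position]:
        return book[position].most_common(1)[0][0]
    return None

print(best_reply(("e4", "e5")))  # Nf3: the most frequent winning reply
```

With enough stored games, even so naive a lookup begins to capture knowledge that no single player holds; a real engine combines such statistics with deep game-tree search.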
Some people summarize the characteristics of big data as the "4Vs": Volume (large volume), Variety (many types), Velocity (high speed), and Value (low value density). Recall last year's "Double 11" festival: that day Taobao Mall handled 188 million transactions, with total turnover reaching a record 35.019 billion yuan. Those transactions form the big data of a day of frenzied online shopping.
Such records reflect, first of all, the sheer magnitude of the data. A high-definition movie occupies about 1 GB; 1024 GB make a TB, and 1024 TB make a PB. Big data often reaches the PB scale, a volume almost too large to imagine. Second, the data are diverse: transactions of every kind, seller information, buyer information, courier information, and payment information together form a many-sided industry data chain. Third, the data are generated very fast, and search results must come back just as fast: finding one class of goods among millions of items, with retrieval taking only about one second, is beyond traditional technology. Finally, it must be said that although big data reflects the objective world truthfully and completely, its value density is very low; without research and mining, big data will not automatically yield useful results. In the massive footage of street surveillance video, for example, a criminal may appear for only a few seconds.
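The unit ladder above can be checked with a line or two of arithmetic. A quick back-of-envelope sketch, using the article's figure of roughly 1 GB per high-definition movie:

```python
# Binary storage units as used in the text: 1 TB = 1024 GB, 1 PB = 1024 TB.
GB = 1
TB = 1024 * GB
PB = 1024 * TB

movie_size_gb = 1  # the article's ~1 GB high-definition movie (assumption)
movies_per_pb = PB // movie_size_gb
print(movies_per_pb)  # 1,048,576 such movies fit in a single petabyte
```

In other words, one petabyte holds over a million feature films, which gives some feel for what "PB scale" means.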
The Big Data Age
Viktor Mayer-Schönberger, the UK-based big data authority, wrote a book titled "The Big Data Age", which asserted for the first time that humanity had entered the big data age. By his estimate, in 2000 only about a quarter of the world's information was digitized, while the other three quarters still existed as newspapers, books, film, and tape; by 2007, however, humans were storing more than 300 exabytes of data, equivalent to 300 billion gigabytes. The big data age has brought great changes to how people live, work, and think.
First, the form of data has shifted: what used to be mainly relational data (such as spreadsheets) is now increasingly non-relational data (such as user comments). Data storage has likewise moved from centralized to distributed: big data is kept on storage servers in different places, connected through the Internet into what we call cloud storage.
Second, the way data is processed has fundamentally changed. A single computer is no longer enough; to handle big data effectively, people must rely on cloud computing platforms behind the network. In big data processing we can see three interesting changes. In the "small data" age, the difficulty of obtaining data limited people to random sampling, with analysis and prediction based on the sample. Once the sample was biased, the results could be badly wrong.
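The divide-the-work idea behind cloud processing can be sketched in a few lines. This is a minimal map-and-merge illustration with made-up log lines; real platforms such as Hadoop or Spark additionally handle distribution, scheduling, and fault tolerance.

```python
from collections import Counter
from functools import reduce

# Pretend each log line lives on a different storage server.
log_lines = [
    "buyer pays seller",
    "courier ships parcel",
    "buyer reviews seller",
]

def map_chunk(lines):
    """'Map' step: count words inside one chunk, independently of the rest."""
    return Counter(word for line in lines for word in line.split())

# Each chunk is processed on its own, as separate machines would do.
partials = [map_chunk([line]) for line in log_lines]

# 'Reduce' step: merge the partial counts into one overall result.
total = reduce(lambda a, b: a + b, partials, Counter())
print(total["buyer"])  # 2
```

Because each chunk is counted independently, the work parallelizes across as many machines as there are chunks; only the small partial results travel over the network to be merged.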
In the big data age, we can easily obtain all the data and no longer need samples. Alibaba, for example, holds the data of every buyer: it can readily total the day's "Singles' Day" transactions, work out which regions are most active, and broadcast the figures through the media in real time. This is the full-data model of big data: the object of processing is the whole, not a sample. The second change is that we no longer blindly pursue the accuracy of individual data. Because big data is diverse, rich, and dynamic (new data keeps arriving even while it is being processed), strict accuracy need not be insisted on. Messy data are mixed together, some apparently useless or even wrong, but that does not matter; this is the nature of big data, and a seemingly unrelated, useless pile of data can contain unlimited business opportunities.
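The sampling pitfall described above is easy to demonstrate numerically. A small sketch with invented transaction amounts: many small orders plus a few huge ones, where a sample that happens to miss the rare large orders badly underestimates the mean.

```python
# Hypothetical transaction amounts: 990 small orders and 10 huge ones.
orders = [10] * 990 + [10_000] * 10

# Full-data answer: exact, no sampling error possible.
true_mean = sum(orders) / len(orders)

# A biased sample: the first 100 orders happen to all be small ones.
biased_sample = orders[:100]
biased_mean = sum(biased_sample) / len(biased_sample)

print(true_mean)    # 109.9
print(biased_mean)  # 10.0 -- off by an order of magnitude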
Consider: when more people than usual search Baidu for keywords such as "cold" and "fever", a flu outbreak is often on the way, and one can even predict which flu it will be; that is the power of big data. The third change is a focus on correlation between data rather than causation. For example, mining a day's Tmall transaction data may reveal that buyers of Metro coffee machines go on to buy pet food at a high rate, so the merchant loses no time in recommending Royal dog food to you. There is no causal relationship between coffee machines and dog food, but there is an intrinsic correlation. Correlation between data is the value hidden in big data, and also the business opportunity merchants pursue. The relevance of big data tells us that, facing intricate and complex data, we need not study "why"; knowing "what" is enough.
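The coffee-machine and dog-food correlation is the kind of thing a simple co-occurrence count surfaces. A minimal sketch with invented shopping baskets, computing the confidence of the rule "coffee machine implies dog food":

```python
# Hypothetical shopping baskets; products and counts are made up.
baskets = [
    {"coffee machine", "dog food"},
    {"coffee machine", "dog food"},
    {"coffee machine", "tea"},
    {"tea", "dog food"},
]

bought_machine = [b for b in baskets if "coffee machine" in b]
also_dog_food = [b for b in bought_machine if "dog food" in b]

# Confidence of the rule "coffee machine -> dog food":
confidence = len(also_dog_food) / len(bought_machine)
print(confidence)  # 2 of 3 coffee-machine buyers also bought dog food
```

Note that the count says nothing about why the two purchases go together; it only measures how often they do, which is exactly the "what, not why" stance described above.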
Finally, the big data age will spawn a data mining industry and a corps of data scientists. Simply put, data mining is the process of analyzing and computing over data with particular algorithms to obtain the information and knowledge we need. Traditional statistical analysis classifies data into known categories and then looks for valuable data within them; if the given classification is unreasonable or wrong, the statistics will not produce good results. Data mining instead uses a method called "clustering", which needs no manual classification: the algorithm analyzes the attributes of the data and automatically gathers it into "classes", making the similarity between classes as small as possible and the similarity within each class as large as possible. The insurance business, for example, covers people of all kinds and occupations, so designing a target group of potential customers requires mining a large amount of data to find the different customer groups and the factors that matter; none of this is set manually in advance. Letting the data speak for themselves makes it possible to tailor marketing plans to local conditions, calculate the break-even point scientifically, and create more profit for insurance companies.
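The clustering idea can be shown with a tiny one-dimensional k-means sketch: no classes are given in advance, yet groups emerge from the data's own similarity. The "ages" below are invented; real work would use a library such as scikit-learn on many attributes at once.

```python
def kmeans_1d(points, centers, iters=10):
    """Minimal 1-D k-means: alternate assignment and center updates."""
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages; no one told the algorithm "young" vs "older".
ages = [21, 23, 25, 44, 47, 50]
centers, clusters = kmeans_1d(ages, centers=[20, 50])
print(centers)   # [23.0, 47.0]
print(clusters)  # two age groups emerge automatically
```

The two groups fall out of the data itself, which is exactly the contrast drawn above with classification into pre-set categories.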
The Big Data Dividend
It has been asserted that data will become an important asset for mankind, a reusable resource for development more important than oil and gold, and I agree. Recently the media reported that the "three Mas" had joined forces to sell insurance, a ready example of the big data dividend. Drawing on the big data advantages of Alibaba, Tencent, and Ping An Insurance, the "three Mas" established the online insurance company Zhong An Online. This is a milestone in Internet financial innovation, aimed at using big data to position and market precisely to insurance consumers, chiefly the online generation. Clearly, the use of big data technology will be a crucial link in how insurance companies seize the market in the future.
Another useful application is guarding against telecom fraud, a major disease of today's society. If telecom operators, banks, Internet companies, public security, and the other parties set aside their entanglement of interests and shared their big data, it would be entirely possible to eliminate telecom fraud to the greatest extent. By analyzing and mining the parties' big data, finding the data factors correlated with telecom fraud, and building a dynamic monitoring model, the police could quickly locate fraudsters along the data chain as soon as the relevant data appeared.
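A "dynamic monitoring model" of the kind described could, in its simplest form, score records against factors correlated with fraud. The fields, thresholds, and weights below are all invented for illustration; a real system would learn such factors from the shared data rather than hard-code them.

```python
def fraud_score(record):
    """Toy rule-based score: higher means more fraud-like. All rules are
    hypothetical illustrations, not real detection criteria."""
    score = 0
    if record["calls_per_hour"] > 50:   # mass outbound dialing
        score += 2
    if record["new_sim_days"] < 7:      # freshly registered SIM card
        score += 1
    if record["callback_rate"] < 0.05:  # almost nobody calls this number back
        score += 1
    return score

suspect = {"calls_per_hour": 80, "new_sim_days": 2, "callback_rate": 0.01}
normal = {"calls_per_hour": 4, "new_sim_days": 400, "callback_rate": 0.6}

print(fraud_score(suspect))  # 4 -> flag for follow-up
print(fraud_score(normal))   # 0
```

The point of pooling data from telecoms, banks, and the police is that each party sees only some of these signals; combined, the correlated factors become strong enough to act on.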
Stock-market pundits, for their part, hope to earn a dividend from big-data concept stocks. Where, then, does the big data dividend lie? With the owners of big data, the big data technology companies, and the diggers of big data value (that is, the data scientists who supply the thinking). Ma Yun has said that the future world is a world of data. The big data age has shaken everything from industry, agriculture, commerce, and technology to government, health care, education, culture, and other areas of society, and people's lives are increasingly being changed by data. It may be said that big data is a resource more precious than oil or gold: whoever masters enough data seizes the commanding heights, gains competitiveness, and masters the future.