On June 19, 2013, the operator of a WeChat public account called "Oil Price Early Know" pushed this message: "A friendly tip from Oil Price Early Know: according to our analysis, in the early morning of June 22 the oil price is likely to rise (probability above 70%), by roughly 100 yuan per ton." The next day, the account again flagged the coming increase and put the rise at 0.1 yuan per liter. On June 21, it posted the oil price adjustment notice that the NDRC had already released.
Oil Price Early Know forecasts oil price adjustments three days in advance, and since its launch its accuracy has exceeded 95%. This is a typical big data application, and in my view a fine example of big data actually landing in China.
Big data, along with cloud computing, the Internet of Things, 3D printing, and the like, has been a hot topic worldwide since last year. But what is big data? What are its characteristics? How do we apply it? What changes will it bring to our lives? These questions are still being debated, and many enterprises are thinking about how to use big data in their IT construction to innovate the way they operate.
Baidu defines big data as data sets whose scale is so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make more proactive business decisions.
Gartner gives this definition: "big data" is high-volume, high-growth, and diverse information assets that require new processing models in order to deliver stronger decision-making power, insight discovery, and process optimization.
IBM's "4V" description of big data's characteristics is now widely accepted in the industry: (1) Volume: the amount of data is huge, jumping from the TB level to the PB level. (2) Variety: data types are numerous, covering not only traditional structured data but also web logs, videos, pictures, geographic information, and more from the Internet. (3) Value: value density is low but commercial value is high; in video surveillance, for example, only a second or two of a continuous, uninterrupted recording may be useful. (4) Velocity: processing is fast, the so-called "one-second rule". This last point is fundamentally different from traditional data mining.
If you understand big data only through these four characteristics, you might read it as full-volume or even holographic data, and such applications would seem to exist only in extra-large projects. How, then, would they differ from traditional data warehouses?
The three characteristics given by Viktor Mayer-Schönberger, one of the first data scientists to see the trends of the big data age clearly, may give us a better understanding of big data. They can be summed up in three words: more, messier, correlated.
"More" means considering more dimensions of information related to the research object, rather than only an enterprise's traditional internal data. For example, a telecom operator predicting customer churn should study not only billing data but also the customer's location information, and even what the customer says on social networks. So big data does not have to be "full volume" (and who can define what the full volume even is?); it only has to keep getting "more".
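As a rough sketch of what "more dimensions" can look like in practice, the hypothetical example below joins billing records with location and social-network features before training a churn model. All file names and columns are invented for illustration; this is not any operator's actual system.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical inputs: every file name and column here is illustrative.
billing = pd.read_csv("billing.csv")    # customer_id, monthly_fee, call_minutes
location = pd.read_csv("location.csv")  # customer_id, cells_visited_per_week
social = pd.read_csv("social.csv")      # customer_id, negative_posts_30d
labels = pd.read_csv("churn.csv")       # customer_id, churned (0/1)

# "More" dimensions: merge external signals onto the traditional billing view.
df = (billing.merge(location, on="customer_id")
             .merge(social, on="customer_id")
             .merge(labels, on="customer_id"))

X = df.drop(columns=["customer_id", "churned"])
y = df["churned"]

# The richer feature set, not a cleverer algorithm, is what "more" buys you.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```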
"Messier" means the collected data contains more noise, including data dimensions that, for the problem being studied, heavily perturb the predicted result. This calls for the Internet's "trial and error" mindset: keep studying the noise that creeps in during collection and processing, iterate repeatedly, and pan the most useful "small data" out of the big data. In the Oil Price Early Know application mentioned above, one of the developer's lessons was the continuous adjustment of the algorithms that process social-network text, removing noise such as interference from other topics that merely mention oil prices, so that the resulting small data set becomes more accurate. For example, a well-known "big V" commentator, in a discussion of taxi fares, argued that if taxi fares rise then the oil price must be rising too. A human brain instantly judges that the subject of such a sentence is taxi fares, but a machine finds this hard; if the machine extracted an "oil price rising" signal from such a sentence, it would disturb the overall oil price judgment.
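A toy illustration of this kind of noise filtering (a sketch only; the account's actual algorithm is unpublished): discard posts whose dominant topic is something other than oil prices before counting price-rise signals. The keyword lists are invented; a real system would use trained topic models rather than hand-written sets.

```python
# Keyword sets are invented for illustration.
OIL_TERMS = {"oil price", "gasoline", "diesel", "per ton", "per liter"}
NOISE_TOPICS = {"taxi": {"taxi", "fare", "driver"},
                "stocks": {"stock", "share price"}}

def dominant_topic(post: str) -> str:
    """Return the topic whose keywords appear most often in the post."""
    text = post.lower()
    counts = {"oil": sum(text.count(t) for t in OIL_TERMS)}
    for topic, terms in NOISE_TOPICS.items():
        counts[topic] = sum(text.count(t) for t in terms)
    return max(counts, key=counts.get)

posts = [
    "NDRC expected to raise the oil price by 100 yuan per ton this week",
    "If taxi fares go up, the oil price must be higher too, says a driver",
]

# Keep only posts that are actually *about* oil prices; the taxi post,
# although it mentions the oil price, is filtered out as noise.
signal = [p for p in posts if dominant_topic(p) == "oil"]
print(signal)
```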
"Correlated" means finding correlations between data so as to better predict how the research object will evolve. The example of Google engineers predicting flu outbreaks earlier than the official U.S. health authorities illustrates this well. Google's data engineers are not pathologists and cannot know what causes influenza, but by tracking certain flu-related search information they can predict an impending outbreak.
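In the same spirit (a minimal sketch, not Google's actual model), one can measure how strongly a search term's weekly frequency correlates with reported flu cases and then use the best-correlated terms as early indicators. The numbers below are made up for illustration.

```python
import numpy as np

# Made-up weekly data: search volume for a flu-related query vs. reported cases.
search_volume = np.array([120, 150, 200, 340, 500, 610, 580, 430])
reported_cases = np.array([80, 95, 140, 260, 410, 520, 490, 350])

# Pearson correlation: terms that track case counts become predictive signals,
# with no need to know anything about the biology of influenza.
r = np.corrcoef(search_volume, reported_cases)[0, 1]
print(f"correlation between searches and cases: {r:.2f}")

# A simple least-squares fit then turns search volume into a case estimate.
slope, intercept = np.polyfit(search_volume, reported_cases, 1)
print("predicted cases at 700 searches:", round(slope * 700 + intercept))
```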
Viewed through these three characteristics and examples, big data applications are not limited to grand efforts at the level of national or enterprise strategy; they can advance precisely through countless "small applications" closely tied to our lives, stepping down from the high altar into real market use.
However, as big data gradually enters our daily lives, we should also recognize that the development of any technology is a process in which norms (institutions), technology, and applications continually cooperate and develop together. The recent uproar over the "PRISM Gate" incident has given people a sobering view of big data. On June 17, I wrote on Weibo: "The Snowden incident has finally pushed 'data rights' into public view. Who wants to live under the rule of Big Brother in 1984? Some want to be Big Brother, but the people are no longer those of the last century. The first hurdle of big data, its first development breakpoint, is gradually appearing."
Therefore, matching up norms (institutions), technology, and applications as soon as possible should also be the responsibility of every practitioner. Vendors who command big data technology can take a larger part in basic normative research; application explorers can accumulate experience and contribute to basic theory as their applications deepen; and the departments charged with steering national informatization should attach great importance to building big data norms (institutions). After all, this is not something any single industry or enterprise can accomplish alone.
The small applications of big data are flowing steadily into our lives, and Oil Price Early Know is a good example. With this kind of exploration, I believe that "Health Early Know", "Travel Early Know", "Traffic Early Know", "Stocks Early Know" ... all closely tied to our lives, are not far off.