When it comes to big data, the term seems to have caught fire all over the world overnight. Governments study it and draw up science plans for it; scholars study it as the first step toward revolutionary innovation; companies study it and treat it as the key to the future, where whoever misses it falls behind the times. But all this heat does not mean big data is deeply understood; as with cloud computing, there is a real risk of excessive hype. So what is this magical thing, and what can it actually do? What follows is my own understanding of big data; where it is wrong or lacking, please point it out.
First of all, the concept. There are many descriptions of it on the web, and most of them define big data as a collection of data that cannot be sensed, acquired, managed, processed, or served by conventional machines and software tools within a tolerable time (the quantity typically reaches the PB level). The concept itself should be easy enough to understand.
Next, some of the historical origins of big data. Many new things, new technologies, and new ideas seem to start in the United States, and this time is no exception. As early as the Clinton administration, the United States announced the National Information Infrastructure program, an information network of communications networks, computer networks, data networks, and consumer electronics for transmitting images, voice, and text to institutions and households. Later, the Department of Defense issued its NCW (network-centric warfare) plan and developed it year after year. Against this background, on March 29, 2012, the United States government announced the Big Data Research and Development Initiative, intended to promote the acquisition of knowledge and insight from large, complex data sets. Once the Obama administration launched this initiative, the big data era had begun, and the US and the rest of the world set off a boom in big data research.
Next, some of the characteristics of big data; any introduction to big data has to mention these, so I will too. They are usually summarized as the 4 Vs: volume (scale), variety (diversity), and velocity (speed), plus either veracity (authenticity) or value. Volume refers to the sheer size of the data: the US stock market trades some 7 billion shares a day, and Google processes about 24 PB of data every day; data at this scale was unimaginable before. Variety describes the diversity of data types, including pictures, text, video, numbers, and so on; it is precisely this diversity that makes big data hard to analyze. Velocity is the speed at which the volume of data grows. As for why the fourth V has two definitions, it is simply a difference in viewpoint: IDC holds that big data is characterized by value, while IBM holds that big data must have veracity.
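To get a feel for what volume and velocity mean together at this scale, here is a minimal back-of-the-envelope sketch in Python. The 24 PB/day figure is the one quoted above; the binary unit convention (1 PB = 1024^5 bytes) is an assumption of the sketch:

```python
# Back-of-the-envelope: what does processing 24 PB per day imply?
PB = 1024 ** 5           # bytes in a petabyte (binary convention, assumed)
GB = 1024 ** 3           # bytes in a gigabyte
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 24 * PB
throughput_gb_per_s = daily_bytes / SECONDS_PER_DAY / GB
print(f"24 PB/day ≈ {throughput_gb_per_s:,.0f} GB/s sustained")
# -> roughly 300 GB/s, far beyond what one conventional machine can scan
```

Sustained throughput of that order is why big data is defined against what "conventional machines and software tools" can handle in a tolerable time.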
Studying big data calls for some changes in thinking, mainly the following three: (1) analyze more data, as much of the relevant data as possible, rather than relying on sampling; (2) stop pursuing precision: with a large amount of real-time data, absolute precision is no longer the main goal, and properly giving up precision at the micro level yields better insight at the macro level; (3) stop being fixated on the causal relationships between things, and look instead for the correlations between them. A correlation cannot precisely explain why a social phenomenon occurs, but it can reveal how it develops. Perhaps this is the difference from traditional data mining. There is a vivid analogy: traditional data mining is fishing in a pond, while big data is fishing in the sea; traditional analysis digs toward a target set in advance, whereas in the big data era you do not know what the result will be, and may find nothing at all, hence fishing in the sea. The most famous example is the relationship between beer and diapers: nobody knew the two were related, but the correlation emerged from the analysis of a large data set, as the sketch below illustrates.
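As a toy illustration of this kind of correlation hunting, here is a minimal Python sketch in the spirit of the beer-and-diapers story. The transactions are invented for illustration, and lift (the support of a pair divided by the product of the individual supports) is a standard association measure, not something taken from the original story:

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping-basket transactions, invented for illustration.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

# Lift > 1 means a pair co-occurs more often than independence predicts.
for (a, b), c in pair_counts.items():
    lift = (c / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    print(f"{a} & {b}: lift = {lift:.2f}")
```

Here "beer & diapers" comes out with a lift of about 1.67, higher than any other pair, without anyone having guessed the connection in advance. In a real data set the transactions would number in the millions, which is exactly where full-sample analysis beats sampling.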
Finally, I want to talk about some applications of big data. Remember the core value of big data analysis first: full samples dilute the impact of information noise and improve the hit rate, trading some precision for efficiency. Traditional sampling statistics may be more accurate, but they often lag behind, and that lag can sometimes be fatal. Big data can be applied in many areas: the combination of big data and social computing, traffic engineering in the big data era, the smart grid in the big data era, and the way big data analysis is changing the means of marketing.