In The Third Wave, the American social thinker Alvin Toffler is credited with saying: "If IBM's mainframe raised the curtain on the information revolution, then 'big data' is the brilliant movement of the third wave." With its easily understood concept, its wide range of potential applications, and the enormous economic and social benefits expected of it, big data is becoming the next hot spot in information technology after cloud computing and the Internet of Things, and it will have a profound impact on every area of socio-economic development.
In June 2011, the McKinsey Global Institute (MGI) published a study titled Big data: The next frontier for innovation, competition, and productivity. It was the first to declare that "the age of big data has arrived," and it drew global attention to big data from an economic perspective. The report noted that data volumes and storage capacity are growing rapidly, that data has penetrated every industry and business function, and that data has become an important factor of production comparable to physical assets and human capital. Big data is the next technological frontier for productivity improvement after traditional IT. With appropriate policy incentives, the use of big data will become a key ingredient of future competitiveness, productivity, innovation and the creation of consumer surplus, and it will mark the biggest difference between leaders and everyone else. Companies that do not adopt new analytical techniques and new data types are unlikely to become leaders in their industries.
In March 2012, the Obama administration announced the launch of the "Big Data Research and Development Initiative." The initiative involves six federal departments and agencies: the National Science Foundation, the National Institutes of Health, the Department of Energy, the Department of Defense, the Defense Advanced Research Projects Agency (DARPA), and the United States Geological Survey. Together they pledged more than 200 million U.S. dollars to promote and improve the tools and technologies for collecting, organizing and analyzing big data, and to advance the ability to draw knowledge and insight from large and complex data sets. The announcement was a watershed: it marked the shift of big data from a business activity to a national strategy. Big data was formally elevated to the strategic level and began to be valued across every sector of the economy and society.
The threefold meaning of big data
The industry has no uniform definition of big data. Different vendors and different users, each with their own point of view, understand the term differently. The McKinsey report defines big data as data sets whose size is beyond the ability of typical database software to capture, store, manage, and analyze. According to CCID (Saidi), big data is a relative concept, and there is no strict threshold for how large a data set must be before it counts as big data. In fact, as time passes and data management and processing technologies advance, the size of the data sets that qualify as big data is growing and will continue to grow. The size that counts as "big data" also differs across industries and applications.
Although the term "big data" literally refers to a static object, the data set, the "big data" discussed here is more than the large-scale data set itself. It is the unity of three things: the data object, the technology, and the application.
1. From an object perspective, big data is a data set whose size is beyond the ability of typical database software to capture, store, manage, and analyze. Note that big data is not a simple, meaningless accumulation of large amounts of data, and a large volume of data does not by itself guarantee substantial future value. Because the ultimate goal is to obtain more valuable "new" information from the data, the data items must be correlated with one another, whether closely or loosely, directly or indirectly. Only then is the data worth analyzing and mining. Whether the data has structure and correlation is an important difference between "big data" and "large-scale data."
2. From a technical standpoint, big data technology is the set of integrated techniques that quickly extract valuable information from big data of many types. The biggest difference between "big data" and terms such as "mass data" and "large-scale data" is that the concept of "big data" includes the act of processing the data object. To carry out that processing, to mine more valuable information quickly from big data objects and bring the data "alive," flexible, multidisciplinary methods are needed, including data clustering, data mining, and distributed processing. This in turn requires the ability to integrate a wide range of technologies, hardware and software. Big data technology is therefore an important tool for discovering and presenting the value contained in big data.
3. From an application perspective, big data is the act of applying big data technology, in an integrated way, to a specific big data set in order to obtain valuable information. It is precisely because big data is tied so closely to specific applications, sometimes one to one, that "application" becomes an indispensable part of its meaning.
It should be made clear that the ultimate goal of big data analysis is to discover new association rules in complex data sets and then to dig deeper and obtain new information effectively. Suppose the amount of data is large, but the data is simple in structure and highly repetitive, the analysis only groups and classifies it according to existing rules, the work is not closely tied to a specific business, and existing basic analysis and processing techniques are sufficient. Such a case cannot be counted as complete "big data." It is only the primary stage in the development of "big data."
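To make "discovering association rules" concrete, here is a minimal sketch of how such a rule can be scored by support and confidence, the two standard measures used in association rule mining. It is only an illustration and not a method described in the reports cited above. The helper names `support` and `pair_rules`, the thresholds, and the toy shopping-basket data are all assumptions made for this example.

```python
from itertools import permutations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    Hypothetical helper: it only considers single-item rules and
    scans the data exhaustively, which is fine for a toy example
    but not for real big data workloads.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair is too rare to be interesting
        confidence = s / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, b, s, confidence))
    return rules

# Toy data (assumed): each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for a, b, s, c in pair_rules(transactions):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data the only rule that passes both thresholds is "butter implies bread." Real systems replace the exhaustive scan with algorithms such as Apriori or FP-Growth and run them on distributed platforms, which is where the technology dimension described above comes in.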
The impact of big data on the information industry
The big data boom has arisen from the convergence of new-generation information technologies. Applications such as the Internet of Things, the mobile Internet, the digital home and social networks are rapidly expanding the scale of data, and the growing demand for processing and analyzing that data is driving the development of the big data field. In turn, the results of big data analysis and optimization are fed back into these applications, further improving the user experience and supporting the development of the next-generation information technology industry.
According to the report "Software and Information Services Research," big data will bring new growth points to the information industry. IDC predicts that global data will reach 10 trillion TB in 2015. Information systems built on traditional architectures struggle to cope with this explosive growth, and traditional business intelligence systems and data analysis software lack effective tools and methods for big data, especially unstructured data such as video, images and text. Information systems therefore urgently need upgrading, which opens new and broader growth opportunities for the industry.
At the same time, big data will accelerate the converged innovation and development of information technology products. Big data poses the challenges of effective storage and real-time analysis, which will have a significant impact on the chip and storage industries and will promote innovation in servers that integrate storage and processing and in in-memory computing. The need for rapid data processing and analysis will promote the use of business intelligence, data mining and similar software in enterprise information systems, making them an important means of business innovation.