Big data refers to extremely large data sets. It has attracted attention because valuable information can be mined from it. The Wall Street Journal has named the big data era, intelligent manufacturing, and the wireless network revolution as the three great technological transformations that will lead to future prosperity. A McKinsey report points out that data has become a factor of production and that big data is the next frontier for innovation, competition, and productivity growth. A World Economic Forum report described big data as a new form of wealth, as valuable as oil. Developed countries have therefore made big data a key lever for seizing the commanding heights in a new round of competition.
The advent of the big data age
The development of the Internet, and of the mobile Internet in particular, has accelerated the penetration of information technology into every aspect of the economy, society, and people's daily lives. Data show that the average monthly traffic per Internet user worldwide was 1 MB (megabyte) in 1998, 10 MB in 2000, 100 MB in 2003, and 1 GB (1 GB equals 1024 MB) in 2008, and it is expected to reach 10 GB in 2014. The time it takes for total network traffic to reach 1 EB (that is, 1 billion GB, or 1000 PB) was one year in 2001, one month in 2004, one week in 2007, and only one day in 2013; in other words, the information generated in a single day would fill 188 million DVDs. China has more Internet users than any other country, and the amount of data it produces each day is also among the largest in the world. Taobao handles tens of millions of transactions per day, generates more than 50 TB (1 TB equals 1000 GB) of data daily, and stores 40 PB (1 PB equals 1000 TB). Baidu currently holds close to 1000 PB of data in total, has stored nearly 1 trillion web pages, and processes about 6 billion search requests, amounting to dozens of PB of data, every day. An 8 Mbps (megabits per second) camera produces 3.6 GB of data per hour; a city that installs hundreds of thousands of traffic and security cameras will generate dozens of PB of data per month. Hospitals are also major sources of data. A single patient's CT images can now run to dozens of gigabytes (GB), the country handles billions of outpatient visits every year, and this information must be kept for a long time. In short, big data exists in every industry, and the big data era is arriving.
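The camera figure above is a straightforward unit conversion. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a constant bitrate; the function name `data_volume_gb` is hypothetical:

```python
def data_volume_gb(bitrate_mbps: float, hours: float) -> float:
    """Return the data volume in GB (decimal, 1 GB = 10**9 bytes) produced
    by a stream at a constant bitrate over the given number of hours."""
    bits = bitrate_mbps * 1_000_000 * hours * 3600  # megabits/s -> total bits
    return bits / 8 / 1_000_000_000  # bits -> bytes -> GB


# An 8 Mbps camera running for one hour.
print(round(data_volume_gb(8, 1), 2))  # 3.6
```

Because the volume scales linearly with both bitrate and time, the same function gives daily or monthly totals for any number of cameras.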
The information explosion did not start today, but in recent years people have become much more aware of the rapid growth of data. On the one hand, the number of Internet users keeps rising; on the other, the number of networked devices, typified by Internet of Things devices and home appliances, is growing even faster. In 2007, 500 million devices worldwide were networked, 0.1 per person; in 2013, 50 billion devices worldwide will be networked, 70 per person. With the development of broadband, per-capita network access bandwidth and traffic have also grown rapidly. Newly generated data worldwide is growing by 40% a year, which means the total volume of information doubles roughly every two years, and this trend will continue. Today it is not uncommon for a single data set to exceed dozens of TB or even several PB, a size too large to be captured, managed, and processed with conventional software tools.
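The link between 40% annual growth and doubling roughly every two years can be checked directly, assuming the growth rate stays constant:

```latex
(1 + 0.40)^2 = 1.96 \approx 2,
\qquad
T_{\text{double}} = \frac{\ln 2}{\ln 1.40} \approx \frac{0.693}{0.336} \approx 2.06\ \text{years}.
```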
The larger the data, the harder it is to process, but the more value it may yield when mined; this is why big data has become so popular. First, big data reflects public sentiment and opinion. The massive data that netizens produce online records their thoughts, behavior, and even emotions. It is a product of the deep fusion of real society and cyberspace in the information age, and it is rich in content and in recurring patterns. According to the China Internet Network Information Center, at the end of 2012 China had 564 million netizens, including 420 million mobile Internet users; analyzing the relevant data can reveal the public's needs, demands, and opinions. Second, business and government information systems generate a steady stream of data every day. According to a Symantec research report, the total information stored by enterprises worldwide has reached 2.2 ZB (1 ZB equals 1000 EB) and is growing by 67% a year. Hospitals, schools, and banks also collect and store large amounts of information. Governments can deploy sensing devices such as sensors to collect the information needed for environmental and social management. In 2011, the British journal Nature published a special issue stating that if big data is organized and used more effectively, humanity will have more opportunities to harness science and technology for social development.
Application areas of big data
Big data technology can be applied in every industry. At the macroeconomic level, IBM Japan has built an economic indicator forecasting system that searches Internet news for 480 kinds of economic data affecting manufacturing and uses them to compute forecasts of the Purchasing Managers' Index. Indiana University, using a mood analysis tool provided by Google, distilled six moods from nearly ten million netizens' comments and used them to predict movements in the Dow Jones Industrial Average, with an accuracy of 87%. In manufacturing, Wall Street hedge funds analyze companies' product sales from customer reviews on shopping websites, and some enterprises use big data analysis to manage purchasing and keep inventory at reasonable levels, learning about customer demand and market trends by analyzing online data. Data show that global retailers lose $100 billion a year in sales because of blind purchasing, so this kind of analysis has a great deal to offer.
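A prediction of this kind is directional: each day's mood signal is used to forecast whether the index will rise or fall. Assuming accuracy is measured as the share of days on which the predicted direction was correct, the idea can be sketched as below. The word lists and the single calm-versus-anxious score are hypothetical simplifications; this is not Google's mood analysis tool, which the text says produced six mood dimensions.

```python
# Hypothetical single-dimension mood lexicon (an assumption for illustration).
CALM_WORDS = {"calm", "relaxed", "confident", "steady"}
ANXIOUS_WORDS = {"worried", "anxious", "nervous", "afraid"}


def daily_mood_score(comments: list[str]) -> float:
    """Net calm-minus-anxious word count, normalized to the range [-1, 1]."""
    calm = anxious = 0
    for text in comments:
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word in CALM_WORDS:
                calm += 1
            elif word in ANXIOUS_WORDS:
                anxious += 1
    total = calm + anxious
    return 0.0 if total == 0 else (calm - anxious) / total


def directional_accuracy(scores: list[float], index_changes: list[float]) -> float:
    """Share of days where a positive mood score coincides with a rise.

    index_changes[i] is assumed to be the index change that follows day i.
    """
    hits = sum((s > 0) == (c > 0) for s, c in zip(scores, index_changes))
    return hits / len(scores)


# Made-up comments for three days and made-up index changes.
days = [
    ["Markets feel calm today", "I am confident"],
    ["Everyone is worried", "so anxious about jobs"],
    ["steady and relaxed", "a bit nervous"],
]
scores = [daily_mood_score(d) for d in days]
print(scores)  # [1.0, -1.0, 0.3333333333333333]
print(directional_accuracy(scores, [0.5, -1.2, -0.3]))  # 0.6666666666666666
```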
In agriculture, a climate company in Silicon Valley obtained decades of weather data from sources such as the U.S. National Weather Service, mapped in detail how rainfall, temperature, and soil conditions correlate with annual crop yields, and uses this to predict farm output and sell personalized insurance to farmers. In commerce, Wal-Mart analyzes sales data to understand customers' shopping habits and identify products that sell well together, and it can segment its customers and provide personalized services. In finance, the hedge fund Derwent Capital Markets analyzed messages from 340 million microblog accounts to gauge people's mood and decide whether to buy or sell a company's shares, following the rule of buying when people are happy and selling when they are anxious. Alibaba, drawing on how small and medium-sized enterprises operate on Taobao, screens out those that are financially sound and value their credit standing and grants them unsecured loans. It has lent more than 30 billion yuan so far, with a bad-debt rate of only 0.3%.
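Finding goods that sell well together, as in the Wal-Mart example, is commonly done with market-basket analysis. A minimal sketch, assuming each transaction is a set of product names (the baskets below are hypothetical): count how often each pair of products appears in the same basket, and keep pairs whose support, the fraction of baskets containing both, meets a threshold. Large-scale systems typically use algorithms such as Apriori or FP-Growth to do this over millions of baskets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets: list[set[str]], min_support: float) -> dict[tuple[str, str], float]:
    """Return product pairs whose support (share of baskets containing both
    items) is at least min_support."""
    pair_counts: Counter = Counter()
    for basket in baskets:
        # Sort so that ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}


baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
print(frequent_pairs(baskets, 0.5))  # {('bread', 'milk'): 0.75, ('eggs', 'milk'): 0.5}
```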
In health care, the Google Flu Trends project analyzes Internet search queries to track the spread of influenza around the world; compared with reports from the U.S. Centers for Disease Control and Prevention, its tracking accuracy reached 97%. Social networks give many patients with chronic diseases a platform for sharing clinical symptoms and exchanging diagnosis and treatment experiences, and doctors can use them to obtain statistics on clinical outcomes that are not normally available in hospitals. Big data analysis of human genes makes individualized treatment, the right remedy for each patient, possible. In public security and social management, mining mobile phone data makes it possible to analyze where the floating population comes from and how it moves, as well as real-time traffic flow and congestion. Text messages, microblogs, WeChat, and search engines can be used to identify hot topics, gauge public opinion, and trace the sources of rumors. The Massachusetts Institute of Technology processed the phone calls, text messages, and location information of 100,000 people to extract spatio-temporal patterns of human behavior for crime prediction. In scientific research, discovery based on data-intensive analysis has become the fourth paradigm after experimental science, theoretical science, and computational science, and materials genomics and synthetic biology based on big data analysis are emerging.
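Checking a search-based estimate against officially reported case rates comes down to measuring how closely two time series move together, which is typically expressed as a correlation coefficient. The sketch below computes the Pearson correlation of two weekly series; the numbers are made up for illustration and are not Google's or the CDC's data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values (%): search-based estimate vs. officially reported rate.
search_estimate = [1.1, 1.4, 2.0, 2.9, 3.5, 2.8, 1.9]
reported_rate = [1.0, 1.5, 2.2, 3.0, 3.3, 2.6, 2.0]
print(round(pearson(search_estimate, reported_rate), 3))
```

A value close to 1 means the search-based series rises and falls almost in step with the official reports.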
McKinsey's 2011 report estimated that using big data in U.S. health care could create $300 billion in potential value a year; in European public administration, it could yield 250 billion euros in potential value a year; service providers using personal location data could generate $600 billion in consumer surplus a year; retailers using big data analysis could raise operating profit by 60%; and manufacturers could cut product development and assembly costs by 50%.
Challenges and implications of big data technology
At present, applying big data technology still faces difficulties and challenges, which appear in the four links of big data mining. First, data collection. Data from networks, including the Internet and organizations' information systems, should be tagged with time and location; false data should be discarded and genuine data kept; data from as many different sources, and even in as many different formats, as possible should be collected; and where necessary, the data can be compared with historical data to verify their completeness and credibility from several angles. Second, data storage. To achieve low cost, low energy consumption, and high reliability, redundant configuration, distributed storage, and cloud computing are usually used. When data are stored, they are classified according to certain rules, filtered and deduplicated to reduce the storage volume, and tagged to ease later retrieval. Third, data processing. Data in some industries involve hundreds of parameters. Their complexity lies not only in the data samples themselves but also in the dynamic interactions among multiple heterogeneous sources, multiple entities, and multiple spaces, which are hard to describe and measure with traditional methods. Processing is therefore very complex, and high-dimensional multimedia data such as images must be reduced in dimension before they can be measured and processed. Semantic analysis must use contextual associations to synthesize information from large volumes of dynamic and possibly ambiguous data and derive comprehensible content. Fourth, visual presentation of the results, which makes them more intuitive and easier to gain insight from. Although computer intelligence has made great progress, it can currently analyze only small-scale, structured, or semi-structured data; it cannot yet perform deep data mining, and existing data mining algorithms are hard to generalize across industries.
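The storage step described above (filter, deduplicate, classify, and tag for later retrieval) can be sketched in a few lines. This is a minimal illustration under stated assumptions: records are plain dictionaries with a `payload` field, the filter simply drops empty payloads, duplicates are detected by a SHA-256 hash of the payload, and the classification rule and tag fields (`source`, `collected_at`, `category`) are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def classify(payload: str) -> str:
    """Hypothetical classification rule: route records by a keyword."""
    return "traffic" if "camera" in payload.lower() else "general"


def ingest(records: list[dict], source: str) -> list[dict]:
    """Filter, deduplicate, classify, and tag raw records before storage."""
    seen_hashes: set[str] = set()
    stored = []
    for record in records:
        payload = record.get("payload", "").strip()
        if not payload:  # filtering: drop empty records
            continue
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # deduplication: skip exact repeats
            continue
        seen_hashes.add(digest)
        stored.append({
            "payload": payload,
            "sha256": digest,
            # Tags that make later retrieval easier (source and time of collection).
            "source": source,
            "collected_at": datetime.now(timezone.utc).isoformat(),
            "category": classify(payload),
        })
    return stored


raw = [
    {"payload": "camera 12: congestion on ring road"},
    {"payload": "camera 12: congestion on ring road"},  # exact duplicate
    {"payload": "   "},                                  # empty after stripping
    {"payload": "weather: light rain"},
]
for item in ingest(raw, source="city-sensors"):
    print(item["category"], item["payload"])
```

Here duplicates are caught only within one batch; a real system would keep the hash index in persistent storage so repeats are detected across batches as well.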
Big data technology has very broad prospects. China is now on the path to building a moderately prosperous society in all respects and faces heavy tasks in industrialization, informatization, urbanization, and agricultural modernization, as well as in building next-generation information infrastructure, developing a modern information technology industry system, improving the information security system, and promoting the wide use of information network technology. Big data analysis is of great significance for understanding the overall situation and the country's conditions, grasping underlying laws, achieving scientific development, and making sound decisions, and we must recognize the important value of data.
There is much work to do to develop the gold mine of big data. First, big data analysis requires the support of big data technologies and products. Some information technology (IT) enterprises in developed countries are working to transform themselves into big data solution providers by increasing R&D investment and through mergers and acquisitions. There are signs that some foreign enterprises are offering to perform big data analysis for free, both to gain practice and to obtain information. Relying too heavily on foreign big data analysis technologies and platforms makes it hard to avoid the risk of information leakage. Some everyday information may seem insignificant, but it can in fact reveal the pulse of the national economy and society. We therefore need big data technologies and products under our own control. In March 2012, the U.S. government announced the Big Data Research and Development Initiative, its major technology deployment since the announcement of the "Information Superhighway" in 1993, and the federal government and several departments have allocated funds for big data development. We lag considerably behind developed countries and need national policy support.
China has the world's largest population and will become the country that produces the most data, but we do not pay enough attention to preserving data, and the utilization rate of stored data is low. In addition, some departments and institutions in China hold large amounts of data but are unwilling to share them with others, resulting in incomplete information or duplicated investment. The government should break down data silos and blockades through institutional reform, pay attention to opening public information, and encourage data mining. The U.S. federal government has set up a unified data portal that provides information services to the public and encourages mining and use; for example, it provides data relating weather to flight delays across the country, helping airlines improve their on-time rates.
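Relating weather to flight delays, as in the data portal example, amounts to joining two data sets on a shared key and then aggregating. A minimal sketch with hypothetical records keyed by airport and date; the airports, dates, conditions, and delays are invented for illustration, not taken from any real portal.

```python
from collections import defaultdict

# Hypothetical weather observations: (airport, date) -> condition.
weather = {
    ("ORD", "2013-01-05"): "snow",
    ("ORD", "2013-01-06"): "clear",
    ("ATL", "2013-01-05"): "rain",
    ("ATL", "2013-01-06"): "clear",
}

# Hypothetical flights: (airport, date, departure delay in minutes).
flights = [
    ("ORD", "2013-01-05", 95),
    ("ORD", "2013-01-05", 60),
    ("ORD", "2013-01-06", 5),
    ("ATL", "2013-01-05", 30),
    ("ATL", "2013-01-06", 0),
    ("ATL", "2013-01-06", 10),
]


def mean_delay_by_condition(weather, flights):
    """Join flights to weather on (airport, date), then average delays per condition."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [sum of delays, flight count]
    for airport, date, delay in flights:
        condition = weather.get((airport, date))
        if condition is None:  # no weather record for this flight: skip it
            continue
        totals[condition][0] += delay
        totals[condition][1] += 1
    return {cond: s / n for cond, (s, n) in totals.items()}


print(mean_delay_by_condition(weather, flights))  # {'snow': 77.5, 'clear': 5.0, 'rain': 30.0}
```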
The mining and use of big data should be governed by law. The decision on strengthening the protection of network information, passed by the NPC Standing Committee at the end of last year, is a good start; an "information disclosure law" now needs to be enacted to suit the big data era. Many organizations and enterprises now hold large amounts of customer information. We should encourage mining data to serve society while preventing infringement of personal privacy, and promote data sharing while preventing data misuse. It is also necessary to define the rights and scope of data mining and use. The security of big data systems themselves deserves special attention: both technical security and management security must be ensured to prevent information from being damaged, tampered with, leaked, or stolen, and to protect the information security of citizens and the country.
The big data era calls for innovative talent. Gartner predicts that big data will create 4.4 million new IT jobs worldwide and millions of non-IT jobs. McKinsey forecasts that by 2018 the United States will need 440,000 to 490,000 people with deep data analysis skills, a shortfall of 140,000 to 190,000; it will also need 1.5 million managers who understand both their organizations' needs and big data technology and applications, and the gap there is even larger. China is rich in talent, but innovative people who can understand and apply big data are a scarce resource.
Big data is a concentrated expression of the new generation of information technology, an application-driven service field, and an emerging industry with unlimited potential. Its standards and industrial landscape have not yet taken shape, which gives China a valuable opportunity for leapfrog development. We should treat the development and use of big data as a strategic matter and make it an effective lever for transforming the mode of economic growth, while planning scientifically and avoiding a herd-like rush.
(Author: Academician of the Chinese Academy of Engineering)
(Responsible editor: Lu Guang)