In recent years, with the rapid development and popularization of computer and information technology, industry application systems have expanded quickly, and the data they produce is growing explosively. Industry and enterprise data now reach hundreds of terabytes, or even tens to hundreds of petabytes, far beyond the processing capability of traditional computing and information systems, so finding effective data-processing technologies, methods, and tools has become an urgent real-world demand. Baidu's total data volume now exceeds 1000 PB, and it must process 10 PB to 100 PB of web-page data every day; Taobao's accumulated transaction data amounts to 100 PB; Twitter carries more than 200 million messages a day, and Sina Weibo up to 80 million; a single provincial branch of China Mobile records 0.5 PB to 1 PB of call data per month; and one provincial capital's public security bureau collected 20 billion road-vehicle monitoring records, 120 TB in total, over three years. The authoritative IT consulting firm IDC predicts that the world's data volume will grow from 0.8 ZB in 2009 to 35 ZB in 2020 (1 ZB = 1000 EB = 1,000,000 PB), a 44-fold increase over roughly 10 years, or about 40% annual growth.
In earlier years people spoke simply of "massive data," but the idea of big data was raised as early as 2008. That year, on the 10th anniversary of Google's founding, the journal Nature published a special issue devoted to the technical issues and challenges of future large-scale data processing, which put forward the concept of "big data."
With the popularization of the concept, people often ask: how large must data be to count as "big data"? In fact it is difficult to give a precise quantitative definition. Wikipedia offers a qualitative one: big data is a data set that cannot be acquired, managed, and processed within a tolerable time using traditional and commonly available software tools. More importantly, the focus of "big data" today is not merely the scale of the data. It marks the entry of information technology into a new era; it represents the challenges that explosively growing data poses to traditional computing and information technology; it represents the new technologies and methods that big data processing demands; and it represents the new inventions, new services, and new development opportunities that big data analysis and application will bring.
Because big data processing is both urgent and important, big data technology has in recent years attracted great attention from academia, industry, and governments worldwide, setting off a research upsurge comparable to that around the "information superhighway" in the 1990s. The governments of developed countries in North America and Europe have put forward a series of big data research and development plans at the level of national science and technology strategy, to promote research and application of big data technology by government agencies, major industries, academia, and industry.
As early as December 2010, the President's Council of Advisors on Science and Technology (PCAST) and the President's Information Technology Advisory Committee (PITAC) presented a strategic report on designing the digital future to President Obama and Congress, raising the collection and use of big data to the level of national strategy. The report cited five challenges common across science and technology, the first of which was the "data" problem: "How to collect, preserve, manage, analyze and share data that is growing exponentially is an important challenge that we have to face," the report said, recommending that "every agency and department in the federal government needs to develop a 'big data' strategy." In March 2012, the Obama administration announced the "Big Data Research and Development Initiative," under which six major agencies, including the National Science Foundation (NSF), the National Institutes of Health (NIH), the Department of Energy (DOE), and the Department of Defense (DoD), jointly invested 200 million US dollars to launch big data technology research and development. This was the US government's most significant technology deployment since the "Information Superhighway" plan announced in 1993. The White House Office of Science and Technology Policy also supported the establishment of a big data technology forum to encourage exchange and cooperation among enterprises and organizations.
In July 2012 the United Nations published in New York a big data white paper, "Big Data for Development: Challenges and Opportunities," and worldwide big data research and development entered an unprecedented climax. The white paper summarizes how governments can use big data to respond to social needs, guide the economy, and better serve society, and recommends that member states establish "Pulse Labs" to tap the potential value of big data.
Given the characteristics and importance of big data technology, the concept of "data science" has emerged at home and abroad, meaning that data-processing technology will become a new scientific field in parallel with computational science. In a 2007 speech, the late Turing Award winner Jim Gray predicted that "data-intensive scientific discovery" would become the fourth paradigm of scientific research: science would develop from experimental science through theoretical science and computational science to the now-emerging data science.
To keep pace with the global development of big data technology, China's government, academia, and industry have also paid great attention to big data. On April 14 and 21, 2013, CCTV's well-known "Dialogue" program invited Viktor Mayer-Schönberger, author of "Big Data: A Revolution That Will Transform How We Live, Work, and Think," together with the president of the US data-storage technology company LSI, for a two-part big data talk show, "Who Is Detonating Big Data" and "Who Is Mining Big Data." Such attention and publicity from the national broadcaster show that big data technology has become a focus of general attention for the country and society.
Domestic academia and industry have also acted quickly, carrying out extensive research and development of big data technology. Since 2013, the National Natural Science Foundation, the 973 Program, the HeGaoJi (core electronics) major project, the 863 Program, and other major national research programs have all included big data as a major research topic. To promote big data research and development in China, the China Computer Federation (CCF) established the CCF Big Data Expert Committee in 2012; the committee also organized a writing group for a big data technology development strategy report and has published the "2013 White Paper on Big Data Technology and Industry Development in China."
Big data brings huge technical challenges, but also great opportunities for technological innovation and business. Accumulated big data contains deep knowledge and value that small data sets do not; analyzing and mining it can bring great commercial value to industries and enterprises, enable a variety of high-value-added services, and further enhance economic and social benefits. Because of this hidden value, the US government has called big data "the new oil of the future," with far-reaching implications for technological and economic development. In the future, a country's data holdings and its ability to use data will become an important component of comprehensive national strength, and the possession, control, and use of data will become a new focus of competition between countries and between enterprises.
Research on and application of big data analysis is of great significance and value. Viktor Mayer-Schönberger, known as "the prophet of the big data era," enumerates many detailed application cases in his book, analyzes and forecasts the current state and future trends of big data, and puts forward many important viewpoints and ideas for development. "Big data opens a major transformation of the times," he writes: big data will bring great change, changing how we live, work, and think, changing our business models, and affecting our economy, politics, technology, and society.
As application demands grow across industries, an ever-larger number of research and application areas will need big data parallel computing, and big data technology will permeate every field that involves large-scale data and complex computing. Moreover, big data processing will have a revolutionary influence on traditional computing technology, affecting computer architecture, operating systems, databases, compiler technology, programming techniques and methods, software engineering, multimedia information processing, artificial intelligence, and other computer applications; its combination with traditional computing technology will produce many new research hotspots and topics.
Big data poses many new challenges to traditional computing technology. Many traditional serial algorithms that are effective on small data sets cannot finish within an acceptable time when faced with big data; at the same time, big data tends to contain more noise, sparse samples, and unbalanced samples, which reduces the effectiveness of many existing machine learning algorithms. As Dr. Qi Lu, then a Microsoft global vice president, noted in the keynote report at the award conference of the first national "China Cloud/Mobile Internet Innovation Grand Prix" in 2012: "Big data requires most of the existing serial machine learning algorithms to be rewritten."
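The point about rewriting serial algorithms can be made concrete with a minimal sketch (not from the text, and deliberately simplified): the same statistic computed first serially, then in a MapReduce-like data-parallel style, where each worker summarizes only its own partition and a reduce step combines the summaries. All function names here are illustrative; only the parallel form can scale out when the data no longer fits on one machine.

```python
from functools import reduce
from multiprocessing import Pool


def serial_mean(data):
    # Traditional serial algorithm: a single pass over all the data.
    # Fine for small data sets, infeasible once the data outgrows one node.
    return sum(data) / len(data)


def partial_sums(chunk):
    # "Map" step: each worker reduces its partition to a small summary.
    return (sum(chunk), len(chunk))


def parallel_mean(chunks):
    # "Reduce" step: combine the per-partition summaries into the answer.
    with Pool() as pool:
        parts = pool.map(partial_sums, chunks)
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), parts)
    return total / count


if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]  # split into 4 partitions
    assert serial_mean(data) == parallel_mean(chunks)
```

The rewrite is not mechanical: the algorithm had to be reformulated so that partitions can be processed independently and their results merged, which is exactly the restructuring that many serial machine learning algorithms require before they can run on big data platforms.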
The development of big data technology will bring new challenges and opportunities to computing professionals. Demand from IT enterprises at home and abroad for big data talent is growing rapidly, and over the next 5 to 10 years industry will need a large number of people skilled in big data processing technology. An IDC study says: "In the next 10 years, the number of servers worldwide will increase 10-fold, the data managed by enterprise data centers will increase 50-fold, the number of files enterprise data centers must handle will grow at least 75-fold, while the number of IT professionals worldwide can grow only 1.5-fold." There will therefore be a huge gap over the next 10 years between big data processing and application demands and the supply of qualified professionals. Because universities at home and abroad have only recently begun training big data specialists, people in the technology market who have mastered big data processing and application development are very scarce and in short supply. Almost all well-known IT enterprises in China, such as Baidu, Tencent, Alibaba and Taobao, and Qihoo 360, urgently need large numbers of big data engineers.