In March 2012, the Obama administration announced the launch of a "Big Data Research and Development Initiative". The initiative involves six federal departments and agencies: the National Science Foundation, the National Institutes of Health, the Department of Energy, the Department of Defense, the Defense Advanced Research Projects Agency, and the United States Geological Survey. Together they pledged more than $200 million to improve the tools and techniques for collecting, organizing, and analyzing big data, and to advance the ability to extract knowledge and insight from large, complex data sets.
The announcement marks a watershed in big data's evolution from a business concern into a national strategy: big data has been formally elevated to the strategic level and is now taken seriously across every sector of the economy.
National Science Foundation: advancing the core science and technology of big data
The National Science Foundation and the National Institutes of Health will issue a joint solicitation on big data to improve the ability to extract and analyze information.
The joint NSF-NIH big data solicitation aims to advance the core scientific and technological means of managing, analyzing, and visualizing big data, improve the ability to extract important information from large and diverse data sets, accelerate the production of scientific results, and lead the country into research areas that were previously out of reach. NIH is particularly interested in data sets related to health and disease, including imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, and clinical data. Beyond funding the joint solicitation to sustain its focus on basic research, NSF is also implementing a comprehensive long-term strategy that includes developing new methods for deriving knowledge from data more effectively, investing in related infrastructure, creating new approaches to managing, organizing, and delivering data for big data research communities, and supporting research on education and workforce training.
Specific measures include: encouraging research universities to create interdisciplinary graduate programs to train the next generation of data scientists and engineers; investing $10 million in a computing project at the University of California, Berkeley that integrates three powerful approaches to turning data into information: machine learning, cloud computing, and crowdsourcing; providing first-round funding for EarthCube, which will allow geoscientists to access, analyze, and share information about the Earth; awarding $2 million to a research and training group to support a program that teaches undergraduates to use graphical and visualization tools to interpret complex data; providing $1.4 million in research funding to a group of statisticians and biologists studying protein structures and biological pathways; and exploring how big data can transform teaching and learning.
Department of Defense: using data to support decision-making
The U.S. Department of Defense will invest $250 million a year in research programs aimed at harnessing massive volumes of data in innovative ways.
The Pentagon is "placing a big bet on big data": it is investing roughly $250 million a year ($60 million of which funds new research projects) in a series of programs across the military services that seek to harness massive data in innovative ways, building truly autonomous systems that can maneuver and make decisions on their own, improving the environmental and situational awareness of warfighters and analysts, and strengthening support for missions and operations. The Defense Department's goal is a hundredfold increase in analysts' ability to extract information from text in any language, with a corresponding increase in the number of objects, activities, and events they can observe.
Specific projects include the Anomaly Detection at Multiple Scales (ADAMS) program, which addresses the detection and characterization of anomalies in massive data sets. ADAMS is currently applied to insider-threat detection, spotting anomalous actions by individuals within ordinary day-to-day network activity. The Insight program addresses shortcomings of existing intelligence, surveillance, and reconnaissance systems by automating and integrating human-machine reasoning, so that larger potential threats can be analyzed while the information is still timely. It aims to develop a resource management system that automatically identifies network threats and irregular warfare activity by analyzing imaging and non-imaging sensor data together with other sources of information.
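The basic idea behind this kind of anomaly detection — flagging individual actions that deviate sharply from routine activity — can be illustrated with a toy statistical sketch. The data, function name, and threshold below are purely illustrative and are not taken from the ADAMS program:

```python
from statistics import mean, stdev

def flag_anomalies(events, threshold=3.0):
    """Toy outlier detector: flag events whose value deviates from the
    mean by more than `threshold` sample standard deviations.
    Real multi-scale systems use far richer models, but the goal is
    the same -- isolate rare deviations from routine behavior."""
    mu = mean(events)
    sigma = stdev(events)
    return [x for x in events if abs(x - mu) > threshold * sigma]

# Mostly routine daily activity counts, with one abnormal spike.
daily_logins = [12, 11, 13, 12, 10, 11, 14, 12, 11, 90]
print(flag_anomalies(daily_logins, threshold=2.0))  # prints [90]
```

A z-score cutoff like this works only for simple numeric streams; the programs described above must instead characterize anomalies across many scales and heterogeneous data sources at once.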
In addition, the Defense Advanced Research Projects Agency (DARPA) will carry out the XDATA program, which plans to invest $25 million a year over the next four years to develop computational techniques and software tools for analyzing massive semi-structured and unstructured data. The core problems to be addressed include developing scalable algorithms for processing irregular data in distributed data stores, and creating effective human-computer interaction tools that support rapid, customizable visual analysis across a variety of tasks. XDATA will support open-source software toolkits that help developers flexibly build software, enabling users to quickly process massive data and keep pace with mission data flows in specific defense applications.
National Institutes of Health: free access to 1000 Genomes Project data
The research data sets on human genetic variation produced by the international 1000 Genomes Project are now freely available for researchers to access and use.
The National Institutes of Health (NIH) announced that the largest data set on human genetic variation, produced by the international 1000 Genomes Project, is now publicly available on Amazon Web Services (AWS). The data already amount to about 200 TB, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs. The data set is so large that few research institutions have the computing power to use it effectively. AWS now hosts the 1000 Genomes data for free access and use; researchers pay only for the computing services they themselves consume.
Department of Energy: accelerating scientific discoveries through advanced computing technology
The Department of Energy will spend $25 million to establish a scalable data management and visualization institute to help scientists manage data effectively.
The United States Department of Energy will spend $25 million to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the department's Lawrence Berkeley National Laboratory, the SDAV Institute will combine the expertise and experience of six national laboratories and seven prestigious universities to develop new tools that help scientists effectively manage and visualize data on the Department of Energy's supercomputers. The effort will further streamline and accelerate research workflows, enabling scientists to use the department's facilities for more productive research and discovery. The need for these new tools is all the more pressing because the data streams produced on the department's supercomputers are growing in both size and complexity. Major projects under way include:
A high-performance storage system that can analyze and process petabyte-scale data, extracting information from huge scientific data sets to uncover their main features and understand the relationships among them. The system is used across Department of Energy applications, from cosmology and climate data to power-grid sensor data.
The Biological and Environmental Research Program's Atmospheric Radiation Measurement Climate Research Facility, a multi-platform scientific user facility that provides precise observations of key atmospheric phenomena. Its main data challenge is rapidly collecting and delivering data drawn from hundreds of files to meet users' needs.
The United States Nuclear Data Program, a multifaceted effort involving seven national laboratories and two universities that maintains specialized databases spanning several fields of nuclear physics. These databases compile and cross-validate experimental results on the essential properties of all atomic nuclei, and are maintained for wide use.