The purpose of big data: producing small data
As the saying goes, though the river holds three thousand scoops of water, take only one. Even if you have all the data there is, what matters is being able to answer the questions you care about. Suppose we want to use a smartphone to find Italian restaurants within a specified range. With just a few taps, the device lists the Italian restaurants within ten kilometers of the current location. In this simple location-based service (LBS) application, the database being queried is large and complex (a geodatabase holding data on restaurants all over the world, including their basic information, latitude and longitude, street address, user reviews, and so on), but the resulting data set is very small and contains only the content of interest. For example, the smart terminal displays only the locations and labels of five restaurants, and the exact address, phone number, and rating appear after a tap. All that remains is to pick one of the five restaurants for a meal.
In this example, the information that answers the question we care about is drawn from a large data set, but the analysis and the conclusion are ultimately carried out on a small one (the five restaurants that meet the search criteria).
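The restaurant lookup above can be sketched as a filter over a large collection that returns a handful of records. This is only an illustration under stated assumptions: the `Restaurant` record, the `nearby` function, and the four sample entries are hypothetical and made up for the example, and a real LBS would query a spatial index rather than scan every record.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Restaurant:  # hypothetical record layout, not a real LBS schema
    name: str
    cuisine: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(restaurants, lat, lon, cuisine, radius_km, limit=5):
    """Return at most `limit` matching restaurants, nearest first."""
    hits = []
    for r in restaurants:
        if r.cuisine != cuisine:
            continue
        distance = haversine_km(lat, lon, r.lat, r.lon)
        if distance <= radius_km:
            hits.append((distance, r))
    hits.sort(key=lambda pair: pair[0])
    return [r for _, r in hits[:limit]]


# Made-up sample records standing in for the "big" database.
DATABASE = [
    Restaurant("Trattoria A", "italian", 45.4700, 9.1900),
    Restaurant("Osteria B", "italian", 45.5000, 9.2000),
    Restaurant("Sushi C", "japanese", 45.4650, 9.1910),
    Restaurant("Pizzeria D", "italian", 41.9028, 12.4964),
]

# The small data set that the user actually looks at.
results = nearby(DATABASE, lat=45.4642, lon=9.1900, cuisine="italian", radius_km=10)
print([r.name for r in results])
```

However large `DATABASE` grows, the caller only ever handles the few records in `results`, which is the point of the example.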
The purpose of a big data resource is to produce a variety of small data sets. Analysis is generally not performed directly on the big data resource itself; its use is mostly limited to search and retrieval. A big data resource collects and organizes a large amount of complex data in many different ways, so that it stands ready to answer whatever questions you bring to it. Of course, the producers and organizers of the data still have a lot of work to do. How do you tell a bar from a restaurant? What is the difference between a takeout shop and a restaurant? Which data should be collected? What should be done when data is lost? How can the data be stored efficiently?
Big data is rarely analyzed in its entirety (although that is possible). In most cases, filtering drastically reduces the dimensions and the number of records, dividing the big data into relatively small data sets. The same rule applies to data analysis in scientific research, in projects such as:
Australian Square Kilometre Array Pathfinder (ASKAP)
Pan-STARRS (Panoramic Survey Telescope and Rapid Response System)
Large Hadron Collider
Projects like these produce data at the petabyte (PB) scale. Researchers use the raw data to derive small data sets for research and analysis.
Blazars
The following example illustrates how a subset of data can be drawn from a large data set. A blazar is a rare kind of supermassive black hole that releases jets of energy at close to the speed of light. It is a compact, highly variable energy source, assumed to be powered by a supermassive black hole at the center of its host galaxy. Blazars are among the most violent phenomena in the observed universe and have become an important topic in extragalactic astronomy. Cosmologists want to learn as much as possible about these strange objects. The first step is to locate as many blazars as possible. Then, across all the collected objects, various measurements are compared to determine their general characteristics. Blazars turn out to have a gamma-ray signature that is not found in other celestial objects, and the Wide-field Infrared Survey Explorer (WISE) collected infrared data on the entire observable universe. From the WISE data, researchers extracted the objects whose infrared features corresponded to this gamma-ray signature, which yielded a group of objects that might be blazars. Further study of this group led the researchers to conclude that a portion of its members are blazars. These objects were identified purely by analyzing existing astronomical data. This is how big data resources work: they are a way to construct small data sets that can be analyzed efficiently.
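The extraction step described above amounts to filtering a large catalog by a signature and keeping only the fields needed for follow-up study. The sketch below shows that pattern on synthetic data. Everything in it is an assumption made for illustration: the field names (`color_a`, `color_b`, and so on), the threshold ranges, and the randomly generated catalog are hypothetical and do not reproduce the actual WISE data or the researchers' selection criteria.

```python
import random

random.seed(0)  # fixed seed so the synthetic catalog is reproducible

# Synthetic stand-in for a large survey catalog (random values, not real data).
catalog = [
    {
        "object_id": i,
        "color_a": random.uniform(-1.0, 3.0),
        "color_b": random.uniform(0.0, 5.0),
        "ra_deg": random.uniform(0.0, 360.0),
        "dec_deg": random.uniform(-90.0, 90.0),
        "flux": random.uniform(0.0, 100.0),
    }
    for i in range(10_000)
]

# Hypothetical signature: a rectangular cut in two measured quantities.
COLOR_A_RANGE = (0.5, 1.5)
COLOR_B_RANGE = (2.0, 3.5)

# Only these fields are carried into the small data set.
KEEP_FIELDS = ("object_id", "color_a", "color_b")


def select_candidates(rows, a_range, b_range, keep_fields):
    """Keep rows inside both ranges and drop every field not in keep_fields."""
    selected = []
    for row in rows:
        in_a = a_range[0] <= row["color_a"] <= a_range[1]
        in_b = b_range[0] <= row["color_b"] <= b_range[1]
        if in_a and in_b:
            selected.append({key: row[key] for key in keep_fields})
    return selected


candidates = select_candidates(catalog, COLOR_A_RANGE, COLOR_B_RANGE, KEEP_FIELDS)
print(len(catalog), "rows reduced to", len(candidates), "candidates")
```

The filter reduces both the number of records and the number of fields per record, so any detailed comparison or measurement afterwards runs on the small `candidates` list rather than on the full catalog.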