Since big data entered the public eye, it has gradually become a focus of widespread attention. Big data is the science of the petabyte (PB) era. In essence, the challenge of big data is the challenge facing science in the PB era, and it is also a challenge for cognitive science, including data mining. So what does data mining look like in the age of big data?
Today, big data is commonly said to come from three main sources. The first is natural big data: the Earth's natural environment, which is enormous. The second is the big data of life. The third, and the most important, is the social data that people care about. Such data is found everywhere on people's mobile phones, computers and other devices. Today, a news report can reach people all over the world within three minutes.
Consider the crowd at Obama's inauguration: so many faces, each with its own story, and behind every person there is big data. The human face is an important identifier for data security, so how can a face be recognized reliably? People have tried many approaches. Beijing now has 800,000 cameras, and we drive and shop under their watch every day. Cameras can be used for identity authentication, age estimation, affective computing, kinship discovery, psychological recognition, regional recognition and ethnicity recognition. However, streaming media is mainly unstructured. Factors such as the correlations between features and the accuracy of devices and algorithms all seriously restrict progress in mining big face data. The open problem is how to extract the required feature attributes from such massive data and clarify the relationships among those features.
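One small part of clarifying the relationships among features is measuring how strongly feature columns vary together. The sketch below computes pairwise Pearson correlation coefficients over face-feature vectors. The feature names (`eye_distance`, `face_width`, `face_height`) and all values are hypothetical. Real face mining would use learned features rather than three hand-picked measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(rows, names):
    """Correlation for every ordered pair of feature columns in `rows`."""
    cols = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    return {(a, b): pearson(cols[a], cols[b]) for a in names for b in names}

# Hypothetical per-face measurements: (eye_distance, face_width, face_height)
faces = [
    (62.0, 140.0, 190.0),
    (58.0, 132.0, 181.0),
    (65.0, 147.0, 198.0),
    (60.0, 138.0, 185.0),
]
names = ["eye_distance", "face_width", "face_height"]
matrix = correlation_matrix(faces, names)

# Print each unordered pair once.
for (a, b), r in matrix.items():
    if a < b:
        print(f"{a} vs {b}: r = {r:.2f}")
```

Values of r close to 1 suggest that two features carry largely redundant information. This is one reason correlated features complicate mining.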
Technology drives computer development
The mathematical genius Alan Turing proposed the Turing model in 1936. The Turing model was later turned into the physical computer, which has three major parts: the CPU and operating system, internal and external storage, and input and output. In the first 30 years of computer development, the most investment went into CPUs, operating systems, software, middleware and application software. At that time people focused on improving computational performance, so we call this era the computing age.
Much of the effort in the computing age went into software, especially for high-performance computers. We believe computation played the leading role in the first 20 years, and Moore's law set its pace. In this computing-led era, we mainly mined structured data. Edgar F. Codd, the father of the relational database, proposed the relational model in 1970. It uses relational algebra as its core operations and represents entities, and the relationships between entities, as two-dimensional tables. Over the past thirty or forty years, database and data warehousing technologies in every industry have grown into a huge information industry. So has data mining for knowledge discovery in databases (KDD).
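To make "relational algebra as the core operation" concrete, here is a minimal sketch of three relational algebra operators: selection (σ), projection (π) and natural join (⋈). Relations are represented as Python lists of dicts. The `customers` and `orders` tables and their attributes are hypothetical. A real relational database would also enforce schemas, keys and constraints.

```python
def select(relation, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """π: keep only the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(r, s):
    """⋈: combine rows of r and s that agree on all shared attributes."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[k] == b[k] for k in shared)]

# Hypothetical two-dimensional tables (relations).
customers = [
    {"cust_id": 1, "name": "Li", "city": "Beijing"},
    {"cust_id": 2, "name": "Wang", "city": "Shanghai"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 30},
    {"order_id": 11, "cust_id": 1, "amount": 12},
    {"order_id": 12, "cust_id": 2, "amount": 50},
]

# π_{name, amount}(σ_{city = 'Beijing'}(customers ⋈ orders))
beijing = project(
    select(natural_join(customers, orders), lambda row: row["city"] == "Beijing"),
    ["name", "amount"],
)
print(beijing)
```

Projection removes duplicates because, in the relational model, a relation is a set of tuples.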
Relational algebra is the formal theory and set of constraints behind the relational database. The top-level design and the data structures come first, and the data is filled in after cleaning. The data revolves around the structure, and the structure revolves around the program. Users do not need to care about how the data is acquired, stored, analyzed or extracted. Through data mining, we can discover classification knowledge, association knowledge, sequential knowledge, anomaly knowledge and so on from databases.
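Association knowledge is commonly expressed as rules X → Y, scored by two measures. Support is the fraction of transactions that contain both X and Y. Confidence is support(X ∪ Y) / support(X). The sketch below is a brute-force version restricted to pairs of items. It uses Apriori-style thresholds but is not the full Apriori algorithm. The shopping baskets and the thresholds are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def association_rules(transactions, min_support, min_confidence):
    """Brute-force single-item rules lhs -> rhs over all pairs of items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # pair too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread"},
    {"milk"},
]
rules = association_rules(baskets, min_support=0.4, min_confidence=0.7)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Enumerating only pairs keeps the sketch short. Apriori scales the same idea to larger itemsets by pruning any candidate that has an infrequent subset.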
As the database industry expanded, people were no longer satisfied with databases alone. Moving from databases to big data ran into two unavoidable challenges. First, the formal constraints of relational algebra are too strict to represent real-world data. Second, as data volume grows, the performance of relational algebra operations drops dramatically. Meanwhile, storage technology developed rapidly, and humanity entered the search age. Search took off because storage became cheap. Storage roughly doubles about every 9 months, so storage set the pace of technology. After more than 20 years of development, the search age led us into the era of semi-structured data mining. Its representative figure is Tim Berners-Lee, the father of the World Wide Web. He put forward the idea of linking information through hypertext and developed the world's first Web server, so content on one server could be retrieved from another. With supporting software, a server can publish fragmented hypermedia information, including text, tables, pictures, audio and video.
As a result, client-server architectures and cloud computing flourished. There are no algebraic formal constraints. Instead, the system relies mainly on specifications and standards. All media, even software, take the form of entities, and entities are connected to one another through hyperlinks.
This formal theory is much looser than relational algebra, and it gives rise to flexible and diverse entities. At this point the data begins to revolve around the entity, and the entity revolves around the link. With cloud computing as the backdrop, data mining can also be seen as search and personalized service in the cloud environment. There is no fixed way to query and no unique, 100% accurate query result.
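Search over hyperlinked entities returns a ranking rather than a unique, exact answer. One well-known form of link-based ranking is PageRank. It is swapped in here purely for illustration, because the text does not name a specific algorithm. The sketch runs power iteration over a hypothetical four-page link graph. The damping factor of 0.85 and the 50 iterations are conventional assumptions, not values from the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical entities connected by hyperlinks.
web = {
    "home": ["news", "video"],
    "news": ["home"],
    "video": ["home", "news"],
    "photo": ["home"],
}
ranks = pagerank(web)
ranking = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
for page, score in ranking:
    print(f"{page}: {score:.3f}")
```

The output is an ordering by link-derived importance, not an exact match against a schema. This is the looser kind of answer the paragraph describes.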
Big data mining on the network
With Internet bandwidth doubling every 6 months, humanity has entered the interaction age, in which interaction develops alongside computing and storage.
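The doubling periods in the text imply very different growth rates. A quantity that doubles every d months grows by a factor of 2^(T/d) over T months. The sketch compares the 6-month bandwidth period with the roughly 9-month storage period given in the text over an assumed 36-month horizon. The 18-month period used for Moore's law is a commonly quoted figure and is an assumption, because the text gives no number.

```python
def growth_factor(months, doubling_period_months):
    """Multiplier after `months` for a quantity that doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

horizon = 36  # assumed three-year horizon
drivers = [
    ("Internet bandwidth", 6),
    ("storage", 9),
    ("computing (Moore's law, 18 months assumed)", 18),
]
for label, period in drivers:
    print(f"{label}: x{growth_factor(horizon, period):.0f} over {horizon} months")
```

Over the same three years, bandwidth grows 64-fold, storage 16-fold and computing 4-fold. This gap helps explain why each driver in turn came to define an era.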
Big data mining in the mobile Internet age is mainly unstructured data mining in a networked environment. It deals with fresh, fragmented, heterogeneous, emotionally charged raw data in its native state.
Unstructured data is typically low in value, noisy, heterogeneous and redundant. Much of it is cold data that sits in storage and is never used again. The formal constraints on data are becoming more and more relaxed, and the data is moving ever closer to Internet culture, window culture and community culture.
The focus of attention has also changed greatly. Mining now attends first to the niche audience: only by meeting the needs of small groups can it meet the demands of more and more small groups. An important idea follows from this: bottom-up works better than top-down design. It emphasizes the authenticity and timeliness of mined data in order to find links, anomalies and trends. In short, the goal is to find value.
At present, deep learning is a form of data-adaptive simplification. If we use Baidu's deep learning to search by face pixels, how can we tell who is who among so many faces? As data volume grows rapidly, every media form can be fragmented. Organizational structures and mining programs revolve around the data, and the programs themselves become fragmented and can be reassembled virtually at any time. Mining often means discovering different communities, and their collective intelligence, in human-computer interaction environments. In unstructured data mining, the data is cleaned in the natural course of use. Semi-structured and structured data also form naturally, which improves the efficiency of data use.
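To illustrate how unstructured posts can be cleaned and turned into semi-structured data, the sketch collapses whitespace noise, drops empty and duplicate posts, and extracts hashtags, mentions and URLs into a dict per post. The regex patterns, field names, deduplication rule and sample posts are all hypothetical. They are a crude stand-in for real cleaning pipelines.

```python
import re

# Hypothetical extraction patterns (\w matches Unicode letters in Python 3).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")

def to_record(raw):
    """Turn one raw post into a semi-structured record (a dict)."""
    text = " ".join(raw.split())                  # collapse whitespace noise
    urls = URL.findall(text)
    body = " ".join(URL.sub(" ", text).split())   # text with URLs removed
    return {
        "text": body,
        "hashtags": sorted({h.lower() for h in HASHTAG.findall(body)}),
        "mentions": sorted(set(MENTION.findall(body))),
        "urls": urls,
    }

def clean(posts):
    """Drop empty and case-insensitive duplicate posts; structure the rest."""
    seen, records = set(), []
    for raw in posts:
        record = to_record(raw)
        key = record["text"].lower()
        if not key or key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records

# Hypothetical raw posts: noisy spacing, a near-duplicate and an empty post.
posts = [
    "Traffic jam on   Ring Road #Beijing @cityinfo https://example.com/a",
    "traffic jam on ring road #beijing @cityinfo",
    "   ",
    "New museum opens #Culture #Beijing",
]
records = clean(posts)
for record in records:
    print(record)
```

The resulting records have no fixed schema beyond a few optional fields. That is what makes them semi-structured rather than relational rows.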
Swarm intelligence is a term heard often lately. We used to run Turing tests on computers. In a reversal of that idea, a computer is made to tell whether an answer comes from a human or from a machine. This came out of Carnegie Mellon University, and it is the verification code (CAPTCHA) we encounter when shopping online, logging in to websites or using applications. The third representative figure is Luis von Ahn, who proposed applying these verification codes.
If cloud computing supports big data mining in discovering value, then we regard cloud computing as inherently an Internet-based model of public participation. Its computing resources are dynamic, scalable and virtualized, and they are provided as services. Compared with system upgrades driven by traditional configuration, this model is more concise, flexible, diverse and personalized. Mobile phones, game consoles, digital cameras and television sets with subtle differences have produced more iCloud-style products. With friendly, personalized interfaces, these can become big data mining terminals.
Data mining supports a wide variety of big data applications. If we have data collection centers, storage centers, computing centers and service centers, we must also have data mining centers. Only then can big data be applied promptly and its value be discovered in time.
Big data marks a new era. This era is characterized not only by the pursuit of abundant material resources, and not only by the convenient, diverse information services that the ubiquitous Internet brings. It also involves data mining and value conversion, which differ from material value. Discovering information value in the virtual world leads to more precise control of matter and energy in the physical world, and big data mining also brings new spiritual and cultural phenomena.