The explosion of big data has transformed Internet information: it has not only updated the modes of Internet data application but, more importantly, deeply affected people's lives. In the big data era, many people have realized that the understanding of data analysis is shifting from "backward analysis" toward "forward analysis". As a result, people's thinking patterns have changed, which raises new problems in how data are collected, analyzed, and used. Addressing these challenges will mark the beginning of big data's in-depth development.
I. What statistical data analysis is
In recent years, the IT and communications industry, including the Internet, the Internet of Things, and cloud computing, has developed rapidly. The resulting growth of data has become both a serious challenge and a valuable opportunity for many industries, and modern information society has entered the big data era. In fact, big data is changing not just people's daily lives and ways of working, or enterprises' operations and business models; it is even causing fundamental changes in the model of scientific research. In general, big data refers to collections of data that cannot be perceived, acquired, managed, processed, or served by conventional machines and hardware and software tools within an acceptable period of time. Network big data refers to the big data generated on the Internet through the three-way interaction and integration of "people, machines, and things" in cyberspace.
Applying data to production and daily life can effectively help people and enterprises make more accurate judgments and take appropriate action. Data analysis is the process by which an organization collects data, analyzes it, and turns it into information; it refers to using analytical methods to process data in order to solve decision-making or marketing problems in life and production. Statistical data analysis, then, is the use of statistical methods to process data. In traditional market research, statistical analysis helps us uncover the information hidden in data, but this kind of analysis is "backward analysis": it analyzes what has already happened. In big data, statistical analysis is "forward analysis": it is predictive.
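To make the distinction concrete, "backward" statistical analysis summarizes data that already exists. A minimal sketch, using invented quarterly sales figures and only the standard library:

```python
import statistics

# Hypothetical sales figures from past quarters (backward-looking data).
past_sales = [120, 135, 128, 150]

# Backward analysis: describe what has already happened.
summary = {
    "mean": statistics.mean(past_sales),
    "stdev": statistics.stdev(past_sales),
    "best_quarter": max(past_sales),
}
print(summary)
```

Forward analysis, by contrast, would fit a model to these figures in order to estimate the next quarter; an example of that appears under predictive analysis below in spirit, though each sketch stands alone.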
II. How to analyze big data
1. Visual analysis.
Raw data may be structured, such as data in relational databases, or semi-structured, such as the familiar text, graphics, and image data, as well as data from differently configured networks. By analyzing these various kinds of data, different types of knowledge structure and content can be found: generalized knowledge that reflects common representations; characteristic knowledge used to reflect the aggregation patterns of data or to distinguish their categories by attribute; associative knowledge that reflects dependency or association between one event and others; and predictive knowledge that forecasts future data from historical and current data. Many new knowledge-discovery techniques have emerged, one of which is visualization. Data visualization technology has three distinct characteristics. First, strong interaction with the user: users are no longer passive recipients of information and can manage and explore data interactively. Second, multidimensional display: under visual analysis, data can be classified, sorted, and combined, and the values of each dimension displayed, so that multiple attributes or variables representing an object or event can be seen at once. Third, intuitiveness: data can be displayed as images, curves, two-dimensional graphics, three-dimensional solids, and animations, and their patterns and interrelationships can be analyzed visually.
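As a minimal illustration of the "classify, sort, combine, display" step behind multidimensional display, the sketch below groups hypothetical event records along one dimension and renders the counts as a text bar chart; the record fields and data are invented for the example:

```python
from collections import Counter

def ascii_bar_chart(records, dimension):
    """Classify records by one dimension, sort by frequency,
    and render the counts as a simple text bar chart."""
    counts = Counter(r[dimension] for r in records)
    lines = []
    for key, n in counts.most_common():  # sorted, most frequent first
        lines.append(f"{key:<10} {'#' * n} ({n})")
    return "\n".join(lines)

# Hypothetical web-access records with two dimensions: region and device.
records = [
    {"region": "north", "device": "mobile"},
    {"region": "south", "device": "mobile"},
    {"region": "north", "device": "desktop"},
    {"region": "north", "device": "mobile"},
]

print(ascii_bar_chart(records, "region"))
```

A real visualization tool would draw curves or three-dimensional graphics interactively; the point here is only that the same underlying operations (classify, sort, combine) can be applied along any dimension, such as `"device"` instead of `"region"`.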
2. Data mining algorithms.
Data mining refers to knowledge discovery in databases (KDD). Its history can be traced back to the first KDD workshop, convened in Detroit, United States, in 1989; the first International Conference on Knowledge Discovery and Data Mining (KDD) was held in Canada in 1995. The term "data mining" spread quickly after that conference, which vividly compared the information stored in databases to mineral deposits. The purpose of data mining is to find useful, relevant data within large, cluttered databases and to reveal the hidden, previously unknown potential value of information. Strictly speaking, data mining is just one step in the overall KDD process.
There is no single unified definition of data mining, but one widely accepted definition is: "data mining is the process of extracting implicit, previously unknown, but potentially useful knowledge or patterns from incomplete, noisy, ambiguous, large bodies of real-world application data." This definition carries several points: the data sources are large volumes of real data containing noise; what is extracted is new knowledge that satisfies users' needs; the knowledge must be understandable and effectively applicable; and the knowledge discovered is not required to hold for all domains, only to support a particular application problem. These characteristics show its value for data processing: it can handle massive, disorderly data effectively, find the useful knowledge hidden in that data, and ultimately serve decision-making. From a technical perspective, data mining uses a family of related algorithms and techniques to extract the information and knowledge people need from large amounts of data; the knowledge hidden behind the data can then be presented in forms such as concepts, patterns, laws, and rules.
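One concrete instance of extracting a "potentially useful pattern" is frequent-itemset counting, the core step of association-rule mining. The sketch below, with invented shopping-basket data, keeps every item pair that meets a support threshold; it is a toy version of one Apriori step, not a full mining algorithm:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count every item pair across transactions and keep those
    that appear at least min_support times (one frequent-itemset step)."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical shopping-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

print(frequent_pairs(baskets, min_support=3))
```

Here the surviving pair is the associative knowledge mentioned earlier: bread and milk co-occur often enough to suggest a dependency worth acting on.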
3. Predictive analysis capability.
Predictive analysis allows analysts to make forward-looking judgments based on the results of visual analysis and data mining. Predictive analysis is one of the application areas in which big data analysis will ultimately be realized; visual analysis and data mining are the preliminary work. Once the characteristics of, and relationships within, big data have been mined, a scientific data model can be established, and feeding new data into the model makes it possible to predict future data. Within data mining, in-memory computing efficiency drives predictive analysis, bringing real-time analysis and insight so that streams of live transaction data can be processed more quickly. Real-time transaction processing strengthens an enterprise's monitoring of information and also facilitates business management and the circulation of updated information. In addition, the predictive-analysis capability of big data can help enterprises analyze future data and evade risk effectively. After predictive analysis of big data, both individuals and enterprises can understand and manage big data better than before.
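The "establish a model from history, then feed it new data" step can be sketched in miniature with ordinary least squares on a single variable. The sales figures and the month index below are invented for illustration; real predictive models are of course far richer:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical monthly sales history (month index, units sold).
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

# Fit the model on history, then apply it to a future month.
a, b = fit_line(months, sales)
forecast = a * 6 + b  # forward-looking estimate for month 6
print(forecast)
```

This is "forward analysis" in the article's sense: the historical data determine the model's parameters, and the model is then asked about data that does not yet exist.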
Although the current development trend of big data is good, network big data places many stringent requirements on storage, transmission, and computing systems, and existing data-center technology can hardly meet its needs. The progress of science and technology therefore plays an important supporting role for big data, and the big data revolution calls for a revolutionary reconstruction of the IT industry. The network big data platform (including the computing, transmission, and storage platforms) is the bottleneck in the network big data technology chain; in particular, the high-speed transmission of network big data requires revolutionary new technology. In addition, since in the big data era any data may be valuable, valuable data becomes a selling point, which leads to competition and infringement. In fact, wherever there is data, there are security and privacy issues. With the arrival of the big data era, the growth of network data exposes personal data to great risks and threats; the network therefore needs more reasonable rules to ensure a secure network environment.