Data analysis: the process of analyzing a large amount of collected data with appropriate statistical methods, extracting useful information, forming conclusions, and studying and summarizing the data in detail. It is also a supporting process of the quality management system. In practice, data analysis helps people make judgments so that they can take appropriate action. The mathematical foundations of data analysis were established in the early 20th century, but the methods only became practical, and widely adopted, with the advent of computers. Data analysis is therefore a product of combining mathematics and computer science.
Data mining: sometimes also rendered as data exploration. It is one step in knowledge discovery in databases (KDD). Data mining generally refers to the process of using algorithms to search for information hidden in a large amount of data. It is usually associated with computer science and draws on many methods, such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on rules of thumb from past experience), and pattern recognition.
To distinguish the two in more detail, consider the following aspects:
Data analysis can be understood in a broad sense and in a narrow sense. Data analysis in the broad sense includes both narrow-sense data analysis and data mining. When people say "data analysis", they usually mean the narrow sense.
Data analysis (narrow sense):
(1) Definition: In simple terms, data analysis means analyzing data. More precisely, it refers to using appropriate statistical methods and tools to process and analyze collected data according to the purpose of the analysis, extracting valuable information so that the data can be put to use.
(2) Role: It mainly serves three purposes: status quo analysis, cause analysis, and predictive analysis (quantitative). Data analysis has a clear goal: form a hypothesis first, then use the data to verify whether the hypothesis holds, and draw conclusions accordingly.
(3) Methods: It mainly uses common methods such as comparative analysis, group analysis, cross analysis, and regression analysis.
(4) Results: Data analysis generally yields summary indicators, such as totals and averages. These indicators must be interpreted in the business context before the data can deliver its value.
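The group analysis and indicator results described above can be sketched in a few lines. This is only a minimal illustration using the Python standard library; the order records, the field names (region, amount), and the helper summarize_by_group are hypothetical and not part of any particular tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records; field names and values are made up for illustration.
orders = [
    {"region": "north", "amount": 120.0},
    {"region": "north", "amount": 80.0},
    {"region": "south", "amount": 200.0},
    {"region": "south", "amount": 100.0},
    {"region": "south", "amount": 60.0},
]

def summarize_by_group(records, key, value):
    """Group analysis: return {group: {"count", "sum", "mean"}} for one numeric field."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row[value])
    return {
        name: {"count": len(vals), "sum": sum(vals), "mean": mean(vals)}
        for name, vals in groups.items()
    }

summary = summarize_by_group(orders, "region", "amount")
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The output is just a set of indicators per group. Whether a higher average in one region matters is a business question that the numbers alone do not answer.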
Data mining:
(1) Definition: Data mining refers to the process of extracting previously unknown, valuable information and knowledge from a large amount of data using statistics, artificial intelligence, machine learning, and other methods.
(2) Function: Data mining focuses on solving four types of problems: classification, clustering, association, and prediction (quantitative and qualitative). Its emphasis is on finding unknown patterns and regularities. The data mining cases people often cite are beer with diapers, or condoms with chocolate: such associations are not known in advance, but they are very valuable information.
(3) Methods: It mainly uses methods from statistics, artificial intelligence, and machine learning, such as decision trees, neural networks, association rules, and cluster analysis.
(4) Results: Data mining outputs models or rules, from which scores or labels can be obtained. Scores include churn probability, total score, similarity, and predicted value; labels include high-, medium-, and low-value users, churn versus non-churn, and good, medium, or poor credit.
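To make the association case above more concrete, here is a minimal sketch of pairwise association rules scored by support and confidence. The baskets, the thresholds, and the function pair_rules are assumptions made up for illustration; practical association mining typically uses algorithms such as Apriori or FP-Growth, which this sketch does not implement.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) tuples for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample baskets, the rules that pass both thresholds are beer -> diapers, diapers -> beer, chips -> beer, and milk -> diapers. Nothing about the pairs is specified in advance; the rules are the output, which is the point of the beer-and-diapers story.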
Taken together, data analysis (narrow sense) and data mining are essentially the same: both discover knowledge about the business (valuable information) from data in order to support business operations, improve products, and help companies make better decisions. Together, data analysis (narrow sense) and data mining therefore make up data analysis in the broad sense.
Data analysis is a method of operating on data, or an algorithm. Its goal is to sort, filter, and process data according to a priori constraints, thereby obtaining information.
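As a rough sketch of this view, the snippet below filters, sorts, and reduces data under a constraint that is fixed before the results are seen. The sensor readings, the valid range, and the function analyze are hypothetical and chosen only for illustration.

```python
# Hypothetical sensor readings; the values are made up for illustration.
readings = [21.5, -999.0, 19.8, 22.1, 150.0, 20.4]

# A priori constraint (an assumption), fixed before looking at the results:
# only temperatures in a plausible range are accepted.
VALID_RANGE = (-40.0, 60.0)

def analyze(values, valid_range):
    """Filter by the fixed constraint, sort, and reduce the data to a few facts."""
    low, high = valid_range
    kept = sorted(v for v in values if low <= v <= high)
    return {
        "kept": kept,
        "rejected": len(values) - len(kept),
        "min": kept[0] if kept else None,
        "max": kept[-1] if kept else None,
    }

result = analyze(readings, VALID_RANGE)
print(result)
```

The constraint is not tuned after the fact to make the result look better; the analysis reports whatever the data yields under it.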
Data mining is the analysis of the value of the information obtained from data analysis.
Data analysis and data mining can even be recursive. The result of data analysis is information, which in turn serves as the data to be mined. Data mining itself uses data analysis methods, and so the cycle repeats.
The biggest difference between data analysis and data mining is that data analysis starts from the input data and processes it under a priori constraints; it does not adjust the processing according to what the conclusion turns out to be. Image recognition, for example, belongs to data analysis. Suppose you set out to analyze a face, and the data, passed through your a priori constraints, comes out as a cat's face. That does not mean your data analysis is at fault; you have to accept the result and respect the facts. The focus of data analysis is therefore on the validity, authenticity, and correctness of the a priori constraints.
Data mining is different. Data mining is about extracting the value of information. It naturally does not consider the data for its own sake, but whether the data is valuable. Given a batch of data, trying out different ways of mining and evaluating its value is data mining. The biggest contrast with data analysis is that here you need to adjust your a priori constraints and analyze the data again. These constraints are not tied to characteristics of the data source itself, as a signal-to-noise-ratio processing algorithm would be. Instead, they are built from the valuable content you expect to obtain. You then observe whether the data responds correctly to the constraint.
Big data is data mining over massive amounts of Internet data, whereas data mining more often works on data internal to an enterprise or industry. Data analysis performs targeted analysis and diagnosis. Big data is used to analyze trends and developments, while data mining mainly uncovers problems and supports diagnosis.