Keywords: data analysis, data analysis method, data analysis process
Data analysts choose different analysis methods depending on the variables involved in a project. Commonly used methods include cluster analysis, factor analysis, correlation analysis, correspondence analysis, regression analysis, and analysis of variance. To use these methods proficiently, you first need to understand how each one is defined.
1. Cluster analysis
Cluster analysis is the process of grouping a collection of physical or abstract objects into classes made up of similar objects. Clustering assigns data to classes, or clusters, so that objects in the same cluster are highly similar while objects in different clusters differ markedly. Cluster analysis is exploratory: no classification standard has to be given in advance, and the grouping is derived automatically from the sample data. Different clustering methods often lead to different conclusions, and different researchers analyzing the same data set may not obtain the same number of clusters.
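As an illustration of grouping without a predefined standard, here is a minimal sketch of k-means, one common clustering method among many. The function name `kmeans`, the sample points, and the starting centroids are all assumptions made for this example; they do not come from the article.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Changing the number of starting centroids, or where they start, can change the grouping, which is one reason different analysts may reach different results on the same data.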
2. Factor analysis
Factor analysis is a statistical technique for extracting common factors from a group of variables. Its aim is to find the internal connections within a large amount of data and so reduce the difficulty of decision-making.
There are about 10 factor analysis methods, such as the centroid method, image analysis, maximum likelihood, least squares, alpha factoring, and Rao's canonical factoring. Most of these are approximate methods based on the correlation coefficient matrix; they differ in how the communality h² placed on the diagonal of that matrix is estimated. In sociological research, factor analysis often uses an iterative method based on principal component analysis.
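To make the role of the correlation matrix concrete, the sketch below extracts a single factor by the principal component approach: it finds the dominant eigenpair of a correlation matrix by power iteration, scales the eigenvector into loadings, and squares the loadings to get each variable's communality. This is a simplified stand-in, not the iterative procedure mentioned above, and the matrix `R` and the function name `first_principal_factor` are hypothetical.

```python
import math

def first_principal_factor(R, iterations=200):
    """Dominant eigenpair of correlation matrix R via power iteration."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply by R and renormalise to unit length.
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v'Rv estimates the eigenvalue.
        eigenvalue = sum(
            v[i] * sum(R[i][j] * v[j] for j in range(n)) for i in range(n))
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    communalities = [l * l for l in loadings]  # h^2 with one factor
    return eigenvalue, loadings, communalities

# Hypothetical correlation matrix for three observed variables.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
eigenvalue, loadings, communalities = first_principal_factor(R)
print(eigenvalue, loadings, communalities)
```

With a single factor, the communalities sum to the eigenvalue, that is, to the amount of total variance the factor accounts for.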
3. Correlation analysis
Correlation analysis studies whether a dependency exists between phenomena and, where it does, examines its direction and strength. Correlation is a non-deterministic relationship. For example, if X and Y record a person's height and weight, or the amount of fertilizer applied per hectare and the wheat yield per hectare, then X and Y are clearly related, but not to the extent that one of them exactly determines the other. That kind of relationship is correlation.
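The direction and strength of a linear relationship are commonly summarised by the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it directly from its definition; the height and weight values are made-up sample data, and `pearson_r` is a name chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements: related, but not perfectly so.
heights_cm = [150, 160, 165, 172, 180, 185]
weights_kg = [52, 58, 63, 70, 72, 85]
r = pearson_r(heights_cm, weights_kg)
print(r)  # positive and below 1: height does not exactly determine weight
```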
4. Correspondence analysis
Correspondence analysis, also called association analysis or R-Q factor analysis, reveals the links between variables by analyzing a contingency table (cross-tabulation) built from qualitative variables. It can reveal both the differences between the categories of a single variable and the correspondence between the categories of different variables. The basic idea is to represent the proportional structure of the rows and columns of the contingency table as points in a lower-dimensional space.
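The "proportional structure" of a contingency table can be sketched as follows: each row is converted to a profile (its cells divided by the row total), and the table's departure from independence is measured by the chi-square statistic, which divided by the grand total gives the total inertia. A full correspondence analysis would go on to decompose that inertia into a few dimensions, a step omitted here. The table values and the function name `profiles_and_inertia` are hypothetical.

```python
def profiles_and_inertia(table):
    """Row profiles, chi-square statistic and total inertia of a table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Row profile: the row's cells as proportions of the row total.
    row_profiles = [[cell / rt for cell in row]
                    for row, rt in zip(table, row_totals)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return row_profiles, chi2, chi2 / n

# Hypothetical cross-tabulation: rows are age groups, columns are products.
table = [[30, 10, 5],
         [15, 25, 10],
         [5, 15, 35]]
row_profiles, chi2, inertia = profiles_and_inertia(table)
print(row_profiles)
print(chi2, inertia)
```

Rows with similar profiles would be plotted close together in the low-dimensional map; a total inertia of zero would mean the rows and columns are independent and there is no structure to display.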
5. Regression analysis
Regression analysis is a statistical method for studying the dependence of a random variable Y on another variable X or on a group of variables. It determines the quantitative relationship between two or more variables and is widely used. By the number of independent variables, it is divided into simple (one-variable) regression and multiple regression; by the type of relationship between the independent and dependent variables, it is divided into linear regression and nonlinear regression.
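The simplest case, linear regression with one independent variable, can be sketched with ordinary least squares. The fertilizer and yield figures below are invented for illustration, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: fertilizer (kg/ha) and wheat yield (t/ha).
fertilizer = [50, 100, 150, 200, 250]
wheat_yield = [3.1, 3.9, 4.6, 5.2, 6.1]
slope, intercept = fit_line(fertilizer, wheat_yield)
predicted = intercept + slope * 175  # estimated yield at 175 kg/ha
print(slope, intercept, predicted)
```

Unlike a correlation coefficient, the fitted line gives a quantitative rule for estimating Y from a given X.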
6. Analysis of variance
Analysis of variance, also known as "variability analysis" or the "F test", was invented by R.A. Fisher and is used to test the significance of differences between the means of two or more samples. Because many factors are at work, the data obtained in a study fluctuate. The causes of fluctuation fall into two categories: uncontrollable random factors, and controllable factors imposed in the study that affect the results. Analysis of variance starts from the variance of the observed variable and determines which of the control variables have a significant effect on it.
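For a single controllable factor, the split into the two sources of fluctuation can be sketched as a one-way analysis of variance: variation between group means is compared with variation inside the groups, and their ratio is the F statistic. The groups below are invented sample data and `one_way_anova_f` is a name chosen for this example; the sketch stops at F and does not look up a p-value.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: attributable to the controlled factor.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group variation: attributable to random factors.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical results under three treatment levels.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

The resulting value would be compared with the F distribution on (k - 1, n - k) degrees of freedom; a large F suggests that the controlled factor, not random variation alone, drives the differences between the means.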