[Repost] A detailed look at the Pearson, Spearman, and Kendall correlation coefficients

Source: Internet
Author: User

There are many coefficients for measuring correlation, and they differ in how they are calculated and in their properties.

Correlation measures for continuous variables:

For continuous variables, the product-moment correlation coefficient, also called the Pearson correlation coefficient, is generally used. It is appropriate only when the two variables are linearly related. Its value lies between -1 and 1. When the correlation between the two variables is at its strongest, the points of the scatter plot fall on a straight line and the coefficient equals 1 or -1, with the sign indicating the direction of the relationship. If the two variables are completely unrelated, the coefficient is 0.
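As a minimal sketch of the definition above (not taken from the original article), the coefficient can be computed from the deviations of each variable from its mean. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points on a rising straight line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points on a falling straight line
print(r_up, r_down)
```

With points lying exactly on a line, the result is 1 for the rising line and -1 for the falling one, which matches the two extremes described above.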

As a parametric method, product-moment correlation analysis has certain conditions of applicability. When the data cannot meet these conditions, the analyst can consider using the Spearman rank correlation coefficient instead.

Correlation measures for ordinal variables:

For ordered categorical data, high correlation (or concordance) means that observations at a high level of the row variable also tend to be at a high level of the column variable, and likewise for low levels. If the row variable level is high while the column variable level is low, the pair is called discordant.
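One way to make this idea concrete is to count concordant and discordant pairs in an ordered row-by-column frequency table. The sketch below is an illustration only: the function name `count_pairs` and the 3 x 3 table are hypothetical, and the levels are assumed to be listed from low to high.

```python
def count_pairs(table):
    """Count concordant and discordant pairs in an ordered r x c frequency table.

    table[i][j] is the number of observations at row level i and column
    level j, with levels listed from low to high.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # cells at a strictly higher row level
                for m in range(cols):
                    if m > j:                  # column level is higher too: concordant
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                # column level is lower: discordant
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant

# Hypothetical table: most observations sit near the main diagonal
table = [
    [10, 3, 1],
    [4, 8, 3],
    [1, 2, 9],
]
c, d = count_pairs(table)
print(c, d)  # 372 47
```

Concordant pairs far outnumber discordant ones here, which is what a strong positive association between two ordered variables looks like.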

Simple Correlation Analysis:

When the points for two continuous variables follow a straight-line trend in the scatter plot, the variables can be considered to have a linear correlation, also known as a simple correlation. The Pearson correlation coefficient, also called the product-moment correlation coefficient, is a commonly used index for quantifying the degree of linear correlation.

Applicable conditions for the product-moment correlation coefficient:

The first question to consider in a correlation analysis is whether the two variables could plausibly be related at all. Only if the answer is yes is it worth proceeding to a quantitative analysis. The following points also need attention:

1. The product-moment correlation coefficient is suited to linear relationships. For curvilinear and other more complex relationships, the size of the coefficient does not represent the strength of the association.

2. Extreme values in the sample have a large influence on the product-moment correlation coefficient, so they should be examined and handled carefully. If necessary, they can be removed or the variable can be transformed, so that one or two values do not lead to a wrong conclusion.

3. The product-moment correlation coefficient requires the two variables to follow a bivariate normal distribution. Note that this is not simply a requirement that X and Y are each normally distributed; the two variables must jointly follow a bivariate normal distribution.

Of the three requirements above, the first two are the most stringent. The third is more lenient: the coefficient is relatively robust to violations of it.
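The first two conditions can be illustrated with a small sketch. The data below are made up for the purpose, and `pearson_r` is a helper defined here, not something from the original text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Condition 1: y is fully determined by x (y = x^2), yet the relationship
# is curved and symmetric, so the linear coefficient is 0.
x = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(x, [v * v for v in x])

# Condition 2: a single extreme point can dominate the coefficient.
x_flat = [1, 2, 3, 4, 5, 6]
y_flat = [2, 1, 2, 1, 2, 1]                  # hypothetical data with no upward trend
r_without = pearson_r(x_flat, y_flat)        # weakly negative
r_with = pearson_r(x_flat + [30], y_flat + [30])  # one outlier added

print(r_curve, r_without, r_with)
```

In this made-up data, adding the single point (30, 30) moves the coefficient from weakly negative to close to 1, which is why extreme values deserve a look before the coefficient is interpreted.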

The Spearman correlation coefficient, also known as the rank correlation coefficient, carries out a linear correlation analysis on the ranks of the two variables rather than on their raw values. It makes no assumption about the distribution of the original variables and is a nonparametric method, so its range of application is much wider than that of the Pearson correlation coefficient. It can be calculated even when the original data are ordinal. It can also be calculated for data that satisfy the conditions for the Pearson correlation coefficient, but its statistical efficiency is then lower than Pearson's, meaning that a correlation that really exists is harder to detect.
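A minimal sketch of this idea: replace each value by its rank (tied values share the average of their positions) and then apply the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` and the sample data are illustrative assumptions.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1    # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]      # monotonic but not linear
rho = spearman_rho(x, y)     # ranks agree perfectly
r = pearson_r(x, y)          # below 1 because the trend is curved
print(rho, r)
```

Because only the ordering matters, any strictly increasing relationship gives a Spearman coefficient of 1, while the Pearson coefficient on the raw values stays below 1.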

Kendall's tau-b rank correlation coefficient is a measure of association for categorical variables and is suitable when both variables are ordinal.
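The sketch below computes tau-b in its usual tie-corrected form, (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are the numbers of concordant and discordant pairs and Tx, Ty count pairs tied on only one variable. The function name and the ordinal scores are hypothetical; the quadratic pair loop is chosen for readability, not speed.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordered scores."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue               # tied on both variables: not counted
            elif dx == 0:
                ties_x += 1            # tied on x only
            elif dy == 0:
                ties_y += 1            # tied on y only
            elif dx * dy > 0:
                concordant += 1        # both differences have the same sign
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom

# Hypothetical ordinal scores (1 = low, 3 = high) with many ties
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 2, 3, 3, 3]
tau = kendall_tau_b(x, y)
print(round(tau, 4))  # 0.7833
```

For these scores there are 9 concordant pairs, no discordant pairs, 2 pairs tied on x only, and 3 tied on y only, so tau-b is 9 / sqrt(11 * 12).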

Simple correlation and partial correlation share one prerequisite: the analyst should already have some understanding of the background of the data being analyzed. On that basis, the product-moment correlation coefficient is calculated to confirm the correlation quantitatively. The partial correlation coefficient works the same way, except that it additionally takes the influence of other factors into account. Sometimes, however, the background knowledge is insufficient before the analysis and the study is exploratory in nature. In that case it is often necessary first to examine the differences or similarities among the indicators or cases to gain a preliminary understanding of the data, and then to decide how to carry out an in-depth analysis based on the results.
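For the simplest case of one controlled factor, the partial correlation can be written in terms of the three pairwise product-moment coefficients. The sketch below uses that standard first-order formula; the function name and the three input correlations are hypothetical values chosen for illustration.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations: x and y both correlate strongly with z
r_partial = partial_corr(r_xy=0.6, r_xz=0.7, r_yz=0.8)
print(round(r_partial, 4))  # 0.0934
```

In this made-up case the simple correlation of 0.6 shrinks to about 0.09 once z is held fixed, which suggests that most of the apparent association between x and y runs through z.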

The Distances procedure can be used to calculate the distance (or similarity) between records (or variables). Depending on the type of variable, many distance and similarity measures are available for the user to choose from. Because this module is only a preliminary analysis step, it does not report the usual P-value; it gives only the distances between variables or records and leaves the user to judge how similar they are.

The Distances procedure calculates either distance measures or similarity measures; the choice is made in the main dialog box.

Distance measures: the available measures depend on the data type, which falls into three kinds: continuous variables, frequency-table (count) data, and binary variables.

Similarity measures: these are essentially the correlation measures described above, only in more detail, and fall mainly into two kinds: those for continuous (measurement) data and those for binary variables.
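To give a feel for what such measures look like, here is a sketch of one common distance for continuous data (Euclidean distance) and one common similarity for binary data (the simple matching coefficient). These two are picked only as familiar examples; the text does not say which measures the procedure offers or uses by default, and the records are made up.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two records of continuous values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def simple_matching(a, b):
    """Share of positions at which two 0/1 records agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)

d = euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0])
s = simple_matching([1, 0, 1, 1], [1, 1, 1, 0])
print(d, s)  # 5.0 0.5
```

A distance of 0 means identical records, while a similarity of 1 means identical records, so the two kinds of measure read in opposite directions.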

Correlation and regression describe different aspects of the relationship between two variables. Simple regression analysis looks for the linear trend in how the value of the dependent variable changes with the independent variable, and finds the corresponding straight line in the scatter plot; the equation of that line is called the linear regression equation.

The regression equation describes the relationship between the two variables more precisely. Besides describing the relationship, the regression equation can also be used for prediction and control.
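A minimal sketch of fitting such a line by ordinary least squares and using it for prediction follows. The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the fitted line passes through the means
    return intercept, slope

# Hypothetical observations that roughly follow y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(x, y)

# Prediction: estimate y at a new value of the independent variable
y_at_6 = intercept + slope * 6
print(round(intercept, 2), round(slope, 2), round(y_at_6, 2))  # 0.09 1.97 11.91
```

The slope reuses the same cross-product and sum-of-squares terms that appear in the correlation coefficient, which is why the two analyses are so closely related.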

Statistical inference for unordered categorical variables: the χ² test

The χ² test is mainly used to test whether the distribution of an unordered categorical variable is the same across two or more groups. It can also be used to test whether the probability of each category of a categorical variable equals a specified probability, and whether the distribution of a continuous variable conforms to a certain theoretical distribution. Its main uses are:

1. Test whether the distribution of a continuous variable is consistent with a certain theoretical distribution.

2. Test whether the probability of occurrence of a categorical variable is equal to the set probability.

3. Test whether two categorical variables are independent of each other.

4. Test whether two categorical variables are independent of each other after controlling for one or more other classification factors.

5. Test whether the results of two methods are consistent.
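The third use above, testing the independence of two categorical variables, can be sketched as follows. The code computes the Pearson χ² statistic without continuity correction from a frequency table; the function name and the 2 x 2 counts are hypothetical. The P-value line uses a closed form that holds only for one degree of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table: rows are groups, columns are outcomes
table = [
    [30, 10],
    [20, 40],
]
stat, df = chi_square_independence(table)

# For df = 1 only, the upper-tail probability is erfc(sqrt(stat / 2))
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 3), df, p_value)
```

Here every expected count is 20 or 30, the statistic is 50/3 (about 16.67) on 1 degree of freedom, and the small P-value argues against independence of the two variables in this made-up table.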
