Statistical correlation coefficient (3) -- Kendall rank correlation coefficient and MATLAB implementation

1. Introduction

In statistics, the Kendall correlation coefficient is named after Maurice Kendall and is usually denoted by the Greek letter τ (tau). It is a statistic that measures the rank correlation between two random variables. The Kendall test is a non-parametric hypothesis test that uses the computed coefficient to test for statistical dependence between two random variables. The coefficient ranges from -1 to 1. When τ is 1, the two random variables have identical rank orderings. When τ is -1, their rank orderings are exactly reversed. When τ is 0, there is no rank association between the two random variables (for independent variables, τ is expected to be close to 0).

 

Assume that the two random variables are X and Y (they can also be viewed as two sets), that each has N elements, and that Xi and Yi denote their i-th values (1 <= i <= N). X and Y form a set of element pairs XY, which contains the elements (Xi, Yi) (1 <= i <= N). Take any two elements (Xi, Yi) and (Xj, Yj) of XY. When their rankings agree, that is, in case 1 (Xi > Xj and Yi > Yj) or case 2 (Xi < Xj and Yi < Yj), the two elements are said to be concordant (consistent). In case 3 (Xi > Xj and Yi < Yj) or case 4 (Xi < Xj and Yi > Yj), the two elements are said to be discordant (inconsistent). In case 5 (Xi = Xj) or case 6 (Yi = Yj), the two elements are neither concordant nor discordant.
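The six cases above can be sketched in MATLAB as follows. This is a minimal illustration and not the article's own implementation: the sample vectors and the names `s` and `nTied` are assumptions made for the example. The sign of the product (Xi - Xj) * (Yi - Yj) is positive in cases 1 and 2, negative in cases 3 and 4, and zero in cases 5 and 6.

```matlab
% Hypothetical sample data (not from the article); Y contains one tie.
X = [1 2 3 4 5];
Y = [2 1 4 4 5];

N = numel(X);
C = 0;      % number of concordant pairs (cases 1 and 2)
D = 0;      % number of discordant pairs (cases 3 and 4)
nTied = 0;  % pairs tied in X or in Y (cases 5 and 6)

for i = 1:N-1
    for j = i+1:N
        % Positive: same ordering in X and Y; negative: opposite ordering.
        s = sign(X(i) - X(j)) * sign(Y(i) - Y(j));
        if s > 0
            C = C + 1;
        elseif s < 0
            D = D + 1;
        else
            nTied = nTied + 1;
        end
    end
end

fprintf('C = %d, D = %d, tied = %d\n', C, D, nTied);
% prints: C = 8, D = 1, tied = 1
```

The three counts always add up to N * (N - 1) / 2, the total number of pairs (10 in this example).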

 

There are three formulas for computing the Kendall correlation coefficient.

 

Formula 1:

$$\tau_a = \frac{C - D}{\frac{1}{2} N (N - 1)}$$

C is the number of concordant (consistent) element pairs in XY (two elements form one pair), D is the number of discordant (inconsistent) element pairs in XY, and N is the number of elements.

Note: This formula applies only when neither X nor Y contains identical elements (every element in each set is unique).
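As a rough illustration of Formula 1, the sketch below counts C and D without explicit loops and then divides by the total number of pairs. The data are hypothetical and tie-free, and the names `S` and `tau_a` are assumptions made for the example. `bsxfun` is used so that the code also runs on MATLAB releases without implicit expansion.

```matlab
% Hypothetical tie-free data (not from the article).
X = [1 2 3 4 5];
Y = [3 1 2 5 4];
N = numel(X);

% S(i, j) = sign(Xi - Xj) * sign(Yi - Yj); +1 concordant, -1 discordant.
S = sign(bsxfun(@minus, X', X)) .* sign(bsxfun(@minus, Y', Y));
S = triu(S, 1);            % keep i < j so each pair is counted once

C = nnz(S > 0);            % concordant pairs
D = nnz(S < 0);            % discordant pairs

tau_a = (C - D) / (0.5 * N * (N - 1));
fprintf('C = %d, D = %d, tau_a = %.2f\n', C, D, tau_a);
% prints: C = 7, D = 3, tau_a = 0.40
```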

 

Formula 2:

$$\tau_b = \frac{C - D}{\sqrt{(N_3 - N_1)(N_3 - N_2)}}$$

Note: This formula applies when X or Y contains identical elements (if neither X nor Y contains identical elements, Formula 2 reduces to Formula 1).

C and D are the same as in Formula 1, and

$$N_3 = \frac{1}{2} N (N - 1), \qquad N_1 = \sum_{i=1}^{s} \frac{1}{2} U_i (U_i - 1), \qquad N_2 = \sum_{i=1}^{t} \frac{1}{2} V_i (V_i - 1)$$

N1 and N2 are computed from the sets X and Y respectively. N1 is used here as the example (N2 is computed analogously):

Group the identical elements of X into small sets. s is the number of such small sets in X (for example, if X contains the elements 1 2 3 4 3 2, then s is 2, because only 2 and 3 occur more than once), and Ui is the number of elements in the i-th small set. N2 is computed in the same way from Y, with t small sets and Vi elements in the i-th small set.
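The grouping step for N1 can be sketched as follows, using the example X = 1 2 3 4 3 2 from the text. This is only one possible way to do it, based on `unique` and `accumarray`. The names `counts` and `U` are assumptions made for the example.

```matlab
X = [1 2 3 4 3 2];                 % example values from the text

% idx(k) is the position of X(k) in the sorted list of distinct values.
[~, ~, idx] = unique(X);
counts = accumarray(idx(:), 1)';   % occurrences of each distinct value

U = counts(counts > 1);            % sizes of the small sets of identical elements
s = numel(U);                      % number of small sets
N1 = sum(0.5 * U .* (U - 1));      % tied pairs contributed by X

fprintf('s = %d, N1 = %d\n', s, N1);
% prints: s = 2, N1 = 2
```

Values that occur only once would contribute 0.5 * 1 * 0 = 0 to the sum, so filtering them out changes s but not N1.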

 

Formula 3:

$$\tau_c = \frac{C - D}{\frac{1}{2} N^2 \frac{M - 1}{M}}$$

Note: This formula does not account for the effect of identical elements in X or Y on the final statistic. Formula 3 is intended only for computing the correlation coefficient between random variables X and Y that are given in table form (described below).

C, D, and N are the same as in Formula 1. The parameter M is explained later.

 

All of the above computes the Kendall correlation coefficient from random variables expressed as sets. The following describes how to compute it from random variables expressed as a table.

 

Typically, the values of two random variables are arranged in a table. For example, suppose there are 10 samples and two indicators, X and Y, are measured on each sample (each indicator takes values from 1 to 3). A two-dimensional table (Table 1) is then obtained from the X and Y values of the samples.

From Table 1, X and Y can be written as sets:

X = {1, 1, 2, 2, 2, 2, 2, 3, 3};

Y = {1, 2, 1, 1, 2, 2, 3, 2, 3};

Once X and Y are in set form, Formula 1 or Formula 2 above can be used to compute the Kendall coefficient of X and Y (mind the applicability conditions of Formulas 1 and 2).

Conversely, given X and Y in set form, it is easy to obtain their table form.
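Going from the set form to a table can be sketched with `accumarray`, using the X and Y values exactly as they are listed above. It is assumed here that the table holds, in row a and column b, the number of samples with X = a and Y = b. The name `T` is an assumption made for the example.

```matlab
% X and Y as listed in the text above.
X = [1 1 2 2 2 2 2 3 3];
Y = [1 2 1 1 2 2 3 2 3];

% T(a, b) = number of samples with X == a and Y == b.
T = accumarray([X(:) Y(:)], 1);

disp(T);
% displays:
%      1     1     0
%      2     2     1
%      0     1     1
```

The entries of T sum to the number of listed pairs, which is a quick sanity check on the conversion.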

 

Note that Formula 2 can also be used to compute the Kendall correlation coefficient of two-dimensional variables given in table form. In general, however, Formula 2 is used for variables represented by a square table, while Formula 3 is used only for variables represented by a rectangular table. The meaning of M in Formula 3 can now be given: M is the smaller of the number of rows and the number of columns of the table. Table 1 has three rows and three columns, so M = 3.
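A minimal sketch of Formula 3, again using the X and Y values as listed earlier. M is taken as the smaller of the number of distinct X values (rows) and distinct Y values (columns), which assumes every row and column of the table is non-empty. The names `S` and `tau_c` are assumptions made for the example.

```matlab
% X and Y as listed in the text above.
X = [1 1 2 2 2 2 2 3 3];
Y = [1 2 1 1 2 2 3 2 3];
N = numel(X);

% S(i, j) = sign(Xi - Xj) * sign(Yi - Yj); keep i < j only.
S = sign(bsxfun(@minus, X', X)) .* sign(bsxfun(@minus, Y', Y));
S = triu(S, 1);
C = nnz(S > 0);            % concordant pairs
D = nnz(S < 0);            % discordant pairs

% Smaller of the number of rows and the number of columns of the table.
M = min(numel(unique(X)), numel(unique(Y)));

tau_c = (C - D) / (0.5 * N^2 * (M - 1) / M);
fprintf('C = %d, D = %d, M = %d, tau_c = %.4f\n', C, D, M, tau_c);
% prints: C = 13, D = 3, M = 3, tau_c = 0.3704
```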

 

2. Applicability

The Kendall correlation coefficient places the same requirements on the data as the Spearman correlation coefficient. For details, see the discussion of data requirements in "Statistical correlation coefficient (2) -- Spearman rank correlation coefficient and MATLAB implementation".

 

3. MATLAB implementation

Source code 1:

MATLAB implementation of the Kendall correlation coefficient (based on Formula 2)

function coeff = mykendall(X, Y)
% Computes the Kendall rank correlation coefficient (Formula 2).
%
% Input:
%   X: input numeric sequence
%   Y: input numeric sequence
%
% Output:
%   coeff: correlation coefficient of the two input sequences X and Y

if length(X) ~= length(Y)
    error('The two numeric sequences do not have the same length.');
end

% Convert X to a row sequence (no change if X is already a row sequence)
if size(X, 1) ~= 1
    X = X';
end
% Convert Y to a row sequence (no change if Y is already a row sequence)
if size(Y, 1) ~= 1
    Y = Y';
end

N = length(X);        % length of the sequences
XY = [X; Y];          % merged sequence, one column per element pair
C = 0;                % number of concordant pairs
D = 0;                % number of discordant pairs
XPair = ones(1, N);   % sizes of the subsets of identical elements in X
YPair = ones(1, N);   % sizes of the subsets of identical elements in Y

% Compute C and D
for i = 1 : N - 1
    for j = i + 1 : N
        % Only pairs that differ in both X and Y are counted
        if sum(XY(:, i) ~= XY(:, j)) == 2
            switch sum(XY(:, i) > XY(:, j))
                case {0, 2}   % both smaller or both larger: concordant
                    C = C + 1;
                case 1        % one larger, one smaller: discordant
                    D = D + 1;
            end
        end
    end
end

% Compute the value of each element of XPair
cont = 0;
while ~isempty(X)
    cont = cont + 1;
    index = find(X == X(1));
    XPair(cont) = length(index);
    X(index) = [];
end
% Compute the value of each element of YPair
cont = 0;
while ~isempty(Y)
    cont = cont + 1;
    index = find(Y == Y(1));
    YPair(cont) = length(index);
    Y(index) = [];
end

% Compute N1, N2, and N3 (unused entries of XPair and YPair are 1 and contribute 0)
N1 = sum(0.5 * (XPair .* (XPair - 1)));
N2 = sum(0.5 * (YPair .* (YPair - 1)));
N3 = 0.5 * N * (N - 1);

coeff = (C - D) / sqrt((N3 - N1) * (N3 - N2));

end % end of function mykendall

 

Source code 2:

Computing the Kendall correlation coefficient with MATLAB's built-in function

coeff = corr(X, Y, 'type', 'Kendall');

Note: When using the function provided by MATLAB to compute the Kendall correlation coefficient, make sure that X and Y are column vectors. The MATLAB function computes the Kendall correlation coefficient of the sequences using Formula 2.
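As a quick cross-check, the sketch below compares `corr` with a direct evaluation of Formula 2 on the X and Y values listed earlier. It assumes that `corr` from the Statistics and Machine Learning Toolbox is available. The helper `tiePairs` and the other variable names are assumptions made for the example.

```matlab
% X and Y as listed earlier, written as column vectors for corr.
X = [1 1 2 2 2 2 2 3 3]';
Y = [1 2 1 1 2 2 3 2 3]';
N = numel(X);

% Built-in computation (requires the Statistics and Machine Learning Toolbox).
tau_builtin = corr(X, Y, 'type', 'Kendall');

% Formula 2 by hand: count concordant and discordant pairs with i < j.
S = sign(bsxfun(@minus, X, X')) .* sign(bsxfun(@minus, Y, Y'));
S = triu(S, 1);
C = nnz(S > 0);
D = nnz(S < 0);

% Hypothetical helper: number of tied pairs within one vector.
tiePairs = @(v) sum(arrayfun(@(u) nnz(v == u) * (nnz(v == u) - 1) / 2, unique(v)));
N1 = tiePairs(X);
N2 = tiePairs(Y);
N3 = 0.5 * N * (N - 1);

tau_b = (C - D) / sqrt((N3 - N1) * (N3 - N2));
fprintf('corr: %.6f, Formula 2: %.6f\n', tau_builtin, tau_b);
```

If the two printed values agree, the hand computation and the built-in function are using the same tie correction.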

 

There is one more formula for the Kendall correlation coefficient that is not used here. It applies only when neither X nor Y contains identical elements, and it is in fact equivalent to Formula 1; see reference (3).

 

4. References

(1) http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient

(2) http://www.unesco.org/webworld/idams/advguide/Chapt4_2.htm

(3) http://www.wikidoc.org/index.php/Kendall_tau_rank_correlation_coefficient
