Spearman's Rank Correlation Coefficient

Source: Internet
Author: User
From Wikipedia, the free encyclopedia

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter $\rho$ (rho) or $r_s$, is a non-parametric measure of correlation. That is, it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.



  • 1 Calculation
  • 2 Example
  • 3 Determining significance
  • 4 Correspondence analysis based on Spearman's rho
  • 5 See also
  • 6 Notes
  • 7 References
  • 8 External links

Calculation

In principle, $\rho$ is simply a special case of the Pearson product-moment coefficient in which two sets of data $X_i$ and $Y_i$ are converted to rankings $x_i$ and $y_i$ before calculating the coefficient.[1] In practice, however, a simpler procedure is normally used to calculate $\rho$. The raw scores are converted to ranks, and the differences $d_i$ between the ranks of each observation on the two variables are calculated.

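If there are no tied ranks, i.e. no two observations share the same value of $X$ or the same value of $Y$, then $\rho$ is given by:

$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where:

$d_i = x_i - y_i$ = the difference between the ranks of corresponding values $X_i$ and $Y_i$, and
$n$ = the number of values in each data set (same for both sets).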
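The claim that the simpler procedure is a special case of the Pearson coefficient on ranks can be checked numerically. The following is a minimal Python sketch, not part of the original article: the helper names (`ranks_no_ties`, `pearson`, `spearman_shortcut`) and the small data set are hypothetical, and the ranking helper assumes that all values are distinct.

```python
from math import sqrt

def ranks_no_ties(values):
    """Rank values 1..n in ascending order; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def spearman_shortcut(X, Y):
    """rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); valid only without ties."""
    x, y = ranks_no_ties(X), ranks_no_ties(Y)
    n = len(X)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data with no ties (illustration only)
X = [3.1, 1.4, 2.7, 5.9, 4.2]
Y = [10.0, 12.5, 9.0, 20.0, 11.0]

rho_shortcut = spearman_shortcut(X, Y)
rho_pearson = pearson(ranks_no_ties(X), ranks_no_ties(Y))
print(rho_shortcut, rho_pearson)
```

For this data both routes give a value of about 0.4, which is what one would expect when no ranks are tied.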

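If tied ranks exist, the simplified formula no longer applies, and the classic Pearson correlation coefficient between the ranks has to be used instead:[1]

$$\rho = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$

One has to assign the same rank to each of the equal values. It is the average of their positions in the ordered list of values.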

An example of averaging ranks

In the table below, notice how the rank of values that are the same is the mean of what their ranks would otherwise be.

Variable $X_i$   Position in descending order   Rank $x_i$
0.8              5                              5
1.2              4                              3.5
1.2              3                              3.5
2.3              2                              2
18               1                              1

In this case we cannot use the shortcut formula (because of the tied ranks in the data) and must use the second, product-moment form.
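The averaging of tied ranks shown in the table can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `average_ranks` and its `descending` flag are assumptions made for this example, not something defined in the article.

```python
def average_ranks(values, descending=False):
    """Assign ranks 1..n; equal values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next value equals the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # 1-based positions start+1 .. end+1 all receive their mean
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Values from the table above, ranked in descending order as in the table
values = [0.8, 1.2, 1.2, 2.3, 18]
tied_ranks = average_ranks(values, descending=True)
print(tied_ranks)  # [5.0, 3.5, 3.5, 2.0, 1.0]
```

The two values of 1.2 occupy positions 3 and 4, so each receives the rank 3.5. Ranks produced this way would then be passed to the product-moment formula rather than the shortcut formula.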

Example

The raw data used in this example are shown below, where we want to calculate the correlation between the IQ of a person and the number of hours that person spends in front of the TV per week.

IQ, $X_i$   Hours of TV per week, $Y_i$
106         7
86          0
100         27
101         50
99          28
103         29
97          20
113         12
112         6
110         17

The first step is to sort this data by the second column ($Y_i$). Next, two more columns are created ($x_i$ and $y_i$). The last of these columns ($y_i$) is assigned 1, 2, 3, ..., $n$, and then the data is sorted by the first original column ($X_i$). The first of the newly created columns ($x_i$) is assigned 1, 2, 3, ..., $n$. Then a column $d_i$ is created to hold the differences between the two rank columns ($x_i$ and $y_i$). Finally, another column should be created. This is just column $d_i$ squared.

After doing this process with the example data you should end up with something like:

IQ, $X_i$   Hours of TV per week, $Y_i$   Rank $x_i$   Rank $y_i$   $d_i$   $d_i^2$
86          0                             1            1            0       0
97          20                            2            6            -4      16
99          28                            3            8            -5      25
100         27                            4            7            -3      9
101         50                            5            10           -5      25
103         29                            6            9            -3      9
106         7                             7            3            4       16
110         17                            8            5            3       9
112         6                             9            2            7       49
113         12                            10           4            6       36

The values in the $d_i^2$ column can now be added to find $\sum d_i^2 = 194$. The value of $n$ is 10. So these values can now be substituted back into the equation,

$$\rho = 1 - \frac{6 \times 194}{10(10^2 - 1)}$$

which evaluates to $\rho = -0.175758$, which shows that the correlation between IQ and hours spent watching TV is very low (barely any correlation). In the case of ties in the original values, this formula should not be used. Instead, the Pearson correlation coefficient should be calculated on the ranks (where ties are given ranks, as described above).
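The worked example can be reproduced with a short Python sketch. The data are taken from the table above; the variable names and the `ranks_no_ties` helper are assumptions introduced for illustration, and the helper relies on the fact that this data set contains no ties.

```python
def ranks_no_ties(values):
    """Rank values 1..n in ascending order; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

# Raw data from the example (IQ and hours of TV per week)
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

x = ranks_no_ties(iq)        # rank column x_i
y = ranks_no_ties(tv_hours)  # rank column y_i
d_squared = [(xi - yi) ** 2 for xi, yi in zip(x, y)]

n = len(iq)
sum_d2 = sum(d_squared)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 6))  # 194 -0.175758
```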

Determining significance

The modern approach to testing whether an observed value of $\rho$ is significantly different from zero (we will always have $1 \geq \rho \geq -1$) is to calculate the probability that it would be greater than or equal to the observed $\rho$, given the null hypothesis, by using a permutation test. This approach is almost always superior to traditional methods, unless the data set is so large that computing power is not sufficient to generate permutations, or unless an algorithm for creating permutations that are logical under the null hypothesis is difficult to devise for the particular case (but usually these algorithms are straightforward).
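As a rough illustration of the permutation approach, the sketch below estimates the one-sided probability for the IQ and TV data from the example. Several choices here are assumptions rather than part of the article: permutations are sampled at random (Monte Carlo) instead of enumerated, the function and parameter names are hypothetical, the observed arrangement is counted as one permutation, and the data are assumed to contain no ties.

```python
import random

def ranks_no_ties(values):
    """Rank values 1..n in ascending order; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def sum_squared_rank_diffs(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def permutation_p_value(X, Y, n_resamples=5000, seed=0):
    """Estimate P(rho >= observed rho) under the null by shuffling one rank column."""
    rng = random.Random(seed)
    x, y = ranks_no_ties(X), ranks_no_ties(Y)
    observed = sum_squared_rank_diffs(x, y)
    shuffled = y[:]
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        # Without ties, rho decreases as sum(d^2) grows, so
        # rho >= observed rho exactly when sum(d^2) <= observed sum(d^2)
        if sum_squared_rank_diffs(x, shuffled) <= observed:
            hits += 1
    # Count the observed arrangement itself as one of the permutations
    return (hits + 1) / (n_resamples + 1)

iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
p_value = permutation_p_value(iq, tv_hours)
print(p_value)
```

Comparing the integer sums of squared rank differences avoids floating-point comparisons of $\rho$ itself. For very small samples, all $n!$ permutations could be enumerated instead of sampled.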

Although the permutation test is often trivial to perform for anyone with computing resources and programming experience, traditional methods for determining significance are still widely used. The most basic approach is to compare the observed $\rho$ with published tables for various levels of significance. This is a simple solution if the significance only needs to be known within a certain range or less than a certain value, as long as tables are available that specify the desired ranges. A reference to such a table is given below. However, generating these tables is computationally intensive, and complicated mathematical tricks have been used over the years to generate tables for larger and larger sample sizes, so it is not practical for most people to extend existing tables.

An alternative approach available for sufficiently large sample sizes is an approximation to the Student's t-distribution with $n - 2$ degrees of freedom. For sample sizes above about 20, the variable

$$t = \frac{\rho}{\sqrt{(1 - \rho^2)/(n - 2)}}$$

has a Student's t-distribution in the null case (zero correlation). In the non-null case (i.e. to test whether an observed $\rho$ is significantly different from a theoretical value, or whether two observed $\rho$s differ significantly) tests are much less powerful, though the t-distribution can again be used.
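The t approximation is simple to evaluate. The following Python sketch uses a hypothetical function name, `spearman_t`, and a hypothetical sample of 25 observations with $\rho = 0.5$. It also evaluates the statistic for the IQ example, purely to show the arithmetic, since that sample of 10 is below the size at which the approximation is recommended.

```python
from math import sqrt

def spearman_t(rho, n):
    """t = rho / sqrt((1 - rho^2) / (n - 2)), compared against n - 2 degrees of freedom."""
    return rho / sqrt((1 - rho ** 2) / (n - 2))

# Hypothetical large-sample case: rho = 0.5 observed on n = 25 pairs
t_large = spearman_t(0.5, 25)

# IQ / TV example (n = 10 is too small for the approximation; arithmetic only)
t_example = spearman_t(-0.175758, 10)

print(round(t_large, 3), round(t_example, 3))
```

The resulting value would then be looked up in a Student's t table with $n - 2$ degrees of freedom.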

A generalization of the Spearman coefficient is useful in the situation where there are three or more conditions, a number of subjects are all observed in each of them, and we predict that the observations will have a particular order. For example, a number of subjects might each be given three trials at the same task, and we predict that performance will improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page and is usually referred to as Page's trend test for ordered alternatives.
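The article does not give the statistic used by Page's test. As commonly defined, it is $L = \sum_j j R_j$, where $R_j$ is the sum over subjects of the within-subject rank of condition $j$, and the conditions are numbered in the predicted order. The sketch below computes $L$ under that definition for made-up scores: the data and function names are hypothetical, ties within a subject are assumed absent, and judging significance would still require tables or an approximation not shown here.

```python
def ranks_no_ties(values):
    """Rank values 1..k in ascending order; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def page_l(scores):
    """Page's L: rank within each subject, sum ranks per condition, weight by predicted order."""
    ranked = [ranks_no_ties(row) for row in scores]
    column_sums = [sum(col) for col in zip(*ranked)]
    l_stat = sum(j * r for j, r in enumerate(column_sums, start=1))
    return l_stat, column_sums

# Hypothetical scores: 4 subjects (rows) on 3 trials (columns),
# with performance predicted to improve from trial 1 to trial 3
scores = [
    [10, 12, 15],
    [8, 9, 7],
    [20, 25, 30],
    [5, 6, 9],
]
l_stat, column_sums = page_l(scores)
print(l_stat, column_sums)  # 53 [5, 9, 10]
```

A larger $L$ indicates closer agreement with the predicted order. For 4 subjects and 3 conditions the maximum possible value is 56.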

Correspondence analysis based on Spearman's rho

Classic correspondence analysis is a statistical method which gives a score to every value of two nominal variables, in such a way that Pearson's correlation coefficient between them is maximized.

There exists an equivalent of this method, called grade correspondence analysis, which maximizes Spearman's rho or Kendall's tau.[2]

See also
  • Kendall tau rank correlation coefficient
  • Rank correlation
  • Chebyshev's sum inequality, rearrangement inequality (these two articles may shed light on the mathematical properties of Spearman's $\rho$)
  • Pearson product-moment correlation coefficient, a similar correlation method that instead relies on the data being linearly correlated

Notes
  1. Myers, Jerome L.; Arnold D. Well (2003). Research Design and Statistical Analysis, second edition, Lawrence Erlbaum, p. 508. ISBN 0805840370.
  2. Kowalczyk, T.; Pleszczyńska E., Ruland F. (eds.) (2004). Grade Models and Methods for Data Analysis with Applications for the Analysis of Data Populations, Studies in Fuzziness and Soft Computing vol. 151. Berlin Heidelberg New York: Springer Verlag. ISBN 9783540211204.

References
