A Study of the Pearson Product-Moment Correlation Coefficient

Source: Internet
Author: User

The Pearson correlation coefficient is often used in similarity calculations. How should this coefficient be understood? What are its mathematical nature and meaning?

There are two ways to understand the Pearson correlation coefficient.

First, at the level of a high school textbook: convert each of the two data sets to Z-scores, then sum the products of the paired Z-scores and divide by the number of samples.

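The Z-score generally represents how far a value lies from the center of the data, as in a normal distribution, measured in standard deviations. It equals the value minus the mean, divided by the standard deviation. The standard deviation is the square root of the sum of squared differences between each value and the mean, divided by the number of samples. Substituting these definitions, the formula can be expanded to:

$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right)\left(\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}\right)}}$$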

The following is a Python implementation:

    from math import sqrt

    # Return the Pearson correlation coefficient for p1 and p2
    def sim_pearson(prefs, p1, p2):
        # Collect the items that both have rated
        si = {}
        for item in prefs[p1]:
            if item in prefs[p2]:
                si[item] = 1

        # Number of shared items
        n = len(si)

        # If the two have nothing in common, return 0
        if not n:
            return 0

        # Sum of all preferences
        sum1 = sum([prefs[p1][it] for it in si])
        sum2 = sum([prefs[p2][it] for it in si])

        # Sums of squares
        sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
        sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])

        # Sum of products
        pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])

        # Compute the Pearson score
        num = pSum - (sum1 * sum2 / n)
        den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
        if not den:
            return 0
        r = num / den
        return r
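As a cross-check on the Z-score definition described earlier, here is a minimal sketch that computes the coefficient directly from Z-scores using only the standard library. The function names `z_scores` and `pearson_from_z` and the sample ratings are illustrative assumptions, not part of the original code. The population standard deviation (dividing by n) is assumed, matching the description above.

```python
from math import sqrt


def z_scores(values):
    """Convert values to Z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation: square root of the mean squared deviation
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("Z-scores are undefined for constant data")
    return [(v - mean) / std for v in values]


def pearson_from_z(xs, ys):
    """Pearson r as the mean of the products of paired Z-scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical ratings given by two users to the same five items
x = [2.5, 3.5, 3.0, 3.5, 2.5]
y = [3.0, 3.5, 1.5, 5.0, 3.5]
r = pearson_from_z(x, y)
print(round(r, 4))
```

Because the Z-scores already remove the mean and the scale of each data set, the result always falls between -1 and 1, and two data sets related by an exact increasing linear function give a value of 1 (up to floating-point rounding).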

Second, at the level of university linear algebra, which is somewhat more involved: after each data set is centered by subtracting its mean, the coefficient can be seen as the cosine of the angle between the two data sets treated as vectors.

For data that has not been centered, the correlation coefficient is instead related to the angle between the two possible regression lines y = g_x(x) and x = g_y(y).

1. An ordered list of n numbers $(x_1, x_2, x_3, \dots, x_n)$ is called an n-dimensional vector, abbreviated as an uppercase X.

The modulus of vector X is defined as $|X| = \sqrt{x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2}$, and the inner product of vectors X and Y is $X \cdot Y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$.

2. The cosine of the angle between vectors X and Y is calculated with the following formula:

$$\cos\theta = \frac{X \cdot Y}{|X|\,|Y|}$$

3. The closer the cosine of the angle is to 1, the more similar the two vectors are.

The following is a Python implementation:

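    import math
    import numpy

    # Despite its name, this returns the cosine of the angle between u and v
    # (a similarity), so values close to 1 mean the vectors are similar.
    def cosine_distance(u, v):
        return numpy.dot(u, v) / (math.sqrt(numpy.dot(u, u)) * math.sqrt(numpy.dot(v, v)))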
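To connect the vector view back to the Pearson coefficient, the following sketch centers each data set before taking the cosine. It uses only the standard library; the helper names `dot`, `cosine`, `centered`, and `pearson_as_cosine` and the sample data are assumptions made for illustration.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))


def centered(values):
    """Subtract the mean so that the values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_as_cosine(xs, ys):
    """Pearson r computed as the cosine between the centered vectors."""
    return cosine(centered(xs), centered(ys))


# Hypothetical data: y is an exact linear function of x with an offset
x = [1.0, 2.0, 3.0, 4.0]
y = [12.0, 14.0, 16.0, 18.0]
print(cosine(x, y))             # below 1: the offset changes the raw angle
print(pearson_as_cosine(x, y))  # approximately 1.0 after centering
```

The raw cosine is sensitive to a constant offset added to either data set, while the centered version is not, which is why the two measures can disagree on the same data.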

From the explanations above, the assumptions behind the Pearson correlation coefficient can also be understood:

      • The relationship between the two variables is linear

      • Both variables are continuous

      • Each variable is normally distributed, and their bivariate (joint) distribution is also normal

      • The observations of the two variables are independent of one another
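The linearity assumption in the list above can be illustrated with a small sketch: a perfectly determined but non-linear relationship can still produce a coefficient of zero. The `pearson` helper and the sample data are assumptions chosen for illustration, and only the standard library is used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson r from deviations about the mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data, symmetric around zero
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # y depends linearly on x
quadratic = [v * v for v in x]    # y depends on x, but not linearly

print(pearson(x, linear))     # close to 1
print(pearson(x, quadratic))  # close to 0, despite the exact dependence
```

A coefficient near zero therefore only rules out a linear relationship; it does not show that the two variables are unrelated.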

In practice, statistical software generally outputs only two values: the correlation coefficient itself, which lies between -1 and 1, and an independent-samples test coefficient, which is used to check the consistency of the samples.
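As a rough sketch of what such a pair of outputs can look like, the code below computes the coefficient together with a t statistic. It assumes that the test value refers to the common t-test of the coefficient against zero, with n - 2 degrees of freedom, which the text above does not state explicitly. The function names and the sample data are likewise illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson r from deviations about the mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero (assumed test, n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("at least three samples are needed")
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded statistic
        return float("inf") if r > 0 else float("-inf")
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)
t = t_statistic(r, len(x))
print(round(r, 4), round(t, 4))
```

Turning the t statistic into a p-value requires the cumulative distribution of Student's t, which is not in the standard library, so that step is left out of this sketch.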
