Variance, covariance and Correlation

Source: Internet
Author: User

Recently I have been studying the r language, which involves some functions involved in association analysis. Three of them are associated with each other:

VaR: Calculate the variance of a Variable

Cov: Calculates the covariance of two variables.

Cor: Calculate the correlation between two variables

I have learned all these concepts in theoretical schools, but I can't think of them at all. What's more, I don't know why these statistical concepts should exist. Now I have to search for them on DU Niang.PeriodHope,Variance,Standard Deviation,CovarianceAndCorrelation.


Expected Value

In probability theory and statistics, the expected value of a discrete random variable (or mathematical expectation, orMean Value(Expectation) is the probability of each possible result in the experiment multiplied by the sum of the results. In other words, the expected value is the equivalent "expected" average value calculated from repeated results in the same opportunity.


In statistics, when estimating the expected value of a variable, a commonly used method is to repeat the value of this variable, and then use the average value of the obtained data as the estimation of the expected value of this variable, average is generally usedμ.

In probability distribution, expectation and variance or standard deviation are important features of distribution.

In classical mechanics, the algorithm of the center of gravity of an object is very similar to the algorithm of the expected value.


Variance

Variance (VarianceAlso known as the amount of variation or the number of variations, is a concept in applied mathematics. In probability theory and statistics, the variance of a random variable describes its degree of discretization, that is, the distance from the variable to its expected value. The variance of a real random variable is also called itsSecond MomentOrSecond-order medium heart rate differenceIt happens to be its second-order accumulation. The square root of the arithmetic variance is the standard deviation of the random variable. The formula of variance can be simply described as the sum of all the observed values of the variable and the square of the expected difference, and then divided by the number of samples:

650) This. width = 650; "src =" http://upload.wikimedia.org/math/5/2/7/527ef4e6866926586a5960384837f439.png "alt =" \ operatorname {var} (x) = \ operatorname {e} \ left [(X-\ mu) ^ 2 \ right] "/>


Standard Deviation

Standard deviation (standard deviation), a mathematical symbol σ, is most often used in probability statistics as a measurement of the statistical degree of distribution (Statistical Dispersion. The standard deviation is defined as the arithmetic square root of variance, reflecting the degree of discretization between individuals in the group. The ratio of standard deviation to expected value is the standard deviation rate. The concept of standard deviation is introduced to statistics by Karl Pearson.

Application of standard deviation

In short, standard deviation is a measure of the degree to which a set of values are dispersed from the mean. A large standard deviation represents a large difference between most values and their average values. A small standard deviation represents that these values are closer to the average value.

For example, the average values of {0, 5, 9, 14} and {5, 6, 8, 9} in the two sets are 7, but the second set has a small standard deviation.

Standard deviation can be used as a measurement of uncertainty. For example, in physical science, when performing repetitive measurements, the standard deviation of the set of measurements represents the accuracy of these measurements. To determine whether the measured value meets the predicted value, the standard deviation of the measured value plays a decisive role. If the difference between the measured average value and the predicted value is too far (compared with the standard deviation value), the measured value and the predicted value are considered to be in conflict. This is easy to understand, because if the measured values are out of a certain range, you can reasonably infer whether the predicted values are correct.

The standard deviation is applied to investment and can be used as an indicator to measure the return stability. The larger the standard deviation value, the higher the risk because the return value is far from the previous average value and the return value is unstable. On the contrary, the smaller the standard deviation, the more stable the return and the less risky.

Regular distribution rules

In practical application, it is often considered that a group of data has a probability distribution similar to a normal distribution. If its hypothesis is correct, about 68% of the values are distributed within the range of one standard deviation from the average, and about 95% of the values are distributed within two standard deviations from the average, and about 99.7% values are distributed within the range of three standard deviations from the distance mean. It is called the "68-95-99.7 rule ".

650) This. width = 650; "src =" http://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/350px-Standard_deviation_diagram.svg.png "alt =" 350px-standard_deviation_diagram.svg.png "/>

Relationship between standard deviation and average value

The average and standard deviation of a set of data are often used as the reference basis. In a sense, if the mean value is used to consider the center of a value, the standard deviation is a "natural" Measure of the dispersion of statistics.


Covariance

Covariance (CovarianceUsed in probability theory and statistics to measure the overall error between the two variables. Variance is a special case of covariance, that is, when two variables are the same.


The expected values are 650) This. width = 650; "src =" http://upload.wikimedia.org/math/6/0/3/60390643adab2c7ca8f71dcc3f7f80a7.png "alt =" E (x) = \ mu "/> with 650) This. width = 650; "src =" http://upload.wikimedia.org/math/ B /f/6/bf680d71c501085ab7650d67e8a2950c.png "alt =" E (y) = \ nu "/> the covariance between two real number random variables X and Y is defined:

650) This. width = 650; "src =" http://upload.wikimedia.org/math/5/f/ B /5fbe23eb934ddb586d90484274f7125a.png "alt =" \ operatorname {cov} (x, y) = \ operatorname {e} (X-\ mu) (Y-\ nu) "/>

E is the expected value.

The covariance represents the total error of two variables, which is different from the variance that only represents the error of one variable. If the change trend of the two variables is the same, that is, if one of them is greater than its expected value and the other is greater than its expected value, the covariance between the two variables is a positive value. If one of the two variables has the opposite trend, that is, one is greater than its expected value, and the other is less than its expected value, the covariance between the two variables is a negative value. If X and Y are independent statistics, the covariance between them is 0.


Related

In probability theory and statistics, correlation (correlation, or correlation coefficient) shows the intensity and direction of the linear relationship between two random variables. In statistics, the correlation is used to measure the distance between two variables and each other. In this broad definition, many data-related coefficients are defined based on data characteristics.

Statistical correlation

The calculation process of correlation coefficient can be expressed as: convert each variable into a standard unit, and the average of the product is the correlation coefficient.

The relationship between the two variables can be intuitively expressed in a scatter chart. When they are closely grouped around a straight line, there is a strong correlation between the variables.

A scatter chart can be summarized using five statistics. All X values mean, all x values SD, all y values mean, all y values SD, correlation coefficient R.

Use the following formula to record the first variable as X, the second variable as Y, and the correlation coefficient as R:

R = average of [(X in standard unit) x (Y in standard unit )]


This article is from the "Yubo blog", please be sure to keep this source http://yubowang.blog.51cto.com/8929119/1552101

Variance, covariance and Correlation

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.