Statistical measures with R

Source: Internet
Author: User
Contents
  • Mean
  • Median
  • Quartile
  • Percentile
  • Range
  • Interquartile range
  • Box plot
  • Variance
  • Standard deviation
  • Covariance
  • Correlation coefficient
  • Central moment
  • Skewness
  • Kurtosis
Reference: R tutorial and exercise solutions

Mean

The mean of an observation variable is a numerical measure of the central location of the data values. It is the sum of the data values divided by the data count.

Hence, for a data sample of size n, its sample mean is defined as follows:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

> duration = faithful$eruptions     # the eruption durations
> mean(duration)                    # apply the mean function
[1] 3.4878
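The definition can be checked directly against the built-in function. This is a minimal sketch; the names `n` and `manual_mean` are introduced here for illustration only.

```r
duration <- faithful$eruptions        # built-in data set used throughout

n <- length(duration)                 # data count
manual_mean <- sum(duration) / n      # sum of the values divided by the count

# Compare with the built-in function, allowing for floating-point rounding
all.equal(manual_mean, mean(duration))
```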

 

Median

The median of an observation variable is the value in the middle when the data is sorted in ascending order. It is an ordinal measure of the central location of the data values.

> duration = faithful$eruptions     # the eruption durations
> median(duration)                  # apply the median function
[1] 4
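The "middle value" rule can be sketched by hand. When the number of observations is even there is no single middle value, and the usual convention (assumed here, and matching what `median` returns for this data) is to average the two central values. The names `sorted` and `manual_median` are illustrative.

```r
duration <- faithful$eruptions
sorted <- sort(duration)              # ascending order
n <- length(sorted)

if (n %% 2 == 1) {
  # odd count: a single middle value exists
  manual_median <- sorted[(n + 1) / 2]
} else {
  # even count: average the two central values
  manual_median <- mean(sorted[c(n / 2, n / 2 + 1)])
}

all.equal(manual_median, median(duration))
```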

 

 

Quartile

There are several quartiles of an observation variable.

The first quartile, or lower quartile, is the value that cuts off the first 25% of the data when it is sorted in ascending order.

The second quartile, or median, is the value that cuts off the first 50%.

The third quartile, or upper quartile, is the value that cuts off the first 75%.

> duration = faithful$eruptions     # the eruption durations
> quantile(duration)                # apply the quantile function
    0%    25%    50%    75%   100%
1.6000 2.1627 4.0000 4.4543 5.1000
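A quartile rarely falls exactly on an observation, so `quantile` interpolates between neighbouring sorted values. The sketch below reproduces the default rule (documented as type 7 in the R help for `quantile`); the helper name `quantile7` is hypothetical and exists only for this illustration.

```r
duration <- faithful$eruptions

# Hypothetical helper: linear interpolation between order statistics,
# following the type 7 definition that quantile() uses by default.
quantile7 <- function(x, p) {
  sorted <- sort(x)
  h <- (length(sorted) - 1) * p + 1   # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
}

probs <- c(0.25, 0.50, 0.75)
manual_quartiles <- sapply(probs, function(p) quantile7(duration, p))

all.equal(manual_quartiles, quantile(duration, probs, names = FALSE))
```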

 

Percentile

The nth percentile of an observation variable is the value that cuts off the first n percent of the data values when they are sorted in ascending order.

Find the 32nd, 57th and 98th percentiles of the eruption durations:

> duration = faithful$eruptions     # the eruption durations
> quantile(duration, c(.32, .57, .98))
   32%    57%    98%
2.3952 4.1330 4.9330
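As a rough sanity check on the "cuts off the first n percent" reading, one can count what share of the observations lies at or below a percentile. This is only a sketch: because the sample is finite and `quantile` interpolates, the share is close to, not exactly, the requested percentage. The names `p32` and `share_below` are illustrative.

```r
duration <- faithful$eruptions

p32 <- quantile(duration, 0.32, names = FALSE)   # 32nd percentile
share_below <- mean(duration <= p32)             # fraction of values at or below it

share_below   # expected to be close to 0.32
```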

 

Range

The range of an observation variable is the difference of its largest and smallest data values. It is a measure of how far apart the entire data spreads in value.

> duration = faithful$eruptions     # the eruption durations
> max(duration) - min(duration)     # apply the max and min functions
[1] 3.5
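Base R also provides `range`, which returns the smallest and largest values together; taking their difference with `diff` gives the same measure. A short sketch, with illustrative variable names:

```r
duration <- faithful$eruptions

limits <- range(duration)     # c(minimum, maximum)
spread <- diff(limits)        # maximum minus minimum

all.equal(spread, max(duration) - min(duration))
```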

 

Interquartile range

The interquartile range of an observation variable is the difference between its upper and lower quartiles. It is a measure of how far apart the middle portion of the data spreads in value.

 

> duration = faithful$eruptions     # the eruption durations
> IQR(duration)                     # apply the IQR function
[1] 2.2915
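The same number can be obtained from the quartiles themselves, which makes the definition explicit. A minimal sketch; `q` and `manual_iqr` are names chosen for this example.

```r
duration <- faithful$eruptions

q <- quantile(duration, c(0.25, 0.75), names = FALSE)  # lower and upper quartiles
manual_iqr <- q[2] - q[1]                              # upper minus lower

all.equal(manual_iqr, IQR(duration))
```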

 

Box plot

The box plot of an observation variable is a graphical representation based on its quartiles, as well as its smallest and largest values. It attempts to provide a visual shape of the data distribution.

> duration = faithful$eruptions          # the eruption durations
> boxplot(duration, horizontal=TRUE)     # horizontal box plot

The box plot of the eruption duration is:

The figure represents the quartiles graphically. The two ends of the box and the line inside it mark the first, second, and third quartiles; the thick line inside the box is the second quartile, that is, the median.

    0%    25%    50%    75%   100%
1.6000 2.1627 4.0000 4.4543 5.1000

The figure shows the data distribution...
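The numbers behind the plot can be inspected without drawing anything, using `boxplot.stats`. This is a sketch for illustration; note that the box edges are computed as "hinges", which can differ slightly from the quartiles reported by `quantile`. The variable name `bp` is an assumption of this example.

```r
duration <- faithful$eruptions

bp <- boxplot.stats(duration)

bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # observations beyond the whiskers, drawn as separate points (if any)
```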

 

Variance

The variance is a numerical measure of how the data values are dispersed around the mean. In particular, the sample variance is defined as:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

 

> duration = faithful$eruptions     # the eruption durations
> var(duration)                     # apply the var function
[1] 1.3027
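The sample variance formula translates almost term by term into R. The sketch below is illustrative; `deviations` and `manual_var` are names introduced here.

```r
duration <- faithful$eruptions
n <- length(duration)

deviations <- duration - mean(duration)      # x_i minus the sample mean
manual_var <- sum(deviations^2) / (n - 1)    # divide by n - 1, not n

all.equal(manual_var, var(duration))
```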

 

Standard deviation

The standard deviation of an observation variable is the square root of its variance.

> duration = faithful$eruptions     # the eruption durations
> sd(duration)                      # apply the sd function
[1] 1.1414
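The relationship to the variance can be confirmed in one line. A minimal sketch, with `manual_sd` as an illustrative name:

```r
duration <- faithful$eruptions

manual_sd <- sqrt(var(duration))   # square root of the sample variance

all.equal(manual_sd, sd(duration))
```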

 

Covariance

The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates the opposite.

The sample covariance is defined in terms of the sample means:

$$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

> duration = faithful$eruptions     # the eruption durations
> waiting = faithful$waiting        # the waiting period
> cov(duration, waiting)            # apply the cov function
[1] 13.978
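Written out by hand, the sample covariance pairs each deviation in one variable with the corresponding deviation in the other. This is a sketch with illustrative names (`dx`, `dy`, `manual_cov`).

```r
duration <- faithful$eruptions
waiting  <- faithful$waiting
n <- length(duration)

dx <- duration - mean(duration)        # deviations of x from its mean
dy <- waiting - mean(waiting)          # deviations of y from its mean
manual_cov <- sum(dx * dy) / (n - 1)   # averaged product of paired deviations

all.equal(manual_cov, cov(duration, waiting))
```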

 

Correlation coefficient

The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.

Formally, the sample correlation coefficient is defined by the following formula, where s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

If the correlation coefficient is close to 1, it indicates that the variables are positively linearly related and the scatter plot falls almost along a straight line with positive slope.

If it is close to -1, it indicates that the variables are negatively linearly related and the scatter plot falls almost along a straight line with negative slope.

If it is close to zero, it indicates a weak linear relationship between the variables.

> duration = faithful$eruptions     # the eruption durations
> waiting = faithful$waiting        # the waiting period
> cor(duration, waiting)            # apply the cor function
[1] 0.90081

This indicates a strong positive linear relationship between the eruption duration and the waiting time: the longer the wait, the longer the eruption tends to last.
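The defining ratio can be evaluated from the pieces introduced in the earlier sections. A minimal sketch; `manual_cor` is an illustrative name.

```r
duration <- faithful$eruptions
waiting  <- faithful$waiting

# covariance divided by the product of the two standard deviations
manual_cor <- cov(duration, waiting) / (sd(duration) * sd(waiting))

all.equal(manual_cor, cor(duration, waiting))
```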

 

Covariance and Correlation Coefficient

1. In portfolio analysis, covariance is a statistical indicator of how the return on one investment project moves relative to the return on another project in the same portfolio. A positive covariance indicates that the two returns tend to change in the same direction: when one is higher, the other tends to be higher too. A negative covariance indicates that they change in opposite directions: one goes up while the other goes down. For returns measured on the same scale, a larger absolute covariance means the two assets move together more closely, and a smaller one means they are more loosely related.
2. Because the magnitude of the covariance depends on the units of the variables and is therefore hard to interpret, it is divided by the product of the standard deviations of the returns of the two investments. The result has the same sign as the covariance but is dimensionless. This number is the correlation coefficient: correlation coefficient = covariance / (product of the two standard deviations).
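The unit dependence described in point 2 can be illustrated with the eruption data rather than investment returns (no financial data appears in this document, so none is invented here). Converting the durations from minutes to seconds multiplies the covariance by 60 but leaves the correlation unchanged. The variable names are illustrative.

```r
duration_min <- faithful$eruptions     # eruption durations in minutes
duration_sec <- duration_min * 60      # the same durations in seconds
waiting      <- faithful$waiting

# The covariance scales with the unit of measurement ...
cov_ratio <- cov(duration_sec, waiting) / cov(duration_min, waiting)

# ... while the correlation coefficient does not
cor_min <- cor(duration_min, waiting)
cor_sec <- cor(duration_sec, waiting)

cov_ratio                    # 60, up to floating-point rounding
all.equal(cor_min, cor_sec)
```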

 

Central moment

The kth central moment (or moment about the mean) of a data sample is:

$$m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k$$

For example, the second central moment of a population is its variance.

> library(moments)                  # load the moments package
> duration = faithful$eruptions     # the eruption durations
> moment(duration, order=3, central=TRUE)
[1] -0.6149
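The formula is short enough to compute without an extra package. In the sketch below, `central_moment` is a hypothetical helper written for this illustration, not a function from the moments package. It also shows that the second central moment divides by n, so it differs from the sample variance returned by `var` by the factor (n - 1) / n.

```r
# Hypothetical helper: k-th central moment with 1/n normalisation
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

duration <- faithful$eruptions
n <- length(duration)

m2 <- central_moment(duration, 2)   # second central moment
m3 <- central_moment(duration, 3)   # third central moment

# m2 equals the sample variance rescaled from 1/(n - 1) to 1/n
all.equal(m2, var(duration) * (n - 1) / n)
```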

 

Skewness

The skewness of a data population is defined by the following formula, where μ2 and μ3 are the second and third central moments.

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$$

Intuitively, the skewness is a measure of symmetry.

As a rule, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed.

Positive skewness indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed. Of course, this rule applies only to unimodal distributions, whose histograms have a single peak.

> library(moments)                  # load the moments package
> duration = faithful$eruptions     # the eruption durations
> skewness(duration)                # apply the skewness function
[1] -0.41584
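The skewness can also be assembled from the central moments in base R. This is a sketch under the assumption that moments are normalised by 1/n; `central_moment` and `manual_skewness` are hypothetical names used only here.

```r
# Hypothetical helper: k-th central moment with 1/n normalisation
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

duration <- faithful$eruptions

m2 <- central_moment(duration, 2)
m3 <- central_moment(duration, 3)
manual_skewness <- m3 / m2^(3 / 2)   # third moment scaled by the spread

manual_skewness                      # negative for the eruption durations
mean(duration) < median(duration)    # consistent with the rule for negative skewness
```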

 

Kurtosis

The kurtosis of a univariate population is defined by the following formula, where μ2 and μ4 are the second and fourth central moments.

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3$$

Intuitively, the kurtosis is a measure of the peakedness of the data distribution.

Negative kurtosis indicates a flat distribution, which is said to be platykurtic (flat-topped).

Positive kurtosis indicates a peaked distribution, which is said to be leptokurtic (sharp-peaked).

Finally, the normal distribution has zero kurtosis, and is said to be mesokurtic (normal peak).

> library(moments)                  # load the moments package
> duration = faithful$eruptions     # the eruption durations
> kurtosis(duration) - 3            # kurtosis() returns μ4/μ2^2, so subtract 3
[1] -1.5006
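The same quantity can be sketched in base R from the central moments, again assuming 1/n normalisation. `central_moment` and `excess_kurtosis` are hypothetical names for this illustration.

```r
# Hypothetical helper: k-th central moment with 1/n normalisation
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

duration <- faithful$eruptions

m2 <- central_moment(duration, 2)
m4 <- central_moment(duration, 4)
excess_kurtosis <- m4 / m2^2 - 3   # subtract 3 so a normal distribution scores zero

excess_kurtosis                    # negative here, i.e. platykurtic
```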
