"Data Analysis: R Language Practice" study notes, Chapter 5: Descriptive analysis of data (Part I)

Last Update:2015-05-19 Source: Internet

Author: User

Tags: types of functions


5.1 R built-in distributions

A distribution is the central and most important way to describe sample data. R has many commonly used statistical distributions built in and provides four types of functions for each: the probability density function (density), the cumulative distribution function (probability), the quantile function (quantile), and a pseudo-random number generator (random). In R, these four are denoted by the prefixes d, p, q, and r, followed by the name or abbreviation of the distribution (for example, norm for the normal distribution).
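As a minimal sketch of this naming scheme, the four prefixes can be combined with norm (the normal distribution). The argument values below are chosen only for illustration.

```r
# d/p/q/r prefixes applied to the normal distribution ("norm")
dens  <- dnorm(0, mean = 0, sd = 1)      # density at x = 0
prob  <- pnorm(1.96, mean = 0, sd = 1)   # cumulative probability P(X <= 1.96)
quant <- qnorm(0.975, mean = 0, sd = 1)  # 97.5% quantile
set.seed(1)                              # make the random draw reproducible
draws <- rnorm(5, mean = 0, sd = 1)      # five pseudo-random values

print(c(density = dens, probability = prob, quantile = quant))
print(draws)
```

The same pattern applies to other built-in distributions, for example dbinom(), pbinom(), qbinom(), and rbinom() for the binomial distribution.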

5.2 Analysis of central tendency

5.2.1 Measures of central tendency

The main indicators that describe the central tendency of a statistical distribution are the mean, the median, and the mode, also known as average indicators. Their main uses are:

They reflect the central tendency and general level of the distribution of a variable across the units of a population;

They make it easy to compare the level of similar phenomena between different units;

They make it convenient to compare the development trend or pattern of similar phenomena across different periods;

They are used to analyze dependencies between phenomena.

5.2.2 R Language Implementation

The function summary() computes the five-number summary and the mean of a data set.

> summary(cars$speed)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

    4.0    12.0    15.0    15.4    19.0    25.0
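The three average indicators named in 5.2.1 can also be computed one at a time. The sketch below assumes the built-in cars data set. Base R has no function for the statistical mode (mode() returns an object's storage mode instead), so stat_mode() is a hypothetical helper written for this illustration.

```r
x <- cars$speed   # built-in data set used throughout these notes

x_mean   <- mean(x)
x_median <- median(x)

# Hypothetical helper: return the most frequent value(s) of a numeric vector.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
x_mode <- stat_mode(x)

print(c(mean = x_mean, median = x_median))
print(x_mode)
```

If several values tie for the highest frequency, the helper returns all of them.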

5.3 Analysis of dispersion

5.3.1 Measures of dispersion

The dispersion of data is mainly measured by statistics such as the range, the interquartile range, the mean deviation, the variance, and the standard deviation. In practice, dispersion analysis serves the following purposes:

To measure how representative the average indicators are;

To reflect how balanced social and economic activities are;

To study how far the distribution of a population variable deviates from the normal distribution;

To serve as a basic indicator for statistical analysis such as sampling inference.

5.3.2 R Language Implementation

The range can be computed with the function range(), which returns the minimum and the maximum; subtract the first from the second:

> m=range(cars$speed)

> m[2]-m[1]

[1] 21

The interquartile range can also be computed by hand; a convenient way is to use the function fivenum() directly:

> q=fivenum(cars$speed)

> q[4]-q[2]

[1] 7
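The two manual calculations above can be shortened with diff(), and base R also provides IQR(). This is a small sketch on the same cars data. Note that IQR() is based on quantile() while fivenum() returns Tukey's hinges, so the two can differ slightly on other data sets even though they agree here.

```r
x <- cars$speed

x_range      <- diff(range(x))               # maximum minus minimum
x_iqr        <- IQR(x)                       # third quartile minus first quartile
hinge_spread <- diff(fivenum(x)[c(2, 4)])    # upper hinge minus lower hinge

print(c(range = x_range, IQR = x_iqr, hinge_spread = hinge_spread))
```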

The variance and standard deviation functions in R are var() and sd(). R also has a dedicated dispersion function, mad(), which computes the median absolute deviation; by default the result is scaled by a constant (1.4826) so that it is asymptotically consistent with the standard deviation under normality.
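A short sketch of these three functions on the cars data follows. The constant argument of mad() is set to 1 in the last line only to show the unscaled median absolute deviation.

```r
x <- cars$speed

x_var     <- var(x)                 # sample variance (divides by n - 1)
x_sd      <- sd(x)                  # sample standard deviation, sqrt(var(x))
x_mad     <- mad(x)                 # scaled by the default constant 1.4826
x_mad_raw <- mad(x, constant = 1)   # plain median of |x - median(x)|

print(c(var = x_var, sd = x_sd, mad = x_mad, mad_raw = x_mad_raw))
```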

5.4 Analysis of data distribution

5.4.1 Measures of distribution shape

(1) Skewness

(2) Kurtosis

5.4.2 R Language Implementation

The package timeDate (loaded automatically with the fBasics package) provides the functions skewness() and kurtosis(), which compute the skewness and kurtosis coefficients directly.

> skewness(cars$speed)

[1] -0.1105533

attr(,"method")

[1] "moment"

> kurtosis(cars$speed)

[1] -0.6730924

attr(,"method")

[1] "excess"
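For readers without the package, both coefficients can be approximated in base R from central moments. This is a sketch under a stated assumption: the hypothetical helpers below divide by n throughout, so their values may differ slightly from the package output above, which may use a different normalization.

```r
x <- cars$speed

# Hypothetical helpers for illustration (population-style moments, divide by n).
central_moment <- function(v, k) mean((v - mean(v))^k)
skew_moment    <- function(v) central_moment(v, 3) / central_moment(v, 2)^1.5
kurt_excess    <- function(v) central_moment(v, 4) / central_moment(v, 2)^2 - 3

print(skew_moment(x))   # negative: slightly longer left tail
print(kurt_excess(x))   # negative: flatter than a normal distribution
```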

5.5 Graphical analysis and R implementation

5.5.1 Histogram and density function graphs

> hist(cars$speed, breaks=50, prob=TRUE)  # breaks controls the histogram bins; prob=TRUE draws a density histogram

> lines(density(cars$speed), col='blue')  # overlay the kernel density estimate computed by density()
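To see what these two calls compute without drawing anything, the objects they return can be inspected directly. This is a minimal sketch; plot = FALSE is used so that it also runs in a non-interactive session.

```r
x <- cars$speed

h <- hist(x, breaks = 50, plot = FALSE)   # compute the bins without drawing
area <- sum(h$density * diff(h$breaks))   # total area of a density histogram

d <- density(x)                           # kernel density estimate

print(area)          # bar areas of a density histogram sum to 1
print(d$bw)          # bandwidth chosen by the default rule
```

Because the bars of a density histogram have total area 1, the kernel density curve can be overlaid on the same vertical scale.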

5.5.2 QQ plot

A QQ plot is used to check visually whether a data set comes from a given distribution, or whether two data sets come from the same distribution family. In teaching and in software, the QQ scatter plot is most often used to check whether data come from a normal distribution. In that case it is the normal quantile-quantile plot: the horizontal axis shows the theoretical quantiles and the vertical axis shows the sample values. If the sample approximately follows a normal distribution, the points should lie close to the line y = σx + μ, whose slope is the standard deviation σ and whose intercept is the mean μ of the normal distribution.

> qqnorm(cars$speed)

> qqline(cars$speed)
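The reference line y = σx + μ described above can be drawn explicitly with abline(), using the sample mean and standard deviation as estimates. This is a sketch; the drawing calls are wrapped in interactive() so the block also runs as a script. By default qqline() draws a line through the first and third quartiles, so the two lines need not coincide exactly.

```r
x <- cars$speed

qq    <- qqnorm(x, plot.it = FALSE)   # theoretical (qq$x) and sample (qq$y) quantiles
mu    <- mean(x)                      # estimate of the intercept
sigma <- sd(x)                        # estimate of the slope

if (interactive()) {
  qqnorm(x)
  abline(a = mu, b = sigma, col = "red")   # y = sigma * x + mu
  qqline(x, col = "blue")                  # line through the quartiles
}

print(c(intercept = mu, slope = sigma))
```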

5.5.3 Stem-and-leaf plot

Use the function stem() to draw stem-and-leaf plots in R:

stem(x, scale=1, width=80, atom=1e-08)

where x is the data vector, scale controls the length of the plot, width controls the width of the plot, and atom is the tolerance.

> set.seed(111)

> s=sample(cars$speed,25)

> stem(s)

  The decimal point is 1 digit(s) to the right of the |

  0 | 44

  0 | 779

  1 | 011233344

  1 | 5557889

  2 | 0344
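The effect of the scale argument can be seen by plotting the full speed vector twice. This sketch also uses capture.output() to store the text of the plot as a character vector, which is an illustration choice rather than part of the original notes.

```r
x <- cars$speed

stem(x)              # default layout
stem(x, scale = 2)   # roughly twice as long as the default

# Keep the printed plot as text, e.g. for writing to a report.
out <- capture.output(stem(x))
print(length(out))   # number of text lines in the default plot
```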

5.5.4 Box plot

> boxplot(cars$speed)
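The numbers behind the box plot can be obtained without drawing it, using boxplot.stats() from base R. A minimal sketch on the same data:

```r
x <- cars$speed

bs <- boxplot.stats(x)   # statistics used to draw the box plot

print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # points beyond the whiskers (outliers), if any
```

For this data set there are no points beyond the whiskers, so the five values coincide with fivenum(cars$speed).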

5.5.5 Empirical distribution plot

The function ecdf() in R computes the empirical distribution function of a sample, which can then be drawn with plot():

ecdf(x)

plot(x, ..., ylab="Fn(x)", verticals=FALSE, col.01line="gray70", pch=19)
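The object returned by ecdf() is itself a function, so it can be evaluated at any point as well as plotted. A short sketch on the cars data; the plot call is guarded by interactive() so the block also runs as a script.

```r
x <- cars$speed

Fn  <- ecdf(x)      # empirical cumulative distribution function
p15 <- Fn(15)       # proportion of observations <= 15

if (interactive()) {
  plot(Fn, verticals = TRUE, pch = 19)   # step function with vertical joins
}

print(p15)
```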


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.