"Data Analysis R Language Practice" study notes the descriptive analysis of the data in the fifth chapter (Part I)

Source: Internet
Author: User

5.6 Multi-group data analysis and R implementation

5.6.1 Statistical analysis of multiple groups of data

> Group <- read.csv("C:/Program files/rstudio/002582.csv")
> Group <- na.omit(Group)   # drop samples with missing values
> summary(Group)
         time          open          highest         lowest          close     
 2013/08/26:  1   Min.   :13.6   Min.   :13.9   Min.   :13.5   Min.   :13.6  
 2013/08/27:  1   1st Qu.:18.2   1st Qu.:18.5   1st Qu.:18.0   1st Qu.:18.2  
 2013/08/28:  1   Median :19.6   Median :19.9   Median :19.3   Median :19.6  
 2013/08/29:  1   Mean   :20.2   Mean   :20.6   Mean   :19.8   Mean   :20.2  
 2013/08/30:  1   3rd Qu.:21.6   3rd Qu.:22.0   3rd Qu.:21.3   3rd Qu.:21.6  
 2013/09/02:  1   Max.   :35.0   Max.   :37.0   Max.   :34.0   Max.   :34.6  
 (Other)   :414                                                               


When var() is applied to a data frame with several columns, the result is a covariance matrix: the entry in row i, column j is the covariance between columns i and j, and the diagonal holds each column's variance. The command cov(Group) gives the same result.

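> options(digits = 3)
> var(Group)
        time open highest lowest close
time      NA   NA      NA     NA    NA
open      NA 13.2    13.8   12.6  13.3
highest   NA 13.8    14.6   13.2  14.0
lowest    NA 12.6    13.2   12.1  12.8
close     NA 13.3    14.0   12.8  13.6

The time column is not numeric, so its row and column are NA; var(Group[, -1]) restricts the computation to the four price columns.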
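The equivalence of var() and cov() on a data frame can be checked with a short sketch. The 002582.csv file is not available here, so R's built-in EuStockMarkets data set (daily closing prices of the DAX, SMI, CAC and FTSE indices) stands in for it; this substitution is an assumption of the example, not part of the original notes.

```r
# Stand-in data: built-in daily closing prices of four European stock indices
eu <- as.data.frame(EuStockMarkets)

v  <- var(eu)   # covariance matrix of all four columns
cv <- cov(eu)   # the same matrix computed with cov()
print(round(v))

# The two functions agree, the matrix is symmetric,
# and its diagonal holds each column's own variance
print(all.equal(v, cv))
print(isSymmetric(v))
print(all.equal(unname(diag(v)), unname(sapply(eu, var))))
```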

The size of a covariance reflects how strongly two variables move together, but it also depends on the units in which the variables are measured. The correlation coefficient removes this dependence and measures the strength of the linear relationship between variables. In R, the correlation matrix is computed with the function cor().
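The unit dependence can be seen directly by rescaling a series. This sketch again uses the built-in EuStockMarkets data as a stand-in, and the factor of 100 is purely illustrative (as if prices were quoted in hundredths of the original unit): the covariance grows by 100 × 100, while the correlation does not change. It also shows cov2cor(), which converts a covariance matrix into the matching correlation matrix.

```r
eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data

# The same two series expressed in a unit 100 times smaller (illustrative only)
dax100  <- eu$DAX  * 100
ftse100 <- eu$FTSE * 100

cov_orig   <- cov(eu$DAX, eu$FTSE)
cov_scaled <- cov(dax100, ftse100)   # 100 * 100 = 10000 times larger
cor_orig   <- cor(eu$DAX, eu$FTSE)
cor_scaled <- cor(dax100, ftse100)   # unchanged: correlation is unit-free

print(c(cov_ratio = cov_scaled / cov_orig, cor_orig = cor_orig, cor_scaled = cor_scaled))

# cov2cor() rescales a covariance matrix into the correlation matrix
print(all.equal(cov2cor(cov(eu)), cor(eu)))
```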

cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

Here x and y are the objects to correlate; y can be omitted when x is a matrix or data frame, in which case the correlations between all pairs of columns are returned. use specifies how missing values are handled (for example "everything", "complete.obs" or "pairwise.complete.obs"). method selects the coefficient: the default Pearson coefficient measures linear correlation; if the relationship is monotonic but not linear, the Kendall or Spearman coefficient, both of which measure rank correlation, is more appropriate.
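A small simulated example (not from the original data) makes the difference between the methods and the role of use concrete. y = exp(x) is perfectly monotonic in x but clearly non-linear, so both rank coefficients equal 1 while Pearson's coefficient stays below 1. Inserting one NA shows that the default use = "everything" returns NA, whereas "complete.obs" drops the incomplete observation first.

```r
set.seed(1)
x <- runif(50, 0, 3)
y <- exp(x)   # monotonic in x but far from linear

r_pearson  <- cor(x, y)                       # default: linear (Pearson) correlation
r_spearman <- cor(x, y, method = "spearman")  # rank correlation
r_kendall  <- cor(x, y, method = "kendall")   # rank correlation
print(round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 3))

# Missing values: "everything" (the default) propagates NA,
# while "complete.obs" drops incomplete observations first
x_na <- x
x_na[3] <- NA
r_all      <- cor(x_na, y)
r_complete <- cor(x_na, y, use = "complete.obs")
print(c(everything = r_all, complete.obs = r_complete))
```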

5.6.2 Graphical analysis of multiple groups of data

(1) Scatter plot

The function lowess() in R fits a smooth, possibly nonlinear curve through a scatter plot by locally weighted polynomial regression, but it handles only one predictor (the two-dimensional case). The related function loess() accepts several predictors.

lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(x)))

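x and y are the two vectors. f is the smoother span, the proportion of points that influence the fit at each value; larger values give a smoother curve. iter is the number of robustifying iterations: more iterations make the fit more resistant to outliers, while smaller values make lowess() run faster. delta speeds up the computation by skipping the local fit for points within delta of the last computed point and filling them in by linear interpolation.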

> attach(Group)
> plot(highest ~ lowest)
> lines(lowess(lowest, highest), col = "red", lwd = 2)
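The effect of the span f, and the multi-predictor form of loess(), can be sketched on the built-in EuStockMarkets data, used here only as a stand-in for the 002582 prices. A small f follows local detail, while the default f = 2/3 gives a smoother curve.

```r
eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data

fit_wiggly <- lowess(eu$FTSE, eu$DAX, f = 0.1)   # small span: follows local detail
fit_smooth <- lowess(eu$FTSE, eu$DAX)            # default f = 2/3: smoother curve

plot(DAX ~ FTSE, data = eu, pch = ".")
lines(fit_wiggly, col = "blue", lwd = 2)
lines(fit_smooth, col = "red",  lwd = 2)

# lowess() returns a list: x sorted increasingly, y the fitted values
str(fit_smooth)

# loess() takes a formula and can use more than one predictor
fit2 <- loess(DAX ~ FTSE + CAC, data = eu)
print(summary(fit2))
```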

(2) Contour plot

When there is a large amount of data, the points in a scatter plot pile on top of each other, which makes relationships and trends between the variables hard to see. A two-dimensional contour plot helps here: first estimate the two-dimensional density with kde2d() from the MASS package, then draw contours of that density with contour(). If you do not want labels on the contour lines, set drawlabels = FALSE. The usage of kde2d() is:

kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))

where x and y are the data for the horizontal and vertical axes; h gives the bandwidths in the two directions (estimated from the data if omitted); n is the number of grid points in each direction, either a scalar or a length-2 vector; lims gives the limits of the grid as c(xlower, xupper, ylower, yupper) and by default covers the range of both axes.

> library(MASS)
> ?kde2d
> a <- kde2d(lowest, highest)
> contour(a, col = "blue", main = "contour plot")
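A sketch of the same idea on a denser data set: the 1860 rows of the built-in EuStockMarkets data (a stand-in, not the original file) form a crowded scatter plot, and the estimated density makes its structure visible. The grid size n = 50 and the image() background are choices of this example, not of the notes.

```r
library(MASS)   # kde2d() is in MASS, a recommended package shipped with R

eu <- as.data.frame(EuStockMarkets)   # 1860 points: a dense scatter plot

dens <- kde2d(eu$FTSE, eu$DAX, n = 50)   # density on a 50 x 50 grid
str(dens)                                # list(x = grid, y = grid, z = 50 x 50 matrix)

# Density shown as an image with unlabelled contour lines on top
image(dens, col = terrain.colors(20), xlab = "FTSE", ylab = "DAX")
contour(dens, add = TRUE, drawlabels = FALSE)
```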

(3) Matrix scatter plot

Relationships among several variables can also be shown with scatter plots arranged in a grid, called a matrix scatter plot. For a data frame, plot() or pairs() draws one directly in R.

> pairs(Group)
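pairs() accepts panel functions, so the scatter-plot smoothing from (1) can be combined with the matrix layout. This sketch uses the built-in panel.smooth(), which adds a lowess curve to each panel, on the EuStockMarkets stand-in data; dropping the lower panels is just a layout choice here.

```r
eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data

# Upper panels: scatter plot plus a lowess curve (panel.smooth); lower panels omitted
pairs(eu, panel = panel.smooth, lower.panel = NULL, pch = ".")
```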

(4) Matrix plot

When working with several variables, it is common to compare them side by side. matplot() draws every column of a matrix or data frame in the same plotting region, one series per column.

> matplot(Group, type = "l", main = "Matplot")
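A sketch with a legend and a real time axis, using the built-in EuStockMarkets time series as a stand-in. Because the four indices sit on very different levels, the second call standardizes each column with scale() so the series share one axis; that step is an addition of this example.

```r
eu <- EuStockMarkets   # built-in ts matrix: 4 indices, 1860 trading days

matplot(time(eu), eu, type = "l", lty = 1, col = 1:4,
        xlab = "year", ylab = "index value", main = "matplot")
legend("topleft", legend = colnames(eu), col = 1:4, lty = 1, cex = 0.8)

# Indices sit on different levels; standardizing each column puts them on one axis
matplot(time(eu), scale(eu), type = "l", lty = 1, col = 1:4,
        xlab = "year", ylab = "standardized value", main = "matplot (scaled)")
```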

(5) Box plot

> boxplot(Group, cex.axis = 0.6)
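boxplot() also returns, invisibly, the statistics it draws, which is handy for checking a plot against summary(). In the sketch below (EuStockMarkets again as a stand-in), each column of b$stats holds the lower whisker, lower hinge, median, upper hinge and upper whisker; by default the whiskers reach at most 1.5 times the box length, and points beyond them are returned in b$out.

```r
eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data

# boxplot() draws the plot and invisibly returns the numbers behind it
b <- boxplot(eu, cex.axis = 0.6)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)
print(b$names)
```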

(6) Star plot (radar chart)

stars(x, full = TRUE, scale = TRUE, radius = TRUE,
      labels = dimnames(x)[[1]], locations = NULL,
      nrow = NULL, ncol = NULL, len = 1,
      key.loc = NULL, key.labels = dimnames(x)[[2]],
      key.xpd = TRUE,
      xlim = NULL, ylim = NULL, flip.labels = NULL,
      draw.segments = FALSE,
      col.segments = 1:n.seg, col.stars = NA, col.lines = NA,
      axes = FALSE, frame.plot = axes,
      main = NULL, sub = NULL, xlab = "", ylab = "",
      cex = 0.8, lwd = 0.25, lty = par("lty"), xpd = FALSE,
      mar = pmin(par("mar"),
                 1.1 + c(2*axes + (xlab != ""),
                         2*axes + (ylab != ""), 1, 0)),
      add = FALSE, plot = TRUE, ...)
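stars() draws one star per row of x, with one ray per column. With the default scale = TRUE each column is rescaled so its minimum is 0 and its maximum is 1 before drawing. A minimal sketch on the first 12 rows of the built-in EuStockMarkets data (a stand-in for the original file):

```r
eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data

# One star per row (the first 12 trading days), one ray per index
locs <- stars(head(eu, 12), main = "First 12 trading days")

# stars() invisibly returns the centre of each star as a two-column matrix
print(locs)
```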

(7) Line chart

This requires a custom function.
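The notes do not give the custom function. One common reading of a line chart for multivariate data, assumed here, is a profile plot: each observation becomes a polyline joining its values on the variables, which sit at equally spaced positions on the x-axis. The helper profile_chart() below is hypothetical, and EuStockMarkets is only a stand-in data set.

```r
# Hypothetical helper (not from the book): one polyline per observation
profile_chart <- function(x, ...) {
  x <- as.matrix(x)
  p <- ncol(x)
  # matplot() draws one line per column, so transpose:
  # rows become variables, columns become observations
  matplot(seq_len(p), t(x), type = "l", lty = 1, xaxt = "n",
          xlab = "", ylab = "value", ...)
  axis(1, at = seq_len(p), labels = colnames(x))
  invisible(t(x))
}

eu <- as.data.frame(EuStockMarkets)   # built-in stand-in data
prof <- profile_chart(head(eu, 10), main = "Profile of the first 10 trading days")
```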

(8) Harmonic curve

This requires a custom function.
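A harmonic curve (Andrews curve) maps each observation (x1, x2, ..., xp) to the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... on -pi <= t <= pi, so similar observations give similar curves. The helper harmonic_curves() below is a hypothetical sketch of such a custom function, not the book's code. The data are standardized first because the curves depend on the scale of each variable, and EuStockMarkets is only a stand-in.

```r
# Hypothetical helper (not from the book): Andrews harmonic curves, one per row of x
harmonic_curves <- function(x, n_t = 200) {
  x <- as.matrix(x)
  p <- ncol(x)
  t <- seq(-pi, pi, length.out = n_t)
  # Basis columns: 1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ...
  basis <- matrix(1 / sqrt(2), nrow = n_t, ncol = p)
  for (j in seq_len(p)[-1]) {
    k <- j %/% 2                                  # frequency 1, 1, 2, 2, 3, ...
    basis[, j] <- if (j %% 2 == 0) sin(k * t) else cos(k * t)
  }
  list(t = t, f = basis %*% t(x))                 # f[, i] is the curve for row i
}

eu <- scale(EuStockMarkets[1:20, ])   # standardize first: the curves depend on scale
hc <- harmonic_curves(eu)
matplot(hc$t, hc$f, type = "l", lty = 1, xlab = "t", ylab = "f(t)",
        main = "Harmonic curves (first 20 trading days)")
```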