5.6 Multi-group data analysis and R implementation
5.6.1 Statistical analysis of multiple groups of data
> group = read.csv("C:/Program files/rstudio/002582.csv")
> group = na.omit(group)  # drop rows with missing values
> summary(group)
         time          open          highest
 2013/08/26:  1   Min.   :13.6   Min.   :13.9
 2013/08/27:  1   1st Qu.:18.2   1st Qu.:18.5
 2013/08/28:  1   Median :19.6   Median :19.9
 2013/08/29:  1   Mean   :20.2   Mean   :20.6
 2013/08/30:  1   3rd Qu.:21.6   3rd Qu.:22.0
 2013/09/02:  1   Max.   :35.0   Max.   :37.0
 (Other)   :414
     lowest          close
 Min.   :13.5   Min.   :13.6
 1st Qu.:18.0   1st Qu.:18.2
 Median :19.3   Median :19.6
 Mean   :19.8   Mean   :20.2
 3rd Qu.:21.3   3rd Qu.:21.6
 Max.   :34.0   Max.   :34.6
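When the function var() is applied to several groups of data at once (for example a data frame), the result is a covariance matrix: each entry is the covariance between two of the column vectors, and the diagonal holds the variance of each column. The instruction cov(group) gives the same result. The non-numeric time column produces NA entries.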
> options(digits=3)
> var(group)
        time open highest lowest close
time      NA   NA      NA     NA    NA
open      NA 13.2    13.8   12.6  13.3
highest   NA 13.8    14.6   13.2  14.0
lowest    NA 12.6    13.2   12.1  12.8
close     NA 13.3    14.0   12.8  13.6
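As a minimal sketch of the covariance matrix described above, the following uses a synthetic stand-in for the stock data, since the CSV file is not available here. The data frame `prices` and its values are assumptions made up for illustration; only the column names follow the source.

```r
# Synthetic stand-in for the stock data (values are invented for illustration)
set.seed(1)
n <- 100
close <- 20 + cumsum(rnorm(n, sd = 0.3))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.1),
  highest = close + abs(rnorm(n, sd = 0.3)),
  lowest  = close - abs(rnorm(n, sd = 0.3)),
  close   = close
)

v  <- var(prices)  # covariance matrix of the numeric columns
v2 <- cov(prices)  # cov() returns the same matrix
print(round(v, 3))
print(all.equal(v, v2))

# The diagonal entries are the variances of the individual columns
print(all.equal(unname(diag(v)), unname(sapply(prices, var))))
```

Restricting the data frame to numeric columns avoids the NA row and column that the time variable causes in the output above.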
The size of the covariance reflects the correlation between variables to some extent, but it also depends on the units in which the variables are measured, so we also compute correlation coefficients to measure the linear correlation between variables. In R the correlation matrix is computed with the function cor().
cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
Here x and y are the objects to be analysed; y can be omitted when x is a matrix or data frame. use specifies how missing values are handled. method selects which correlation coefficient is computed. The default, the Pearson coefficient, measures linear correlation. If the relationship is not linear but is monotonic, the Kendall or Spearman coefficients, which describe rank correlation, can be used instead.
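The difference between the three methods can be illustrated with a small sketch. The data below are synthetic and chosen so that y is a strictly increasing but clearly nonlinear function of x; the variable names are assumptions for illustration only.

```r
set.seed(2)
x <- runif(50, 0, 5)
y <- exp(x)   # strictly increasing in x, but far from linear

r_pearson  <- cor(x, y)                       # method = "pearson" is the default
r_spearman <- cor(x, y, method = "spearman")  # rank correlation
r_kendall  <- cor(x, y, method = "kendall")   # rank correlation
print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))

# With a data frame, y can be omitted and a correlation matrix is returned
df <- data.frame(x = x, y = y, z = -x)
print(cor(df, method = "spearman"))

# use = "complete.obs" drops rows that contain NA before computing
df$y[1] <- NA
print(cor(df, use = "complete.obs"))
```

Because the relationship is perfectly monotonic, both rank coefficients equal 1, while the Pearson coefficient stays below 1.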
5.6.2 Graphical analysis of multiple groups of data
(1) Scatter plot
The function lowess() in R smooths a scatter plot by locally weighted polynomial regression and so fits a nonlinear curve, but it only handles the two-dimensional case. The similar function loess() handles the multidimensional case.
lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(x)))
x and y specify the two vectors. f is the smoother span: the larger the value, the smoother the curve. iter controls the number of robustifying iterations: larger values give a more accurate result, while smaller values make the program run faster.
> attach(group)
> plot(highest ~ lowest)
> lines(lowess(lowest, highest), col = "red", lwd = 2)
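To see how f and iter affect the fit, here is a short sketch on synthetic data. The vectors `lowest` and `highest` below are invented stand-ins with a mildly curved relationship, not the values from the CSV file, and are used directly rather than through attach().

```r
set.seed(3)
lowest  <- sort(runif(200, 13, 34))
highest <- lowest + 0.5 + 0.02 * (lowest - 20)^2 + rnorm(200, sd = 0.4)

fit_default <- lowess(lowest, highest)            # f = 2/3, iter = 3
fit_wiggly  <- lowess(lowest, highest, f = 0.1)   # smaller span, less smoothing
fit_fast    <- lowess(lowest, highest, iter = 0)  # no robustifying iterations

# lowess() returns a list with components x (sorted) and y (smoothed values)
str(fit_default)

plot(highest ~ lowest, col = "grey50")
lines(fit_default, col = "red", lwd = 2)
lines(fit_wiggly, col = "blue", lty = 2)
legend("topleft", legend = c("f = 2/3", "f = 0.1"),
       col = c("red", "blue"), lty = c(1, 2))
```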
(2) Contour plot
When the data set is very large, the points on a scatter plot are crowded together and it is hard to see the relationship or trend between the variables. In that case a two-dimensional contour plot can be used instead. The density of the two-dimensional data is estimated with the function kde2d() from the package MASS, and the contours of the density are then drawn with the function contour(). If you do not want the contour labels drawn on the graph, set the parameter drawlabels=FALSE. The usage of kde2d() is:
kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))
Here x and y are the data for the horizontal and vertical axes, h is the vector of bandwidths for the two directions (a default is chosen if it is omitted), and n specifies the number of grid points in each direction, either as a scalar or as a positive integer vector of length 2. The parameter lims gives the limits of the rectangle covered by the grid, that is, the ranges of the horizontal and vertical axes.
> library(MASS)
> ?kde2d
> a = kde2d(lowest, highest)
> contour(a, col = "blue", main = "contour plot")
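The following sketch shows the object that kde2d() returns and the effect of drawlabels. The data are synthetic stand-ins for the lowest and highest prices (an assumption for illustration), and n = 50 is an arbitrary choice.

```r
library(MASS)  # kde2d() is in MASS, a recommended package shipped with R

set.seed(4)
lowest  <- rnorm(2000, mean = 19.8, sd = 3)
highest <- lowest + abs(rnorm(2000, sd = 0.5))

# 50 x 50 grid; the bandwidth h is left at its default
dens <- kde2d(lowest, highest, n = 50)
str(dens)  # list with grid coordinates x, y and the density matrix z

contour(dens, col = "blue", main = "contour plot")                   # labelled contours
contour(dens, col = "blue", drawlabels = FALSE, main = "no labels")  # labels suppressed
```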
(3) Matrix scatter plot
Several groups of data can also be shown with scatter plots, in the form of a matrix scatter plot that draws one panel for every pair of variables. For a data frame, a matrix scatter plot can be drawn directly in R with the plot() command or with pairs().
> pairs(group)
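A hedged sketch of the same idea on synthetic data follows. Since R 4.0.0, read.csv() no longer converts strings to factors by default, and pairs() rejects character columns, so the sketch selects the numeric columns first. The data frame `group_demo` and the use of panel.smooth are assumptions added for illustration.

```r
set.seed(5)
n <- 100
close <- 20 + cumsum(rnorm(n, sd = 0.3))
group_demo <- data.frame(
  time    = format(as.Date("2013-08-26") + 0:(n - 1), "%Y/%m/%d"),
  open    = close + rnorm(n, sd = 0.1),
  highest = close + abs(rnorm(n, sd = 0.3)),
  lowest  = close - abs(rnorm(n, sd = 0.3)),
  close   = close
)

# Keep only the numeric columns (drops the time column)
num_cols <- sapply(group_demo, is.numeric)

# panel.smooth adds a lowess curve to every panel
pairs(group_demo[, num_cols], panel = panel.smooth,
      main = "matrix scatter plot")
```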
(4) Matrix plot
When working with several groups of data, the groups often need to be compared with one another. matplot() draws every variable (column) in the same plotting area.
> matplot(group, type = "l", main = "matplot")
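As a small sketch, the call can be made easier to read by passing only the numeric columns and adding a legend. The data frame `prices` below is synthetic and the axis labels are assumptions for illustration.

```r
set.seed(6)
n <- 100
close <- 20 + cumsum(rnorm(n, sd = 0.3))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.1),
  highest = close + abs(rnorm(n, sd = 0.3)),
  lowest  = close - abs(rnorm(n, sd = 0.3)),
  close   = close
)

# One line per column, all in the same plotting area
cols <- seq_len(ncol(prices))
matplot(prices, type = "l", lty = 1, col = cols,
        xlab = "trading day", ylab = "price", main = "matplot")
legend("topleft", legend = names(prices), col = cols, lty = 1, cex = 0.8)
```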
(5) Box plot
> boxplot(group, cex.axis = .6)
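In the call above, cex.axis = .6 shrinks the axis annotation to 60% of its default size so that all column names fit. The sketch below, on synthetic data (an assumption for illustration), also shows that boxplot() invisibly returns the statistics it draws.

```r
set.seed(7)
n <- 100
close <- 20 + cumsum(rnorm(n, sd = 0.3))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.1),
  highest = close + abs(rnorm(n, sd = 0.3)),
  lowest  = close - abs(rnorm(n, sd = 0.3)),
  close   = close
)

bp <- boxplot(prices, cex.axis = 0.6, main = "box plot")

# bp$stats has one column per variable: lower whisker, lower hinge,
# median, upper hinge, upper whisker
print(bp$stats)
print(bp$names)
```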
(6) Star plot (radar chart)
stars(x, full = TRUE, scale = TRUE, radius = TRUE,
      labels = dimnames(x)[[1]], locations = NULL,
      nrow = NULL, ncol = NULL, len = 1,
      key.loc = NULL, key.labels = dimnames(x)[[2]],
      key.xpd = TRUE,
      xlim = NULL, ylim = NULL, flip.labels = NULL,
      draw.segments = FALSE,
      col.segments = 1:n.seg, col.stars = NA, col.lines = NA,
      axes = FALSE, frame.plot = axes,
      main = NULL, sub = NULL, xlab = "", ylab = "",
      cex = 0.8, lwd = 0.25, lty = par("lty"), xpd = FALSE,
      mar = pmin(par("mar"),
                 1.1 + c(2*axes + (xlab != ""),
                         2*axes + (ylab != ""), 1, 0)),
      add = FALSE, plot = TRUE, ...)
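The source gives only the signature, so here is a minimal usage sketch under stated assumptions: the data are synthetic, only the first nine rows are drawn so that the stars stay legible, and the row labels are invented.

```r
set.seed(8)
n <- 9
close <- 20 + cumsum(rnorm(n, sd = 0.5))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.2),
  highest = close + abs(rnorm(n, sd = 0.4)),
  lowest  = close - abs(rnorm(n, sd = 0.4)),
  close   = close
)
rownames(prices) <- paste("day", seq_len(n))

# One star per row; with scale = TRUE each column is rescaled to [0, 1]
loc <- stars(prices, full = TRUE, scale = TRUE, main = "star plot")

# Segment version of the same plot
stars(prices, draw.segments = TRUE, main = "segment plot")

# stars() invisibly returns the plot locations, one row per star
print(loc)
```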
(7) Line chart
A custom function is required.
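The source does not show the custom function, so the following is only one plausible sketch and not the book's implementation. The hypothetical function `line_chart()` draws one polyline per observation across the variables; its name, arguments and the synthetic data are all assumptions.

```r
# Hypothetical helper: one line per row of x, with the variables on the x axis
line_chart <- function(x, ...) {
  x <- data.matrix(x)
  p <- ncol(x)
  plot(c(1, p), range(x), type = "n", xaxt = "n",
       xlab = "variable", ylab = "value", ...)
  axis(1, at = seq_len(p), labels = colnames(x))
  for (i in seq_len(nrow(x))) {
    lines(seq_len(p), x[i, ], col = i)
  }
  invisible(x)
}

set.seed(9)
n <- 10
close <- 20 + cumsum(rnorm(n, sd = 0.5))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.2),
  highest = close + abs(rnorm(n, sd = 0.4)),
  lowest  = close - abs(rnorm(n, sd = 0.4)),
  close   = close
)

res <- line_chart(prices, main = "line chart")
```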
(8) Harmonic curve
A custom function is required.
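The source does not include the custom function. As a sketch under stated assumptions, harmonic curves are taken here to mean Andrews curves, where an observation (x1, x2, x3, ...) is mapped to f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... for t in [-pi, pi]. The function name `andrews_values()` and the synthetic data are hypothetical.

```r
# Hypothetical helper: returns a matrix with one row per value of tt
# and one column per observation (row of x)
andrews_values <- function(x, tt = seq(-pi, pi, length.out = 200)) {
  x <- data.matrix(x)
  p <- ncol(x)
  # Basis columns: 1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ...
  basis <- matrix(1 / sqrt(2), nrow = length(tt), ncol = p)
  if (p > 1) {
    for (j in 2:p) {
      k <- j %/% 2  # harmonic index: 1, 1, 2, 2, ...
      basis[, j] <- if (j %% 2 == 0) sin(k * tt) else cos(k * tt)
    }
  }
  basis %*% t(x)
}

set.seed(10)
n <- 10
close <- 20 + cumsum(rnorm(n, sd = 0.5))
prices <- data.frame(
  open    = close + rnorm(n, sd = 0.2),
  highest = close + abs(rnorm(n, sd = 0.4)),
  lowest  = close - abs(rnorm(n, sd = 0.4)),
  close   = close
)

tt <- seq(-pi, pi, length.out = 200)
curves <- andrews_values(scale(prices), tt)  # standardise the columns first
matplot(tt, curves, type = "l", lty = 1,
        xlab = "t", ylab = "f(t)", main = "harmonic curves")
```

Observations with similar values produce curves that stay close together, which is why this kind of plot is often used to look for clusters.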
"Data Analysis R Language Practice" study notes: Chapter 5, descriptive analysis of data (Part I)