Multivariate statistical analysis in R

Source: Internet
Author: User
Tags: RColorBrewer

# Reading multivariate data into R
wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep = ",")
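# A minimal sketch of how read.table() names the columns when the input has no header row, which is why the wine columns are referred to as V1, V2, ... throughout. The two rows below are made up for illustration (they are not taken from the wine file), and text= is used instead of a URL so the example runs offline.

```r
# Two made-up comma-separated rows with no header (illustrative values only)
raw <- "1,14.2,1.7,2.4
2,12.3,0.9,1.4"

# With no header row, read.table() names the columns V1, V2, ...
demo <- read.table(text = raw, sep = ",")

names(demo)   # column names assigned by read.table()
str(demo)     # column types and first values
dim(demo)     # number of rows and columns
```

# Running names(wine), str(wine) and dim(wine) in the same way is a quick check that the download and the separator worked as expected.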
# Plotting multivariate data
# Matrix scatterplot
# A common approach is to draw a matrix scatterplot, which shows a scatterplot for every pair of variables.
# We can do this using the "scatterplotMatrix()" function in the "car" package in R.
library(car)
scatterplotMatrix(wine[2:6])
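# If the "car" package is not installed, base R's pairs() draws a similar matrix of pairwise scatterplots. The sketch below uses the built-in iris data as a stand-in for wine[2:6] (an assumption made only so the example runs without a download).

```r
# iris ships with R and is used here only as a stand-in for wine[2:6]
vars <- iris[1:4]

# One scatterplot per pair of variables; points coloured by group (Species)
pairs(vars, col = as.integer(iris$Species), pch = 19, cex = 0.6)
```

# For the wine data the corresponding call would be pairs(wine[2:6], col = wine$V1).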
# Scatterplot with the data points labelled by group
plot(wine$V4, wine$V5)
text(wine$V4, wine$V5, wine$V1, cex = 0.7, pos = 4, col = "red")


# Profile plot
# Another very useful type of chart is the "profile plot", which shows the variation in each variable by plotting the value of each variable for every sample.
# The "makeProfilePlot()" function below draws a profile plot. It requires the "RColorBrewer" library.
makeProfilePlot <- function(mylist, names) {
  require(RColorBrewer)
  # Find out how many variables we want to include
  numvariables <- length(mylist)
  # Choose 'numvariables' colours from the "Set1" palette
  colours <- brewer.pal(numvariables, "Set1")
  # Find out the minimum and maximum values of the variables
  # (mymax starts very low so that negative data are handled too):
  mymin <- 1e+20
  mymax <- -1e+20
  for (i in 1:numvariables) {
    vectori <- mylist[[i]]
    mini <- min(vectori)
    maxi <- max(vectori)
    if (mini < mymin) { mymin <- mini }
    if (maxi > mymax) { mymax <- maxi }
  }

  # Plot the variables
  for (i in 1:numvariables) {
    vectori <- mylist[[i]]
    namei <- names[i]
    colouri <- colours[i]

    if (i == 1) { plot(vectori, col = colouri, type = "l", ylim = c(mymin, mymax)) }
    else { points(vectori, col = colouri, type = "l") }

    # Label each line near its last point
    lastxval <- length(vectori)
    lastyval <- vectori[length(vectori)]
    text((lastxval - 10), (lastyval), namei, col = "black", cex = 0.6)
  }
}
# For example, to draw a profile plot of the first five chemicals in the wine samples (stored in columns V2, V3, V4, V5 and V6 of the "wine" variable), we enter:
library(RColorBrewer)
names <- c("V2", "V3", "V4", "V5", "V6")
mylist <- list(wine$V2, wine$V3, wine$V4, wine$V5, wine$V6)
makeProfilePlot(mylist, names)
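# A similar profile plot can be sketched with base R's matplot(), which draws one line per column against the sample index and needs no extra package. The built-in iris data stands in for wine[2:6] here (an assumption made so the example is self-contained).

```r
# iris stands in for wine[2:6]; as.matrix() gives matplot() one column per line
vals <- as.matrix(iris[1:4])
cols <- seq_len(ncol(vals))

# type = "l" draws each column as a line against the sample index
matplot(vals, type = "l", lty = 1, col = cols,
        xlab = "Sample index", ylab = "Value")
legend("topleft", legend = colnames(vals), col = cols, lty = 1, cex = 0.7)
```

# Unlike the function above, this version uses a legend instead of labelling the end of each line.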


# Calculating summary statistics for multivariate data
# Another thing you might want to do is calculate summary statistics, such as the mean and standard deviation, for each variable in your multivariate data set.
sapply(wine[2:14], mean)
sapply(wine[2:14], sd)
# Standardising the variables makes them directly comparable: each variable is rescaled so that it has a sample mean of 0 and a sample variance of 1.
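# One way to standardise is base R's scale(), which subtracts each column's mean and divides by its sample standard deviation. The sketch below uses the built-in iris data as a stand-in for wine[2:14] (an assumption made so the example is self-contained).

```r
# iris stands in for wine[2:14]
vars <- iris[1:4]

# scale() centres each column on 0 and divides by its sample standard deviation
standardised <- as.data.frame(scale(vars))

# After standardising, every column should have mean ~0 and standard deviation 1
round(sapply(standardised, mean), 10)
sapply(standardised, sd)
```

# For the wine data the corresponding call would be scale(wine[2:14]).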


# Means and standard deviations per group
# We are often interested in the mean and standard deviation of the samples from a particular group, for example from each wine cultivar. The cultivar is stored in column "V1" of the "wine" variable.
# To extract only the data for cultivar 2, we enter:
cultivar2wine <- wine[wine$V1 == 2, ]
sapply(cultivar2wine[2:14], mean)
sapply(cultivar2wine[2:14], sd)
# You can use the same approach to calculate the mean and standard deviation of the 13 chemical concentrations for cultivar 1 or cultivar 3.
# However, for convenience, you may want to print the mean and standard deviation for every group in a data set with the following "printMeanAndSdByGroup()" function:
printMeanAndSdByGroup <- function(variables, groupvariable) {
  # Find the names of the variables
  variablenames <- c(names(groupvariable), names(as.data.frame(variables)))
  # Within each group, find the mean of each variable
  groupvariable <- groupvariable[, 1] # ensures groupvariable is not a list
  means <- aggregate(as.matrix(variables) ~ groupvariable, FUN = mean)
  names(means) <- variablenames
  print(paste("Means:"))
  print(means)
  # Within each group, find the standard deviation of each variable:
  sds <- aggregate(as.matrix(variables) ~ groupvariable, FUN = sd)
  names(sds) <- variablenames
  print(paste("Standard deviations:"))
  print(sds)
  # Within each group, find the number of samples:
  samplesizes <- aggregate(as.matrix(variables) ~ groupvariable, FUN = length)
  names(samplesizes) <- variablenames
  print(paste("Sample sizes:"))
  print(samplesizes)
}
printMeanAndSdByGroup(wine[2:14], wine[1])
# The "printMeanAndSdByGroup()" function also prints the number of samples in each group. In this example, cultivar 1 has 59 samples, cultivar 2 has 71 samples, and cultivar 3 has 48 samples.
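# The same per-group summaries can be sketched more compactly with aggregate()'s formula interface and table(). The built-in iris data stands in for the wine data here, with Species playing the role of V1 (an assumption made so the example is self-contained).

```r
# iris stands in for the wine data; Species plays the role of V1
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
group_sds   <- aggregate(. ~ Species, data = iris, FUN = sd)
group_sizes <- table(iris$Species)

print(group_means)   # one row per group, one column per variable
print(group_sds)
print(group_sizes)   # number of samples in each group
```

# For the wine data the corresponding calls would be aggregate(. ~ V1, data = wine, FUN = mean) and table(wine$V1).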


# Between-groups variance and within-groups variance for a variable
# If we want to calculate the within-groups variance of a particular variable (for example, the concentration of a specific chemical), we can use the following "calcWithinGroupsVariance()" function:
calcWithinGroupsVariance <- function(variable, groupvariable) {
  # Find out how many values the group variable can take
  groupvariable2 <- as.factor(groupvariable[[1]])
  levels <- levels(groupvariable2)
  numlevels <- length(levels)
  # Get the mean and standard deviation for each group:
  numtotal <- 0
  denomtotal <- 0
  for (i in 1:numlevels) {
    leveli <- levels[i]
    levelidata <- variable[groupvariable == leveli, ]
    levelilength <- length(levelidata)
    # Get the mean and standard deviation for group i:
    meani <- mean(levelidata)
    sdi <- sd(levelidata)
    numi <- (levelilength - 1) * (sdi * sdi)
    denomi <- levelilength
    numtotal <- numtotal + numi
    denomtotal <- denomtotal + denomi
  }
  # Calculate the within-groups variance
  Vw <- numtotal / (denomtotal - numlevels)
  return(Vw)
}
# For example, to calculate the within-groups variance of the V2 variable (the concentration of the first chemical), we enter:
calcWithinGroupsVariance(wine[2], wine[1]) # [1] 0.2620525
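# The loop above computes a pooled variance: the sum over groups of (group size - 1) times the group variance, divided by the total sample size minus the number of groups. A compact sketch of the same quantity with tapply() follows; the built-in iris data stands in for the wine data, and the variable names are illustrative only.

```r
# iris stands in for the wine data: one numeric variable and one grouping factor
x <- iris$Sepal.Length
g <- iris$Species

n_i   <- tapply(x, g, length)   # number of samples in each group
var_i <- tapply(x, g, var)      # sample variance within each group

# Pooled within-groups variance: sum((n_i - 1) * var_i) / (N - number of groups)
Vw <- sum((n_i - 1) * var_i) / (sum(n_i) - length(n_i))
Vw
```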
# We can calculate the between-groups variance of a specific variable (such as V2) with the "calcBetweenGroupsVariance()" function below:
calcBetweenGroupsVariance <- function(variable, groupvariable) {
  # Find out how many values the group variable can take
  groupvariable2 <- as.factor(groupvariable[[1]])
  levels <- levels(groupvariable2)
  numlevels <- length(levels)
  # Calculate the overall grand mean:
  grandmean <- mean(variable[, 1])
  # Get the mean and standard deviation for each group:
  numtotal <- 0
  denomtotal <- 0
  for (i in 1:numlevels) {
    leveli <- levels[i]
    levelidata <- variable[groupvariable == leveli, ]
    levelilength <- length(levelidata)
    # Get the mean and standard deviation for group i:
    meani <- mean(levelidata)
    sdi <- sd(levelidata)
    numi <- levelilength * ((meani - grandmean)^2)
    denomi <- levelilength
    numtotal <- numtotal + numi
    denomtotal <- denomtotal + denomi
  }
  # Calculate the between-groups variance
  Vb <- numtotal / (numlevels - 1)
  Vb <- Vb[[1]]
  return(Vb)
}
# You can use it like this to calculate the between-groups variance of V2:
calcBetweenGroupsVariance(wine[2], wine[1]) # [1] 35.39742
# We can calculate the "separation" achieved by a variable by dividing its between-groups variance by its within-groups variance. Thus, the separation achieved by V2 is:
calcBetweenGroupsVariance(wine[2], wine[1]) / calcWithinGroupsVariance(wine[2], wine[1])
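# Defined this way, the separation is the same ratio as the F statistic of a one-way analysis of variance (between-groups mean square over within-groups mean square). The sketch below checks this on the built-in iris data, which stands in for the wine data; the variable names are illustrative only.

```r
# iris stands in for the wine data: one numeric variable and one grouping factor
x <- iris$Petal.Width
g <- iris$Species

n_i    <- tapply(x, g, length)   # group sizes
mean_i <- tapply(x, g, mean)     # group means
var_i  <- tapply(x, g, var)      # group sample variances
k      <- length(n_i)            # number of groups

Vb <- sum(n_i * (mean_i - mean(x))^2) / (k - 1)      # between-groups variance
Vw <- sum((n_i - 1) * var_i) / (sum(n_i) - k)        # within-groups variance
separation <- Vb / Vw

# F statistic from a one-way ANOVA of x on g
f_value <- anova(lm(x ~ g))[["F value"]][1]
all.equal(separation, f_value)
```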
# If we want to calculate the separation achieved by every variable in a multivariate data set, we can use the following "calcSeparations()" function:
calcSeparations <- function(variables, groupvariable) {
  # Find out how many variables we have
  variables <- as.data.frame(variables)
  numvariables <- length(variables)
  # Find the variable names
  variablenames <- colnames(variables)
  # Calculate the separation for each variable
  for (i in 1:numvariables) {
    variablei <- variables[i]
    variablename <- variablenames[i]
    Vw <- calcWithinGroupsVariance(variablei, groupvariable)
    Vb <- calcBetweenGroupsVariance(variablei, groupvariable)
    sep <- Vb / Vw
    print(paste("variable", variablename, "Vw=", Vw, "Vb=", Vb, "separation=", sep))
  }
}
# For example, to calculate the separation for each of the 13 chemical concentrations, we enter:
calcSeparations(wine[2:14], wine[1])
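# Because the function above only prints its results, picking out the best variable has to be done by eye. A sketch of a variant that returns the separations as a named vector, so they can be sorted, is shown below. The helper name "separation" is hypothetical (it is not part of the original text), and the built-in iris data stands in for wine[2:14] and wine$V1.

```r
# Hypothetical helper (not from the original text): separation of one variable
separation <- function(x, g) {
  g      <- as.factor(g)
  n_i    <- tapply(x, g, length)   # group sizes
  mean_i <- tapply(x, g, mean)     # group means
  var_i  <- tapply(x, g, var)      # group sample variances
  k      <- length(n_i)            # number of groups
  Vb <- sum(n_i * (mean_i - mean(x))^2) / (k - 1)
  Vw <- sum((n_i - 1) * var_i) / (sum(n_i) - k)
  Vb / Vw
}

# iris stands in for wine[2:14] (variables) and wine$V1 (groups)
seps <- sapply(iris[1:4], separation, g = iris$Species)

sort(seps, decreasing = TRUE)   # variables ranked by separation
names(which.max(seps))          # variable with the greatest separation
```

# For the wine data the corresponding call would be sapply(wine[2:14], separation, g = wine$V1).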
# Therefore, the individual variable that gives the greatest separation between the groups (wine cultivars) is V8 (separation 233.9).
# As we will discuss below, the purpose of linear discriminant analysis (LDA) is to find the linear combination of the individual variables that gives the greatest separation between the groups (here, the cultivars).
# We hope that this will give a better separation than the best separation achieved by any individual variable (currently 233.9, for V8).
