Abstract: The classical statistical analysis methods mainly include regression analysis, logistic regression, decision trees, support vector machines, cluster analysis, association analysis, principal component analysis, correspondence analysis, factor analysis, and so on. This article summarizes which R packages and functions implement these classical methods.
1. Linear Model ~ Regression Analysis:
"Package": stats
"Function": lm(formula, data, ...)
Stepwise regression: step(lm(formula, data, ...))
Regression diagnostics: influence.measures(lm(formula, data, ...))
Multicollinearity: kappa(XX, exact=TRUE), eigen(XX), where XX is the cross-product of the design matrix
Autocorrelation tests (lmtest package): first order: dwtest(y~x); higher order: bgtest(y~x, order=2, type="Chisq")
"Remarks": 1) lm() in the stats package can fit multivariate linear models, anova.mlm() compares several multivariate linear models, and manova() does multivariate analysis of variance (MANOVA).
2) msn.mle() and mst.mle() in the sn package fit multivariate skew-normal and skew-t distribution models.
3) The pls package provides partial least squares regression (PLSR) and principal component regression.
4) The ppls package can do penalized partial least squares regression.
5) The dr package provides dimension-reduction regression methods such as sliced inverse regression and sliced average variance estimation.
6) The plsgenomics package does genome analysis based on partial least squares regression.
7) The relaimpo package evaluates the relative importance of regression parameters.
2. Logistic Regression:
"Package": stats
"Function": glm(formula, family=gaussian, data, ...)
Note: available families are binomial(link="logit"), gaussian(link="identity"), Gamma(link="inverse"), inverse.gaussian(link="1/mu^2"), poisson(link="log"), quasi(link="identity", variance="constant"), quasibinomial(link="logit"), quasipoisson(link="log")
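A minimal sketch of the calls above, under simulated data (the data frame df, the predictors x1 and x2, the numeric response y, and the binary response z are all hypothetical); dwtest() and bgtest() come from the lmtest package:

library(lmtest)                                     # for dwtest() and bgtest()
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 1 + 2 * df$x1 - 0.5 * df$x2 + rnorm(100)    # numeric response
df$z <- rbinom(100, 1, plogis(df$x1))               # binary response
fit <- lm(y ~ x1 + x2, data = df)                   # linear model
summary(fit)
step(fit)                                           # stepwise selection by AIC
influence.measures(fit)                             # regression diagnostics
kappa(model.matrix(fit), exact = TRUE)              # condition number (multicollinearity)
dwtest(y ~ x1 + x2, data = df)                      # first-order autocorrelation
bgtest(y ~ x1 + x2, order = 2, type = "Chisq", data = df)   # higher-order autocorrelation
logit <- glm(z ~ x1 + x2, family = binomial(link = "logit"), data = df)
summary(logit)                                      # logistic regression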
3. Supervised Classification ~ Decision Tree:
"Package": rpart (rpart.plot for plotting)
"Function": rpart(formula, data, method="class", control=ct, parms=list(prior=c(p, 1-p), split="information"))
rpart.plot(fit, branch=1, branch.type=2, type=1, extra=102, shadow.col="gray", box.col="green", split.cex=1.2, main="kyphosis decision tree") # a complexity-cost pruning method is provided
printcp(fit): reports, for each split level, the complexity parameter (CP), the number of splits (nsplit), the relative error (rel error), the cross-validated error (xerror), and its standard error (xstd)
prune(fit, cp=fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]): pruning function (a usage sketch follows the remarks below)
"Remarks": 1) CRAN's MachineLearning task view has a detailed description of tree methods.
2) Classification trees are also an important multivariate method; the rpart package is exactly such a package.
3) The rpart.permutation package can do permutation tests for rpart() models.
4) Trees built with the TWIX package can be pruned externally.
5) The hier.part package partitions the variance of a multivariate data set.
6) The mvpart package can fit multivariate regression trees.
7) The party package implements recursive partitioning.
8) The rrp package implements random recursive partitioning.
9) The caret package does classification and regression training, and the caretLSF package adds parallel processing.
10) The k-nearest neighbour method of the kknn package can be used for both regression and classification.
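The usage sketch referred to above, fitted on the kyphosis data shipped with rpart; rpart.plot() lives in the separate rpart.plot package, and the prior c(0.65, 0.35) is only an illustrative assumption:

library(rpart)
library(rpart.plot)
ct <- rpart.control(minsplit = 20, cp = 0.01, xval = 10)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = ct,
             parms = list(prior = c(0.65, 0.35), split = "information"))
rpart.plot(fit, type = 1, extra = 102, main = "kyphosis decision tree")
printcp(fit)                                        # CP table with xerror and xstd
fit_pruned <- prune(fit,
                    cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])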
4. Support Vector Machine:
"Package": e1071, kernlab
"Function": svm(x_train, y_train, type="C-classification", cost=10, kernel="radial", probability=TRUE, scale=FALSE)
svp <- ksvm(x, y, type="C-svc", kernel="rbfdot", kpar=list(sigma=1), C=1)
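A minimal sketch of both calls, fitted on the iris data with an assumed random train/test split:

library(e1071)
library(kernlab)
set.seed(1)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]; test <- iris[-idx, ]
m1 <- svm(Species ~ ., data = train, type = "C-classification",
          cost = 10, kernel = "radial", probability = TRUE)    # e1071
table(predict(m1, test), test$Species)
m2 <- ksvm(Species ~ ., data = train, type = "C-svc",
           kernel = "rbfdot", kpar = list(sigma = 1), C = 1)   # kernlab
table(predict(m2, test), test$Species)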
5. Unsupervised Classification ~ Cluster Analysis:
"Package": stats
"Functions": hierarchical clustering: hclust(d, method="complete", members=NULL)
K-means (fast) clustering: kmeans(x, centers, iter.max=10, nstart=1, algorithm="Hartigan-Wong")
Distance function: dist(x, method="euclidean", diag=FALSE, upper=FALSE, p=2) (a usage sketch follows the remarks below)
"Remarks": 1) CRAN's Cluster task view comprehensively reviews the clustering methods implemented in R.
2) The stats package provides hierarchical clustering via hclust() and K-means clustering via kmeans().
3) The cluster package contains many clustering and visualization techniques.
4) The clv package contains some cluster-validation procedures.
5) classAgreement() in the e1071 package computes the Rand index to compare two classification results.
6) Trimmed K-means cluster analysis can be done with the trimcluster package.
7) Cluster ensembles (cluster fusion) are implemented in the clue package.
8) The clusterSim package helps choose the best clustering.
9) The hybridHclust package provides some hybrid clustering methods.
10) The energy package has a distance-measure function edist() and a hierarchical clustering method based on the E-statistic, hclust.energy().
11) The LLAhclust package provides clustering based on likelihood linkage, as well as indices for evaluating clustering results.
12) The fpc package has clustering based on Mahalanobis distance.
13) The clustvarsel package does variable selection for model-based clustering.
14) Fuzzy clustering can be done with the cluster package and the hopach package.
15) The kohonen package provides supervised and unsupervised SOM algorithms for high-dimensional spectra or patterns.
16) The clusterGeneration package helps simulate clusters.
17) CRAN's Environmetrics task view also summarizes relevant clustering algorithms.
18) The mclust package implements model-based clustering.
19) The MFDA package implements model-based clustering of functional data.
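The usage sketch referred to above, clustering the iris measurements (the species label is dropped) with hclust() and kmeans():

x <- scale(iris[, 1:4])                              # standardize the four measurements
d <- dist(x, method = "euclidean")                   # distance matrix
hc <- hclust(d, method = "complete")                 # hierarchical clustering
plot(hc)
groups <- cutree(hc, k = 3)                          # cut the dendrogram into 3 clusters
km <- kmeans(x, centers = 3, iter.max = 10,
             nstart = 25, algorithm = "Hartigan-Wong")
table(groups, km$cluster)                            # compare the two partitions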
6. Association Analysis:
"Package": arules, Matrix, lattice, arulesViz
"Function": apriori(Groceries, parameter=list(support=0.01, confidence=0.2))
eclat(Groceries, parameter=list(support=0.05), control=list(verbose=FALSE))
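A minimal sketch using the Groceries transaction data that ships with arules; the arulesViz plot at the end is optional:

library(arules)
library(arulesViz)
data(Groceries)
rules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.2))
inspect(sort(rules, by = "lift")[1:5])               # top 5 rules by lift
itemsets <- eclat(Groceries, parameter = list(support = 0.05),
                  control = list(verbose = FALSE))
inspect(itemsets)
plot(sort(rules, by = "lift")[1:20], method = "graph")   # graph of the strongest rules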
7. Principal Component Analysis:
"Package": stats
"Function": princomp(x, cor=FALSE, scores=TRUE, covmat=NULL, subset=rep(TRUE, nrow(as.matrix(x))), ...)
prcomp(x, retx=TRUE, center=TRUE, scale.=FALSE, tol=NULL, ...)
prcomp(): singular value decomposition of the (centered) observation matrix; princomp(): eigenvalue decomposition of the correlation (or covariance) matrix (a usage sketch follows the remarks below)
"Remarks": 1) In the stats package, prcomp() (based on svd()) and princomp() (based on eigen()) compute principal components.
2) The sca package does simple component analysis.
3) The nFactors package can evaluate scree plots.
4) The paran package can evaluate the principal components from principal component analysis and the factors from factor analysis.
5) The pcurve package does principal curve analysis and visualization.
6) The gmodels package provides fast.prcomp() and fast.svd(), suitable for large matrices.
7) kpca() in the kernlab package uses kernel methods for nonlinear principal component analysis.
8) The pcaPP package computes robust principal components by projection pursuit.
9) acpgen() and acprob() in the amap package do generalized and robust principal component analysis, respectively.
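The usage sketch referred to above, run on the USArrests data from base R to contrast princomp() and prcomp():

pc1 <- princomp(USArrests, cor = TRUE, scores = TRUE)   # eigen decomposition of the correlation matrix
summary(pc1)
screeplot(pc1, type = "lines")
pc2 <- prcomp(USArrests, center = TRUE, scale. = TRUE)  # singular value decomposition of the data matrix
summary(pc2)
head(pc2$x)                                             # principal component scores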
8. Correspondence Analysis:
"Package": ca, MASS, vegan, FactoMineR
"Functions": simple correspondence analysis: ca(data, ...)
Multiple correspondence analysis: mjca(data, ...)
plot3d.ca(ca(data, nd=3))
plot(mjca(data, lambda="Burt"))
"Remarks": 1) corresp() and mca() in the MASS package do simple and multiple correspondence analysis, respectively.
2) The ca package provides simple, multiple, and joint correspondence analysis.
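A minimal sketch with the ca package, using its bundled smoke and wg93 data sets (the choice of data sets is an assumption for illustration):

library(ca)
data(smoke)
fit_ca <- ca(smoke)                              # simple correspondence analysis
summary(fit_ca)
plot(fit_ca)
data(wg93)
fit_mjca <- mjca(wg93[, 1:4], lambda = "Burt")   # multiple CA via the Burt matrix
plot(fit_mjca)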