Base R and its recommended packages already implement many functions for traditional multivariate statistics, but many other packages on CRAN provide more in-depth multivariate approaches. This overview is organized into the following sections:
1) Visualising multivariate data: Graphical procedures:
Base graphics functions such as pairs() and coplot(), and lattice functions such as xyplot() and splom(), draw pairwise arrays of two-dimensional scatter plots, clouds, and three-dimensional densities. The scatterplot.matrix() function in the car package provides enhanced pairwise scatter plots. The pltSplomT() function of cwhplot (part of cwhmisc) draws a scatter plot matrix similar to pairs() and can place a histogram or density estimate on the diagonal. In addition, the scatterplot3d package draws three-dimensional scatter plots; in aplpack, bagplot() draws bivariate boxplots (bagplots) and spin3R() rotates three-dimensional point clouds. The misc3d package has functions for visualizing densities. The YaleToolkit package provides a number of multivariate visualization techniques, as does agsemisc.
More specialized multivariate plots include the following: faces() in aplpack draws Chernoff faces; parcoord() in MASS draws parallel coordinate plots (one line per row of the matrix, with the horizontal axis representing the columns); stars() in graphics draws star plots (one star per row of the matrix). Minimum spanning trees can be drawn with mstree() in ade4 and spantree() in vegan. The calibrate package supports biplot and scatter plot axis labelling, and chplot draws convex hull plots. The geometry package provides an interface to the qhull library; convexhulln() returns the indices of the points on the hull. The ellipse package draws ellipses and visualizes a correlation matrix with plotcorr(). The denpro package provides level set trees for multivariate visualization. mosaicplot() in graphics and mosaic() in vcd draw mosaic plots.
The gclus package provides cluster-specific enhancements for scatter plots and parallel coordinate plots. The rggobi and DescribeDisplay packages interface with GGobi; DescribeDisplay produces publication-quality graphics. The xgobi package interfaces with XGobi and XGvis for linked, dynamic plots. Finally, iplots provides powerful interactive graphics, notably interactive parallel coordinate plots and mosaic plots. The seriation package provides seriation methods for reordering matrices and dendrograms.
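As a minimal sketch of the plotting functions named above, the following uses only base graphics, lattice, and MASS, with the built-in iris and mtcars datasets. The choice of datasets and the variable names (species_col, p) are assumptions made for illustration.

```r
library(lattice)  # splom()
library(MASS)     # parcoord()

# Integer codes 1..3 used as plotting colours, one per species
species_col <- as.integer(iris$Species)

# Base-graphics scatter plot matrix of the four measurements
pairs(iris[, 1:4], col = species_col)

# lattice equivalent; splom() returns a "trellis" object that is drawn when printed
p <- splom(~ iris[1:4], groups = Species, data = iris)
print(p)

# Parallel coordinate plot: one line per row, one vertical axis per column
parcoord(iris[, 1:4], col = species_col)

# Star plot: one star per row of the data
stars(mtcars[1:9, 1:7])
```

Run interactively, each call opens or replaces a plot; in a script, the plots go to the default graphics device.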
Data preprocessing:
The AIS package provides an initial description of multivariate data. summarize() and summary.formula() in Hmisc help describe data; in the same package, varclus() clusters variables, while dataRep() and find.matches() assess how representative observations are and find matching observations in a given dataset. The nn() function in knnFinder uses a kd-tree to find similar variables. The dprep package provides preprocessing and visualization functions for classification, such as checking variable redundancy and applying normalizations. dist() in the stats package and daisy() in the cluster package compute a wide range of distance measures, and the proxy package provides a framework for further measures, including distances between matrices. The simba package handles presence/absence data, including similarity matrices and reshaping.
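The two distance functions mentioned above can be compared with a short sketch. It assumes the built-in iris data; the object names (d_euclid, d_manhat, d_gower) are illustrative only.

```r
library(cluster)  # daisy()

# Standardize the numeric columns so no variable dominates the distances
x <- scale(iris[, 1:4])

# dist() handles purely numeric data
d_euclid <- dist(x, method = "euclidean")
d_manhat <- dist(x, method = "manhattan")

# daisy() also accepts mixed numeric/factor data via Gower's coefficient
d_gower <- daisy(iris, metric = "gower")

# All three objects store the lower triangle of a 150 x 150 matrix
attr(d_euclid, "Size")
summary(as.vector(d_gower))
```

Gower dissimilarities are scaled to the interval [0, 1], which makes them convenient when factors and numeric variables are mixed.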
2) Hypothesis testing:
The ICSNP package provides Hotelling's T2 test and many non-parametric tests, including location tests based on marginal ranks, computation of the spatial median and spatial signs, and shape estimation. The cramer package performs non-parametric two-sample tests, and SpatialNP provides spatial sign and rank tests.
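To make the one-sample Hotelling's T2 test concrete, here is a hand-written sketch in base R. It is not the ICSNP interface: the function name hotelling_one_sample and its arguments are hypothetical, and the simulated data are an assumption for illustration. The statistic is T2 = n (xbar - mu0)' S^-1 (xbar - mu0), and (n - p) / (p (n - 1)) T2 follows an F distribution with p and n - p degrees of freedom under multivariate normality.

```r
# Hypothetical helper, not part of any package
hotelling_one_sample <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  d <- colMeans(x) - mu0                      # deviation of sample mean from mu0
  t2 <- n * drop(t(d) %*% solve(cov(x), d))   # Hotelling's T2 statistic
  f <- (n - p) / (p * (n - 1)) * t2           # exact F transformation
  list(T2 = t2, F = f, df = c(p, n - p),
       p.value = pf(f, p, n - p, lower.tail = FALSE))
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 3)              # 20 observations, 3 variables
res <- hotelling_one_sample(x, mu0 = c(0, 0, 0))
res$T2
res$p.value
```

For real analyses, a maintained implementation such as the one in ICSNP is preferable to this sketch.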
3) Multivariate distributions: Descriptive measures:
Covariance and correlation matrices are estimated by cov() and cor() in the stats package, respectively. The ICSNP package provides several descriptive measures, such as spatial.median() for the spatial median, and other functions that estimate scatter. cov.rob() in MASS provides robust estimates of the variance-covariance matrix. The covRobust package estimates covariance robustly with the nearest-neighbor variance estimator. In robustbase, covMcd() estimates covariance by the minimum covariance determinant method and covOGK() computes the orthogonalized Gnanadesikan-Kettenring estimate. The rrcov package provides scalable robust estimators such as CovMcd() and CovMest(). The corpcor package computes large-scale covariance and partial correlation matrices.
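A brief sketch contrasting the classical and robust estimators, assuming the built-in stackloss data (which is known to contain a few outlying observations). The object names are illustrative.

```r
library(MASS)  # cov.rob()

x <- as.matrix(stackloss)

# Classical estimates
s_classic <- cov(x)
r_classic <- cor(x)

# Robust estimate by minimum covariance determinant; the search is randomized,
# so a seed is set for reproducibility
set.seed(123)
rob <- cov.rob(x, method = "mcd", cor = TRUE)

rob$center   # robust location estimate
rob$cov      # robust covariance matrix
rob$cor      # robust correlation matrix (returned because cor = TRUE)
```

Comparing s_classic with rob$cov shows how much the outlying rows influence the classical estimate.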
Densities (estimation and simulation):
mvrnorm() in MASS generates random numbers from a multivariate normal distribution. The mvtnorm package has probability and quantile functions for the multivariate t and multivariate normal distributions, and also computes the multivariate normal density. The mvtnormpcs package provides functions based on Dunnett. The mnormt package provides density and distribution functions for the multivariate t and multivariate normal distributions, and can generate random numbers. The sn package provides density, distribution, and random number functions for the multivariate skew-t and skew-normal distributions. The delt package estimates multivariate densities with methods such as CART and greedy methods. CRAN's Cluster task view (http://cran.r-project.org/web/views/Cluster.html) has more comprehensive information on mixtures; rmvnorm.mixt() and dmvnorm.mixt() in the ks package generate random numbers and evaluate densities for normal mixtures, and bayesm offers several mixture-fitting methods. Functions that simulate from the Wishart distribution are available in several places, such as rwishart() in bayesm and rwish() in MCMCpack, which also has the density function dwish(). bkde2D() in KernSmooth and kde2d() in MASS provide binned and non-binned two-dimensional kernel density estimation, respectively. The ks package also does multivariate kernel smoothing, as do ash and GenKern. The prim package uses the patient rule induction method to find high-density regions in high-dimensional multivariate data, and the feature package computes significant features of multivariate data.
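Simulation and density estimation can be combined in a few lines. This sketch assumes a bivariate normal with correlation 0.8; the parameter values and object names are chosen only for illustration.

```r
library(MASS)  # mvrnorm(), kde2d()

set.seed(42)
Sigma <- matrix(c(1.0, 0.8,
                  0.8, 1.0), nrow = 2)

# Draw 500 observations from a bivariate normal distribution
x <- mvrnorm(n = 500, mu = c(0, 0), Sigma = Sigma)

# Two-dimensional kernel density estimate on a 50 x 50 grid
dens <- kde2d(x[, 1], x[, 2], n = 50)

# dens$x and dens$y hold the grid coordinates, dens$z the estimated density
contour(dens, xlab = "x1", ylab = "x2")
points(x, pch = ".", col = "grey40")
```

The contour lines should form tilted ellipses, reflecting the positive correlation built into Sigma.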
Assessing normality:
The mvnormtest package provides a multivariate extension of the Shapiro-Wilk test, mvoutlier detects multivariate outliers, and the ICS package tests for multivariate normality. mvnorm.etest() in the energy package is a normality test based on E-statistics, and k.sample() in the same package tests whether several samples come from the same distribution. mardia() in dprep performs Mardia's test of normality. mauchly.test() in stats tests Wishart-distributed covariance matrices.
Copulas:
The copula package provides routines for common copulas, including normal, t, Clayton, Frank, and Gumbel. The fgac package provides generalised Archimedean copulas, and mlCopulaSelection performs bivariate copula selection.
4) Linear models:
lm() in the stats package fits multivariate linear models when the response is a matrix, anova.mlm() compares multivariate linear models, and manova() performs multivariate analysis of variance (MANOVA). msn.mle() and mst.mle() in the sn package fit multivariate skew-normal and skew-t models. The pls package provides partial least squares regression (PLSR) and principal component regression; ppls provides penalized partial least squares regression; and dr provides dimension-reduction regression methods such as sliced inverse regression and sliced average variance estimation. plsgenomics applies partial least squares regression to genomic analysis. The relaimpo package assesses the relative importance of regression parameters.
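A small sketch of the stats functions named above, assuming the iris data with two response variables. The choice of responses and predictor is arbitrary and made only for illustration.

```r
# A matrix response turns lm() into a multivariate linear model ("mlm")
fit <- lm(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
coef(fit)    # one column of coefficients per response

# MANOVA on the same model; Pillai's trace is the default test statistic
mfit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
s <- summary(mfit, test = "Pillai")
s

# Compare against the intercept-only model with anova.mlm()
fit0 <- update(fit, . ~ 1)
anova(fit0, fit)
```

Other test statistics ("Wilks", "Hotelling-Lawley", "Roy") can be requested through the test argument of summary().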
5) Projection methods: Principal components:
prcomp() (based on svd()) and princomp() (based on eigen()) in the stats package compute principal components. The sca package performs simple component analysis. nFactors helps assess scree plots, and paran evaluates the components and factors retained in principal component analysis and factor analysis. The pcurve package performs principal curve analysis and visualization. gmodels provides fast.prcomp() and fast.svd() for large matrices. kpca() in kernlab uses kernel methods for non-linear principal component analysis. pcaPP computes robust principal components by projection pursuit. acpgen() and acprob() in amap perform generalized and robust principal component analysis, respectively. Principal components are also implemented for specific fields, for example ade4 for ecology and SensoMineR for sensory analysis. The psy package has a variety of routines for psychometrics; those related to principal components are sphpca(), which represents the correlation matrix on a sphere, similar to a 3D PCA; fpca(), which displays the results of a principal component analysis graphically while allowing some variables to be treated as dependent; and scree.plot(), which plots the eigenvalues of the correlation or covariance matrix. PTAk performs principal tensor analysis. The smatr package provides functions for allometry (standardised major axis estimation).
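The two stats functions can be checked against each other. This sketch assumes the built-in USArrests data and works on the correlation matrix, so both routes should explain the same proportion of variance.

```r
x <- USArrests

# SVD-based PCA on standardized variables
pc_svd <- prcomp(x, scale. = TRUE)

# Eigen-decomposition-based PCA on the correlation matrix
pc_eig <- princomp(x, cor = TRUE)

# Proportion of variance explained by each component
var_svd <- pc_svd$sdev^2 / sum(pc_svd$sdev^2)
var_eig <- pc_eig$sdev^2 / sum(pc_eig$sdev^2)
round(rbind(prcomp = var_svd, princomp = unname(var_eig)), 3)

# Scree plot of the component variances
screeplot(pc_svd, type = "lines")
```

Signs of the loadings may differ between the two functions, since principal components are only defined up to sign.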
Canonical correlation:
cancor() in the stats package computes canonical correlations. The kernlab package provides a more robust kernel method, kcca(). The concor package offers a number of concordance methods.
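A minimal cancor() sketch, assuming the built-in LifeCycleSavings data split into population variables and economic variables (the split is an illustrative choice).

```r
# Two sets of variables measured on the same 50 countries
pop <- LifeCycleSavings[, 2:3]      # pop15, pop75
oec <- LifeCycleSavings[, -(2:3)]   # sr, dpi, ddpi

cc <- cancor(pop, oec)

cc$cor     # canonical correlations, one per pair of canonical variates
cc$xcoef   # coefficients for the first set
cc$ycoef   # coefficients for the second set
```

The number of canonical correlations equals the smaller of the two column counts, here two.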
Redundancy analysis:
The rda() function in the calibrate package performs redundancy analysis and canonical correlation. The fso package provides fuzzy set ordination.
Independent components:
The fastICA package implements the FastICA algorithm for independent component analysis (ICA) and projection pursuit, mlica provides maximum likelihood fitting for ICA, and PearsonICA separates independent signals using score functions based on mutual information. The ICS package computes an invariant coordinate system or independent components. The JADE package provides an interface to the JADE algorithm, as well as some ICA diagnostics.
Procrustes analysis:
procrustes() in the vegan package performs Procrustes analysis; vegan also provides ordination functions. Generalised Procrustes analysis is available through GPA() in the FactoMineR package.
6) Principal coordinates / scaling methods:
cmdscale() in the stats package performs classical multidimensional scaling (MDS), also called principal coordinates analysis; sammon() and isoMDS() in MASS perform Sammon's and Kruskal's non-metric multidimensional scaling, respectively. The vegan package provides wrappers and post-processing routines for non-metric MDS.
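The three scaling functions share a similar calling pattern, sketched here on the built-in swiss data. Dropping the first column and using Euclidean distances are illustrative choices.

```r
library(MASS)  # sammon(), isoMDS()

swiss_x <- as.matrix(swiss[, -1])
d <- dist(swiss_x)

# Classical (metric) MDS, also known as principal coordinates analysis
mds_classical <- cmdscale(d, k = 2)

# Sammon's non-linear mapping and Kruskal's non-metric MDS;
# both start from the classical solution by default
mds_sammon  <- sammon(d, k = 2, trace = FALSE)
mds_kruskal <- isoMDS(d, k = 2, trace = FALSE)

mds_sammon$stress
mds_kruskal$stress   # reported in percent

plot(mds_kruskal$points, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(mds_kruskal$points, labels = rownames(swiss), cex = 0.6)
```

Both sammon() and isoMDS() fail when the distance matrix contains zero distances between distinct rows, so duplicates should be removed first.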
7) Unsupervised classification: Cluster analysis:
CRAN's Cluster task view gives a comprehensive overview of the clustering methods implemented in R. The stats package provides hierarchical clustering with hclust() and k-means clustering with kmeans(). The cluster package has many clustering and visualization techniques, clv has cluster validation routines, and classAgreement() in e1071 computes the Rand index for comparing two classifications. Trimmed k-means clustering is implemented in trimcluster and cluster ensembles in clue; clusterSim helps choose the best clustering, and hybridHclust provides hybrid clustering methods. The energy package has the distance measure edist() and the hierarchical clustering method hclust.energy(), both based on E-statistics. LLAhclust provides likelihood linkage clustering, as well as indices for evaluating clustering results. The fpc package has clustering based on Mahalanobis distance. clustvarsel performs variable selection for model-based clustering. Fuzzy clustering is available in cluster and hopach. The kohonen package provides supervised and unsupervised self-organizing maps (SOMs) for high-dimensional spectra or patterns. clusterGeneration helps simulate clusters. CRAN's Environmetrics task view also summarizes related clustering algorithms. mclust implements model-based clustering, and MFDA implements model-based clustering of functional data.
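The two standard stats routines can be compared on the same data. This sketch assumes the standardized iris measurements and three clusters; Ward linkage is one choice among several.

```r
x <- scale(iris[, 1:4])

# k-means with several random starts to avoid poor local optima
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 20)

# Agglomerative hierarchical clustering with Ward's criterion
hc <- hclust(dist(x), method = "ward.D2")
grp <- cutree(hc, k = 3)

# Cross-tabulate the two partitions and compare each with the known species
table(kmeans = km$cluster, hclust = grp)
table(species = iris$Species, kmeans = km$cluster)

plot(hc, labels = FALSE, main = "Ward clustering of iris")
```

Cluster labels are arbitrary, so agreement shows up as one dominant cell per row of the table rather than matching numbers.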
Tree methods:
CRAN's MachineLearning task view describes tree methods in detail. Classification trees are often treated as a multivariate method; rpart is the main package for them, and rpart.permutation provides permutation tests for rpart() models. TWIX provides trees with extra splits. The hier.part package partitions the variance in a multivariate data set. mvpart fits multivariate regression trees, party implements recursive partitioning, and rrp implements random recursive partitioning. The caret package provides classification and regression training, and caretLSF adds parallel processing. The k-nearest neighbor methods in kknn can be used for both regression and classification.
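A classification tree with rpart takes only a few lines. The sketch below assumes the iris data and reports accuracy on the training data, which is optimistic and serves only to show the calls.

```r
library(rpart)

# Grow a classification tree for species from the four measurements
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)

# Predicted classes for the training data
pred <- predict(fit, iris, type = "class")
acc <- mean(pred == iris$Species)
acc

# Cross-validated error for each tree size, useful for choosing a pruning level
printcp(fit)
```

A held-out test set or the cross-validated error from printcp() gives a fairer picture than the resubstitution accuracy.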
8) Supervised classification and discriminant analysis:
lda() and qda() in MASS perform linear and quadratic discriminant analysis, respectively. In the mda package, mda() and fda() perform mixture and flexible discriminant analysis, mars() fits multivariate adaptive regression splines, and bruto() does adaptive spline backfitting. The earth package also fits multivariate adaptive regression splines. The rda package classifies high-dimensional data by shrunken centroids regularized discriminant analysis. The knn() function in the class package (part of the VR bundle) runs the k-nearest neighbor algorithm, and knncat provides k-nearest neighbors for categorical variables. FDA() in SensoMineR performs factorial discriminant analysis. Several packages combine dimension reduction with classification. klaR offers variable selection, handling of multicollinearity, and visualization functions. superpc uses principal components for supervised classification, classPP uses projection pursuit, and gpls classifies with generalized partial least squares. hddplot uses cross-validated linear discriminant analysis to determine the optimal number of features. supclust performs supervised clustering of genes from microarray data. ROCR provides many ways to evaluate classifier performance. predbayescor provides naive Bayes classification. More information on supervised classification is in the MachineLearning task view.
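The following sketch fits the MASS and class classifiers on a random training split of iris and compares held-out accuracy. The split size (100 of 150 rows) and k = 5 are assumptions made for illustration.

```r
library(MASS)   # lda(), qda()
library(class)  # knn()

set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Linear and quadratic discriminant analysis
fit_lda <- lda(Species ~ ., data = train)
fit_qda <- qda(Species ~ ., data = train)
acc_lda <- mean(predict(fit_lda, test)$class == test$Species)
acc_qda <- mean(predict(fit_qda, test)$class == test$Species)

# k-nearest neighbours works directly on the feature matrices
pred_knn <- knn(train = train[, 1:4], test = test[, 1:4],
                cl = train$Species, k = 5)
acc_knn <- mean(pred_knn == test$Species)

c(lda = acc_lda, qda = acc_qda, knn = acc_knn)
```

With only 50 held-out rows the accuracies are noisy; repeated splits or cross-validation would give steadier estimates.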
9) Correspondence analysis:
corresp() and mca() in MASS perform simple and multiple correspondence analysis. The ca package provides simple, multiple, and joint correspondence analysis. ca() and mca() in ade4 perform simple and multiple correspondence analysis, respectively. Similar functions are found in vegan. cocorresp performs co-correspondence analysis between two matrices. CA() and MCA() in FactoMineR also perform simple and multiple correspondence analysis and include plotting functions. homals performs homogeneity analysis.
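Simple correspondence analysis with MASS can be sketched on the caith contingency table (eye colour by hair colour) that ships with the package. Requesting two dimensions with nf = 2 is an illustrative choice.

```r
library(MASS)  # corresp(), caith

# caith is a 4 x 5 table of counts: eye colour (rows) by hair colour (columns)
caith

ca_fit <- corresp(caith, nf = 2)

ca_fit$cor      # canonical correlations for the two dimensions
ca_fit$rscore   # row (eye colour) scores
ca_fit$cscore   # column (hair colour) scores

# Joint display of row and column categories
biplot(ca_fit)
```

Categories that lie close together in the biplot tend to co-occur more often than independence would predict.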
10) Forward search:
The Rfwdmv package performs a forward search on multivariate data.
11) Missing data:
The mitools package provides tools for multiple imputation of missing data, mice implements multiple imputation by chained equations, mvnmle computes maximum likelihood (ML) estimates for multivariate normal data with missing values, norm provides an expectation-maximization (EM) algorithm for multivariate normal data with missing values, cat allows multiple imputation of missing values in categorical data, and mix handles mixed categorical and continuous data. The pan package performs multiple imputation for missing panel data. VIM provides visualization and imputation of missing data. aregImpute() and transcan() in Hmisc provide additional imputation methods. EMV provides a k-nearest neighbor (kNN) method for imputing missing data. The monomvn package estimates models for multivariate normal data with a monotone missingness pattern.
12) Latent variable approaches:
factanal() in the stats package performs maximum likelihood factor analysis, and MCMCpack does Bayesian factor analysis. GPArotation provides gradient projection factor rotation. FAiR performs factor analysis using genetic algorithms. The ifa package handles non-normal variables. The sem package fits linear structural equation models. ltm fits latent trait models, and eRm fits Rasch models. FactoMineR has many factor analysis methods, including MFA() for multiple factor analysis, HMFA() for hierarchical multiple factor analysis, and AFDM() for factor analysis of mixed quantitative and qualitative data. tsfa performs factor analysis of time series. poLCA performs latent class analysis for polytomous variables.
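Maximum likelihood factor analysis with factanal() can be sketched on the built-in ability.cov covariance matrix (six ability tests). The choice of two factors is an assumption for illustration.

```r
# ability.cov is a list with a covariance matrix, centers, and n.obs
fa <- factanal(factors = 2, covmat = ability.cov)

# Loadings after the default varimax rotation
print(loadings(fa), cutoff = 0.3)

# Variance of each test not explained by the common factors
fa$uniquenesses

# An oblique rotation can be requested instead
fa_promax <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")
```

Because ability.cov includes n.obs, printing fa also reports a chi-square test of whether two factors are sufficient.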
13) Modelling non-Gaussian data:
The bivpois package models bivariate Poisson distributions. mprobit provides multivariate probit models for binary and ordinal response variables. MNP implements the Bayesian multinomial probit model. polycor computes polychoric and tetrachoric correlation matrices. bayesm has several models, such as seemingly unrelated regression, multinomial logit/probit models, and instrumental variables. VGAM provides vector generalised linear and additive models and reduced-rank regression.
14) Matrix manipulations:
As a vector- and matrix-based language, R has many powerful tools for matrix manipulation, complemented by the Matrix and SparseM packages. The matrixcalc package adds functions for matrix calculus. The spam package provides further functionality for sparse matrices.
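A short sketch of sparse matrix handling with the Matrix package. The matrix entries are made up for illustration.

```r
library(Matrix)

# A 4 x 4 sparse matrix with three non-zero entries, given as (row, column, value)
m <- sparseMatrix(i = c(1, 3, 4),
                  j = c(2, 3, 1),
                  x = c(5, 2, 7),
                  dims = c(4, 4))
m

nnzero(m)         # number of stored non-zero entries
rs <- rowSums(m)  # row sums computed without densifying the matrix

# Usual matrix operations keep the sparse representation where possible
t(m) %*% m

# Convert to an ordinary dense matrix when needed
dense <- as.matrix(m)
```

Only the non-zero entries are stored, which is what makes these classes practical for very large, mostly empty matrices.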
15) Miscellaneous utilities:
The DEA package performs data envelopment analysis (DEA). The abind package combines multidimensional arrays. mApply() in the Hmisc package extends the functionality of apply(). In addition to the features described earlier, the sn package also provides marginalisation, affine transformations, and similar operations for the multivariate skew-normal and skew-t distributions. SharedHT2 performs Hotelling's T2 test for microarray data. The panel package has modelling methods for panel data. The mAr package fits vector autoregressive models, and MSBVAR fits Bayesian vector autoregressive models. The rm.boot() function in Hmisc bootstraps repeated measures models. The compositions package provides compositional data analysis.
The cramer package performs the multivariate non-parametric Cramer test for the two-sample problem. psy has many methods commonly used in psychometrics. cwhmath (part of cwhmisc) has many interesting support functions, such as various rotation functions. The desirability package provides multivariate optimization based on desirability functions. The geozoo package plots the geometric objects it defines.
Reference: http://blog.sina.com.cn/s/blog_7404f71e0102v7z8.html
ML-R: Common multivariate statistical analysis packages (continuously updated...)