I found a very good blog today (http://www.RDataMining.com). Its author is devoted to studying how the R language is applied to data mining. I had recently been wanting to learn R and the whole data mining process systematically, and after reading the blog I could not calm down for a long time. So I have decided that, starting today, whenever I finish washing the dishes before 11 pm, I will spend one hour studying the blog and write down what I learn along the way, and while I am at it, narrow the gap to English level four as much as I can.
The collection of R packages and functions available for data mining is listed below. Some of them are not specifically developed for data mining, but these packages can help us a lot in the process of data mining, so they are included.
1. Clustering
Commonly used packages: fpc, cluster, pvclust, mclust
Partitioning-based approach: kmeans, pam, pamk, clara
Hierarchy-based approach: hclust, pvclust, agnes, diana
Model-based approach: Mclust
Density-based approach: dbscan
Plotting cluster solutions: plotcluster, plot.hclust
Cluster validation: cluster.stats
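To make the partitioning and hierarchical approaches concrete, here is a minimal sketch that uses only functions shipped with base R (kmeans, hclust, dist, cutree). The iris dataset, the choice of three clusters and the "average" linkage are my own assumptions for illustration, not something the blog prescribes.

```r
# Clustering sketch on the built-in iris data (an assumed example dataset).
data(iris)
x <- iris[, 1:4]                      # keep the four numeric measurements

# Partitioning-based: k-means with 3 centers (k = 3 is an assumption)
set.seed(42)                          # kmeans starts from random centers
km <- kmeans(x, centers = 3, nstart = 10)
print(table(km$cluster, iris$Species))  # compare clusters with known species

# Hierarchy-based: agglomerative clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")
groups <- cutree(hc, k = 3)           # cut the tree into 3 groups
print(table(groups))
```

Calling plot(hc) afterwards draws the dendrogram through plot.hclust.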
2. Classification
Commonly used packages:
rpart, party, randomForest, rpartOrdinal, tree, marginTree,
maptree, survival
Decision tree: rpart, ctree
Random forest: cforest, randomForest
Regression, logistic regression, Poisson regression: glm, predict, residuals
Survival analysis: survfit, survdiff, coxph
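As a small illustration of the glm, predict and residuals trio listed above, the following sketch fits a logistic regression with base R only. The mtcars dataset and the model am ~ wt are assumptions chosen for the example.

```r
# Logistic regression sketch with base R (dataset and formula are assumptions).
data(mtcars)

# am is 0/1 (automatic/manual), wt is the car weight in 1000 lbs
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit))

# Predicted probability of a manual gearbox for two hypothetical weights
new_cars <- data.frame(wt = c(2.5, 3.5))
prob <- predict(fit, newdata = new_cars, type = "response")
print(prob)

# Deviance residuals, one per observation
res <- residuals(fit, type = "deviance")
print(summary(res))
```

Changing family to poisson gives a Poisson regression with the same three calls.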
3. Association rules and frequent itemsets
Commonly used packages:
arules: supports mining frequent itemsets, maximal frequent itemsets, closed frequent itemsets, and association rules
drm: regression and association models for repeated categorical data
Apriori algorithm, a breadth-first algorithm: apriori, drm
Eclat algorithm, which uses equivalence classes, depth-first search and set intersection: eclat
4. Sequential patterns
5. Time series
Commonly used packages: timsac
Time series construction function: ts
Component decomposition: decomp, decompose, stl, tsr
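Here is a short sketch of building a series with ts and splitting it into trend, seasonal and remainder parts with decompose and stl, all from base R. Using the built-in AirPassengers values, the multiplicative model and the log transform are my own assumptions for the example.

```r
# Time series sketch with base R (AirPassengers is an assumed example).
# Rebuild the series from a plain numeric vector to show how ts() is called.
values <- as.numeric(AirPassengers)
ap <- ts(values, start = c(1949, 1), frequency = 12)  # monthly data from Jan 1949

# Classical decomposition with moving averages
dec <- decompose(ap, type = "multiplicative")
print(round(dec$figure, 3))          # the 12 estimated seasonal factors

# Loess-based decomposition; the log makes the seasonal effect additive
fit <- stl(log(ap), s.window = "periodic")
print(head(fit$time.series))         # columns: seasonal, trend, remainder
```

plot(dec) and plot(fit) show the components as stacked panels.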
6. Statistics
Commonly used packages: base R, nlme
Analysis of variance: aov, anova
Density analysis: density
Hypothesis tests: t.test, prop.test, anova, aov
Linear mixed model: lme
Principal component analysis and factor analysis: princomp
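A few of the statistics functions above can be tried in a handful of lines. This is only a sketch using base R; the iris dataset and the particular comparisons are assumptions for illustration.

```r
# Statistics sketch with base R (iris is an assumed example dataset).
data(iris)

# Hypothesis test: do two species differ in mean sepal length?
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
virginica  <- iris$Sepal.Length[iris$Species == "virginica"]
tt <- t.test(versicolor, virginica)   # Welch two-sample t-test by default
print(tt$p.value)

# Analysis of variance across all three species
fit <- aov(Sepal.Length ~ Species, data = iris)
print(summary(fit))

# Kernel density estimate of one variable
d <- density(iris$Sepal.Length)

# Principal component analysis on the correlation matrix
pca <- princomp(iris[, 1:4], cor = TRUE)
print(summary(pca))
```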
7. Graphics
Bar chart: barplot
Pie chart: pie
Dot chart: dotchart
Histogram: hist
Density plot: densityplot
Candlestick chart, box plot: boxplot
QQ (quantile-quantile) plot: qqnorm, qqplot, qqline
Bivariate plot: coplot
Tree: rpart
Parallel coordinates: parallel, paracoor, parcoord
Heat map, contour: contour, filled.contour
Other plots: stripplot, sunflowerplot, interaction.plot, matplot, fourfoldplot,
assocplot, mosaicplot
Saving plots to files: pdf, postscript, win.metafile, jpeg, bmp, png
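The device functions in the last line all follow the same pattern: open the device, draw, then close it. Below is a minimal sketch with pdf, hist and boxplot. The output path under tempdir() and the iris dataset are assumptions made for the example.

```r
# Saving plots to a file (file name and dataset are assumptions).
data(iris)
out <- file.path(tempdir(), "iris_plots.pdf")

pdf(out, width = 7, height = 5)       # open the PDF device; size is in inches
hist(iris$Sepal.Length,
     main = "Sepal length", xlab = "cm")            # page 1: histogram
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal length by species")           # page 2: box plot
invisible(dev.off())                  # close the device so the file is written

print(file.exists(out))
```

Replacing pdf with png or jpeg works the same way, except that their width and height are given in pixels by default.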
8. Data manipulation
Missing values: na.omit
Variable standardization: scale
Transpose: t
Sampling: sample
Stacking: stack, unstack
Others: aggregate, merge, reshape
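These helpers are easiest to remember when chained on a tiny table. The sketch below uses base R only; the data frames df and labels are made up for illustration.

```r
# Data manipulation sketch with base R (all data here is made up).
df <- data.frame(
  id    = 1:6,
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(1, NA, 3, 5, 2, 4)
)

clean <- na.omit(df)                          # drop the row containing NA
clean$z <- as.numeric(scale(clean$value))     # center and scale to unit sd

# Mean value per group
means <- aggregate(value ~ group, data = clean, FUN = mean)

# Join the group means with a lookup table on the shared "group" column
labels <- data.frame(group = c("a", "b", "c"),
                     label = c("alpha", "beta", "gamma"))
merged <- merge(means, labels, by = "group")
print(merged)

# Draw 3 rows at random without replacement
set.seed(1)
picked <- clean[sample(nrow(clean), 3), ]
```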
9. Interface to the data mining software Weka