R language learning routes and common data mining packages

Source: Internet
Author: User

For beginners in R, the most common approach goes like this: hit something you don't understand, run to a forum and shout for help, leave happy or disappointed, and come back when the next question turns up. This is not the best way to learn; the best way is to read books. There are now many books about R on the market, in both Chinese and English. So which of them should a novice start with, and how do you work toward mastery of a particular area once you have the basics? I believe many people have these questions. If you are one of them, you are in luck: based on my own experience, I will summarize a learning roadmap of R books so that R users can take fewer detours.

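This article is divided into the following parts: beginner's introduction, advanced introduction, plotting and visualization, econometrics, time series analysis, finance, and data mining, followed by some notes and a list of commonly used data mining packages.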

1. Beginner's introduction

Start with "R in Action" (the Chinese edition, "R语言实战", was translated by Ko Tao, Xiaonan and others). It is a comprehensive introduction covering the basics of R, graphics, statistics, regression, analysis of variance, power analysis, generalized linear models, principal components, factor analysis, and missing value handling. In addition, you can read Liu Sizhe's "153 minutes to learn R", which collects the 153 questions most frequently asked by R beginners. Why 153 minutes? Because the author wrote up 153 questions, and at one minute per question that comes to 153 minutes in total.

2. Advanced introduction

After reading the books above, you can move on to the advanced introductory stage. There are two classic books to read at this point: "Statistics with R" and "The R book". They count as advanced because they are no longer limited to the basics of R; instead, they are organized around the common methods of data analysis. Compared with the introductory books, they cover linear regression, analysis of variance, multivariate statistics, R graphics, time series analysis, data mining, and more. After reading them you will find that R can do a great deal, and do it concisely. Once you have read this far you are mostly there; what remains is whatever you want to specialize in. A general overview of those areas follows.

3. Drawing and Visualization

Aristotle said that, of all the senses, humans prefer sight. As a result, plotting and visualization attract a great deal of attention. So how do you learn R plotting and data visualization? How do you draw a histogram? How do you add a density curve to a histogram? After reading the books below, you should have a good general idea.

To get started with plotting, read "R Graphics". I personally consider it a classic and a comprehensive introduction to R's graphics system. The book has a companion website, which you can find by searching Google. For more depth, read "Lattice: Multivariate Data Visualization with R". Both of these cover the more common systems. There is also the more elegant ggplot2 system; see "ggplot2: Elegant Graphics for Data Analysis". On the data mining side there is "Data Mining with Rattle and R", which mainly uses the Rattle software. Rattle is my personal favorite, although it is not necessarily the best; RWeka is also great. Then there are books on interactive graphics. The best-known interactive system is GGobi, which I have been fond of for more than two years. The book on GGobi is "Interactive and Dynamic Graphics for Data Analysis: With R and GGobi"; it is suitable for getting started, but for more comprehensive material go to the GGobi homepage, which has all kinds of documentation and package updates.
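
The histogram question raised at the start of this section can be sketched in base R as follows. This is only a minimal illustration, not taken from any of the books above; the simulated data, the seed, and the number of breaks are assumptions chosen for the example.

```r
# Simulated data; the values and the seed are illustrative assumptions.
set.seed(42)
x <- rnorm(500, mean = 10, sd = 2)

# freq = FALSE puts the y-axis on the density scale,
# so the histogram and the curve are directly comparable.
h <- hist(x, freq = FALSE, breaks = 20,
          main = "Histogram with density curve", xlab = "x")

# Kernel density estimate drawn on top of the histogram.
d <- density(x)
lines(d, lwd = 2)
```

The same idea carries over to lattice and ggplot2, each with its own functions.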

4. Econometrics

For econometrics, I first recommend a very thin booklet for beginners: "Econometrics in R". After that comes "Applied Econometrics with R". The R package that accompanies the book is AER, which you can use once it is installed, and it works very well. A large part of econometrics is about time series analysis; that material is covered in the next section.

5. Time series analysis

Books on time series fall into two categories. The first is general-purpose books, typified by "Time Series Analysis and Its Applications: With R Examples". It introduces the classical methods of time series analysis together with the R code that implements them, and a Chinese edition is available. If you do not want to buy it, you can download it from the author's homepage; the English version is actually quite easy to read.

A large part of time series analysis concerns financial time series. In this area there is "Analysis of Financial Time Series" in two editions: the first uses S-Plus code, while the new edition is already based on R code. This book suits readers who already have a grounding in both time series analysis and finance, because it does not explain the theory of time series analysis or the financial background especially clearly, and the part on computing VaR with extreme value theory is rather hard to follow. Another interesting title is "TimeSeriesFAQ" from Rmetrics, an introduction to financial time series that covers very basic material but is still difficult to understand. There is a corresponding Chinese version, "Financial time series analysis frequently asked questions set", although it has not yet been published.

In economics there is a special case of time series called cointegration, a theory that many people care about. If you are one of them, see "Analysis of Integrated and Cointegrated Time Series with R". Finally, a more advanced book covers wavelet analysis: "Wavelet Methods in Statistics with R". One more note: books on time series clustering are still rare. It is largely unexplored territory, open to anyone ambitious enough to cultivate it.

6. Finance

Finance is a broad field, and in the wider sense it also includes insurance. Doing finance with R depends mostly on financial knowledge; the data analysis technique itself plays only a small part. I think these books are useful for people who understand both finance and data analysis techniques. Those who know only data analysis and not finance will probably find them baffling, and some may even conclude that financial analysis is rather unsophisticated.

Some of the more classic books in this area are "Advanced Topics in Analysis of Economic and Financial Data Using R" and "Modelling Financial Time Series with S-PLUS". Pricing financial products often involves stochastic differential equations; "Simulation and Inference for Stochastic Differential Equations: With R Examples" covers this material in detail, with examples. Next come risk measurement and management. The classics here are "Simulation Techniques in Financial Risk Management", "Modern Actuarial Risk Theory Using R", and "Quantitative Risk Management: Concepts, Techniques and Tools". For portfolio analysis and option pricing, see "Portfolio Optimization with R" and "Option Pricing and Estimation of Financial Models with R" respectively.

7. Data mining

There are now quite a few books on this topic; see the ones recommended in the article "R language classic books recommended".

8. Notes

Many of these books are already available in electronic form. You can look for them in various groups, or on sites such as Sina iAsk and CSDN.

A collection of R packages and functions related to, or helpful for, data mining. R is case-sensitive, so package and function names are given exactly as they must be typed.

1. Clustering
Commonly used packages: fpc, cluster, pvclust, mclust
Partitioning-based methods: kmeans, pam, pamk, clara
Hierarchical methods: hclust, pvclust, agnes, diana
Model-based methods: Mclust
Density-based methods: dbscan
Plotting: plotcluster, plot.hclust
Validation: cluster.stats
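
As a rough illustration of the partitioning and hierarchical entries above, the sketch below uses only functions that ship with R (kmeans, hclust, cutree). The choice of the built-in iris data, three clusters, and average linkage are assumptions made for the example.

```r
# Use the four numeric columns of the built-in iris data set, standardized.
features <- scale(iris[, 1:4])

# Partitioning: k-means with 3 clusters (k = 3 is an assumption for this data).
set.seed(1)
km <- kmeans(features, centers = 3, nstart = 10)

# Hierarchical: agglomerative clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(features), method = "average")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate each cluster assignment against the known species.
print(table(kmeans = km$cluster, species = iris$Species))
print(table(hclust = hc_groups, species = iris$Species))
```

Functions such as pam, agnes, and cluster.stats follow a similar pattern but require the cluster or fpc package to be installed first.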

2. Classification
Commonly used packages: rpart, party, randomForest, rpartOrdinal, tree, marginTree, maptree, survival
Decision trees: rpart, ctree
Random forests: cforest, randomForest
Regression, logistic regression, Poisson regression: glm, predict, residuals
Survival analysis: survfit, survdiff, coxph
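
The glm, predict, and residuals entries can be combined as in the following sketch of a logistic regression. The built-in mtcars data, the single predictor, and the 0.5 cutoff are illustrative assumptions.

```r
# Logistic regression: transmission type (am: 0 = automatic, 1 = manual) versus weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities for the training rows;
# type = "response" returns values on the probability scale.
prob <- predict(fit, type = "response")

# Classify with a 0.5 cutoff (the cutoff is an illustrative choice).
pred <- ifelse(prob > 0.5, 1, 0)
print(table(predicted = pred, actual = mtcars$am))

# Deviance residuals of the fitted model.
res <- residuals(fit, type = "deviance")
```

Switching to family = poisson gives Poisson regression with the same three functions.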

3. Association rules and frequent itemsets
Commonly used packages:
arules: supports mining frequent itemsets, maximal frequent itemsets, closed frequent itemsets, and association rules
drm: regression and association models for repeated categorical data
Apriori algorithm, a breadth-first algorithm: apriori, drm
Eclat algorithm, which uses equivalence classes, depth-first search, and set intersection: eclat

4. Sequential patterns
Commonly used packages: arulesSequences
SPADE algorithm: cspade

5. Time series
Commonly used packages: timsac
Time series construction function: ts
Component decomposition: decomp, decompose, stl, tsr
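
The ts, decompose, and stl entries above all ship with R and can be tried as in the sketch below. The series is synthetic (trend plus seasonality plus noise), and the start date, seed, and parameters are assumptions for illustration.

```r
# Build a monthly series with ts(); the data are synthetic.
set.seed(7)
n <- 72
values <- 100 + 0.5 * seq_len(n) +
  10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 2)
y <- ts(values, start = c(2010, 1), frequency = 12)

# Classical decomposition by moving averages.
dec <- decompose(y, type = "additive")

# Loess-based decomposition; s.window = "periodic" assumes a stable seasonal pattern.
fit <- stl(y, s.window = "periodic")

# Each plot shows the observed series with its trend, seasonal, and remainder parts.
plot(dec)
plot(fit)
```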

6. Statistics
Commonly used packages: base R, nlme
Analysis of variance: aov, anova
Density estimation: density
Hypothesis tests: t.test, prop.test, anova, aov
Linear mixed models: lme
Principal component analysis and factor analysis: princomp
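
Most of the functions in this group are available without installing anything. The following sketch calls several of them on built-in data sets; the counts passed to prop.test are made up for illustration.

```r
# Two-sample t-test: does fuel economy differ by transmission type?
tt <- t.test(mpg ~ am, data = mtcars)

# One-way analysis of variance: sepal length across the three iris species.
fit <- aov(Sepal.Length ~ Species, data = iris)
print(summary(fit))

# Test of equal proportions; these counts are invented for the example.
pt <- prop.test(x = c(45, 30), n = c(100, 100))

# Principal component analysis on the correlation matrix of the iris measurements.
pca <- princomp(iris[, 1:4], cor = TRUE)
print(summary(pca))

# Kernel density estimate of a single variable.
dens <- density(iris$Sepal.Length)
```

Linear mixed models with lme need the nlme package to be loaded first.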

7. Charts
Bar chart: barplot
Pie chart: pie
Dot chart: dotchart
Histogram: hist
Density plot: densityplot
Candlestick chart, box plot: boxplot
QQ (quantile-quantile) plot: qqnorm, qqplot, qqline
Bivariate plot: coplot
Tree: rpart
Parallel coordinates: parallel, paracoor, parcoord
Heat map, contour: contour, filled.contour
Other plots: stripplot, sunflowerplot, interaction.plot, matplot, fourfoldplot, assocplot, mosaicplot
Saved chart formats: pdf, postscript, win.metafile, jpeg, bmp, png
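
To show how the plotting functions and the output devices fit together, here is a small sketch that writes three charts to a PDF file. The temporary file path and the choice of the mtcars data are assumptions; replace them as needed.

```r
# Write to a temporary file; the path is an assumption, change it as needed.
out_file <- tempfile(fileext = ".pdf")

# Open the PDF device; each plot below becomes one page.
pdf(out_file, width = 7, height = 5)
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
qqnorm(mtcars$mpg)   # normal QQ plot
qqline(mtcars$mpg)   # reference line added to the QQ plot
barplot(table(mtcars$gear), xlab = "Gears", ylab = "Count")

# Close the device so the file is actually written.
invisible(dev.off())
```

The other devices (png, jpeg, bmp, postscript) are used the same way: open the device, draw, then call dev.off().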

8. Data manipulation
Missing values: na.omit
Variable standardization: scale
Transpose: t
Sampling: sample
Stacking: stack, unstack
Others: aggregate, merge, reshape
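
A short sketch of how several of these functions chain together on the built-in airquality data follows. The lookup table month_names is a hypothetical helper introduced only for the merge step.

```r
# airquality ships with R and contains missing values; drop incomplete rows.
clean <- na.omit(airquality)

# Standardize two columns to mean 0 and standard deviation 1.
z <- scale(clean[, c("Ozone", "Temp")])

# Mean ozone per month.
monthly <- aggregate(Ozone ~ Month, data = clean, FUN = mean)

# Join against a small lookup table; month_names is a hypothetical helper table.
month_names <- data.frame(Month = 5:9, Name = month.name[5:9])
monthly <- merge(monthly, month_names, by = "Month")

# Random sample of 10 rows without replacement.
set.seed(3)
picked <- clean[sample(nrow(clean), 10), ]

# Transpose the numeric part of the summary table.
t_monthly <- t(monthly[, c("Month", "Ozone")])
```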

9. Interface with the data mining software Weka
RWeka: through this interface, all Weka algorithms can be used in R.

Transferred from: http://mp.weixin.qq.com/s?__biz=MjM5MTI3MzUwMA==&mid=202413215&idx=3&sn= E1d27235246e31edcac501ba47ffbbef&3rd=mza3mdu4ntyzmw==&scene=6#rd
