R language
R is a language and environment for statistical analysis and plotting. R is free, open-source software that belongs to the GNU project and is an excellent tool for statistical computing and statistical graphics.
Features
• Mainly used for statistical analysis, plotting, and data mining
• R has many statistical and numerical analysis functions built in. Its functionality can also be extended by installing packages (user-contributed extensions).
• Because it descends from S, R has stronger object-oriented programming facilities than most other languages designed for statistical or mathematical work
Official website: http://cran.r-project.org/
Other notes
• Another strength of R is its graphics, which are of publication quality and can include mathematical symbols.
• Although R is primarily used for statistical analysis and for developing statistics-related software, it is also used for matrix computation. Its speed is comparable to that of the free software GNU Octave and the commercial software MATLAB, both of which are dedicated to matrix computing.
• SPSS: another statistical analysis package
• SAS System: another statistical analysis package
Installation
• Download R 3.1.3 from the official R website
When installing the R environment, it is best to choose an installation directory whose path contains no spaces, special characters, or Chinese characters; otherwise RStudio may fail to locate the R runtime when it is installed later.
IDE: RStudio as the development environment
After R is installed, install RStudio (https://www.rstudio.com/products/rstudio/download/), a very lightweight tool.
Configure the RStudio workspace and default encoding under Tools -> Global Options ...
Managing and understanding data
• After completing this chapter, you will understand:
– Basic R data structures and how these data structures are used to store and extract data
– How to import data from different source formats into R
– A common way to understand and visualize complex data
Data
• Vector
• The basic R data structure is the vector. A vector stores an ordered set of values, called elements.
• A vector can contain any number of elements. However, all elements must be of the same type; for example, a vector cannot contain both numbers and text.
• Common vector types: integer, numeric, character, and logical. There are also two special values: NULL (the absence of any value) and NA (a missing value).
• Vectors are created with the combine function c()
• Vectors in R have an inherent order, so their elements can be accessed by position, counting from 1
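The vector rules above can be sketched in a few lines of R. The variable names (subject_name, temperature, flu_status) are made up for illustration and do not come from any particular dataset.

```r
# Create vectors with the combine function c(); names are illustrative only.
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")  # character
temperature  <- c(98.1, 98.6, 101.4)                       # numeric
flu_status   <- c(FALSE, FALSE, TRUE)                      # logical

# Positions are counted from 1.
temperature[2]                      # the second element
temperature[2:3]                    # a range of elements
temperature[-2]                     # everything except the second element
temperature[c(TRUE, TRUE, FALSE)]   # selection with a logical vector

# All elements share one type: mixing types coerces them to a common one.
mixed <- c(1, "a", TRUE)
class(mixed)                        # "character"

# NA marks a missing value inside a vector.
with_missing <- c(1, NA, 3)
is.na(with_missing)
```

Note that R does not raise an error when types are mixed; it silently converts every element to the most general type, which is why the last vector ends up as character.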
• Factor
• A factor is a special case of a vector that is used solely to represent a nominal (categorical) attribute
• Why not simply use a character vector?
• To convert a character vector into a factor, simply apply the factor() function
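As a minimal sketch of the factor() conversion described above (the gender and blood values are invented for illustration):

```r
# Convert a character vector to a factor; levels are sorted alphabetically.
gender <- factor(c("MALE", "FEMALE", "MALE"))
levels(gender)        # "FEMALE" "MALE"
as.integer(gender)    # internal integer codes: 2 1 2

# Levels can be declared explicitly, including ones absent from the data.
blood <- factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O"))
levels(blood)
table(blood)          # level "B" is counted as 0
```

One answer to the question of why a character vector is not enough: a factor stores each category label once and records the full set of allowed categories, even those that do not occur in the data.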
• List
• A list is a special type of vector that is used to store an ordered set of values
• Lists allow different types of values to be collected
• Lists can be used to build "objects" whose components are accessed by name
• Array
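A short sketch of a list used as a simple "object"; the component names are hypothetical:

```r
# A list may hold values of different types; components here are illustrative.
subject1 <- list(
  fullname    = "John Doe",
  temperature = 98.1,
  flu_status  = FALSE,
  gender      = factor("MALE")
)

subject1$temperature                        # access a component by name
subject1[["fullname"]]                      # same idea with double brackets
subject1[2]                                 # single brackets return a sub-list
subject1[c("temperature", "flu_status")]    # several components at once
```

The difference between `[ ]` and `[[ ]]` matters in practice: the former always returns a list, the latter returns the stored value itself.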
• Data Frame
• The most important R data structure used in machine learning is the data frame. Because it has both rows and columns of data, it is similar in structure to a spreadsheet or a database table
• New parameter: stringsAsFactors = FALSE, which prevents character vectors from being converted to factors automatically
• Extracting an entire column (a vector) of data is as simple as extracting an element from a list: refer to it by name
• A data frame is two-dimensional, so data can also be extracted using the "[rows, columns]" format
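The data frame points above can be sketched as follows. The pt_data columns are invented for illustration.

```r
# Build a small data frame; stringsAsFactors = FALSE keeps names as character.
pt_data <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  temperature  = c(98.1, 98.6, 101.4),
  flu_status   = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

pt_data$subject_name                      # one column by name, as with a list
pt_data[c("temperature", "flu_status")]   # several columns by name

pt_data[1, 2]                  # row 1, column 2
pt_data[c(1, 3), c(2, 3)]      # rows 1 and 3, columns 2 and 3
pt_data[, 1]                   # all rows of column 1
pt_data[1, ]                   # all columns of row 1
```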
• Matrix
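The notes only name the matrix and array structures, so here is a brief sketch under the assumption that the usual base R constructors matrix() and array() are what is meant:

```r
# A matrix is a two-dimensional structure of one type, filled column by column.
m <- matrix(1:6, nrow = 2)
m[1, 2]          # row 1, column 2
m[, 3]           # the whole third column
t(m)             # transpose
mm <- m %*% t(m) # matrix multiplication, giving a 2 x 2 result

# An array generalizes the matrix to more than two dimensions.
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 4]       # the last element
```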
Explore Data
str()
summary()
table ()
plot ()
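A minimal sketch of the four exploration functions, using the built-in iris dataset as a stand-in because these notes do not include a dataset of their own:

```r
# iris ships with R (package datasets); it is used here only as example data.
str(iris)                                  # structure: types and first values

sepal_summary <- summary(iris$Sepal.Length)  # min, quartiles, mean, max
sepal_summary

species_counts <- table(iris$Species)      # frequency of each category
species_counts

# Scatter plot of two numeric columns. In an interactive session this opens
# a graphics window; in a script it is written to the default device.
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length", ylab = "Petal length")
```

Roughly speaking, str() answers "what is in here", summary() describes numeric columns, table() describes categorical columns, and plot() shows relationships between columns.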
Data management
• Save and load R data structures
– save(x, y, z, file = "MyData.RData")
– load("MyData.RData")
• Import and save data with CSV files
– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE)
– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE, header = FALSE)
– write.csv(pt_data, file = "pt_data2.csv")
• Import data from SQL database
– RODBC package
# Connect R to a MySQL database through ODBC
library(RODBC)
mydb <- odbcConnect("localhost", uid = "root", pwd = "123123")
sqlTables(mydb)
students <- sqlQuery(mydb, "select * from Student")
students[1, 2]
temp <- sqlFetch(mydb, "Student", rownames = "id")
odbcClose(mydb)
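The save/load and CSV steps above can be tried end to end with temporary files. This is only a sketch: the pt_data contents are invented, tempfile() is used so that nothing is written to the working directory, and row.names = FALSE is an added choice that is not part of the notes. The SQL step is left out because it needs a live database.

```r
# Example data (illustrative values only).
pt_data <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  temperature  = c(98.1, 98.6, 101.4),
  stringsAsFactors = FALSE
)

rdata_file <- tempfile(fileext = ".RData")
csv_file   <- tempfile(fileext = ".csv")

# Save the object in R's own format, remove it, then restore it by name.
save(pt_data, file = rdata_file)
rm(pt_data)
load(rdata_file)          # recreates pt_data in the workspace

# Round trip through CSV. Without row.names = FALSE, write.csv() would add
# the row names as an extra first column.
write.csv(pt_data, file = csv_file, row.names = FALSE)
pt_copy <- read.csv(csv_file, stringsAsFactors = FALSE)
pt_copy
```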
Here are a few examples of interactive use of R
Example one:
> help.start()                  # start the online help; a browser window opens
> x <- rnorm(50); y <- rnorm(x) # generate two random vectors x and y
> plot(x, y)                    # two-dimensional scatter plot of x against y; a graphics window opens
> ls()                          # list the R objects in the current workspace
> rm(x, y)                      # remove the objects x and y
Example two:
x <- 1:20                                      # equivalent to x = (1, 2, ..., 20)
w <- 1 + sqrt(x)/2                             # a 'weight' vector of standard deviations
dummy <- data.frame(x = x, y = x + rnorm(x)*w) # create a two-column data frame of x and y
dummy                                          # view the data in dummy
fm <- lm(y ~ x, data = dummy)                  # fit a simple linear regression of y on x
summary(fm)                                    # view the results
fm1 <- lm(y ~ x, data = dummy, weights = 1/w^2) # weighted regression
summary(fm1)                                   # view the results
attach(dummy)                                  # make the data frame columns usable like ordinary variables
lrf <- lowess(x, y)                            # nonparametric local regression
plot(x, y)                                     # standard scatter plot
lines(x, lrf$y)                                # add the local regression curve
abline(0, 1, lty = 3)                          # the true regression line (intercept 0, slope 1)
abline(coef(fm))                               # unweighted regression line
abline(coef(fm1), col = "red")                 # weighted regression line
detach()                                       # remove the data frame from the search path
# A standard regression diagnostic plot to check for heteroscedasticity
plot(fitted(fm), resid(fm), xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs Fitted")
# A normal scores plot to check for skewness, kurtosis, and outliers
qqnorm(resid(fm), main = "Residuals Rankit Plot")
rm(fm, fm1, lrf, x, dummy)                     # clean up again
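To see what the weighted fit in Example two is doing, the sketch below checks one standard identity: weighted least squares with weights 1/w^2 gives the same coefficients as ordinary least squares after every term, including the intercept, is divided by w. The seed and the names fit_w and fit_s are assumptions added here for reproducibility and are not part of the original example.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x <- 1:20
w <- 1 + sqrt(x) / 2              # standard deviations, as in Example two
y <- x + rnorm(20) * w

# Weighted fit: minimizes sum((y - a - b*x)^2 / w^2).
fit_w <- lm(y ~ x, weights = 1 / w^2)

# Equivalent unweighted fit on rescaled variables:
# y/w = a * (1/w) + b * (x/w) + error with constant variance.
fit_s <- lm(I(y / w) ~ 0 + I(1 / w) + I(x / w))

coef(fit_w)
coef(fit_s)
same <- all.equal(unname(coef(fit_w)), unname(coef(fit_s)))
same
```

Dividing by w turns noise whose standard deviation grows with x into noise with constant standard deviation, which is the situation ordinary least squares assumes.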
Example three: the classic Michelson and Morley experiment measuring the speed of light
filepath <- system.file("data", "morley.tab", package = "datasets") # get the file path of the morley experimental data
filepath                             # view the file path
file.show(filepath)                  # view the file contents
mm <- read.table(filepath)           # read the data as a data frame
mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)             # change Expt and Run into factors
attach(mm)                           # make the data visible at position 2 (the default), so columns can be accessed directly
plot(Expt, Speed, main = "Speed of Light Data", xlab = "Experiment No.") # compare the five experiments with simple boxplots
fm <- aov(Speed ~ Run + Expt, data = mm) # analyze as a randomized block design, with 'runs' and 'experiments' as factors
summary(fm)
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)                       # fit the sub-model omitting 'runs' and compare the two models by analysis of variance
detach()
rm(fm, fm0)                          # clean up before moving on

# The following demonstrates contour and image plots
x <- seq(-pi, pi, len = 50)          # x is a vector of 50 equally spaced values in the interval [-pi, pi]
y <- x
f <- outer(x, y, function(x, y) cos(y)/(1 + x^2)) # f is a square matrix whose rows and columns are indexed by x and y; its values are cos(y)/(1 + x^2)
oldpar <- par(no.readonly = TRUE)
par(pty = "s")                       # save the graphics parameters and set the plotting region to "square"
contour(x, y, f)
contour(x, y, f, nlevels = 15, add = TRUE) # draw the contours of f, then add more lines to show detail
fa <- (f - t(f))/2                   # fa is the "asymmetric part" of f (t() is the transpose function)
contour(x, y, fa, nlevels = 15)      # draw the contours
par(oldpar)                          # restore the original graphics parameters
image(x, y, f)
image(x, y, fa)                      # draw some high-density image plots
objects(); rm(x, y, f, fa)           # clean up before continuing

th <- seq(-pi, pi, len = 100)
z <- exp(1i*th)                      # 1i denotes the complex number i
par(pty = "s")
plot(z, type = "l")                  # with a complex argument, the imaginary part is plotted against the real part; this should be a circle
# Suppose we want to sample points at random inside this circle. One method
# is to take complex numbers whose real and imaginary parts are standard normal ...
w <- rnorm(100) + rnorm(100)*1i
w <- ifelse(Mod(w) > 1, 1/w, w)      # ... and map any points outside the circle onto their reciprocals
plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
lines(z)                             # all points are inside the circle, but the distribution is not uniform
# The second method uses the uniform distribution; the points now look more evenly spread over the disc
w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
lines(z)
rm(th, w, z)                         # clean up again
q()                                  # quit R
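The claim at the end of Example three, that the reciprocal mapping is not uniform while the sqrt(runif) construction is, can be checked numerically. For points uniform on the unit disc, the fraction within radius r of the centre should be about r^2, since area grows with the square of the radius. The seed, the sample size of 10000, and the radius 0.8 below are arbitrary choices made for this sketch.

```r
set.seed(42)      # assumed seed, only for reproducibility
n <- 10000        # larger than in the example so the fractions are stable

# Method 1: standard normal real and imaginary parts, outside points inverted.
w_norm <- rnorm(n) + rnorm(n) * 1i
w_norm <- ifelse(Mod(w_norm) > 1, 1 / w_norm, w_norm)

# Method 2: radius sqrt(U) and a uniform angle.
w_unif <- sqrt(runif(n)) * exp(2 * pi * runif(n) * 1i)

r <- 0.8
frac_norm <- mean(Mod(w_norm) <= r)   # fraction of points within radius r
frac_unif <- mean(Mod(w_unif) <= r)

c(expected = r^2, reciprocal = frac_norm, uniform = frac_unif)
```

The square root is what makes the second method uniform: if U is uniform on [0, 1], then the probability that sqrt(U) is at most r equals r^2, which matches the area fraction.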
Examples reproduced from: http://developer.51cto.com/art/201305/393121.htm
Machine learning 1, R language