R language

R is a language and environment for statistical analysis and graphics. It is free, open-source software that is part of the GNU project, and it is an excellent tool for statistical computing and statistical graphics.

**Feature Introduction**

• Mainly used for statistical analysis, plotting, and data mining

• R has many statistical and numerical analysis functions built in. Its functionality can also be extended by installing packages (user-contributed add-ons).

• Because it descends from the S language, R has stronger object-oriented programming facilities than most other languages designed for statistical or mathematical work

Official website: http://cran.r-project.org/

**Further notes**

• Another strength of R is its graphics: it can produce publication-quality plots, including mathematical symbols.

• Although R is primarily used for statistical analysis and for developing statistics-related software, it is also used for matrix computation. Its speed is comparable to that of GNU Octave (free software) and MATLAB (commercial software), both of which are dedicated to matrix computing.

• SPSS: another statistical analysis package

• SAS System: another statistical analysis package

**Installation**

• Download R 3.1.3 from the official website

When installing R, choose an installation directory whose path contains no spaces, special characters, or Chinese characters. Otherwise RStudio may fail to locate the R installation when it is installed afterwards.

• Use RStudio as the IDE

Once R is installed, install RStudio, a very lightweight tool: https://www.rstudio.com/products/rstudio/download/

Configure the RStudio workspace and default encoding under Tools -> Global Options ...

**Management and understanding of data**

• After completing this chapter, you will understand:

– Basic R data structures and how these data structures are used to store and extract data

– How to import data from different source formats into R

– A common way to understand and visualize complex data

**Data**

• Vector

• The vector is R's basic data structure. A vector stores an ordered set of values, called elements.

• A vector can contain any number of elements. However, all elements must be of the same type; for example, a vector cannot contain both numbers and text.

• Common vector types are integer, numeric, character, and logical. There are also two special values: NULL (no value at all) and NA (a missing value).

• Vectors are created with the combine function, c()

• Vectors in R have an inherent order, so their data can be accessed by the position of each element in the vector, counting from 1
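The points above can be illustrated with a minimal sketch. The variable names (`subject_name`, `temperature`, `flu_status`) and their values are hypothetical, chosen only for illustration:

```r
# Hypothetical example data for three patients
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")  # character vector
temperature  <- c(98.1, 98.6, 101.4)                       # numeric vector
flu_status   <- c(FALSE, FALSE, TRUE)                      # logical vector

temperature[2]           # second element (indexing starts at 1)
temperature[2:3]         # a range of elements
temperature[-2]          # every element except the second
temperature[flu_status]  # elements where flu_status is TRUE

# All elements share one type: c() coerces mixed input to a common type
mixed <- c(1, "a")
class(mixed)             # "character"
```

Note that R does not raise an error when types are mixed in `c()`; it coerces the values, which is why the number above ends up stored as text.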

• Factor

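• A factor is a special case of a vector that is used solely to represent nominal (categorical) attributes

• Why not just use a character vector? A factor stores each category label only once and represents the values internally as integer codes, and R's modeling functions recognize factors as categorical data.

• To convert a character vector into a factor, simply apply the factor() function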
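A short sketch of the conversion, assuming hypothetical `gender_chr` and `blood` data invented for this example:

```r
# Hypothetical nominal data stored as text
gender_chr <- c("MALE", "FEMALE", "MALE")

# Convert the character vector into a factor
gender <- factor(gender_chr)
levels(gender)      # the distinct categories, sorted alphabetically
as.integer(gender)  # the integer codes stored internally

# Levels can be declared explicitly, including categories not yet observed
blood <- factor(c("O", "AB", "A"), levels = c("A", "B", "AB", "O"))
table(blood)        # counts per level; "B" appears with a count of 0
```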

• List

• A list is a special type of vector that is used to store an ordered set of values

• Unlike ordinary vectors, lists allow values of different types to be collected together

• Lists can be used to build "objects" whose components are accessed by name
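As a rough illustration, a list can bundle values of different types into a single record. The name `subject1` and its components are hypothetical:

```r
# A hypothetical patient record holding values of different types
subject1 <- list(
  fullname    = "John Doe",
  temperature = 98.1,
  flu_status  = FALSE
)

subject1$temperature     # access a component by name
subject1[["fullname"]]   # equivalent access with double brackets
subject1[2]              # single brackets return a sub-list, not the value
```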

• Array

• Data Frame

• The most important R data structure used in machine learning is the data frame. Because it has both rows and columns of data, it is similar in structure to a spreadsheet or a database table.

• Note the extra parameter stringsAsFactors = FALSE when creating a data frame; it keeps character vectors from being converted to factors

• Extracting an entire column (a vector) is as simple as extracting a list element: refer to it by name

• Because a data frame is two-dimensional, data can also be extracted with the [rows, columns] format
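The following sketch builds a small data frame and extracts data from it in both ways. The column names and values are hypothetical. Since R 4.0.0 `stringsAsFactors` already defaults to `FALSE`, so the argument mainly matters on older versions such as the 3.1.3 release mentioned earlier:

```r
# Hypothetical patient data; each column is a vector of the same length
pt_data <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  temperature  = c(98.1, 98.6, 101.4),
  flu_status   = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character vectors
)

pt_data$subject_name                       # extract one column by name
pt_data[1, 2]                              # row 1, column 2
pt_data[, c("temperature", "flu_status")]  # all rows, two named columns
pt_data[c(1, 3), ]                         # rows 1 and 3, all columns
```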

• Matrix
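Matrices and arrays are only named above, so here is a brief sketch of how they might be created and indexed. The values are arbitrary:

```r
# A matrix is a two-dimensional structure holding a single data type.
# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
dim(m)     # 2 rows, 3 columns
m[1, 2]    # row 1, column 2
m[, 1]     # the whole first column

# An array generalizes the matrix to more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 4] # the last element
```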

**Explore Data**

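• str(): displays the structure of an object, such as the column types of a data frame

• summary(): reports summary statistics for each variable

• table(): counts how often each category occurs

• plot(): draws a plot, for example a scatter plot of two numeric variables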
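A quick way to try these four functions is on a dataset that ships with R. The sketch below uses the built-in `mtcars` data frame, which is an assumption made for illustration and is not part of the original material:

```r
# mtcars is a built-in data frame of 32 cars
str(mtcars)                  # structure: number of rows, column names and types
summary(mtcars$mpg)          # minimum, quartiles, mean and maximum of one column

cyl_counts <- table(mtcars$cyl)  # how many cars have 4, 6 or 8 cylinders
cyl_counts

# Scatter plot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```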

**Data management**

• Save and load R data structures

– save(x, y, z, file = "mydata.RData")

– load("mydata.RData")
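A self-contained sketch of the save/load round trip. The objects `x`, `y`, `z` and the use of a temporary directory are assumptions made so the example can run anywhere:

```r
# Hypothetical objects to persist
x <- 1:3
y <- "a"
z <- list(flag = TRUE)

# Write to a temporary location rather than the working directory
rdata_file <- file.path(tempdir(), "mydata.RData")
save(x, y, z, file = rdata_file)

rm(x, y, z)       # remove the objects from the workspace
load(rdata_file)  # restore them under their original names
ls()              # x, y and z are back
```

Note that `load()` recreates the objects under the names they had when saved, so it can overwrite existing objects that share those names.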

• Import and save data with CSV file

– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE)

– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE, header = FALSE)

– write.csv(pt_data, file = "pt_data2.csv")
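The CSV calls above can be tried without an existing file by writing one first. In this sketch the data frame `pt_out` and the temporary file path are hypothetical:

```r
# Hypothetical data to write out
pt_out <- data.frame(
  name        = c("John Doe", "Jane Doe"),
  temperature = c(98.1, 101.4),
  stringsAsFactors = FALSE
)

csv_file <- file.path(tempdir(), "pt_data.csv")
write.csv(pt_out, file = csv_file, row.names = FALSE)  # omit the row-name column

# Default: the first line of the file is treated as the header
pt_in <- read.csv(csv_file, stringsAsFactors = FALSE)
str(pt_in)

# header = FALSE: the header line is read as data and columns are named V1, V2
pt_raw <- read.csv(csv_file, stringsAsFactors = FALSE, header = FALSE)
pt_raw
```

Without `row.names = FALSE`, `write.csv()` adds a leading column of row names, which then comes back as an extra column when the file is read again.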

• Import data from SQL database

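– Use the RODBC package

    # Connect R to a MySQL database through ODBC
    library(RODBC)
    mydb <- odbcConnect("localhost", uid = "root", pwd = "123123")  # open the connection
    sqlTables(mydb)                                      # list the tables in the database
    students <- sqlQuery(mydb, "select * from Student")  # run a query into a data frame
    students[1, 2]                                       # row 1, column 2 of the result
    temp <- sqlFetch(mydb, "Student", rownames = "id")   # read a whole table, using id as row names
    odbcClose(mydb)                                      # close the connection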

**Here are a few examples of interactive use of R**

**Example one:**

    help.start()                   # Start the online help; this opens a browser.
    x <- rnorm(50); y <- rnorm(x)  # Generate two random vectors, x and y.
    plot(x, y)                     # 2-D scatter plot of x and y; a graphics window opens.
    ls()                           # List the R objects in the current workspace.
    rm(x, y)                       # Remove the objects x and y.

**Example two:**

    x <- 1:20                                        # Equivalent to x = (1, 2, ..., 20).
    w <- 1 + sqrt(x)/2                               # A 'weight' vector of standard deviations.
    dummy <- data.frame(x = x, y = x + rnorm(x)*w)   # Create a two-column data frame of x and y.
    dummy                                            # View the data in dummy.
    fm <- lm(y ~ x, data = dummy)                    # Fit a simple linear regression of y on x.
    summary(fm)                                      # View the results.
    fm1 <- lm(y ~ x, data = dummy, weights = 1/w^2)  # Weighted regression.
    summary(fm1)                                     # View the results.
    attach(dummy)                                    # Make the data frame columns usable like ordinary variables.
    lrf <- lowess(x, y)                              # Fit a nonparametric local regression.
    plot(x, y)                                       # Standard scatter plot.
    lines(x, lrf$y)                                  # Add the local regression curve.
    abline(0, 1, lty = 3)                            # The true regression line (intercept 0, slope 1).
    abline(coef(fm))                                 # Unweighted regression line.
    abline(coef(fm1), col = "red")                   # Weighted regression line.
    detach()                                         # Remove the data frame from the search path.
    # A standard regression diagnostic plot to check for heteroscedasticity.
    plot(fitted(fm), resid(fm), xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs Fitted")
    # A normal scores plot to check for skewness, kurtosis and outliers.
    qqnorm(resid(fm), main = "Residuals Rankit Plot")
    rm(fm, fm1, lrf, x, dummy)                       # Clean up again.

**Example three: The classic Michelson and Morley experiment measuring the speed of light**

    filepath <- system.file("data", "morley.tab", package = "datasets")  # Path of the Morley experimental data file.
    filepath                       # View the file path.
    file.show(filepath)            # View the file contents.
    mm <- read.table(filepath)     # Read the data as a data frame.
    mm$Expt <- factor(mm$Expt)
    mm$Run <- factor(mm$Run)       # Change Expt and Run into factors.
    attach(mm)                     # Make the data visible at position 3 (the default).
    plot(Expt, Speed, main = "Speed of Light Data", xlab = "Experiment No.")  # Compare the five experiments with simple boxplots.
    fm <- aov(Speed ~ Run + Expt, data = mm)  # Analyze as a randomized block design, with 'runs' and 'experiments' as factors.
    summary(fm)
    fm0 <- update(fm, . ~ . - Run)
    anova(fm0, fm)                 # Fit the sub-model omitting 'runs' and compare the two models by analysis of variance.
    detach()
    rm(fm, fm0)                    # Clean up before moving on.

    # The following demonstrates contour and image plots.
    x <- seq(-pi, pi, length.out = 50)  # x is a vector of 50 equally spaced values in [-pi, pi];
    y <- x                              # y is the same.
    f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))  # f is a square matrix whose rows and columns are indexed by x and y, with values cos(y)/(1 + x^2).
    oldpar <- par(no.readonly = TRUE)
    par(pty = "s")                 # Save the graphics parameters and set the plotting region to "square".
    contour(x, y, f)
    contour(x, y, f, nlevels = 15, add = TRUE)  # Draw the contours of f, then add more lines for detail.
    fa <- (f - t(f))/2             # fa is the "asymmetric part" of f (t() is the transpose function).
    contour(x, y, fa, nlevels = 15)  # Draw the contours.
    par(oldpar)                    # Restore the original graphics parameters.
    image(x, y, f)
    image(x, y, fa)                # Draw some high-density image plots.
    objects(); rm(x, y, f, fa)     # Clean up before moving on.

    th <- seq(-pi, pi, length.out = 100)
    z <- exp(1i*th)                # 1i represents the complex number i.
    par(pty = "s")
    plot(z, type = "l")            # Plotting a complex argument plots the imaginary part against the real part. This should be a circle.
    # Suppose we want to sample points at random inside this circle. One method is
    # to take complex numbers whose real and imaginary parts are standard normal ...
    w <- rnorm(100) + rnorm(100)*1i
    w <- ifelse(Mod(w) > 1, 1/w, w)  # ... and map any point outside the circle onto its reciprocal.
    plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
    lines(z)                       # All points are inside the circle, but the distribution is not uniform.
    # Now use the uniform distribution. The points should look more evenly spread over the disc.
    w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
    plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
    lines(z)
    rm(th, w, z)                   # Clean up again.
    q()                            # Quit R.

Examples reposted from: http://developer.51cto.com/art/201305/393121.htm
