Machine learning 1, R language


R language

R is a language and environment for statistical analysis and graphics. It is free, open-source software that belongs to the GNU project and is an excellent tool for statistical computing and statistical graphics.

Features
• Mainly used for statistical analysis, graphics, and data mining
• R has many statistical and numerical analysis functions built in. Its functionality can also be extended by installing packages (user-contributed features).
• Because of its S heritage, R has stronger object-oriented programming facilities than most other languages used for statistical or mathematical computing

Official website:

Further notes
• Another strength of R is its graphics, which can produce publication-quality plots and can include mathematical symbols.
• Although R is primarily used for statistical analysis or for developing statistics-related software, it is also used for matrix computation. Its speed is comparable to that of the free software GNU Octave and the commercial software MATLAB, both dedicated to matrix computing.
• SPSS: another statistical analysis package
• SAS System: another statistical analysis package

• Download R 3.1.3 from the official R website

When installing the R environment, it is best if the installation directory contains no spaces, special characters, Chinese characters, and so on; otherwise RStudio may fail to locate the R runtime when it is installed.

Using RStudio as the IDE

After installation, RStudio looks like the following; it is a very lightweight tool:


Configure the RStudio workspace and default encoding under Tools > Global Options...


Managing and understanding data
• After completing this chapter, you will understand:
– The basic R data structures and how they are used to store and extract data
– How to import data from different source formats into R
– Common ways to understand and visualize complex data


• Vector

• The basic R data structure is the vector. A vector stores an ordered set of values, called elements.
• A vector can contain any number of elements. However, all elements must be of the same type; for example, a vector cannot contain both numbers and text.
• Common vector types: integer, numeric, character, logical. Two special values also exist: NULL (the absence of any value) and NA (a missing value).
• Vectors are created with the combine function c()
• Vectors in R have an inherent order, so their elements can be accessed by position, counting from 1
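
As a minimal sketch of the points above (the patient values are made up for illustration), vectors can be created with c() and indexed by position, by range, by exclusion, or by a logical vector:

```r
# Hypothetical example data: three patients
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")  # character vector
temperature  <- c(98.1, 98.6, 101.4)                       # numeric vector
flu_status   <- c(FALSE, FALSE, TRUE)                      # logical vector

temperature[2]             # second element (indexing starts at 1)
temperature[2:3]           # elements 2 and 3
temperature[-2]            # everything except element 2
temperature[flu_status]    # only the elements where flu_status is TRUE

# Mixing types does not fail; R coerces everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)               # "character"
```

The last two lines show why a vector "cannot contain both numbers and text": the numbers are silently converted to text.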

• Factor

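• A factor is a special case of a vector that is used solely to represent a nominal (categorical) attribute
• Why not simply use a character vector? A factor stores each category label only once, as a level, and lets modeling functions treat the variable as categorical
• To convert a character vector into a factor, simply apply the factor() function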
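
A short sketch of the conversion, using made-up blood-type values; the levels argument is optional and is used here only to show that a factor can declare categories that do not occur in the data:

```r
blood <- c("O", "AB", "A", "O")            # character vector (made-up values)
blood_f <- factor(blood, levels = c("A", "B", "AB", "O"))

levels(blood_f)        # the allowed categories, including the unobserved "B"
as.integer(blood_f)    # internal integer codes: 4 3 1 4
table(blood_f)         # counts per level; "B" appears with count 0
```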

• List

• A list is a special type of vector that stores an ordered set of values
• Unlike a vector, a list allows values of different types to be collected
• Use lists to build "objects" whose components can be accessed by name
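
The following sketch builds one such "object" from values of different types; the field names and values are hypothetical:

```r
subject1 <- list(fullname = "John Doe",
                 temperature = 98.1,
                 flu_status = FALSE,
                 gender = factor("MALE", levels = c("MALE", "FEMALE")))

subject1$temperature                        # access a component by name
subject1[["fullname"]]                      # the same idea with [[ ]]
subject1[c("temperature", "flu_status")]    # [ ] returns a smaller list
subject1[[2]]                               # access by position also works
```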

• Arrays
• Data Frame

• The most important R data structure used in machine learning is the data frame. Because it has both rows and columns of data, it is a structure similar to a spreadsheet or database table
• Note the parameter stringsAsFactors = FALSE: it prevents data.frame() from converting character vectors to factors (which was the default before R 4.0)
• Extracting an entire column as a vector is as simple as extracting a list element: refer to it by name
• Because a data frame is two-dimensional, data can also be extracted with the [rows, columns] format
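
A minimal sketch of these access patterns, with a small made-up data frame (the column names are assumptions chosen for illustration):

```r
pt_data <- data.frame(
  subject_name = c("John Doe", "Jane Doe", "Steve Graves"),
  temperature  = c(98.1, 98.6, 101.4),
  flu_status   = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE     # keep subject_name as character
)

pt_data$subject_name                      # one column, by name, as a vector
pt_data[c("temperature", "flu_status")]   # several columns, still a data frame
pt_data[1, 2]                             # row 1, column 2
pt_data[c(1, 3), ]                        # rows 1 and 3, all columns
pt_data[, "temperature"]                  # all rows of one column
```

Leaving the row or column position empty, as in the last two lines, selects every row or every column.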

• Matrix
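
The source gives no detail for matrices, so the following is only a brief sketch of the usual base R behavior: a matrix is a two-dimensional structure whose elements all share one type, and matrix() fills it column by column unless told otherwise.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 rows, filled column by column
m
m[1, 2]        # row 1, column 2
m[, 1]         # the whole first column
dim(m)         # number of rows and columns
t(m) %*% m     # matrix product of the transpose (3 x 2) with m (2 x 3)
```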

Explore Data
str()
summary()
table()
plot()
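
As a rough illustration of how these functions are typically used together, here is a sketch on a small made-up data set (the usedcars name and its values are assumptions, not data from the source):

```r
# Made-up used-car records, for illustration only
usedcars <- data.frame(
  year  = c(2010, 2011, 2011, 2012, 2012, 2012),
  price = c(9500, 11200, 10800, 13900, 14500, 12800),
  color = c("Red", "Blue", "Red", "Silver", "Red", "Blue"),
  stringsAsFactors = FALSE
)

str(usedcars)                            # structure: dimensions and column types
summary(usedcars$price)                  # minimum, quartiles, median, mean, maximum
color_counts <- table(usedcars$color)    # counts per category
color_counts
```

In an interactive session, plot(usedcars$year, usedcars$price) would then draw a scatter plot of price against year.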

Data management
• Save and load R data structures

– save(x, y, z, file = "mydata.RData")
– load("mydata.RData")
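
A small round-trip sketch of save() and load(); the objects and the temporary file location are assumptions chosen so the example does not touch the working directory:

```r
x <- 1:5
y <- c("a", "b")
path <- file.path(tempdir(), "mydata.RData")   # temporary location (assumption)

save(x, y, file = path)   # write both objects into one file
rm(x, y)                  # remove them from the workspace
load(path)                # restore them under their original names
x
y
```

Note that load() recreates the objects under the names they were saved with, so it can overwrite existing objects of the same name.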

• Import and save data with CSV files

– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE)
– pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE, header = FALSE)
– write.csv(pt_data, file = "pt_data2.csv")
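
The calls above can be tried end to end with a sketch like the following; the data and the temporary file path are made up for illustration:

```r
pt_data <- data.frame(id = 1:3,
                      name = c("Ann", "Bob", "Cy"),
                      stringsAsFactors = FALSE)
csv_path <- file.path(tempdir(), "pt_data.csv")   # temporary file (assumption)

# row.names = FALSE omits the extra first column of row names
write.csv(pt_data, file = csv_path, row.names = FALSE)

# The file has a header line, so the default header = TRUE applies
pt_back <- read.csv(csv_path, stringsAsFactors = FALSE)
str(pt_back)
```

header = FALSE is meant for files whose first line is already data; R then names the columns V1, V2, and so on. Without row.names = FALSE, write.csv() also writes the row names as an unnamed first column.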

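• Import data from an SQL database


# Connect R to a MySQL database through ODBC (uses the RODBC package; "localhost" is the ODBC data source name)
library(RODBC)
mydb <- odbcConnect("localhost", uid = "root", pwd = "123123")
sqlTables(mydb)                                       # list the tables in the database
students <- sqlQuery(mydb, "select * from Student")   # run a query; the result is a data frame
students[1, 2]                                        # row 1, column 2 of the result
temp <- sqlFetch(mydb, "Student", rownames = "id")    # fetch a whole table, using id as row names
odbcClose(mydb)                                       # close the connection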


Here are a few examples of interactive use of R

Example one:

> help.start()                     # Start the online help; a browser window will open.
> x <- rnorm(50); y <- rnorm(x)    # Generate two random vectors x and y.
> plot(x, y)                       # Draw a two-dimensional scatter plot of x and y; a graphics window opens.
> ls()                             # List the R objects in the current workspace.
> rm(x, y)                         # Remove the objects x and y.
> x <- 1:20

Example two:

x <- 1:20                                        # Equivalent to x = (1, 2, ..., 20).
w <- 1 + sqrt(x)/2                               # A 'weight' vector of standard deviations.
dummy <- data.frame(x = x, y = x + rnorm(x)*w)   # Create a two-column data frame of x and y.
dummy                                            # View the data in dummy.
fm <- lm(y ~ x, data = dummy)                    # Fit a simple linear regression of y on x.
summary(fm)                                      # View the results.
fm1 <- lm(y ~ x, data = dummy, weights = 1/w^2)  # Weighted regression.
summary(fm1)                                     # View the results.
attach(dummy)                                    # Make the columns of the data frame usable like ordinary variables.
lrf <- lowess(x, y)                              # Fit a nonparametric local regression.
plot(x, y)                                       # Standard scatter plot.
lines(x, lrf$y)                                  # Add the local regression curve.
abline(0, 1, lty = 3)                            # The true regression line (intercept 0, slope 1).
abline(coef(fm))                                 # Unweighted regression line.
abline(coef(fm1), col = "red")                   # Weighted regression line.
detach()                                         # Remove the data frame from the search path.
plot(fitted(fm), resid(fm), xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs Fitted")
                                                 # A standard regression diagnostic plot to check for heteroscedasticity.
qqnorm(resid(fm), main = "Residuals Rankit Plot")
                                                 # A normal scores plot to check for skewness, kurtosis, and outliers.
rm(fm, fm1, lrf, x, dummy)                       # Clean up again.

Example three: the classic Michelson and Morley experiment measuring the speed of light

filepath <- system.file("data", "morley.tab", package = "datasets")   # Get the file path of the experimental data (the morley dataset).
filepath                           # View the file path.
file.show(filepath)                # View the file contents.
mm <- read.table(filepath)         # Read the data as a data frame.
mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)           # Change Expt and Run into factors.
attach(mm)                         # Make the data visible at position 2 (the default), so the columns can be accessed directly.
plot(Expt, Speed, main = "Speed of Light Data", xlab = "Experiment No.")
                                   # Compare the five experiments with simple box plots.
fm <- aov(Speed ~ Run + Expt, data = mm)   # Analyze as a randomized block design, with 'runs' and 'experiments' as factors.
summary(fm)
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)                     # Fit the sub-model omitting 'runs', and compare the two models with an analysis of variance.
detach()
rm(fm, fm0)                        # Clean up before moving on.

# The following are examples of contour and image plots.
x <- seq(-pi, pi, len = 50)        # x is a vector of 50 equally spaced values in the interval [-pi, pi].
y <- x                             # y is the same.
f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))
                                   # f is a square matrix, with rows and columns indexed by x and y respectively, holding the values of cos(y)/(1 + x^2).
oldpar <- par(no.readonly = TRUE)
par(pty = "s")                     # Save the graphics parameters and set the plotting region to "square".
contour(x, y, f)
contour(x, y, f, nlevels = 15, add = TRUE)   # Draw a contour map of f; add more lines to show more detail.
fa <- (f - t(f))/2                 # fa is the "asymmetric part" of f (t() is the transpose function).
contour(x, y, fa, nlevels = 15)    # Draw its contour plot.
par(oldpar)                        # Restore the original graphics parameters.
image(x, y, f)
image(x, y, fa)                    # Draw some high-density image plots.
objects(); rm(x, y, f, fa)         # Clean up before continuing.

th <- seq(-pi, pi, len = 100)
z <- exp(1i*th)                    # 1i denotes the complex number i.
par(pty = "s")
plot(z, type = "l")                # With a complex argument, plot() draws the imaginary part against the real part. This should be a circle.
w <- rnorm(100) + rnorm(100)*1i    # Suppose we want to sample points at random inside this circle. One method is to take complex numbers whose real and imaginary parts are standard normal random numbers ...
w <- ifelse(Mod(w) > 1, 1/w, w)    # ... and map any points outside the circle onto their reciprocals.
plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
lines(z)                           # All the points are inside the circle, but the distribution is not uniform.

# The second method uses the uniform distribution. The points should now look more evenly spread over the disc.
w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim = c(-1, 1), ylim = c(-1, 1), pch = "+", xlab = "x", ylab = "y")
lines(z)
rm(th, w, z)                       # Clean up again.
q()                                # Quit R.

Examples reproduced from:
