[Translation] A (very) short introduction to R


Preface

This article is a translation of Paul Torfs and Claudia Brauer's "A (very) short introduction to R". A few of the simpler passages were left untranslated, as were places that are hard to express well in Chinese.

1. Introduction and Installation

R is a powerful language for data computation and graphics. Beginners are advised to use the integrated development environment RStudio. Installing R and RStudio is not covered here; instructions are easy to find online.

2. RStudio interface

The bottom left is the console window, also called the command window. You can type a simple command after the > prompt, and R will execute it. This window is very important, because this is where R actually executes commands.

The upper left is the editor window, also called the script window. It is used for programming: you can write a series of commands here. If this window is not open, click File -> New -> R Script to open it. To run commands from the editor, click Run or press Ctrl+Enter.

The upper right is the workspace/history window. In the workspace window you can see the data and values stored in R, and click on them to view and change them. The history window records the commands you entered earlier.

The bottom right is the files/plots/packages/help window. Here you can open files, view plots (including earlier ones), install and load packages, and use the help function.

3. Working directory

Set your own working directory: the folder where R reads and saves your files.

To set the working directory from the command line:

> setwd("m:/hydrology/r/")

You can also set it in RStudio via Tools -> Set Working Directory.
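A minimal sketch of checking and changing the working directory from code. It uses R's temporary folder (tempdir()) as a stand-in for a real project folder such as m:/hydrology/r/, and switches back afterwards so nothing else is affected.

```r
# Remember where we are, so we can switch back later
old_wd <- getwd()

# Switch to R's temporary folder (a stand-in for your own project folder)
setwd(tempdir())
print(getwd())        # the folder R now reads files from and saves files to
print(list.files())   # files currently in that folder

# Switch back to the original working directory
setwd(old_wd)
```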

4. Libraries

Many of R's statistical and data-analysis tools are organized into packages, also called libraries.

To install a package (taking geometry as an example), click Install Packages and enter geometry, or type install.packages("geometry") in the console.

To load a package, type library("geometry") in the console.
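If you want a script to install a package only when it is missing, a common pattern combines requireNamespace() with install.packages() and library(). The helper name load_or_install below is hypothetical (it is not part of R or of the original article), and the example uses the base package stats so that nothing is actually downloaded.

```r
# Hypothetical helper: load a package, installing it first only if it is missing
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name stored in pkg
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this only loads it; with "geometry" it would install if needed
load_or_install("stats")
```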

5. Examples of R commands

(1) Calculation

Input:

>10^2 + 36

R returns:

[1] 136

Exercise: compute 2016 minus the year you started studying at this school, divide it by 2016 minus the year you were born, and multiply by 100. This gives the percentage of your life you have spent at this school. Use parentheses where needed.
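As a worked example of this exercise, assume (purely for illustration) that you started at this school in 2012 and were born in 1994. The parentheses matter: without them, R would carry out the division before the subtractions.

```r
start_year <- 2012   # hypothetical year you started at this school
birth_year <- 1994   # hypothetical year you were born

# (years at this school) / (age in years) * 100
pct <- (2016 - start_year) / (2016 - birth_year) * 100
pct   # about 18.2 percent
```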

If you type an opening parenthesis and forget the closing one, the > prompt changes to +, meaning R is waiting for more input. To cancel, press Esc.

(2) Workspace

You can give a number a name; it then becomes a variable, which you can use again later. For example:

> a = 4

R will remember the value of a. You can ask R what the value of a is:

> a

[1] 4

Or do a calculation with a:

> a * 5

[1] 20

If you redefine a, R will forget the old value and remember the new one:

> a = a + 10

> a

[1] 14

To remove all variables from R's memory, type:

> rm(list=ls())

Or click Clear All in the workspace window. To remove only the variable a, type rm(a).

Note that the name must begin with a letter.
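The sketch below uses the standard functions ls(), rm() and exists() to list the variables in the workspace, remove one of them, and check what is left. It uses <-, R's usual assignment operator, which does the same job as = in these examples.

```r
a <- 4
b <- 7
print(ls())        # the names stored in the workspace include "a" and "b"

rm(a)              # remove only a
print(exists("a")) # FALSE: a is gone
print(exists("b")) # TRUE: b is untouched
```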

(3) Scalars, vectors and matrices

A scalar is a single number (0 dimensions), a vector is a one-dimensional array, and a matrix is a two-dimensional array.

To define a vector, use the function c(), which is short for concatenate:

> b = c(3,4,5)
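To make the three kinds of objects concrete, the sketch below builds one of each and inspects its size. In R a scalar is simply a vector of length 1, and a matrix is a vector with dimensions attached.

```r
s <- 7                        # a scalar: in R, a vector of length 1
b <- c(3, 4, 5)               # a vector built with c()
m <- matrix(1:6, nrow = 2)    # a matrix with 2 rows (and therefore 3 columns)

length(s)   # 1
length(b)   # 3
dim(m)      # 2 3
```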

(4) Functions

If you want to compute the mean of the vector b, you could write:

> (3+4+5)/3

But if the vector is very long, writing this out is far too tedious, so you use a function instead. R has many built-in functions, and you can also write your own:

> mean(x=b)

The values inside the parentheses are arguments, which give the function extra information. Here x tells the mean function which vector to work on, namely b. You can also omit x and write mean(b).

Exercise: put the numbers 4, 5, 8 and 11 into a vector, then compute their sum with the sum function.
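A short sketch tying these together: the mean computed by hand and with mean() (with and without naming the argument x), plus one possible answer to the sum exercise.

```r
b <- c(3, 4, 5)
(3 + 4 + 5) / 3   # 4, typed out by hand
mean(x = b)       # 4, argument given by name
mean(b)           # 4, argument matched by position

v <- c(4, 5, 8, 11)
sum(v)            # 28
```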

Another example is the rnorm function, which draws random samples from the normal distribution. Type the following and press Enter, and you will get 10 random numbers:

> rnorm(10)

[1] -0.949  1.342 -0.474  0.403

[5] -0.091 -0.379  1.015  0.740

[9] -0.639  0.950

In the first line, rnorm is the function and 10 is the argument that determines how many random numbers are generated. The next three lines are the result: 10 random numbers forming a vector of length 10.

Entering the command again produces 10 new random numbers. You can press the Up arrow key to recall the previous command. To get 10 random numbers from a normal distribution with mean 1.2 and standard deviation 3.4, type:

> rnorm(10, mean=1.2, sd=3.4)

As you can see, rnorm has three arguments. You may give only the first; the other two then take their default values (mean=0 and sd=1). RStudio automatically shows this argument information while you type rnorm.
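Because rnorm gives different numbers on every call, it can help to fix the random number generator with set.seed() (not mentioned in the original) when you want reproducible results. The sketch also checks that a large sample really has roughly the requested mean and standard deviation; the seed value 42 is arbitrary.

```r
set.seed(42)                              # arbitrary seed: makes the numbers reproducible
x <- rnorm(10000, mean = 1.2, sd = 3.4)   # a large sample
mean(x)   # close to 1.2
sd(x)     # close to 3.4

# Resetting the same seed reproduces exactly the same numbers
set.seed(42); first  <- rnorm(5)
set.seed(42); second <- rnorm(5)
identical(first, second)   # TRUE
```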

(5) Graphs

R can also make graphs. A simple example:

> x = rnorm(100)

> plot(x)

This produces a scatter plot of the 100 random numbers against their index:

[Figure: plot(x) of 100 random numbers]

Exercise: Generate a chart of 100 random numbers.
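When R runs without a plot window (for example from a script on a server), you can send a graph to a file instead. A minimal sketch, assuming a PDF in R's temporary folder is an acceptable destination; the file name random_points.pdf is arbitrary.

```r
x <- rnorm(100)                                    # 100 random numbers
out_file <- file.path(tempdir(), "random_points.pdf")

pdf(out_file)                                      # open a PDF graphics device
plot(x, main = "100 random numbers")               # draw into the PDF instead of a window
dev.off()                                          # close the device so the file is written
```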

6. Help and documentation

Typing:

> help(rnorm)

gives a description of the function rnorm, including its arguments and their default values. Typing:

> example(rnorm)

gives some examples of how rnorm is used. Typing:

> help.start()

opens an HTML-based overview of the help system.

When you type a function name followed by an opening parenthesis, pressing the Tab key shows the function's arguments.
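You can also inspect a function's arguments from code. args() and formals() are standard R functions; the sketch shows that rnorm has the arguments n, mean and sd, and what their default values are. (?rnorm is a shorthand for help(rnorm).)

```r
args(rnorm)              # prints the signature: function (n, mean = 0, sd = 1)
names(formals(rnorm))    # "n" "mean" "sd"
formals(rnorm)$mean      # default mean: 0
formals(rnorm)$sd        # default standard deviation: 1
```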

Other useful links:

http://cran.r-project.org/doc/manuals/R-intro.pdf: a complete manual

http://cran.r-project.org/doc/contrib/Short-refcard.pdf: a short reference card

http://zoonek2.free.fr/UNIX/48_R/all.html: a wealth of examples

http://www.statmethods.net/: also called Quick-R, a source of efficient help

http://mathesaurus.sourceforge.net/: a dictionary for translating between programming languages

Searching with Google is also quite effective.

Exercise: look up the help page of the sqrt function.

7. Scripts

R is an interpreted language, like Python. You can type commands directly into the console, or store them in files called scripts. Script files usually have the extension .R, for example foo.R. To open a file in the editor window, click File -> Open File.

Select the part you want to run, then press Ctrl+Enter or click Run to execute just that part. If nothing is selected, the line where the cursor is resting is executed. The command to run the whole script is:

> source("foo.R")

You can also click Source, or press Ctrl+Shift+S, to run the whole script.
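A minimal sketch of running a whole script with source(). To stay self-contained it first writes a tiny foo.R into R's temporary folder with writeLines(); in practice you would write the script in the editor window instead.

```r
# Write a two-line script to a temporary file named foo.R
script <- file.path(tempdir(), "foo.R")
writeLines(c(
  "x <- rnorm(100)",
  "avg <- mean(x)"
), script)

# Run every line of the script; x and avg are created in the workspace
source(script)
avg
```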

Exercise: create a file named firstscript.R containing commands that generate 100 random numbers and plot them. Run the script several times.

8. Data structures

(1) Vectors

Use the function c() to construct a vector:

> vec1 = c(1,4,6,8,10)

> vec1

[1] 1 4 6 8 10

Use [i] to select the element at position i:

> vec1[5]

[1] 10

You can also replace the value at a given position:

> vec1[3] = 12

> vec1

[1] 1 4 12 8 10

Another way to construct a vector is the seq() function:

> vec2 = seq(from=0, to=1, by=0.25)

> vec2

[1] 0.00 0.25 0.50 0.75 1.00

R has many functions that work on whole vectors. If you add two vectors of the same length, the corresponding elements are added:

> vec1 + vec2

[1] 1.00 4.25 12.50 8.75 11.00
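A few more vector operations in the same style. Arithmetic with a single number is applied to every element, and an index can be a range or a negative number (meaning "leave this element out").

```r
vec1 <- c(1, 4, 12, 8, 10)
vec2 <- seq(from = 0, to = 1, by = 0.25)

vec1 + vec2   # 1.00 4.25 12.50 8.75 11.00
vec1 * 2      # every element doubled: 2 8 24 16 20
vec1[2:4]     # elements 2 to 4: 4 12 8
vec1[-1]      # everything except the first element: 4 12 8 10
length(vec1)  # 5
```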

(2) Matrices

Define a matrix with the function matrix():

> mat = matrix(data=c(9,2,3,4,5,6), ncol=3)

> mat

     [,1] [,2] [,3]

[1,]    9    3    5

[2,]    2    4    6

The argument data gives the numbers that go into the matrix (filled column by column), and ncol sets the number of columns; you can use nrow instead to set the number of rows.

Exercise: put the numbers 31 to 60 into a vector named P, and then into a matrix Q with 6 rows and 5 columns. Hint: use the seq function.

Matrices work much like vectors; use [row, column] to refer to an element of a matrix:

> mat[1,2]

[1] 3

To select an entire row:

> mat[2,]

[1] 2 4 6

Functions can also take a matrix as their argument:

> mean(mat)

[1] 4.833333
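The sketch below repeats the matrix example and adds a few related operations: selecting a column, asking for the dimensions, transposing with t(), and filling a matrix row by row with byrow = TRUE (by default matrix() fills column by column).

```r
mat <- matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)

mat[1, 2]    # 3
mat[2, ]     # whole second row: 2 4 6
mat[, 3]     # whole third column: 5 6
dim(mat)     # 2 rows, 3 columns
mean(mat)    # 4.833333
t(mat)       # transpose: a 3 x 2 matrix

matrix(1:6, ncol = 3, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
```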

(3) Data frames

A data frame is like a matrix, but each column has a name, so you can refer to a column by its name without knowing exactly where it is:

> t = data.frame(x = c(11,12,14), y = c(19,20,21), z = c(10,9,7))

> t

   x  y  z

1 11 19 10

2 12 20  9

3 14 21  7

Two ways to compute the mean of column z:

> mean(t$z)

[1] 8.666667

> mean(t[["z"]])

[1] 8.666667
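Besides t$z and t[["z"]], a data frame can be indexed like a matrix with [row, column], where the column may be given by name, and new columns can be added with $. A short sketch:

```r
t <- data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))

mean(t$z)         # 8.666667
mean(t[["z"]])    # the same column, selected by name
t[2, "y"]         # row 2 of column y: 20
names(t)          # "x" "y" "z"
nrow(t)           # 3

t$total <- t$x + t$y + t$z   # add a new named column: 40 41 42
t
```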

Exercise: write a script that creates three vectors of 100 random numbers each, named x1, x2 and x3. Create a data frame named t with the columns a, b and c, where a = x1, b = x1 + x2 and c = x1 + x2 + x3. Call the functions plot(t) and sd(t). Can you understand the results?
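One possible sketch of this exercise, under two assumptions: the plot is written to a PDF in R's temporary folder so the script also runs without a plot window, and the standard deviations are computed column by column with sapply(), because in current R versions calling sd() directly on a whole data frame raises an error.

```r
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)

t <- data.frame(a = x1, b = x1 + x2, c = x1 + x2 + x3)

pdf(file.path(tempdir(), "exercise_t.pdf"))  # assumption: save the plot to a file
plot(t)          # a scatter plot of every column against every other column
dev.off()

sapply(t, sd)    # one standard deviation per column, roughly 1, 1.4 and 1.7
```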

(4) Lists

Unlike matrices and data frames, the elements of a list do not need to have the same length:

> L = list(one=1, two=c(1,2), five=seq(0,1,length=5))

> L

$one

[1] 1

$two

[1] 1 2

$five

[1] 0.00 0.25 0.50 0.75 1.00

You can show which elements L contains:

> names(L)

[1] "one"  "two"  "five"

You can also compute with the values inside a list:

> L$five + 10

[1] 10.00 10.25 10.50 10.75 11.00
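A sketch of a few more list operations: lengths() (available since R 3.2.0) returns the length of every element, [[ ]] selects an element by name, and a list can even hold elements of different types.

```r
L <- list(one = 1, two = c(1, 2), five = seq(0, 1, length = 5))

names(L)       # "one" "two" "five"
lengths(L)     # one = 1, two = 2, five = 5
L[["two"]]     # 1 2, the same as L$two
L$five + 10    # 10.00 10.25 10.50 10.75 11.00

L$note <- "lists can also hold strings"   # add an element of another type
str(L)                                    # compact overview of the whole list
```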

9. Charts

A simple graph:

> plot(rnorm(100), type="l", col="gold")

This command generates 100 random numbers, plots them, and connects the points. type="l" means the points are joined by straight lines, and col sets the line color to gold.

Another example, a histogram:

> hist(rnorm(100))

Exercise: add the following commands to the script from the previous exercise (the one that creates the data frame t), and find out by experimenting what rgb means, what the last argument of rgb means, and what lwd, pch and cex mean.

> plot(t$a, type="l", ylim=range(t), lwd=3, col=rgb(1,0,0,0.3))

> lines(t$b, type="s", lwd=2, col=rgb(0.3,0.4,0.3,0.9))

> points(t$c, pch=20, cex=4, col=rgb(0,0,1,0.3))

To learn more about graphics parameters, type help(par). Searching Google for "R color chart" turns up a PDF of available colors. To save a chart, click Export in the Plots window, choose a suitable width and height, then click Copy or Save.
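For reference while experimenting, the sketch below shows what rgb() returns. rgb(red, green, blue) takes values between 0 and 1 and produces a hexadecimal color code; an optional fourth argument, alpha, sets the opacity (0 is fully transparent, 1 fully opaque) and appends two more hex digits. col2rgb() converts a code back to 0-255 values.

```r
opaque <- rgb(1, 0, 0)        # pure red: "#FF0000"
faded  <- rgb(1, 0, 0, 0.3)   # the same red at 30% opacity: "#FF0000" plus two alpha digits
opaque
faded
col2rgb(faded, alpha = TRUE)  # back to 0-255 values for red, green, blue and alpha
```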

10. Reading and writing files

There are many ways to read and write files; only one is introduced here. First create a data frame d:

> d = data.frame(a = c(3,4,5), b = c(12,43,54))

> d

  a  b

1 3 12

2 4 43

3 5 54

> write.table(d, file="tst0.txt", row.names=FALSE)

This writes the data frame d to the file tst0.txt. The argument row.names=FALSE means the row names are not written to the file; here they carry no information, since they are just the numbers 1, 2, 3.

> d2 = read.table(file="tst0.txt", header=TRUE)

> d2

  a  b

1 3 12

2 4 43

3 5 54

The read.table function reads the data in the file into d2. The argument header=TRUE tells it that the first line holds the column names.

Exercise: create a file tst1.txt containing some data, including a column named g. Read it into R, multiply the values in column g by 5, and save the result to the file tst2.txt.
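One possible sketch of this exercise. Since no input file is provided, the script first writes a small made-up tst1.txt (with a column g) into R's temporary folder; with your own file you would skip that step and just set the path.

```r
tst1 <- file.path(tempdir(), "tst1.txt")
tst2 <- file.path(tempdir(), "tst2.txt")

# Made-up example input with a column named g (replace with your own tst1.txt)
write.table(data.frame(id = 1:3, g = c(2, 4, 6)), file = tst1, row.names = FALSE)

d <- read.table(file = tst1, header = TRUE)   # first line holds the column names
d$g <- d$g * 5                                # multiply column g by 5
write.table(d, file = tst2, row.names = FALSE)

read.table(file = tst2, header = TRUE)        # check the result: g is now 10 20 30
```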

11. Missing data

Exercise: compute the mean of the square roots of a vector of 100 random numbers. What happens?

When a value is not available, it is represented by NA:

> j = c(1,2,NA)

Ordinary calculations on j do not work. For example:

> max(j)

[1] NA

The maximum cannot be computed.

If you need the result anyway, use the argument na.rm=TRUE, which roughly means "ignore the NA values":

> max(j, na.rm=TRUE)

[1] 2
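A sketch of the usual tools for missing values: is.na() finds them, na.rm = TRUE skips them in functions such as mean() and max(), and logical indexing can drop them. It also relates to the square-root exercise: the square root of a negative number is NaN ("not a number", with a warning, suppressed here), which is.na() also reports as missing.

```r
j <- c(1, 2, NA)

is.na(j)                # FALSE FALSE TRUE
sum(is.na(j))           # number of missing values: 1
mean(j)                 # NA
mean(j, na.rm = TRUE)   # 1.5
j[!is.na(j)]            # keep only the available values: 1 2

r <- suppressWarnings(sqrt(c(4, -1)))   # sqrt of a negative number gives NaN
r                       # 2 NaN
is.na(r)                # FALSE TRUE
```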

12. Classes

So far everything has been numbers, but sometimes you want to work with other kinds of data, such as a name, the name of a data file, or a date. R has three classes for this: numeric, character and POSIX (dates and times).

(1) Characters

To define a string, enclose it in double quotes:

> m = "apples"

> m

[1] "apples"

> n = pears

Error: object 'pears' not found

Without the quotes, R looks for a variable named pears. You also cannot do arithmetic with strings:

> m + 2

Error in m + 2 : non-numeric argument to binary operator
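A short sketch of working with character data: class() tells you what kind of value you have, paste() and nchar() work on strings, and as.numeric() converts a string that contains a number so that arithmetic becomes possible.

```r
m <- "apples"

class(m)                 # "character"
class(2)                 # "numeric"
paste(m, "and pears")    # "apples and pears"
nchar(m)                 # number of characters: 6
as.numeric("12") + 2     # 14: convert the string to a number first
```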

(2) Dates

Dates and times are more complicated. The simplest way to tell R that something is a time is the strptime function:

> date1 = strptime(c("20100225230000","20100226000000","20100226010000"), format="%Y%m%d%H%M%S")

> date1

[1] "2010-02-25 23:00:00"

[2] "2010-02-26 00:00:00"

[3] "2010-02-26 01:00:00"

First, c() creates a vector of strings; remember the double quotes, because strptime needs strings as input. The format argument describes how the input is laid out: %Y is the year, %m the month, %d the day, %H the hour, %M the minute and %S the second.
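A sketch of the same conversion with one addition: tz = "UTC" fixes the time zone, so the result does not depend on the computer's local settings. It also shows how to turn the times back into text in another layout and how to compute the differences between them.

```r
date1 <- strptime(c("20100225230000", "20100226000000", "20100226010000"),
                  format = "%Y%m%d%H%M%S", tz = "UTC")

class(date1)                              # "POSIXlt" "POSIXt"
format(date1, "%Y-%m-%d %H:%M:%S")        # back to readable strings
format(date1, "%d/%m/%Y")                 # a different layout: day/month/year
diff(as.POSIXct(date1))                   # time differences between consecutive values
```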

Exercise: make a graph whose x-axis shows today, Saint Nicholas Day 2014 and your birthday, and whose y-axis shows how many presents you would like to receive on each of these days.

13. Programming tools

When you write larger programs, you will need some programming constructs:

(1) if statement

> w = 3

> if (w < 5) {

    d = 2

  } else {

    d = 10

  }

> d

[1] 2

Anyone who has programmed before will recognize this, so it is not explained in detail.

Conditions can also be used to select elements of a vector:

> a = c(1,2,3,4)

> b = c(5,6,7,8)

> f = a[b==5 | b==8]

> f

[1] 1 4

Note the double equals sign ==. Other comparison operators are <, >, !=, <= and >=. To combine several conditions, use & for "and" and | for "or".
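A sketch of combining conditions with & (and) and | (or), using the same vectors a and b. A comparison such as b == 5 produces a logical vector, and indexing with it keeps the elements where it is TRUE.

```r
a <- c(1, 2, 3, 4)
b <- c(5, 6, 7, 8)

b == 5 | b == 8      # TRUE FALSE FALSE TRUE
a[b == 5 | b == 8]   # either condition holds: 1 4
a[b > 5 & b < 8]     # both conditions hold: 2 3
a[b != 6]            # everything where b is not 6: 1 3 4
```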

(2) for loop

Specify how many times to repeat, and what to do each time:

> h = seq(from=1, to=8)

> s = c()

> for (i in 2:10) {

    s[i] = h[i] * 10

  }

> s

[1] NA 20 30 40 50 60 70 80 NA NA

First a vector h is defined, then an empty vector s. The for loop multiplies elements 2 to 10 of h by 10 and stores the results in s. Because h has only 8 elements, h[9] and h[10] are NA, and s[1] is never assigned, so those positions are NA.

Exercise: create a vector containing 1 to 100 and loop over the whole vector with a for loop. Multiply the values smaller than 5 or larger than 90 by 10, and all other values by 0.1.
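One possible sketch of this exercise. It pre-allocates the result with numeric() and uses seq_along() to loop over every position; the last lines show, for comparison, the vectorized ifelse() version, which gives the same result without an explicit loop.

```r
v <- 1:100
result <- numeric(length(v))   # pre-allocate a vector of 100 zeros

for (i in seq_along(v)) {
  if (v[i] < 5 || v[i] > 90) {
    result[i] <- v[i] * 10      # small and large values: multiply by 10
  } else {
    result[i] <- v[i] * 0.1     # all other values: multiply by 0.1
  }
}
head(result)   # 10.0 20.0 30.0 40.0 0.5 0.6

# Vectorized alternative without a loop
alt <- ifelse(v < 5 | v > 90, v * 10, v * 0.1)
all.equal(result, alt)   # TRUE
```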

(3) Writing your own functions

> func1 = function(arg1, arg2) {

    w = arg1^2

    return(arg2 + w)

  }

> func1(arg1 = 3, arg2 = 5)

[1] 14
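A sketch of two conveniences not covered in the example above: arguments can be matched by position instead of by name, and an argument can be given a default value. The default arg2 = 0 is an assumption added for illustration.

```r
func1 <- function(arg1, arg2 = 0) {   # arg2 = 0 is an illustrative default
  w <- arg1^2
  return(arg2 + w)
}

func1(arg1 = 3, arg2 = 5)   # 14, arguments given by name
func1(3, 5)                 # 14, arguments matched by position
func1(arg2 = 5, arg1 = 3)   # 14, named arguments may come in any order
func1(3)                    # 9, arg2 falls back to its default
```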

Exercise: rewrite the previous exercise as a function that uses a for loop. You can use the length function to set the range of the loop.

14. Some useful references

(1) Functions

Some of the functions listed in the R Reference Card:

a) Creating data

read.table: read a table from a file. Arguments: header=TRUE: read the first line as column names; sep=",": data are separated by commas; skip=n: do not read the first n lines.

write.table: write a table to a file

c: combine numbers into a vector

array: create an array. Argument: dim: its length (dimensions)

matrix: create a matrix. Arguments: ncol and/or nrow: number of columns/rows

data.frame: create a data frame

list: create a list

rbind and cbind: combine vectors into a matrix by rows or by columns

b) Extracting data

x[n]: the nth element of a vector

x[m:n]: elements m through n

x[c(k,m,n)]: elements at specific positions

x[x>m & x<n]: elements between m and n

x$n: the element named n of a list or data frame

x[["n"]]: same as above

x[i,j]: the element in row i, column j of a matrix

x[i,]: row i of a matrix

c) Information about variables

length: length of a variable (number of elements)

ncol or nrow: number of columns or rows of a matrix

class: class of a variable

names: names of the objects in a list

print: display a variable or string on the screen

return: used in functions to return a variable

is.na: test whether a variable is NA

as.numeric or as.character: convert to a number or a string

strptime: convert a string to a date-time (POSIX) class

d) Statistics

sum: sum of the elements of a vector or matrix

mean: mean of a vector

sd: standard deviation of a vector

max or min: largest or smallest element

rowSums (or rowMeans, colSums and colMeans): sum or mean of each row/column of a matrix. The result is a vector.

quantile(x, c(0.1,0.5)): the 0.1 and 0.5 quantiles of the vector x
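A quick sketch of some of these statistics functions on small example data, reusing the matrix from the section on matrices. quantile() uses R's default interpolation method here.

```r
m <- matrix(c(9, 2, 3, 4, 5, 6), ncol = 3)
rowSums(m)     # 17 12
colMeans(m)    # 5.5 3.5 5.5

x <- c(4, 5, 8, 11)
max(x)                      # 11
quantile(x, c(0.1, 0.5))    # 10% quantile 4.3, median 6.5
```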

e) Data processing

seq: create a vector of evenly spaced values (e.g. from 1 to 100)

rnorm: create a vector of random numbers from a normal distribution

sort: sort elements in ascending order

t: transpose a matrix

aggregate(x, by=list(y), FUN="mean"): split x into subsets defined by y and compute the mean of each subset. The result is a new data frame.

na.approx: interpolate (in the zoo package). Argument: vector with NAs. Result: vector without NAs.

cumsum: cumulative sum. The result is a vector.

rollmean: moving average (in the zoo package)

paste: glue strings together

substr: extract part of a string
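A sketch of several of these data-processing functions on made-up data (na.approx and rollmean are left out because they need the zoo package). Here aggregate() receives FUN = mean directly as a function; the quoted form "mean" also works.

```r
v <- c(1, 2, 3, 4)
cumsum(v)                          # 1 3 6 10
sort(c(3, 1, 2))                   # 1 2 3
paste("year", 2016, sep = "_")     # "year_2016"
substr("hydrology", 1, 5)          # characters 1 to 5: "hydro"

x <- c(10, 20, 30, 40)             # made-up values
y <- c("a", "b", "a", "b")         # group labels
aggregate(x, by = list(group = y), FUN = mean)   # mean per group: a = 20, b = 30
```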

f) Fitting

lm(v1 ~ v2): linear fit (regression line) between vector v1 on the y-axis and v2 on the x-axis

nls(v1 ~ a + b*v2, start=list(a=1, b=0)): nonlinear fit. The formula should contain the variables (here v1 and v2) and the parameters (here a and b), and start gives the parameters' starting values.

coef: returns the coefficients of a fit

summary: returns all results of a fit
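A sketch of lm() and nls() on a small made-up dataset: v1 is the straight line 2 + 3*v2 plus a little fixed noise, so both fits should give an intercept near 2 and a slope near 3. The straight-line formula in nls() is used only to keep the example simple; nls() is meant for models that are genuinely nonlinear in their parameters.

```r
v2 <- c(1, 2, 3, 4, 5)                           # x values (made up)
v1 <- 2 + 3 * v2 + c(0.1, -0.1, 0.05, -0.05, 0)  # y values: a line plus small noise

fit <- lm(v1 ~ v2)       # linear fit of v1 (y-axis) against v2 (x-axis)
coef(fit)                # intercept about 2.045, slope about 2.985
summary(fit)             # full results: standard errors, R-squared, ...

nfit <- nls(v1 ~ a + b * v2, start = list(a = 1, b = 0))  # same model via nls
coef(nfit)               # a and b: the same values as the lm fit
```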

g) Plotting

plot(x): plot x (y-axis) against index number (x-axis) in a new window

plot(x, y): plot y (y-axis) against x (x-axis) in a new window

image(x, y, z): plot z (color scale) against x (x-axis) and y (y-axis) in a new window

lines or points: add lines or points to a previous plot

hist: plot a histogram of the numbers in a vector

barplot: bar plot of a vector or data frame

contour(x, y, z): contour plot

abline: draw a line (segment). Arguments: a, b for intercept a and slope b; or h=y for a horizontal line at y; or v=x for a vertical line at x.

curve: add a function to a plot. The expression needs to contain x. Example: curve(x^2)

legend: add a legend with the given symbols (lty or pch and col) and text (legend) at a location (e.g. x="topright")

axis: add an axis. Arguments: side: 1=bottom, 2=left, 3=top, 4=right

mtext: add text on an axis. Arguments: text (character string) and side

grid: add a grid

par: plotting parameters to be specified before the plots. Arguments: e.g. mfrow=c(1,3): number of figures per page (1 row, 3 columns); new=TRUE: draw the plot over the previous plot.

h) Plotting parameters

These can be added as arguments to plot, lines, image, etc. For help see par.

type: "l" = lines, "p" = points, etc.

col: color, e.g. "blue", "red", etc.

lty: line type: 1=solid, 2=dashed, etc.

pch: point type: 1=circle, 2=triangle, etc.

main: title (character string)

xlab and ylab: axis labels (character strings)

xlim and ylim: range of the axes, e.g. c(1,10)

log: logarithmic axis: "x", "y" or "xy"
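A sketch combining several of the plotting functions and parameters listed above. It draws into a PDF in R's temporary folder (an assumption, so it also runs without a plot window); par(mfrow = c(1, 2)) puts two plots side by side.

```r
out_file <- file.path(tempdir(), "plot_reference.pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))                        # 1 row, 2 columns of plots

x <- seq(0, 10, by = 0.5)
plot(x, x^2, type = "p", pch = 20, col = "blue",
     main = "Points and a curve", xlab = "x", ylab = "x squared")
curve(x^2, from = 0, to = 10, add = TRUE, col = "red", lty = 2)  # dashed red curve
abline(h = 50, lty = 3)                     # dotted horizontal line at y = 50
legend(x = "topleft", legend = c("points", "x^2"),
       col = c("blue", "red"), pch = c(20, NA), lty = c(NA, 2))
grid()

hist(rnorm(100), main = "Histogram", col = "gold")
dev.off()
```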

i) Programming

function(arglist) {expr}: function definition: do expr with the list of arguments arglist

if (cond) {expr1} else {expr2}: if statement: if cond is true, do expr1, otherwise do expr2

for (var in vec) {expr}: for loop: the counter var runs through the vector vec and does expr on each run

while (cond) {expr}: while loop: while cond is true, do expr on each run
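The while loop is the only construct in this list without an example in the text. A minimal sketch: keep doubling a number until it reaches at least 100.

```r
n <- 1
while (n < 100) {
  n <- n * 2   # runs as long as the condition is TRUE
}
n   # 128
```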

(2) Keyboard shortcuts

Click Help -> Keyboard Shortcuts to see the full list.

Ctrl+Enter: send the command from the script window to the console

Up or Down arrow in the console: previous or next command

Ctrl+1, Ctrl+2, etc.: switch between windows

Not R-specific, but very useful keyboard shortcuts:

Ctrl+C, Ctrl+X and Ctrl+V: copy, cut and paste

Alt+Tab: switch to another program window

Up, Down, Left or Right: move the cursor

Home or End: move the cursor to the beginning or end of the line

Page Up or Page Down: move the cursor one page up or down

Shift+Up/Down/Left/Right/Home/End/PgUp/PgDn: select

(3) Error messages

No such file or directory, or cannot change working directory

Make sure the working directory and the file name are correct.

object 'x' not found

The variable x has not been defined yet. Define x first, or, if x is meant to be a string, put it in double quotes.

argument "x" is missing, with no default

You did not supply the argument x, which the function requires.

+

R is still running, or you forgot a closing bracket. Wait, type the missing bracket, or press Esc.

unexpected ')' or unexpected '}'

You typed one closing bracket too many.

unexpected 'else'

Put the else of an if statement on the same line as the closing bracket of the "then" part: } else {.

missing value where TRUE/FALSE needed

Something is wrong with the condition; for example, in if (x==1), x is NA.

the condition has length > 1 and only the first element will be used

For example, in if (x==1), x is a vector rather than a single value. Try x[i] instead. (Since R 4.2.0 this is an error rather than a warning.)

non-numeric argument to binary operator

You tried to do arithmetic on something that is not a number. Use class() to check the class of your data, or as.numeric() to convert it to numbers.

argument is of length zero or replacement has length zero

Check the variables involved: one of them has length zero, for example because it is empty or was never filled.
