Objective
This article is a translation of "A (very) short introduction to R" by Paul Torfs & Claudia Brauer. Some of the simpler passages, and passages that are hard to express well in Chinese, were left untranslated.
1. Introduction and Installation
R is a powerful language for data analysis and graphics. Beginners are advised to use the integrated development environment RStudio. Installing R and RStudio is not covered here; instructions are easy to find online.
2. RStudio interface
The bottom left is the console window, also called the command window. You can type simple commands after the > prompt and R will execute them. This window is the most important one, because this is where R actually does its work.
The top left is the editor window, also called the script window. It is used for programming: here you can write a sequence of commands. If this window is not open, click File -> New -> R Script to open it. To run a command from the editor window, click Run or press Ctrl+Enter.
The top right is the workspace/history window. In the workspace window you can see the data and values that R has in memory, and click on them to view and edit them. The history window shows the commands you typed earlier.
The bottom right is the files/plots/packages/help window. Here you can open files, view plots (including earlier ones), install and load packages, and use the help function.
3. Working directory
Set your own working directory, which is the folder where R reads and saves files.
To set the working directory from the command window:
> setwd("m:/hydrology/r/")
It can also be set in RStudio via Tools -> Set Working Directory.
4. Library
R can perform many statistical and data analyses. These are organized in packages, also called libraries.
Install a package (taking geometry as an example): click Install Packages in the packages window and type geometry, or type install.packages("geometry") in the command window.
Load a package: type library("geometry") in the command window.
5. Examples of R commands
(1) Calculation
Input:
>10^2 + 36
Get the answer:
[1] 136
Exercise: Subtract the year you started studying at this school from 2016, divide the result by 2016 minus the year you were born, and multiply by 100 to get the percentage of your life you have spent at this school. Use parentheses where needed.
If you type an opening parenthesis and forget the closing one, the > prompt changes to +. To get out of this state, press ESC.
(2) Workspace
You can give a number a name. It then becomes a variable that can be used again later. For example:
> a = 4
R now remembers the value of a. You can ask R what a is:
> a
[1] 4
Or do a calculation with a:
> a * 5
[1] 20
If you redefine a, R forgets the old value and remembers only the new one.
> a = a + 10
> a
[1] 14
To remove all variables from R's memory:
> rm(list=ls())
Or click Clear All in the workspace window. If you only want to remove the variable a, type rm(a).
Note that a variable name must begin with a letter.
(3) Scalars, vectors, and matrices
A scalar is a single number (0 dimensions), a vector is a row of numbers (1 dimension), and a matrix is a table of numbers (2 dimensions).
To define a vector, use the function c, which is short for concatenate:
> b = c(3,4,5)
(4) Functions
If you want to compute the mean of the elements of vector b, you can write:
> (3+4+5)/3
But for a long vector this is tedious, so you use a function instead. You can use the functions that come with R, or write your own.
> mean(x=b)
Inside the parentheses are the arguments, which give extra information to the function. Here, x tells the mean function that it should operate on vector b. You can also leave out x and write mean(b).
Exercise: Put the numbers 4, 5, 8 and 11 into a vector, then compute their sum with the sum function.
Another example is the rnorm function, which generates random samples from a normal distribution. Type the following and press Enter to get 10 random numbers.
> rnorm(10)
[1] -0.949  1.342 -0.474  0.403
[5] -0.091 -0.379  1.015  0.740
[9] -0.639  0.950
In the first line, rnorm is the function and 10 is an argument that determines how many random numbers are generated. The next three lines are the result: 10 random numbers, returned as a vector of length 10.
Entering the same command again produces 10 new random numbers. You can use the up arrow key to recall the previous command. To get 10 random numbers from a normal distribution with mean 1.2 and standard deviation 3.4, type:
> rnorm(10, mean=1.2, sd=3.4)
As you can see, rnorm has three arguments. If you give only the first, the other two take their default values. RStudio automatically shows the argument information when you type rnorm followed by an opening parenthesis.
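As a small illustration of how arguments are matched, the sketch below calls rnorm with named arguments, with arguments by position, and with the defaults. The calls to set.seed are not part of the original tutorial; they are added here only so that the three calls draw the same underlying random numbers and can be compared.

```r
# Fix the random seed so the calls below are comparable
# (set.seed is an addition for reproducibility, not part of the tutorial).
set.seed(42)
named = rnorm(n = 5, mean = 1.2, sd = 3.4)

# The same call with arguments matched by position: n, mean, sd.
set.seed(42)
positional = rnorm(5, 1.2, 3.4)

# Leaving out mean and sd falls back to the defaults mean = 0 and sd = 1.
set.seed(42)
defaults = rnorm(5)

identical(named, positional)             # TRUE
all.equal(named, 1.2 + 3.4 * defaults)   # TRUE: same draws, shifted and scaled
```

Naming the arguments makes a call easier to read; matching by position is shorter but depends on remembering the order.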
(5) Plots
R can make graphs. A simple example:
> x = rnorm(100)
> plot(x)
This draws the 100 values against their index in the plots window.
Exercise: Plot 100 normal random numbers.
6. Help and documentation
Type:
> help(rnorm)
to get a description of the function rnorm, including its arguments and their default values. Type:
> example(rnorm)
to see some examples of how rnorm is used. Type:
> help.start()
to get an HTML-based overview of the help.
When you type a function name followed by an opening parenthesis, press the TAB key to see the arguments of that function.
Other useful links:
http://cran.r-project.org/doc/manuals/R-intro.pdf : a complete manual
http://cran.r-project.org/doc/contrib/Short-refcard.pdf : a short reference card
http://zoonek2.free.fr/UNIX/48_R/all.html : contains a wealth of examples
http://www.statmethods.net/ : also called Quick-R, gives quick and effective help
http://mathesaurus.sourceforge.net/ : a dictionary of programming languages
Searching with Google also works well.
Exercise: Look up the help for the sqrt function.
7. Scripts
R is an interpreted language, like Python. You can type commands directly in the console, or store them in files called scripts. These files usually have the extension .R, for example foo.R. To edit such a file, open the editor window via File -> New or File -> Open File.
Select the part you want to run, then press Ctrl+Enter or click Run to execute just that part. If nothing is selected, R runs the line where the cursor is. The command to run the whole script is:
> source("foo.R")
You can also click Run All, or press Ctrl+Shift+S, to run the whole script.
Exercise: Create a file named firstscript.R containing commands that generate 100 random numbers and plot them. Run this script several times.
8. Data structure
(1) Vector
Use the function c() to construct a vector:
> vec1 = c(1,4,6,8,10)
> vec1
[1]  1  4  6  8 10
Use [i] to select an element of the vector:
> vec1[5]
[1] 10
You can replace the value at a given position:
> vec1[3] = 12
> vec1
[1]  1  4 12  8 10
Another way to construct a vector is the seq() function:
> vec2 = seq(from=0, to=1, by=0.25)
> vec2
[1] 0.00 0.25 0.50 0.75 1.00
Many computations in R work on whole vectors. If you add two vectors of the same length, the elements are added pairwise:
> vec1 + vec2
[1]  1.00  4.25 12.50  8.75 11.00
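The element-wise behaviour described above applies to other operations as well. The following sketch rebuilds vec1 and vec2 from this section and shows a few more vector operations; the particular selections are chosen only for illustration.

```r
vec1 = c(1, 4, 12, 8, 10)
vec2 = seq(from = 0, to = 1, by = 0.25)

vec1 * vec2        # element-wise product: 0 1 6 6 10
vec1[2:4]          # elements 2 to 4: 4 12 8
vec1[vec1 > 5]     # elements larger than 5: 12 8 10
length(vec1)       # number of elements: 5
sum(vec1)          # 35
```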
(2) Matrix
Define a matrix with the function matrix:
> mat = matrix(data=c(9,2,3,4,5,6), ncol=3)
> mat
     [,1] [,2] [,3]
[1,]    9    3    5
[2,]    2    4    6
The argument data gives the numbers that go into the matrix. ncol sets the number of columns; you can also use nrow to set the number of rows.
Exercise: Put the numbers 31 to 60 in a vector named p, then put them in a matrix Q with 6 rows and 5 columns. Tip: use the seq function.
Working with a matrix is similar to working with a vector. Use [row, column] to select an element of a matrix:
> mat[1,2]
[1] 3
Select an entire row:
> mat[2,]
[1] 2 4 6
Functions can also take a matrix as an argument:
> mean(mat)
[1] 4.833333
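As the output above shows, matrix fills the numbers column by column. The sketch below rebuilds mat and contrasts it with a row-wise fill using the byrow argument; the name mat_by_row is made up for this illustration.

```r
mat = matrix(data = c(9, 2, 3, 4, 5, 6), ncol = 3)

# Filling is column by column by default; byrow = TRUE fills row by row.
mat_by_row = matrix(data = c(9, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)

dim(mat)          # number of rows and columns: 2 3
mat[, 3]          # third column: 5 6
rowSums(mat)      # 17 12
colMeans(mat)     # 5.5 3.5 5.5
t(mat)            # transpose: 3 rows, 2 columns
mat_by_row[1, ]   # first row: 9 2 3
```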
(3) Data frame
A data frame is like a matrix, but with a name for each column, so you can use a column without knowing its position:
> t = data.frame(x = c(11,12,14), y = c(19,20,21), z = c(10,9,7))
> t
   x  y  z
1 11 19 10
2 12 20  9
3 14 21  7
Two ways to compute the mean of column z:
> mean(t$z)
[1] 8.666667
> mean(t[["z"]])
[1] 8.666667
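Because columns are addressed by name, adding a column or selecting rows does not require knowing positions. A minimal sketch using the data frame t from above; the column name total is hypothetical.

```r
t = data.frame(x = c(11, 12, 14), y = c(19, 20, 21), z = c(10, 9, 7))

names(t)                      # "x" "y" "z"
t$total = t$x + t$y + t$z     # add a new column: 40 41 42
t[t$x > 11, ]                 # only the rows where x is larger than 11
colMeans(t)                   # mean of every column
```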
Exercise: Write a script file that creates three vectors of random numbers, each of length 100, named x1, x2 and x3. Create a data frame named t with columns a, b and c, where a=x1, b=x1+x2 and c=x1+x2+x3. Call the following functions: plot(t) and sd(t). Can you understand the results?
(4) List
Unlike in matrices and data frames, the columns of a list do not all need to have the same length.
> L = list(one=1, two=c(1,2), five=seq(0,1,length=5))
> L
$one
[1] 1
$two
[1] 1 2
$five
[1] 0.00 0.25 0.50 0.75 1.00
You can show the names of the columns in L:
> names(L)
[1] "one"  "two"  "five"
You can also compute with the numbers inside:
> L$five + 10
[1] 10.00 10.25 10.50 10.75 11.00
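List elements can be reached by name or by position, and they do not have to share a class. The sketch below rebuilds L and shows both access styles; the element label is added purely as an illustration.

```r
L = list(one = 1, two = c(1, 2), five = seq(0, 1, length = 5))

L[["two"]]            # same as L$two: 1 2
L[[3]]                # third element, selected by position
lengths(L)            # length of every element: 1 2 5
L$label = "sample"    # elements may have different classes
class(L$label)        # "character"
```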
9. Plots
A simple graph:
> plot(rnorm(100), type="l", col="gold")
This command generates 100 random numbers, plots them, and connects the points. type="l" means that the points are connected by straight lines. col sets the color of the line, here gold.
Another example is a histogram:
> hist(rnorm(100))
Exercise: Add the following commands to the script from the previous exercise. Find out, by experimenting or by using the help, what rgb does, what the arguments of rgb mean, and what lwd, pch and cex mean.
> plot(t$a, type="l", ylim=range(t), lwd=3, col=rgb(1,0,0,0.3))
> lines(t$b, type="s", lwd=2, col=rgb(0.3,0.4,0.3,0.9))
> points(t$c, pch=20, cex=4, col=rgb(0,0,1,0.3))
To learn more about plotting, type help(par). Search Google for "R color chart" to find a PDF file with a choice of colors. Click Export in the plots window to choose a suitable height and width, then click Copy or Save.
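Besides the Export button, a plot can also be saved from a script. The sketch below is one possible way, using the pdf graphics device from base R; the file name myplot.pdf and the chosen width and height are assumptions made for this example.

```r
# Write the plot to a PDF file instead of the plots window.
# "myplot.pdf" is a hypothetical file name; tempdir() is used so that the
# sketch does not clutter the working directory.
plot_file = file.path(tempdir(), "myplot.pdf")

pdf(plot_file, width = 6, height = 4)   # open the file as a graphics device
plot(rnorm(100), type = "l", col = "gold",
     main = "100 random numbers", xlab = "index", ylab = "value")
dev.off()                               # close the device to finish the file

file.exists(plot_file)                  # TRUE
```

Everything drawn between pdf() and dev.off() goes into the file, so lines() and points() calls can be added in between.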
10. Reading and writing files
There are many ways to read and write files; only one is shown here. First create a data frame d:
> d = data.frame(a = c(3,4,5), b = c(12,43,54))
> d
  a  b
1 3 12
2 4 43
3 5 54
> write.table(d, file="tst0.txt", row.names=FALSE)
The data frame d is written to the file tst0.txt. The argument row.names=FALSE means that the row names are not written to the file, because they are not important here: they are just the numbers 1 to 3.
> d2 = read.table(file="tst0.txt", header=TRUE)
> d2
  a  b
1 3 12
2 4 43
3 5 54
The read.table function reads the data from the file into d2.
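The same pair of functions can handle comma-separated files through the sep argument. A minimal round-trip sketch; the use of tempfile() and the names csv_file and d3 are assumptions for this example, not part of the original text.

```r
d = data.frame(a = c(3, 4, 5), b = c(12, 43, 54))

# tempfile() gives a throwaway path; replace it with your own file name.
csv_file = tempfile(fileext = ".csv")

write.table(d, file = csv_file, sep = ",", row.names = FALSE)
d3 = read.table(file = csv_file, header = TRUE, sep = ",")

d3$b * 2        # 24 86 108
all(d3 == d)    # TRUE: the values survive the round trip
```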
Exercise: Create a file tst1.txt containing some data, including a column named g. Read the file, multiply the column named g by 5, and write the result to the file tst2.txt.
11. Not available data
Exercise: Compute the mean of the square roots of a vector of 100 random numbers. What happens?
When a value is not available, it is written as NA:
> j = c(1,2,NA)
Ordinary computations on j do not work. For example:
> max(j)
[1] NA
The maximum cannot be computed.
If you need a result anyway, use the argument na.rm=TRUE, which tells the function to ignore the NA values:
> max(j, na.rm=TRUE)
[1] 2
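Other functions treat NA the same way as max. The sketch below reuses j and shows how to detect, count and drop missing values with is.na.

```r
j = c(1, 2, NA)

is.na(j)                # FALSE FALSE TRUE
sum(is.na(j))           # number of missing values: 1
mean(j)                 # NA
mean(j, na.rm = TRUE)   # 1.5
j[!is.na(j)]            # keep only the available values: 1 2
```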
12. Classes
So far we have only worked with numbers, but sometimes you need to handle something that is not a number, such as a name or the name of a data file. Three classes are covered here: numeric, character and POSIX (date-time).
(1) Characters
To define a character string, enclose it in double quotation marks:
> m = "apples"
> m
[1] "apples"
> n = pears
Error: object 'pears' not found
Strings also cannot be used in arithmetic:
> m + 2
Error in m + 2 : non-numeric argument to binary operator
(2) Dates
Dates and times are more complicated. The simplest way to tell R that something is a date-time is the strptime function:
> date1 = strptime(c("20100225230000", "20100226000000", "20100226010000"), format="%Y%m%d%H%M%S")
> date1
[1] "2010-02-25 23:00:00"
[2] "2010-02-26 00:00:00"
[3] "2010-02-26 01:00:00"
First the c() function creates a vector. Remember the double quotes, because strptime needs character strings as input. The format argument describes how the date-time is written in the input: here year, month, day, hour, minute and second, in that order.
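Once the strings have been converted, R can compute with them as points in time. The sketch below repeats the conversion and shows a few such operations; tz = "UTC" is an assumption added here so that the result does not depend on the time zone of your computer.

```r
# tz = "UTC" is an addition to make the example independent of the
# local time zone; the original call does not set it.
date1 = strptime(c("20100225230000", "20100226000000", "20100226010000"),
                 format = "%Y%m%d%H%M%S", tz = "UTC")

class(date1)                                    # "POSIXlt" "POSIXt"
format(date1, "%d-%m-%Y %H:%M")                 # e.g. "25-02-2010 23:00"
difftime(date1[3], date1[1], units = "hours")   # Time difference of 2 hours
date1[2] > date1[1]                             # TRUE
```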
Exercise: Make a graph with on the x-axis today, Saint Nicholas Day 2014 and your birthday, and on the y-axis the number of presents you expect to receive on each of these days.
13. Programming Tools
If you need to write a larger program, you may need some programming statements:
(1) If-statement
> w = 3
> if (w < 5)
{
d = 2
} else {
d = 10
}
> d
[1] 2
Anyone who has programmed before will recognize this, so it is not explained in detail.
Logical conditions can also be used to select elements:
> a = c(1,2,3,4)
> b = c(5,6,7,8)
> f = a[b==5 | b==8]
> f
[1] 1 4
Note the double equals sign. Other comparison operators are <, >, !=, <= and >=. To combine conditions, use & for "and" or | for "or".
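To make the difference between the two logical operators concrete, the sketch below reuses a and b and applies a few conditions; the particular conditions are chosen only for illustration.

```r
a = c(1, 2, 3, 4)
b = c(5, 6, 7, 8)

b == 5 | b == 8      # "or": TRUE FALSE FALSE TRUE
a[b > 5 & b < 8]     # "and", both conditions must hold: 2 3
a[b != 6]            # 1 3 4
which(b >= 7)        # positions where the condition is TRUE: 3 4
```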
(2) For loop
Define how many times to repeat and what to do each time.
> h = seq(from=1, to=8)
> s = c()
> for (i in 2:10)
{
s[i] = h[i] * 10
}
> s
[1] NA 20 30 40 50 60 70 80 NA NA
First a vector h is defined, then an empty vector s. The for loop multiplies elements 2 to 10 of h by 10 and stores the results in s. Because h has only 8 elements, positions 9 and 10 of s are NA; position 1 is NA because the loop starts at 2.
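A loop that runs over exactly the elements of h avoids the NA values. The following sketch is one common way to write it, using seq_along and a pre-allocated result vector; it is a variation on the example, not the original authors' code.

```r
h = seq(from = 1, to = 8)
s = numeric(length(h))      # pre-allocate a vector of zeros, as long as h

for (i in seq_along(h))     # i runs over 1, 2, ..., length(h)
{
  s[i] = h[i] * 10
}
s                           # 10 20 30 40 50 60 70 80

# The same result without a loop, using vectorized arithmetic:
h * 10
```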
Exercise: Create a vector from 1 to 100 and run through the whole vector with a for loop. Multiply the elements smaller than 5 and the elements larger than 90 by 10, and the other elements by 0.1.
(3) Write your own function
> func1 = function (arg1, arg2)
{
w = arg1^2
return(arg2 + w)
}
> func1 (arg1 = 3, arg2 = 5)
[1] 14
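Earlier, rnorm was shown to fall back on default values for arguments that are left out. Your own functions can do the same. In the sketch below, func2 is a hypothetical variant of func1 that gives arg2 a default value.

```r
# func2 is a hypothetical variant of func1 with a default value for arg2.
func2 = function(arg1, arg2 = 5)
{
  w = arg1^2
  return(arg2 + w)
}

func2(3)                    # arg2 takes its default: 14
func2(3, arg2 = 1)          # 10
func2(arg2 = 1, arg1 = 3)   # named arguments may come in any order: 10
```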
Exercise: Write the previous exercise as a function and use a for loop in the function. You can use the length function to define the range of loops.
14. Some useful references
(1) Functions
Some of the functions mentioned in the R Reference Card:
a) Data creation
read.table: read a table from a file. Arguments: header=TRUE: read the first line as column names; sep=",": values are separated by commas; skip=n: do not read the first n lines.
write.table: write a table to a file.
c: combine numbers into a vector.
array: create an array. Argument: dim: length of each dimension
matrix: create a matrix. Arguments: ncol and/or nrow: number of columns/rows
data.frame: create a data frame
list: create a list
rbind and cbind: combine vectors into a matrix by row or by column
b) Extracting data
x[n]: the nth element of a vector
x[m:n]: the mth to the nth element
x[c(k,m,n)]: the elements at specific positions
x[x>m & x<n]: the elements with values between m and n
x$n: the element named n of a list or data frame
x[["n"]]: same as above
[i,j]: the element in row i, column j
[i,]: row i of a matrix
c) Information on variables
length: length of a vector
ncol or nrow: number of columns or rows in a matrix
class: class of a variable
names: names of the objects in a list
print: show a variable or character string on the screen
return: used in functions to return a variable
is.na: test whether a variable is NA
as.numeric or as.character: change the class to number or character string
strptime: change the class from character to date-time (POSIX)
d) Statistics
sum: sum of the elements of a vector or matrix
mean: mean of a vector
sd: standard deviation of a vector
max or min: largest or smallest element
rowSums (or rowMeans, colSums and colMeans): sums (or means) of each row (or column) of a matrix. The result is a vector.
quantile(x, c(0.1,0.5)): sample quantiles 0.1 and 0.5 of vector x
e) Data processing
seq: create a vector with equal steps between the numbers (e.g. from 1 to 100)
rnorm: create a vector of random numbers from a normal distribution
sort: sort elements in increasing order
t: transpose a matrix
aggregate(x, by=list(y), FUN="mean"): split x into subsets defined by y, compute the mean of each subset, and return the result as a new list.
na.approx: interpolate (in the zoo package). Argument: vector with NAs. Result: vector without NAs.
cumsum: cumulative sum. The result is a vector.
rollmean: moving average (in the zoo package)
paste: paste character strings together
substr: extract part of a character string
f) Fitting
lm(v1~v2): linear fit (regression line) with vector v1 on the y-axis and v2 on the x-axis
nls(v1~a+b*v2, start=list(a=1,b=0)): nonlinear fit. Should contain an equation with variables (here v1 and v2) and parameters (here a and b) with starting values
coef: returns the coefficients from a fit
summary: returns all results from a fit
g) Plotting
plot(x): plot x (y-axis) versus index number (x-axis) in a new window
plot(x,y): plot y (y-axis) versus x (x-axis) in a new window
image(x,y,z): plot z (color scale) versus x (x-axis) and y (y-axis) in a new window
lines or points: add lines or points to a previous plot
hist: plot a histogram of the numbers in a vector
barplot: bar plot of a vector or data frame
contour(x,y,z): contour plot
abline: draw a line (segment). Arguments: a,b for intercept a and slope b; or h=y for a horizontal line at y; or v=x for a vertical line at x.
curve: add a function to a plot. Needs to have an x in the expression. Example: curve(x^2)
legend: add a legend with given symbols (lty or pch and col) and text (legend) at location (x="topright")
axis: add an axis. Arguments: side: 1=bottom, 2=left, 3=top, 4=right
mtext: add text on an axis. Arguments: text (character string) and side
grid: add a grid
par: plotting parameters to be specified before the plots. Arguments: e.g. mfrow=c(1,3): number of figures per page (1 row, 3 columns); new=TRUE: draw plot over previous plot.
h) Plotting parameters
These can be added as arguments to plot, lines, image, etc. For help see par.
type: "l"=lines, "p"=points, etc.
col: color, e.g. "blue", "red", etc.
lty: line type: 1=solid, 2=dashed, etc.
pch: point type: 1=circle, 2=triangle, etc.
main: title (character string)
xlab and ylab: axis labels (character string)
xlim and ylim: range of axes, e.g. c(1,10)
log: logarithmic axis: "x", "y" or "xy"
i) Programming
function(arglist) {expr}: function definition: do expr with the list of arguments arglist
if (cond) {expr1} else {expr2}: if-statement: if cond is true, then expr1, else expr2
for (var in vec) {expr}: for-loop: the counter var runs through the vector vec and does expr each run
while (cond) {expr}: while-loop: while cond is true, do expr each run
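The while-loop is the only statement in this list without an example earlier in the text. A minimal sketch, with variable names chosen for this illustration:

```r
# Double a number until it exceeds 100, counting the steps.
value = 1
steps = 0
while (value <= 100)
{
  value = value * 2
  steps = steps + 1
}
value   # 128
steps   # 7
```

Unlike a for-loop, a while-loop does not know in advance how often it will run, so make sure the condition eventually becomes false.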
(2) Keyboard shortcuts
Click Help -> Keyboard Shortcuts to view them all.
Ctrl+Enter: send a command from the script window to the command window
Up arrow or down arrow in the command window: previous or next command
Ctrl+1, Ctrl+2, etc.: switch between the windows
Not R-specific, but very useful keyboard shortcuts:
Ctrl+C, Ctrl+X and Ctrl+V: copy, cut and paste
Alt+Tab: switch to another program window
Up, down, left or right arrow: move the cursor
Home or End: move the cursor to the beginning or end of the line
Page Up or Page Down: move the cursor one page up or down
Shift+Up/Down/Left/Right/Home/End/PgUp/PgDn: select
(3) Error messages
No such file or directory, or cannot change working directory
Make sure the working directory and file names are correct.
object 'x' not found
The variable x has not been defined yet. Define x, or add double quotation marks if x is meant to be a character string.
argument "x" is missing, with no default
You did not specify the argument x, which is mandatory.
+
R is still busy, or you forgot a closing bracket. Wait, type } or ), or press ESC.
unexpected ')' in ")" or unexpected '}' in "}"
You typed one closing bracket too many.
unexpected 'else' in "else"
Put the else of an if-statement on the same line as the closing bracket of the "then" part: } else {.
missing value where TRUE/FALSE needed
Something is wrong in the condition part, e.g. if (x==1): is x NA?
the condition has length > 1 and only the first element will be used
In the condition part, e.g. if (x==1), x is a vector where a single value is needed. Try x[i].
non-numeric argument to binary operator
You tried to do arithmetic with something that is not a number. Use class() to check the class of the data, or as.numeric() to convert it to numbers.
argument is of length zero or replacement is of length zero
The variable in question is empty (NULL), for example created by c(). Check how the variable is defined.