Objects
R has five basic or "atomic" classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)
The most basic object is a vector
A vector can only contain objects of the same class
BUT: The one exception is a list, which is represented as a vector but can contain objects of
different classes (indeed, that's usually why we use them)
Empty vectors can be created with the vector() function.
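A minimal sketch of vector(), assuming you want to see how "empty" vectors of different modes are filled. The variable names are illustrative only.

```r
# vector() takes a mode and a length and fills the result
# with that mode's default value.
v_num <- vector("numeric", length = 3)    # 0 0 0
v_chr <- vector("character", length = 2)  # "" ""
v_lgl <- vector("logical", length = 2)    # FALSE FALSE
v_lst <- vector("list", length = 2)       # a list holding two NULL elements

class(v_num)  # "numeric"
class(v_lst)  # "list"
```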
Numbers
Numbers in R are generally treated as numeric objects (i.e. double precision real numbers)
If you explicitly want an integer, you need to specify the L suffix
Ex: Entering 1 gives you a numeric object; entering 1L explicitly gives you an integer.
There is also a special number Inf which represents infinity; e.g. 1 / 0; Inf can be used in
ordinary calculations; e.g. 1 / Inf is 0
The value NaN represents an undefined value ("not a number"); e.g. 0 / 0; NaN can also be
thought of as a missing value (more on that later)
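The points above can be checked at the console. This is a short sketch; the names a and b are illustrative only.

```r
# Numeric literals are double precision unless the L suffix is used.
a <- 1
b <- 1L
class(a)       # "numeric"
class(b)       # "integer"

# Inf behaves like a number in ordinary arithmetic.
1 / 0          # Inf
1 / Inf        # 0

# 0 / 0 is undefined, so the result is NaN.
0 / 0          # NaN
is.nan(0 / 0)  # TRUE
```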
Attributes
R objects can have attributes
names, dimnames
dimensions (e.g. matrices, arrays)
class
length
other user-defined attributes/metadata
Attributes of an object can be accessed using the attributes() function.
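As a rough illustration of attributes(), the sketch below starts from a plain vector, which has no attributes, and then adds names and one user-defined attribute. The attribute name "source" and its value are hypothetical, chosen only for the example.

```r
x <- 1:3
attributes(x)             # NULL: a plain vector carries no attributes

names(x) <- c("a", "b", "c")
attr(x, "source") <- "example data"   # hypothetical user-defined metadata

# attributes() now returns a list that includes both entries.
attrs <- attributes(x)
attrs$names               # "a" "b" "c"
attrs$source              # "example data"
```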
Creating Vectors
The c() function can be used to create vectors of objects.
Using the vector() function
> x <- vector("numeric", length = 10)
> x
[1] 0 0 0 0 0 0 0 0 0 0
Mixing Objects
> y <- c(1.7, "a")   ## character
> y <- c(TRUE, 2)    ## numeric
> y <- c("a", TRUE)  ## character
When different objects are mixed in a vector, coercion occurs so that every element in the vector is
of the same class.
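To see what implicit coercion actually produces, the sketch below keeps each mixed vector in its own variable (y1, y2, y3 are illustrative names) and inspects the result.

```r
y1 <- c(1.7, "a")    # the number becomes a string
y2 <- c(TRUE, 2)     # TRUE becomes 1
y3 <- c("a", TRUE)   # TRUE becomes the string "TRUE"

y1         # "1.7" "a"
y2         # 1 2
y3         # "a" "TRUE"

class(y1)  # "character"
class(y2)  # "numeric"
class(y3)  # "character"
```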
Explicit Coercion
Objects can be explicitly coerced from one class to another using the as.* functions, if available.
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
Nonsensical coercion results in NAs.
> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA
> as.complex(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
Lists
Lists are a special type of vector that can contain elements of different classes. Lists are a very
important data type in R and you should get to know them well.
> x <- list(1, "a", TRUE, 1 + 4i)
> x
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
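One way to confirm that each element keeps its own class is to ask for the class of every element. This is a small sketch using sapply(); the name element_classes is illustrative.

```r
x <- list(1, "a", TRUE, 1 + 4i)

# Apply class() to each element of the list in turn.
element_classes <- sapply(x, class)
element_classes  # "numeric" "character" "logical" "complex"

length(x)        # 4
```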
Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)
> m <- matrix(nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
Matrices (cont'd)
Matrices are constructed column-wise, so entries can be thought of as starting in the "upper left" corner and running down the columns.
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Matrices can also be created directly from vectors by adding a dimension attribute.
> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
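The two construction routes give the same object. The sketch below builds the 2 x 5 matrix both ways and compares them; v and m are illustrative names.

```r
# Route 1: add a dimension attribute to an existing vector.
v <- 1:10
dim(v) <- c(2, 5)

# Route 2: call matrix() directly.
m <- matrix(1:10, nrow = 2, ncol = 5)

identical(v, m)  # TRUE

# Column-wise filling: row 1, column 2 holds the third element.
m[1, 2]          # 3
dim(m)           # 2 5
```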
cbind-ing and rbind-ing
Matrices can be created by column-binding or row-binding with cbind() and rbind().
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
Factors
Factors are used to represent categorical data. Factors can be unordered or ordered. One can think
of a factor as an integer vector where each integer has a label.
Factors are treated specially by modelling functions like lm() and glm()
Using factors with labels is better than using integers because factors are self-describing; having
a variable that has values "Male" and "Female" is better than a variable that has values 1 and 2.
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no  yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no"  "yes"
The order of the levels can be set using the levels argument to factor(). This can be important
in linear modelling because the first level is used as the baseline level.
> x <- factor(c("yes", "yes", "no", "yes", "no"),
              levels = c("yes", "no"))
> x
[1] yes yes no  yes no
Levels: yes no
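The effect of the levels argument shows up in the underlying integer codes. This sketch compares the default (alphabetical) level order with an explicit one; the names answers, f_default and f_custom are illustrative.

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default <- factor(answers)                           # levels sorted alphabetically
f_custom  <- factor(answers, levels = c("yes", "no"))  # "yes" made the first level

levels(f_default)      # "no" "yes"
levels(f_custom)       # "yes" "no"

# The integer codes follow the level order.
as.integer(f_default)  # 2 2 1 2 1
as.integer(f_custom)   # 1 1 2 1 2
```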
Missing Values
Missing values are denoted by NA, or by NaN for undefined mathematical operations.
is.na() is used to test objects if they are NA
is.nan() is used to test for NaN
NA values have a class also, so there are integer NA, character NA, etc.
A NaN value is also NA but the converse is not true
> x <- c(1, 2, NA, 10, 3)
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
> x <- c(1, 2, NaN, NA, 4)
> is.na(x)
[1] FALSE FALSE  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE FALSE  TRUE FALSE FALSE
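The claim that NA values have a class can be checked with R's typed NA constants. A brief sketch:

```r
# A bare NA is logical; the typed constants carry other classes.
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_real_)       # "numeric"
class(NA_character_)  # "character"

# NaN counts as NA, but NA does not count as NaN.
is.na(NaN)            # TRUE
is.nan(NA)            # FALSE
```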
Data Frames
Data frames are used to store tabular data
They are represented as a special type of list where every element of the list has to have the
same length
Each element of the list can be thought of as a column and the length of each element of the list
is the number of rows
Unlike matrices, data frames can store different classes of objects in each column (just like lists);
matrices must have every element be the same class
Data frames also have a special attribute called row.names
Data frames are usually created by calling read.table() or read.csv()
Can be converted to a matrix by calling data.matrix()
> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
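The list-like structure, the row.names attribute and the data.matrix() conversion can be sketched as follows. The names df and dm are illustrative.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(df)        # TRUE: a data frame is a special kind of list
sapply(df, class)  # foo is "integer", bar is "logical"
row.names(df)      # "1" "2" "3" "4"

# data.matrix() forces every column to a number,
# so TRUE/FALSE in bar become 1/0.
dm <- data.matrix(df)
is.matrix(dm)      # TRUE
dm[, "bar"]
```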
Names
R objects can also have names, which are very useful for writing readable code and self-describing
objects.
> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo", "bar", "norf")
> x
 foo  bar norf
   1    2    3
> names(x)
[1] "foo"  "bar"  "norf"
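Names are not limited to plain vectors. As a hedged sketch, the example below names the elements of a list and uses dimnames (mentioned under Attributes) on a matrix; all object and label names here are made up for illustration.

```r
# List elements can be named when the list is created.
lst <- list(a = 1, b = "two")
names(lst)      # "a" "b"
lst$b           # "two"

# Matrices use dimnames: a list of row names, then column names.
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("r1", "r2"), c("c1", "c2"))
m["r1", "c2"]   # 3
```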
Summary
Data Types
atomic classes: numeric, logical, character, integer, complex
Vectors, lists
Factors
Missing values
Data frames
Names
R Programming Week 1: Data Types