5. Data Structures
5.1 Introduction to data structures
(1) Vector
All elements of a vector must have the same type (mode).
(2) List
A list can be heterogeneous: its elements may have different types.
List elements can be indexed by position: lst[[2]]
Extract a sub-list: lst[c(2, 5)]
List elements can have names: lst[["Moe"]] or lst$Moe
Named lists are similar to dictionaries, hash tables, and so on.
(3) Mode: physical type
> mode(3.1415)  # result is "numeric"
Each object in R has a mode that indicates how the object is stored in memory:
Object | Example | Mode
Number | 3.14 | numeric
Vector of numbers | c(2.7, 3.14) | numeric
Character string | "Moe" | character
Vector of character strings | c("Moe", "Larry") | character
Factor | factor(c("NY", "CA", "IL")) | numeric
List | list("Moe", "Larry") | list
Data frame | data.frame(x = 1:3, y = c("NY", "CA", "IL")) | list
Function | print | function
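The rows of the table can be checked directly at the console. This is a minimal sketch using only base R; the variable names are illustrative.

```r
# mode() reports how an object is stored in memory
m_num  <- mode(3.14)                          # "numeric"
m_chr  <- mode(c("Moe", "Larry"))             # "character"
m_fac  <- mode(factor(c("NY", "CA", "IL")))   # "numeric": factors are stored as integer codes
m_list <- mode(list("Moe", "Larry"))          # "list"
m_df   <- mode(data.frame(x = 1:3))           # "list": a data frame is a list of columns
m_fun  <- mode(print)                         # "function"
```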
(4) Class: abstract type
> d <- as.Date("2010-03-10")
> class(d)  # result is "Date"
Each object in R also has a class that defines its abstract type.
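The difference between mode and class shows up clearly with dates. A short sketch in base R:

```r
d <- as.Date("2010-03-10")
mode(d)          # "numeric": a Date is stored as a number of days since 1970-01-01
class(d)         # "Date": the class decides how the value is printed and handled
days <- as.numeric(d)
days             # 14678
```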
(5) Scalar (single value)
R has no separate scalar type; a scalar is simply a vector with exactly one element.
(6) Matrix
A matrix in R is just a vector that has dimensions.
A vector's dim attribute is initially NULL:
> A <- 1:6
> dim(A) <- c(2, 3)  # A becomes a 2 x 3 matrix
(7) Array
A matrix is just a two-dimensional vector; an array generalizes this to any number of dimensions.
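The following sketch shows the dim attribute turning a vector into a matrix, and array() building a three-dimensional array. The name arr3 is illustrative.

```r
A <- 1:6
before <- dim(A)        # NULL: a plain vector has no dimensions
dim(A) <- c(2, 3)       # values fill the matrix column by column
A
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

arr3 <- array(1:24, dim = c(2, 3, 4))   # a 2 x 3 x 4 array
dim(arr3)                               # 2 3 4
```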
(8) Factor
R keeps track of the unique values in a vector; each unique value is called a level of the associated factor (see 5.5).
Factors have two key uses: categorical variables and grouping.
(9) Data frame
Designed to represent datasets, similar to a dataset in SAS or SPSS.
5.2 Adding data to vectors
> v <- c(1, 2, 3)
> v <- c(v, 4)  # append 4 to the original vector: 1, 2, 3, 4
> w <- c(5, 6, 7, 8)
> v <- c(v, w)  # combine v and w
5.3 Inserting data into a vector
> append(vec, newvalues, after = n)  # insert newvalues after the nth element of vec
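A small sketch of append() with concrete values; the numbers are illustrative.

```r
vec <- 1:10
mid_insert  <- append(vec, 99, after = 5)   # 1 2 3 4 5 99 6 7 8 9 10
head_insert <- append(vec, 99, after = 0)   # after = 0 inserts at the head of the vector
```

Note that append() returns a new vector; vec itself is unchanged unless the result is assigned back to it.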
5.4 Understanding the Recycling Rule
When an operation involves two vectors of unequal length and the shorter vector runs out of elements while the longer one still has unprocessed elements, R returns to the start of the shorter vector and reuses its elements in order.
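A brief sketch of recycling in arithmetic, with illustrative values:

```r
a <- 1:6
b <- c(10, 20)
s <- a + b    # b is reused as 10 20 10 20 10 20
s             # 11 22 13 24 15 26
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.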
5.5 Building a Factor
A factor represents a categorical variable; each possible value of the categorical variable is called a level.
> f <- factor(v)
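As an illustration, a factor built from a character vector (the state codes are sample data):

```r
v <- c("NY", "CA", "NY", "IL", "CA")
f <- factor(v)
lv    <- levels(f)       # "CA" "IL" "NY": the sorted unique values
codes <- as.integer(f)   # 3 1 3 2 1: each element is stored as the index of its level
table(f)                 # counts per level, a typical grouping use
```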
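5.6 Creating a list
> lst <- list(0.5, 0.8, 0.3)
> lst <- list(mid = 0.5, right = 0.8, left = 0.3)
> lst[[2]]  # access by position
> lst[["mid"]]  # access by name; lst$mid is equivalent
> lst["mid"]  # returns a sub-list containing only mid, not the element itself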
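The difference between double and single brackets is easy to overlook, so here is a short sketch:

```r
lst <- list(mid = 0.5, right = 0.8, left = 0.3)
elem <- lst[["mid"]]   # 0.5: the element itself, a numeric vector
same <- lst$mid        # 0.5: shorthand for the line above
sub  <- lst["mid"]     # a list of length 1 that still carries the name "mid"
```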
5.7 Removing elements from a list
> lst[["mid"]] <- NULL  # remove the element named mid
5.8 Converting a list to a vector
> v <- unlist(lst)
5.9 Removing NULL elements from a list
> lst[sapply(lst, is.null)] <- NULL
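A minimal sketch of this idiom on sample data:

```r
lst <- list("Moe", NULL, "Curly")
is_null <- sapply(lst, is.null)   # FALSE TRUE FALSE
lst[is_null] <- NULL              # assigning NULL deletes the selected elements
length(lst)                       # 2
```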
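5.10 Using conditions to remove list elements
> lst[lst < 0] <- NULL  # remove elements less than 0
> lst[is.na(lst)] <- NULL  # remove elements whose value is NA
> lst[abs(unlist(lst)) < 1]  # select elements whose absolute value is less than 1; abs() does not accept a list, hence unlist()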
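The sketch below applies these conditions to a small named list. It assumes every element is a single number or NA, and it uses sapply() to make the element-wise test explicit, which is a variation on the shorter form shown above.

```r
lst <- list(a = -2, b = 0.5, c = NA, d = 3)
lst[is.na(lst)] <- NULL                        # drop c first so later tests see no NA
lst[sapply(lst, function(x) x < 0)] <- NULL    # drop a
kept  <- names(lst)                            # "b" "d"
small <- lst[abs(unlist(lst)) < 1]             # select only b
```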
5.11 Matrix Initialization
> mat <- matrix(vec, 2, 3)  # build a 2 x 3 matrix from the data in vec
> dim(vec) <- c(2, 3)  # method 2: give vec itself the dimensions
5.12 Matrix Operations
> t(A)  # transpose of matrix A
> solve(A)  # inverse of matrix A
> A %*% B  # matrix product of A and B
> diag(n)  # n x n identity matrix
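These operations can be tried on a small invertible matrix. The values below are illustrative.

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)
At    <- t(A)            # transpose (A is symmetric, so At equals A)
A_inv <- solve(A)        # inverse
prod  <- A %*% A_inv     # matrix product; equals the identity up to rounding
all.equal(prod, diag(2)) # TRUE
```

Note that * multiplies element by element; %*% is the matrix product.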
5.13 Assigning descriptive names to the rows and columns of a matrix
> rownames(mat) <- c("rowname_1", "rowname_2", ..., "rowname_n")
> colnames(mat) <- c("colname_1", "colname_2", ..., "colname_n")
5.14 Selecting a row or column from a matrix
> vec <- mat[1, ]  # result is a vector
> vec <- mat[, 2, drop = FALSE]  # result is a one-column matrix
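A short sketch of the effect of drop = FALSE, using a sample matrix:

```r
mat  <- matrix(1:6, 2, 3)
row1 <- mat[1, ]                 # 1 3 5: dimensions are dropped, leaving a vector
col2 <- mat[, 2, drop = FALSE]   # still a matrix, with 2 rows and 1 column
dim(row1)                        # NULL
dim(col2)                        # 2 1
```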
5.15 Initializing a data frame from column data
> dfrm <- data.frame(v1, v2, v3, f1, f2)  # initialize a data frame from vectors and factors
> lst <- list(v1, v2, v3)
> dfrm <- as.data.frame(lst)  # method 2
5.16 Initializing a data frame from row data
When each row mixes data of different modes, such as numbers and characters, the row cannot be stored in a vector. Typically each row is stored as a one-row data frame, the rows are collected in a list, and rbind together with do.call combines them into one large data frame.
> obs <- list(data.frame(vc1 = 1, f1 = 0), data.frame(vc1 = 2, f1 = 1))
> dfrm <- rbind(obs[[1]], obs[[2]])  # combine the first two rows into a data frame
> dfrm <- do.call(rbind, obs)  # combine all rows into one data frame
When obs is a list of lists rather than a list of data frames, first call Map to convert each row into a data frame, then use do.call:
> dfrm <- do.call(rbind, Map(as.data.frame, obs))
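A sketch of the list-of-lists case with two sample rows; the column names vc1 and f1 follow the example above and the values are illustrative.

```r
# Each row is a list that mixes a number and a string
obs <- list(list(vc1 = 1, f1 = "a"),
            list(vc1 = 2, f1 = "b"))
rows <- Map(as.data.frame, obs)   # one single-row data frame per observation
dfrm <- do.call(rbind, rows)      # same as rbind(rows[[1]], rows[[2]])
nrow(dfrm)                        # 2
```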
5.17 Adding rows to a data frame
Each new row must itself be a one-row data frame.
> suburbs <- rbind(suburbs,
+ data.frame(city = "Nanjing", county = "Kane", pop = 5421),
+ data.frame(city = "Beijing", county = "Jane", pop = 5552))  # add two rows at once
5.18 Preallocating a data frame
When the data volume is large, R's memory manager performs poorly if the data frame is built by repeatedly adding rows. If you know in advance how many rows you will need, you can allocate the space beforehand.
> N <- 100000
> dfrm <- data.frame(colname1 = numeric(N), colname2 = character(N), ...)
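A minimal sketch of preallocating and then filling rows in place. The column names id and label are hypothetical, and stringsAsFactors = FALSE is passed so that the text column stays character on older versions of R as well.

```r
N <- 1000
dfrm <- data.frame(id = numeric(N), label = character(N),
                   stringsAsFactors = FALSE)
# Fill the first few rows in place instead of growing the data frame with rbind
for (i in 1:3) {
  dfrm$id[i]    <- i
  dfrm$label[i] <- paste0("row", i)
}
nrow(dfrm)   # 1000: the size is fixed up front
```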
5.19 Selecting columns of a data frame
> dfrm[[n]]  # returns column n as a vector
> dfrm[n]  # returns a data frame containing only column n
> dfrm[c(n1, n2, n4)]  # returns a data frame containing columns n1, n2, and n4
> dfrm[, n]  # returns column n as a vector
> dfrm[, c(n1, n3)]  # returns a data frame containing columns n1 and n3
> dfrm[["name"]]  # returns the column called name
> dfrm$name  # same as the line above
> subset(dfrm, select = c(colname1, colname2))  # select columns by name
> subset(dfrm, select = c(colname1, colname2), subset = (colname1 > 0))  # keep only the rows that meet the condition, and only those two columns
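The sketch below contrasts the selection forms on a small data frame; the column names a, b, and c are hypothetical.

```r
dfrm <- data.frame(a = 1:3, b = c(-1, 0, 2), c = c("x", "y", "z"))
as_vec <- dfrm[[2]]     # the b column as a plain numeric vector
as_df  <- dfrm[2]       # a data frame with the single column b
picked <- subset(dfrm, select = c(a, b), subset = (b > 0))
picked                  # one row: a = 3, b = 2
```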
5.20 Modifying the column names of a data frame
> colnames(dfrm) <- c("before", "treatment", "after")
5.21 Editing a data frame
> temp <- edit(dfrm)  # open an editor; the edited copy is stored in temp
> dfrm <- temp  # overwrite the original with the edited copy
> fix(dfrm)  # edit in place; the original data frame is overwritten directly
5.22 Removing rows containing NA from a data frame
> clean_dfrm <- na.omit(dfrm)
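A brief sketch with sample data. na.omit() drops a row if any of its columns is NA.

```r
dfrm <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
clean_dfrm <- na.omit(dfrm)   # rows 2 and 3 each contain an NA
nrow(clean_dfrm)              # 1
```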
5.23 Removing columns from a data frame
> subset(dfrm, select = -colname2)  # returns a copy without colname2
5.24 Merging two data frames
When the two data frames hold different columns, combine them side by side with cbind:
> all.cols <- cbind(dfrm1, dfrm2)  # horizontal merge: columns are placed next to each other
When the two data frames have the same columns, stack them with rbind:
> all.rows <- rbind(dfrm1, dfrm2)  # vertical merge: rows are appended
To combine data frames on a common column, similar to an SQL join, use merge:
> m <- merge(dfrm1, dfrm2, by = "name")
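A sketch of merge() on two small data frames. The contents are made up for illustration.

```r
dfrm1 <- data.frame(name = c("Moe", "Larry", "Curly"), age = c(40, 38, 35))
dfrm2 <- data.frame(name = c("Larry", "Moe", "Shemp"), city = c("NY", "CA", "IL"))

m <- merge(dfrm1, dfrm2, by = "name")                  # inner join: only Larry and Moe match
m_all <- merge(dfrm1, dfrm2, by = "name", all = TRUE)  # outer join: unmatched rows are kept, with NA
nrow(m)      # 2
nrow(m_all)  # 4
```

By default merge() keeps only the rows whose key appears in both data frames, so Curly and Shemp are dropped unless all = TRUE is given.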
5.25 More convenient access to data frame contents
Normally a column must be referenced as dfrm$colname1; the following commands let you omit the dfrm prefix:
> with(dfrm, expr)  # expr can refer to colname1 directly
> attach(dfrm)  # subsequent expressions can refer to colname1 directly
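A short sketch of with(), using a sample data frame:

```r
dfrm <- data.frame(colname1 = c(1, 2, 3), colname2 = c(10, 20, 30))
long  <- mean(dfrm$colname1 + dfrm$colname2)     # explicit dfrm$ prefix
short <- with(dfrm, mean(colname1 + colname2))   # same result, columns named directly
short                                            # 22
```

with() limits the effect to a single expression, whereas attach() changes the search path until detach() is called.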
5.26 Conversions between basic data types
> as.character(x)  # character type
> as.complex(x)  # complex type
> as.numeric(x)  # numeric type; as.double(x) is equivalent
> as.integer(x)  # integer type
> as.logical(x)  # logical type
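A few concrete conversions, as a sketch with illustrative values:

```r
n   <- as.numeric("3.14")    # 3.14
i   <- as.integer(3.99)      # 3: the fractional part is truncated, not rounded
s   <- as.character(101)     # "101"
l   <- as.logical("TRUE")    # TRUE
bad <- suppressWarnings(as.numeric("foo"))   # NA; without suppressWarnings() R also warns
```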
5.27 Conversions between structured data types
Not every conversion is feasible, so take care.
> as.data.frame(x)
> as.list(x)
> as.matrix(x)
> as.vector(x)
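The sketch below shows a few of these conversions, including one that silently changes the data. The sample values are illustrative.

```r
dfrm <- data.frame(x = 1:2, y = c("a", "b"))
m <- as.matrix(dfrm)     # a matrix holds a single type, so the numbers become strings
mode(m)                  # "character"

l <- as.list(1:3)                      # a list with three one-element components
v <- as.vector(matrix(1:4, 2, 2))      # 1 2 3 4: the matrix is flattened column by column
```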