Data processing in R

Source: Internet
Author: User

I. Vector processing

1. Selecting and displaying vectors
data[1]: the first item
data[3]: the third item
data[1:3]: items 1 to 3
data[-1]: all items except the first
data[c(1, 3, 4, 6)]: items 1, 3, 4 and 6
data[data > 3]: all items greater than 3
data[data < 5 | data > 7]: all items less than 5 or greater than 7
which(data == max(data)): the index number of the item with the highest value
data[seq(1, length(data), 2)]: items taken at a fixed interval; 1 is the index to start from, length(data) is the index to stop at (the last item of the vector), and 2 is the step
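The selections above can be tried on a small vector. The values in data below are an assumption, made up only for illustration; the results in the comments follow from those values.

```r
# Hypothetical sample vector (not from the original text)
data <- c(4, 8, 2, 9, 6, 3)

data[1]                        # first item: 4
data[1:3]                      # items 1 to 3: 4 8 2
data[-1]                       # everything except the first item: 8 2 9 6 3
data[c(1, 3, 4, 6)]            # items 1, 3, 4 and 6: 4 2 9 3
data[data > 3]                 # items greater than 3: 4 8 9 6
data[data < 5 | data > 7]      # items below 5 or above 7: 4 8 2 9 3
which(data == max(data))       # index of the largest value: 4
data[seq(1, length(data), 2)]  # every second item, starting at 1: 4 2 6
```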

2. Ordering of vectors

The sort() function sorts a vector, in ascending order by default. The na.last option controls NA values: the default, NA, removes them; TRUE keeps them and places them last; FALSE keeps them and places them first. Duplicate values are kept and appear side by side in the result.
sort(data, na.last = NA)
sort(data, na.last = TRUE)
sort(data, na.last = FALSE)
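A minimal sketch of the three na.last settings, assuming a made-up vector that contains one NA and one repeated value:

```r
# Hypothetical vector with a missing value and a duplicate
data <- c(5, NA, 2, 5, 1)

sort(data)                     # na.last = NA is the default, NA removed: 1 2 5 5
sort(data, na.last = TRUE)     # NA kept and placed last: 1 2 5 5 NA
sort(data, na.last = FALSE)    # NA kept and placed first: NA 1 2 5 5
sort(data, decreasing = TRUE)  # descending order instead: 5 5 2 1
```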


The order() function returns the index numbers that would put the vector in sorted order. The na.last option defaults to TRUE, which keeps NA values and places them last; NA removes them; FALSE keeps them and places them first.
order(data, na.last = NA)
order(data, na.last = TRUE)
order(data, na.last = FALSE)
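To show how the index numbers from order() relate to the sorted values, here is a short sketch on a made-up vector (the values are an assumption for illustration only):

```r
# Hypothetical vector with a missing value and a duplicate
data <- c(5, NA, 2, 5, 1)

order(data)                   # NA placed last by default: 5 3 1 4 2
order(data, na.last = NA)     # NA removed: 5 3 1 4
order(data, na.last = FALSE)  # NA placed first: 2 5 3 1 4

# Indexing with the result of order() gives the sorted vector
data[order(data)]             # 1 2 5 5 NA
```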


The rank() function is also related to sorting, but unlike sort() it returns the rank of each item rather than the sorted values. Tied (repeated) values receive the average of their ranks by default. The ties.method option sets how ties are handled, and na.last sets how NA values are handled; "keep" retains NA values, which are given the rank NA.
rank(data, ties.method = "average", na.last = "keep")
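The effect of ties.method is easiest to see on a vector with a repeated value. The vector below is made up for illustration; "min" and "first" are two of the other methods that rank() accepts.

```r
# Hypothetical vector with a missing value and a tie (the two 5s)
data <- c(5, NA, 2, 5, 1)

rank(data, ties.method = "average", na.last = "keep")  # 3.5 NA 2.0 3.5 1.0
rank(data, ties.method = "min", na.last = "keep")      # 3 NA 2 3 1
rank(data, ties.method = "first", na.last = "keep")    # 3 NA 2 4 1
```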

3. Getting logical values from a vector

Comparing a vector directly with a comparison operator such as == returns one logical value per item, for example
> data == 100
[1] FALSE FALSE FALSE FALSE


II. Matrix and data frame processing

1. Selecting and displaying a matrix or data frame

As with vectors, matrices and data frames are indexed with [], but you need to specify both rows and columns. The general format is
object[row, column]
For example
data[3, 3]: the value in the third row, third column
data[3, 1:4]: the third row, columns 1 to 4
data[1:2, 1:3]: rows 1 to 2, columns 1 to 3
data[, 1]: all data in the first column, returned as a vector
data[1, ]: all data in the first row
data[1]: for a data frame, the first column, returned as a one-column data frame (a list); for a matrix, 1 is treated as an index number and the value at that index is returned
data[c(1, 3, 5, 7), ]: rows 1, 3, 5 and 7
data[c(1, 3, 5, 7), -4]: rows 1, 3, 5 and 7 of every column except the fourth
data[c(1, 3, 5, 7), "column name"]: rows 1, 3, 5 and 7 of the named column
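A short sketch of these selections, assuming a made-up 8 x 4 data frame (the column names a to d and the values are hypothetical):

```r
# Hypothetical 8 x 4 data frame, built only for illustration
data <- data.frame(a = 1:8, b = 11:18, c = 21:28, d = 31:38)

data[3, 3]                # third row, third column: 23
data[3, 1:4]              # third row, columns 1 to 4 (a one-row data frame)
data[, 1]                 # first column as a vector: 1 2 3 4 5 6 7 8
data[1]                   # first column, still a data frame
data[c(1, 3, 5, 7), -4]   # rows 1, 3, 5, 7 without the fourth column
data[c(1, 3, 5, 7), "b"]  # rows 1, 3, 5, 7 of column b: 11 13 15 17

# The same single index behaves differently on a matrix
m <- as.matrix(data)
m[1]                      # treated as an index number: 1
```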

Index number: the items of a matrix are numbered starting from the first row of the first column and reading down each column in turn, so a single index applied to a matrix such as test.matrix counts column by column.
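The column-by-column numbering can be checked on a small matrix. The name test.matrix is taken from the text; its size and values here are assumptions for illustration.

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column by default
test.matrix <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)

test.matrix[3]     # index 3 is row 1, column 2: 30
test.matrix[1, 2]  # the same item by row and column: 30
test.matrix[6]     # index 6 is row 2, column 3: 60
```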

2. Sorting of matrices and data frames

As with vectors, sorting uses the sort(), order() and rank() functions. sort() can be applied to an entire matrix (the result comes back as a plain vector), but not to an entire data frame; for a data frame, select part of it, such as a single column.
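One common way to sort a data frame is to sort a selected column, or to reorder whole rows with order(). The sketch below assumes a made-up data frame with the hypothetical columns name and score.

```r
# Hypothetical data frame and matrix, for illustration only
data <- data.frame(name = c("a", "b", "c", "d"), score = c(7, 3, 9, 5))
m <- matrix(c(7, 3, 9, 5), nrow = 2)

sort(data$score)                     # sort one selected column: 3 5 7 9
sorted <- data[order(data$score), ]  # reorder whole rows by that column
sorted
sort(m)                              # a whole matrix, returned as a vector: 3 5 7 9
```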

III. List processing

Lists often hold data of different structures, so before processing list data you should inspect the structure of the list contents with the str() function.


1. Selecting and displaying list data

data[1]: select the first element of a list

2. Sorting the list

Only a single element of a list can be sorted; use the $ symbol to select it

sort(data$one): sort the element named one in the list data
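A minimal sketch of inspecting, selecting from and sorting a list. The element name one comes from the text; the other names and all values are assumptions.

```r
# Hypothetical list mixing a numeric vector, a character vector and a data frame
data <- list(one = c(3, 1, 2), two = c("x", "y"), three = data.frame(v = 1:2))

str(data)       # overview of the structure of every element
data[1]         # first element, returned as a list of length one
data[[1]]       # the first element itself: 3 1 2
sort(data$one)  # sort only the element named one: 1 2 3
```

Note that data[1] keeps the list wrapper, while data[[1]] and data$one return the element itself.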


IV. Basic operations on data objects

1. View and set row and column names
names(): gets the names of a data object: the element names of a list or the column names of a data frame. It does not work for matrices

row.names() and colnames(): get the row names or column names of a data object; they work for data frames and matrices, but not for lists

dimnames(): gets both the row and column names of a data object, row names first and then column names; it works for data frames and matrices, but not for lists, because a list has no rows and columns.

The same functions can also be used to set the names
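Getting and setting names might look like the sketch below. The objects df and m and all the names assigned are hypothetical; rownames() is used here as the matrix-oriented counterpart of colnames().

```r
# Hypothetical data frame and matrix
df <- data.frame(x = 1:2, y = 3:4)
m <- matrix(1:4, nrow = 2)

names(df)      # column names of a data frame: "x" "y"
names(m)       # NULL: a plain matrix has no names
row.names(df)  # row names of a data frame: "1" "2"
dimnames(df)   # a list: row names first, then column names

# Putting the same functions on the left-hand side sets the names
colnames(m) <- c("c1", "c2")
rownames(m) <- c("r1", "r2")
names(df) <- c("first", "second")
dimnames(m)    # row names r1, r2 and column names c1, c2
```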


2. Row and column transpose

Use the t() function to transpose rows and columns. Whatever the data structure was before the transpose, the result is a matrix.
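A quick check that transposing a data frame yields a matrix, using a made-up data frame:

```r
# Hypothetical 3 x 2 data frame
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

t_df <- t(df)
is.data.frame(df)  # TRUE before transposing
is.matrix(t_df)    # TRUE: the transposed result is a matrix
dim(t_df)          # 2 3: rows and columns have swapped
```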

V. Constructing Data Objects

1. Constructing a list

The only way to combine objects of different data structures is a list. Create lists with the list() function.

2. Constructing a data frame
Data frames are built with the data.frame() function. A data frame is a collection of data columns, which can be numeric or text. Text columns are treated as factors (this was the default before R 4.0.0; later versions need stringsAsFactors = TRUE). Columns should have the same length; if the lengths differ, pad the shorter column with NA.
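A minimal sketch of building a data frame from columns of unequal length. The column names and values are assumptions; stringsAsFactors = TRUE is passed explicitly so that the text column becomes a factor regardless of the R version.

```r
# Hypothetical columns; plant is one value short
height <- c(9, 11, 6)
plant <- c("vulgaris", "sativa")

# Extending the length pads the shorter column with NA
length(plant) <- length(height)

df <- data.frame(height = height, plant = plant, stringsAsFactors = TRUE)
str(df)
is.factor(df$plant)  # TRUE: the text column was converted to a factor
```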


3. Constructing a matrix

cbind(): combines vectors as the columns of a matrix
rbind(): combines vectors as the rows of a matrix
If the vectors contain both numbers and characters, the numbers are converted to characters


In addition, there is the general matrix() function, for example matrix(data, nrow = 2). You must specify a suitable number of rows or columns, that is, one that divides the number of data values evenly; otherwise R recycles the values and issues a warning
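The three ways of building a matrix can be compared side by side. The vectors a and b are hypothetical.

```r
# Hypothetical input vectors
a <- c(1, 2, 3)
b <- c(4, 5, 6)

cbind(a, b)                 # 3 x 2 matrix: the vectors become columns
rbind(a, b)                 # 2 x 3 matrix: the vectors become rows
cbind(a, c("x", "y", "z"))  # mixed input: the numbers become characters

matrix(1:6, nrow = 2)       # 6 values divide evenly into 2 rows: a 2 x 3 matrix
```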

VI. Converting data objects

as.data.frame(): convert to a data frame
as.character(): convert to character (use as.factor() to convert to a factor)
as.matrix(): convert to a matrix
as.list(): convert to a list
as.table(): convert to a table

The type of a data object can be tested with the is.* functions, such as is.data.frame(), which return a logical value. The class() function returns the type of the data object directly.

Converting lists is troublesome; it is best to convert to a data frame first and then to other structures. A data frame cannot be converted directly to a table: convert it to a matrix first, and then convert the matrix to a table.
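The data frame to matrix to table route can be sketched as follows, assuming a small made-up data frame:

```r
# Hypothetical numeric data frame
df <- data.frame(lo = c(9, 11), hi = c(28, 31))

is.data.frame(df)    # TRUE
m <- as.matrix(df)   # data frame -> matrix
tb <- as.table(m)    # matrix -> table
class(tb)            # "table"
as.list(df)          # data frame -> list with one element per column
as.character(df$lo)  # numbers -> character: "9" "11"
```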

The stack() function is particularly useful for converting a data frame, because it produces a data frame with one column of values and one column of factors. unstack() is its reverse operation.

In addition, stack() and unstack() accept options that give further results

For example, there is a data frame that reads:
   height    plant water
1       9 vulgaris    lo
2      11 vulgaris    lo
3       6 vulgaris    lo
4      14 vulgaris   mid
5      17 vulgaris   mid
6      19 vulgaris   mid
7      28 vulgaris    hi
8      31 vulgaris    hi
9      32 vulgaris    hi
10      7   sativa    lo
11      6   sativa    lo
12      5   sativa    lo
13     14   sativa   mid
14     17   sativa   mid
15     15   sativa   mid
16     44   sativa    hi
17     38   sativa    hi
18     37   sativa    hi

There are three columns of data: one numeric column and two factor columns. We can do the following:

> unstack(data, form = height ~ plant)
  sativa vulgaris
1      7        9
2      6       11
3      5        6
4     14       14
5     17       17
6     15       19
7     44       28
8     38       31
9     37       32

> unstack(data, form = height ~ water)
  hi lo mid
1 28  9  14
2 31 11  17
3 32  6  19
4 44  7  14
5 38  6  17
6 37  5  15

> cc <- unstack(data, form = height ~ water)
> stack(cc, select = c(hi, lo))
   values ind
1      28  hi
2      31  hi
3      32  hi
4      44  hi
5      38  hi
6      37  hi
7       9  lo
8      11  lo
9       6  lo
10      7  lo
11      6  lo
12      5  lo
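To run the example end to end, the data frame can be rebuilt from the values in the outputs above. This is a sketch: the use of rep() to generate the plant and water columns, and the variable names by_plant and long, are assumptions made for brevity.

```r
# Example data frame rebuilt from the values shown in the outputs above
data <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 31, 32,
             7, 6, 5, 14, 17, 15, 44, 38, 37),
  plant = rep(c("vulgaris", "sativa"), each = 9),
  water = rep(rep(c("lo", "mid", "hi"), each = 3), times = 2)
)

by_plant <- unstack(data, form = height ~ plant)  # one column per plant
cc <- unstack(data, form = height ~ water)        # one column per water level
long <- stack(cc, select = c(hi, lo))             # back to a values column and an ind column

by_plant
cc
long
```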
