R Language: Common Data Frame Operations

Source: Internet
Author: User

A data frame in R can be thought of as a table made up of rows and columns. Unlike a matrix, whose elements must all have the same data type, each column of a data frame can have a different data type.
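To illustrate the difference, here is a minimal sketch using part of the sample student data from this article: a data frame keeps a separate type for each column, whereas binding the same vectors into a matrix coerces everything to a single type.

```r
# A data frame stores each column with its own type
df <- data.frame(ID = c(1, 2, 3),
                 Name = c("Devin", "Edward", "Wenli"),
                 stringsAsFactors = FALSE)
print(sapply(df, class))   # ID is "numeric", Name is "character"

# A matrix holds a single type, so cbind() coerces the numbers to strings
m <- cbind(ID = c(1, 2, 3), Name = c("Devin", "Edward", "Wenli"))
print(typeof(m))           # "character"
print(m[, "ID"])           # "1" "2" "3"
```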

Each column of a data frame has a column name, and each row can also be given a row name. If no row names are specified, the rows are identified by a sequence of integers starting at 1.

Initialization

Use the data.frame function to initialize a data frame. For example, to create a student data frame with ID, name, gender and birth date, write:

    student <- data.frame(ID = c(1, 2, 3),
                          Name = c("Devin", "Edward", "Wenli"),
                          Gender = c("M", "M", "F"),
                          Birthdate = c("1984-12-29", "1983-5-6", "1986-8-8"))

Alternatively, read.table() or read.csv() reads a text file and returns a data frame object. Reading from a database also returns a data frame object.

Printing student shows:

      ID   Name Gender  Birthdate
    1  1  Devin      M 1984-12-29
    2  2 Edward      M   1983-5-6
    3  3  Wenli      F   1986-8-8

Here only the column names ID, Name, Gender and Birthdate were specified. The names function returns the column names; to see the row names, use the row.names function. To use ID as the row name, write:

    row.names(student) <- student$ID

A simpler way is to pass the vector to the row.names argument of data.frame when initializing.

Accessing elements

As with a matrix, specific elements can be accessed with [row index, column index]. For example, to get the first row:

    student[1, ]

and the second column:

    student[, 2]

Columns can also be selected by their index or name. For example, to get ID and Name:

    idname <- student[1:2]

or

    idname <- student[c("ID", "Name")]

To get a single column back as a vector, use [[ ]] or $. For example, to get all student names, any of the following works:

    name <- student[[2]]
    name <- student[["Name"]]
    name <- student$Name

The attach and detach functions make the columns accessible without prefixing the data frame name every time. For example, to print all names:

    attach(student)
    print(Name)
    detach(student)

A more concise alternative is the with function:

    with(student, {
      n <- Name
      print(n)
    })

The variable n exists only inside the braces. To assign a global variable from within with, use the <<- operator.

Modifying column data types

Next, look at the type of each column with str(student), which gives:

    'data.frame':  3 obs. of  4 variables:
     $ ID       : num  1 2 3
     $ Name     : Factor w/ 3 levels "Devin","Edward",..: 1 2 3
     $ Gender   : Factor w/ 2 levels "F","M": 2 2 1
     $ Birthdate: Factor w/ 3 levels "1983-5-6","1984-12-29",..: 2 1 3

By default, character vectors are converted to factors, so ID is numeric and the other three columns are factors. (This was the default before R 4.0.0; since R 4.0.0, stringsAsFactors defaults to FALSE and these columns stay character.) Name should clearly be a character column and Birthdate a date, so we change the types of these columns:

    student$Name <- as.character(student$Name)
    student$Birthdate <- as.Date(student$Birthdate)

Running str(student) again shows the modified result:

    'data.frame':  3 obs. of  4 variables:
     $ ID       : num  1 2 3
     $ Name     : chr  "Devin" "Edward" "Wenli"
     $ Gender   : Factor w/ 2 levels "F","M": 2 2 1
     $ Birthdate: Date, format: "1984-12-29" "1983-05-06" "1986-08-08"

Adding a new column

For the existing student object, we want to add an Age column calculated from Birthdate. Sys.Date() returns the current date, and format() extracts its year as a string. Converting both the current year and the birth year to integers and subtracting them gives an approximate age:

    student$Age <- as.integer(format(Sys.Date(), "%Y")) - as.integer(format(student$Birthdate, "%Y"))

This is rather long to write. The within function, like the with function mentioned earlier, lets us omit the data frame name; the difference is that within can modify the data and returns the modified data frame. Adding the Age column becomes:

    student <- within(student, {
      Age <- as.integer(format(Sys.Date(), "%Y")) - as.integer(format(Birthdate, "%Y"))
    })

Queries and subsets

Querying a data frame to return the subset that satisfies a condition is a very common operation, equivalent to querying a table in a database. Selecting rows and columns by index, as shown earlier, is the simplest method. With a logical vector and the which function, we can filter rows. For example, to find all rows whose Gender is "F", first evaluate student$Gender == "F", which gives the logical vector FALSE FALSE TRUE; which then returns the indices of the TRUE elements. The full query is:

    student[which(student$Gender == "F"), ]

Note that the column index after the comma is left empty, so all columns are returned. If we only want the ages of all female students, write:

    student[which(student$Gender == "F"), "Age"]

This style of query is still somewhat verbose. The subset function makes it simpler. For example, to get the name and age of female students younger than 30:

    subset(student, Gender == "F" & Age < 30, select = c("Name", "Age"))

Querying a data frame with SQL

For someone like me who has used SQL for many years, being able to query a data frame directly with SQL statements would be very convenient, and there is indeed a package for it: sqldf. The equivalent of the previous query is:

    library(sqldf)
    result <- sqldf("select Name, Age from student where Gender = 'F' and Age < 30")

Joins and unions

In a database, joining multiple tables in one query is routine, and multiple data frames can be joined in R as well, using the merge function. For example, in addition to the student object, we declare a score variable that records each student's courses and scores:

    score <- data.frame(SID = c(1, 1, 2, 2, 3),
                        Course = c("Math", "English", "Math", "Chinese", "Math"),
                        Score = c(90, 80, 80, 95, 96))

Its contents are:

      SID  Course Score
    1   1    Math    90
    2   1 English    80
    3   2    Math    80
    4   2 Chinese    95
    5   3    Math    96

Here SID is the ID from student, the equivalent of a foreign key. To perform an inner join on this ID, the R statement is:

    result <- merge(student, score, by.x = "ID", by.y = "SID")

The merged result (the Age column, whose values depend on the current date, is omitted here):

      ID   Name Gender  Birthdate  Course Score
    1  1  Devin      M 1984-12-29    Math    90
    2  1  Devin      M 1984-12-29 English    80
    3  2 Edward      M 1983-05-06    Math    80
    4  2 Edward      M 1983-05-06 Chinese    95
    5  3  Wenli      F 1986-08-08    Math    96

As expected, the two tables are joined. Besides joins, another common database operation is union. How do we union two data frames in R? R does have a union function, but it performs set union on vectors rather than SQL's UNION of rows. To combine data frames row-wise we use the rbind function, and the data frames passed to rbind must have the same columns. For example, we declare student2, compute its Age column the same way, and combine the two with rbind:

    student2 <- data.frame(ID = c(4, 5),
                           Name = c("Yan", "Peng"),
                           Gender = c("F", "M"),
                           Birthdate = as.Date(c("1982-2-9", "1983-1-16")))
    student2 <- within(student2, {
      Age <- as.integer(format(Sys.Date(), "%Y")) - as.integer(format(Birthdate, "%Y"))
    })
    rbind(student, student2)
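In SQL, UNION removes duplicate rows while UNION ALL keeps them, so rbind on its own behaves like UNION ALL. A minimal sketch, using two small hypothetical data frames a and b, of how unique() can then drop duplicate rows to mimic UNION:

```r
# Two hypothetical tables that share one identical row (ID 2, "Edward")
a <- data.frame(ID = c(1, 2), Name = c("Devin", "Edward"), stringsAsFactors = FALSE)
b <- data.frame(ID = c(2, 4), Name = c("Edward", "Yan"), stringsAsFactors = FALSE)

# rbind() stacks all rows, duplicates included (like SQL UNION ALL)
union_all <- rbind(a, b)
print(nrow(union_all))             # 4

# unique() keeps the first occurrence of each distinct row (like SQL UNION)
union_distinct <- unique(union_all)
rownames(union_distinct) <- NULL   # renumber rows 1..n after dropping duplicates
print(union_distinct)
```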
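The Age column computed earlier subtracts calendar years only, so it overstates by one the age of anyone whose birthday has not yet come in the current year. A minimal sketch of a birthday-aware alternative follows; age_on is a hypothetical helper, not a base R function, and it compares the month and day fields of as.POSIXlt():

```r
# Hypothetical helper: completed years of age on a reference date
age_on <- function(birthdate, ref = Sys.Date()) {
  b <- as.POSIXlt(as.Date(birthdate))
  r <- as.POSIXlt(as.Date(ref))
  years <- r$year - b$year
  # Subtract one year if the birthday has not yet occurred in the reference year
  # (POSIXlt months are 0-based, but only their relative order matters here)
  not_yet <- r$mon < b$mon | (r$mon == b$mon & r$mday < b$mday)
  as.integer(years - not_yet)
}

birth <- as.Date(c("1984-12-29", "1983-5-6", "1986-8-8"))
print(age_on(birth, ref = as.Date("2015-06-01")))   # 30 32 28
```

Inside within(), Age <- age_on(Birthdate) could replace the year subtraction.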
