R has several commonly used functions for applying an operation across the elements or groups of a data structure: apply, lapply, sapply, tapply, mapply, and so on. These functions resemble one another, and the sections below describe how each is used.
apply
apply runs a function over one dimension of a matrix or array. The call format is:
apply(data, dimension index, function, function arguments)
A matrix has two dimensions. The second argument is the dimension index: 1 means operate on each row and 2 means operate on each column. Here is an example:
M <- matrix(1:6, 2, 3)
This builds a simple matrix with 2 rows and 3 columns:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
To calculate the sum of each row, we can write:
apply(M, 1, sum)
[1]  9 12
To calculate the mean of each column, change it to:
apply(M, 2, mean)
[1] 1.5 3.5 5.5
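The two margins can be compared side by side in a minimal sketch. The variable names and the anonymous function are illustrative additions, and rowSums and colMeans are base R shortcuts that the text above does not cover:

```r
# Same 2 x 3 matrix as in the example above
M <- matrix(1:6, 2, 3)

# Margin 1: the function receives each row as a vector
row_sums <- apply(M, 1, sum)

# Margin 2: the function receives each column as a vector
col_means <- apply(M, 2, mean)

# An anonymous function works too: spread (max - min) of each row
row_spread <- apply(M, 1, function(x) max(x) - min(x))

# Base R shortcuts for the two most common cases give the same values
same_sums  <- all(row_sums == rowSums(M))
same_means <- all(col_means == colMeans(M))
```

For plain sums and means, rowSums and colMeans are usually the more direct choice; apply is the general tool when the per-row or per-column function is something else.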
What if one of the values is NA and we want to ignore it when summing each row? First set one element to NA:
M[2, 2] <- NA
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2   NA    6
apply(M, 1, sum)
[1]  9 NA
The sum function itself has an na.rm argument, and we can hand it to apply as the fourth argument, which apply passes on to sum:
apply(M, 1, sum, na.rm = TRUE)
[1] 9 8
Note that if the input is a data frame, apply first converts it to a matrix. If the columns are not all numeric, or their types are inconsistent, the conversion yields a non-numeric (typically character) matrix, and apply then cannot produce a meaningful numeric result for any column.
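The conversion issue can be seen in a short sketch. The data frame df below is a hypothetical example, and selecting the numeric columns by name is just one possible workaround:

```r
# A hypothetical data frame with one character column and two numeric columns
df <- data.frame(name = c("Devin", "Edward", "Lulu"),
                 age = c(30, 33, 29),
                 score = c(95, 99, 90))

# apply() works on the matrix form of a data frame;
# with mixed column types every cell is coerced to character
converted <- as.matrix(df)
is_char <- is.character(converted)

# Restricting to the numeric columns first keeps the matrix numeric
col_means <- apply(df[, c("age", "score")], 2, mean)
```

Here is_char is TRUE, which is why a numeric function applied to the full data frame has nothing numeric to work with.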
lapply
As described above, apply is for matrices and arrays; for a list we can use the lapply function. It takes a list and returns a list. The call format is:
lapply(data, function, function arguments)
A data frame whose columns have different types cannot be converted to a numeric matrix, but a data frame is itself a list of columns, so lapply can be used on it.
We build a data frame holding each student's name, age, and score, and then compute the average age and average score. The name column is not numeric and cannot be averaged, so for non-numeric columns we simply count the number of entries. This requires a custom function.
The function can be anonymous or defined beforehand. The logic here is simple, so an anonymous function is enough.
s <- data.frame(name = c("Devin", "Edward", "Lulu"), age = c(30, 33, 29), score = c(95, 99, 90))
    name age score
1  Devin  30    95
2 Edward  33    99
3   Lulu  29    90
lapply(s, function(x) { if (is.numeric(x)) { mean(x) } else { length(x) } })
$name
[1] 3

$age
[1] 30.66667

$score
[1] 94.66667
The result is a list of 3 items, each holding the value the function returned for one column. The list returned by lapply has the same structure as the list passed in: as many items come out as went in.
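A small sketch of this property on an ordinary list follows. The list scores and its contents are made up for illustration, and passing na.rm through lapply mirrors how extra arguments are passed through apply:

```r
# A hypothetical list whose elements have different lengths
scores <- list(math = c(90, 85, NA), art = c(70, 80), music = c(60, NA))

# Extra arguments after the function are passed on to it
means <- lapply(scores, mean, na.rm = TRUE)

# The result is a list with the same length and names as the input
same_length <- length(means) == length(scores)
same_names  <- identical(names(means), names(scores))
```

Because the elements have different lengths, this data could not be stored as a matrix, which is the situation lapply is meant for.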
sapply
The sapply function is very similar to lapply and also works on a list, but it tries to simplify the return value into a more convenient data type based on the type and structure of the results. The call format is:
sapply(data, function, function arguments, simplify = TRUE, USE.NAMES = TRUE)
The simplify argument controls whether the result is reorganized into a simpler structure; with simplify = FALSE the result is a list, as with lapply. USE.NAMES controls naming: when the input is a character vector, the strings themselves are used as the names of the result.
Taking the same example as before, just replace lapply with sapply:
sapply(s, function(x) { if (is.numeric(x)) { mean(x) } else { length(x) } })
    name      age    score 
 3.00000 30.66667 94.66667 
The result is now a named numeric vector rather than a list.
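The effect of the simplify and USE.NAMES arguments can be sketched on a character vector. The vector words is a hypothetical input, and nchar is the base R function that counts characters:

```r
# A hypothetical character vector
words <- c("apply", "lapply", "sapply")

# Default: the result is simplified to a vector named after the input strings
named <- sapply(words, nchar)

# USE.NAMES = FALSE leaves the result unnamed
unnamed <- sapply(words, nchar, USE.NAMES = FALSE)

# simplify = FALSE returns a list, as lapply() would
as_list <- sapply(words, nchar, simplify = FALSE)
```

The automatic simplification is convenient interactively, but the shape of the result depends on what the function returns, so lapply is the more predictable choice when a list is what you need.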
mapply
mapply is the multivariate version of sapply: it processes several data arguments in parallel. The argument order changes, with the function placed first:
mapply(function, first data, second data, ..., MoreArgs = additional function arguments, SIMPLIFY = TRUE, USE.NAMES = TRUE)
For example, we define a function M3 that accepts 3 numeric arguments and returns their product:
M3 <- function(a, b, c) { a * b * c }
Then we build 3 vectors of the same length:
A <- 1:5
B <- 2:6
C <- 5:1
Now we want to run M3 on the corresponding elements of A, B, and C: first on the first element of each vector, then on the second element of each, then on the third, and so on. mapply makes this very convenient:
mapply(M3, A, B, C)
[1] 10 24 36 40 30
It is that simple: the operation is applied to each set of corresponding elements.
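To make the element-by-element behaviour concrete, here is a minimal sketch that compares mapply with an explicit loop. Holding the third argument fixed at 10 is an illustrative choice, and MoreArgs is the mapply argument for values that stay the same in every call:

```r
M3 <- function(a, b, c) { a * b * c }
A <- 1:5
B <- 2:6

# mapply walks the vectors in parallel; this loop does the same by hand
by_loop <- numeric(length(A))
for (i in seq_along(A)) {
  by_loop[i] <- M3(A[i], B[i], 10)
}

# MoreArgs supplies arguments that stay fixed for every call (here c = 10)
by_mapply <- mapply(M3, A, B, MoreArgs = list(c = 10))
```

Both by_loop and by_mapply hold the same five products; mapply simply removes the index bookkeeping.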
tapply
The apply functions described above process the data as a whole, whereas tapply processes a vector group by group. First look at the call format of tapply:
tapply(vector data, grouping factor, function, function arguments, simplify = TRUE)
We illustrate tapply with a student data frame, first constructing new student data containing name, age, score, class, and gender:
s <- data.frame(name = c("Devin", "Edward", "Lulu", "Jeneen"), age = c(30, 33, 29, 32), score = c(95, 99, 90, 88), class = c(1, 2, 1, 2), gender = c("M", "M", "F", "F"))
    name age score class gender
1  Devin  30    95     1      M
2 Edward  33    99     2      M
3   Lulu  29    90     1      F
4 Jeneen  32    88     2      F
To calculate the average score for each class with tapply:
tapply(s$score, s$class, mean)
   1    2 
92.5 93.5 
To average the score by gender instead:
tapply(s$score, s$gender, mean)
 F  M 
89 97 
What if we want to group by both class and gender? Then we pass the two grouping vectors together in a list as the second argument:
tapply(s$score, list(s$class, s$gender), mean)
   F  M
1 90 95
2 88 99
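As with apply, extra arguments after the function are forwarded to it, and any function of a vector can be used. The sketch below assumes a hypothetical variant of the student data in which one score is missing:

```r
# Hypothetical variant of the student data with one missing score
s <- data.frame(score = c(95, 99, 90, NA),
                class = c(1, 2, 1, 2),
                gender = c("M", "M", "F", "F"))

# na.rm = TRUE is forwarded to mean(), so the NA is skipped within its group
class_means <- tapply(s$score, s$class, mean, na.rm = TRUE)

# Any function of a vector works, for example counting rows per group
counts <- tapply(s$score, list(s$class, s$gender), length)
```

With two grouping vectors the result is a matrix with one row per class and one column per gender; here every cell of counts is 1 because each class and gender combination occurs once.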