Why use Apply
I am a programmer, so when I first picked up R I treated it as just another programming language, and everything about it felt awkward. I now lean towards thinking of R not as a general-purpose programming language but as a software tool for statistics, which means R code should not be designed with a general-purpose programming mindset. In Andrew Lim's answer comparing R and Python, he notes that R has an array-oriented syntax that is closer to mathematics and lets scientists translate formulas into R code, whereas Python is a general-purpose language that is more engineering-oriented. When using R, try to think in terms of arrays and avoid for loops. How is iteration done without loops? That is the job of the apply family of functions. It is not a single function but a family of similar functions.
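As a small illustration of the loop-free style described above, here is a sketch that squares each element of a vector three ways. The vector x and the squaring function are made up for this example and do not come from any particular data set.

```r
# Hypothetical input, chosen only for illustration
x <- c(1, 2, 3, 4)

# Loop version: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# apply-family version: sapply calls the function on every element
squares_sapply <- sapply(x, function(v) v^2)

# Array-oriented version: arithmetic on a vector is already element-wise
squares_vec <- x^2
```

All three produce the same numbers; the last two simply leave the iteration to R.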
Overview
The basic job of the apply family is to iterate over the elements, or subsets of elements, of an array (possibly multidimensional) or a list, and to call a specified function with the current element or subset as its argument. A vector can be thought of as a one-dimensional array, and a data frame can be regarded as a special kind of list.
The relationships between these functions
Target | Apply to each element | Apply to each subset
Array | apply | tapply
List | lapply (...) | by
Here lapply (...) stands for a family of functions:
lapply
  |-> simplified version: sapply
  |     |-> with a return-value template: vapply
  |-> multi-argument version: mapply
  |-> recursive version: rapply
Vectors are a special case: a vector is a one-dimensional array, but it does not use all the same functions as an array. To iterate element by element, use the same functions as for a list (lapply and its relatives). To iterate by subset, both tapply and by can be used, but the form of the return value differs.
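To make the difference in return forms concrete, here is a minimal sketch on a plain vector. The values in v and the labels in g are hypothetical and exist only for this illustration.

```r
v <- c(10, 20, 30, 40)          # hypothetical data
g <- c("x", "x", "y", "y")      # hypothetical grouping labels

# Element by element: lapply returns a list, sapply a vector
by_element_list <- lapply(v, function(e) e / 10)
by_element_vec  <- sapply(v, function(e) e / 10)

# Subset by subset: tapply returns a named array, by returns a "by" object
by_group_tapply <- tapply(v, g, sum)
by_group_by     <- by(v, g, sum)

class(by_element_list)  # "list"
class(by_group_tapply)  # "array"
class(by_group_by)      # "by"
```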
Functions and syntax
apply
apply(array, margin, FUN, ...)
Calls FUN in turn along the margin direction of array. When margin is a single dimension and FUN returns a single value, the return value is a vector. margin is the position of a subscript in an array reference (that is, in array[index1, index2, ...]): 1 means rows, 2 means columns, and c(1, 2) means both rows and columns. With margin=1, apply(a, 1, sum) is equivalent to the following:
a <- array(c(1:24), dim=c(2,3,4))
result=c()
for (i in c(1:dim(a)[1])) {
  result <- c(result, sum(a[i,,]))
}
In testing, apply needs an object with dimensions (in practice a matrix or a higher-dimensional array) and cannot be used on a plain vector. To apply a function over a vector, use lapply or sapply. Take a matrix as an example:
> m <- matrix(c(1:10), nrow=2)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> apply(m, 1, sum)
[1] 25 30
> apply(m, 2, sum)
[1]  3  7 11 15 19
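The matrix example only covers margins 1 and 2. As a sketch of what a combined margin does, the following reuses the 2 x 3 x 4 array from the loop example above and compares apply with margin c(1, 2) against an explicit double loop. The variable names are my own.

```r
# The 2 x 3 x 4 array used earlier in this section
a <- array(1:24, dim = c(2, 3, 4))

# margin = 1: one sum per index of the first dimension
row_sums <- apply(a, 1, sum)

# margin = c(1, 2): one sum per (i, j) pair, taken over the third dimension
pair_sums <- apply(a, c(1, 2), sum)

# The same result written as an explicit double loop
loop_sums <- matrix(0, nrow = dim(a)[1], ncol = dim(a)[2])
for (i in seq_len(dim(a)[1])) {
  for (j in seq_len(dim(a)[2])) {
    loop_sums[i, j] <- sum(a[i, j, ])
  }
}
```

With a combined margin the result is a matrix with one cell per (i, j) pair rather than a vector.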
tapply
tapply(array, indices, FUN=NULL, ...)
Groups the elements of array by the values in indices: elements whose subscripts correspond to the same value in indices form one set, and FUN is applied to each set. The operation is similar to "group by indices". If FUN returns a single value, tapply returns a vector; if FUN returns more than one value, tapply returns a list. The length of the vector or list equals the number of distinct values in indices.
When FUN is NULL, tapply returns a vector with as many elements as array, describing the result of the grouping: positions that hold the same value belong to the same group. For example, a return value of c(1, 2, 1, 3, 2) means that elements 1 and 3 of indices form one group, elements 2 and 5 form another, and element 4 is a group of its own.
Example of a one-dimensional array (i.e. vector)
> v <- c(1:5)
> ind <- c('a','a','a','b','b')
> tapply(v, ind)
[1] 1 1 1 2 2
> tapply(v, ind, sum)
a b 
6 9 
> tapply(v, ind, fivenum)
$a
[1] 1.0 1.5 2.0 2.5 3.0

$b
[1] 4.0 4.0 4.5 5.0 5.0
Example of a two-dimensional array (i.e. matrix)
> m <- matrix(c(1:10), nrow=2)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> ind <- matrix(c(rep(1,5), rep(2,5)), nrow=2)
> ind
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    2    2
[2,]    1    1    2    2    2
> tapply(m, ind)
 [1] 1 1 1 1 1 2 2 2 2 2
> tapply(m, ind, mean)
1 2 
3 8 
> tapply(m, ind, fivenum)
$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10
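One way to picture the "group by" behaviour is to build the groups by hand with split and then apply the function to each group. This is only a conceptual sketch that reuses v and ind from the vector example, not a description of how tapply must be implemented.

```r
v   <- c(1:5)
ind <- c('a', 'a', 'a', 'b', 'b')

# split() builds one vector per distinct value of ind
groups <- split(v, ind)

# Applying FUN to each group gives the same numbers as tapply(v, ind, sum)
sums_split  <- sapply(groups, sum)
sums_tapply <- tapply(v, ind, sum)
```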
by
by(dataframe, INDICES, FUN, ..., simplify=TRUE)
by can be regarded as tapply for data frames. INDICES should have the same length as each column of the data frame. The return value is an object of class by. If simplify=FALSE, it is essentially a list.
> df <- data.frame(a=c(1:5), b=c(6:10))
> ind <- c(1,1,1,2,2)
> res <- by(df, ind, colMeans)
> res
ind: 1
a b 
2 7 
------------------------------------------------------------ 
ind: 2
  a   b 
4.5 9.5 
> class(res)
[1] "by"
> names(res)
[1] "1" "2"
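The printed by object is easy to read but less obvious to work with in code. A minimal sketch, reusing df and ind from the example above, of what FUN receives and how a single group can be pulled out by name:

```r
df  <- data.frame(a = c(1:5), b = c(6:10))
ind <- c(1, 1, 1, 2, 2)

# FUN receives a data frame holding the rows of one group
group_sizes <- by(df, ind, function(d) nrow(d))

# A single group's result can be extracted by its name
first_group <- by(df, ind, colMeans)[["1"]]
```

Here group_sizes holds the row counts 3 and 2, and first_group is the column-mean vector for the group labelled "1".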
lapply
lapply(list, FUN, ...)
Calls FUN on each element of list. It can also be used on a data frame, because a data frame is a special form of list. Example:
> lst <- list(a=c(1:5), b=c(6:10))
> lapply(lst, mean)
$a
[1] 3

$b
[1] 8

> lapply(lst, fivenum)
$a
[1] 1 2 3 4 5

$b
[1]  6  7  8  9 10
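Since a data frame is a list of columns, the same call works on it directly. A short sketch, assuming a data frame with the same two columns as lst:

```r
df <- data.frame(a = c(1:5), b = c(6:10))

# Each column is one list element, so FUN is called once per column
col_means <- lapply(df, mean)

# The result is a plain list named after the columns
names(col_means)
```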
sapply
sapply(list, FUN, ..., simplify=TRUE, USE.NAMES=TRUE)
Compared with lapply, sapply has one extra parameter, simplify. If simplify=FALSE, it is equivalent to lapply. Otherwise, the list that lapply would return is simplified to a vector or a matrix. The example builds on the previous one:
> lst <- list(a=c(1:5), b=c(6:10))
> sapply(lst, mean)
a b 
3 8 
> sapply(lst, fivenum)
     a  b
[1,] 1  6
[2,] 2  7
[3,] 3  8
[4,] 4  9
[5,] 5 10
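A quick way to check the relationship with lapply, and to see what happens when the results cannot be simplified, is the sketch below. The filter x > 7 is an arbitrary choice made only to produce results of different lengths.

```r
lst <- list(a = c(1:5), b = c(6:10))

# With simplify = FALSE the result matches lapply exactly
as_list     <- sapply(lst, mean, simplify = FALSE)
from_lapply <- lapply(lst, mean)
same <- identical(as_list, from_lapply)

# When results have different lengths, sapply cannot simplify and returns a list
ragged <- sapply(lst, function(x) x[x > 7])
```

This is worth keeping in mind: the type of sapply's return value depends on the data, which is the problem vapply addresses.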
vapply
vapply(list, FUN, FUN.VALUE, ..., USE.NAMES=TRUE)
vapply is similar to sapply, but takes a third parameter, FUN.VALUE, that specifies the form of the return value. It can be regarded as a template for the return value. Example:
> lst <- list(a=c(1:5), b=c(6:10))
> res <- vapply(lst, function(x) c(min(x), max(x)), c(min.=0, max.=0))
> res
     a  b
min. 1  6
max. 5 10
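The template is also a check: if FUN returns something that does not fit it, vapply stops with an error instead of silently changing the shape of the result. A minimal sketch, where the string "error" is just a marker I return from the handler:

```r
lst <- list(a = c(1:5), b = c(6:10))

# Template says "one numeric value"; mean returns exactly that
means <- vapply(lst, mean, numeric(1))

# range returns two values, so the same template is rejected with an error
mismatch <- tryCatch(
  vapply(lst, range, numeric(1)),
  error = function(e) "error"
)
```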
mapply
mapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE)
mapply is a multivariate version of sapply. The ... part can take several data arguments; mapply applies FUN to the first elements of all of them, then to the second elements, and so on. The data arguments must have the same length, or lengths that are integer multiples of one another. The return value is a vector or a matrix, depending on whether FUN returns one value or several.
> mapply(sum, list(a=1,b=2,c=3), list(a=10,b=20,d=30))
 a  b  c 
11 22 33 
> mapply(function(x,y) x^y, c(1:5), c(1:5))
[1]    1    4   27  256 3125
> mapply(function(x,y) c(x+y, x^y), c(1:5), c(1:5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    4    6    8   10
[2,]    1    4   27  256 3125
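Two details the examples above do not show are recycling of a shorter argument and the MoreArgs parameter. The sketch below uses made-up inputs, and the parameter name k is hypothetical.

```r
# The shorter argument is recycled: 1:4 is paired with 10, 20, 10, 20
recycled <- mapply(function(x, y) x + y, 1:4, c(10, 20))

# MoreArgs holds arguments that are passed unchanged to every call
scaled <- mapply(function(x, y, k) (x + y) * k,
                 1:3, 4:6,
                 MoreArgs = list(k = 2))
```

Arguments in ... are iterated over, while anything in MoreArgs is handed to FUN as a whole on each call.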
rapply
rapply(list, FUN, classes="ANY", deflt=NULL, how=c("unlist", "replace", "list"), ...)
rapply is a recursive version of lapply. The basic idea is to traverse the list and, if an element is itself a list, to continue the traversal into it. For each non-list element, FUN is called if the element's type is one of the types given by the classes parameter. classes="ANY" means that all types match.
The how parameter specifies the mode of operation. There are three modes:
- "replace": replaces the original list element directly with the result of calling FUN
- "list": creates a new list. For elements whose type is in classes, FUN is called; for elements whose type is not in classes, deflt is used. The structure of the original list is preserved.
- "unlist": equivalent to calling unlist(recursive=TRUE) on the result of "list" mode
> lst <- list(a=list(aa=c(1:5), ab=c(6:10)), b=list(ba=c(1:10)))
> lst
$a
$a$aa
[1] 1 2 3 4 5

$a$ab
[1]  6  7  8  9 10


$b
$b$ba
 [1]  1  2  3  4  5  6  7  8  9 10


> rapply(lst, sum, how='list')
$a
$a$aa
[1] 15

$a$ab
[1] 40


$b
$b$ba
[1] 55


> rapply(lst, sum, how='unlist')
a.aa a.ab b.ba 
  15   40   55 
The second example shows the use of the classes and deflt parameters:
> lst2
$a
$a$aa
[1] 1 2 3 4 5

$a$ab
[1]  6  7  8  9 10


$b
$b$ba
[1] "I am a string"


> rapply(lst2, sum, how='unlist')
Error in .Primitive("sum")("I am a string", ...) : 
  invalid 'type' (character) of argument
> rapply(lst2, sum, classes=c('integer'), deflt=-1, how='unlist')
a.aa a.ab b.ba 
  15   40   -1 
> rapply(lst2, nchar, classes=c('character'), deflt=as.integer(NA), how='unlist')
a.aa a.ab b.ba 
  NA   NA   13 
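The transcripts above cover the "list" and "unlist" modes but not "replace". Here is a minimal sketch of that mode. The definition of lst2 is my reconstruction from the printed structure above, so treat it as an assumption.

```r
# Reconstructed from the printed lst2 above (assumed definition)
lst2 <- list(a = list(aa = c(1:5), ab = c(6:10)),
             b = list(ba = "I am a string"))

# "replace" leaves elements that do not match classes untouched,
# so deflt plays no role in this mode
replaced <- rapply(lst2, sum, classes = "integer", how = "replace")

replaced$a$aa  # 15
replaced$b$ba  # "I am a string"
```

The nested structure is kept, the integer vectors are replaced by their sums, and the string passes through unchanged.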
eapply
Applies a function over an environment. I have never used environments, so I have not studied this one for now.
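For readers who want a starting point anyway, here is a minimal sketch. The environment env and the variables x and y are hypothetical and created only for this illustration.

```r
# A small environment with two hypothetical variables
env <- new.env()
assign("x", c(1:5), envir = env)
assign("y", c(6:10), envir = env)

# eapply calls FUN on every object in the environment and returns a named list
res <- eapply(env, mean)

# The order of the result is not guaranteed, so look values up by name
res[["x"]]
res[["y"]]
```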
Using tapply to build cross tables
This is best shown with an example. The raw data are sales figures recorded by year, region (loc), and product category (type). We will build two cross tables of total sales: one with year as rows and region as columns, and one with year as rows and category as columns.
> df <- data.frame(year=kronecker(2001:2003, rep(1,4)), loc=c('beijing','beijing','shanghai','shanghai'), type=rep(c('A','B'),6), sale=rep(1:12))
> df
   year      loc type sale
1  2001  beijing    A    1
2  2001  beijing    B    2
3  2001 shanghai    A    3
4  2001 shanghai    B    4
5  2002  beijing    A    5
6  2002  beijing    B    6
7  2002 shanghai    A    7
8  2002 shanghai    B    8
9  2003  beijing    A    9
10 2003  beijing    B   10
11 2003 shanghai    A   11
12 2003 shanghai    B   12
> tapply(df$sale, df[,c('year','loc')], sum)
      loc
year   beijing shanghai
  2001       3        7
  2002      11       15
  2003      19       23
> tapply(df$sale, df[,c('year','type')], sum)
      type
year    A  B
  2001  4  6
  2002 12 14
  2003 20 22
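As a cross-check on the tapply result, the same year by region table can be written with xtabs and a formula. This is offered as an alternative sketch, not as part of the original approach. The data frame is rebuilt here with rep so the block runs on its own.

```r
# Same data as above, rebuilt with rep() so the block is self-contained
df <- data.frame(year = rep(2001:2003, each = 4),
                 loc  = rep(c('beijing', 'beijing', 'shanghai', 'shanghai'), 3),
                 type = rep(c('A', 'B'), 6),
                 sale = 1:12)

# tapply with a list of two grouping vectors gives a year x loc matrix
by_loc <- tapply(df$sale, list(year = df$year, loc = df$loc), sum)

# xtabs expresses the same table with a formula
by_loc_xtabs <- xtabs(sale ~ year + loc, data = df)
```

Either form can be extended to the year by type table by swapping loc for type.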
Reference
Andrew Lim
https://screamyao.wordpress.com/2011/05/03/various-apply-functions-in-r-explained/
https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/
http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#apply