Notes on the R apply function family

Source: Internet
Author: User

Why use Apply

I am a programmer, so when I first picked up R I treated it as "just another programming language", yet everything about it felt awkward. I now lean toward thinking of R not as a general-purpose programming language but as a software tool for statistics, so R code should not be designed with a general-purpose programming mindset. In Andrew Lim's answer comparing R and Python, R is described as having an array-oriented syntax that reads more like mathematics and lets scientists translate formulas into R code, while Python is a general-purpose language that is more engineering-oriented. When using R, try to think in terms of arrays and avoid for loops. How can iteration be done without loops? That is what the apply family of functions is for. It is not a single function but a family of similar functions.

Overview

The basic role of the apply family is to iterate over the elements, or over subsets of the elements, of an array (possibly multidimensional) or a list, and to call a specified function with the current element or subset as its argument. A vector is a one-dimensional array, and a data frame can be regarded as a special kind of list.

The relationships between these functions

Action target    Apply on each element    Apply on a subset
Array            apply                    tapply
List             lapply(...)              by

Here lapply(...) stands for a whole family of functions:

lapply
  |
  |-> simplified version: sapply
  |             |-> with a return-value template: vapply
  |             |-> multivariate version: mapply
  |
  |-> recursive version: rapply

Vectors are the odd case: a vector is a one-dimensional array, but it does not share all of its functions with arrays. To iterate element by element, use the same functions as for a list, that is, lapply and its relatives. To iterate over subsets, both tapply and by can be used, but the form of the return value differs.
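As a rough illustration of the paragraph above, the following sketch iterates over a plain vector element by element and then group by group. The vector `v` and the grouping labels in `grp` are made up for this example.

```r
# A plain vector is iterated with the list tools rather than with apply.
v <- c(1:6)
grp <- c("x", "x", "y", "y", "z", "z")   # hypothetical grouping labels

# Element by element: sapply returns a plain vector.
squares <- sapply(v, function(x) x^2)

# Subset by subset: tapply returns a named array, by returns a "by" object.
sums_tapply <- tapply(v, grp, sum)
sums_by <- by(v, grp, sum)

print(squares)
print(sums_tapply)
print(class(sums_by))
```

Both calls compute the same group sums; only the container that holds them is different.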

Functions and syntax

apply
apply(array, margin, FUN, ...)

Calls FUN in turn on each slice of array along the margin direction. When FUN returns a single value and margin is a single dimension, the return value is a vector. margin is the position of a subscript in an array reference (that is, in array[index1, index2, ...]): for a matrix, 1 means rows, 2 means columns, and c(1, 2) means both rows and columns. With margin=1, apply(a, 1, sum) is equivalent to the following operation:

a <- array(c(1:24), dim=c(2,3,4))
result <- c()
for (i in c(1:dim(a)[1])) {
    result <- c(result, sum(a[i,,]))
}

In my tests, apply works on matrices and higher-dimensional arrays but fails on a plain vector, which has no dim attribute (to iterate over a vector, use lapply or sapply). Taking a matrix as an example:

> m <- matrix(c(1:10), nrow=2)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> apply(m, 1, sum)
[1] 25 30
> apply(m, 2, sum)
[1]  3  7 11 15 19
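The equivalence between the loop and apply(a, 1, sum) can be checked directly on the three-dimensional array from the text. The sketch below also tries margin = c(1, 2); the variable names are chosen only for this example.

```r
a <- array(c(1:24), dim = c(2, 3, 4))

# Explicit loop over the first dimension.
by_loop <- c()
for (i in seq_len(dim(a)[1])) {
    by_loop <- c(by_loop, sum(a[i, , ]))
}

# The same computation with apply and margin = 1.
by_apply <- apply(a, 1, sum)

# margin = c(1, 2) keeps the first two dimensions and sums over the third,
# so the result is a 2 x 3 matrix.
by_pair <- apply(a, c(1, 2), sum)

print(by_loop)
print(by_apply)
print(dim(by_pair))
```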
tapply
tapply(array, indices, FUN=NULL, ...)

The elements of array are grouped by the values in indices: elements whose positions correspond to the same value in indices form one group, and FUN is applied to each group. The operation is similar to a group by on indices. If FUN returns a single value, tapply returns a vector; if FUN returns more than one value, tapply returns a list. The length of the vector or list equals the number of distinct values in indices.

When FUN is NULL, tapply returns a vector with the same number of elements as array that describes the grouping: positions holding the same value belong to the same group. For example, a return value of c(1, 2, 1, 3, 2) means that the 1st and 3rd elements form one group, the 2nd and 5th elements form another, and the 4th element is a group of its own.

Example of a one-dimensional array (i.e. vector)

> v <- c(1:5)
> ind <- c('a','a','a','b','b')
> tapply(v, ind)
[1] 1 1 1 2 2
> tapply(v, ind, sum)
a b 
6 9 
> tapply(v, ind, fivenum)
$a
[1] 1.0 1.5 2.0 2.5 3.0

$b
[1] 4.0 4.0 4.5 5.0 5.0

Example of a two-dimensional array (i.e. matrix)

> m <- matrix(c(1:10), nrow=2)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> ind <- matrix(c(rep(1,5), rep(2,5)), nrow=2)
> ind
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    2    2
[2,]    1    1    2    2    2
> tapply(m, ind)
 [1] 1 1 1 1 1 2 2 2 2 2
> tapply(m, ind, mean)
1 2 
3 8 
> tapply(m, ind, fivenum)
$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10
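To make the FUN=NULL case more concrete, here is a small sketch with made-up data chosen so that the grouping vector comes out as 1, 2, 1, 3, 2, as in the description above. It also shows one possible use of that vector: looking up each element's group total.

```r
v <- c(10, 20, 30, 40, 50)            # hypothetical values
ind <- c("a", "b", "a", "c", "b")     # hypothetical group labels

# With no FUN, tapply returns the group number of every element.
groups <- tapply(v, ind)

# Elements that share a group number belong to the same group.
members <- split(v, groups)

# Group totals, then the total of the group each element belongs to.
totals <- tapply(v, ind, sum)
per_element <- totals[groups]

print(groups)
print(members)
print(per_element)
```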
by
by(dataframe, INDICES, FUN, ..., simplify=TRUE)

by can be regarded as tapply for data frames. INDICES should have the same length as each column of the data frame, that is, one entry per row. The return value is an object of class by. With simplify=FALSE it is essentially a list.

> df <- data.frame(a=c(1:5), b=c(6:10))
> ind <- c(1,1,1,2,2)
> res <- by(df, ind, colMeans)
> res
ind: 1
a b 
2 7 
------------------------------------------------------------ 
ind: 2
  a   b 
4.5 9.5 
> class(res)
[1] "by"
> names(res)
[1] "1" "2"
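Since a by object behaves much like a list, its groups can be pulled out by name or stacked into a matrix. The following is a minimal sketch on the same data; using do.call(rbind, ...) to stack the groups is one common idiom, not something the original example does.

```r
df <- data.frame(a = c(1:5), b = c(6:10))
ind <- c(1, 1, 1, 2, 2)
res <- by(df, ind, colMeans)

# A single group can be extracted by name, as with a list.
first <- res[["1"]]

# Stack the per-group vectors into a matrix with one row per group.
stacked <- do.call(rbind, res)

print(first)
print(stacked)
```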
lapply
lapply(list, FUN, ...)

Calls FUN on each element of list. It can also be used on a data frame, because a data frame is a special form of list. Examples:

> lst <- list(a=c(1:5), b=c(6:10))
> lapply(lst, mean)
$a
[1] 3

$b
[1] 8

> lapply(lst, fivenum)
$a
[1] 1 2 3 4 5

$b
[1]  6  7  8  9 10
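The remark that lapply also works on a data frame can be tried with a small sketch like the one below, where each column is treated as one list element. The data frame `df` is an illustrative stand-in.

```r
df <- data.frame(a = c(1:5), b = c(6:10))

# Each column is one element of the underlying list.
col_means <- lapply(df, mean)
col_class <- lapply(df, class)

print(col_means)
print(col_class)
```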
sapply
sapply(list, FUN, ..., simplify=TRUE, USE.NAMES=TRUE)

sapply has one more parameter than lapply: simplify. With simplify=FALSE it is equivalent to lapply. Otherwise it takes the list that lapply would return and simplifies it to a vector or matrix. Examples:

> lst <- list(a=c(1:5), b=c(6:10))
> sapply(lst, mean)
a b 
3 8 
> sapply(lst, fivenum)
     a  b
[1,] 1  6
[2,] 2  7
[3,] 3  8
[4,] 4  9
[5,] 5 10
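A short sketch of the simplify behaviour described above: with simplify=FALSE the result matches lapply, and when the per-element results have different lengths sapply cannot simplify them and hands back a list. The filtering function here is only an illustration.

```r
lst <- list(a = c(1:5), b = c(6:10))

# simplify = FALSE gives the same result as lapply.
same <- identical(sapply(lst, mean, simplify = FALSE), lapply(lst, mean))

# Results of unequal length cannot be simplified, so a list comes back.
uneven <- sapply(lst, function(x) x[x > 3])

print(same)
print(uneven)
```

Because the shape of the result depends on the data, sapply is convenient interactively but less predictable inside functions.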
vapply
vapply(list, FUN, FUN.VALUE, ..., USE.NAMES=TRUE)

vapply is similar to sapply, but takes a third parameter, FUN.VALUE, that specifies the form of the return value; it can be thought of as a template for what FUN returns. Example:

> lst <- list(a=c(1:5), b=c(6:10))
> res <- vapply(lst, function(x) c(min(x), max(x)), c(min.=0, max.=0))
> res
     a  b
min. 1  6
max. 5 10
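The template is also a check: if FUN returns something that does not fit FUN.VALUE, vapply stops with an error instead of quietly changing the result shape. A minimal sketch, where the string "template mismatch" is just a marker chosen for this example:

```r
lst <- list(a = c(1:5), b = c(6:10))

# One number per element, matching the numeric(1) template.
ok <- vapply(lst, function(x) mean(x), numeric(1))

# FUN returns two values but the template describes one, so vapply fails.
msg <- tryCatch(
    vapply(lst, function(x) c(min(x), max(x)), numeric(1)),
    error = function(e) "template mismatch"
)

print(ok)
print(msg)
```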
mapply
mapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE, USE.NAMES=TRUE)

mapply is a multivariate version of sapply. The ... argument can take several data objects; mapply applies FUN to the first elements of all of them, then to the second elements, and so on. The data objects must have the same length, or lengths that are whole multiples of one another. The return value is a vector or a matrix, depending on whether FUN returns one value or several.

> mapply(sum, list(a=1,b=2,c=3), list(a=10,b=20,d=30))
 a  b  c 
11 22 33 
> mapply(function(x,y) x^y, c(1:5), c(1:5))
[1]    1    4   27  256 3125
> mapply(function(x,y) c(x+y, x^y), c(1:5), c(1:5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    4    6    8   10
[2,]    1    4   27  256 3125
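The length rule and the MoreArgs parameter are not exercised by the examples above, so here is a small sketch of both. The inputs and the parameter name `k` are invented for illustration.

```r
# Lengths 4 and 2: the shorter argument is recycled.
recycled <- mapply(function(x, y) x + y, c(1:4), c(10, 20))

# MoreArgs holds arguments that are passed unchanged to every call.
scaled <- mapply(function(x, y, k) (x + y) * k,
                 c(1:3), c(4:6),
                 MoreArgs = list(k = 10))

print(recycled)
print(scaled)
```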
rapply
rapply(list, FUN, classes="ANY", deflt=NULL, how=c("unlist", "replace", "list"), ...)

rapply is a recursive version of lapply. It walks the list, and whenever an element is itself a list it descends into it. For each element that is not a list, FUN is called if the element's type is one of the types given in the classes parameter. classes="ANY" matches all types.

The how parameter selects the mode of operation. There are three modes:

    • "replace": replaces each original list element directly with the result of calling FUN on it
    • "list": creates a new list; FUN is called on elements whose type is in classes, and elements whose type is not in classes are set to deflt. The structure of the original list is preserved.
    • "unlist": equivalent to calling unlist(recursive=TRUE) on the result of "list" mode

> lst <- list(a=list(aa=c(1:5), ab=c(6:10)), b=list(ba=c(1:10)))
> lst
$a
$a$aa
[1] 1 2 3 4 5

$a$ab
[1]  6  7  8  9 10


$b
$b$ba
 [1]  1  2  3  4  5  6  7  8  9 10


> rapply(lst, sum, how='list')
$a
$a$aa
[1] 15

$a$ab
[1] 40


$b
$b$ba
[1] 55


> rapply(lst, sum, how='unlist')
a.aa a.ab b.ba 
  15   40   55 
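The relationship between the three modes can be checked with a short sketch on the same nested list. With the default classes="ANY" every leaf matches, so "replace" and "list" give the same sums, and "unlist" should equal unlist() applied to the "list" result.

```r
lst <- list(a = list(aa = c(1:5), ab = c(6:10)), b = list(ba = c(1:10)))

replaced <- rapply(lst, sum, how = "replace")
listed <- rapply(lst, sum, how = "list")
flat <- rapply(lst, sum, how = "unlist")

# "unlist" mode versus flattening the "list" result by hand.
same_as_unlist <- identical(flat, unlist(listed))

print(replaced$a$aa)
print(flat)
print(same_as_unlist)
```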

The second example shows how the classes and deflt parameters are used:

> lst2
$a
$a$aa
[1] 1 2 3 4 5

$a$ab
[1]  6  7  8  9 10


$b
$b$ba
[1] "I am a string"


> rapply(lst2, sum, how='unlist')
Error in .Primitive("sum")("I am a string", ...) : 
  invalid 'type' (character) of argument
> rapply(lst2, sum, classes=c('integer'), deflt=-1, how='unlist')
a.aa a.ab b.ba 
  15   40   -1 
> rapply(lst2, nchar, classes=c('character'), deflt=as.integer(NA), how='unlist')
a.aa a.ab b.ba 
  NA   NA   13 
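The transcript does not show how lst2 was built. The sketch below uses a construction inferred from the printed output, which is an assumption, and uses it to contrast how "replace" and "list" treat elements whose type is not in classes.

```r
# Assumed construction of lst2, inferred from the printed structure.
lst2 <- list(a = list(aa = c(1:5), ab = c(6:10)),
             b = list(ba = "I am a string"))

# "replace": elements that do not match classes are left untouched.
kept <- rapply(lst2, sum, classes = "integer", how = "replace")

# "list": elements that do not match classes are set to deflt.
filled <- rapply(lst2, sum, classes = "integer", deflt = -1, how = "list")

print(kept$b$ba)
print(filled$b$ba)
```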
eapply

Applies a function over an environment. I have never used environments, so I have not looked into it yet.
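For completeness, a minimal sketch of what a call might look like, assuming a small environment created just for this example. The order of the returned list is not guaranteed, so it is sorted by name before printing.

```r
# Hypothetical environment holding two variables.
e <- new.env()
assign("x", c(1:5), envir = e)
assign("y", c(6:10), envir = e)

# eapply calls FUN on every variable in the environment and returns a list.
means <- eapply(e, mean)
means <- means[order(names(means))]

print(means)
```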

Using tapply to build cross tables

An example shows this best. The raw data are sales figures recorded by year, region (loc), and commodity category (type). We want two cross tables of total sales: one with year as rows and region as columns, the other with year as rows and category as columns.

> df <- data.frame(year=kronecker(2001:2003, rep(1,4)), loc=c('beijing','beijing','shanghai','shanghai'), type=rep(c('A','B'),6), sale=rep(1:12))
> df
   year      loc type sale
1  2001  beijing    A    1
2  2001  beijing    B    2
3  2001 shanghai    A    3
4  2001 shanghai    B    4
5  2002  beijing    A    5
6  2002  beijing    B    6
7  2002 shanghai    A    7
8  2002 shanghai    B    8
9  2003  beijing    A    9
10 2003  beijing    B   10
11 2003 shanghai    A   11
12 2003 shanghai    B   12
> tapply(df$sale, df[,c('year','loc')], sum)
      loc
year   beijing shanghai
  2001       3        7
  2002      11       15
  2003      19       23
> tapply(df$sale, df[,c('year','type')], sum)
      type
year    A  B
  2001  4  6
  2002 12 14
  2003 20 22
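The cross table that tapply returns is an ordinary matrix with named dimensions, so it can be indexed by name or passed on to apply. A small sketch on the same data follows; the data frame is rebuilt here with rep() instead of kronecker(), which is only a stylistic substitution.

```r
df <- data.frame(year = rep(2001:2003, each = 4),
                 loc = rep(c("beijing", "beijing", "shanghai", "shanghai"), 3),
                 type = rep(c("A", "B"), 6),
                 sale = 1:12)

# Year by region cross table of total sales.
by_loc <- tapply(df$sale, list(year = df$year, loc = df$loc), sum)

# Look up a single cell by its row and column names.
cell <- by_loc["2002", "shanghai"]

# Row totals across regions, using apply on the resulting matrix.
row_totals <- apply(by_loc, 1, sum)

print(by_loc)
print(cell)
print(row_totals)
```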
Reference

Andrew Lim

https://screamyao.wordpress.com/2011/05/03/various-apply-functions-in-r-explained/

https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/

Http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#apply
