Efficient batch processing functions in R (lapply, sapply, apply, tapply, mapply)

Source: Internet
Author: User
R provides batch processing functions that iterate over all or some of the elements of a collection, which simplifies the code.

The looping inside several of these functions (lapply, for example) is implemented in C, so they are often more efficient than a hand-written loop.
The most important batch processing functions are the apply family: lapply, sapply, apply, tapply, and mapply. They are a concise way to express vectorized-style computations, and they often perform as well as or better than traditional for and while loops.
apply: Traverses the rows or columns of an array and processes each one with the specified function.
lapply: Iterates over each element of a list (or vector) and processes it with the specified function. Returns a list.
sapply: Basically the same as lapply, but it simplifies the result where possible and returns an ordinary vector or matrix.
mapply: A multivariate version of sapply; it takes two or more lists or vectors and applies the function to their corresponding elements.
tapply: Takes an INDEX argument and operates on groups of the data, similar to GROUP BY in SQL.
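To make the comparison with explicit loops concrete, here is a minimal sketch in which a for loop and sapply compute the same squares. The vector x and the result names are made up for illustration. Whether the sapply version is actually faster depends on the function being applied and on the R version, so the performance claim is something to measure rather than assume.

```r
# Hypothetical input vector, chosen only for illustration.
x <- 1:10

# Explicit loop: preallocate the result, then fill it element by element.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Same computation with sapply: no index bookkeeping is needed.
squares_sapply <- sapply(x, function(v) v^2)

# Both approaches produce the same values.
print(all.equal(squares_loop, squares_sapply))
```

The sapply form is mainly shorter and harder to get wrong (no preallocation, no index variable); any speed difference is a secondary benefit.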

(1) Row or column traversal function: apply

apply(X, MARGIN, FUN, ...)

Parameters:

X: an array, including a matrix.

MARGIN: 1 operates on rows; 2 operates on columns.

FUN: the function to apply.

With apply it is easy to sum or average by row or column; the results match those of colMeans, colSums, rowMeans, and rowSums.

Examples are as follows:

> a<-matrix(1:12,c(3,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> apply(a,1,sum)
[1] 22 26 30
> apply(a,2,sum)
[1]  6 15 24 33
> apply(a,1,function(x) sum(x)+2)
[1] 24 28 32
> apply(a,1,function(x) x^2)
     [,1] [,2] [,3]
[1,]    1    4    9
[2,]   16   25   36
[3,]   49   64   81
[4,]  100  121  144
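
As a quick check of the claim that apply reproduces rowSums and colMeans, the following sketch compares them on the same matrix. It also illustrates a detail visible in the last example above: when FUN returns a vector, apply stores each result as a column, so a row-wise call yields a 4 x 3 matrix that has to be transposed to line up with the input. All variable names other than a are illustrative.

```r
a <- matrix(1:12, c(3, 4))

row_totals <- apply(a, 1, sum)   # one value per row
col_means  <- apply(a, 2, mean)  # one value per column

# The dedicated helpers give the same numbers.
print(all.equal(row_totals, rowSums(a)))
print(all.equal(col_means, colMeans(a)))

# A vector-valued FUN applied over rows: each row's result becomes a column.
sq <- apply(a, 1, function(x) x^2)
print(dim(sq))                   # 4 rows, 3 columns

# Transposing restores the orientation of the original matrix.
print(all.equal(t(sq), a^2))
```

For plain sums and means, the dedicated functions such as rowSums are usually the simpler choice; apply is more useful when the per-row or per-column function is custom.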

(2) List traversal function: lapply

lapply(list, function, ...)
Features: it operates on each element of a list (for a data frame, each column), so it is well suited to data frames. The input is expected to be a list; other vectors are first coerced with as.list.

> a<-matrix(1:12,c(3,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> a.df<-data.frame(a)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> is.list(a.df)
[1] TRUE
> str(a.df)
'data.frame': 3 obs. of  4 variables:
 $ X1: int  1 2 3
 $ X2: int  4 5 6
 $ X3: int  7 8 9
 $ X4: int  10 11 12
> lapply(a.df, function(x) x+3)
$X1
[1] 4 5 6
$X2
[1] 7 8 9
$X3
[1] 10 11 12
$X4
[1] 13 14 15
> lapply(a.df, function(x) sum(x)+3)
$X1
[1] 9
$X2
[1] 18
$X3
[1] 27
$X4
[1] 36
> y<-lapply(a.df, function(x) sum(x)+3)
> is.list(y)
[1] TRUE
> names(y)
[1] "X1" "X2" "X3" "X4"
> y
$X1
[1] 9
$X2
[1] 18
$X3
[1] 27
$X4
[1] 36
> y[1]
$X1
[1] 9
> y[[1]]
[1] 9
> y$X1
[1] 9
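
The `...` in the lapply signature forwards extra arguments to the function on every call. The sketch below shows this with a made-up list of scores (the names scores, avg, and doubled are illustrative, not from the original), and also shows that an atomic vector works as input because it is coerced to a list first.

```r
# Hypothetical list whose elements have different lengths.
scores <- list(math = c(90, 85, NA), art = c(70, 95), music = 88)

# Arguments after the function are passed along to it on every call;
# here na.rm = TRUE is forwarded to mean().
avg <- lapply(scores, mean, na.rm = TRUE)
str(avg)

# An atomic vector is coerced with as.list() first, so this also works
# and still returns a list.
doubled <- lapply(1:3, function(v) v * 2)
str(doubled)
```

Because lapply always returns a list, it is a safe default when the per-element results may differ in length or type.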

(3) sapply

sapply(list, function, ..., simplify)
simplify=FALSE: the return value is a list, exactly the same as with lapply.
simplify=TRUE (default): the type of the return value depends on the results. If every call returns a value of length 1, sapply simplifies the list to a vector;
if every element of the returned list has the same length greater than 1, sapply simplifies it to a matrix; otherwise the result remains a list.

> yy<-sapply(a.df, function(x) x^2)
> yy
     X1 X2 X3  X4
[1,]  1 16 49 100
[2,]  4 25 64 121
[3,]  9 36 81 144
> str(yy)
 num [1:3, 1:4] 1 4 9 16 25 36 49 64 81 100 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "X1" "X2" "X3" "X4"
> str(y)
List of 4
 $ X1: num 9
 $ X2: num 18
 $ X3: num 27
 $ X4: num 36

> yy<-sapply(a.df, function(x,y) x^2+y, y=3)
> yy
     X1 X2 X3  X4
[1,]  4 19 52 103
[2,]  7 28 67 124
[3,] 12 39 84 147
> y1<-sapply(a.df, sum)
> y1
X1 X2 X3 X4
 6 15 24 33
> str(y1)
 Named int [1:4] 6 15 24 33
 - attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
> y1<-sapply(a.df, sum, simplify=FALSE)
> y1
$X1
[1] 6

$X2
[1] 15

$X3
[1] 24

$X4
[1] 33

> str(y1)
List of 4
 $ X1: int 6
 $ X2: int 15
 $ X3: int 24
 $ X4: int 33
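
The simplification rules can be seen side by side in the following sketch, which rebuilds the same data frame and applies three functions whose results have length 1, a common length of 2, and differing lengths. The names v, m, and l are illustrative.

```r
a.df <- data.frame(matrix(1:12, c(3, 4)))

# Every result has length 1: simplified to a named vector.
v <- sapply(a.df, sum)

# Every result has length 2 (min and max): simplified to a 2 x 4 matrix.
m <- sapply(a.df, range)

# Results differ in length (values greater than 4): stays a list.
l <- sapply(a.df, function(x) x[x > 4])

print(is.null(dim(v)))  # a plain vector has no dimensions
print(dim(m))
print(is.list(l))
```

Since the shape of the result depends on the data, code that needs a predictable return type may prefer lapply or simplify=FALSE.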

(4) mapply: a multivariate version of sapply that applies a function to multiple list or vector arguments

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

> mapply(function(x,y) x^y, c(1:5), c(1:5))
[1]    1    4   27  256 3125
> a<-matrix(1:12,c(3,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mapply(sum, a[,1],a[,3],a[,4])
[1] 18 21 24

> mapply(function(x,y,z) x^2+y+z, a[,1],a[,3],a[,4])
[1] 18 23 30
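
The MoreArgs parameter in the signature above holds arguments that should stay constant across calls, as opposed to the vectors in `...` that are walked element by element. A minimal sketch, with made-up function and variable names:

```r
# x and y are taken element by element; power is the same for every call.
res <- mapply(function(x, y, power) (x + y)^power,
              1:3, 4:6,
              MoreArgs = list(power = 2))
print(res)  # (1+4)^2, (2+5)^2, (3+6)^2

# A shorter argument is recycled to the length of the longest one.
recycled <- mapply(function(x, y) x + y, 1:4, 1:2)
print(recycled)

# When the results differ in length, mapply returns a list instead.
reps <- mapply(rep, 1:3, 3:1)
str(reps)
```

As with sapply, the simplification is data dependent; passing SIMPLIFY = FALSE always yields a list.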

(5) tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

X is the vector to be processed, INDEX is a factor (or a list of factors), FUN is the function to apply, and simplify controls whether the result is simplified (compare how sapply simplifies the output of lapply).

An aside on gl, a function for generating factors: it makes producing factors very convenient and is often used in analysis of variance.

> gl(3,5)      # 3 is the number of factor levels, 5 is the number of repetitions
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> gl(3,1,15)   # 15 is the total length of the result
 [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
Levels: 1 2 3
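
A factor produced by gl can be passed directly as the INDEX of tapply. The sketch below uses made-up measurements and level labels (values, group, and the labels are assumptions for illustration) to compute one mean per group.

```r
# Hypothetical measurements: 15 values in 3 groups of 5.
values <- 1:15
group  <- gl(3, 5, labels = c("low", "mid", "high"))

# One mean per factor level; the result is named by level.
group_means <- tapply(values, group, mean)
print(group_means)
```

The labels argument of gl only renames the levels; the grouping pattern is the same as in gl(3,5).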


> df <- data.frame(year=kronecker(2001:2003, rep(1,4)), loc=c('Beijing','Beijing','Shanghai','Shanghai'), type=rep(c('A','B'),6), sale=rep(1:12))
> df
   year      loc type sale
1  2001  Beijing    A    1
2  2001  Beijing    B    2
3  2001 Shanghai    A    3
4  2001 Shanghai    B    4
5  2002  Beijing    A    5
6  2002  Beijing    B    6
7  2002 Shanghai    A    7
8  2002 Shanghai    B    8
9  2003  Beijing    A    9
10 2003  Beijing    B   10
11 2003 Shanghai    A   11
12 2003 Shanghai    B   12
> tapply(df$sale, df[,c('year','loc')], sum)
      loc
year   Beijing Shanghai
  2001       3        7
  2002      11       15
  2003      19       23
> tapply(df$sale, df[,c('type','loc')], sum)
    loc
type Beijing Shanghai
   A      15       21
   B      18       24
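
To connect this with the GROUP BY analogy, the sketch below groups the same sales data by a single column with tapply, and then uses base R's aggregate as one possible way to get the grouped sums in long, table-like form. The data frame is rebuilt with rep(..., each = 4) instead of kronecker, which is an equivalent construction chosen here for readability; by_loc and long are illustrative names.

```r
# Same data as in the example above, built with rep() for readability.
df <- data.frame(
  year = rep(2001:2003, each = 4),
  loc  = rep(c("Beijing", "Beijing", "Shanghai", "Shanghai"), times = 3),
  type = rep(c("A", "B"), times = 6),
  sale = 1:12
)

# Grouping by one column returns a named one-dimensional result,
# roughly "SELECT loc, SUM(sale) ... GROUP BY loc".
by_loc <- tapply(df$sale, df$loc, sum)
print(by_loc)

# aggregate() returns one row per (year, loc) combination as a data frame.
long <- aggregate(sale ~ year + loc, data = df, FUN = sum)
print(long)
```

tapply gives a cross-tabulated array, which is convenient for reading; the data frame from aggregate is often easier to feed into further processing.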
