The R language provides batch-processing functions that iterate over all or some of the elements of a collection, which simplifies many operations.
Many of these functions are implemented in C underneath, so they are usually more efficient than hand-written loops.
The most important batch-processing functions belong to the apply family: lapply, sapply, apply, tapply, and mapply. The apply family is one of the ways to express vectorized computation efficiently, and it often performs better than traditional for and while loops.
apply: traverses the rows or columns of an array and processes each one with the specified function.
lapply: iterates over each element of a list (or vector) and processes it with the specified function. Returns a list.
sapply: basically the same as lapply, except that it simplifies the result and returns an ordinary vector (or matrix) where possible.
mapply: supports passing in two or more lists or vectors, which are processed in parallel.
tapply: takes an INDEX argument and applies the function to groups of the data, similar to GROUP BY in SQL.
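As a minimal sketch of the comparison above, the same computation can be written once as an explicit for loop and once with sapply. The vector `x` and the helper `square_plus_one` are hypothetical names chosen only for illustration, and whether the apply version is actually faster depends on the workload.

```r
# Hypothetical input and helper, used only for illustration.
x <- 1:5
square_plus_one <- function(v) v^2 + 1

# Explicit loop: preallocate the result, then fill it element by element.
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- square_plus_one(x[i])
}

# Same computation with sapply: no index bookkeeping is needed.
apply_result <- sapply(x, square_plus_one)

# Both approaches produce the same values.
print(loop_result)
print(all.equal(loop_result, apply_result))
```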
(1) Row or column traversal function apply
apply(X, MARGIN, FUN, ...)
Parameters:
X: an array, including a matrix.
MARGIN: 1 operates on rows; 2 operates on columns.
FUN: the function to apply.
With apply it is easy to compute sums or means by row or by column; the results are the same as those of colMeans, colSums, rowMeans, and rowSums.
Examples are as follows:
> a <- matrix(1:12, c(3, 4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> apply(a, 1, sum)
[1] 22 26 30
> apply(a, 2, sum)
[1]  6 15 24 33
> apply(a, 1, function(x) sum(x) + 2)
[1] 24 28 32
> apply(a, 1, function(x) x^2)
     [,1] [,2] [,3]
[1,]    1    4    9
[2,]   16   25   36
[3,]   49   64   81
[4,]  100  121  144
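The claim that apply reproduces rowSums, colMeans, and related functions can be checked directly. The following is a small sketch using the same matrix; the variable names `row_sums_apply` and `col_means_apply` are introduced here only for illustration.

```r
a <- matrix(1:12, c(3, 4))

# Row sums and column means computed with apply.
row_sums_apply <- apply(a, 1, sum)
col_means_apply <- apply(a, 2, mean)

# all.equal compares the values and tolerates the integer/double difference:
# sum() on integer rows returns integers, while rowSums() returns doubles.
print(all.equal(row_sums_apply, rowSums(a)))
print(all.equal(col_means_apply, colMeans(a)))
```

For plain sums and means the dedicated functions are usually the better choice; apply is mainly useful when the per-row or per-column function is something they do not cover.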
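(2) List traversal function lapply
lapply(list, function, ...)
Features: it operates on each element of a list. A data frame is a list of columns, so lapply processes it column by column, which makes it well suited to data frames. The input should be a list; other objects are coerced with as.list.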
> a <- matrix(1:12, c(3, 4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> a.df <- data.frame(a)
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> is.list(a.df)
[1] TRUE
> str(a.df)
'data.frame':	3 obs. of  4 variables:
 $ X1: int  1 2 3
 $ X2: int  4 5 6
 $ X3: int  7 8 9
 $ X4: int  10 11 12
> lapply(a.df, function(x) x + 3)
$X1
[1] 4 5 6
$X2
[1] 7 8 9
$X3
[1] 10 11 12
$X4
[1] 13 14 15
> lapply(a.df, function(x) sum(x) + 3)
$X1
[1] 9
$X2
[1] 18
$X3
[1] 27
$X4
[1] 36
> y <- lapply(a.df, function(x) sum(x) + 3)
> is.list(y)
[1] TRUE
> names(y)
[1] "X1" "X2" "X3" "X4"
> y
$X1
[1] 9
$X2
[1] 18
$X3
[1] 27
$X4
[1] 36
> y[1]
$X1
[1] 9
> y[[1]]
[1] 9
> y$X1
[1] 9
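lapply is not limited to data frames. As a short sketch, the example below applies it to a plain list whose elements have different lengths and to an atomic vector, which is coerced with as.list. The list `scores` and its contents are hypothetical data made up for illustration.

```r
# A plain list with elements of different lengths (hypothetical data).
scores <- list(math = c(90, 80), art = c(70, 60, 50))

# lapply always returns a list with one result per input element.
means <- lapply(scores, mean)
print(means$math)      # access a single element by name
print(means[["art"]])  # equivalent access with [[ ]]

# An atomic vector is coerced with as.list(), so the result is still a list.
squares <- lapply(1:3, function(i) i^2)
print(is.list(squares))
print(unlist(squares))  # flatten the list back into a vector
```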
(3) sapply
sapply(list, function, ..., simplify)
simplify = FALSE: the return value is a list, exactly the same as with lapply.
simplify = TRUE (default): the type of the return value depends on the results. If every function return value has length 1, sapply simplifies the list to a vector;
if every element of the returned list has the same length greater than 1, sapply simplifies it to a matrix.
> yy <- sapply(a.df, function(x) x^2)
> yy
     X1 X2 X3  X4
[1,]  1 16 49 100
[2,]  4 25 64 121
[3,]  9 36 81 144
> str(yy)
 num [1:3, 1:4] 1 4 9 16 25 36 49 64 81 100 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "X1" "X2" "X3" "X4"
> str(y)
List of 4
 $ X1: num 9
 $ X2: num 18
 $ X3: num 27
 $ X4: num 36
> yy <- sapply(a.df, function(x, y) x^2 + y, y = 3)
> yy
     X1 X2 X3  X4
[1,]  4 19 52 103
[2,]  7 28 67 124
[3,] 12 39 84 147
> y1 <- sapply(a.df, sum)
> y1
X1 X2 X3 X4
 6 15 24 33
> str(y1)
 Named int [1:4] 6 15 24 33
 - attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
> y1 <- sapply(a.df, sum, simplify = FALSE)
> y1
$X1
[1] 6
$X2
[1] 15
$X3
[1] 24
$X4
[1] 33
> str(y1)
List of 4
 $ X1: int 6
 $ X2: int 15
 $ X3: int 24
 $ X4: int 33
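The simplification rules can be seen side by side in a short sketch. The first two cases follow the rules described for sapply in this section. The third case (results of unequal length, which stay a list) is not covered in the text and is added here as an assumption-free fact about sapply. The names `v`, `m`, and `l` are illustrative only.

```r
a.df <- data.frame(matrix(1:12, c(3, 4)))

# Length-1 results are simplified to a named vector.
v <- sapply(a.df, max)

# Equal-length results (length 2 here) are simplified to a matrix
# with one column per input element.
m <- sapply(a.df, range)

# Results of unequal length cannot be simplified, so a list is returned.
l <- sapply(1:3, function(n) seq_len(n))

print(v)
print(dim(m))
print(is.list(l))
```

Because the return type depends on the data, code that relies on a particular shape may prefer lapply or simplify = FALSE for predictability.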
(4) mapply: a multivariate version of sapply; it applies a function to multiple list or vector arguments.
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
> mapply(function(x, y) x^y, c(1:5), c(1:5))
[1]    1    4   27  256 3125
> a <- matrix(1:12, c(3, 4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mapply(sum, a[,1], a[,3], a[,4])
[1] 18 21 24
> mapply(function(x, y, z) x^2 + y + z, a[,1], a[,3], a[,4])
[1] 18 23 30
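The remaining arguments in the mapply signature can be illustrated with a small sketch. The function bodies and variable names (`powered`, `repeated`, `pairs`) are hypothetical; only MoreArgs and SIMPLIFY come from the signature shown above.

```r
# Vectors passed through ... are iterated in parallel;
# MoreArgs holds arguments that stay fixed for every call.
powered <- mapply(function(x, y, p) (x + y)^p,
                  1:3, 4:6,
                  MoreArgs = list(p = 2))
print(powered)

# When the results have unequal lengths, mapply returns a list.
repeated <- mapply(rep, 1:3, 3:1)
print(repeated)

# SIMPLIFY = FALSE always returns a list, like lapply.
pairs <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
print(is.list(pairs))
```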
(5) tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
X is the vector to process, INDEX is a factor (or a list of factors), FUN is the function to apply, and simplify controls whether the result is simplified (compare how sapply simplifies the result of lapply).
A side note on the factor-generating function gl: it makes it very convenient to produce factors and is often used in analysis of variance.
> gl(3, 5)  # 3 is the number of levels, 5 is the number of repetitions
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> gl(3, 1, 15)  # 15 is the total length of the result
 [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
Levels: 1 2 3
> df <- data.frame(year = kronecker(2001:2003, rep(1, 4)), loc = c('Beijing', 'Beijing', 'Shanghai', 'Shanghai'), type = rep(c('A', 'B'), 6), sale = rep(1:12))
> df
   year      loc type sale
1  2001  Beijing    A    1
2  2001  Beijing    B    2
3  2001 Shanghai    A    3
4  2001 Shanghai    B    4
5  2002  Beijing    A    5
6  2002  Beijing    B    6
7  2002 Shanghai    A    7
8  2002 Shanghai    B    8
9  2003  Beijing    A    9
10 2003  Beijing    B   10
11 2003 Shanghai    A   11
12 2003 Shanghai    B   12
> tapply(df$sale, df[, c('year', 'loc')], sum)
      loc
year   Beijing Shanghai
  2001       3        7
  2002      11       15
  2003      19       23
> tapply(df$sale, df[, c('type', 'loc')], sum)
    loc
type Beijing Shanghai
   A      15       21
   B      18       24
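For the simpler case of a single grouping factor, the following sketch combines gl and tapply. The data (`sale`, `group`) and the labels "g1" to "g3" are hypothetical values made up for illustration.

```r
# Hypothetical data: 6 sales values in 3 groups of 2, with the groups built by gl().
sale <- c(10, 20, 30, 40, 50, 60)
group <- gl(3, 2, labels = c("g1", "g2", "g3"))

# One result per factor level, comparable to
# SELECT group, SUM(sale) ... GROUP BY group in SQL.
totals <- tapply(sale, group, sum)
print(totals)

# Any summary function can be used, for example the group mean.
averages <- tapply(sale, group, mean)
print(averages)
```

With one factor the result is a named one-dimensional array; with two factors, as in the examples above, it becomes a two-way table.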