Dplyr is the tool for HW data processing
Introduction of Dplyr Dplyr
Memo: distinct to heavy
Distinct (select (flights, tailnum))
#> source:local Data frame [4,044 x 1]
#>
#> tailnum
#> (CHR)
#> 1 N14228
#> 2 N24211
#> 3 n619aa
#> 4 N804JB
#> . ...
Sample_n; Sample_frac sampling
Sample_n (flights, ten) #> source:local data frame [x] #> #> Year Month day Dep_time Dep_delay Arr_ti Me Arr_delay carrier tailnum #> (int) (int) (int) (int) (DBL) (int) (DBL) (CHR) (CHR) #> 1 20 6 1428-7 1733-17 DL n370nw #> 2 2013 3 23 600-7 95 5 UA N510ua #> 3 3 1814 3 B6 N238JB #> 4 201 3 2 1957-10 2156-10 EV N19966 #>. ..... #> Variables not shown:flight (int), origin (CHR), and ..... ....???? ), Dest (CHR), Air_time #> (dbl), distance (dbl), hour (dbl), Minute (dbl) Sample_frac (flights, 0.01) #> source:l Ocal data frame [3,368 x] #> #> Year Month day Dep_time dep_delay arr_time arr_delay carrier Tailnum #> (int) (int) (int) (int) (DBL) (int) (DBL) (CHR) (CHR) #> 1 5 7 1156-4 1321-17 UA N432ua #> 2 2013 4 18 1 543-5 1755-25 DL N369NB #> 3 3 1408-7 1623-8 DL N344NB #> 4 1 2001 2211, MQ n526mq #>. ..... #> Variables not shown:flight (int), origin (CHR), and ..... ....???? ), Dest (CHR), Air_time #> (dbl), distance (dbl), hour (dbl), Minute (dbl)
Top_n
Top N Lag
Implements the function of the diff class Tally
Summarise (n ())