For data preprocessing I have been using Hadley Wickham's plyr package, switching to the data.table package when the data gets somewhat larger. Hadley Wickham's dplyr package has now been out for some time and is much improved in performance. These are some notes for later use.
These five functions provide the basis of a language of data manipulation. At the most basic level, you can only alter a tidy data frame in five useful ways: you can reorder the rows (arrange()), pick observations and variables of interest (filter() and select()), add new variables that are functions of existing variables (mutate()), or collapse many values to a summary (summarise()). The remainder of the language comes from applying the five functions to different types of data, such as grouped data, as described next.
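As a minimal sketch of the five verbs described above, the following applies each one to a small made-up data frame. The data frame flights_toy and its values are hypothetical and only illustrate the calls; the column names simply mirror those used in the flights example.

```r
library(dplyr)

# Hypothetical toy data, not taken from the original post.
flights_toy <- data.frame(
  tailnum   = c("N1", "N1", "N2", "N2", "N3"),
  distance  = c(100, 300, 200, 400, 500),
  arr_delay = c(10, NA, -5, 15, 0)
)

# arrange(): reorder the rows, longest distance first
sorted <- arrange(flights_toy, desc(distance))

# filter(): pick observations (rows) of interest
long <- filter(flights_toy, distance > 250)

# select(): pick variables (columns) of interest
narrow <- select(flights_toy, tailnum, distance)

# mutate(): add a new variable computed from an existing one
with_rt <- mutate(flights_toy, round_trip = 2 * distance)

# summarise(): collapse many values to a single summary row
overall <- summarise(flights_toy, mean_delay = mean(arr_delay, na.rm = TRUE))

# The same summarise() applied to grouped data gives one row per group
by_plane <- summarise(group_by(flights_toy, tailnum),
                      count = n(),
                      dist  = mean(distance))
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why the same summarise() call works on both the plain and the grouped input.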
Example 1: comparison of plyr::ddply and dplyr::group_by
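    library(dplyr)
    # Assumption: flights is the data frame from the nycflights13 package,
    # which matches the column names used here (tailnum, distance, arr_delay).

    # dplyr: group by tail number, then summarise each group
    system.time({
      planes <- group_by(flights, tailnum)
      delay <- summarise(planes,
        count = n(),
        dist = mean(distance, na.rm = TRUE),
        delay = mean(arr_delay, na.rm = TRUE)
      )
    })
    #    user  system elapsed
    #   0.092   0.003   0.097

    # plyr: the same summary with ddply (called via plyr:: to avoid masking dplyr)
    system.time({
      plyr::ddply(flights, "tailnum", function(x) data.frame(
        count = nrow(x),
        dist = mean(x$distance, na.rm = TRUE),
        delay = mean(x$arr_delay, na.rm = TRUE)
      ))
    })
    #    user  system elapsed
    #   2.467   0.016   2.500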
Using dplyr