Data manipulation is an essential part of data analysis. Here we introduce the powerful reshape2 package; of its functions, cast and melt are absolutely indispensable for reshaping data.
The first is cast, which converts long data into whatever wide form you want:
dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))
acast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))
The only difference between acast and dcast is the type of the output: acast returns a vector, matrix, or array, while dcast returns a data.frame.
Parameters:
data: the data frame to be converted
formula: the casting formula, written as row variable ~ column variable ~ third-dimension variable ~ ...; in addition, . stands for no variable, and ... stands for all other variables not otherwise named in the formula
fun.aggregate: the aggregation function, applied when several values fall into the same cell
margins: adds marginal summary rows and/or columns
subset: adds a filter condition; writing it with .() requires loading the plyr package
The other three parameters (fill, drop, value.var) are used relatively rarely.
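As a rough illustration of the less common parameters, the sketch below uses a small made-up data frame (scores, with the hypothetical columns student, subject, and score; none of this comes from the example that follows). It assumes reshape2 is installed and shows fun.aggregate, value.var, and fill working together.

```r
library(reshape2)

# Hypothetical long-format data: one row per recorded score.
scores <- data.frame(
  student = c("A", "A", "A", "B"),
  subject = c("math", "math", "art", "math"),
  score   = c(80, 90, 70, 60)
)

# Student A has two math scores, so an aggregation function is needed to
# reduce them to one cell; student B has no art score, so fill supplies
# the value for that otherwise empty cell.
wide <- dcast(scores, student ~ subject,
              fun.aggregate = mean, value.var = "score", fill = 0)
wide
```

Naming value.var explicitly avoids relying on dcast guessing which column holds the values.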
Here's a look at some concrete examples.
Build a data set first
x <- data.frame(id = 1:6, name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "Song"), shuxue = c(89, 85, 68, 79, 96, 53), yuwen = c(77, 68, 86, 87, 92, 63))
x
The data is first melted using the melt function.
library(reshape2)
x1 <- melt(x, id = c("id", "name"))
x1
You can see that the data has been turned into long data (the melt function is described in detail later).
The next step is to reshape the data in various ways.
acast(x1, id ~ variable)
dcast(x1, id ~ variable)
The two results above show the difference between acast and dcast.
Here the acast output has no id column (the ids become the row names of the matrix), while dcast keeps id as a regular column.
acast(x1, id ~ name ~ variable)
In the three-dimensional case acast returns an array, whereas dcast raises an error, because a data frame can have at most two dimensions.
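To make the three-dimensional result more concrete, here is a minimal sketch that rebuilds the same data set and inspects the array. The variable name arr is introduced only for this illustration.

```r
library(reshape2)

# Same data set as above, rebuilt so this block runs on its own.
x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "Song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63))
x1 <- melt(x, id = c("id", "name"))

# One dimension per term of the formula: id, name, variable.
arr <- acast(x1, id ~ name ~ variable)
dim(arr)

# Index by dimension names; combinations that never occur in the data
# (for example id 1 with the name "Zhang") are left as NA.
arr["1", "Wang", "shuxue"]
arr["1", "Zhang", "shuxue"]
```

Most cells are NA here because each id belongs to exactly one name, so this layout is mainly useful when the dimensions are genuinely crossed.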
dcast(x1, id ~ variable, mean, margins = TRUE)
As you can see, a summary row and a summary column, both labeled (all), have been added; they hold the means over the columns and over the rows respectively.
dcast(x1, id ~ variable, mean, margins = c("id"))
Here only the columns are averaged, which adds a single summary row. To average across each row instead, change "id" to "variable".
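One way to check what the margin holds is to compare it with the column means computed directly. The following is a sketch under the assumption that the summary row is appended last and labeled (all); the names m and last are used only for this illustration.

```r
library(reshape2)

# Same data set as above, rebuilt so this block runs on its own.
x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "Song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63))
x1 <- melt(x, id = c("id", "name"))

# Margin over id: one extra row that aggregates each column across all ids.
m <- dcast(x1, id ~ variable, mean, margins = "id")
last <- m[nrow(m), ]

# Compare the summary row with the means computed from the original columns.
as.character(last$id)
all.equal(last$shuxue, mean(x$shuxue))
all.equal(last$yuwen, mean(x$yuwen))
```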
library(plyr)
dcast(x1, id ~ variable, mean, subset = .(id == 1 | id == 3))
The subset argument is a powerful way to apply all kinds of filter conditions before casting, similar in effect to a filter function.
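As a rough check of that description, the sketch below compares the subset argument with filtering the long data by hand before casting. The names via_subset and via_filter are introduced only for this illustration, and the comparison assumes both routes keep the same rows in the same order.

```r
library(reshape2)
library(plyr)  # provides .() for quoting the filter expression

# Same data set as above, rebuilt so this block runs on its own.
x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "Song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63))
x1 <- melt(x, id = c("id", "name"))

# Filter inside dcast via the subset argument.
via_subset <- dcast(x1, id ~ variable, mean, subset = .(id == 1 | id == 3))

# Filter the long data first with ordinary indexing, then cast.
via_filter <- dcast(x1[x1$id %in% c(1, 3), ], id ~ variable, mean)

all.equal(via_subset, via_filter)
```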
dcast(x1, id + name ~ variable)
The data is restored to its original shape.
dcast(x1, variable ~ name)
Swap the rows and columns.
acast(x1, variable ~ id + name)
This shows how powerful cast really is: the data can be converted into almost any shape, much like a PivotTable in Excel.