reshape2 Data Manipulation: Reshaping Data with cast

Source: Internet
Author: User

Manipulating data is an essential part of data analysis. Here we introduce the powerful reshape2 package. Of its functions, cast and melt are indispensable for reshaping data.

The first is cast, which converts long data into almost any wide format you want. reshape2 implements it as two functions:

dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))

acast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))

acast and dcast differ only in the type of their output: acast returns a vector, matrix, or array, while dcast returns a data frame.
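A minimal sketch of the difference in return types, using a tiny hypothetical long-format data frame (`long`, with made-up values) rather than the article's data. acast keeps the id values only as row names of a matrix, while dcast keeps id as an ordinary column of a data frame.

```r
library(reshape2)

# Hypothetical toy data: one value per (id, variable) combination
long <- data.frame(id = c(1, 1, 2, 2),
                   variable = c("a", "b", "a", "b"),
                   value = c(10, 20, 30, 40),
                   stringsAsFactors = FALSE)

m <- acast(long, id ~ variable)  # matrix; the id values become row names
d <- dcast(long, id ~ variable)  # data.frame; id stays an ordinary column

print(class(m))
print(class(d))
```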

Parameters:

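data: the long-format (molten) data frame to be converted

formula: the casting formula, written as row variable ~ column variable ~ third-dimension variable ~ ...; a . means "no variable", and ... stands for all variables not otherwise used in the formula

fun.aggregate: the aggregation function, applied when a cell of the result receives more than one value; if it is omitted in that case, length is used and a message is printed

margins: adds marginal summaries (extra "(all)" rows and columns); TRUE adds all margins, or pass a vector of variable names

subset: a filter condition applied to the data before casting, written with .() from the plyr package, so plyr must be loaded

The other three parameters, fill (the value used for missing combinations), drop (whether missing combinations are dropped) and value.var (the name of the column holding the values), are used less often.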
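The special formula symbols and the default aggregation are easiest to see on the article's own data. This sketch rebuilds that data (adding `stringsAsFactors = FALSE`, an assumption not in the original, so `name` is character in every R version) and shows `...`, `.` and the fall-back to `length`.

```r
library(reshape2)

# The article's sample data, melted into long format
x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

# "..." stands for every variable not used elsewhere (here id and name),
# so this is equivalent to id + name ~ variable
wide <- dcast(x1, ... ~ variable)

# "." means "no variable": one row per subject, averaged over all students
per_subject <- dcast(x1, variable ~ ., mean)

# Six values land in each cell and no fun.aggregate is given,
# so dcast prints a message and counts them with length()
counts <- dcast(x1, variable ~ .)
```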

Here's a look at some concrete examples.

First, build a data set of six students' math (shuxue) and Chinese (yuwen) scores:

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63))
x

Then melt the data into long format with the melt function:

library(reshape2)
x1 <- melt(x, id.vars = c("id", "name"))
x1

The data has now been turned into long format: one row per student and subject (melt is described in detail later).
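To make the long format concrete, this sketch (again assuming `stringsAsFactors = FALSE` for portability) checks the shape of the melted data: 12 rows, a factor `variable` column naming the subject, and a `value` column holding the score, with all shuxue rows stacked before all yuwen rows.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)

# id and name identify a row; every other column becomes a variable/value pair
x1 <- melt(x, id.vars = c("id", "name"))
str(x1)
```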

Next, cast the long data into various shapes.

acast(x1, id ~ variable)

dcast(x1, id ~ variable)

Comparing the two results shows the difference between acast and dcast.

acast stores the ids only as the row names of its matrix, with no id column, while dcast returns id as an ordinary column of the data frame.
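A quick way to confirm where the ids end up: in this sketch (same data, `stringsAsFactors = FALSE` assumed) the same cell is looked up by row name in the acast matrix and by filtering the id column in the dcast data frame.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

a <- acast(x1, id ~ variable)  # ids are row names
d <- dcast(x1, id ~ variable)  # ids are a column

# Student 3's yuwen score, looked up both ways
a["3", "yuwen"]
d$yuwen[d$id == 3]
```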

acast(x1, id ~ name ~ variable)

With a three-dimensional formula, acast returns an array, while dcast raises an error, because its output must be a two-dimensional data frame.
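This sketch (same data, `stringsAsFactors = FALSE` assumed) inspects the three-dimensional result: a 6 x 6 x 2 array in which only the 12 observed id/name/subject combinations are filled and the rest are NA. The dcast call is wrapped in `tryCatch` so the expected error is captured instead of stopping the script; the exact error text is not relied on.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

arr <- acast(x1, id ~ name ~ variable)
dim(arr)              # id x name x variable
sum(!is.na(arr))      # only the observed combinations hold values

# dcast cannot produce a 3-D result, so capture the error object
err <- tryCatch(dcast(x1, id ~ name ~ variable), error = function(e) e)
```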

dcast(x1, id ~ variable, mean, margins = TRUE)

As you can see, the result gains an "(all)" column holding each row's mean and an "(all)" row holding each column's mean.
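The margin values can be checked by hand: student 1 scored 89 and 77, so the row margin is 83; the shuxue column sums to 470 and yuwen to 473, so the column margins are 470/6 and 473/6 and the grand mean is 943/12. A sketch of that check (same data, `stringsAsFactors = FALSE` assumed); note that the id column becomes a factor with an extra "(all)" level.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

m <- dcast(x1, id ~ variable, mean, margins = TRUE)
m

# Row margin for student 1 and the column/grand margins in the "(all)" row
m[["(all)"]][m$id == "1"]
m[m$id == "(all)", ]
```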

dcast(x1, id ~ variable, mean, margins = "id")

This adds only the "(all)" row, holding the mean of each column over all ids. To average across each row instead (one mean per student), change "id" to "variable".
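The two single-margin variants side by side, as a sketch on the same data (`stringsAsFactors = FALSE` assumed): `margins = "id"` adds one summary row, while `margins = "variable"` adds one summary column with each student's average of the two scores.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

by_id  <- dcast(x1, id ~ variable, mean, margins = "id")        # extra "(all)" row
by_var <- dcast(x1, id ~ variable, mean, margins = "variable")  # extra "(all)" column

by_var[["(all)"]]  # per-student mean of shuxue and yuwen
```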

library(plyr)
dcast(x1, id ~ variable, mean, subset = .(id == 1 | id == 3))

The subset argument accepts all kinds of filter conditions, much like a filter() call.
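As a sanity check, filtering inside dcast should match filtering the long data first with ordinary base R indexing. This sketch (same data, `stringsAsFactors = FALSE` assumed) compares the two; plyr is loaded only for the `.()` quoting function that `subset` expects.

```r
library(reshape2)
library(plyr)  # provides .() for the subset argument

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

# Filter inside dcast
s <- dcast(x1, id ~ variable, mean, subset = .(id == 1 | id == 3))

# The same filter applied up front in base R
b <- dcast(x1[x1$id %in% c(1, 3), ], id ~ variable, mean)
```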

dcast(x1, id + name ~ variable)

The data is restored to its original shape.
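A round-trip check, as a sketch on the same data (`stringsAsFactors = FALSE` assumed): melting and then casting with `id + name ~ variable` gives back the original columns in the original order. The comparison is done column by column to avoid depending on row-name attributes.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

# long -> wide again
back <- dcast(x1, id + name ~ variable)
back
```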

dcast(x1, variable ~ name)

This swaps the rows and columns: subjects become rows and students become columns.
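The transposed shape in numbers, as a sketch on the same data (`stringsAsFactors = FALSE` assumed): two rows (one per subject) and seven columns (the `variable` column plus one per student name).

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

t_wide <- dcast(x1, variable ~ name)
dim(t_wide)
t_wide$Wang  # Wang's shuxue and yuwen scores
```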

acast(x1, variable ~ id + name)
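When several variables share one side of the formula, their values are combined into a single label per column, joined with an underscore. This sketch (same data, `stringsAsFactors = FALSE` assumed) looks up a cell by such a combined label.

```r
library(reshape2)

x <- data.frame(id = 1:6,
                name = c("Wang", "Zhang", "Li", "Chen", "Zhao", "song"),
                shuxue = c(89, 85, 68, 79, 96, 53),
                yuwen = c(77, 68, 86, 87, 92, 63),
                stringsAsFactors = FALSE)
x1 <- melt(x, id.vars = c("id", "name"))

p <- acast(x1, variable ~ id + name)
colnames(p)          # id and name pasted together with "_"
p["yuwen", "6_song"] # student 6's yuwen score
```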

This shows how powerful cast is: data can be converted into almost any form, much like a PivotTable report in Excel.
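The PivotTable comparison is clearest when several rows fall into the same cell. This sketch uses a small hypothetical `sales` data set (names and numbers invented for illustration): `sum` aggregates the duplicates, `value.var` names the value column because there is no column called `value`, and `fill = 0` fills the one region/quarter combination with no records.

```r
library(reshape2)

# Hypothetical records: two rows share each Q1 cell, and south has no Q2 row
sales <- data.frame(region = c("north", "north", "north", "south", "south"),
                    quarter = c("Q1", "Q1", "Q2", "Q1", "Q1"),
                    amount = c(10, 5, 7, 3, 4),
                    stringsAsFactors = FALSE)

# Pivot: regions as rows, quarters as columns, summed amounts in the cells
pivot <- dcast(sales, region ~ quarter, sum, value.var = "amount", fill = 0)
pivot
```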
