R Language Study, Part 13: Reshaping Data with the reshape2 Package

Source: Internet
Author: User

Data reshaping in R typically uses the reshape2 package, which converts between wide data and long data. Because reshape2 is not part of the default R installation, it needs to be installed once and then loaded before use:

install.packages("reshape2")
library(reshape2)

Reshaping works in two steps. First melt the data (melt) so that each row is a unique combination of identifier variables and one measured variable, then cast the melted data (cast) into any shape you want. While casting, you can use any function to aggregate the data, or you can simply convert the long format back to a wide format, which is similar to pivot and unpivot in Excel.

One, recognize wide data

Create sample data in wide format, called mydata:

> ID <- c(1, 1, 2, 2)
> Time <- c(1, 2, 1, 2)
> X1 <- c(5, 3, 6, 2)
> X2 <- c(6, 5, 1, 4)
> mydata <- data.frame(ID, Time, X1, X2)

In the wide format shown here, each combination of ID and Time is unique, and X1 and X2 hold the observed values for that row:

  ID Time X1 X2
1  1    1  5  6
2  1    2  3  5
3  2    1  6  1
4  2    2  2  4
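
A quick way to confirm that ID and Time together identify each row is to look for duplicated combinations. The sketch below uses only base R and rebuilds the sample data so that it runs on its own; the name key_cols is introduced here purely for illustration.

```r
# Rebuild the sample wide data from this section
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# key_cols is a hypothetical name for the identifier columns
key_cols <- c("ID", "Time")

# anyDuplicated() returns 0 when no row repeats an earlier one
n_dup <- anyDuplicated(mydata[, key_cols])
print(n_dup == 0)
```

If the check fails on your own data, the identifier columns do not uniquely determine a row, and casting later will need an aggregation function.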

Two, melt the data

Melting a data set restructures it into a specific format: each measured value gets its own row, and each row carries the identifier variables that uniquely identify the observation. We use the melt() function to melt a data frame:

melt(data, id.vars, measure.vars, variable.name = "variable", ..., na.rm = FALSE, value.name = "value", factorsAsStrings = TRUE)

Parameters:

    • data: the data frame to melt
    • id.vars: a vector of identifier variables that identify each observation
    • measure.vars: a vector of measured variables
    • variable.name: the name of the column that holds the original variable names
    • value.name: the name of the column that holds the original values

For example, with ID and Time as the identifier variables and X1 and X2 as the measured variables:

md <- melt(mydata, id.vars = c("ID", "Time"), measure.vars = c("X1", "X2"))

After melting, the data is in the so-called long format, also known as long data:

  ID Time variable value
1  1    1       X1     5
2  1    2       X1     3
3  2    1       X1     6
4  2    2       X1     2
5  1    1       X2     6
6  1    2       X2     5
7  2    1       X2     1
8  2    2       X2     4
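
To make the long format concrete, the following base R sketch stacks the measured columns by hand. It only illustrates the idea and is not how melt() is implemented. The names measured, pieces, and long_by_hand are introduced here, and unlike melt() the variable column is kept as character rather than factor.

```r
# Rebuild the sample wide data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Columns to stack (hypothetical helper name)
measured <- c("X1", "X2")

# One small data frame per measured column: identifiers, name, value
pieces <- lapply(measured, function(v) {
  data.frame(
    ID       = mydata$ID,
    Time     = mydata$Time,
    variable = v,
    value    = mydata[[v]],
    stringsAsFactors = FALSE
  )
})

# Stack the pieces on top of each other
long_by_hand <- do.call(rbind, pieces)
print(long_by_hand)
```

Each of the 4 wide rows contributes one long row per measured column, so the result has 4 x 2 = 8 rows.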

Note: You must specify the variables (ID and Time) needed to uniquely identify each observation. The column that holds the names of the measured variables (X1 and X2) is created automatically. As the result shows, melt() creates two columns, variable and value, which are the default names. You can change them in melt() through the parameters variable.name = "new_variable_name" and value.name = "new_value_name":

# Stored as md2 so that md keeps the default column names used in the rest of this article
md2 <- melt(mydata, id.vars = c("ID", "Time"), measure.vars = c("X1", "X2"), variable.name = "measuredvariable", value.name = "intvalue")

Three, reshape the data

The dcast() function reads the melted data frame (the d stands for data frame) and reshapes it using a formula and an optional function for aggregating the data.

dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))

Parameters:

    • data: the melted data frame
    • formula: specifies the layout of the output
    • fun.aggregate: the aggregation function to apply when several values fall into the same output cell
    • margins: equivalent to row totals and column totals in a pivot table
    • subset: selects the data that satisfies a condition, equivalent to the filter of an Excel pivot table. For example, subset = .(variable == "length"), where the .() helper comes from the plyr package
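
As a rough illustration of the subset parameter, the sketch below casts only the rows whose variable is X1. It assumes reshape2 and plyr are installed (plyr is a dependency of reshape2) and calls the helper as plyr::.() so that plyr does not need to be attached. The name x1_only is introduced here.

```r
library(reshape2)

# Rebuild and melt the sample data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)
md <- melt(mydata, id.vars = c("ID", "Time"), measure.vars = c("X1", "X2"))

# Keep only the X1 rows before casting, then average by ID
x1_only <- dcast(md, ID ~ variable, mean,
                 subset = plyr::.(variable == "X1"))
print(x1_only)
```

Filtering md with ordinary indexing before calling dcast() is an equally reasonable choice and avoids the plyr helper altogether.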

The formula parameter has the form:

rowvar1 + rowvar2 + ... ~ colvar1 + colvar2 + ...

In this formula, rowvar names the variables that are kept as columns and together uniquely identify each output row, and colvar names the variables whose values are spread out to become the output columns. In other words, reshaping expands colvar within each rowvar combination and, when fun.aggregate is supplied, aggregates the value column for each resulting cell.
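
To show how the two sides of the formula shape the output, here is a small sketch that keeps ID on the row side and puts both variable and Time on the column side. It assumes reshape2 is installed and uses the sample data from this article; the name wide_by_id is introduced here.

```r
library(reshape2)

# Rebuild and melt the sample data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)
md <- melt(mydata, id.vars = c("ID", "Time"), measure.vars = c("X1", "X2"))

# One output row per ID, one output column per variable/Time combination
wide_by_id <- dcast(md, ID ~ variable + Time)
print(names(wide_by_id))
print(wide_by_id)
```

Because each ID, variable, and Time combination occurs exactly once in md, no aggregation function is needed here.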

1, expand colvar

Expanding colvar means turning the values of a column into column names; which column is expanded is determined by the formula parameter.

A special case of reshaping is the inverse of melting: converting long data back into the original wide format. For this operation the formula has a fixed form: identifier variables ~ variable.

> dcast(md, ID + Time ~ variable)
  ID Time X1 X2
1  1    1  5  6
2  1    2  3  5
3  2    1  6  1
4  2    2  2  4
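
As a sanity check on the round trip, this sketch melts the sample data, casts it back, and compares the result with the original column by column. It assumes reshape2 is installed; the names wide_again and same are introduced here.

```r
library(reshape2)

# Rebuild the sample wide data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Wide -> long -> wide
md <- melt(mydata, id.vars = c("ID", "Time"), measure.vars = c("X1", "X2"))
wide_again <- dcast(md, ID + Time ~ variable)

# Compare every column with the original
same <- sapply(names(mydata), function(col) {
  all(wide_again[[col]] == mydata[[col]])
})
print(same)
```

The comparison relies on mydata already being sorted by ID and Time, since dcast() orders its output rows by the row-side variables.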

2, aggregate the measured variables

Compute the mean of each measured variable by ID:

> dcast(md, ID ~ variable, mean)
  ID X1  X2
1  1  4 5.5
2  2  4 2.5

This operation is similar to a grouped aggregation: group by ID, then compute the aggregate value of X1 and of X2 within each group.
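
For comparison, the same grouped means can be computed from the wide data with base R's aggregate(). This is an alternative technique, not part of reshape2, shown only to make the "group by ID" reading concrete; the name by_id is introduced here.

```r
# Rebuild the sample wide data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Group by ID and take the mean of X1 and X2 within each group
by_id <- aggregate(cbind(X1, X2) ~ ID, data = mydata, FUN = mean)
print(by_id)
```

The values match the dcast() result: 4 and 5.5 for ID 1, and 4 and 2.5 for ID 2.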

3, add a totals row and column

Compute the mean of X1 and X2 grouped by ID, and add margins: a final row holding the mean of each column across all IDs, and a final column holding the mean of each row across X1 and X2.

> dcast(md, ID ~ variable, mean, margins = c("ID", "variable"))
     ID X1  X2 (all)
1     1  4 5.5  4.75
2     2  4 2.5  3.25
3 (all)  4 4.0  4.00

The calculation works as follows:

Mean of each column across IDs: for X2, the value is (5.5 + 2.5) / 2 = 4

Mean of each row across variables: for the first row, the value is (4 + 5.5) / 2 = 4.75
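
The margin values can be reproduced by hand in base R, which also shows that each margin is the mean of the underlying raw values. With equally sized groups, as here, that equals the mean of the group means. The names vals, row_margin, col_margin, and grand are introduced for this sketch.

```r
# Rebuild the sample wide data
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  Time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Measured columns (hypothetical helper name)
vals <- c("X1", "X2")

# Row margin: mean of all X1 and X2 values within each ID
row_margin <- sapply(split(mydata[, vals], mydata$ID),
                     function(g) mean(unlist(g)))

# Column margin: mean of each measured column over all rows
col_margin <- colMeans(mydata[, vals])

# Grand mean: all eight values together
grand <- mean(unlist(mydata[, vals]))

print(row_margin)
print(col_margin)
print(grand)
```

With groups of unequal size, the mean of group means would generally differ from these values, so the raw-value calculation is the safer one to reason with.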

