R Language Study, Part 13: Reshaping Data with the reshape2 Package

Last Update:2018-07-18 Source: Internet

Author: User


Data reshaping in R typically uses the reshape2 package, which converts between wide data and long data. Because reshape2 is not part of the default R installation, it must be installed and loaded before first use:

install.packages("reshape2")
library(reshape2)

To reshape data, first melt it so that each row holds a unique combination of identifier variables and one measured variable, and then cast the melted data into whatever shape you want. While casting, you can aggregate the data with any function, or simply convert the long format back to a wide format. This is similar to pivot and unpivot in Excel.

One: Recognizing wide data

Create sample data in wide format, also called wide data, and name it mydata:

> ID <- c(1, 1, 2, 2)
> time <- c(1, 2, 1, 2)
> X1 <- c(5, 3, 6, 2)
> X2 <- c(6, 5, 1, 4)
> mydata <- data.frame(ID, time, X1, X2)

In the wide format, each combination of ID and time is unique, and X1 and X2 are the measured values for that row:

  ID time X1 X2
1  1    1  5  6
2  1    2  3  5
3  2    1  6  1
4  2    2  2  4

Two: Melting the data

Melting a dataset means restructuring it into a specific format in which each measured value occupies its own row, and each row carries the identifier variables that uniquely identify that measurement. We use the melt() function to melt a data frame:

melt(data, id.vars, measure.vars, variable.name = "variable", ..., na.rm = FALSE, value.name = "value", factorsAsStrings = TRUE)

Parameter notes:

data: the data frame to melt
id.vars: a vector of identifier variables that identify each observation
measure.vars: a vector of measured variables
variable.name: the name of the column that holds the original variable names
value.name: the name of the column that holds the original values

For example, with ID and time as the identifier variables and X1 and X2 as the measured variables:

md <- melt(mydata, id.vars = c("ID", "time"), measure.vars = c("X1", "X2"))

After melting, the data is in the so-called long format, also known as long data:

  ID time variable value
1  1    1       X1     5
2  1    2       X1     3
3  2    1       X1     6
4  2    2       X1     2
5  1    1       X2     6
6  1    2       X2     5
7  2    1       X2     1
8  2    2       X2     4
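As a quick sanity check on the melted result, the sketch below rebuilds the sample data and confirms that melting yields one row per wide row and measured variable. It assumes reshape2 is installed; the object names simply mirror the example above.

```r
library(reshape2)

# Sample wide data, same values as in the example above
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

md <- melt(mydata, id.vars = c("ID", "time"), measure.vars = c("X1", "X2"))

# 4 wide rows x 2 measured variables = 8 long rows
nrow(md)

# The measured column names become the levels of the 'variable' factor
levels(md$variable)
```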

Note: You must specify the variables (ID and time) needed to uniquely identify each observation. The column that holds the names of the measured variables (X1 and X2) is created automatically. As the result shows, the function creates two columns, variable and value, which are the default names. You can change these names in melt() through the parameters variable.name = "new_variable_name" and value.name = "new_value_name". The result is stored under a separate name here so that md keeps its default column names:

md2 <- melt(mydata, id.vars = c("ID", "time"), measure.vars = c("X1", "X2"), variable.name = "measuredvariable", value.name = "intvalue")

Three: Reshaping the data

The dcast() function reads the melted data frame (the d stands for data frame) and reshapes it using a formula and an optional function for aggregating the data.

dcast(data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess_value(data))

Parameter notes:

data: the melted data frame
formula: specifies the layout of the output
fun.aggregate: the aggregation function applied when several values fall into the same output cell
margins: equivalent to row totals and column totals in a pivot table
subset: selects the rows that satisfy a condition, equivalent to the filter of an Excel pivot table, for example subset = .(variable == "length"), where .() comes from the plyr package

The format of the parameter formula is:

rowvar1 + rowvar2 + ... ~ colvar1 + colvar2 + ...

In this formula, the rowvar terms are the variables that are kept as columns and whose combinations uniquely identify each output row, and the colvar terms are the variables whose values are spread out to become the new column names. Reshaping therefore means: for each combination of rowvar values, expand colvar and aggregate value (when fun.aggregate is supplied).
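To illustrate how the two sides of the formula behave, here is a minimal sketch that puts time on the column side instead of variable. It assumes reshape2 is installed and reuses the sample data from earlier; wide_by_time is a name chosen only for this illustration.

```r
library(reshape2)

mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)
md <- melt(mydata, id.vars = c("ID", "time"), measure.vars = c("X1", "X2"))

# Rows are identified by ID and variable; each distinct value of time
# becomes its own column
wide_by_time <- dcast(md, ID + variable ~ time, value.var = "value")
wide_by_time
```

Each ID/variable pair has exactly one value per time point, so no aggregation function is needed here.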

1. Expanding colvar

Expanding colvar means converting the values of a column into column names. Which column is expanded is determined by the formula parameter.

A special case of reshaping is the inverse of melting: it converts long data back into wide data, restoring the melted data to its original format. For this operation the form of the formula is fixed: identifier variables ~ variable.

> dcast(md, ID + time ~ variable)
  ID time X1 X2
1  1    1  5  6
2  1    2  3  5
3  2    1  6  1
4  2    2  2  4
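The round trip can be checked directly. The sketch below, which assumes reshape2 is installed, melts the sample data, casts it back, and compares the result with the original column by column; round_trip_ok is a name used only for this illustration.

```r
library(reshape2)

mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

md   <- melt(mydata, id.vars = c("ID", "time"), measure.vars = c("X1", "X2"))
wide <- dcast(md, ID + time ~ variable, value.var = "value")

# Compare values column by column (row names and other attributes are ignored)
round_trip_ok <- all(vapply(
  names(mydata),
  function(col) all(wide[[col]] == mydata[[col]]),
  logical(1)
))
round_trip_ok
```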

2. Aggregating the measured variables

Compute the mean of each measured variable by ID:

> dcast(md, ID ~ variable, mean)
  ID X1  X2
1  1  4 5.5
2  2  4 2.5

This operation is similar to a grouped aggregation: the rows are grouped by ID, and the aggregate value is computed separately for X1 and X2.
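For comparison, the same grouped means can be computed from the wide data with base R alone. This is a sketch of an equivalent calculation, not something reshape2 itself calls; by_id is a name chosen for illustration.

```r
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Group by ID and take the mean of X1 and X2 within each group
by_id <- aggregate(cbind(X1, X2) ~ ID, data = mydata, FUN = mean)
by_id
```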

3. Adding margin totals

Compute the mean of X1 and X2 grouped by ID, and add margins: a final row holding the mean of each column over all IDs, and a final column holding the mean of each row over X1 and X2.

> dcast(md, ID ~ variable, mean, margins = c("ID", "variable"))
     ID X1  X2 (all)
1     1  4 5.5  4.75
2     2  4 2.5  3.25
3 (all)  4 4.0  4.00

The calculation works as follows:

The mean of each column over all IDs: for X2 it is (5.5 + 2.5) / 2 = 4

The mean of each row over the variables: for the first row it is (4 + 5.5) / 2 = 4.75
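The arithmetic above can be reproduced in base R. The sketch below averages the per-ID cell means to obtain the margins. That shortcut matches the table only because every cell here contains the same number of observations. With unbalanced data, a margin taken over the underlying values could differ from a mean of means. All object names are illustrative.

```r
mydata <- data.frame(
  ID   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  X1   = c(5, 3, 6, 2),
  X2   = c(6, 5, 1, 4)
)

# Cell means: one row per ID, one column per measured variable
cell_means <- aggregate(cbind(X1, X2) ~ ID, data = mydata, FUN = mean)

# Row margin: mean across X1 and X2 for each ID
row_margin <- unname(rowMeans(cell_means[, c("X1", "X2")]))

# Column margin: mean across IDs for each variable
col_margin <- colMeans(cell_means[, c("X1", "X2")])

# Grand mean over all eight measured values
grand_mean <- mean(c(mydata$X1, mydata$X2))

row_margin
col_margin
grand_mean
```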


