Data Remodeling in R

Last Update:2018-08-20 Source: Internet

Author: User

The data frames analyzed in statistics usually take one of two forms:

(1) Long data (stacked data): the variable values are stored in one column, and the corresponding variable names in another column.
(2) Wide data (unstacked data): each variable is stored in its own column, and the values are generally of the same type. This is the more commonly used form.

1. The first four columns of iris are a typical example of wide data. To convert wide data to long data:

    data_w <- iris[, 1:4]
    data_l <- stack(data_w)
    head(data_l)
      values          ind
    1    5.1 Sepal.Length
    2    4.9 Sepal.Length
    3    4.7 Sepal.Length
    4    4.6 Sepal.Length
    5    5.0 Sepal.Length
    6    5.4 Sepal.Length
    data_w <- unstack(data_l)  # convert back to wide data

As long as one column holds a category variable, the data can be viewed as long data.
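The round trip between the two forms can be checked with a short sketch in base R. It assumes only the built-in iris data set; the name data_back is introduced here for illustration and does not appear in the original text.

```r
# Wide data: the four numeric columns of the built-in iris data set.
data_w <- iris[, 1:4]

# stack() moves every value into one column (values) and records the
# name of the column it came from in a second column (ind).
data_l <- stack(data_w)

dim(data_w)  # 150 rows, 4 columns
dim(data_l)  # 600 rows, 2 columns

# unstack() reverses the operation. data_back is a hypothetical name.
data_back <- unstack(data_l)

# Compare column names and values with the original wide data.
identical(names(data_back), names(data_w))
all(data_back == data_w)
```

The long form has four times as many rows because each of the four measurement columns contributes its 150 values to the single values column.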

In the preceding example, the first four columns of iris can be regarded as wide data, whereas the last two columns (Petal.Width and Species) can be regarded as long data.

2. Long data can be converted to wide data according to the Species variable, after which the mean for each species can be computed:

    > subdata <- iris[, 4:5]
    > head(subdata)  # long data
      Petal.Width Species
    1         0.2  setosa
    2         0.2  setosa
    3         0.2  setosa
    4         0.2  setosa
    5         0.2  setosa
    6         0.4  setosa
    > data_w <- unstack(subdata)
    > head(data_w)  # wide data
      setosa versicolor virginica
    1    0.2        1.4       2.5
    2    0.2        1.5       1.9
    3    0.2        1.5       2.1
    4    0.2        1.3       1.8
    5    0.2        1.5       2.2
    6    0.4        1.3       2.1
    > colMeans(data_w)  # column means
        setosa versicolor  virginica
         0.246      1.326      2.026

3. In the example above, the data format was converted first and the result was computed afterwards. More commonly, the result is computed directly:

    library(reshape2)
    dcast(data = subdata,             # object to process
          formula = Species ~ .,      # how the data is grouped
          value.var = 'Petal.Width',  # numeric variable to compute on
          fun.aggregate = mean)       # function used for the calculation
         Species     .
    1     setosa 0.246
    2 versicolor 1.326
    3  virginica 2.026

The idea behind dcast is very similar to that of aggregate: both split the data by a variable and then compute on each group. The two differ in output format and functionality, which affects how convenient each one is under multidimensional grouping conditions.

4. Melting wide data into long data is done with the melt function. For example, melt the iris dataset:

    iris_long <- melt(data = iris,          # object to melt
                      id.vars = 'Species')  # variables that do not take part in the melt
    > head(iris_long)
      Species     variable value
    1  setosa Sepal.Length   5.1
    2  setosa Sepal.Length   4.9
    3  setosa Sepal.Length   4.7
    4  setosa Sepal.Length   4.6
    5  setosa Sepal.Length   5.0
    6  setosa Sepal.Length   5.4

Pure long data contains only one numeric variable; all other variables are categorical.
Pure wide data contains no category variables; all of its variables are numeric.
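Item 3 compares dcast with aggregate but shows no aggregate call. The following is a minimal sketch in base R, not taken from the original article. The names means_by_species, iris2, wide_sepal, and means_2d are hypothetical, and the threshold of 3 is chosen only to create a second grouping variable.

```r
# Same two columns as in the text: Petal.Width and Species.
subdata <- iris[, 4:5]

# One grouping variable: mean petal width per species.
# The values match the colMeans() result shown in the text
# (0.246, 1.326, 2.026).
means_by_species <- aggregate(Petal.Width ~ Species, data = subdata, FUN = mean)
means_by_species

# Two grouping variables. wide_sepal is a made-up flag used only to
# illustrate grouping along a second dimension.
iris2 <- transform(iris, wide_sepal = Sepal.Width > 3)
means_2d <- aggregate(Petal.Width ~ Species + wide_sepal, data = iris2, FUN = mean)
means_2d
```

With two grouping variables, aggregate returns one row per combination (a long layout), whereas a dcast formula such as Species ~ wide_sepal would place the second variable across the columns.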

Real data is mostly mixed, as in the iris dataset.

5. The following example summarizes the long data generated above:

    dcast(data = iris_long, formula = Species ~ variable, value.var = 'value', fun.aggregate = mean)
         Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    1     setosa        5.006       3.428        1.462       0.246
    2 versicolor        5.936       2.770        4.260       1.326
    3  virginica        6.588       2.974        5.552       2.026

The prerequisite for using dcast is that the data already contains classification variables, such as sex or smoker, by which the data is grouped in order to compute the value of a numeric variable.

6. A small exercise: the tips dataset. A restaurant waiter collected data about tips. It contains seven variables: total bill, tip amount, sex of the payer, whether the party smoked, day of the week, time of day, and party size. Determine whether customers of different sexes pay different tip ratios.
You can group the data by the sex variable.
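The exercise asks about the tip ratio (tip divided by total bill), which has to be computed before grouping. The sketch below uses base R and a tiny made-up data frame, tips_demo, so that it runs without extra packages; its four rows are illustrative values, not real observations. The column names total_bill, tip, and sex follow the tips data shipped with reshape2, and tip_ratio is a hypothetical new column.

```r
# Made-up illustrative rows; replace tips_demo with reshape2's tips
# data to run the real exercise.
tips_demo <- data.frame(
  total_bill = c(10, 20, 40, 25),
  tip        = c(1, 3, 8, 5),
  sex        = c("Female", "Male", "Female", "Male")
)

# Tip ratio per bill (hypothetical column name).
tips_demo$tip_ratio <- tips_demo$tip / tips_demo$total_bill

# Mean tip ratio for each sex.
ratio_by_sex <- aggregate(tip_ratio ~ sex, data = tips_demo, FUN = mean)
ratio_by_sex
```

Averaging the per-bill ratios answers a slightly different question from dividing total tips by total bills per group; which one is appropriate depends on how the exercise is read.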
The first call below computes the mean tip amount by sex alone. Alternatively, the second call divides the data by both the sex and size variables:

    dcast(tips, sex ~ ., value.var = 'tip', fun.aggregate = mean)
    dcast(tips, sex ~ size, value.var = 'tip', fun.aggregate = mean)

7. Merging two data frames: combine them into one data frame by ID number.
You cannot use cbind for this, because the IDs are not in the same order in the two data frames.
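Why cbind fails here can be seen in a short base R sketch. The two data frames mirror the example values used in this section; the column name name in datay is an assumption, and bound and aligned are hypothetical names.

```r
datax <- data.frame(id = c(1, 2, 3), gender = c(23, 34, 41))
# The column name 'name' is assumed for illustration.
datay <- data.frame(id = c(3, 1, 2), name = c("Tom", "John", "Ken"))

# cbind() glues the rows together purely by position.
bound <- cbind(datax, datay)
bound

# Column 1 is datax$id and column 3 is datay$id; they disagree in
# every row, so each row mixes two different IDs.
bound[[1]] == bound[[3]]

# One manual workaround: look up the matching rows with match()
# before binding.
aligned <- cbind(datax, name = datay$name[match(datax$id, datay$id)])
aligned
```

The match() lookup works for a one-to-one relationship between IDs; it is shown only to make the alignment step explicit.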
Use the merge function to combine the two data frames by ID, which is similar to a join in database operations:

    datax <- data.frame(id = c(1, 2, 3), gender = c(23, 34, 41))
    datay <- data.frame(id = c(3, 1, 2), name = c('Tom', 'John', 'Ken'))  # column name lost in the source; 'name' is assumed
    merge(datax, datay, by = 'id')

8. Splitting data by a variable: conventional data splitting is really just taking a subset, which can be done with the subset function.
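A minimal sketch of conventional splitting with subset, using the built-in iris data. The names setosa and wide_petals and the threshold of 2 are chosen here for illustration only.

```r
# Keep only the rows of one species.
setosa <- subset(iris, Species == "setosa")
nrow(setosa)  # 50

# Combine a row condition with a column selection.
wide_petals <- subset(iris, Petal.Width > 2,
                      select = c(Petal.Width, Species))
head(wide_petals)
```

The first argument after the data is a logical condition evaluated inside the data frame, and select picks the columns to keep.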
Unconventional data splitting divides the data according to a category variable.
For example, to split the iris data by species, use the split function:

    iris_splited <- split(iris, f = iris$Species)
    class(iris_splited)  # the result of the split is a list
    [1] "list"
    > head(iris_splited[[1]])

9. Merging data by a variable: the split function divides one data frame into several data frames that are stored in a list object.
Merging that list back together requires only the unsplit function:

    unsplit(iris_splited, f = iris$Species)
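The split and unsplit steps can be combined into a round trip, sketched here in base R. The names group_sizes, group_means, restored, and same_values are hypothetical and used only for this illustration.

```r
# Split iris into a list with one data frame per species.
iris_splited <- split(iris, f = iris$Species)
names(iris_splited)  # "setosa" "versicolor" "virginica"

# Each list element can be processed separately.
group_sizes <- sapply(iris_splited, nrow)
group_means <- sapply(iris_splited, function(d) mean(d$Petal.Width))

# unsplit() needs the same factor that was used for the split so that
# every row can be put back in its original position.
restored <- unsplit(iris_splited, f = iris$Species)

# Compare the restored columns with the original, column by column.
same_values <- all(mapply(function(a, b) all(a == b), restored, iris))
same_values
```

Applying a function to each element of the split list is the same split-then-compute idea that dcast and aggregate wrap in a single call.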

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
