Data Reshaping in R

Source: Internet
Author: User
Data frames analyzed in statistics usually come in one of two forms:

(1) Long data (stacked data): all variable values sit in one column, and the corresponding variable names sit in another column.
(2) Wide data (unstacked data): the values are generally of the same type, and each variable occupies its own column. This is the more commonly used form.

1. For example, the subset made of the first four columns of iris is typical wide data. Converting it from wide to long with stack():

    > data_w <- iris[, 1:4]
    > data_l <- stack(data_w)
    > head(data_l)
      values          ind
    1    5.1 Sepal.Length
    2    4.9 Sepal.Length
    3    4.7 Sepal.Length
    4    4.6 Sepal.Length
    5    5.0 Sepal.Length
    6    5.4 Sepal.Length

unstack() converts it back:

    > data_w <- unstack(data_l)

As long as one column holds a categorical variable, the data can be viewed as long data.
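A minimal base-R sketch of the round trip just described: stack() turns the four iris measurement columns into long data, unstack() turns them back, and a final check shows what unstack() does when the groups are not all the same size (it returns a list rather than a data frame). The names data_back, same and uneven are illustrative only.

```r
# Wide -> long -> wide round trip with base R only.
data_w <- iris[, 1:4]         # wide: one numeric column per measurement
data_l <- stack(data_w)       # long: columns "values" and "ind"

# For stacked data, unstack() defaults to the formula values ~ ind.
data_back <- unstack(data_l)

# Compare column by column (matched by name) with the original wide data.
same <- all(data_back[names(data_w)] == data_w)
print(same)

# With groups of unequal size no rectangular data frame is possible,
# so unstack() returns a plain list of vectors instead.
uneven <- data_l[-1, ]        # drop the first Sepal.Length value
print(class(unstack(uneven)))
```

Matching columns by name rather than position keeps the check independent of the level order of the ind factor.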

In the iris data set itself, the first four columns can be regarded as wide data, while the last two columns (Petal.Width together with Species) can be regarded as long data.

2. Long data can be converted to wide format according to the Species variable, after which the mean of each species is easy to compute:

    > subdata <- iris[, 4:5]
    > head(subdata)      # long data
      Petal.Width Species
    1         0.2  setosa
    2         0.2  setosa
    3         0.2  setosa
    4         0.2  setosa
    5         0.2  setosa
    6         0.4  setosa
    > data_w <- unstack(subdata)
    > head(data_w)       # wide data
      setosa versicolor virginica
    1    0.2        1.4       2.5
    2    0.2        1.5       1.9
    3    0.2        1.5       2.1
    4    0.2        1.3       1.8
    5    0.2        1.5       2.2
    6    0.4        1.3       2.1
    > colMeans(data_w)   # column means
        setosa versicolor  virginica 
         0.246      1.326      2.026 

3. Reshaping and computing in one step
In the example above, the data were reshaped first and the result computed afterwards. More commonly, the result is obtained directly, for example with dcast() from the reshape2 package:

    > library(reshape2)
    > dcast(data = subdata,             # object to analyze
    +       formula = Species ~ .,      # how the data are grouped
    +       value.var = "Petal.Width",  # numeric variable to compute on
    +       fun.aggregate = mean)       # function used for the calculation
    1     setosa 0.246
    2 versicolor 1.326
    3  virginica 2.026

The aggregate() function works on a very similar idea to dcast(): it splits the data by a variable and then computes on each group. The output format and features of aggregate() are, however, more convenient when there are several grouping conditions.

4. Melting: the melt() function turns wide data into long data.
For example, melting the iris data set:

    > iris_long <- melt(data = iris,           # object to melt
    +                   id.vars = "Species")   # variables that are not melted
    > head(iris_long)
      Species     variable value
    1  setosa Sepal.Length   5.1
    2  setosa Sepal.Length   4.9
    3  setosa Sepal.Length   4.7
    4  setosa Sepal.Length   4.6
    5  setosa Sepal.Length   5.0
    6  setosa Sepal.Length   5.4

Pure long data contains only one numeric variable; all the others are categorical variables.
Pure wide data, by contrast, contains no categorical variables; every variable is numeric.
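The two definitions above can be turned into a small check. The helper data_form() below is hypothetical (it is not part of base R or reshape2): it counts numeric columns and treats every non-numeric column as categorical. The sketch also builds a base-R long version of iris by pairing Species with stack(), which gives essentially the same layout as melt(iris, id.vars = "Species") but with stack()'s column names values and ind.

```r
# Hypothetical helper: label a data frame using the pure long / pure wide
# definitions. Non-numeric columns are assumed to be categorical.
data_form <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  if (all(numeric_cols)) {
    "pure wide"              # every variable is numeric
  } else if (sum(numeric_cols) == 1) {
    "pure long"              # one numeric variable, the rest categorical
  } else if (!any(numeric_cols)) {
    "no numeric variables"
  } else {
    "mixed"                  # several numeric plus categorical variables
  }
}

# Base-R long iris: Species (150 values) is recycled over the 600 stacked rows,
# matching the column-by-column order that stack() produces.
iris_long_base <- data.frame(Species = iris$Species, stack(iris[, 1:4]))

print(data_form(iris[, 1:4]))     # four numeric columns
print(data_form(iris_long_base))  # one numeric column, two factors
print(data_form(iris))            # four numeric columns plus Species
```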

Real data are mostly a mixture of the two, as the iris data set shows.

5. The long data generated above can be summarized as follows:

    > dcast(data = iris_long, formula = Species ~ variable,
    +       value.var = "value", fun.aggregate = mean)
         Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    1     setosa        5.006       3.428        1.462       0.246
    2 versicolor        5.936       2.770        4.260       1.326
    3  virginica        6.588       2.974        5.552       2.026

dcast() presupposes that the data already contain categorical variables, such as sex or smoking status, by which the data are classified before a numeric variable is computed for each class.

6. A small exercise. The tips data set (shipped with reshape2) was collected by a restaurant waiter and contains seven variables: total bill, tip amount, sex of the payer, whether there were smokers in the party, day of the week, time of day, and party size. The task is to work out whether customers of different sexes pay different tip ratios.
You can summarize the data by the sex variable.
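The exercise asks about the tip ratio, whereas averaging the tip column gives the tip amount. One way to get the ratio, sketched here with base R's aggregate() (which the text mentions as an alternative to dcast()), is to compute tip / total_bill for every bill first and then average it by group. The column name tip_rate is a hypothetical choice, and the code assumes reshape2 is installed, since that is where the tips data live.

```r
# Load the tips data from reshape2 (assumes the package is installed).
data("tips", package = "reshape2")

# Per-bill tip ratio (hypothetical column name).
tips$tip_rate <- tips$tip / tips$total_bill

# aggregate(): split by sex, then apply mean() to each group.
rate_by_sex <- aggregate(tip_rate ~ sex, data = tips, FUN = mean)
print(rate_by_sex)

# Two grouping variables give one row per sex/size pair (long output,
# unlike the cross table that dcast() produces).
rate_by_sex_size <- aggregate(tip_rate ~ sex + size, data = tips, FUN = mean)
print(head(rate_by_sex_size))
```

Averaging per-bill ratios weights every bill equally; dividing the summed tips by the summed bills within each group would instead weight large bills more.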
Alternatively, the data can be split by both sex and size. The two dcast() calls below average the tip amount, first by sex alone and then by sex and size:

    > dcast(tips, sex ~ ., value.var = "tip", fun.aggregate = mean)
    > dcast(tips, sex ~ size, value.var = "tip", fun.aggregate = mean)

7. Merging two data frames: combine two data frames into one by ID number.
cbind() cannot be used here, because the IDs are not in the same order in the two data frames.
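A small sketch of why positional binding fails, using two hypothetical toy data frames whose id columns hold the same values in different orders: cbind() pairs rows by position, so the ids no longer line up. Reordering one frame with match() before binding is a base-R workaround for the simple one-to-one case.

```r
# Hypothetical toy data: the same ids, stored in a different row order.
left  <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
right <- data.frame(id = c(3, 1, 2), label = c("c", "a", "b"))

# cbind() pairs rows purely by position: row 1 joins id 1 with id 3.
side_by_side <- cbind(left, right)
print(side_by_side)

# Reorder `right` so its rows follow left$id, then bind positionally.
aligned <- cbind(left, right[match(left$id, right$id), "label", drop = FALSE])
print(aligned)
```

The match() trick assumes every id appears exactly once in each frame; merge() also handles missing or repeated keys.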
The merge() function combines two data sets by ID instead, much like a join in database operations:

    > datax <- data.frame(id = c(1, 2, 3), gender = c(23, 34, 41))
    > datay <- data.frame(id = c(3, 1, 2), name = c("Tom", "John", "Ken"))
    > merge(datax, datay, by = "id")
      id gender name
    1  1     23 John
    2  2     34  Ken
    3  3     41  Tom

8. Splitting data by a variable. Conventional splitting is really just taking a subset, which the subset() function can do.
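For instance, a sketch of a conventional split with subset(), keeping setosa flowers with a sepal length above 5 and only two columns; the threshold and the column choice are arbitrary illustrations. The same rows can be taken with ordinary bracket indexing.

```r
# Rows: setosa flowers with Sepal.Length > 5; columns: two of the five.
big_setosa <- subset(iris,
                     Species == "setosa" & Sepal.Length > 5,
                     select = c(Sepal.Length, Species))
print(head(big_setosa))

# Equivalent selection with bracket indexing.
same_rows <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5,
                  c("Sepal.Length", "Species")]
print(nrow(big_setosa) == nrow(same_rows))
```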
Less conventional splitting is done according to a categorical variable.
For example, to split the iris data by species, use the split() function:

    > iris_splited <- split(iris, f = iris$Species)
    > class(iris_splited)   # the split data are stored as a list
    [1] "list"
    > head(iris_splited[[1]])

9. Merging split data back by a variable. split() breaks one data frame into several data frames and stores them in a list object; merging that list back only requires the unsplit() function:

    > unsplit(iris_splited, f = iris$Species)
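split() and unsplit() are often used together as split-apply-combine: split the data, process each piece with lapply(), then reassemble the rows. A minimal sketch, in which centring Sepal.Length within each species is only an illustrative choice of per-group operation:

```r
# Split: a named list with one data frame per species.
iris_splited <- split(iris, f = iris$Species)

# Apply: centre Sepal.Length within each species.
centred <- lapply(iris_splited, function(d) {
  d$Sepal.Length <- d$Sepal.Length - mean(d$Sepal.Length)
  d
})

# Combine: unsplit() puts each row back at the position given by f.
iris_centred <- unsplit(centred, f = iris$Species)

# Each species' centred lengths now average to (numerically) zero.
print(round(tapply(iris_centred$Sepal.Length, iris_centred$Species, mean), 10))
```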
