R Data Conversion: The plyr Package

Source: Internet
Author: User
Introduction to the plyr Package

The plyr package was written by Hadley Wickham to solve the split-apply-combine problem. Its motivation is to offer a solution that goes beyond for loops and the built-in apply family of functions. With plyr, the three steps of splitting, applying, and combining are completed in a single function call, which keeps the code concise and consistent across different data types.
plyr is especially well suited to problems on large datasets that break naturally into pieces, such as fitting a model at each location in spatial data or at each time point in panel data, or exploring high-dimensional arrays.

The author of this package, Hadley Wickham, is one of the leading figures in the R community. He has written 17 R packages, including ggplot2, currently one of the most popular statistical graphics packages. He describes his research interests as developing tools that simplify data analysis, especially methods for cleaning, organizing, and exploring data that go beyond traditional statistics.

To illustrate the features and advantages of the plyr package, consider the two examples below:

(1) For simple problems, plyr and the apply functions give the same result.
> m <- matrix(c(1:4, 1, 4, 1:6), ncol = 3)
> apply(m, 1, mean)
[1] 1.666667 3.333333 3.000000 4.000000
> aaply(m, 1, mean)
       1        2        3        4
1.666667 3.333333 3.000000 4.000000

(2) Here is a more complicated one.
Using the iris dataset, fit a linear regression model for each species and collect the results.

> attach(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

First define the regression model:
> model <- function(x) { lm(Sepal.Length ~ Sepal.Width, data = x) }
With the apply family, you have to split, compute, and combine separately (at least three statements are required):
> pieces <- split(iris, list(iris$Species))
> models <- lapply(pieces, model)
# These two statements can also be replaced by: models <- by(iris[, 1:4], Species, model)
> result <- lapply(models, coef)
> do.call('rbind', result)

With the plyr package, only two statements are needed:
> result1 <- dlply(iris, .(Species), model)
> result2 <- ldply(result1, function(x) coef(x))

The output of these statements is omitted here for lack of space.

Using the plyr Package

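(1) Naming rules
The basic set of plyr functions is as follows (as of version 1.7.1):

Input \ Output       array   data frame   list    discarded
array                aaply   adply        alply   a_ply
data frame           daply   ddply        dlply   d_ply
list                 laply   ldply        llply   l_ply
n replicates         raply   rdply        rlply   r_ply
function arguments   maply   mdply        mlply   m_ply
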
Naming rule: the first three rows are the basic types.
Names depend on the input and output types: a = array, d = data frame, l = list, and _ means the output is discarded. The first letter stands for the input and the second letter for the output.
The last two rows correspond to the replicate and mapply functions of the apply family, covering n repetitions and functions with multiple arguments; here too the second letter gives the output type.
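
The naming rule can be tried out with a small sketch. It assumes the plyr package is installed; the list `pieces` and the helper `stat` are made up for illustration and are not part of plyr.

```r
library(plyr)

# The input is a list in every call; only the second letter (the output) changes.
pieces <- list(a = 1:3, b = 4:6)                          # hypothetical input
stat <- function(x) c(total = sum(x), avg = mean(x))      # hypothetical helper

as_list  <- llply(pieces, stat)   # list in, list out
as_frame <- ldply(pieces, stat)   # list in, data frame out
as_array <- laply(pieces, stat)   # list in, array out (a 2 x 2 matrix here)
l_ply(pieces, function(x) stopifnot(is.numeric(x)))  # list in, output discarded

# r*ply mirrors replicate(): the expression is evaluated n times.
draws <- rlply(5, mean(runif(10)))

# m*ply mirrors mapply(): one call per row of a table of arguments.
sums <- mdply(data.frame(x = 1:3, y = 4:6),
              function(x, y) data.frame(total = x + y))
```

Only the function name changes between the calls; the data and the processing function stay the same.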
Given this naming scheme, there is no need to go through every function one by one; it is enough to discuss the inputs and the outputs separately.

(2) Parameter description

These functions take two or three main parameters, depending on the input type:
a*ply(.data, .margins, .fun, ..., .progress = "none")
d*ply(.data, .variables, .fun, ..., .progress = "none")
l*ply(.data, .fun, ..., .progress = "none")
The .data parameter is the dataset to be split, processed, and combined; .margins or .variables describes how to split it.
The .fun parameter is the function applied to each piece, and any further arguments (...) are passed on to that function; .progress controls the display of a progress bar.
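
A minimal sketch of the three call shapes, assuming plyr is installed. The objects `m` and `values` are illustrative only.

```r
library(plyr)

# a*ply: .margins picks the dimension(s) to split on (1 = rows).
m <- matrix(1:6, nrow = 2)
row_sums <- aaply(m, 1, sum)

# d*ply: .variables picks the grouping column(s).
group_means <- ddply(iris, .(Species), function(df) {
  data.frame(mean_sepal_length = mean(df$Sepal.Length))
})

# l*ply: no splitting argument; arguments after .fun are passed on to .fun.
values <- list(c(1, NA, 3), c(4, 5, NA))
means <- laply(values, mean, na.rm = TRUE)

# .progress = "text" would draw a text progress bar; "none" is the default.
lengths_list <- llply(values, length, .progress = "none")
```

Here `na.rm = TRUE` is not a plyr parameter; it travels through `...` to `mean`.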

(3) Input

There are three types of input, and each one is split in a different way.
In short:
a*ply(): arrays (including matrices and vectors) are split by dimension into lower-dimensional slices.
d*ply(): a data frame is split into subsets by combinations of variables.
l*ply(): each element of the list is one piece.

How the input is split therefore depends not on the structure of the data but on the function used.
An object split with a*ply() must respond to dim() and accept multidimensional indexing; one split with d*ply() must work with split() and be coercible to a list; one split with l*ply() must work with length() and [[.
So a data frame can be passed to a*ply(), where it is treated like a 2-D matrix, or to l*ply(), where it is treated as a list of vectors.
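
The sketch below passes one small data frame to all three families. It assumes plyr is installed; `df` and the anonymous functions are illustrative.

```r
library(plyr)

df <- data.frame(x = 1:3, y = c(10, 20, 30))

# l*ply sees the data frame as a list of column vectors.
col_means <- laply(df, mean)

# a*ply sees it as a 2-D object; .margins = 1 splits it row by row.
row_totals <- adply(df, 1, function(row) data.frame(total = row$x + row$y))

# d*ply splits it into subsets by the values of a variable.
by_x <- dlply(df, .(x), nrow)
```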

Features of the three input types:
(a) Input array (a*ply())
a*ply() splits according to the .margins parameter, which works like the MARGIN argument of apply.
For 2-D arrays, .margins can be 1, 2, or c(1, 2), giving the three ways to split a 2-D array: by rows, by columns, or by individual cells.

For 3-D arrays, there are 7 ways to split: .margins can be 1, 2, 3, c(1, 2), c(1, 3), c(2, 3), or c(1, 2, 3).

For higher-dimensional arrays, the number of possible .margins combinations grows explosively.
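
The count of possible splits can be checked with a short sketch: every non-empty subset of the dimensions is a valid .margins value, so a d-dimensional array has 2^d - 1 of them. The array `arr` is made up for illustration, and plyr is assumed to be installed.

```r
library(plyr)

arr <- array(1:24, dim = c(2, 3, 4))

# Enumerate every non-empty subset of the dimensions with base R's combn().
n_dims <- length(dim(arr))
margins <- unlist(
  lapply(seq_len(n_dims), function(k) combn(n_dims, k, simplify = FALSE)),
  recursive = FALSE
)
n_ways <- length(margins)  # 2^3 - 1 = 7 for a 3-D array

# Splitting on dimension 3 gives four 2 x 3 slices, one sum per slice.
slice_sums <- aaply(arr, 3, sum)

# Splitting on dimensions 1 and 2 gives six length-4 vectors.
cell_sums <- aaply(arr, c(1, 2), sum)
```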

(b) Input data frame (d*ply())

When using d*ply(), you must specify the variables, or functions of variables, used for grouping; these are evaluated first, and the whole data frame is then split by their values.
There are several ways to specify them:
• .(var1) groups the data frame by the values of the variable var1.
• Multiple variables, .(a, b, c), group it by the interaction of the values of the three variables.

The output in this case is somewhat more involved. If the output is an array, it has three dimensions, with the values of a, b, and c as the dimension names. If the output is a data frame, it contains three additional columns holding the values of a, b, and c. If the output is a list, each element is named by the values of a, b, and c separated by periods.

• A character vector of column names: c("var1", "var2").
• A formula: ~ var1 + var2.
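
The grouping forms listed above can be compared on the built-in mtcars dataset. This is a sketch assuming plyr is installed; the helper `mean_mpg` is hypothetical, and two grouping variables are used instead of three to keep it short.

```r
library(plyr)

mean_mpg <- function(df) data.frame(mean_mpg = mean(df$mpg))  # hypothetical helper

# Three ways to group mtcars by cyl and gear.
by_quoted  <- ddply(mtcars, .(cyl, gear), mean_mpg)
by_names   <- ddply(mtcars, c("cyl", "gear"), mean_mpg)
by_formula <- ddply(mtcars, ~ cyl + gear, mean_mpg)

# .() also accepts a function of a variable.
by_weight <- ddply(mtcars, .(round(wt)), mean_mpg)

# The same grouping with the other output types.
counts_array <- daply(mtcars, .(cyl, gear), nrow)  # 2-D array indexed by cyl and gear
counts_list  <- dlply(mtcars, .(cyl, gear), nrow)  # element names join the values with "."
```

The data frame output carries cyl and gear as extra columns, the array uses them as dimensions, and the list encodes them in its element names.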


(c) Input list (l*ply())
l*ply() needs no argument describing how to split, because a list is naturally split by its elements. Using l*ply() has the same effect as using a*ply() on a one-dimensional array.
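
A small sketch of that equivalence, assuming plyr is installed; `squares` is an illustrative helper.

```r
library(plyr)

squares <- function(x) x^2  # hypothetical helper

# A list is split element by element, so no splitting argument is needed.
from_list <- laply(list(1, 2, 3), squares)

# Splitting a vector along its only dimension yields the same pieces.
from_vector <- aaply(c(1, 2, 3), 1, squares)
```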

Original: http://site.douban.com/182577/widget/notes/10567181/note/246634257/
