R Study Notes (6): A Beginner's Guide to R, accessing variables and working with subsets of data

Source: Internet
Author: User

Note: Save your workspace before closing R so that you can pick up where you left off. The examples below assume that the data and variables created by the console commands in the previous notes are still in memory.

1 Accessing Data frame variables

Recommendation: after loading the data with read.table, run names to see which variables are available to work with.

names(Squid)
[1] "Sample"   "Year"     "Month"    "Location" "Sex"      "GSI"

1.1 str function

The str function shows the properties of each variable in the data frame:

str(Squid)
'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...

  

Sample, Year, Month, Location and Sex are integer variables.

GSI is a numeric variable.

GSI exists only inside the data frame Squid, so typing GSI at the R console does not display it:

GSI
Error: object 'GSI' not found


1.2 The data argument in a function: the best way to access variables in a data frame

M1 <- lm(GSI ~ factor(Location) + factor(Year), data = Squid)
M1

Call:
lm(formula = GSI ~ factor(Location) + factor(Year), data = Squid)

Coefficients:
      (Intercept)  factor(Location)2  factor(Location)3  factor(Location)4
           1.3939            -2.2178            -0.1417             0.3138
    factor(Year)2      factor(Year)3      factor(Year)4
           1.3548             0.9564             1.2270

lm is the linear regression function; data = Squid tells it to take the variables from the data frame Squid.

The data argument is not available in every function, for example:

mean(GSI, data = Squid)
Error in mean(GSI, data = Squid) : object 'GSI' not found

1.3 The $ sign

The $ sign is another way to access a variable:

Squid$GSI
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745

Or, by column position:

Squid[, 6]
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745

With either form the average can now be calculated, e.g. mean(Squid$GSI).
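The access styles above can be compared side by side. This is only a small sketch: the data frame toy and its values are made up for illustration and stand in for Squid.

```r
# Toy stand-in for the Squid data frame (values are made up for illustration)
toy <- data.frame(Sample = 1:4,
                  Sex    = c(2, 2, 1, 1),
                  GSI    = c(10.4, 9.8, 5.3, 3.5))

by_name  <- toy$GSI       # access by name with $
by_index <- toy[, 3]      # access by column position
by_label <- toy[, "GSI"]  # access by column name inside the brackets

# All three return the same numeric vector, so mean() works on any of them
gsi_mean <- mean(by_name)
same <- identical(by_name, by_index) && identical(by_name, by_label)
```

Accessing by name is usually the safer habit, since a column position changes as soon as a column is added or removed.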

1.4 The attach function

The attach function adds a data frame to R's search path, after which typing GSI displays the GSI data directly:

attach(Squid)
GSI
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745

Functions can now use the variable directly:

boxplot(GSI)

(Honestly, I can't make sense of this plot.)

When using attach, take care that the variable names are unique. If a name clashes with one of R's own functions or with an existing variable, one object masks the other and the wrong one may be used without any error.

Summary of using attach:

(1) To avoid duplicated variables, do not run attach on the same data frame more than once

(2) When using attach, make sure the variable names are unique

(3) If you work with several datasets but only one at a time, use the detach function to remove a dataset from R's search path before attaching the next
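The attach and detach cycle from the summary can be sketched as follows. The data frame toy is made up, and the final with() call is not part of the original notes; it is shown only as a base R alternative that avoids touching the search path.

```r
toy <- data.frame(GSI = c(10.4, 9.8, 5.3))   # made-up values

attach(toy)                                  # toy is now on the search path
on_path_before <- "toy" %in% search()
m <- mean(GSI)                               # GSI is found without the toy$ prefix
detach(toy)                                  # remove it before attaching anything else
on_path_after <- "toy" %in% search()

# Attach-free alternative: with() evaluates the expression inside toy
m_with <- with(toy, mean(GSI))
```

Checking search() before and after is a quick way to confirm that nothing is left attached by accident.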

2 Accessing subsets of data

First run detach(Squid)!

View the values of Sex in Squid:

Squid$Sex
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1

  

Show the distinct values:

unique(Squid$Sex)
[1] 2 1

Here 1 denotes male and 2 denotes female.

Sel <- Squid$Sex == 1
SquidM <- Squid[Sel, ]
SquidM
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
48     48    1     5        3   1 4.2968
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487
61     61    1     6        1   1 3.2304

Sel <- Squid$Sex == 1 creates a vector with the same length as Sex. Each element is TRUE where Sex equals 1 and FALSE otherwise. Such a variable is called a Boolean (logical) vector and can be used to select rows.

SquidM <- Squid[Sel, ] selects the rows of Squid for which Sel is TRUE and stores them in SquidM. Because rows are being selected, Sel goes in the first position inside the square brackets, before the comma.
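The two steps just described can be tried on a small made-up data frame (toy and its values are illustrative only, not the squid data):

```r
toy <- data.frame(Sample = 1:5,
                  Sex    = c(2, 1, 2, 1, 1),
                  GSI    = c(10.4, 5.3, 9.8, 4.3, 3.5))

Sel  <- toy$Sex == 1     # logical vector, one element per row of toy
toyM <- toy[Sel, ]       # keep the rows where Sel is TRUE, and all columns

n_selected <- sum(Sel)   # TRUE counts as 1, so this is the number of rows kept
```

Note that the original row names are kept in toyM, which is why the SquidM output above starts at row 24 rather than row 1.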

Chapter 3, to be continued ...

Continuing:

Get the female data:

SquidF <- Squid[Squid$Sex == 2, ]
SquidF
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107
5      5    1     1        1   2  8.9926

The next few commands need little explanation:

unique(Squid$Location)
Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3, ]
Squid123 <- Squid[Squid$Location != 4, ]
Squid123 <- Squid[Squid$Location < 4, ]
Squid123 <- Squid[Squid$Location <= 3, ]
Squid123 <- Squid[Squid$Location >= 1 & Squid$Location <= 3, ]

Each of them selects the rows whose Location value is 1, 2 or 3.

unique(Squid$Location)
[1] 1 3 4 2
Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3, ]
Squid123
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107
5      5    1     1        1   2  8.9926
6      6    1     1        1   2  8.7707
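That these conditions pick the same rows can be checked on made-up data. The %in% line is an addition of mine rather than something from the original notes; it is a base R operator that tests membership in a set and tends to read better when there are many values.

```r
toy <- data.frame(Location = c(1, 3, 4, 2, 1, 4),
                  GSI      = c(10.4, 9.8, 9.7, 9.3, 9.0, 8.8))

a    <- toy[toy$Location == 1 | toy$Location == 2 | toy$Location == 3, ]
b    <- toy[toy$Location != 4, ]
c123 <- toy[toy$Location <= 3, ]
d    <- toy[toy$Location %in% c(1, 2, 3), ]   # membership test

all_same <- identical(a, b) && identical(a, c123) && identical(a, d)
```

The equivalence only holds because the Location values are limited to 1, 2, 3 and 4; with other codes in the data the conditions would differ.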

Get the male rows with a Location value of 1:

SquidM.1 <- Squid[Squid$Sex == 1 & Squid$Location == 1, ]
SquidM.1
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487

Get the male rows with a Location of 1 or 2:

SquidM.12 <- Squid[Squid$Sex == 1 & (Squid$Location == 1 | Squid$Location == 2), ]
SquidM.12
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487

Attention!

SquidM1 <- SquidM[Squid$Location == 1, ]
SquidM1
     Sample Year Month Location Sex    GSI
24       24    1     5        1   1 5.2970
58       58    1     6        1   1 3.5008
...
NA       NA   NA    NA       NA  NA     NA
NA.1     NA   NA    NA       NA  NA     NA
NA.2     NA   NA    NA       NA  NA     NA
NA.3     NA   NA    NA       NA  NA     NA
NA.4     NA   NA    NA       NA  NA     NA
...

Cause analysis:

SquidM, obtained earlier, holds only the male rows, so it has fewer rows than the Boolean vector Squid$Location == 1 has elements. Indexing SquidM with that longer vector produces the NA rows shown above.
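The mismatch is easy to reproduce on a small made-up data frame. The sketch below contrasts the faulty selection with one way to avoid it, namely building the selector from the same data frame that is being subsetted:

```r
toy <- data.frame(Sex      = c(2, 1, 2, 1, 1, 2),
                  Location = c(1, 1, 3, 2, 1, 1))
toyM <- toy[toy$Sex == 1, ]            # 3 male rows

# Wrong: the selector has 6 elements, but toyM has only 3 rows
wrong <- toyM[toy$Location == 1, ]

# Right: the selector comes from toyM itself, so the lengths agree
right <- toyM[toyM$Location == 1, ]

has_na_rows <- anyNA(wrong$Sex)        # TRUE positions beyond row 3 become NA rows
```

R does not raise an error or warning for the wrong version, which is why this mistake is easy to miss in a long output.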

2.1 Sorting Data

Ord1 <- order(Squid$Month)
Squid[Ord1, ]
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107

This sorts the rows by Month.

A single variable can be reordered in the same way:

Squid$GSI[Ord1]
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.3882  6.0726  5.7757  1.2610  1.1997  0.8373  0.6716
[17]  0.5758  0.5518  0.4921  0.4808  0.3828  0.3289  0.2758  0.2506
[25]  0.2092  0.1792  0.1661  0.1618  0.1543  0.1541  0.1490  0.1379
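What order actually returns is a vector of row positions, which a tiny made-up example makes visible. The second call, with a tie-breaking variable, goes slightly beyond the notes and is included only as an illustration.

```r
toy <- data.frame(Month = c(3, 1, 2, 1),
                  GSI   = c(5.3, 3.5, 9.8, 10.4))

Ord1 <- order(toy$Month)            # row positions that put Month in ascending order
sorted <- toy[Ord1, ]               # reorder whole rows
gsi_by_month <- toy$GSI[Ord1]       # reorder a single variable the same way

# Ties in Month are broken by descending GSI
Ord2 <- order(toy$Month, -toy$GSI)
```

With a single key, rows that tie keep their original relative order, so rows 2 and 4 (both Month 1) stay in that sequence in Ord1.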

3 Combining two datasets with a common identifier

setwd("E:/r/r-beginer-guide/data/rbook")
Sql1 <- read.table(file = "Squid1.txt", header = TRUE)
Sql2 <- read.table(file = "Squid2.txt", header = TRUE)
SquidMerged <- merge(Sql1, Sql2, by = "Sample")
SquidMerged
  Sample     GSI Year Month Location Sex
1      1 10.4432    1     1        1   2
2      2  9.8331    1     1        3   2
3      3  9.7356    1     1        1   2
4      5  8.9926    1     1        1   2
5      6  8.7707    1     1        1   2
6      7  8.2576    1     1        1   2

The merge command takes the two data frames Sql1 and Sql2 as arguments and uses the variable Sample as the identifier for matching rows between them. merge also has an all option, which is FALSE by default: a row whose Sample value is missing from either Sql1 or Sql2 is dropped. If all is set to TRUE, such rows are kept and NA values fill the columns that have no match.

Sql11 <- read.table(file = "Squid1.txt", header = TRUE)
Sql21 <- read.table(file = "Squid2.txt", header = TRUE)
SquidMerged1 <- merge(Sql11, Sql21, by = "Sample")
SquidMerged1

  

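No NA values appear here. Note, though, that this call still uses the default all = FALSE, so unmatched samples are dropped silently rather than shown as NA; all = TRUE has to be passed to see them.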
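The effect of the all option is easiest to see on two small made-up tables, where one sample is missing from each side (sq1, sq2 and their values are illustrative only):

```r
# Sample 4 is absent from sq2, Sample 5 is absent from sq1
sq1 <- data.frame(Sample = c(1, 2, 3, 4), GSI = c(10.4, 9.8, 9.7, 9.3))
sq2 <- data.frame(Sample = c(1, 2, 3, 5), Sex = c(2, 2, 2, 1))

inner <- merge(sq1, sq2, by = "Sample")               # default all = FALSE
outer <- merge(sq1, sq2, by = "Sample", all = TRUE)   # keep unmatched rows

dropped <- setdiff(outer$Sample, inner$Sample)        # samples lost by the default
```

Comparing the row counts of the two results is a simple check for whether the default merge has discarded anything.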

4 Output data

Write data to an ASCII file with write.table:

write.table(SquidM, file = "Malesquid_wujiahua.txt", sep = " ", quote = FALSE, append = FALSE, na = "NA")

  

Check the working directory: a file named Malesquid_wujiahua.txt has been created.

Open it:

Sample Year Month Location Sex GSI
24 24 1 5 1 1 5.297
48 48 1 5 3 1 4.2968
58 58 1 6 1 1 3.5008
60 60 1 6 1 1 3.2487
61 61 1 6 1 1 3.2304

  

Description:

The first argument of write.table is the data to be written and the second is the name of the output file. sep = " " separates the values with a space, quote = FALSE omits the quotation marks around character strings, and na = "NA" writes missing values as NA. append = TRUE adds the data to the end of an existing file instead of overwriting it.
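A round trip through a temporary file shows what these arguments do. This is a sketch with made-up values; the tempfile path is used here only so that nothing is written to the working directory.

```r
toyM <- data.frame(Sample = c(24, 48), Sex = c(1, 1), GSI = c(5.297, NA))

out_file <- tempfile(fileext = ".txt")   # temporary path, for illustration only
write.table(toyM, file = out_file, sep = " ", quote = FALSE,
            append = FALSE, na = "NA")

lines <- readLines(out_file)             # the raw text that was written
back  <- read.table(out_file, header = TRUE)
```

Because row names are written by default, the header line has one field fewer than the data lines, and read.table uses that to recognise the first column as row names.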

5 Recoding categorical variables

str(Squid)
'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...

Sex and Location take only a fixed set of values, so they are categorical variables.

It is generally a good idea to create new variables in the data frame for the categorical versions:

Squid$fLocation <- factor(Squid$Location)
Squid$fSex <- factor(Squid$Sex)
Squid$fLocation
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 ...
Squid$fSex
   [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
  [36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1
 ...
[2591] 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 2 2 1 1 1 1
[2626] 1 1 1 1 1 2 1 1 1 2 1 2 1 2 1 2 1 1
Levels: 1 2

fLocation and fSex are nominal variables; the f prefix is a reminder that they are factors.

The levels 1 and 2 can be relabeled:

Squid$fSex <- factor(Squid$Sex, levels = c(1, 2), labels = c("M", "F"))
Squid$fSex
   [1] F F F F F F F F F F F F F F F F F F F F F F F M F F F F F F F F F F F
  [36] F F F F F F F F F F F F M F F F F F F F F F M F M M M M F M M M M M M
 ...
Levels: M F

Every 1 is now shown as M and every 2 as F.

Using the recoded factor variables:

boxplot(GSI ~ fSex, data = Squid)

M1 <- lm(GSI ~ fSex + fLocation, data = Squid)
M1

Call:
lm(formula = GSI ~ fSex + fLocation, data = Squid)

Coefficients:
(Intercept)        fSexF   fLocation2   fLocation3   fLocation4
     1.3593       2.0248      -1.8552      -0.1425       0.5876

summary(M1)

Call:
lm(formula = GSI ~ fSex + fLocation, data = Squid)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.35926    0.07068  19.230   <2e-16 ***
fSexF        2.02481    0.09427  21.479   <2e-16 ***
fLocation2  -1.85525    0.20027  -9.264   <2e-16 ***
fLocation3  -0.14248    0.12657  -1.126   0.2604
fLocation4   0.58756    0.34934   1.682   0.0927 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared:  0.1759,    Adjusted R-squared:  0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16

  


M2 <- lm(GSI ~ factor(Sex) + factor(Location), data = Squid)
summary(M2)

Call:
lm(formula = GSI ~ factor(Sex) + factor(Location), data = Squid)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.35926    0.07068  19.230   <2e-16 ***
factor(Sex)2       2.02481    0.09427  21.479   <2e-16 ***
factor(Location)2 -1.85525    0.20027  -9.264   <2e-16 ***
factor(Location)3 -0.14248    0.12657  -1.126   0.2604
factor(Location)4  0.58756    0.34934   1.682   0.0927 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared:  0.1759,    Adjusted R-squared:  0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16

  

The estimated parameters are identical, but the second form takes up more screen space, which is said to become a serious nuisance once second- and third-order interaction terms are involved.
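The same comparison can be run on simulated data. Everything in this sketch (the data frame toy, the seed, the random GSI values) is made up; it only illustrates that the two ways of writing the model differ in coefficient names, not in estimates.

```r
set.seed(1)  # arbitrary seed so the made-up data are reproducible
toy <- data.frame(Sex      = rep(c(1, 2), times = 10),
                  Location = rep(1:4, each = 5),
                  GSI      = rnorm(20, mean = 3))

toy$fSex      <- factor(toy$Sex)
toy$fLocation <- factor(toy$Location)

M1 <- lm(GSI ~ fSex + fLocation, data = toy)                 # factors created once
M2 <- lm(GSI ~ factor(Sex) + factor(Location), data = toy)   # factors created inline

same_estimates <- isTRUE(all.equal(unname(coef(M1)), unname(coef(M2))))
names_M1 <- names(coef(M1))   # short names such as "fSex2"
names_M2 <- names(coef(M2))   # longer names such as "factor(Sex)2"
```

Creating the factor once in the data frame also means the level order and labels are defined in a single place rather than repeated in every formula.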

Squid$fLocation
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
 ...
[2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 1 2 3 4

  

The order of the levels can be changed:

Squid$fLocation <- factor(Squid$Location, levels = c(2, 3, 1, 4))
Squid$fLocation
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
  [71] 1 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 1 3 1
 ...
[2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 2 3 1 4

boxplot(GSI ~ fLocation, data = Squid)
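The effect of the levels and labels arguments can be checked on two short made-up vectors (loc and sex are illustrative names, not part of the squid data):

```r
loc <- c(1, 3, 1, 4, 2)   # made-up location codes
sex <- c(2, 2, 1, 2, 1)   # 1 = male, 2 = female, as in the text

fLoc  <- factor(loc)                          # default: levels in sorted order
fLoc2 <- factor(loc, levels = c(2, 3, 1, 4))  # same data, custom level order
fSex  <- factor(sex, levels = c(1, 2), labels = c("M", "F"))

lev_default <- levels(fLoc)
lev_custom  <- levels(fLoc2)
sex_labels  <- as.character(fSex)
```

Reordering the levels leaves the data values untouched; it only changes the order in which the groups appear, for example along the axis of a boxplot or as the baseline in a model.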

  

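Attention:

SquidM <- Squid[Squid$Sex == 1, ]
SquidM <- Squid[Squid$fSex == "1", ]

Once fSex has been defined with factor(Squid$Sex), so that its levels are "1" and "2", these two commands have the same effect.

Write the 1 in double quotes in the second command, because fSex is a factor and its levels are compared as character strings. If fSex was created with the labels "M" and "F", the comparison has to be Squid$fSex == "M" instead.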
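Both cases can be reproduced on a few made-up rows. This is a sketch under the assumption stated above, that the first factor is created without labels and the second with the M/F labels:

```r
toy <- data.frame(Sex = c(2, 1, 2, 1))   # made-up codes, 1 = male
toy$fSex <- factor(toy$Sex)              # levels "1" and "2"

byNumber <- toy[toy$Sex == 1, ]          # compare the numeric column
byLevel  <- toy[toy$fSex == "1", ]       # compare the factor with a quoted level
same_rows <- identical(byNumber, byLevel)

toy$fSex <- factor(toy$Sex, levels = c(1, 2), labels = c("M", "F"))
byLabel <- toy[toy$fSex == "M", ]        # after relabeling, compare with the label
noMatch <- toy[toy$fSex == "1", ]        # "1" is no longer a level: zero rows
```

The last line is the trap to watch for: after relabeling, the old comparison does not fail, it simply returns an empty data frame.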

After defining the new variables, you can also inspect them with the str command:

Squid$fSex <- factor(Squid$Sex, labels = c("M", "F"))
Squid$fLocation <- factor(Squid$Location)
str(Squid)
'data.frame':   2644 obs. of  8 variables:
 $ Sample   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month    : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location : int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex      : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI      : num  10.44 9.83 9.74 9.31 8.99 ...
 $ fLocation: Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 3 3 1 ...
 $ fSex     : Factor w/ 2 levels "M","F": 2 2 2 2 2 2 2 2 2 2 ...

Chapter 3 summary:

write.table: writes a variable to an ASCII file, e.g. write.table(Squid, file = "test.txt")

order: determines the sort order of the data, e.g. order(x)

merge: merges two data frames, e.g. merge(a, b, by = "ID")

str: displays the internal structure of an object, e.g. str(Squid)

factor: defines a variable as a factor, e.g. factor(x)
