Note: Save your workspace before closing R so that you can pick up where you left off. The data and variables created by the console commands in the previous notes are still held in memory.
1 Accessing data frame variables
Recommendation: after running read.table, use the names command to view the variables you are going to work with.
> names(Squid)
[1] "Sample"   "Year"     "Month"    "Location" "Sex"      "GSI"
1.1 str function
The str function shows the properties of each variable in the data frame:
> str(Squid)
'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...
The variables Sample, Year, Month, Location and Sex are integers.
The variable GSI is numeric.
GSI exists only inside the data frame Squid, so typing GSI at the R console does not display it:
> GSI
Error: object 'GSI' not found
1.2 The data argument in a function: the best way to access variables in a data frame
> M1 <- lm(GSI ~ factor(Location) + factor(Year), data = Squid)
> M1

Call:
lm(formula = GSI ~ factor(Location) + factor(Year), data = Squid)

Coefficients:
      (Intercept)  factor(Location)2  factor(Location)3  factor(Location)4
           1.3939            -2.2178            -0.1417             0.3138
    factor(Year)2      factor(Year)3      factor(Year)4
           1.3548             0.9564             1.2270
lm is the linear regression function; data = Squid tells it to take the variables from the data frame Squid.
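As a minimal sketch of what the data argument does, the example below fits the same model twice on a small made-up data frame. The names toy, resp and grp are hypothetical and have nothing to do with the squid data. The first call finds the columns through data = toy, the second spells them out with $, and both give the same estimates.

```r
# Hypothetical toy data, not the squid data
toy <- data.frame(resp = c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2),
                  grp  = c(1, 1, 2, 2, 3, 3))

# Columns are looked up inside toy because of data = toy
fit_data <- lm(resp ~ factor(grp), data = toy)

# Same model with every column spelled out explicitly
fit_dollar <- lm(toy$resp ~ factor(toy$grp))

# The coefficient names differ, the estimated values do not
est_data   <- unname(coef(fit_data))
est_dollar <- unname(coef(fit_dollar))
print(est_data)
print(est_dollar)
```

The data = form keeps the formula short and the coefficient labels readable, which is presumably why it is recommended here.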
The data argument is not available in every function, for example:
> mean(GSI, data = Squid)
Error in mean(GSI, data = Squid) : object 'GSI' not found
1.3 The $ symbol: another way to access variables
> Squid$GSI
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
Or
> Squid[, 6]
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
With either form the mean can now be computed, for example with mean(Squid$GSI).
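A short sketch of the two access styles on a made-up data frame (toy, id and value are hypothetical names): the $ form selects a column by name, the [, n] form selects it by position, and both return the same vector, which can then be passed to mean.

```r
# Hypothetical toy data
toy <- data.frame(id = 1:4, value = c(2.5, 3.5, 1.0, 5.0))

by_name  <- toy$value   # column selected by name
by_index <- toy[, 2]    # same column selected by position

same <- identical(by_name, by_index)
avg  <- mean(toy$value)
print(same)
print(avg)
```

Selecting by name is usually the safer choice, because a column position changes as soon as columns are added or reordered.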
1.4 The attach function
The attach function adds a data frame to R's search path, after which the GSI data can be displayed simply by typing GSI:
> attach(Squid)
> GSI
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
[17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
[25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
At this point, functions can be applied to the variable directly:
boxplot(GSI)
(To be honest, I cannot make sense of this plot.)
Be careful with attach: the variable names must be unique. If a name is the same as a function or variable that ships with R, it will cause problems.
Summary of using attach:
(1) To avoid duplicated variables, do not run attach(Squid) more than once
(2) When using attach, make sure the variable names are unique
(3) If you work with several datasets but only one at a time, use the detach function to remove a dataset from R's search path when you are done with it
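The attach and detach cycle described above can be sketched as follows, using a made-up data frame (toy, len and wt are hypothetical names chosen so that they do not clash with anything built into R).

```r
# Hypothetical toy data
toy <- data.frame(len = c(10, 12, 9), wt = c(1.1, 1.4, 0.9))

attach(toy)                           # toy is now on the search path
on_path_during <- "toy" %in% search()
m_len <- mean(len)                    # bare column name works while attached
detach(toy)                           # remove it again when finished

on_path_after <- "toy" %in% search()
print(m_len)
print(on_path_after)
```

Checking search() before and after is a simple way to confirm that no stale copy of a dataset is still attached.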
2 Accessing subsets of the data
First run the detach(Squid) command!
View the values of Sex in Squid:
> Squid$Sex
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1
Show the distinct values:
> unique(Squid$Sex)
[1] 2 1
Here 1 denotes male and 2 denotes female.
Sel <- Squid$Sex == 1
SquidM <- Squid[Sel, ]
> SquidM
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
48     48    1     5        3   1 4.2968
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487
61     61    1     6        1   1 3.2304
Sel <- Squid$Sex == 1 creates a vector of the same length as Sex. Where Sex equals 1 the element is TRUE, otherwise it is FALSE. Such a vector is called a Boolean (logical) vector and can be used to select rows.
SquidM <- Squid[Sel, ] selects the rows of Squid for which Sel is TRUE and stores them in SquidM. Because rows are being selected, Sel goes before the comma inside the square brackets.
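The two steps can be tried on a small made-up data frame (toy and its columns are hypothetical, not the squid data). The sketch builds the logical vector first and then uses it as the row index.

```r
# Hypothetical toy data: sex coded 1 = male, 2 = female
toy <- data.frame(id  = 1:6,
                  sex = c(2, 1, 2, 1, 1, 2),
                  gsi = c(9.1, 5.3, 8.7, 4.3, 3.5, 7.4))

sel   <- toy$sex == 1   # logical vector, one element per row
males <- toy[sel, ]     # keep the rows where sel is TRUE, all columns

print(sel)
print(males)
```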
Chapter 3, to be continued ...
Continuing:
Get the female data:
SquidF <- Squid[Squid$Sex == 2, ]
> SquidF
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107
5      5    1     1        1   2  8.9926
The next few commands need no explanation:
unique(Squid$Location)
Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3, ]
Squid123 <- Squid[Squid$Location != 4, ]
Squid123 <- Squid[Squid$Location < 4, ]
Squid123 <- Squid[Squid$Location <= 3, ]
Squid123 <- Squid[Squid$Location >= 1 & Squid$Location <= 3, ]
Each of them selects the rows whose Location value is 1, 2 or 3:
> unique(Squid$Location)
[1] 1 3 4 2
> Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3, ]
> Squid123
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107
5      5    1     1        1   2  8.9926
6      6    1     1        1   2  8.7707
Get the rows for males with a Location value of 1:
SquidM.1 <- Squid[Squid$Sex == 1 & Squid$Location == 1, ]
> SquidM.1
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487
Get the rows for males with a Location value of 1 or 2:
SquidM.12 <- Squid[Squid$Sex == 1 & (Squid$Location == 1 | Squid$Location == 2), ]
> SquidM.12
   Sample Year Month Location Sex    GSI
24     24    1     5        1   1 5.2970
58     58    1     6        1   1 3.5008
60     60    1     6        1   1 3.2487
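The parentheses in the last command matter, because & is evaluated before |. A minimal sketch on made-up data (toy and its columns are hypothetical) compares the version with parentheses, the version without them, and an equivalent spelling with %in%, which is not used in these notes but gives the same rows.

```r
# Hypothetical toy data
toy <- data.frame(id  = 1:6,
                  sex = c(1, 1, 2, 1, 2, 1),
                  loc = c(1, 2, 1, 3, 2, 4))

# Males at location 1 or 2 (parentheses group the two location tests)
with_parens <- toy[toy$sex == 1 & (toy$loc == 1 | toy$loc == 2), ]

# Without parentheses this reads as (male AND loc 1) OR loc 2,
# so the female at location 2 slips in
without_parens <- toy[toy$sex == 1 & toy$loc == 1 | toy$loc == 2, ]

# Equivalent to the first version, using %in% for the location test
with_in <- toy[toy$sex == 1 & toy$loc %in% c(1, 2), ]

print(with_parens$id)
print(without_parens$id)
```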
Attention!
SquidM1 <- SquidM[Squid$Location == 1, ]
> SquidM1
     Sample Year Month Location Sex    GSI
24       24    1     5        1   1 5.2970
58       58    1     6        1   1 3.5008
...
NA       NA   NA    NA       NA  NA     NA
NA.1     NA   NA    NA       NA  NA     NA
NA.2     NA   NA    NA       NA  NA     NA
NA.3     NA   NA    NA       NA  NA     NA
NA.4     NA   NA    NA       NA  NA     NA
...
Cause Analysis:
SquidM, obtained earlier, holds only the male data, so its number of rows differs from the length of the Boolean vector Squid$Location == 1. Every TRUE that falls beyond the last row of SquidM produces a row of NA values, which is what the output above shows.
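The effect is easy to reproduce on a small made-up data frame (toy and its columns are hypothetical). The sketch indexes the male subset once with a logical vector built from the full data and once with a vector built from the subset itself.

```r
# Hypothetical toy data
toy <- data.frame(id  = 1:6,
                  sex = c(2, 1, 2, 1, 1, 2),
                  loc = c(1, 1, 2, 1, 3, 1))

males <- toy[toy$sex == 1, ]      # 3 rows

# Wrong: the index has 6 elements but males has only 3 rows
bad <- males[toy$loc == 1, ]

# Right: build the index from the same data frame that is being subset
good <- males[males$loc == 1, ]

print(bad)
print(good)
```

The rule of thumb is that the logical vector inside the brackets should come from the same data frame as the one in front of the brackets.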
2.1 Sorting data
Ord1 <- order(Squid$Month)
> Squid[Ord1, ]
  Sample Year Month Location Sex     GSI
1      1    1     1        1   2 10.4432
2      2    1     1        3   2  9.8331
3      3    1     1        1   2  9.7356
4      4    1     1        1   2  9.3107
This sorts the rows by Month.
The same ordering can also be applied to a single variable:
> Squid$GSI[Ord1]
 [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
 [9]  7.2156  6.3882  6.0726  5.7757  1.2610  1.1997  0.8373  0.6716
[17]  0.5758  0.5518  0.4921  0.4808  0.3828  0.3289  0.2758  0.2506
[25]  0.2092  0.1792  0.1661  0.1618  0.1543  0.1541  0.1490  0.1379
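What order returns is a vector of row positions rather than sorted values. A minimal sketch on made-up data (toy and its columns are hypothetical) shows the positions and how they are used to rearrange either the whole data frame or one column.

```r
# Hypothetical toy data
toy <- data.frame(id    = 1:5,
                  month = c(3, 1, 2, 1, 3),
                  gsi   = c(0.5, 2.0, 1.5, 2.5, 0.7))

ord <- order(toy$month)     # positions that put month in ascending order
sorted_rows <- toy[ord, ]   # whole data frame rearranged by month
sorted_gsi  <- toy$gsi[ord] # a single column rearranged the same way

print(ord)
print(sorted_rows)
```

Rows with the same month keep their original relative order, so the two month-1 rows appear as id 2 and then id 4.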
3 Combining two datasets with a common identifier
setwd("E:/r/r-beginer-guide/data/rbook")
Sql1 <- read.table(file = "Squid1.txt", header = TRUE)
Sql2 <- read.table(file = "Squid2.txt", header = TRUE)
SquidMerged <- merge(Sql1, Sql2, by = "Sample")
> SquidMerged
  Sample     GSI Year MONTH Location Sex
1      1 10.4432    1     1        1   2
2      2  9.8331    1     1        3   2
3      3  9.7356    1     1        1   2
4      5  8.9926    1     1        1   2
5      6  8.7707    1     1        1   2
6      7  8.2576    1     1        1   2
The merge command takes the two data frames Sql1 and Sql2 as arguments and uses the variable Sample as the identifier for matching the rows of the two datasets. merge also has an option all, whose default value is FALSE: a row whose Sample value is missing from either Sql1 or Sql2 is left out of the result. If all is set to TRUE, such rows are kept and NA values may be produced.
Sql11 <- read.table(file = "Squid1.txt", header = TRUE)
Sql21 <- read.table(file = "Squid2.txt", header = TRUE)
SquidMerged1 <- merge(Sql11, Sql21, by = "Sample")
SquidMerged1
Well, there seem to be no NA values, so it appears that no data has been lost.
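The difference between the default and all = TRUE can be seen on two tiny made-up data frames (a and b are hypothetical, not the contents of Squid1.txt and Squid2.txt). Sample 4 exists only in a and Sample 5 only in b.

```r
# Hypothetical toy data
a <- data.frame(Sample = c(1, 2, 3, 4), GSI = c(10.4, 9.8, 9.7, 9.3))
b <- data.frame(Sample = c(1, 2, 3, 5), Sex = c(2, 2, 2, 1))

# Default (all = FALSE): only samples present in both are kept
inner <- merge(a, b, by = "Sample")

# all = TRUE: every sample is kept, gaps are filled with NA
outer <- merge(a, b, by = "Sample", all = TRUE)

print(inner)
print(outer)
```

Note that with the default setting unmatched rows are dropped silently, so the absence of NA values alone does not show that every sample survived the merge. Comparing nrow of the result with nrow of the inputs is a more direct check.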
4 Output data
Use write.table to write data to an ASCII file:
write.table(SquidM, file = "Malesquid_wujiahua.txt", sep = " ", quote = FALSE, append = FALSE, na = "NA")
Look in the working directory: a file named Malesquid_wujiahua.txt has been created.
Open it:
Sample Year Month Location Sex GSI
24 24 1 5 1 1 5.297
48 48 1 5 3 1 4.2968
58 58 1 6 1 1 3.5008
60 60 1 6 1 1 3.2487
61 61 1 6 1 1 3.2304
Explanation:
The first argument of write.table is the data to be written and the second is the name of the file in which to save it. sep = " " separates the values with a space, quote = FALSE suppresses the quotation marks around character strings, and na = "NA" writes missing values as NA. append = TRUE appends the data to the end of an existing file, whereas append = FALSE (used here) overwrites it.
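A small round trip makes the effect of these arguments visible. The sketch below writes a made-up data frame (toy is hypothetical) to a temporary file instead of the working directory, then reads the raw lines and the table back in.

```r
# Hypothetical toy data with one missing value
toy <- data.frame(id = 1:3, gsi = c(5.297, NA, 3.5008))

path <- tempfile(fileext = ".txt")   # temporary file, an assumption for the demo
write.table(toy, file = path, sep = " ", quote = FALSE,
            append = FALSE, na = "NA")

raw_lines <- readLines(path)               # what actually landed in the file
back <- read.table(path, header = TRUE)    # read it back as a data frame

print(raw_lines)
print(back)
```

Because row names are written by default, the header line has one field fewer than the data lines, just as in the file shown above. read.table recognises this layout and uses the first column as row names.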
5 Recoding categorical variables
> str(Squid)
'data.frame':   2644 obs. of  6 variables:
 $ Sample  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month   : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location: int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex     : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI     : num  10.44 9.83 9.74 9.31 8.99 ...
Sex and Location take only a fixed set of values, so they are categorical variables.
It is generally best to create new variables in the data frame from the categorical variables:
Squid$fLocation <- factor(Squid$Location)
Squid$fSex <- factor(Squid$Sex)
> Squid$fLocation
 [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1
> Squid$fSex
   [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
  [36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1
...
 [...] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1
...
[2591] 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 2 2 1 1 1 1
[2626] 1 1 1 1 1 2 1 1 1 2 1 2 1 2 1 2 1 1
Levels: 1 2
fLocation and fSex are simply nominal variables; the f prefix indicates that they are factors.
The levels 1 and 2 can be given more meaningful labels:
Squid$fSex <- factor(Squid$Sex, levels = c(1, 2), labels = c("M", "F"))
> Squid$fSex
 [1] F F F F F F F F F F F F F F F F F F F F F F F M F F F F F F F F F F F
[36] F F F F F F F F F F F F M F F F F F F F F F M F M M M M F M M M M M M
...
In this way every 1 is replaced by M and every 2 by F.
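A minimal sketch of the same recoding on a short made-up vector (sex_codes is a hypothetical name): levels says which values exist and in what order, labels says what each of them should be called.

```r
# Hypothetical codes: 1 = male, 2 = female
sex_codes <- c(2, 2, 1, 2, 1)

f_sex <- factor(sex_codes, levels = c(1, 2), labels = c("M", "F"))

shown  <- as.character(f_sex)   # the labels the reader sees
counts <- table(f_sex)          # number of observations per label

print(f_sex)
print(counts)
```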
Using the recoded factor variables:
boxplot(GSI ~ fSex, data = Squid)
> M1 <- lm(GSI ~ fSex + fLocation, data = Squid)
> M1

Call:
lm(formula = GSI ~ fSex + fLocation, data = Squid)

Coefficients:
(Intercept)        fSexF   fLocation2   fLocation3   fLocation4
     1.3593       2.0248      -1.8552      -0.1425       0.5876

> summary(M1)

Call:
lm(formula = GSI ~ fSex + fLocation, data = Squid)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.35926    0.07068  19.230   <2e-16 ***
fSexF        2.02481    0.09427  21.479   <2e-16 ***
fLocation2  -1.85525    0.20027  -9.264   <2e-16 ***
fLocation3  -0.14248    0.12657  -1.126   0.2604
fLocation4   0.58756    0.34934   1.682   0.0927 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared:  0.1759,    Adjusted R-squared:  0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16
(I have only just discovered the insert-code feature.)
> M2 <- lm(GSI ~ factor(Sex) + factor(Location), data = Squid)
> summary(M2)

Call:
lm(formula = GSI ~ factor(Sex) + factor(Location), data = Squid)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4137 -1.3195 -0.1593  1.2039 11.2159

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.35926    0.07068  19.230   <2e-16 ***
factor(Sex)2       2.02481    0.09427  21.479   <2e-16 ***
factor(Location)2 -1.85525    0.20027  -9.264   <2e-16 ***
factor(Location)3 -0.14248    0.12657  -1.126   0.2604
factor(Location)4  0.58756    0.34934   1.682   0.0927 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.415 on 2639 degrees of freedom
Multiple R-squared:  0.1759,    Adjusted R-squared:  0.1746
F-statistic: 140.8 on 4 and 2639 DF,  p-value: < 2.2e-16
The estimated parameters are identical, but the second form takes up more screen space, which is said to become a serious problem once second- and third-order interaction terms are involved.
> Squid$fLocation
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
...
[2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 1 2 3 4
The order of the levels can be changed:
Squid$fLocation <- factor(Squid$Location, levels = c(2, 3, 1, 4))
> Squid$fLocation
   [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1
  [71] 1 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 1 3 1
...
[2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 2 3 1 4
boxplot(GSI ~ fLocation, data = Squid)
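Changing the level order does not change the data, only the order in which the categories are listed (and hence drawn in a boxplot). A minimal sketch on a made-up vector (loc_codes is a hypothetical name):

```r
# Hypothetical location codes
loc_codes <- c(1, 3, 1, 4, 2, 1)

f_default   <- factor(loc_codes)                          # levels sorted: 1 2 3 4
f_reordered <- factor(loc_codes, levels = c(2, 3, 1, 4))  # custom level order

print(levels(f_default))
print(levels(f_reordered))

# The observations themselves are unchanged
same_values <- identical(as.character(f_default), as.character(f_reordered))
print(same_values)
```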
Attention:
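SquidM <- Squid[Squid$Sex == 1, ]
SquidM <- Squid[Squid$fSex == "1", ]
After defining the factor fSex with factor(Squid$Sex), the two commands above have the same effect.
The 1 is written in double quotation marks because fSex is a factor, whose levels are character strings. If the levels have been relabelled as "M" and "F", compare with "M" instead.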
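The point about labels can be checked on a short made-up vector (sex_codes is a hypothetical name). The sketch counts how many elements match "1" and "M" for a factor with default levels and for one with the labels M and F.

```r
# Hypothetical codes: 1 = male, 2 = female
sex_codes <- c(2, 1, 2, 1)

f_plain   <- factor(sex_codes)                         # levels "1", "2"
f_labeled <- factor(sex_codes, labels = c("M", "F"))   # levels "M", "F"

n_plain_1   <- sum(f_plain == "1")     # matches the two males
n_labeled_1 <- sum(f_labeled == "1")   # no level is called "1" any more
n_labeled_M <- sum(f_labeled == "M")   # matches the two males

print(c(n_plain_1, n_labeled_1, n_labeled_M))
```

A comparison against a label that does not exist returns FALSE everywhere rather than an error, so a subset built that way is silently empty.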
After defining new variables, you can inspect them with the str command:
Squid$fSex <- factor(Squid$Sex, labels = c("M", "F"))
Squid$fLocation <- factor(Squid$Location)
> str(Squid)
'data.frame':   2644 obs. of  8 variables:
 $ Sample   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Year     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Month    : int  1 1 1 1 1 1 1 1 1 2 ...
 $ Location : int  1 3 1 1 1 1 1 3 3 1 ...
 $ Sex      : int  2 2 2 2 2 2 2 2 2 2 ...
 $ GSI      : num  10.44 9.83 9.74 9.31 8.99 ...
 $ fLocation: Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 3 3 1 ...
 $ fSex     : Factor w/ 2 levels "M","F": 2 2 2 2 2 2 2 2 2 2 ...
Chapter 3 summary:
Function     Purpose                                        Example
write.table  Writes a variable to an ASCII file             write.table(Squid, file = "test.txt")
order        Determines the sort order of the data          order(x)
merge        Merges two data frames                         merge(a, b, by = "ID")
str          Displays the internal structure of an object   str(Squid)
factor       Defines a variable as a factor                 factor(x)
R Study Notes (6), A Beginner's Guide to R: accessing variables and working with subsets of data