1. Reading and writing of CSV files
2. Data set filtering
3. Simple random sampling with the sample function
Text:
1. Reading and writing of CSV files
- Reading a file: DF2 <- read.table("c:\\users\\lee\\desktop\\r language\\dummyData.csv", header = TRUE, sep = ",")
- Writing a file: write.table(DF1, "c:\\users\\lee\\desktop\\r language\\dummyData.csv", sep = ",", row.names = FALSE)
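The two calls above can be tried as a round trip. This is a minimal sketch: the data frame df1 and the temporary file path are assumptions made for illustration and stand in for DF1 and the desktop path used above.

```r
# Hypothetical data frame standing in for DF1.
df1 <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# A temporary file keeps the sketch independent of any local path.
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps R's row labels out of the file.
write.table(df1, path, sep = ",", row.names = FALSE)

# header = TRUE treats the first line of the file as column names.
df2 <- read.table(path, header = TRUE, sep = ",")

# Compare what was written with what was read back.
print(all.equal(df1, df2))
```

If row.names = FALSE is left out, the file gains an extra column of row labels, which then has to be handled when the file is read again.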
2. Data set filtering
Method One: Index the data frame directly, in the form newdata[row condition, column condition]
> leadership <- read.table("c:\\users\\lee\\desktop\\r language\\leadership.csv", header = TRUE, sep = ",")
> leadership
  manager       date country gender age q1 q2 q3 q4 q5
1       1 2014/10/27      US      M  32  5  4  5  5  5
2       2 2014/10/28      US      F  45  3  5  2  5  5
3       3 2014/10/29      UK      F  25  3  5  5  5  2
4       4 2014/10/30      UK      M  39  3  3  4 NA NA
5       5 2014/10/31      UK      F  99  2  2  1  2  1
> newdata <- leadership[with(leadership, which(gender == "M")), ]
> newdata
  manager       date country gender age q1 q2 q3 q4 q5
1       1 2014/10/27      US      M  32  5  4  5  5  5
4       4 2014/10/30      UK      M  39  3  3  4 NA NA
> newdata <- leadership[with(leadership, which(gender == "M" & age > 34)), ]
> newdata
  manager       date country gender age q1 q2 q3 q4 q5
4       4 2014/10/30      UK      M  39  3  3  4 NA NA
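A minimal sketch of method one, assuming the five leadership rows are rebuilt inline so the example runs without the CSV file. It compares indexing through with() against indexing through leadership$..., and shows what which() changes when a comparison returns NA. The variable names by_with, by_dollar, with_na and without_na are illustrative only.

```r
# Leadership data rebuilt inline; q4 and q5 are missing for manager 4.
leadership <- data.frame(
  manager = 1:5,
  date = c("2014/10/27", "2014/10/28", "2014/10/29", "2014/10/30", "2014/10/31"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender = c("M", "F", "F", "M", "F"),
  age = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2),
  q2 = c(4, 5, 5, 3, 2),
  q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2),
  q5 = c(5, 5, 2, NA, 1)
)

# Two equivalent ways to tell R that gender and age are columns.
by_with <- leadership[with(leadership, which(gender == "M" & age > 34)), ]
by_dollar <- leadership[leadership$gender == "M" & leadership$age > 34, ]

# A comparison against a missing value yields NA, not TRUE or FALSE.
# Plain logical indexing keeps such rows as all-NA rows;
# which() returns only the positions that are TRUE, so they are dropped.
with_na <- leadership[leadership$q4 > 4, ]
without_na <- leadership[which(leadership$q4 > 4), ]

print(by_with)
print(nrow(with_na))
print(nrow(without_na))
```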
Attention:
> newdata <- leadership[which(gender == "M" & age > 34), ]
Error in which(gender == "M" & age > 34) : object 'gender' not found
# gender must be referenced as a column of the data frame, for example through with() or leadership$gender. Otherwise R looks for a standalone object named gender and the call fails.
Method Two: Filter with the subset function
subset(dataset, row condition, select = columns to keep)
> newdata <- subset(leadership, gender == "M" & age > 25, select = c(gender:q2))  # keep only the columns from gender through q2
> newdata
  gender age q1 q2
1      M  32  5  4
4      M  39  3  3
> newdata <- subset(leadership, gender == "M" & age > 25, select = gender:q5)
> newdata
  gender age q1 q2 q3 q4 q5
1      M  32  5  4  5  5  5
4      M  39  3  3  4 NA NA
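To see how subset relates to method one, the sketch below runs the same selection both ways. It assumes a cut-down copy of the leadership data (only five of its columns) built inline so the code is self-contained. The names by_subset, rows and by_index are illustrative.

```r
# Cut-down leadership data, built inline for the example.
leadership <- data.frame(
  manager = 1:5,
  gender = c("M", "F", "F", "M", "F"),
  age = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2),
  q2 = c(4, 5, 5, 3, 2)
)

# subset(): rows by condition, columns by a name range.
# Column names can be used directly, without with() or $.
by_subset <- subset(leadership, gender == "M" & age > 25, select = gender:q2)

# The same selection with bracket indexing; columns are listed by name.
rows <- which(leadership$gender == "M" & leadership$age > 25)
by_index <- leadership[rows, c("gender", "age", "q1", "q2")]

print(by_subset)
print(isTRUE(all.equal(by_subset, by_index)))
```

subset is convenient for interactive work because the condition is written with bare column names. Bracket indexing is more explicit and is usually the safer choice inside functions.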
3. Simple random sampling with the sample function
id <- sample(1:2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
# 1:2 is the set of values to draw from. nrow(iris) is the number of draws, which equals the number of rows (observations) in iris. replace = TRUE samples with replacement, and prob = c(0.7, 0.3) sets the probabilities of drawing 1 and 2.
> id
[1] 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 1 2 2 1 2 1 1 1 1 1 1 1 2 1 1 1 2 2 1 2
[40] 1 1 1 2 1 1 2 1 1 1 2 1 1 2 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 2
[79] 1 1 2 1 1 1 1 2 1 2 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2 2 2 1 1 1
[118] 2 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1 1 1 2 2 2 1 1
traindata <- iris[id == 1, ]  # training set
testdata <- iris[id == 2, ]  # test set
Example:
> mysample <- binary[sample(1:nrow(binary), 3, replace = FALSE), ]
> mysample
    admit gre  gpa rank
390     0 640 3.51    2
225     0 800 2.90    2
44      0 500 3.31    3
> mysample <- binary[sample(1:nrow(binary), 3, replace = FALSE), ]
> mysample
    admit gre  gpa rank
60      0 600 2.82    4
213     0 460 2.87    2
25      1 760 3.35    2
> mysample <- binary[sample(1:nrow(binary), 3, replace = FALSE), ]
> mysample
    admit gre  gpa rank
30      0 520 3.29    1
303     1 400 3.15    2
395     1 460 3.99    3
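Each call to sample above returns different rows, and labelling rows with prob = c(0.7, 0.3) gives group sizes that are only approximately 70/30. The sketch below shows one possible way to make the draw repeatable and to get an exact split. The seed value 42 and the names train_rows, train_exact and test_exact are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, chosen only so the draw can be repeated

n <- nrow(iris)

# Random labels: each row gets label 1 with probability 0.7,
# so the two group sizes vary from one seed to another.
id <- sample(1:2, n, replace = TRUE, prob = c(0.7, 0.3))
traindata <- iris[id == 1, ]
testdata <- iris[id == 2, ]

# Exact split: draw 70% of the row indices without replacement,
# then take the remaining rows as the test set.
train_rows <- sample(seq_len(n), size = round(0.7 * n))
train_exact <- iris[train_rows, ]
test_exact <- iris[-train_rows, ]

print(c(nrow(traindata), nrow(testdata)))
print(c(nrow(train_exact), nrow(test_exact)))
```

With the 150 rows of iris, the exact split always yields 105 training rows and 45 test rows, whatever the seed.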
R Language Learning Log 1