The SQLDF package is a useful data management aid in the R language, but the latest version of the package is garbled when it is processed in Chinese , pending resolution
Usage: sqldf (x, stringsasfactors = False, Row.names = False ...)
- Row.names: When True, row name renaming is not renamed
- Need to install SQLDF package: install.packages ("Sqldf")
- Load the following packages:
Library (GSUBFN) library (proto) library (rsqlite) library (DBI) library (TCLTK) library (SQLDF)
- Take the first few rows of data example
> x <-head (Iris,2) > y <-sqldf ( " select * from Iris limit 2 " ) > # Compare two data frames > identical (x, y) [ 1] TRUE > y sepal.length sepal.width petal.length petal.width species 1 5.1 3.5 1.4 0.2 Setosa 2 4.9 3.0 1.4 0.2 setosa
- Like extracts a subset of data
# take out the subset of data in the species column beginning with se > a2r <-subset (Iris, Grepl ("^se"> A2s <- Sqldf ("select * from Iris where species like ' se% '")> all.equal ( As.data.frame (A2R), A2s) [1] TRUE
In extract subset
>#in the Iris DataSet, the selected species are rows of Setosa and virginica.> a3r <-subset (Iris, species%inchC"Setosa","virginica"))> A3s <-sqldf ("SELECT * from Iris where species in (' Setosa ', ' virginica ')")#note Single and double quotation marks> >#A3r is selected as a subset, so the row name is the same as the original data set> Row.names (A3R) <-NULL>identical (a3r, a3s) [1] TRUE
Total data
> Head (iris,2) sepal.length sepal.width petal.length petal.width species1 5.1 3.5) 1.4 0.2Setosa2 4.9 3.0) 1.4 0.2Setosa> Aggregate (Iris[1:2], iris[5], mean)#calculate the average of the first two variables of a speciesspecies Sepal.length sepal.width1 setosa 5.006 3.4282 versicolor 5.936 2.7703 virginica 6.588 2.974> sqldf ('Select species, Avg ("Sepal.length"), avg ("Sepal.width") from the iris Group by species') Species avg ("Sepal.length") AVG ("Sepal.width")1 setosa 5.006 3.4282 versicolor 5.936 2.7703 virginica 6.588 2.974
ORDER BY
> Head (warpbreaks,2) breaks wool tension1 a l2 a l > Head (warpbreaks[order (warpbreaks$breaks, decreasing = TRUE),], 2) breaks wool tension5 9 a l > sqldf ("select * FROM Warpbreaks ORDER BY breaks desc limit 2") breaks wool tension1 A L2 A L
R (5): SQL data processing