Python and R are the two mainstream languages for data analysis today. As a statistics student, I learned R first and Python later. Python is a general-purpose programming language; scientific computing and data analysis are an important part of what it does, but not all of it. R focuses on statistical analysis: it was created by statisticians, for statistics. Python's strength is its versatility, since it is used in almost every field, while R is the specialist in statistics and related areas. Each has its advantages. So can the two be combined to get the best of both? The answer is yes. Doing so requires a calling interface between them, and the third-party library rpy2 provides one that lets Python invoke R. This article introduces the basic use of rpy2.
I had tried to install rpy2 before, but it only installed successfully on Linux; on Windows the installation always failed. Recently I found an unofficial download site for Python third-party libraries:
http://www.lfd.uci.edu/~gohlke/pythonlibs. The site offers a fairly complete set of third-party libraries, all as precompiled .whl files for each Python version, including hard-to-find libraries such as pymediea and rpy2, so I am sharing it here. Find rpy2 in the list; because my Python is version 2.7, 64-bit, I downloaded http://www.lfd.uci.edu/~gohlke/pythonlibs/tuoh5y4k/rpy2-2.7.8-cp27-none-win_amd64.whl. After the download completes, unzip the file and put all the extracted files into Python's site-packages directory. One more important step: add an environment variable named R_HOME and set its value to the installation path of R on your computer. Once that is done, import rpy2 should work.
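If import rpy2 still fails after these steps, it can help to check the two prerequisites separately. The following is a minimal sketch, assuming Python 3 (for importlib.util.find_spec); the helper names r_home_status and rpy2_installed are made up for this article and are not part of rpy2.

```python
import importlib.util
import os


def r_home_status(environ=None):
    """Report whether R_HOME is set and points to an existing directory.

    Hypothetical helper for this article, not an rpy2 API.
    """
    env = os.environ if environ is None else environ
    r_home = env.get("R_HOME")
    if not r_home:
        return "R_HOME is not set"
    if not os.path.isdir(r_home):
        return "R_HOME does not point to a directory: " + r_home
    return "ok"


def rpy2_installed():
    """Return True if Python can locate the rpy2 package, without importing it."""
    return importlib.util.find_spec("rpy2") is not None


if __name__ == "__main__":
    print("R_HOME:", r_home_status())
    print("rpy2 installed:", rpy2_installed())
```

If the first line does not print ok, fix the environment variable before looking at the rpy2 installation itself.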
Common commands:
1. import rpy2.robjects as robjects: this command imports the module used to work with R objects.
2. robjects.r("R_script") executes R code. For example, pi = robjects.r('pi') fetches pi from R. The returned variable pi is an R vector, which can be thought of as a Python list, so pi[0] extracts the value of pi.
3. robjects.r.source("file.r") executes an R script file. For example:
robjects.r.source('plot_demo.r')
The content of plot_demo.r is as follows:
# R language test script
x <- c(1, 2, 3, 4)
y <- x * x
jpeg(file = "plot.jpg")  # save image
plot(x, y)               # draw scatter chart
dev.off()                # close device
After running the code above, you get a scatter plot saved as plot.jpg.
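R code passed as a string can also create a variable and return it:
a = robjects.r('a<-c(1,2,3)')
print(a)
Running this gives [1] 1 2 3
The variables x and y defined in the sourced script remain available in the R session:
x = robjects.r('x')
y = robjects.r('y')
print(x)
print(y)
Running this gives: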
[1] 1 2 3 4
[1] 1 4 9 16
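Since robjects.r takes R code as an ordinary string, that string can be built from Python values. The sketch below puts the commands above together; r_assignment and run_demo are hypothetical helper names, and the rpy2 import sits inside run_demo so that the string helper can be tried without R installed.

```python
def r_assignment(name, values):
    """Build R code such as 'a <- c(1, 2, 3)' from a Python list of numbers."""
    return "{} <- c({})".format(name, ", ".join(repr(v) for v in values))


def run_demo():
    """Evaluate R code from Python; requires a working R and rpy2 installation."""
    import rpy2.robjects as robjects  # imported lazily on purpose

    # Evaluating an assignment returns the assigned R vector.
    a = robjects.r(r_assignment("a", [1, 2, 3]))
    first = a[0]              # indexing is zero-based on the Python side
    pi = robjects.r("pi")[0]  # a length-one R vector, unwrapped to a Python float
    return list(a), first, pi
```

Calling run_demo() returns the vector contents as a Python list together with its first element and the value of pi. Building code strings this way only suits simple numeric values; for larger data the vector classes in rpy2.robjects are the better route.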
Of course, rpy2 does more than turn R data objects into Python variables (or objects). It can also convert Python lists, dictionaries, and other data types into R vectors or data frames. The corresponding functions are robjects.IntVector(), robjects.FloatVector(), and so on; the names largely explain what each one does. For example:
print(robjects.IntVector([1,2,3]))
print(robjects.FactorVector(['a','a','b','c']))
print(robjects.FloatVector([1.2,2.3]))
print(robjects.baseenv)  # base environment
print(robjects.DataFrame({'a': [1,2], 'b': [3,4]}))
The results were as follows:
[1] 1 2 3
[1] a a b c
Levels: a b c
[1] 1.2 2.3
<environment: namespace:base>
  a.1L a.2L b.3L b.4L
1    1    2    3    4
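Note that the data frame printed above has one row and four columns, because plain Python lists were passed as column values. Wrapping each column in an R vector class first should give the more usual layout of one column per key. The following is a sketch under that assumption; vector_kind and to_r_data_frame are hypothetical helpers, while IntVector, FloatVector, StrVector, BoolVector, and DataFrame are classes in rpy2.robjects.

```python
def vector_kind(values):
    """Name the rpy2.robjects vector class that suits a flat Python list."""
    if all(isinstance(v, bool) for v in values):
        return "BoolVector"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "IntVector"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "FloatVector"
    if all(isinstance(v, str) for v in values):
        return "StrVector"
    raise TypeError("mixed or unsupported element types")


def to_r_data_frame(columns):
    """Convert {'name': [values, ...]} into an R data frame, one R vector per column.

    Requires a working R and rpy2 installation.
    """
    import rpy2.robjects as robjects  # imported lazily on purpose

    r_columns = {
        name: getattr(robjects, vector_kind(values))(values)
        for name, values in columns.items()
    }
    return robjects.DataFrame(r_columns)
```

With this helper, print(to_r_data_frame({'a': [1, 2], 'b': [3, 4]})) is expected to show two columns named a and b with two rows each.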
Finally, let's look at running somewhat more complex R code from Python.
r_script = '''
library(randomForest)  # load the random forest package
## use the iris data set
data = iris
table(data$Species)
## create a randomForest model to classify the iris species
iris.rf <- randomForest(Species~., data = data, importance=T, proximity=T)
print('--------Here is the random model-------')
print(iris.rf)
print('--------Here is the names of the model-----')
print(names(iris.rf))
confusion = iris.rf$confusion
print(confusion)
'''
robjects.r(r_script)
The results were as follows:
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
[1] "--------Here is the random model-------"
Call:
 randomForest(formula = Species ~ ., data = data, importance = T, proximity = T)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2
        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
[1] "--------Here is the names of the model-----"
 [1] "call"            "type"            "predicted"       "err.rate"
 [5] "confusion"       "votes"           "oob.times"       "classes"
 [9] "importance"      "importanceSD"    "localImportance" "proximity"
[13] "ntree"           "mtry"            "forest"          "y"
[17] "test"            "inbag"           "terms"
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
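Once the model lives in the R session, its pieces can be pulled back into Python for further work. As an illustration, the sketch below computes the overall accuracy from the confusion matrix; oob_accuracy and fetch_confusion_counts are hypothetical helpers, and fetch_confusion_counts assumes the R script above has already been run in the same session so that iris.rf exists.

```python
def oob_accuracy(confusion_counts):
    """Overall accuracy from a square matrix of counts (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in confusion_counts)
    correct = sum(confusion_counts[i][i] for i in range(len(confusion_counts)))
    return float(correct) / total


def fetch_confusion_counts(n_classes=3):
    """Read iris.rf$confusion from the R session, dropping the class.error column.

    Requires rpy2, R, and a previously fitted iris.rf object.
    """
    import rpy2.robjects as robjects  # imported lazily on purpose

    # Subset in R, then transpose so that flattening yields row-major order.
    flat = list(robjects.r("as.vector(t(iris.rf$confusion[, 1:%d]))" % n_classes))
    return [flat[i * n_classes:(i + 1) * n_classes] for i in range(n_classes)]


# Counts copied from the confusion matrix printed above.
counts = [[50, 0, 0], [0, 47, 3], [0, 3, 47]]
print(oob_accuracy(counts))
```

With the counts shown in the article this gives 144 correct out of 150, or 0.96, which matches the reported OOB error rate of 4%.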
For an example of how Python interacts with R in Jupyter Notebook, see: http://nbviewer.jupyter.org/gist/xccds/d692e468e21aeca6748a
Python calls R using rpy2