reprint: http://ices01.sinaapp.com/?p=129
R (also known as the R language) is an open-source, cross-platform tool for statistical computing and graphical presentation of numerical data. Put simply, R is used for statistics and plotting. R has its own scripting language and a large number of statistics and graphics libraries (thanks to the open-source community), which makes it both elegant and practical. Compared with similar software such as SPSS, R is driven purely from the command line, which I consider an advantage: we should focus on the data itself rather than on the UI of the statistics tool.
Although R has its own fairly complete language, what it does best is statistics and plotting. Chores such as connecting to databases, text processing, and file operations should not be forced on R; they are better left to another language. My choice is the one I know best and that handles these chores best: Python. The next question is obvious: how do R and Python work together? Off the top of my head, I could think of a couple of possible ways:
1. R and Python only share files. Python cleans and processes the source data and writes formatted files to an agreed directory; a timer lets R read the files and finally output the statistical results and charts.
This approach is workable. Instead of a timer, Python can also run the "Rscript" command immediately to invoke the R script. But the method is too restrictive: the two sides can only exchange files, and Python has no fine-grained control over R.
2. Let Python call R's functions directly. R is an open-source project, so there are bound to be third-party libraries that implement interoperability between Python and R.
Sure enough, I found rpy2, which lets Python read R objects, call R functions, and convert data structures between Python and R. In fact, third-party packages for interoperability between R and languages other than Python are also plentiful.
In the end I chose the second method to let R and Python dance together.
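For comparison, the first approach (file exchange plus the Rscript command) might look roughly like the sketch below. The sample rows, the script name analysis.R, and the helper function names are hypothetical and only serve as illustration; the R script itself is assumed to read the CSV path from its command-line arguments.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_clean_data(rows, csv_path):
    """Write already-cleaned rows to a CSV file for R to pick up."""
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def build_rscript_command(script_path, csv_path):
    """Build the command line that hands the CSV file to an R script."""
    return ["Rscript", str(script_path), str(csv_path)]


def run_r_script(script_path, csv_path):
    """Run the R script if Rscript is on PATH; return None otherwise."""
    if shutil.which("Rscript") is None:
        return None
    return subprocess.run(
        build_rscript_command(script_path, csv_path),
        capture_output=True, text=True, check=False,
    )


# Hypothetical sample data and file names, for illustration only.
work_dir = Path(tempfile.mkdtemp())
data_file = work_dir / "data.csv"
write_clean_data([["x", "y"], [1, 2], [3, 4]], data_file)
command = build_rscript_command("analysis.R", data_file)
```

Python only decides when the script starts and which file it receives; everything that happens inside R is out of its reach, which is the limitation described above.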
The rpy2.robjects module is rpy2's high-level interface to R; it contains an R instance and a set of R data structures. In most cases, this module is the only part of rpy2 you need to deal with. I will not say much about installing rpy2 here; interested readers can consult the documentation. Let's go straight to seeing how seamlessly R and Python integrate.
Understanding the R instance
The R instance refers to rpy2.robjects.r, an R process embedded in Python; you can think of it as a channel from Python to R. With the R instance, we can read R's built-in variables, call R's functions, and even use it directly as an interpreter for R code.
Accessing R objects
In the R command line, we enter an object name directly to access one of R's built-in objects, such as pi or letters:
accessing R objects in the R console
With the R instance, accessing R objects from Python is simple and can be done in several ways:
accessing R objects in Python
In this code, we access R objects in three ways: treating the R instance as a dictionary, calling it as a function, and treating it as a class object with attributes (quite magical). Which one to use in practice is a matter of habit. I prefer the third, using "." directly to access R objects as if the R instance were a native Python object. But this approach has one drawback: it cannot reach R objects or functions whose names contain a dot (such as read.table), while the other two can, as explained later.
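The three access styles can be sketched as follows. This is a minimal illustration, not the original listing; it assumes R and rpy2 are installed, and the function name access_pi_three_ways is hypothetical. The import sits inside the function so that the file can still be loaded on a machine without R.

```python
def access_pi_three_ways():
    """Return R's built-in `pi` fetched in three different styles."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    r = robjects.r
    as_dict = r["pi"]   # the R instance used as a dictionary
    as_call = r("pi")   # the R instance called with a string of R code
    as_attr = r.pi      # the R instance used as an object with attributes
    # R has no scalars: `pi` is a vector of length 1, so take element 0.
    return as_dict[0], as_call[0], as_attr[0]
```

All three calls return the same R vector; only the Python spelling differs.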
Calling R functions
With the R instance, we can easily call R functions from Python. Below we read a data file and draw a dot chart, first in the R console and then at the Python command line.
Reading a file and drawing a dot chart in the R console
Code interpretation:
The content of data.csv is shown in lines 3 to 7 of the code above.
data = read.table('data.csv'): reads the file into a data frame variable.
mtx = data.matrix(data): converts the data frame into a matrix.
dotchart(mtx): draws a dot chart from the matrix data.
The results are as follows:
Next, let's do the same thing in Python. As we learned earlier, the R instance gives direct access to R objects and can also call R functions directly; in fact, from Python's point of view objects and functions are the same thing, since a function is just another kind of object. Now try calling the "read.table()" function to read the data file data.csv:
It fails! What is going on? As mentioned above, "." access cannot reach R objects and functions whose names contain a dot. Python reads r.read.table as the attribute table of an object called read, but in R the dot is simply part of the function name read.table, and no object named read exists. So the "." form fails, and the function must be fetched in the dictionary style or by calling the R instance with its name:
The result of this code is the same dot chart as the one drawn in the R console. The last line, r.dotchart(mtx), calls R's dotchart function directly through "."; since that name contains no dot, this works fine. If you want to avoid such hard-to-predict errors, you can access R objects and functions uniformly in the dictionary style. This is the safest way, although personally I find it a bit awkward.
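A sketch of both points follows. The first part uses only Python's standard ast module to show how Python splits r.read.table into two attribute lookups. The second part is a hypothetical helper, draw_dotchart, that fetches the dotted R names in the dictionary style; it assumes R and rpy2 are installed and that csv_path points to a file read.table can parse.

```python
import ast

# Python parses `r.read.table` as two chained attribute lookups,
# so it asks the R instance for an object named `read` first.
parsed = ast.parse("r.read.table", mode="eval").body
lookup_chain = (parsed.value.value.id, parsed.value.attr, parsed.attr)


def draw_dotchart(csv_path):
    """Read a file with read.table and draw a dot chart via rpy2."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    r = robjects.r
    read_table = r["read.table"]    # dotted R name: dictionary access
    data_matrix = r["data.matrix"]  # dotted R name: dictionary access
    data = read_table(csv_path)     # data frame
    mtx = data_matrix(data)         # matrix
    return r["dotchart"](mtx)       # r.dotchart(mtx) would work here too
```

Here lookup_chain evaluates to ("r", "read", "table"), which makes the failed lookup of read visible without touching R at all.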
The R instance is an R console
In fact, the R instance is an interactive R console, except that the interaction is between Python and R. To show that the R instance behaves like an R console, let's do an experiment: write a piece of R script as the content of a Python string variable, then pass the string to the R instance by calling it as a function:
The results come out like this:
Note that the R instance acts as a console only when it is called as a function with a string of R code; the dictionary style looks up a name and does not evaluate code.
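A minimal sketch of such an experiment, assuming R and rpy2 are installed. The R script in R_SCRIPT and the function name run_r_code are made up for illustration and are not the original listing.

```python
# A small, made-up R script held in an ordinary Python string.
R_SCRIPT = """
x <- c(1, 2, 3, 4)
y <- x * x
sum(y)
"""


def run_r_code(code=R_SCRIPT):
    """Evaluate a string of R code and return the last value as a list."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    result = robjects.r(code)  # value of the last R expression
    return list(result)
```

Calling the R instance evaluates every expression in the string and hands back the value of the last one, here the sum of the squares.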
Loading custom functions
In practice, writing your own functions in the R language is unavoidable. In the R console, you can use source('script_path') to load a custom R script. Using functions from a custom R script in Python is just as convenient: call r.source('script_path') to load the custom functions into the global environment, then call them as r.<function name>. This is how I do it; I will not go into detail here, and readers can try it for themselves.
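The loading step might look like this sketch, which assumes R and rpy2 are installed. The R function double_it, the file name my_functions.R, and the helper load_and_call are hypothetical names invented for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical custom R function, written to a temporary script file.
R_FUNCTION = "double_it <- function(x) { x * 2 }\n"
script_path = Path(tempfile.mkdtemp()) / "my_functions.R"
script_path.write_text(R_FUNCTION)


def load_and_call(path=script_path):
    """Source an R script, then call the function it defines."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    r = robjects.r
    r.source(str(path))                # defines double_it in R's global env
    return list(r["double_it"](21))    # r.double_it(21) would work as well
```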
R vectors and Python lists
The vector is one of R's most important and most commonly used data types; it can be understood as one-dimensional data, corresponding to a Python list. In the R console, declaring a variable with "x <- 1" makes x a vector whose first value is 1. R code often uses the c() function to create a vector of several values, such as c(1,2,3,4). To work with R, Python needs not only to access R objects and call R functions, but also to convert common data types.
rpy2 provides several classes that convert a Python list into an R vector: robjects.IntVector, robjects.BoolVector, robjects.StrVector, and robjects.FloatVector. Taking IntVector as an example, converting a Python list into an R vector is just robjects.IntVector([1,2,3,4,5]). Done!
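A short sketch of these conversions, assuming R and rpy2 are installed. The function name python_lists_to_r_vectors and the sample values are illustrative only.

```python
def python_lists_to_r_vectors():
    """Convert Python lists to R vectors and pass one to an R function."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    ints = robjects.IntVector([1, 2, 3, 4, 5])
    floats = robjects.FloatVector([1.5, 2.5])
    strings = robjects.StrVector(["a", "b"])
    bools = robjects.BoolVector([True, False])
    # An R vector can be handed straight to an R function.
    total = robjects.r["sum"](ints)[0]
    # Iterating over an R vector gives its elements back to Python.
    return list(ints), total, len(floats), len(strings), len(bools)
```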
To round off this tour, here is a scatter plot example that uses the type conversions we just learned:
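A scatter plot along these lines could be sketched as below. It assumes R and rpy2 are installed and that the R build supports the png graphics device; the sample data, the axis labels, and the function name scatter_plot are invented for illustration.

```python
# Hypothetical sample data: x values and their squares.
xs = list(range(1, 6))
ys = [value * value for value in xs]


def scatter_plot(x_values, y_values, png_path):
    """Draw a scatter plot of two Python lists into a PNG file."""
    import rpy2.robjects as robjects  # assumption: R and rpy2 are installed

    r = robjects.r
    x = robjects.FloatVector(x_values)  # Python list -> R numeric vector
    y = robjects.FloatVector(y_values)
    r["png"](str(png_path))             # open a PNG graphics device
    r["plot"](x, y, xlab="x", ylab="y")
    r["dev.off"]()                      # dotted name: dictionary access
    return str(png_path)
```

Writing to a file instead of an on-screen window keeps the example usable in scripts that run without a display.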
There is more
rpy2 offers far more than what is shown above. What we covered is only about 20% of what it provides, but that is enough to solve about 80% of the problems you will meet with rpy2. rpy2 also provides a lower-level API with which you can do more; for example, you could implement your own robjects-like object that supports "." access to objects and functions whose names contain a dot. For more information, please see the official documentation.
Let R and Python dance together