This article mainly explains what Cython is for. It does not cover the details of using Cython. I think it is enough to understand its main purpose, advantages and disadvantages first, and then learn from the documentation once you have a real use case.
1. Python Extensions (extension modules)
Python can be extended in C and C++. The idea is to implement key, performance-critical parts in a faster, more efficient language (C, C++) to speed up the Python program.
Here is an example. Like all the C listings in this article, it is written against the Python 2 C API.
#include <Python.h>

static PyObject *fun(PyObject *self, PyObject *args)
{
    int n, i, t = 12;
    if (!PyArg_ParseTuple(args, "i", &n)) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        t = t + i;
    }
    return Py_BuildValue("i", t);
}

static PyMethodDef foraddMethods[] = {
    {"fun", fun, METH_VARARGS, "For loop add."},
    {NULL, NULL, 0, NULL}  /* Sentinel */
};

PyMODINIT_FUNC initforadd(void)
{
    (void) Py_InitModule("foradd", foraddMethods);
}
This extension is very simple. It is hard to say how much faster it is than the equivalent Python code. If n is very large, however, this C version may well be considerably faster than the pure Python version.
When you extend Python with C, you have to follow a fixed format. You also have to pay attention to Python's reference counting: if you are not familiar with the Python/C API, it is easy to leak memory. In C code, the program itself is responsible for managing object references. For example, calling PyInt_FromLong(12) creates a PyObject and hands you a new reference. You then have to remember when to call Py_INCREF/Py_XINCREF and Py_DECREF/Py_XDECREF to manage that object's references.
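In CPython, the counter that Py_INCREF and Py_DECREF manipulate can be observed from Python with sys.getrefcount. The sketch below is CPython-specific; other implementations may not provide the function. It shows that storing an object in a container adds one reference and dropping the container removes it again. The absolute numbers vary between CPython versions, so only the differences matter.

```python
import sys

def refcount_demo():
    obj = object()                   # a fresh object owned by the local variable
    base = sys.getrefcount(obj)      # baseline (includes any temporary reference taken by the call)
    holder = [obj]                   # the list now holds one more reference to obj
    with_list = sys.getrefcount(obj)
    del holder                       # freeing the list releases that reference
    after_del = sys.getrefcount(obj)
    return base, with_list, after_del

base, with_list, after_del = refcount_demo()
print(base, with_list, after_del)
```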
2. Use Cython to generate Python extensions
Cython is a tool for quickly generating Python extension modules from a language that mixes Python syntax with C syntax.
Here is the same foradd function written in Python (saved as test_foradd.pyx):
def fun(n):
    t = 12
    i = 0
    while i < n:
        t = t + i
        i += 1
    return t
Then run cython -a test_foradd.pyx to generate a .c file and an .html file. For how to use Cython in general, please read its documentation. This article only uses this one command.
The generated test_foradd.c file is Cython's "translation" of test_foradd.pyx into C. test_foradd.html is a page that shows the Python code side by side with the C code, so you can see which C statements each Python statement was "translated" into. In other words, you can write in Python an extension that would otherwise require C, and let Cython "translate" the Python into C automatically. You then no longer have to deal with the problems of writing Python extensions in C described above.
The generated test_foradd.c file is very long, so I have extracted the key code below (line numbers inside the __PYX_ERR error macros are elided as ...):
static PyObject *__pyx_int_0;
static PyObject *__pyx_int_12;

static int __Pyx_InitGlobals(void) {
    __pyx_int_0 = PyInt_FromLong(0);
    __pyx_int_12 = PyInt_FromLong(12);
    return 0;
}

/* Python wrapper */
static PyMethodDef __pyx_mdef_11test_foradd_1fun = {"fun", (PyCFunction)__pyx_pf_11test_foradd_fun, METH_O, 0};

PyMODINIT_FUNC PyInit_test_foradd(void) {
    if (__Pyx_InitGlobals() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
    __pyx_m = Py_InitModule4("test_foradd", __pyx_methods, 0, 0, PYTHON_API_VERSION);
    Py_XINCREF(__pyx_m);
}

static PyObject *__pyx_pf_11test_foradd_fun(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) {
    PyObject *__pyx_v_t = NULL;
    PyObject *__pyx_v_i = NULL;
    PyObject *__pyx_r = NULL;
    __Pyx_RefNannyDeclarations
    PyObject *__pyx_t_1 = NULL;
    int __pyx_t_2;
    __Pyx_RefNannySetupContext("fun", 0);

    /* "test_foradd.pyx": t = 12 */
    __Pyx_INCREF(__pyx_int_12);
    __pyx_v_t = __pyx_int_12;

    /* "test_foradd.pyx": i = 0 */
    __Pyx_INCREF(__pyx_int_0);
    __pyx_v_i = __pyx_int_0;

    /* "test_foradd.pyx": while i < n: */
    while (1) {
        __pyx_t_1 = PyObject_RichCompare(__pyx_v_i, __pyx_v_n, Py_LT);
        __Pyx_XGOTREF(__pyx_t_1);
        if (unlikely(!__pyx_t_1)) __PYX_ERR(0, ..., __pyx_L1_error)
        __pyx_t_2 = __Pyx_PyObject_IsTrue(__pyx_t_1);
        if (unlikely(__pyx_t_2 < 0)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        if (!__pyx_t_2) break;

        /* "test_foradd.pyx": t = t + i */
        __pyx_t_1 = PyNumber_Add(__pyx_v_t, __pyx_v_i);
        if (unlikely(!__pyx_t_1)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF_SET(__pyx_v_t, __pyx_t_1);
        __pyx_t_1 = 0;

        /* "test_foradd.pyx": i += 1 */
        __pyx_t_1 = __Pyx_PyInt_AddObjC(__pyx_v_i, __pyx_int_1, 1, 1);
        if (unlikely(!__pyx_t_1)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF_SET(__pyx_v_i, __pyx_t_1);
        __pyx_t_1 = 0;
    }

    /* "test_foradd.pyx": return t */
    __Pyx_XDECREF(__pyx_r);
    __Pyx_INCREF(__pyx_v_t);
    __pyx_r = __pyx_v_t;
    goto __pyx_L0;

    /* function exit code */
  __pyx_L1_error:;
    __Pyx_XDECREF(__pyx_t_1);
    __Pyx_AddTraceback("test_foradd.fun", __pyx_clineno, __pyx_lineno, __pyx_filename);
    __pyx_r = NULL;
  __pyx_L0:;
    __Pyx_XDECREF(__pyx_v_t);
    __Pyx_XDECREF(__pyx_v_i);
    __Pyx_XGIVEREF(__pyx_r);
    __Pyx_RefNannyFinishContext();
    return __pyx_r;
}
Even with much of it trimmed, the C code above still looks complicated. This is mainly because of the machine-generated variable names. It implements the same function as our handwritten C extension: one is written "by hand", the other generated "automatically".
The comments in the generated code state which line of the Python source each chunk of C corresponds to. If you don't want to read the long C code, simply open the generated HTML file. Click any line marked with + to expand it and see the "translated" C code. The yellower a line is in the page, the more C code (in particular, the more Python/C API calls) it was translated into. Roughly speaking, that line needs more work from the Python runtime.
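A related, interpreter-level view is available without Cython. As a sketch, assuming CPython, the standard dis module lists the bytecode the interpreter executes for each line of fun. Opcode names differ between versions: BINARY_ADD before 3.11, BINARY_OP from 3.11 on. Each + and < is a generic instruction whose concrete implementation is looked up from the operand types at run time.

```python
import dis

def fun(n):
    t = 12
    i = 0
    while i < n:
        t = t + i
        i += 1
    return t

# Print the bytecode, grouped by source line.
dis.dis(fun)

# The same instructions as data, e.g. to count how many a function uses.
opnames = [ins.opname for ins in dis.get_instructions(fun)]
print(len(opnames), "instructions")
```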
This example shows that Cython's "translation" turns Python code into equivalent C code, which in fact calls various Python/C API functions. We no longer have to worry about the fixed "format" of C extensions or about the reference counts of the various PyObject objects. Cython makes writing C extension modules much more convenient. You can do it even if you don't understand the Python/C API, or even if you don't know C.
3. Dynamic type and static type
C, C++, Java and C# are statically typed languages: variables must be declared with a type before they can be used. Python and JavaScript are dynamically typed languages. Roughly speaking, a variable's type is determined by the value most recently assigned to it.
Static typing generally runs faster than dynamic typing. The type of every variable is known at compile time, so the compiler can optimize the compiled output. Dynamically typed languages, by contrast, are usually interpreted, and variable types have to be checked while the code runs, which inevitably costs some performance.
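A small Python illustration of what "types are determined at run time" means. The same function body, and the same bytecode, serves ints, floats and strings. Which + implementation runs is decided from the values passed in on each call. A name can also be rebound to a value of a different type.

```python
def add(a, b):
    # One piece of code; the "+" implementation is chosen at run time
    # from the types of the actual arguments.
    return a + b

print(add(1, 2))        # integer addition
print(add(1.5, 2.25))   # floating-point addition
print(add("a", "b"))    # string concatenation

x = 1
print(type(x).__name__)
x = "one"               # rebinding the same name to a value of another type is allowed
print(type(x).__name__)
```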
For the following C code:
int a = 1;
int b = 2;
int c = a + b;
Suppose the + operator has several implementations (an integer version, a floating-point version, and so on). When compiling, GCC knows that a, b and c are all ints. It can therefore pick the integer version at compile time; in practice, adding two ints becomes a single integer add instruction. No type check is needed when the program runs.
In contrast, for the following Python code:
a = 1
b = 2
c = a + b
Because Python is dynamically typed, the types of variables are only known at run time. When the line a + b executes, Python first has to inspect the types of a and b. If they cannot be added (for example an int and a str), Python raises an error, because it is strongly typed: TypeError: unsupported operand type(s) for +: 'int' and 'str'. If both are integers, Python follows PyIntObject->ob_type (part of PyObject_HEAD) to the integer type object PyInt_Type. It then calls the addition slot PyInt_Type.tp_as_number->nb_add, which is int_add, to add the two integers. This process is quite "lengthy"; interested readers can look at the implementation of PyNumber_Add.
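The dispatch described above can be modeled in a few lines of Python. This is a simplified sketch, not CPython's actual code. The hypothetical function py_number_add tries the left operand's __add__ and then the right operand's __radd__. It raises the same TypeError message when both give up. It ignores details of the real PyNumber_Add such as the rule that a subclass's reflected method is tried first.

```python
def py_number_add(a, b):
    """Simplified model of what `a + b` does at run time (hypothetical helper;
    ignores the subclass-first rule and sequence concatenation slots)."""
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    # Only try the reflected method when the types differ, as CPython does.
    radd = getattr(type(b), "__radd__", None)
    if radd is not None and type(a) is not type(b):
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"'{type(a).__name__}' and '{type(b).__name__}'"
    )

print(py_number_add(1, 2))      # int.__add__ handles it directly
print(py_number_add(1, 2.5))    # int.__add__ gives up, float.__radd__ handles it
```

Every + in the pure Python version of fun goes through a lookup like this on each loop iteration.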
4. Advantages of Cython
The C extension Cython generates automatically is far more complex than our handwritten one, at least in appearance. What a few dozen handwritten lines do, Cython turns into hundreds of lines. So can this auto-generated extension deliver a performance boost? The answer: possibly.
Why only "possibly"? In this example, we fed Cython pure Python code that uses no Cython-specific syntax. Cython can therefore only "mechanically" translate each Python statement into the corresponding C version. This equivalent translation is possible because Python (CPython) is itself implemented in C, and the Python/C API provides a rich interface.
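To make the "mechanical" translation concrete, here is a Python-level model of what the generated C for the untyped fun does. It is only an illustration, not code Cython emits. Each statement maps onto a generic object operation; the comments name the corresponding calls from the listing in section 2, with operator.add standing in for Cython's fast-path helper __Pyx_PyInt_AddObjC.

```python
import operator

def fun_translated(n):
    """Model of the generated C for the untyped fun(): every step is a
    generic operation on Python objects, as in the C listing."""
    t = 12   # __pyx_int_12, created once at module init by PyInt_FromLong(12)
    i = 0    # __pyx_int_0
    while True:
        cond = operator.lt(i, n)        # PyObject_RichCompare(i, n, Py_LT)
        if not operator.truth(cond):    # __Pyx_PyObject_IsTrue(cond)
            break
        t = operator.add(t, i)          # PyNumber_Add(t, i)
        i = operator.add(i, 1)          # roughly __Pyx_PyInt_AddObjC(i, 1, ...)
    return t

print(fun_translated(10))
```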
If you have read Python's source code, Cython's "translation" is easy to follow. For example, the integer 1 in the Python statement i = 1 is a PyIntObject, created by a call like PyInt_FromLong(1). Cython's generated code creates its integer constants the same way.
This "mechanical" translation does much the same work as the interpreter does when running the pure Python code, so it may not bring much of a performance gain. How Cython can give us a real boost is the topic of the next section.
5. Further optimization
The example above uses no Cython syntax, so any performance gain is limited. Let's look at a further optimized version, test_foradd2.pyx:
def fun(n):
    cdef int t = 12
    cdef int i = 0
    while i < n:
        t = t + i
        i += 1
    return t
The optimized code uses Cython's variable declaration cdef int to declare integer variables. This is also why I wrote the section on dynamic and static typing, which may have seemed unrelated to this article. Cython can be seen as a mix of a dynamically typed language (Python) and a statically typed one (C). This optimization turns some previously dynamically typed variables into statically typed ones.
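One consequence of cdef int is worth knowing before benchmarking. t is now a fixed-width C int rather than an arbitrary-precision Python integer, so large n overflows it. The sketch below models the typed version in pure Python, assuming a 32-bit int, which is typical but not guaranteed by C. ctypes.c_int32 does no overflow checking, so it keeps only the low 32 bits. This mimics the wraparound you usually observe, although signed overflow is formally undefined behavior in C. The function name fun_typed_model is hypothetical.

```python
import ctypes

def fun_typed_model(n):
    """Model of test_foradd2.fun where t and i are `cdef int`
    (assumption: C int is 32 bits and overflow wraps around)."""
    t = 12
    i = 0
    while i < n:                          # n is untyped, so this is still a Python comparison
        t = ctypes.c_int32(t + i).value   # keep only the low 32 bits, like a wrapped C int
        i = ctypes.c_int32(i + 1).value
    return t

print(fun_typed_model(10))      # small n: same result as the Python version
print(fun_typed_model(70000))   # 12 + 70000*69999/2 exceeds 2**31 - 1, so it wraps
```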
Here is the C code after the Cython conversion:
static int __Pyx_InitGlobals(void) {
    if (__Pyx_InitStrings(__pyx_string_tab) < 0) __PYX_ERR(0, 1, __pyx_L1_error);
    return 0;
  __pyx_L1_error:;
    return -1;
}

static PyObject *__pyx_pf_12test_foradd2_fun(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) {
    int __pyx_v_t;
    int __pyx_v_i;
    PyObject *__pyx_r = NULL;
    __Pyx_RefNannyDeclarations
    PyObject *__pyx_t_1 = NULL;
    PyObject *__pyx_t_2 = NULL;
    int __pyx_t_3;
    __Pyx_RefNannySetupContext("fun", 0);

    /* "test_foradd2.pyx": cdef int t = 12 */
    __pyx_v_t = 12;

    /* "test_foradd2.pyx": cdef int i = 0 */
    __pyx_v_i = 0;

    /* "test_foradd2.pyx": while i < n: */
    while (1) {
        __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_i);
        if (unlikely(!__pyx_t_1)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_1);
        __pyx_t_2 = PyObject_RichCompare(__pyx_t_1, __pyx_v_n, Py_LT);
        __Pyx_XGOTREF(__pyx_t_2);
        if (unlikely(!__pyx_t_2)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
        __pyx_t_3 = __Pyx_PyObject_IsTrue(__pyx_t_2);
        if (unlikely(__pyx_t_3 < 0)) __PYX_ERR(0, ..., __pyx_L1_error)
        __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
        if (!__pyx_t_3) break;

        /* "test_foradd2.pyx": t = t + i */
        __pyx_v_t = (__pyx_v_t + __pyx_v_i);

        /* "test_foradd2.pyx": i += 1 */
        __pyx_v_i = (__pyx_v_i + 1);
    }

    /* "test_foradd2.pyx": return t */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_t);
    if (unlikely(!__pyx_t_2)) __PYX_ERR(0, ..., __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_r = __pyx_t_2;
    __pyx_t_2 = 0;
    goto __pyx_L0;

    /* function exit code */
  __pyx_L1_error:;
    __Pyx_XDECREF(__pyx_t_1);
    __Pyx_XDECREF(__pyx_t_2);
    __Pyx_AddTraceback("test_foradd2.fun", __pyx_clineno, __pyx_lineno, __pyx_filename);
    __pyx_r = NULL;
  __pyx_L0:;
    __Pyx_XGIVEREF(__pyx_r);
    __Pyx_RefNannyFinishContext();
    return __pyx_r;
}
Here I list only the parts that correspond to the earlier listing. In the Python code we only added Cython's cdef int type declarations, and the effect on the "translated" result is obvious. The final C code no longer contains complex calls such as PyNumber_Add, only plain C code. All operations on the cdef int variables i and t become ordinary C arithmetic. __Pyx_InitGlobals also no longer needs to call PyInt_FromLong to create PyIntObject constants. The loop condition i < n still converts i to a Python object and calls PyObject_RichCompare on every iteration, because n itself is still an untyped Python object.
Readers can measure the performance difference between the two versions themselves.
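A minimal timing harness using the standard timeit module. It assumes the two compiled modules are importable as test_foradd and test_foradd2, named after the .pyx files above. Any module that has not been built is skipped, so only the pure Python version is timed. Before timing, each function's result is checked against the closed form 12 + n(n-1)/2. N is kept small enough for that value to fit in a 32-bit C int; otherwise the cdef int version overflows and returns a different result.

```python
import importlib
import timeit

def fun(n):
    """Pure Python version from section 2."""
    t = 12
    i = 0
    while i < n:
        t = t + i
        i += 1
    return t

def load_candidates():
    """Collect fun() from the pure Python version and, if built, from the
    compiled modules (module names assumed from the .pyx file names)."""
    candidates = {"pure Python": fun}
    for module_name in ("test_foradd", "test_foradd2"):
        try:
            candidates[module_name] = importlib.import_module(module_name).fun
        except ImportError:
            pass  # not compiled in this environment; skip it
    return candidates

def bench(func, n, repeat=3, number=10):
    """Best wall-clock time, in seconds, of `number` calls to func(n)."""
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))

# 12 + N*(N-1)/2 must fit in a 32-bit C int for the cdef int version.
N = 10_000
EXPECTED = 12 + N * (N - 1) // 2

if __name__ == "__main__":
    for name, func in load_candidates().items():
        assert func(N) == EXPECTED, name
        print(f"{name:12s} {bench(func, N):.4f} s")
```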
A First Look at Cython