I've always liked Python. When people talk about Python, they often mention two advantages: it's easy to write, and it can easily call C++ libraries.
In reality, however, the first comes at the cost of slow execution speed, and the second requires the library itself to use the Python API in accordance with Python's conventions and to export the corresponding symbols.
During my internship at Tian Shi I dealt with Cython quite a bit. Although the tool has a fair number of bugs and some rough spots in its user experience, it can greatly improve speed and makes calling C++ easy, so overall it is very good. Here is a brief introduction to Cython (note: not to be confused with CPython). Cython lets us easily mix Python and C++ code with Python syntax, speed up Python, and call C++ code.

Example: matrix multiplication
Let's say we're writing a very simple matrix multiplication routine, with the matrices stored in numpy.ndarray. The Python code might look like this:
```python
# dot_python.py
import numpy as np

def naive_dot(a, b):
    if a.shape[1] != b.shape[0]:
        raise ValueError('shape not matched')
    n, p, m = a.shape[0], a.shape[1], b.shape[1]
    c = np.zeros((n, m), dtype=np.float32)
    for i in xrange(n):
        for j in xrange(m):
            s = 0
            for k in xrange(p):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c
```
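Before benchmarking, it's worth checking that the naive routine agrees with np.dot. Here's a minimal sanity check — a sketch written for Python 3 (so range instead of xrange), assuming NumPy is installed:

```python
# check_naive_dot.py -- quick correctness check (hypothetical file, not part of the tutorial)
import numpy as np

def naive_dot(a, b):
    # same triple loop as dot_python.py, ported to Python 3
    if a.shape[1] != b.shape[0]:
        raise ValueError('shape not matched')
    n, p, m = a.shape[0], a.shape[1], b.shape[1]
    c = np.zeros((n, m), dtype=np.float32)
    for i in range(n):
        for j in range(m):
            s = 0
            for k in range(p):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

a = np.array([[1, 2], [3, 4]], dtype=np.float32)
b = np.array([[5, 6], [7, 8]], dtype=np.float32)
c = naive_dot(a, b)
```

If the loop is right, c should equal np.dot(a, b) exactly for small integer-valued inputs like these.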
You don't have to guess to know that this is a lot slower than the same thing written in C++. What we're interested in is how to use Cython to speed the program up. Let's look at the Cython version first:
```cython
# dot_cython.pyx
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef np.ndarray[np.float32_t, ndim=2] _naive_dot(np.ndarray[np.float32_t, ndim=2] a, np.ndarray[np.float32_t, ndim=2] b):
    cdef np.ndarray[np.float32_t, ndim=2] c
    cdef int n, p, m
    cdef np.float32_t s
    if a.shape[1] != b.shape[0]:
        raise ValueError('shape not matched')
    n, p, m = a.shape[0], a.shape[1], b.shape[1]
    c = np.zeros((n, m), dtype=np.float32)
    for i in xrange(n):
        for j in xrange(m):
            s = 0
            for k in xrange(p):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

def naive_dot(a, b):
    return _naive_dot(a, b)
```
You can see that this program is almost identical to the Python version. Let's look at the differences:
- Cython source files use the .pyx extension.
- cimport is the command Cython uses to import .pxd files. A .pxd file can be loosely understood as the equivalent of a C++ header, used to hold declarations; I'll say more about it later. The two cimported here are bundled with Cython.
- The two decorators @cython.boundscheck(False) and @cython.wraparound(False) turn off Cython's bounds checking.
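To make the header-file analogy concrete, here's a tiny sketch of what a hand-written .pxd and its cimport might look like (the file names mymath.pxd and main.pyx are hypothetical; the numpy and cython modules cimported above ship with Cython):

```cython
# mymath.pxd -- declarations (and small inline bodies) live here, like a C++ header
cdef inline int my_min(int x, int y):
    return x if x <= y else y
```

```cython
# main.pyx -- cimport pulls in the declarations, much like #include-ing a header
cimport mymath

def smaller(a, b):
    return mymath.my_min(a, b)
```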
Cython functions are defined with cdef, and you can specify a type for every parameter and for the return value. For example, an integer min function can be written like this:

```cython
cdef int my_min(int x, int y):
    return x if x <= y else y
```
Here np.ndarray[np.float32_t, ndim=2] is a type name just like int, only longer and carrying more information: it says this is a 2-dimensional np.ndarray holding np.float32_t values. Inside the function body, we can declare variables with the syntax cdef typename varname. A cdef function cannot be seen from Python code, which is why we add def naive_dot(a, b) as a wrapper to invoke the cdef function _naive_dot.
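As an aside, Cython also offers cpdef, which generates both the fast C entry point and a Python-visible wrapper in one go, so you can skip the hand-written def wrapper when you don't need it. A sketch with hypothetical names (this is Cython source, so it still has to go through the compilation step):

```cython
# visibility.pyx -- hypothetical example of the three function kinds
cdef int c_only(int x):      # callable only from Cython/C code
    return x * 2

cpdef int both(int x):       # fast C entry point *plus* an auto-generated Python wrapper
    return x * 2

def py_entry(x):             # ordinary Python-visible function
    return c_only(x)         # fine here, because this call site is Cython code
```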
In addition, a Cython program needs to be compiled before it can be called by Python. The process is:
1. the Cython compiler translates the Cython code into C/C++ source code that calls the Python API;
2. a C/C++ compiler compiles the generated source into a dynamic link library;
3. the Python interpreter loads the dynamic link library.
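For the curious, the three steps can also be done by hand, roughly like this (a sketch only — the include paths and Python version here are assumptions for illustration, and setup.py takes care of all of this for you):

```shell
# 1. Cython -> C source
cython dot_cython.pyx
# 2. C source -> dynamic link library (paths are illustrative, adjust to your system)
gcc -shared -fPIC -O2 dot_cython.c \
    -I/usr/include/python2.7 \
    -I"$(python -c 'import numpy; print(numpy.get_include())')" \
    -o dot_cython.so
# 3. the interpreter loads the library on import
python -c "import dot_cython"
```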
To complete the first two steps, we'll write the following code:
```python
# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy

setup(ext_modules=cythonize(Extension(
    'dot_cython',
    sources=['dot_cython.pyx'],
    language='c',
    include_dirs=[numpy.get_include()],
    library_dirs=[],
    libraries=[],
    extra_compile_args=[],
    extra_link_args=[]
)))
```
This code is more complicated than our simple example needs, but in fact that's about as complicated as it gets, so to save repeating it later I've simply put the fullest form here. A quick walkthrough:
- 'dot_cython' is the name of the dynamic link library we want to generate
- sources can contain .pyx files; later, if we want to call C++ code, we can add .c/.cpp files as well
- language actually defaults to c; if you want to use C++, change it to c++
- include_dirs is passed to gcc as the -I parameter
- library_dirs is passed to gcc as the -L parameter
- libraries is passed to gcc as the -l parameter
- extra_compile_args are extra compilation flags passed to gcc; for example, you could pass a -std=c++11
- extra_link_args are extra linking flags passed to gcc (i.e. when generating the dynamic link library)
If you've never seen some of these gcc flags, it just means you don't need them yet; you'll understand them when the need arises.
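For instance, if the project later mixes in C++11 sources, the same skeleton might look like this (a hypothetical variant — some_helper.cpp is a made-up file name, and only the changed fields matter):

```python
# setup_cpp.py -- hypothetical C++ variant of the same skeleton
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy

setup(ext_modules=cythonize(Extension(
    'dot_cython',
    sources=['dot_cython.pyx', 'some_helper.cpp'],  # .cpp sources are allowed too
    language='c++',                                 # switch the compiler to C++ mode
    include_dirs=[numpy.get_include()],
    extra_compile_args=['-std=c++11'],              # extra flag forwarded to gcc
)))
```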
Then we just have to execute the following command to compile the Cython program into a dynamic link library:

```shell
python setup.py build_ext --inplace
```
If the above command runs successfully, you'll see two new files in the current directory: dot_cython.c and dot_cython.so. The former is the generated C source; the latter is the compiled dynamic link library.
Now let's try the effect:
```
$ ipython
Python 2.7.12 (default, Oct 2016, 05:20:59)
Type "copyright", "credits" or "license" for more information.

IPython 4.0.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: import dot_python

In [3]: import dot_cython

In [4]: a = np.random.randn(...).astype(np.float32)

In [5]: b = np.random.randn(...).astype(np.float32)

In [6]: %timeit -n 100 -r 3 dot_python.naive_dot(a, b)
100 loops, best of 3: 560 ms per loop

In [7]: %timeit -n 100 -r 3 dot_cython.naive_dot(a, b)
100 loops, best of 3: 982 µs per loop

In [8]: %timeit -n 100 -r 3 np.dot(a, b)
100 loops, best of 3: 49.2 µs per loop
```
So that's roughly a 570x speedup, with our code basically unchanged. Compared with the highly optimized numpy it is, of course, still a lot slower. But consider this: the 0.982 ms is actually about the same as what you'd get writing it directly in C++, and being able to achieve that is already very satisfying. Don't believe it? Let's try a C++ version:
```cpp
// dot.cpp
#include <ctime>
#include <cstdlib>
#include <chrono>
#include <iostream>

class Matrix {
    float *data;
public:
    size_t n, m;
    Matrix(size_t r, size_t c): data(new float[r*c]), n(r), m(c) {}
    ~Matrix() { delete[] data; }
    float& operator()(size_t x, size_t y) { return data[x*m+y]; }
    float operator()(size_t x, size_t y) const { return data[x*m+y]; }
};

float dot(const Matrix &a, const Matrix &b) {
    Matrix c(
```