In my opinion, the Python community is divided into three camps: Python 2.x, Python 3.x, and PyPy. The split basically comes down to library compatibility and speed. This article focuses on some common code optimization techniques and on the significant performance gains from compiling to C. I also give run times for the three major Python camps. My goal is not to prove that one is better than the others, but to show how to use these specific examples to compare them in different contexts.
Using generators
A commonly overlooked memory optimization is the use of generators. A generator lets us write a function that returns one record at a time rather than all records at once. If you are using Python 2.x, this is why you use xrange instead of range, or ifilter instead of filter. A good example is creating a large list of numbers and adding them together.
import timeit
import random

def generate(num):
    while num:
        yield random.randrange(10)
        num -= 1

def create_list(num):
    numbers = []
    while num:
        numbers.append(random.randrange(10))
        num -= 1
    return numbers

print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
>>> 0.88098192215 # Python 2.7
>>> 1.416813850402832 # Python 3.2

print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
>>> 0.924163103104 # Python 2.7
>>> 1.5026731491088867 # Python 3.2
The generator is not only a bit faster, it also avoids storing the whole list in memory!
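To make the memory side visible, here is a minimal sketch. It assumes CPython, where sys.getsizeof reports the size of the container object itself and not of the integers it references. The helper names mirror the example above, and the exact byte counts vary by interpreter and version.

```python
import random
import sys

def generate(num):
    # Lazily yield one random digit at a time
    while num:
        yield random.randrange(10)
        num -= 1

def create_list(num):
    # Build the whole list in memory before returning it
    return [random.randrange(10) for _ in range(num)]

list_size = sys.getsizeof(create_list(999))
gen_size = sys.getsizeof(generate(999))

# The generator object stays small no matter how many items it will yield
print("list:", list_size, "bytes; generator:", gen_size, "bytes")
```

The list grows with the number of items, while the generator object keeps roughly the same size for any value of num.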
Introducing ctypes
For performance-critical code, Python also gives us an API for calling C functions, mainly through ctypes. You can take advantage of ctypes without writing any C code. By default, the precompiled standard C library is available to us, so let's go back to the generator example and see how much time the ctypes implementation takes.
import timeit
from ctypes import cdll

def generate_c(num):
    # Load the standard C library
    libc = cdll.LoadLibrary("libc.so.6")  # Linux
    # libc = cdll.msvcrt  # Windows
    while num:
        yield libc.rand() % 10
        num -= 1

print(timeit.timeit("sum(generate_c(999))", setup="from __main__ import generate_c", number=1000))
>>> 0.434374809265 # Python 2.7
>>> 0.7084300518035889 # Python 3.2
Just by switching to the C random function, the run time is cut in half! Now, would you believe me if I told you we can do even better?
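The library name "libc.so.6" is specific to Linux. As a hedged sketch, the standard ctypes.util.find_library helper can look up the C library name instead. The fallback to CDLL(None) is an assumption that only holds on POSIX systems such as Linux and macOS, so this variant is not meant for Windows.

```python
import ctypes
import ctypes.util

# find_library("c") returns a name such as "libc.so.6" on Linux, or None if not found
libc_name = ctypes.util.find_library("c")
# Assumption: on POSIX systems CDLL(None) exposes symbols already loaded in the process
libc = ctypes.CDLL(libc_name) if libc_name else ctypes.CDLL(None)

def generate_c(num):
    # Same generator as above, using the C rand() function
    while num:
        yield libc.rand() % 10
        num -= 1

total = sum(generate_c(999))
print(total)
```

Loading the library once at module level, rather than inside the generator, also avoids repeating the lookup on every call.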
Introducing Cython
Cython is a superset of Python that lets us call C functions and declare variable types to improve performance. We need to install Cython before we can use it.
sudo pip install Cython
Cython is essentially a fork of Pyrex, a similar library that is no longer developed. It compiles our Python-like code into C libraries that we can call from a Python file. Use the .pyx suffix instead of the .py suffix for these files. Let's see how to run our generator code using Cython.
# cython_generator.pyx
import random

def generate(num):
    while num:
        yield random.randrange(10)
        num -= 1
We need to create a setup.py so that Cython can compile our function.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("generator", ["cython_generator.pyx"])]
)
Compile using:
python setup.py build_ext --inplace
You should now see two files, cython_generator.c and generator.so. We test our program as follows:
import timeit

print(timeit.timeit("sum(generator.generate(999))", setup="import generator", number=1000))
>>> 0.835658073425
Not bad. Let's see if there is any room for improvement. We can first declare "num" as an integer, and then import the standard C library to take care of our random function.
# cython_generator.pyx
cdef extern from "stdlib.h":
    int c_libc_rand "rand"()

def generate(int num):
    while num:
        yield c_libc_rand() % 10
        num -= 1

If we compile and run again, we see this amazing number:
>>> 0.033586025238
Just a few changes have produced a great result. However, making such changes can be tedious, so let's see how to get a speedup with plain Python.
Introducing PyPy
PyPy is a just-in-time compiler for Python 2.7.3. In layman's terms, that means your code runs faster. Quora uses PyPy in production. PyPy has installation instructions on its download page, and if you use Ubuntu you can install it via apt-get. It works out of the box, with no crazy bash or build scripts: just download and run. Let's see how our original generator code performs under PyPy.
import timeit
import random

def generate(num):
    while num:
        yield random.randrange(10)
        num -= 1

def create_list(num):
    numbers = []
    while num:
        numbers.append(random.randrange(10))
        num -= 1
    return numbers

print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
>>> 0.115154981613 # PyPy 1.9
>>> 0.118431091309 # PyPy 2.0b1

print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
>>> 0.140175104141 # PyPy 1.9
>>> 0.140514850616 # PyPy 2.0b1
Wow! Without modifying a single line of code, it runs about 8 times faster than the pure Python implementation.
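When the same script is run under several interpreters, it helps to label each timing automatically. The sketch below uses the standard platform module for that. The label format is only a suggestion, and python_version() reports the Python language version rather than the PyPy release number.

```python
import platform
import random
import timeit

def generate(num):
    # Same generator as in the examples above
    while num:
        yield random.randrange(10)
        num -= 1

# python_implementation() returns a name such as "CPython" or "PyPy"
label = platform.python_implementation() + " " + platform.python_version()

# timeit also accepts a callable, so no setup string is needed here
elapsed = timeit.timeit(lambda: sum(generate(999)), number=1000)
print("%s: %.3f s" % (label, elapsed))
```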
Further testing
Why do further research? Isn't PyPy the champion? Not quite. Although most programs run on PyPy, some libraries are still not fully supported. Also, it can be easier to write a C extension for your project than to switch compilers. Let's dig a little deeper and see how ctypes lets us use libraries written in C. We will test the speed of merge sort and of computing Fibonacci numbers. Here is the C code (functions.c) we are going to use:
/* functions.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* http://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#C */
/* static, so the compiler still emits a definition when the call is not inlined */
static inline void
merge(int *left, int l_len, int *right, int r_len, int *out)
{
    int i, j, k;
    for (i = j = k = 0; i < l_len && j < r_len; )
        out[k++] = left[i] < right[j] ? left[i++] : right[j++];

    while (i < l_len) out[k++] = left[i++];
    while (j < r_len) out[k++] = right[j++];
}

/* inner recursion of merge sort */
void
recur(int *buf, int *tmp, int len)
{
    int l = len / 2;
    if (len <= 1) return;

    /* note that buf and tmp are swapped */
    recur(tmp, buf, l);
    recur(tmp + l, buf + l, len - l);

    merge(tmp, l, tmp + l, len - l, buf);
}

/* preparation work before recursion */
void
merge_sort(int *buf, int len)
{
    /* call alloc, copy and free only once */
    int *tmp = malloc(sizeof(int) * len);
    memcpy(tmp, buf, sizeof(int) * len);

    recur(buf, tmp, len);

    free(tmp);
}

int
fibRec(int n)
{
    if (n < 2)
        return n;
    else
        return fibRec(n - 1) + fibRec(n - 2);
}
On the Linux platform, we can compile it into a shared library using the following method:
gcc -Wall -fPIC -c functions.c
gcc -shared -o libfunctions.so functions.o
Using ctypes, we can use this library by loading the shared library "libfunctions.so", just as we did with the standard C library earlier. Here we compare the Python implementation with the C implementation. We start with the Fibonacci calculation:
# functions.py
from ctypes import *
import time

libfunctions = cdll.LoadLibrary("./libfunctions.so")

# The argument passed to fibRec is garbled in the source; N is a placeholder value
N = 32

def fibRec(n):
    if n < 2:
        return n
    else:
        return fibRec(n - 1) + fibRec(n - 2)

start = time.time()
fibRec(N)
finish = time.time()
print("Python: " + str(finish - start))

# C Fibonacci
start = time.time()
x = libfunctions.fibRec(N)
finish = time.time()
print("C: " + str(finish - start))
As expected, C is faster than both Python and PyPy. We can compare merge sorts in the same way.
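A single time.time() measurement can be noisy. As a hedged alternative, the standard timeit.repeat function runs the call several times so that the fastest run can be reported. In this sketch the name fib_rec and the small input 20 are illustrative choices, not the values used for the measurements in this article.

```python
import timeit

def fib_rec(n):
    # Same naive recursion as the Fibonacci function above, renamed for this sketch
    if n < 2:
        return n
    return fib_rec(n - 1) + fib_rec(n - 2)

# Run the call 5 times, once per repetition, and keep the fastest run
timings = timeit.repeat(lambda: fib_rec(20), repeat=5, number=1)
best = min(timings)
print("Python best of 5: %.6f s" % best)
```

The same pattern works for the C version by replacing the lambda body with a call into the loaded shared library.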
We haven't dug deep into the ctypes library, so these examples do not show everything it can do. ctypes is limited to a few standard types, such as int, char array, float, and bytes. By default there is no integer array type, but we can obtain one indirectly by multiplying c_int (the ctypes int type) by a length. This is what the line that defines c_numbers in the code below does: we create a c_int array with the same length as our list of numbers and unpack the list into it.
The key point is that C cannot return an array, and you would not want it to. Instead, we pass pointers that the function uses to modify the data in place. To hand over our c_numbers array, we must pass it to the merge_sort function by reference. After merge_sort runs, the c_numbers array is sorted. I added the following code to my functions.py file.
# Python merge sort
from random import shuffle, sample

# Generate 9999 random numbers between 0 and 100000
numbers = sample(range(100000), 9999)
shuffle(numbers)
c_numbers = (c_int * len(numbers))(*numbers)

from heapq import merge

def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))

start = time.time()
numbers = merge_sort(numbers)
finish = time.time()
print("Python: " + str(finish - start))

# C merge sort
start = time.time()
libfunctions.merge_sort(byref(c_numbers), len(numbers))
finish = time.time()
print("C: " + str(finish - start))
Python: 0.190635919571 # Python 2.7
Python: 0.11785483360290527 # Python 3.2
Python: 0.266992092133 # PyPy 1.9
Python: 0.265724897385 # PyPy 2.0b1
C: 0.00201296806335 # Python 2.7 + ctypes
C: 0.0019741058349609375 # Python 3.2 + ctypes
C: 0.0029308795929 # PyPy 1.9 + ctypes
C: 0.00287103652954 # PyPy 2.0b1 + ctypes
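The c_int array construction used in the merge sort code can be tried in isolation, without any C library. This small sketch uses only documented ctypes calls. The four-element list is an arbitrary example, and pointer() stands in for the byref() call to show that a reference shares the array's memory.

```python
from ctypes import c_int, pointer

numbers = [5, 3, 9, 1]

# c_int * 4 creates a new array type; calling it with *numbers fills the array
c_numbers = (c_int * len(numbers))(*numbers)

# A pointer to the array refers to the same memory, not to a copy
ptr = pointer(c_numbers)
ptr.contents[0] = 42

print(list(c_numbers))  # [42, 3, 9, 1]
```

This is why the C merge_sort can sort the array in place: it receives the address of the buffer and writes the result back into it.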
The following table compares the results (times in seconds).
|                     | Merge Sort | Fibonacci |
|---------------------|------------|-----------|
| Python 2.7          | 0.191      | 1.187     |
| Python 2.7 + ctypes | 0.002      | 0.044     |
| Python 3.2          | 0.118      | 1.272     |
| Python 3.2 + ctypes | 0.002      | 0.046     |
| PyPy 1.9            | 0.267      | 0.564     |
| PyPy 1.9 + ctypes   | 0.003      | 0.048     |
| PyPy 2.0b1          | 0.266      | 0.567     |
| PyPy 2.0b1 + ctypes | 0.003      | 0.046     |
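To read the table as speedups rather than raw times, the ratios can be computed directly. The sketch below copies the rounded values from the table, so the resulting factors are only approximate. The dictionary layout is an illustrative choice.

```python
# Timings in seconds, copied from the table above: (merge sort, Fibonacci)
timings = {
    "Python 2.7": {"pure": (0.191, 1.187), "ctypes": (0.002, 0.044)},
    "Python 3.2": {"pure": (0.118, 1.272), "ctypes": (0.002, 0.046)},
    "PyPy 1.9": {"pure": (0.267, 0.564), "ctypes": (0.003, 0.048)},
    "PyPy 2.0b1": {"pure": (0.266, 0.567), "ctypes": (0.003, 0.046)},
}

speedups = {}
for name, t in timings.items():
    # Speedup = pure-interpreter time divided by the time with the C library
    sort_x = t["pure"][0] / t["ctypes"][0]
    fib_x = t["pure"][1] / t["ctypes"][1]
    speedups[name] = (sort_x, fib_x)
    print("%-11s merge sort: %5.1fx  Fibonacci: %5.1fx" % (name, sort_x, fib_x))
```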