Python Code Performance Optimization Tips
Common tips for Python code optimization
Code optimization makes a program run faster and more efficiently without changing its results. According to the 80/20 principle, refactoring, optimizing, extending, and documenting a program typically account for 80% of the total work. Optimization usually has two parts: reducing the size of the code and improving its running efficiency.
Improve the algorithm, select the appropriate data structure
A good algorithm plays a key role in performance, so the first step in improving performance is improving the algorithm. Ordered from best to worst, common time complexities are:
O(1) < O(lg n) < O(n lg n) < O(n^2) < O(n^3) < O(n^k) < O(k^n) < O(n!)
Lowering an algorithm's time complexity therefore brings an obvious performance gain. Improving specific algorithms is beyond the scope of this article, and readers can consult other material on the subject. The following focuses on choosing data structures.
- Dictionaries (dictionary) and lists (list)
A Python dictionary is implemented as a hash table, so a lookup takes O(1) time on average. A list is an array, and a lookup has to traverse the list, so it takes O(n) time. Looking up members in a dictionary is therefore faster than looking them up in a list.
Listing 1. Code dict.py
from time import time

t = time()
list = ['A', 'B', 'was', 'Python', 'Jason', 'Hello', 'hill', 'with', 'phone', 'Test',
        'Dfdf', 'apple', 'PDDF', 'ind', 'basic', 'none', 'baecr', 'var', 'Bana', 'dd', 'WRD']
# Uncomment the next line to look the words up in a dict instead of a list
#list = dict.fromkeys(list, True)
print list
filter = []
for i in range(1000000):
    for find in ['was', 'hat', 'new', 'list', 'old', '.']:
        if find not in list:
            filter.append(find)
print "Total run time:"
print time() - t
The code above takes about 16.09 seconds to run. If the comment marker on the line #list = dict.fromkeys(list, True) is removed, so that the list is converted to a dictionary before the loop runs, the time drops to about 8.375 seconds, roughly halving the run time. Using a dict instead of a list is therefore a good choice when many members must be looked up or accessed frequently.
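The comparison above can be reproduced on a smaller scale with the standard timeit module. The following is a minimal sketch in Python 3 syntax (the listings in this article use Python 2); the helper name find_missing and the repetition count are assumptions made for illustration, and the timings it prints will differ from the figures quoted above.

```python
from timeit import timeit

words = ['A', 'B', 'was', 'Python', 'Jason', 'Hello', 'hill', 'with',
         'phone', 'Test', 'Dfdf', 'apple', 'PDDF', 'ind', 'basic',
         'none', 'baecr', 'var', 'Bana', 'dd', 'WRD']
queries = ['was', 'hat', 'new', 'list', 'old', '.']


def find_missing(container):
    """Return the queries that are not members of container (hypothetical helper)."""
    return [q for q in queries if q not in container]


as_list = words                       # membership test scans the list: O(n)
as_dict = dict.fromkeys(words, True)  # hash lookup: O(1) on average
as_set = set(words)                   # same hashing, without dummy values

# All three containers give the same answer; only the lookup cost differs.
missing = find_missing(as_list)
assert missing == find_missing(as_dict) == find_missing(as_set)

for name, container in [("list", as_list), ("dict", as_dict), ("set", as_set)]:
    seconds = timeit(lambda: find_missing(container), number=20000)
    print(f"{name}: {seconds:.4f} s")
```

A set is included alongside the dict because only membership is tested here, so the True values stored by dict.fromkeys are never read.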
- Sets (set) and lists (list)
The union, intersection, and difference operations on a set are faster than iterating over lists. Problems that involve the intersection, union, or difference of lists can therefore be converted to set operations.
Listing 2. Finding the intersection of two lists
from time import time

t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    # Compare every pair of elements from the two lists
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)
print "Total run time:"
print time() - t
The operating time of the above program is roughly:
Total run time: 38.4070000648
Listing 3. Using set to find the intersection
from time import time

t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    # Convert both lists to sets and intersect them with the & operator
    list(set(lista) & set(listb))
print "Total run time:"
print time() - t
The run time drops to 8.75 seconds, more than 4 times faster than the nested loops. Readers can test the other operations in Table 1 themselves.
Table 1. Common usage of Set
Syntax                     Operation      Description
set(list1) | set(list2)    Union          A new set containing all the elements of list1 and list2
set(list1) & set(list2)    Intersection   A new set containing the elements common to list1 and list2
set(list1) - set(list2)    Difference     A set of the elements that appear in list1 but not in list2
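The three operations in Table 1 can be tried on the lists from Listings 2 and 3. This is a small sketch in Python 3 syntax; the variable names set_a, set_b, union, and difference are introduced here for illustration only.

```python
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
listb = [2, 4, 6, 9, 23]

set_a, set_b = set(lista), set(listb)

union = set_a | set_b          # elements that appear in either list
intersection = set_a & set_b   # elements that appear in both lists
difference = set_a - set_b     # elements in lista that are not in listb

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that converting to a set discards duplicates and ordering, so this substitution only applies when neither matters for the result.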
Loop optimization
The principle of loop optimization is to minimize the amount of computation performed inside the loop; with nested loops, move inner-loop computations up to an outer level wherever possible. The following examples compare performance before and after loop optimization. Without optimization, Listing 4 runs in approximately 132.375 seconds.
Listing 4. Before the loop is optimized
from time import time

t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
for i in range(1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x = lista[a] + listb[b]
print "Total run time:"
print time() - t
Now apply the following optimizations: move the length calculations out of the loops, replace range with xrange, and hoist the computation of lista[a] from the third-level loop into the second-level loop.
Listing 5. After loop optimization
from time import time

t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
# Compute the lengths once, outside all loops
len1 = len(lista)
len2 = len(listb)
for i in xrange(1000000):
    for a in xrange(len1):
        temp = lista[a]  # hoisted out of the innermost loop
        for b in xrange(len2):
            x = temp + listb[b]
print "Total run time:"
print time() - t
The optimized program's run time drops to 102.171999931 seconds. In Listing 4, lista[a] is evaluated 1000000*10*10 times, while in the optimized code it is evaluated only 1000000*10 times. The number of computations falls significantly, so performance improves.
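The same hoisting idea can be checked for correctness before it is timed. The sketch below uses Python 3 syntax and goes one step further than Listing 5 by iterating over the values directly instead of over indexes; the function names sums_naive and sums_hoisted are assumptions made for this example.

```python
from timeit import timeit

lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]


def sums_naive():
    """Index both lists inside the innermost loop."""
    out = []
    for a in range(len(lista)):
        for b in range(len(listb)):
            out.append(lista[a] + listb[b])  # lista[a] is re-read on every pass
    return out


def sums_hoisted():
    """Fetch each outer element once and iterate over the values directly."""
    out = []
    for x in lista:          # the outer value is read once per outer pass
        for y in listb:
            out.append(x + y)
    return out


# The optimization must not change the result.
assert sums_naive() == sums_hoisted()

print("naive:  ", timeit(sums_naive, number=2000))
print("hoisted:", timeit(sums_hoisted, number=2000))
```

Checking that both versions return identical results first guards against an optimization that silently changes behavior.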
Take advantage of lazy evaluation in if conditions
Python evaluates conditional expressions lazily: given if x and y, the expression y is not evaluated when x is false. This feature can be used to improve program efficiency to some extent.
Listing 6. Using the features of the Lazy if-evaluation
from time import time

t = time()
abbreviations = ['cf.', 'e.g', 'ex.', 'etc', 'fig.', 'i.e', 'Mr.', 'vs.']
for i in range(1000000):
    for w in ('Mr.', 'Hat', 'was', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        # Optimized condition: the cheap test on the last character runs first
        #if w[-1] == '.' and w in abbreviations:
            pass
print "Total run time:"
print time() - t
The program takes approximately 8.84 seconds before optimization; using the commented-out line in place of the first if reduces the run time to approximately 6.17 seconds.
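To see that the right-hand operand of and really is skipped, the list lookup can be wrapped in a function that records its calls. This is an illustrative sketch in Python 3 syntax; is_abbreviation and calls are hypothetical names that do not appear in Listing 6.

```python
abbreviations = ['cf.', 'e.g', 'ex.', 'etc', 'fig.', 'i.e', 'Mr.', 'vs.']
words = ('Mr.', 'Hat', 'was', 'chasing', 'the', 'black', 'cat', '.')

calls = []  # records every word that reaches the list lookup


def is_abbreviation(word):
    """The comparatively expensive test: a linear scan of the list."""
    calls.append(word)
    return word in abbreviations


# The cheap test on the last character runs first; `and` evaluates its
# right-hand side only when the left-hand side is true.
found = [w for w in words if w[-1] == '.' and is_abbreviation(w)]

print(found)   # words that passed both tests
print(calls)   # words for which the list lookup actually ran
```

Only the words ending in a period reach the list scan, which is where the saving in Listing 6 comes from.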
Optimization of strings
String objects in Python are immutable, so any operation on a string, such as concatenation or modification, produces a new string object rather than changing the original. This repeated copying affects Python's performance to some degree. String optimization is therefore an important part of performance improvement, especially when a lot of text is processed. It focuses mainly on the following aspects:
- Use join() instead of + to concatenate strings: concatenating with + in Listing 7 takes about 0.125 s, while using join shortens this to 0.016 s. join is therefore faster than + for string concatenation, so use join instead of + wherever possible.
Listing 7. Using join instead of + to concatenate strings
from time import time

t = time()
s = ""
list = ['A', 'B', 'B', 'd', 'e', 'f', 'G', 'h', 'I', 'j', 'K', 'l', 'm', 'n']
for i in range(10000):
    for substr in list:
        s += substr
print "Total run time:"
print time() - t
Avoid:
s = ""
for x in list:
    s += func(x)
Use instead:
slist = [func(elt) for elt in somelist]
s = "".join(slist)
- When a string can be processed with either a regular expression or a built-in method, choose the built-in method, such as str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), or str.endswith(('x', 'yz')).
- Formatting a string is faster than reading the pieces and concatenating them directly, so build out with a format string and avoid assembling it with chained + operators.
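The three string tips above can be illustrated together. The following sketch uses Python 3 syntax; func, somelist, head, body, and tail are placeholder names and values invented for this example, not the ones from the original listings.

```python
def func(x):
    # Hypothetical per-item transformation standing in for func in the text.
    return str(x).upper()


somelist = ['a', 'b', 'c', 'd']

# Pattern to avoid: every += creates a new string object.
s_concat = ""
for x in somelist:
    s_concat += func(x)

# Preferred pattern: collect the pieces, then join them once.
s_join = "".join(func(elt) for elt in somelist)

# Built-in string methods accept a tuple of alternatives,
# which often removes the need for a regular expression.
starts = "yzabc".startswith(('x', 'yz'))
ends = "report.txt".endswith(('.csv', '.txt'))

# Formatting with % versus chained + concatenation (placeholder values).
head, body, tail = "<h>", "text", "</h>"
out_format = "%s%s%s" % (head, body, tail)
out_concat = head + body + tail

print(s_join, starts, ends, out_format)
```

Both members of each pair produce the same string; the difference lies only in how many intermediate objects are created along the way.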
Using list comprehensions and generator expressions
A list comprehension is more efficient than building a new list in a loop, so it can be used to make code run faster.
from time import time

t = time()
list = ['A', 'B', 'was', 'Python', 'Jason', 'Hello', 'hill', 'with', 'phone', 'Test',
        'Dfdf', 'apple', 'PDDF', 'ind', 'basic', 'none', 'baecr', 'var', 'Bana', 'dd', 'WRD']
total = []
for i in range(1000000):
    for w in list:
        total.append(w)
print "Total run time:"
print time() - t
Using a list comprehension:
for i in range(1000000):
    a = [w for w in list]
The code above takes approximately 17 s to run; with the list comprehension, the run time drops to 9.29 s, nearly half. Generator expressions were introduced in Python 2.4. Their syntax is similar to that of list comprehensions, but their advantage is more pronounced when processing large amounts of data: a generator expression does not create a list and only returns a generator, so it is more efficient. If a = [w for w in list] in the example above is changed to a = (w for w in list), the run time drops further, to about 2.98 s.
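The difference between the two forms is easiest to see in memory use and in when the work happens. The sketch below is in Python 3 syntax, and the names big_list, big_gen, and leftover are made up for the example. One caveat worth keeping in mind: a loop that only creates the generator, as in the modified example above, presumably measures creation alone, because the elements are produced only when the generator is iterated.

```python
import sys

words = ['A', 'B', 'was', 'Python', 'Jason', 'Hello']

copied = [w for w in words]   # builds the whole list immediately
lazy = (w for w in words)     # builds nothing until it is iterated

big_list = [n for n in range(100000)]
big_gen = (n for n in range(100000))

# The generator object stays small regardless of the input size.
print(sys.getsizeof(big_list), sys.getsizeof(big_gen))

# The work happens on iteration, and a generator can be consumed only once.
total = sum(big_gen)
leftover = list(big_gen)      # already exhausted, so this is empty

print(total, leftover, list(lazy) == copied)
```

A generator expression therefore suits single-pass consumers such as sum() or a for loop, while a list comprehension is the better fit when the result is indexed or reused.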
Other optimization techniques
- To swap the values of two variables, use a,b=b,a instead of an intermediate variable (t=a;a=b;b=t):
>>> from timeit import Timer
>>> Timer("t=a;a=b;b=t", "a=1;b=2").timeit()
0.25154118749729365
>>> Timer("a,b=b,a", "a=1;b=2").timeit()
0.17156677734181258
>>>
- Use xrange instead of range when looping. xrange saves a lot of memory because xrange() produces only one integer of the sequence per call, whereas range() returns the complete list of elements at once, which is unnecessary overhead for a loop. xrange no longer exists in Python 3, where range itself returns a lazy object that can be iterated over a range of any length;
- Use local variables and avoid the "global" keyword. Python accesses local variables much faster than global variables, so this can be used to improve performance;
- if done is not None is faster than if done != None; readers can verify this themselves;
- In a time-consuming loop, function calls can be replaced with inline code;
- Use the chained comparison "x < y < z" instead of "x < y and y < z";
- while 1 is faster than while True (the latter is, of course, more readable);
- Built-in functions are usually faster; add(a, b) is better than a + b.
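Several of these tips are micro-optimizations whose effect depends on the interpreter version, so it is worth measuring them rather than taking them on faith. The following is a minimal timeit sketch in Python 3 syntax; the function names and the repetition count are assumptions for illustration, and no particular timings are implied.

```python
from timeit import timeit


def swap_with_temp(a, b):
    t = a
    a = b
    b = t
    return a, b


def swap_with_tuple(a, b):
    a, b = b, a
    return a, b


def between_chained(x, y, z):
    return x < y < z            # y is evaluated only once


def between_and(x, y, z):
    return x < y and y < z


# The alternatives must agree before their speed is compared.
assert swap_with_temp(1, 2) == swap_with_tuple(1, 2) == (2, 1)
assert between_chained(1, 2, 3) == between_and(1, 2, 3)

# Time the statement pairs mentioned in the list above.
for stmt in ("t=a;a=b;b=t", "a,b=b,a"):
    print(stmt, timeit(stmt, setup="a=1;b=2", number=100000))
for stmt in ("done is not None", "done != None"):
    print(stmt, timeit(stmt, setup="done = 0", number=100000))
```

Running the same script under different interpreters is a quick way to check whether a given tip still pays off there.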
Locating program performance bottlenecks
The prerequisite for code optimization is knowing where the performance bottleneck is, that is, where the program spends most of its run time. For more complex code, tools can help locate it. Python ships with rich performance analysis tools such as profile, cProfile, and hotshot. A profiler is a set of programs bundled with Python that describe a program's run-time performance and provide various statistics to help the user locate its bottlenecks. The Python standard library offers three profilers: cProfile, profile, and hotshot.
profile is very simple to use: just import it before use. A concrete example follows:
Listing 8. Profiling with profile
import profile

def profiletest():
    total = 1
    for i in range(10):  # the loop bound is missing in the source; 10 is an assumed value
        total = total * (i + 1)
        print total
    return total

if __name__ == "__main__":
    profile.run("profiletest()")
The running results of the program are as follows:
Figure 1. Performance Analysis Results
The columns of the output are explained as follows:
- ncalls: the number of times the function was called;
- tottime: the total time spent in the function itself, excluding the time spent in the sub-functions it calls;
- percall: (the first percall) equals tottime/ncalls;
- cumtime: the time spent in the function and all of its sub-functions, that is, from when the function is called until it returns;
- percall: (the second percall) the average time per call of the function, equal to cumtime/ncalls;
- filename:lineno(function): the location and name of each function called;
To save the output as a log, just add another argument to the call, for example profile.run("profiletest()", "testprof").
If the profiling data has been saved as a binary file, the pstats module can be used to analyze it and produce text reports. pstats supports several report formats and is a practical tool with a text interface. It is very simple to use:
import pstats

p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()
The sort_stats() method sorts the profiling data and accepts multiple sort fields; for example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort fields include calls (number of calls), time (time spent inside the function), and cumulative (total cumulative time). pstats also provides an interactive command-line tool: after running python -m pstats, type help to learn more about how to use it.
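The profiling and reporting steps can also be combined in memory, without writing an intermediate file. The sketch below uses Python 3 syntax and cProfile rather than profile; the names profile_test, buffer, and report, and the loop bound of 10, are assumptions made for this example.

```python
import cProfile
import io
import pstats


def profile_test():
    """Small workload to profile: the product 1 * 2 * ... * 10."""
    total = 1
    for i in range(10):
        total = total * (i + 1)
    return total


profiler = cProfile.Profile()
result = profiler.runcall(profile_test)   # run the function under the profiler

# Send the report to a string buffer instead of standard output.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative", "name").print_stats(5)   # at most 5 entries
report = buffer.getvalue()

print(report)
```

Capturing the report in a buffer makes it easy to log or post-process; passing a file name to pstats.Stats, as in the listing above, remains the simpler choice when the data was saved by an earlier run.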
For large applications, presenting the profiling results graphically is very practical and intuitive. Common visualization tools include Gprof2Dot, visualpytune, and KCacheGrind; readers can consult the relevant official websites, and this article does not discuss them in detail.
Python performance optimization tools
Besides improving algorithms and choosing suitable data structures, Python performance optimization has several other key techniques, such as rewriting critical parts of the Python code as C extension modules or choosing an interpreter that is better optimized for performance. This article calls these optimization tools. Python has many such tools, including Psyco, PyPy, Cython, and Pyrex, each with its own strengths; this section introduces several of them.
Psyco
Psyco is a just-in-time compiler that improves performance to some extent without changes to the source code. Psyco compiles operations into somewhat optimized machine code. It works with variables at three different levels, "run-time", "compile-time", and "virtual-time", and raises or lowers the level of a variable as needed. Run-time variables are just the raw bytecode and object structures handled by the regular Python interpreter. Once Psyco compiles an operation into machine code, compile-time variables are represented in machine registers and directly accessible memory locations. The compiled machine code can also be cached for later reuse, which saves a little more time. Psyco has drawbacks, however: it uses a considerable amount of memory while running. Psyco does not support Python 2.7 and is no longer maintained or updated; interested readers can refer to http://psyco.sourceforge.net/
PyPy
PyPy stands for "Python implemented in Python", but it is actually implemented in a subset of Python called RPython, which can translate Python code into code for languages and platforms such as C, .NET, and Java. PyPy integrates a just-in-time (JIT) compiler. Unlike many compilers and interpreters, it does not concern itself with the lexical analysis and syntax trees of Python code. Because it is written in Python, it uses the Python language's code objects directly. A code object is the representation of Python bytecode, so PyPy directly analyzes the bytecode corresponding to the Python code; this bytecode is stored neither as characters nor in some binary format in a file, but inside the Python runtime environment. The current version is 1.8. PyPy can be installed on various platforms. To install it on Windows, first download https://bitbucket.org/pypy/pypy/downloads/pypy-1.8-win32.zip, unzip it to a directory of your choice, and add the extracted path to the PATH environment variable. If running pypy at the command line produces the error "MSVCR100.dll is not found, so this application does not start, reinstalling the application may fix the problem", you also need to download the Visual Studio runtime libraries from Microsoft's official website to resolve it. The address is http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5555
After successful installation, run PyPy on the command line and the output is as follows:
C:\Documents and Settings\Administrator>pypy
Python 2.7.2 (0e28b379d8b3, Feb, 18:31:47)
[PyPy 1.8.0 with MSC v.1500 32 bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy is vast, and contains
multitudes''
>>>>
Running the loop example from Listing 5 with PyPy and with Python gives the following results:
C:\Documents and Settings\Administrator\desktop\doc\python>pypy loop.py
total run time:
8.42199993134

C:\Documents and Settings\Administrator\desktop\doc\python>python loop.py
total run time:
106.391000032
As can be seen, running the program with PyPy greatly improves its efficiency.
Cython
Cython is a language implemented in Python that can be used to write Python extension modules. The libraries written with it can be loaded with import and run faster than pure Python code. Cython can load Python extensions (such as import math) and the header files of C libraries (for example, cdef extern from "math.h"), and it can also be used to write ordinary Python code. It can be used to rewrite the critical sections of Python code as C extension modules.
Installing Cython on Linux:
Step one: Download
[[email protected] cpython]# wget -N http://cython.org/release/Cython-0.15.1.zip
--2012-04-16 22:08:35--  http://cython.org/release/Cython-0.15.1.zip
Resolving cython.org... 128.208.160.197
Connecting to cython.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2200299 (2.1M) [application/zip]
Saving to: `Cython-0.15.1.zip'

100%[======================================>] 2,200,299   1.96M/s   in 1.1s

2012-04-16 22:08:37 (1.96 MB/s) - `Cython-0.15.1.zip' saved [2200299/2200299]
Step two: Unzip
[[email protected] cpython]# unzip -o Cython-0.15.1.zip
Step three: Install
python setup.py install
When the installation is complete, type cython directly; the installation succeeded if the following appears.
[[email protected] Cython-0.15.1]# cython
Cython (http://cython.org) is a compiler for code written in the
Cython language.  Cython is based on Pyrex by Greg Ewing.

Usage: cython [options] sourcefile.{pyx,py} ...

Options:
  -V, --version                  Display version number of cython compiler
  -l, --create-listing           Write error messages to a listing file
  -I, --include-dir <directory>  Search for include files in named directory
                                 (multiple include directories are allowed).
  -o, --output-file <filename>   Specify name of generated C file
  -t, --timestamps               Only compile newer source files
  -f, --force                    Compile all source files (overrides implied -t)
  -q, --quiet                    Don't print module names in recursive mode
  -v, --verbose                  Be verbose, print file names on multiple compilation
  -p, --embed-positions          If specified, the positions in Cython files of each
                                 function definition is embedded in its docstring.
  --cleanup <level>              Release interned objects on python exit, for memory debugging.
                                 Level indicates aggressiveness, default 0 releases nothing.
  -w, --working <directory>      Sets the working directory for Cython
                                 (the directory modules are searched from)
  --gdb                          Output debug information for cygdb
  -D, --no-docstrings            Strip docstrings from the compiled module.
  -a, --annotate                 Produce a colorized HTML version of the source.
  --line-directives              Produce #line directives pointing to the .pyx source
  --cplus                        Output a C++ rather than C file.
  --embed[=<method_name>]        Generate a main() function that embeds the Python interpreter.
  -2                             Compile based on Python-2 syntax and code semantics.
  -3                             Compile based on Python-3 syntax and code semantics.
  --fast-fail                    Abort the compilation on the first error
  --warning-error, -Werror       Make all warnings into errors
  --warning-extra, -Wextra       Enable extra warnings
  -X, --directive <name>=<value>[,<name=value,...]  Overrides a compiler directive
For installation on other platforms, refer to the documentation: http://docs.cython.org/src/quickstart/install.html
Unlike Python, Cython code must be compiled first. Compilation takes two stages: the .pyx file is compiled into a .c file, and the .c file is then compiled into a .so file. There are several ways to compile:
- Compiling from the command line:
Suppose you have the following test code; use the command line to compile it into a .c file.
def sum(int a, int b):
    print a + b

[[email protected] test]# cython sum.pyx
[[email protected] test]# ls
total
4 drwxr-xr-x 2 root root  4096 Apr 02:45 .
4 drwxr-xr-x 4 root root  4096 Apr 22:20 ..
4 -rw-r--r-- 1 root root       02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root root       Apr 02:45 sum.pyx
Use gcc to compile it into a .so file on Linux:
[[email protected] test]# gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.4 -o sum.so sum.c
[[email protected] test]# ls
total
4 drwxr-xr-x 2 root root  4096 Apr 17 02:47 .
4 drwxr-xr-x 4 root root  4096 Apr 16 22:20 ..
4 -rw-r--r-- 1 root root       02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root root       Apr 02:45 sum.pyx
20 -rwxr-xr-x 1 root root 20307 Apr 02:47 sum.so
- Compiling with distutils
Create a setup.py script:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("sum", ["sum.pyx"])]

setup(
    name = 'sum app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)

[[email protected] test]# python setup.py build_ext --inplace
running build_ext
cythoning sum.pyx to sum.c
building 'sum' extension
gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/ActivePython-2.7/include/python2.7 -c sum.c -o build/temp.linux-x86_64-2.7/sum.o
gcc -pthread -shared build/temp.linux-x86_64-2.7/sum.o -o /root/cpython/test/sum.so
After compilation, the module can be imported into Python:
[[email protected] test]# python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, June, 11:24:26)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import sum
>>> sum.sum(1,3)
Here's a simple performance comparison:
Listing 9. Cython Test Code
from time import time

def test(int n):
    # Typed C variables let Cython compile the loop to plain C arithmetic
    cdef int a = 0
    cdef int i
    for i in xrange(n):
        a += i
    return a

t = time()
test(10000000)
print "Total run time:"
print time() - t
Test results:
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import ctest
Total run time:
0.00714015960693
Listing 10. Python Test Code
from time import time

def test(n):
    a = 0
    for i in xrange(n):
        a += i
    return a

t = time()
test(10000000)
print "Total run time:"
print time() - t

[[email protected] test]# python test.py
Total run time:
0.971596002579
The comparison above shows that the Cython version is more than 100 times faster (about 0.0071 s versus 0.97 s).
Summary
This article discusses common Python performance optimization techniques and how to use tools to locate and analyze a program's performance bottlenecks, and it introduces some tools and languages that can be used for performance optimization, in the hope that it will be a useful reference for interested readers.