This article discusses how to optimize the performance of Python programs. It covers common code optimization techniques, the use of performance optimization tools, and how to diagnose performance bottlenecks in code, in the hope of giving Python developers a useful reference.
Common tips for Python code optimization
Code optimization makes a program run faster: it improves the program's efficiency without changing the results it produces. According to the 80/20 principle, refactoring, optimizing, extending, and documenting a program usually consume 80% of the total effort. Optimization typically involves two things: reducing the size of the code and increasing its runtime efficiency.
Improving the algorithm and choosing the appropriate data structure
A good algorithm plays a key role in performance, so the first step in improving performance is to improve the algorithm. Ordered from best to worst, the common time complexities are:
O(1) -> O(lg n) -> O(n lg n) -> O(n^2) -> O(n^3) -> O(n^k) -> O(k^n) -> O(n!)
Therefore, if an algorithm's time complexity can be improved, the performance gain is self-evident. Improving specific algorithms is beyond the scope of this article, however, and readers can consult other material on the subject. The following sections focus on choosing data structures.
• Dictionaries (dict) and lists (list)
A Python dictionary is implemented as a hash table, so a lookup has a complexity of O(1). A list is in effect an array, so a lookup has to traverse the list and has a complexity of O(n). Looking up or accessing a member is therefore faster in a dictionary than in a list.
Listing 1. Code dict.py
from time import time
t = time()
list = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone', 'test',
        'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr', 'var', 'bana', 'dd', 'wrd']
#list = dict.fromkeys(list, True)
print list
filter = []
for i in range(1000000):
    for find in ['are', 'hat', 'new', 'list', 'old', '.']:
        if find not in list:
            filter.append(find)
print "total run time:"
print time() - t
The code above takes roughly 16.09 seconds to run. If you uncomment the line #list = dict.fromkeys(list, True), so that the list is converted to a dictionary before the loop runs, the time drops to roughly 8.375 seconds, about half. Therefore, when many data members must be searched or accessed frequently, dict is a better choice than list.
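The comparison can also be sketched with the standard timeit module. The block below is a minimal illustration written for Python 3 (an assumption; the listings in this article are Python 2), and the names words, probes, and missing_from are hypothetical helpers, not part of Listing 1. Absolute timings depend on the machine, so none are quoted here.

```python
import timeit

words = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone', 'test',
         'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr', 'var', 'bana', 'dd', 'wrd']
probes = ['are', 'hat', 'new', 'list', 'old', '.']

# dict.fromkeys builds a hash table keyed by the words,
# like the commented-out line in Listing 1.
word_dict = dict.fromkeys(words, True)

def missing_from(container):
    """Return the probes that are not members of the container."""
    return [p for p in probes if p not in container]

# Both containers give the same answer; only the lookup cost differs.
list_result = missing_from(words)
dict_result = missing_from(word_dict)

# timeit calls each function `number` times and returns the total seconds.
list_seconds = timeit.timeit(lambda: missing_from(words), number=10000)
dict_seconds = timeit.timeit(lambda: missing_from(word_dict), number=10000)
print("list: %.4fs  dict: %.4fs" % (list_seconds, dict_seconds))
```

Using timeit rather than calls to time() repeats the measurement many times and keeps the timing code separate from the code being measured.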
• Sets (set) and lists (list)
The union, intersection, and difference operations on a set are faster than iterating over lists. So when a problem involves finding the intersection, union, or difference of lists, the lists can be converted to sets first.
Listing 2. Finding the intersection of two lists
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)
print "total run time:"
print time() - t
The running time of the above program is approximately:
total run time:
38.4070000648
Listing 3. Using set to find the intersection
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    list(set(lista) & set(listb))
print "total run time:"
print time() - t
After switching to set, the running time of the program drops to about 8.75 seconds, an improvement of more than 4 times. Readers can test the other operations in Table 1 in the same way.
Table 1. Common set operations
Syntax                     Operation      Description
set(list1) | set(list2)    union          A new set containing all elements of list1 and list2
set(list1) & set(list2)    intersection   A new set containing the elements common to list1 and list2
set(list1) - set(list2)    difference     A set of the elements that appear in list1 but not in list2
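The three operations in Table 1 can be tried directly on the lists from the listings above. This is a small sketch in Python 3 syntax (an assumption; the article's listings are Python 2). Because sets are unordered, the results are sorted before printing.

```python
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
list2 = [2, 4, 6, 9, 23]

union = set(list1) | set(list2)          # every element of either list
intersection = set(list1) & set(list2)   # elements present in both lists
difference = set(list1) - set(list2)     # elements of list1 not in list2

print(sorted(intersection))
print(sorted(difference))
print(len(union))
```

Note that a set also discards duplicates, so the result is not identical to the nested-loop version if the input lists contain repeated values.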
Loop optimization
The principle of loop optimization is to minimize the amount of computation performed inside the loop. With nested loops, move computation from the inner loops out to the outer loops wherever possible. The following example compares performance before and after loop optimization. Without loop optimization, the program in Listing 4 takes roughly 132.375 seconds to run.
Listing 4. Before the loop is optimized
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
for i in range(1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x = lista[a] + listb[b]
print "total run time:"
print time() - t
The following optimizations are now applied: the length calculations are moved out of the loops, range is replaced with xrange, and the lookup lista[a] is moved from the third-level loop up to the second-level loop.
Listing 5. After loop optimization
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
len1 = len(lista)
len2 = len(listb)
for i in xrange(1000000):
    for a in xrange(len1):
        temp = lista[a]
        for b in xrange(len2):
            x = temp + listb[b]
print "total run time:"
print time() - t
The optimized program runs in about 102.17 seconds. In Listing 4, lista[a] is evaluated 1000000*10*10 times, whereas in the optimized code it is evaluated 1000000*10 times. The number of evaluations drops substantially, so performance improves.
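The hoisting idea can be isolated in two small functions that must produce identical results. This is a sketch in Python 3 (an assumption; there range is already lazy, so xrange is not needed), and the function names sums_unoptimized and sums_hoisted are hypothetical.

```python
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]

def sums_unoptimized():
    result = []
    for a in range(len(lista)):
        for b in range(len(listb)):
            # lista[a] is looked up on every inner iteration.
            result.append(lista[a] + listb[b])
    return result

def sums_hoisted():
    result = []
    len1 = len(lista)   # lengths computed once, outside the loops
    len2 = len(listb)
    for a in range(len1):
        temp = lista[a]  # looked up once per outer iteration
        for b in range(len2):
            result.append(temp + listb[b])
    return result

print(sums_unoptimized() == sums_hoisted())
```

Checking that both versions return the same values before timing them guards against an optimization that silently changes behavior.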
Making full use of lazy if-evaluation
Conditional expressions in Python are evaluated lazily: in the condition if x and y, the expression y is not evaluated when x is false. This feature can be used to improve program efficiency to some extent.
Listing 6. Using lazy if-evaluation
from time import time
t = time()
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
for i in range(1000000):
    for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        #if w[-1] == '.' and w in abbreviations:
            pass
print "total run time:"
print time() - t
Before optimization the program runs in about 8.84 seconds; if the commented line is used in place of the first if, it runs in about 6.17 seconds.
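The effect of short-circuit evaluation can be made visible by counting how often the expensive test actually runs. The sketch below uses Python 3 syntax (an assumption); the counter lookups and the helper in_abbreviations are hypothetical, and the word lists are adapted from Listing 6.

```python
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
words = ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.')

lookups = 0

def in_abbreviations(word):
    """Membership test that counts how often it is actually evaluated."""
    global lookups
    lookups += 1
    return word in abbreviations

# The cheap test w[-1] == '.' runs first; the list search on the right of
# `and` runs only for words that end with a period.
matches = [w for w in words if w[-1] == '.' and in_abbreviations(w)]
print(matches, lookups)
```

Only two of the eight words end with a period, so the list search runs twice instead of eight times.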
String optimization
String objects in Python are immutable, so any manipulation of a string, such as concatenation or modification, produces a new string object rather than changing the original. This continual copying affects Python's performance to some extent. String optimization is therefore an important part of improving performance, especially when processing large amounts of text. It focuses mainly on the following points:
1. Use join() instead of + for string concatenation. In Listing 7, concatenating with + takes roughly 0.125s, while using join takes only 0.016s. join is therefore faster than + for concatenating strings, so prefer join over +.
Listing 7. Using join instead of + to concatenate strings
from time import time
t = time()
s = ""
list = ['a', 'b', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']
for i in range(10000):
    for substr in list:
        s += substr
print "total run time:"
print time() - t
Likewise, avoid:
s = ""
for x in list:
    s += func(x)
Instead, use:
slist = [func(elt) for elt in somelist]
s = "".join(slist)
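The two patterns build the same string, which the following sketch checks. It is written in Python 3 syntax (an assumption), and func here is a hypothetical stand-in for whatever per-item transformation the real code applies.

```python
def func(x):
    # Hypothetical stand-in for the per-item transformation.
    return str(x).upper()

somelist = ['a', 'b', 'c', 'd']

# Repeated +=: every step may create a new string object.
s_plus = ""
for x in somelist:
    s_plus += func(x)

# join: collect the pieces, then build the result in a single call.
s_join = "".join(func(elt) for elt in somelist)

print(s_plus, s_join)
```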
2. When a string can be handled with either a regular expression or a built-in function, choose the built-in function, such as str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), or str.endswith(('x', 'yz')).
3. String formatting is faster than direct concatenation, so use
out = "%s%s%s%s" % (head, prologue, query, tail)
and avoid
out = "" + head + prologue + query + tail + ""
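The built-in string methods mentioned above can replace simple regular expressions, as the sketch below illustrates. It uses Python 3 syntax (an assumption), and the sample data in names is hypothetical. startswith and endswith accept a tuple of alternatives.

```python
import re

names = ['xray', 'yzma', 'alpha', '123', 'xyz']

# Built-in string methods.
starts_builtin = [n for n in names if n.startswith(('x', 'yz'))]
digits_builtin = [n for n in names if n.isdigit()]

# Equivalent regular expressions for this ASCII sample, shown only for comparison.
starts_regex = [n for n in names if re.match(r'(x|yz)', n)]
digits_regex = [n for n in names if re.fullmatch(r'[0-9]+', n)]

# Formatting produces the same string as chained concatenation.
head, prologue, query, tail = 'h', 'p', 'q', 't'
out = "%s%s%s%s" % (head, prologue, query, tail)

print(starts_builtin, digits_builtin, out)
```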
Using list comprehensions and generator expressions
A list comprehension is more efficient than building a new list in a loop, so we can use this feature to improve running efficiency.
from time import time
t = time()
list = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone', 'test',
        'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr', 'var', 'bana', 'dd', 'wrd']
total = []
for i in range(1000000):
    for w in list:
        total.append(w)
print "total run time:"
print time() - t
Using a list comprehension:
for i in range(1000000):
    a = [w for w in list]
The loop version above takes about 17s to run, whereas with the list comprehension the running time drops to 9.29s, nearly twice as fast. Generator expressions were introduced in Python 2.4. Their syntax is similar to that of list comprehensions, but their advantage when processing large amounts of data is clear: a generator expression does not create a list, it simply returns a generator, so it is more efficient. If the code a = [w for w in list] in the example above is changed to a = (w for w in list), the running time drops further, to about 2.98s.
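The difference between the three forms can be seen in a short sketch, written in Python 3 syntax (an assumption). One point worth keeping in mind when reading the timing above: creating a generator does no iteration at all, so a benchmark that never consumes the generator measures only its creation.

```python
words = ['a', 'b', 'is', 'python', 'jason', 'hello']

# Explicit loop with append.
total = []
for w in words:
    total.append(w)

# List comprehension: builds the whole list at once.
copied = [w for w in words]

# Generator expression: nothing is produced until it is iterated.
lazy = (w for w in words)
lazy_items = list(lazy)      # consuming the generator yields the same items
exhausted = list(lazy)       # a generator can be consumed only once

print(copied == total == lazy_items, exhausted)
```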
Other optimization techniques
1. To swap the values of two variables, use a,b=b,a rather than an intermediate variable as in t=a;a=b;b=t:
>>> from timeit import Timer
>>> Timer("t=a;a=b;b=t", "a=1;b=2").timeit()
0.25154118749729365
>>> Timer("a,b=b,a", "a=1;b=2").timeit()
0.17156677734181258
>>>
2. Use xrange instead of range when looping. xrange can save a lot of system memory because xrange() produces only one integer element of the sequence per call, whereas range() returns the complete list of elements directly, which is unnecessarily expensive for loops. xrange no longer exists in Python 3, where range itself provides an iterable object that can traverse a range of arbitrary length.
3. Use local variables and avoid the "global" keyword. Python accesses local variables much faster than global variables, so this can be used to improve performance.
4. if done is not None is faster than if done != None; readers can verify this themselves.
5. In a time-consuming loop, consider replacing function calls with inline code.
6. Use the chained comparison "x < y < z" instead of "x < y and y < z".
7. while 1 is faster than while True (although the latter is more readable).
8. Built-in functions are usually faster; add(a,b) is said to be better than a+b, although readers should measure this themselves.
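Claims like these are easy to check on your own interpreter with timeit. The helper below is a minimal sketch in Python 3 syntax (an assumption); time_snippets and the setup string are hypothetical, and results will vary by Python version and machine, so no timings are quoted.

```python
import timeit

# Statement pairs taken from the tips above; each pair does the same work two ways.
snippets = [
    "t=a;a=b;b=t",        # swap with a temporary variable
    "a,b=b,a",            # swap with tuple assignment
    "a < b and b < c",    # explicit comparison
    "a < b < c",          # chained comparison
    "done != None",
    "done is not None",
]

def time_snippets(statements, setup="a=1;b=2;c=3;done=0", number=100000):
    """Return {statement: total seconds for `number` runs}, measured with timeit."""
    return {stmt: timeit.timeit(stmt, setup=setup, number=number)
            for stmt in statements}

timings = time_snippets(snippets)
for stmt, seconds in timings.items():
    print("%-20s %.4fs" % (stmt, seconds))
```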
Locating program performance bottlenecks
Code optimization presupposes knowing where the performance bottlenecks are, that is, where the program spends most of its running time. For more complex code, tools can help locate them. Python ships with a rich set of performance analysis tools. A profiler is a set of programs bundled with Python that describes a program's run-time performance and provides a variety of statistics to help users locate its bottlenecks. The Python standard library provides three profilers: cProfile, profile, and hotshot.
profile is very simple to use: just import it before use. A specific example follows:
Listing 8. Profiling by using profile
import profile
def profileTest():
    total = 1
    for i in range(10):
        total = total * (i + 1)
        print total
    return total
if __name__ == "__main__":
    profile.run("profileTest()")
The results of the program's operation are as follows:
Figure 1. Performance Analysis Results
Each column of the output is explained as follows:
ncalls: the number of times the function was called;
tottime: the total time spent in the given function, excluding time spent in the sub-functions it calls;
percall: (the first percall) equal to tottime/ncalls;
cumtime: the time spent in the function and all of its sub-functions, that is, from the moment the function is called until it returns;
percall: (the second percall) the average time of one run of the function, equal to cumtime/ncalls;
filename:lineno(function): the location details of each function call.
If you need to save the output as a log, just pass another argument in the call, for example profile.run("profileTest()", "testprof").
If the profiling data from profile was saved to a binary file, it can be analyzed as a text report with the pstats module. pstats supports several forms of report output and is a practical text-interface tool. It is very simple to use:
import pstats
p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()
Here the sort_stats() method sorts the profiling data and accepts multiple sort fields; for example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort fields include calls (number of calls), time (time spent inside the function), and cumulative (total running time). In addition, pstats provides an interactive command-line tool; run python -m pstats to learn more about how to use it.
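Profiling and report generation can also be combined in one script without an intermediate file. The sketch below uses cProfile, which offers the same interface as profile, together with pstats. It is written in Python 3 syntax (an assumption), and the helper name profile_report is hypothetical.

```python
import cProfile
import io
import pstats

def profile_test():
    total = 1
    for i in range(10):
        total *= i + 1
    return total

def profile_report(func, sort_key="cumulative", limit=5):
    """Profile one call of func and return the pstats text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    # pstats.Stats accepts a profiler object as well as a saved file name.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()

report = profile_report(profile_test)
print(report)
```

Writing the report to a StringIO buffer instead of standard output makes it possible to log or post-process the text.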
For large applications, presenting profiling results graphically is very practical and intuitive. Common visualization tools include Gprof2Dot, VisualPyTune, and KCacheGrind. Readers can consult their official websites; this article does not discuss them in detail.
Python performance optimization tools
Besides improving algorithms and choosing appropriate data structures, Python performance optimization relies on several other key techniques, such as rewriting critical parts of the Python code as C extension modules, or choosing an interpreter that is better optimized for performance. This article refers to these collectively as optimization tools. Python has many optimization tools, such as Psyco, PyPy, Cython, and Pyrex, each with its own strengths. This section introduces several of them.
Psyco
Psyco is a just-in-time compiler that can improve performance without any change to the source code. Psyco compiles operations into somewhat optimized machine code. It divides variables into three levels, run-time, compile-time, and virtual-time, and raises or lowers a variable's level as needed. Run-time variables are simply the raw bytecode and object structures processed by the regular Python interpreter. Once Psyco has compiled operations into machine code, compile-time variables are represented in machine registers and directly accessible memory locations. Psyco can also cache the compiled machine code for future reuse, which saves a little more time. Psyco has drawbacks too: it uses a large amount of memory when running. Psyco does not support Python 2.7 and is no longer maintained or updated; interested readers can refer to http://psyco.sourceforge.net/
PyPy
PyPy stands for "Python implemented in Python," but it is actually implemented in a subset of Python called RPython, and it can translate Python code into code for languages and platforms such as C, .NET, and Java. PyPy integrates a just-in-time (JIT) compiler. Unlike many compilers and interpreters, it does not concern itself with the lexical analysis or syntax tree of Python code. Because it is written in Python, it uses the Python language's code object directly. A code object is the representation of Python bytecode, which means that PyPy analyzes the bytecode of the Python code directly; this bytecode is stored neither as characters nor in a binary format in a file, but in the Python runtime environment. The current version is 1.8, and it can be installed on a variety of platforms. To install PyPy on Windows, first download https://bitbucket.org/pypy/pypy/downloads/pypy-1.8-win32.zip, extract it to a directory, and add the extracted path to the PATH environment variable. If running pypy on the command line produces the error "MSVCR100.dll is not found, so the application could not be started, reinstalling the application may fix the problem", you also need to download the Visual Studio runtime libraries from Microsoft's official website to resolve it. The address is http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5555
After the installation is successful, run PyPy on the command line and the output is as follows:
C:\Documents and Settings\Administrator>pypy
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 18:31:47)
[PyPy 1.8.0 with MSC v.1500-bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy is vast, and contains
multitudes''
>>>>
Taking the loop in Listing 5 as an example, running it with python and with pypy gives the following results:
C:\Documents and Settings\Administrator\Desktop\doc\python>pypy loop.py
total run time:
8.42199993134
C:\Documents and Settings\Administrator\Desktop\doc\python>python loop.py
total run time:
106.391000032
Clearly, running the program with PyPy improves its efficiency greatly.
Cython
Cython is a language implemented in Python for writing Python extensions. Libraries written in it can be loaded with import and run faster than pure Python. In Cython you can load Python extensions (for example, import math) or the header files of C libraries (for example, cdef extern from "math.h"), and you can also write ordinary Python code in it. Cython can be used to rewrite critical sections as C extension modules.
Installing Cython on Linux:
Step one: download
[root@v5254085f259 cpython]# wget -N http://cython.org/release/Cython-0.15.1.zip
--2012-04-16 22:08:35--  http://cython.org/release/Cython-0.15.1.zip
Resolving cython.org... 128.208.160.197
Connecting to cython.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response... OK
Length: 2200299 (2.1M) [application/zip]
Saving to: 'Cython-0.15.1.zip'
100%[======================================>] 2,200,299 1.96M/s in 1.1s
2012-04-16 22:08:37 (1.96 MB/s) - 'Cython-0.15.1.zip' saved [2200299/2200299]
Step two: extract
[root@v5254085f259 cpython]# unzip -o Cython-0.15.1.zip
Step three: install
When the installation is complete, run cython directly. If the following output appears, the installation succeeded.
[root@v5254085f259 Cython-0.15.1]# cython
Cython (http://cython.org) is a compiler for code written in the
Cython language. Cython is based on Pyrex by Greg Ewing.
Usage: cython [options] sourcefile.{pyx,py} ...
Options:
 -V, --version Display version number of cython compiler
 -l, --create-listing Write error messages to a listing file
 -I, --include-dir Search for include files in named directory
 (multiple include directories are allowed).
 -o, --output-file Specify name of generated C file
 -t, --timestamps Only compile newer source files
 -f, --force Compile all source files (overrides implied -t)
 -q, --quiet Don't print module names in recursive mode
 -v, --verbose Be verbose, print file names on multiple compilation
 -p, --embed-positions If specified, the positions in Cython files of each
 function definition is embedded in its docstring.
 --cleanup
 Release interned objects on python exit, for memory debugging.
 Level indicates aggressiveness, default 0 releases nothing.
 -w, --working
 Sets the working directory for Cython (the directory modules are searched from)
 --gdb Output debug information for cygdb
 -D, --no-docstrings
 Strip docstrings from the compiled module.
 -a, --annotate
 Produce a colorized HTML version of the source.
 --line-directives
 Produce #line directives pointing to the .pyx source
 --cplus
 Output a C++ rather than C file.
 --embed[=]
 Generate a main() function that embeds the Python interpreter.
 -2 Compile based on Python-2 syntax and code semantics.
 -3 Compile based on Python-3 syntax and code semantics.
 --fast-fail Abort the compilation on the
 --warning-error, -Werror Make all warnings into errors
 --warning-extra, -Wextra Enable extra warnings
 -X, --directive =
 [,
For installation on other platforms, refer to the documentation: http://docs.cython.org/src/quickstart/install.html
Unlike Python, Cython code must be compiled. Compilation generally goes through two stages: the .pyx file is compiled into a .c file, and the .c file is then compiled into a .so file. There are several ways to compile:
• Compiling from the command line:
Suppose you have the following test code, and use the command line to compile it into a .c file.
def sum(int a, int b):
    print a + b
[root@v5254085f259 test]# cython sum.pyx
[root@v5254085f259 test]# ls
total 76
4 drwxr-xr-x 2 root root 4096 Apr 17 02:45 .
4 drwxr-xr-x 4 root root 4096 Apr 16 22:20 .
4 -rw-r--r-- 1 root Apr 17 02:45 1
60 -rw-r--r-- 1 root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root Apr 02:45 sum.pyx
Then compile it into a .so file with gcc on Linux:
[root@v5254085f259 test]# gcc -shared -pthread -fPIC -fwrapv -O2
 -Wall -fno-strict-aliasing -I/usr/include/python2.4 -o sum.so sum.c
[root@v5254085f259 test]# ls
total 96
4 drwxr-xr-x 2 root root 4096 Apr 17 02:47 .
4 drwxr-xr-x 4 root root 4096 Apr 16 22:20 .
4 -rw-r--r-- 1 root Apr 17 02:45 1
60 -rw-r--r-- 1 root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root Apr 02:45 sum.pyx
20 -rwxr-xr-x 1 root 20307 Apr 02:47 sum.so
• Compiling with distutils
Create a setup.py script:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("sum", ["sum.pyx"])]
setup(
    name = 'sum app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)
[root@v5254085f259 test]# python setup.py build_ext --inplace
running build_ext
cythoning sum.pyx to sum.c
building 'sum' extension
gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3
 -Wall -Wstrict-prototypes -fPIC -I/opt/ActivePython-2.7/include/python2.7
 -c sum.c -o build/temp.linux-x86_64-2.7/sum.o
gcc -pthread -shared build/temp.linux-x86_64-2.7/sum.o
 -o /root/cpython/test/sum.so
After the compilation is complete, you can import the module into Python and use it:
[root@v5254085f259 test]# python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 11:24:26)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import sum
>>> sum.sum(1,3)
Here's a simple performance comparison:
Listing 9. Cython Test Code
from time import time
def test(int n):
    cdef int a = 0
    cdef int i
    for i in xrange(n):
        a += i
    return a
t = time()
test(10000000)
print "total run time:"
print time() - t
Test results:
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import ctest
total run time:
0.00714015960693
Listing 10. Python Test Code
from time import time
def test(n):
    a = 0
    for i in xrange(n):
        a += i
    return a
t = time()
test(10000000)
print "total run time:"
print time() - t
[root@v5254085f259 test]# python test.py
total run time:
0.971596002579
The comparison above shows that using Cython makes the code more than 100 times faster.
Summary
This article discussed common Python performance optimization techniques, showed how to use tools to locate and analyze a program's performance bottlenecks, and introduced tools and languages that can be used to optimize performance, in the hope of providing a useful reference for those concerned.