If you choose a scripting language, you have to live with its speed. This statement illustrates, to some extent, a shortcoming of Python as a scripting language: its execution efficiency and performance are not ideal, especially on machines with limited resources, so code often needs to be optimized to make programs run faster. How to optimize Python performance is the main topic of this article. It covers common code optimization techniques, the use of performance optimization tools, and how to diagnose performance bottlenecks in code; it is hoped that it can serve as a reference for Python developers.
Common Python code optimization skills
Code optimization makes a program run faster; it makes the program more efficient without changing the result it produces. According to the 80/20 principle, refactoring, optimizing, extending, and documenting a program usually consumes 80% of the workload. Optimization generally focuses on two aspects: reducing the size of the code and improving its running efficiency.
Improve algorithms and select appropriate data structures
A good algorithm plays a key role in performance, so the first step in improving performance is to improve the algorithm. Common algorithmic time complexities, ordered from best to worst, are:
O(1) -> O(lg n) -> O(n) -> O(n lg n) -> O(n^2) -> O(n^3) -> O(n^k) -> O(k^n) -> O(n!)
If the algorithm's time complexity can be reduced, the performance gain is self-evident. The improvement of specific algorithms, however, is beyond the scope of this article; readers can consult other material on the subject. The following content focuses on the selection of data structures.
• Dictionary and list
Python dictionaries use a hash table, so the complexity of a lookup is O(1), while a list is essentially an array in which a lookup has to traverse the whole list, giving O(n) complexity. Operations such as searching for and accessing members are therefore faster on a dict than on a list.
Listing 1. dict.py
The code is as follows:
from time import time
t = time()
list = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone', 'test',
        'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr', 'var', 'bana', 'dd', 'wrd']
#list = dict.fromkeys(list, True)
print list
filter = []
for i in range(1000000):
    for find in ['is', 'hat', 'new', 'list', 'old', '.']:
        if find not in list:
            filter.append(find)
print "total run time:"
print time() - t
The above code takes about 16.09 seconds to run. If you uncomment the line list = dict.fromkeys(list, True) to convert the list into a dictionary and run it again, the time drops to about 8.375 seconds, an improvement of roughly half. Therefore, using a dict instead of a list is a good choice when members need to be searched or accessed frequently.
• Set and list
The union, intersection, and difference operations of a set are faster than iterating over lists, so problems involving list intersections, unions, or differences can be converted into set operations.
Listing 2. Finding the intersection of two lists
The code is as follows:
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]   # sample values; the original listing's contents were lost
listb = [2, 4, 6, 9, 23]
intersection = []
for i in range(1000000):
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)
print "total run time:"
print time() - t
The running time of the above program is:
total run time:
38.4070000648
Listing 3. Using set to calculate the intersection
The code is as follows:
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]   # sample values; the original listing's contents were lost
listb = [2, 4, 6, 9, 23]
intersection = []
for i in range(1000000):
    intersection = list(set(lista) & set(listb))
print "total run time:"
print time() - t
After switching to set, the running time of the program drops to about 8.75 seconds, an improvement of more than 4x; the running time is greatly shortened. Other set operations can be tested in the same way; Table 1 lists the common ones, and a short sketch follows the table.
Table 1. Common set usage
Syntax                     Operation      Description
set(list1) | set(list2)    union          A new set containing all elements of list1 and list2
set(list1) & set(list2)    intersection   A new set containing the elements common to list1 and list2
set(list1) - set(list2)    difference     A new set containing the elements that appear in list1 but not in list2
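As a quick illustration of the operations in Table 1, here is a minimal sketch (the sample values are arbitrary):
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7]

# Union: every element that appears in either list
print set(list1) | set(list2)    # set([1, 2, 3, 4, 5, 6, 7])

# Intersection: elements common to both lists
print set(list1) & set(list2)    # set([4, 5])

# Difference: elements in list1 that are not in list2
print set(list1) - set(list2)    # set([1, 2, 3])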
Loop optimization
The principle of loop optimization is to minimize the amount of computation done inside the loop; when there are nested loops, try to move the inner layer's computation up to the outer layer. The following examples compare the performance gains brought by loop optimization. In Listing 4, without loop optimization, the running time is about 132.375 seconds.
Listing 4. Before loop optimization
The code is as follows:
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]
for i in range(1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x = lista[a] + listb[b]
print "total run time:"
print time() - t
Now perform the following optimizations: move the length calculations out of the loop, replace range with xrange, and hoist the lista[a] lookup from the third loop level into the second.
Listing 5. After loop optimization
The code is as follows:
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]
len1 = len(lista)
len2 = len(listb)
for i in xrange(1000000):
    for a in xrange(len1):
        temp = lista[a]
        for b in xrange(len2):
            x = temp + listb[b]
print "total run time:"
print time() - t
The running time of the optimized program drops to 102.172 seconds. In Listing 4, lista[a] is computed 1000000 * 10 * 10 times; in the optimized code it is computed only 1000000 * 10 times. The number of computations is greatly reduced, so performance improves.
Make full use of lazy if-evaluation
In Python, conditional expressions are evaluated lazily (short-circuited): in an expression such as if x and y, y is not evaluated when x is false. This feature can be exploited to improve program efficiency to some extent.
Listing 6. Using lazy if-evaluation
The code is as follows:
from time import time
t = time()
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', '.']
for i in range(1000000):
    for w in ('Mr.', 'hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        #if w[-1] == '.' and w in abbreviations:
            pass
print "total run time:"
print time() - t
The running time of the program before optimization is about 8.84 seconds; if the commented line is used to replace the first if, the running time drops to about 6.17 seconds.
String optimization
String objects in Python are immutable, so any string operation such as concatenation or modification produces a new string object rather than modifying the original; this continuous copying affects Python's performance to some degree. String optimization is therefore an important aspect of performance work, especially when a lot of text is processed. It mainly involves the following points:
1. Prefer join() over + for string concatenation. In Listing 7, concatenating with + takes about 0.125 seconds, while using join takes about 0.016 seconds, so join is faster than + for string operations and should be preferred.
Listing 7. Using join instead of + to concatenate strings
The code is as follows:
from time import time
t = time()
s = ""
list = ['a', 'b', 'b', 'd', 'e', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']
for i in range(10000):
    for substr in list:
        s += substr
print "total run time:"
print time() - t
Avoid:
The code is as follows:
S = ""
For x in list:
S + = func (x)
Instead, use:
The code is as follows:
slist = [func(elt) for elt in somelist]
s = "".join(slist)
2. When a string can be processed either with regular expressions or with built-in string methods, prefer the built-in methods (see the short sketch after this list), such as str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), str.endswith(('x', 'yz')).
3. Formatting with % is faster than concatenating strings directly. Therefore, prefer
The code is as follows:
Out = "% s" % (head, prologue, query, tail)
Avoid
The code is as follows:
Out = "" + head + prologue + query + tail + ""
Use list comprehensions and generator expressions
A list comprehension is more efficient than building a new list with an explicit loop, so this feature can be used to improve running efficiency.
The code is as follows:
from time import time
t = time()
list = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone', 'test',
        'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr', 'var', 'bana', 'dd', 'wrd']
total = []
for i in range(1000000):
    for w in list:
        total.append(w)
print "total run time:"
print time() - t
Using a list comprehension instead:
The code is as follows:
for i in range(1000000):
    a = [w for w in list]
Running the first version takes about 17 seconds; after switching to the list comprehension, the running time drops to 9.29 seconds, nearly half. Generator expressions were introduced in Python 2.4. Their syntax is similar to that of list comprehensions, but when processing large data volumes they have an obvious advantage: they do not create a list, they only return a generator, so they are more efficient. In the example above, changing a = [w for w in list] to a = (w for w in list) further reduces the running time to about 2.98 seconds.
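A minimal sketch of the difference between the two forms (a small sample list is used here instead of the full list from the earlier listing):
words = ['a', 'b', 'is', 'python', 'jason']   # small sample data

# List comprehension: builds the whole list in memory at once
a = [w for w in words]
print a            # ['a', 'b', 'is', 'python', 'jason']

# Generator expression: returns a generator that yields one element at a time
g = (w for w in words)
print g.next()     # 'a' - elements are produced lazily, one per call
print sum(1 for w in words)   # generators can feed functions such as sum() directly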
Other optimization skills
1. To exchange the values of two variables, use a, b = b, a rather than an intermediate variable t = a; a = b; b = t:
The code is as follows:
>>> From timeit import Timer
>>> Timer ("t = a; a = B; B = t", "a = 1; B = 2"). timeit ()
0.25154118749729365
>>> Timer ("a, B = B, a", "a = 1; B = 2"). timeit ()
0.17156677734181258
>>>
2. Use xrange instead of range in loops. xrange can save a lot of memory because it generates only one integer element per iteration, whereas range() returns the complete list of elements, which is unnecessary inside a loop. In Python 3, xrange no longer exists; there, range itself returns a lazy sequence that can cover a range of any length.
3. Use local variables and avoid the global keyword. Python accesses local variables much faster than global variables, so this can be exploited to improve performance.
4. if done is not None is faster than if done != None; readers can verify this themselves (see the timeit sketch after this list);
5. In a time-consuming loop, consider inlining function calls (replacing the call with the function body) to avoid call overhead;
6. Use the chained comparison x < y < z instead of x < y and y < z;
7. while 1 is faster than while True (though the latter is more readable);
8. Built-in functions are usually fast; for example, add(a, b) is preferable to a + b.
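A minimal sketch showing how some of these tips can be checked with timeit (tips 4 and 7 are used here; the absolute numbers depend on the machine, only the comparison matters):
from timeit import Timer

# Tip 4: "is not None" versus "!= None"
print Timer("done is not None", "done = 1").timeit()
print Timer("done != None", "done = 1").timeit()

# Tip 7: while 1 versus while True, timed over a short counting loop
print Timer("i = 0\nwhile 1:\n    i += 1\n    if i == 100: break").timeit(10000)
print Timer("i = 0\nwhile True:\n    i += 1\n    if i == 100: break").timeit(10000)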
Locate program performance bottlenecks
The premise of code optimization is knowing where the performance bottleneck is, that is, where the program spends most of its running time. For complicated code you can use tools to locate the hot spots. Python has several built-in performance analysis tools, such as profile, cProfile, and hotshot. A profiler is a program that describes the run-time performance of a program and provides various statistics to help you locate its bottlenecks. The Python standard library provides three profilers: cProfile, profile, and hotshot.
Using profile is very simple: just import it before use. A concrete example follows:
Listing 8. Using profile for performance analysis
The code is as follows:
import profile

def profileTest():
    Total = 1
    for i in range(10):
        Total = Total * (i + 1)
        print Total
    return Total

if __name__ == "__main__":
    profile.run("profileTest()")
The program running result is as follows:
Figure 1. performance analysis results
The output columns are explained as follows:
• ncalls: the number of times the function was called;
• tottime: the total time spent in the function itself, excluding time spent in the sub-functions it calls;
• percall: (the first percall) equals tottime divided by ncalls;
• cumtime: the cumulative time spent in the function and all of its sub-functions, i.e., from the moment the function is called until it returns;
• percall: (the second percall) the average time of one call, equal to cumtime divided by ncalls;
• filename:lineno(function): the location and name of each function.
To save the output to a log file, simply pass another argument to the call, for example profile.run("profileTest()", "testprof").
If the profiling data is saved to a binary file, the pstats module can be used to turn it into text reports. It supports several output formats and is a very practical tool for the text interface. It is easy to use:
The code is as follows:
import pstats
p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()
The sort_stats() method sorts the profiling data and accepts multiple sort fields; for example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort fields include calls, time, and cumulative. In addition, pstats provides a command-line interactive tool; after running python -m pstats you can learn more about its usage through help.
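As a minimal sketch tying the two modules together (reusing a slightly simplified profileTest() from Listing 8; the file name testprof is just an example):
import cProfile
import pstats

def profileTest():
    Total = 1
    for i in range(10):
        Total = Total * (i + 1)
    return Total

# Save the profiling data to a file ...
cProfile.run("profileTest()", "testprof")

# ... then load it with pstats, strip directory prefixes and
# print the entries sorted by cumulative time.
p = pstats.Stats("testprof")
p.strip_dirs().sort_stats("cumulative").print_stats()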
For large applications, presenting the analysis results graphically is practical and intuitive; common visualization tools include Gprof2Dot, visualpytune, KCacheGrind, and others. You can look them up on their official websites; this article does not discuss them in detail.
Python performance optimization tool
In addition to improving algorithms and selecting appropriate data structures, Python performance optimization involves several other key techniques, such as rewriting the critical Python code as a C extension module or choosing an interpreter with better performance; these are referred to as optimization tools in this article. There are many such optimization tools for Python, for example Psyco, PyPy, Cython, and Pyrex, each with its own merits. This section introduces several of them.
Psyco
Psyco is a just-in-time compiler that can improve performance without changing the source code. Psyco compiles operations into somewhat optimized machine code, and its operations are divided into three different levels: "run time", "compile time", and "virtual time"; variables are promoted or demoted between levels as needed. Run-time variables are simply the raw bytecode and object structures handled by the regular Python interpreter. Once Psyco has compiled an operation into machine code, variables are represented in machine registers and directly accessible memory locations. Psyco also caches compiled machine code for future reuse, which saves a little time. However, Psyco has its disadvantages: it consumes a large amount of memory. Psyco no longer supports Python 2.7 and is no longer maintained or updated; interested readers can refer to http://psyco.sourceforge.net/
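Psyco is enabled simply by importing it and binding it to the code to be accelerated; a minimal sketch of typical usage (this assumes an older Python version that Psyco still supports, such as 2.5, with Psyco installed):
import psyco

def heavy_loop(n):
    total = 0
    for i in xrange(n):
        total += i
    return total

# Either optimize the whole program ...
psyco.full()

# ... or bind only the hot functions to limit memory consumption:
# psyco.bind(heavy_loop)

print heavy_loop(10000000)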
Pypy
PyPy indicates "Python implemented using Python", but it is actually implemented using a Python subset called RPython, which can convert Python code to C ,.. NET, Java, and other language and platform code. PyPy integrates a real-time (JIT) compiler. Unlike many compilers, the interpreter does not care about the lexical analysis and syntax tree of Python code. Because it is written in the Python language, it uses the Python Code Object directly .. Code Object is the representation of Python bytecode, that is, PyPy directly analyzes the bytecode corresponding to the Python Code ,, these bytecode are not stored in the Python runtime environment, nor in a binary format. The current version is 1.8. support different platform installation, install Pypy on windows need to download the https://bitbucket.org/pypy/pypy/downloads/pypy-1.8-win32.zip first, then extract to the relevant directory, and the decompressed path to add to the environment variable path. Run pypy on the command line. if the following error occurs: "MSVCR100.dll is not found. Therefore, this application cannot be started. re-installing the application may fix this problem ", you also need to download VS 2010 runtime libraries on Microsoft's official website to solve the problem. The specific address is http://www.microsoft.com/download/en/details.aspx? Displaylang = en & id = 5555
After the installation is successful, run pypy in the command line. The output is as follows:
The code is as follows:
C:\Documents and Settings\Administrator> pypy
Python 2.7.2 (0e28b0000d8b3, Feb 09 2012, 18:31:47)
[PyPy 1.8.0 with MSC v.1500 32 bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy is vast, and contains
multitudes''
>>>>
Taking the loop in Listing 5 as an example, run it with python and with pypy respectively; the results are as follows:
The code is as follows:
C:\Documents and Settings\Administrator\Desktop\doc\python> pypy loop.py
total run time:
8.42199993134

C:\Documents and Settings\Administrator\Desktop\doc\python> python loop.py
total run time:
106.391000032
As the results show, compiling and running the program with pypy greatly improves efficiency.
Cython
Cython is a language based on Python that can be used to write Python extension modules; any library written in Cython can be loaded with import and runs faster than the equivalent pure Python code. In Cython you can import Python modules (such as import math) as well as load C header files (such as cdef extern from "math.h"), and you can also write plain Python code in it. A common optimization approach is therefore to rewrite the performance-critical parts as a C extension module with Cython.
Installing Cython on Linux:
Step 1: Download
The code is as follows:
[root@v5254085f259 cpython]# wget -N http://cython.org/release/Cython-0.15.1.zip
--22:08:35-- http://cython.org/release/Cython-0.15.1.zip
Resolving cython.org... 128.208.160.197
Connecting to cython.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2200299 (2.1M) [application/zip]
Saving to: `Cython-0.15.1.zip'
100%[==========================================>] 2,200,299   1.96M/s   in 1.1s
22:08:37 (1.96 MB/s) - `Cython-0.15.1.zip' saved [2200299/2200299]
Step 2: Extract
The code is as follows:
[root@v5254085f259 cpython]# unzip -o Cython-0.15.1.zip
Step 3: Install
The code is as follows:
python setup.py install
After installation, run cython directly; if the following output appears, the installation was successful.
The code is as follows:
[root@v5254085f259 Cython-0.15.1]# cython
Cython is a compiler for code written in the
Cython language.  Cython is based on Pyrex by Greg Ewing.

Usage: cython [options] sourcefile.{pyx,py} ...

Options:
  -V, --version                  Display version number of cython compiler
  -l, --create-listing           Write error messages to a listing file
  -I, --include-dir <directory>  Search for include files in named directory
                                 (multiple include directories are allowed).
  -o, --output-file <filename>   Specify name of generated C file
  -t, --timestamps               Only compile newer source files
  -f, --force                    Compile all source files (overrides implied -t)
  -q, --quiet                    Don't print module names in recursive mode
  -v, --verbose                  Be verbose, print file names on multiple compilation
  -p, --embed-positions          If specified, the positions in Cython files of each
                                 function definition is embedded in its docstring.
  --cleanup <level>              Release interned objects on python exit, for memory debugging.
                                 Level indicates aggressiveness, default 0 releases nothing.
  -w, --working <directory>      Sets the working directory for Cython (the directory modules are searched from)
  --gdb                          Output debug information for cygdb
  -D, --no-docstrings            Strip docstrings from the compiled module.
  -a, --annotate                 Produce a colorized HTML version of the source.
  --line-directives              Produce #line directives pointing to the .pyx source
  --cplus                        Output a C++ rather than C file.
  --embed[=<method_name>]        Generate a main() function that embeds the Python interpreter.
  -2                             Compile based on Python-2 syntax and code semantics.
  -3                             Compile based on Python-3 syntax and code semantics.
  --fast-fail                    Abort the compilation on the first error
  --warning-error, -Werror       Make all warnings into errors
  --warning-extra, -Wextra       Enable extra warnings
  -X, --directive <name>=<value>[,<name=value,...>]
Installation on other platforms is covered in the documentation: http://docs.cython.org/src/quickstart/install.html
Unlike Python, Cython code must be compiled first. This generally happens in two stages: the .pyx file is compiled into a .c file, and the .c file is then compiled into a .so file. There are several ways to compile:
• Compile using the command line:
Given the following test code, compile it into a .c file from the command line.
The code is as follows:
def sum(int a, int b):
    print a + b

[root@v5254085f259 test]# cython sum.pyx
[root@v5254085f259 test]# ls
total 76
 4 drwxr-xr-x 2 root root  4096 Apr 17 .
 4 drwxr-xr-x 4 root root  4096 Apr 16 ..
 4 -rw-r--r-- 1 root root    35 Apr 17 1
60 -rw-r--r-- 1 root root 55169 Apr 17 sum.c
 4 -rw-r--r-- 1 root root    35 Apr 17 sum.pyx
On Linux, use gcc to compile the .c file into a .so file:
The code is as follows:
[root@v5254085f259 test]# gcc -shared -pthread -fPIC -fwrapv -O2 \
    -Wall -fno-strict-aliasing -I/usr/include/python2.4 -o sum.so sum.c
[root@v5254085f259 test]# ls
total 96
 4 drwxr-xr-x 2 root root  4096 Apr 17 .
 4 drwxr-xr-x 4 root root  4096 Apr 16 ..
 4 -rw-r--r-- 1 root root    35 Apr 17 1
60 -rw-r--r-- 1 root root 55169 Apr 17 sum.c
 4 -rw-r--r-- 1 root root    35 Apr 17 sum.pyx
20 -rwxr-xr-x 1 root root 20307 Apr 17 sum.so
• Use distutils for compilation
Create a setup. py script:
The code is as follows:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("sum", ["sum.pyx"])]

setup(
    name = 'sum app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)

[root@v5254085f259 test]# python setup.py build_ext --inplace
running build_ext
cythoning sum.pyx to sum.c
building 'sum' extension
gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3
    -Wall -Wstrict-prototypes -fPIC -I/opt/ActivePython-2.7/include/python2.7
    -c sum.c -o build/temp.linux-x86_64-2.7/sum.o
gcc -pthread -shared build/temp.linux-x86_64-2.7/sum.o
    -o /root/cpython/test/sum.so
After compilation, the module can be imported and used from Python:
The code is as follows:
[root@v5254085f259 test]# python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 11:24:26)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import sum
>>> sum.sum(1, 3)
4
Below is a simple performance comparison:
Listing 9. Cython test code
The code is as follows:
from time import time

def test(int n):
    cdef int a = 0
    cdef int i
    for i in xrange(n):
        a += i
    return a

t = time()
test(10000000)
print "total run time:"
print time() - t
Test results:
The code is as follows:
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import ctest
total run time:
0.00714015960693
Listing 10. Python test code
The code is as follows:
from time import time

def test(n):
    a = 0
    for i in xrange(n):
        a += i
    return a

t = time()
test(10000000)
print "total run time:"
print time() - t

[root@v5254085f259 test]# python test.py
total run time:
0.971596002579
From the comparison above, using Cython speeds up this test by more than 100 times.
Summary
This article has discussed common Python code optimization techniques, how to use tools to locate and analyze performance bottlenecks, and several performance optimization tools and languages; it is hoped that it provides a useful reference for Python developers.