Python Code Performance Optimization Tips

This article discusses how to optimize the performance of Python code. It covers common code optimization techniques, how to diagnose performance bottlenecks, and tools that can speed up Python programs. We hope it serves as a useful reference for Python developers.

Common tips for Python code optimization
Code optimization makes a program run faster without changing its results. According to the 80/20 principle, refactoring, optimizing, extending, and documenting a program typically account for 80% of the total work. Optimization usually covers two aspects: reducing the size of the code and improving its running efficiency.
Improve the algorithm and select the appropriate data structure
A good algorithm plays a key role in performance, so the first step in improving performance is improving the algorithm. Common time complexities, ordered from fastest to slowest, are:
O(1) -> O(lg n) -> O(n lg n) -> O(n^2) -> O(n^3) -> O(n^k) -> O(k^n) -> O(n!)
Therefore, lowering the time complexity of an algorithm yields an obvious performance gain. Improving specific algorithms is beyond the scope of this article; readers can consult the relevant literature. The rest of this section focuses on choosing data structures.
• Dictionaries (dict) and lists (list)
A Python dictionary is implemented as a hash table, so a lookup takes O(1) time on average. A list is actually an array, so finding an element requires traversing the list, which takes O(n) time. Looking up or accessing members is therefore faster in a dictionary than in a list.

Listing 1. Code dict.py
from time import time
t = time()
list = ['A', 'B', 'is', 'python', 'Jason', 'Hello', 'hill', 'with', 'phone', 'test',
        'Dfdf', 'apple', 'PDDF', 'ind', 'basic', 'none', 'baecr', 'var', 'Bana', 'dd', 'WRD']
#list = dict.fromkeys(list, True)
print list
filter = []
for i in range(1000000):
    for find in ['was', 'hat', 'new', 'list', 'old', '.']:
        if find not in list:
            filter.append(find)
print "total run time:"
print time() - t


The code above takes about 16.09 seconds to run. If the comment marker on the line #list = dict.fromkeys(list, True) is removed, so that the list is converted to a dictionary before the loop runs, the run time drops to about 8.375 seconds, roughly half. Therefore, when many members must be looked up or accessed frequently, a dict is a better choice than a list.
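The same comparison can be sketched with the standard timeit module. This is a minimal sketch written for Python 3 (the listings in this article use Python 2 syntax); the names words, probes, and count_missing are illustrative and do not come from the original listing, and the absolute timings will differ from the figures quoted above.

```python
import timeit

# Illustrative data modelled on Listing 1 (the names are assumptions).
words = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with',
         'phone', 'test', 'dfdf', 'apple', 'pddf', 'ind', 'basic']
probes = ['was', 'hat', 'new', 'list', 'old', '.']

word_list = list(words)                  # membership test scans the list: O(n)
word_dict = dict.fromkeys(words, True)   # membership test hashes the key: O(1) on average


def count_missing(container):
    """Count how many probe strings are absent from the container."""
    return sum(1 for p in probes if p not in container)


# Both containers give the same answer; only the lookup cost differs.
missing_in_list = count_missing(word_list)
missing_in_dict = count_missing(word_dict)

list_seconds = timeit.timeit(lambda: count_missing(word_list), number=20000)
dict_seconds = timeit.timeit(lambda: count_missing(word_dict), number=20000)
print("list: %.4f s, dict: %.4f s" % (list_seconds, dict_seconds))
```

The gap between the two timings grows with the size of the container, because the list scan is linear while the hash lookup stays roughly constant.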
• Sets (set) and lists (list)
The union, intersection, and difference operations on sets are faster than iterating over lists. So problems that involve the intersection, union, or difference of lists can be solved by converting the lists to sets first.

Listing 2. Finding the intersection of two lists
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
listb = [2, 4, 6, 9, 23]
intersection = []
for i in range(1000000):
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)

print "total run time:"
print time() - t


Running the program above prints roughly:
total run time:
38.4070000648
Listing 3. Using set to find the intersection
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
listb = [2, 4, 6, 9, 23]
intersection = []
for i in range(1000000):
    list(set(lista) & set(listb))
print "total run time:"
print time() - t


After switching to set, the run time drops to 8.75 seconds, which is more than 4 times faster. Readers can test the other operations in Table 1 themselves.
Table 1. Common set operations
Syntax                     Operation      Description
set(list1) | set(list2)    union          A new set containing all elements of list1 and list2
set(list1) & set(list2)    intersection   A new set containing the elements common to list1 and list2
set(list1) - set(list2)    difference     A new set containing the elements that appear in list1 but not in list2
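The three operations in Table 1 can be tried directly. This is a minimal sketch for Python 3 that reuses the data from Listings 2 and 3; the variable names are illustrative. Note that a set drops duplicates and does not preserve order, so the results are sorted before printing.

```python
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
list2 = [2, 4, 6, 9, 23]

union = set(list1) | set(list2)          # every element of either list
intersection = set(list1) & set(list2)   # elements present in both lists
difference = set(list1) - set(list2)     # elements of list1 that are not in list2

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```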

Loop optimization
The principle of loop optimization is to minimize the amount of computation inside the loop; with nested loops, move computations from the inner loops to the outer loops wherever possible. The following example shows the performance gain from loop optimization. Without optimization, Listing 4 takes approximately 132.375 seconds to run.

Listing 4. Before the loop is optimized
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]
for i in range(1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x = lista[a] + listb[b]
print "total run time:"
print time() - t


The following optimizations are then applied: the length calculations are moved out of the loops, range is replaced with xrange, and the lookup lista[a] is moved from the third (innermost) loop to the second loop.

Listing 5. After loop optimization
from time import time
t = time()
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]
len1 = len(lista)
len2 = len(listb)
for i in xrange(1000000):
    for a in xrange(len1):
        temp = lista[a]
        for b in xrange(len2):
            x = temp + listb[b]
print "total run time:"
print time() - t


The optimized program's run time drops to 102.171999931 seconds. In Listing 4, lista[a] is evaluated 1000000*10*10 times, whereas in the optimized code it is evaluated 1000000*10 times. The number of evaluations is greatly reduced, so performance improves.
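The hoisting idea can be sketched in a form whose result is easy to check. This is a minimal sketch for Python 3, where range already behaves lazily; the function names are illustrative, and iterating over the lists directly is a further simplification that does not appear in the listings above.

```python
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]


def sums_unoptimized():
    """Index both lists inside the innermost loop, as in Listing 4."""
    result = []
    for a in range(len(lista)):
        for b in range(len(listb)):
            result.append(lista[a] + listb[b])   # lista[a] looked up 100 times
    return result


def sums_hoisted():
    """Fetch the outer element once per outer iteration, as in Listing 5."""
    result = []
    for x in lista:              # the outer element is fetched 10 times in total
        for y in listb:
            result.append(x + y)
    return result


print(sums_unoptimized() == sums_hoisted())
```

Both functions produce the same 100 sums; only the number of lookups performed in the inner loop changes.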
Take advantage of lazy evaluation in if conditions
Conditional expressions in Python are lazily evaluated (short-circuited): in the condition if x and y, the expression y is not evaluated when x is false. This feature can be used to improve program efficiency to some extent.

Listing 6. Using lazy evaluation in if conditions
from time import time
t = time()
abbreviations = ['cf.', 'e.g', 'ex.', 'etc', 'fig.', 'i.e', 'Mr.', 'vs.']
for i in range(1000000):
    for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        #if w[-1] == '.' and w in abbreviations:
            pass
print "total run time:"
print time() - t


The program takes approximately 8.84 seconds before optimization. If the commented-out line is used in place of the first if, the run time drops to approximately 6.17 seconds.
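To see which operand is actually evaluated, the sketch below records every call to the expensive test. It is a minimal sketch for Python 3; expensive_check and calls are hypothetical helpers introduced only for illustration.

```python
calls = []


def expensive_check(word):
    """Stand-in for a costly test; records every call so the calls can be counted."""
    calls.append(word)
    return word in ('cf.', 'e.g', 'ex.', 'etc', 'fig.', 'i.e', 'Mr.', 'vs.')


words = ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.')

# The cheap test on the left runs first; expensive_check runs only
# for words that end with a period.
matches = [w for w in words if w[-1] == '.' and expensive_check(w)]

print(matches)
print(calls)
```

Only two of the eight words reach expensive_check, which is why putting the cheap, selective condition first pays off.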
String optimization
String objects in Python are immutable, so any string operation, such as concatenation or modification, produces a new string object rather than changing the original. This repeated copying affects Python's performance to some extent. String optimization is therefore an important part of performance tuning, especially when processing large amounts of text. It focuses on the following aspects:
1. Use join() instead of + to concatenate strings. In Listing 7, concatenating with + takes about 0.125 s, while using join takes about 0.016 s. join is therefore faster than + for string concatenation, so prefer join wherever possible.
Listing 7. Using join instead of + to concatenate strings
from time import time
t = time()
s = ""
list = ['A', 'B', 'B', 'd', 'e', 'f', 'g', 'h', 'I', 'j', 'K', 'l', 'm', 'n']
for i in range(10000):
    for substr in list:
        s += substr
print "total run time:"
print time() - t


Also avoid:
s = ""
for x in list:
    s += func(x)

Instead, use:
slist = [func(elt) for elt in somelist]
s = "".join(slist)


2. Prefer built-in functions when a string can be processed with either regular expressions or built-in functions, for example str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), and str.endswith(('x', 'yz')).
3. Formatting a string is faster than concatenating its parts directly, so use:
out = "%s%s%s%s" % (head, prologue, query, tail)
and avoid:
out = "" + head + prologue + query + tail + ""
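The join advice above can be checked with a short timing run. This is a minimal sketch for Python 3; pieces, concat_plus, and concat_join are illustrative names. Timings vary by interpreter and version, so treat the printed numbers as indicative only.

```python
import timeit

pieces = ['a', 'b', 'c', 'd', 'e', 'f', 'g'] * 1000


def concat_plus():
    """Build the string piece by piece with +=."""
    s = ""
    for piece in pieces:
        s += piece          # may create a new string object on each pass
    return s


def concat_join():
    """Build the string in a single call to join."""
    return "".join(pieces)


plus_seconds = timeit.timeit(concat_plus, number=200)
join_seconds = timeit.timeit(concat_join, number=200)
print("+=: %.4f s, join: %.4f s" % (plus_seconds, join_seconds))
```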


Use list comprehensions and generator expressions
A list comprehension is more efficient than building a new list in a loop, so we can use it to improve efficiency.
from time import time
t = time()
list = ['A', 'B', 'is', 'python', 'Jason', 'Hello', 'hill', 'with', 'phone', 'test',
        'Dfdf', 'apple', 'PDDF', 'ind', 'basic', 'none', 'baecr', 'var', 'Bana', 'dd', 'WRD']
total = []
for i in range(1000000):
    for w in list:
        total.append(w)
print "total run time:"
print time() - t


Using a list comprehension:
for i in range(1000000):
    a = [w for w in list]


The loop version above takes approximately 17 s; with the list comprehension the run time drops to 9.29 s, nearly half. Generator expressions were introduced in Python 2.4. Their syntax is similar to that of list comprehensions, but their advantage is more obvious when processing large amounts of data: a generator expression does not create a list, it only returns a generator, so it is efficient. In the example above, changing a = [w for w in list] to a = (w for w in list) reduces the run time further, to about 2.98 s. Note that the generator in this test is never iterated, so that figure mainly reflects the cost of creating the generator object, not of producing its items.
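The difference between the two forms is easiest to see in memory use and in when the work happens. This is a minimal sketch for Python 3; the names as_list, as_gen, and first_three are illustrative, and sys.getsizeof reports only the size of the container object itself.

```python
import sys

words = ['a', 'b', 'is', 'python', 'jason', 'hello'] * 1000

as_list = [w.upper() for w in words]   # builds all 6000 items immediately
as_gen = (w.upper() for w in words)    # builds nothing until it is iterated

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print("list: %d bytes, generator: %d bytes" % (list_bytes, gen_bytes))

# Items are produced one at a time, only when requested.
first_three = [next(as_gen) for _ in range(3)]
print(first_three)
```

A generator suits a single pass over large data; if the items are needed more than once, or by index, a list is still the right choice.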
Other optimization techniques
1. To swap the values of two variables, use a, b = b, a instead of an intermediate variable (t = a; a = b; b = t):
>>> from timeit import Timer
>>> Timer("t=a;a=b;b=t", "a=1;b=2").timeit()
0.25154118749729365
>>> Timer("a,b=b,a", "a=1;b=2").timeit()
0.17156677734181258
>>>


2. Use xrange instead of range in loops. xrange saves a lot of memory because it produces only one integer of the sequence per iteration, whereas range() returns the complete list of elements at once, which is unnecessary overhead for a loop. xrange no longer exists in Python 3, where range itself returns a lazy range object that can cover a range of any length.
3. Use local variables and avoid the global keyword. Python accesses local variables much faster than global variables, so this can be used to improve performance.
4. if done is not None is faster than if done != None; readers can verify this themselves.
5. In time-consuming loops, consider inlining function calls.
6. Use the chained comparison x < y < z instead of x < y and y < z.
7. In Python 2, while 1 is faster than while True (although the latter is more readable).
8. Built-in functions are usually fast. The source also states that add(a, b) outperforms a + b; measure this before relying on it, because the function-call overhead in CPython usually makes operator.add slower than the + operator.
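Several of these tips can be tried in a few lines. This is a minimal sketch for Python 3; the function and variable names are illustrative, and the measured difference between the two loops depends on the interpreter and version.

```python
import math
import timeit


def sqrt_sum_global(n):
    """Resolve math.sqrt through the module on every iteration."""
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


def sqrt_sum_local(n):
    """Bind math.sqrt to a local name once, then reuse it in the loop."""
    sqrt = math.sqrt
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total


# Chained comparison: equivalent to (x < y and y < z), with y evaluated once.
x, y, z = 1, 2, 3
chained = x < y < z

# Identity test against None: 0 is a real value, so this is True.
done = 0
is_set = done is not None

global_seconds = timeit.timeit(lambda: sqrt_sum_global(1000), number=200)
local_seconds = timeit.timeit(lambda: sqrt_sum_local(1000), number=200)
print("global: %.4f s, local: %.4f s" % (global_seconds, local_seconds))
```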
Locating performance bottlenecks
The prerequisite for optimizing code is knowing where the bottleneck is, that is, where the program spends most of its time. For more complex code, tools can help locate it. A profiler is a program, shipped with Python, that describes the run-time performance of a program and provides various statistics to help the user locate performance bottlenecks. The Python standard library offers three profilers: cProfile, profile, and hotshot.
profile is very simple to use; just import it before use. For example:

Listing 8. Profiling with profile
import profile
def profiletest():
    total = 1
    for i in range(10):
        total = total * (i + 1)
        print total
    return total
if __name__ == "__main__":
    profile.run("profiletest()")


The program's output is shown in Figure 1.
Figure 1. Performance analysis results

The columns of the output have the following meanings:

ncalls: the number of times the function was called;
tottime: the total time spent in the function itself, excluding time spent in the functions it calls;
percall (first): tottime divided by ncalls;
cumtime: the cumulative time spent in the function and all the functions it calls, from the start of the call until it returns;
percall (second): the average time per call, equal to cumtime divided by ncalls;
filename:lineno(function): where each function is defined.
To save the output to a file, simply pass another argument to the call, for example profile.run("profiletest()", "testprof").

If the profiling data is saved as a binary file, the pstats module can be used to produce text reports from it. It supports several report formats and is a practical text-mode tool. It is very simple to use:
import pstats
p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()

The sort_stats() method sorts the profiling data and accepts multiple sort keys; for example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort keys include calls (number of calls), time (time spent inside the function), and cumulative (cumulative time). pstats also provides an interactive command-line tool: run python -m pstats and then type help to learn more about how to use it.
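The profile and pstats steps can also be combined in a single script. This is a minimal sketch for Python 3 that assumes cProfile (the C implementation of the same interface) and writes the report to an in-memory buffer instead of the testprof file; profile_test mirrors Listing 8, and the names buffer and report are illustrative.

```python
import cProfile
import io
import pstats


def profile_test():
    """Compute 10! in a loop, as in Listing 8 (without the prints)."""
    total = 1
    for i in range(10):
        total = total * (i + 1)
    return total


# Collect statistics only for the code between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
result = profile_test()
profiler.disable()

# Send the report to a string buffer, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Passing a number to print_stats limits the report to that many entries, which keeps the output readable for larger programs.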

For large applications, presenting the profiling results graphically is practical and intuitive. Common visualization tools include Gprof2Dot, visualpytune, and KCacheGrind. Readers can consult the respective official websites; this article does not discuss them in detail.

Python performance optimization tools

Besides improving algorithms and choosing the right data structures, there are several other key techniques for optimizing Python performance, such as rewriting critical Python code as C extension modules or choosing an interpreter that is better optimized for performance. This article calls these optimization tools. Python has many such tools, including Psyco, PyPy, Cython, and Pyrex. Each has its own strengths; this section introduces a few of them.

Psyco

Psyco is a just-in-time compiler that can improve performance without changing the source code. Psyco compiles operations into somewhat optimized machine code. It works with variables at three different levels, "run-time", "compile-time", and "virtual-time", and promotes or demotes variables between levels as needed. Run-time variables are just the raw bytecode and object structures that the regular Python interpreter processes. Once Psyco compiles an operation into machine code, compile-time variables are represented in machine registers and in directly accessible memory locations. The compiled machine code is also cached for future reuse, which saves a little time. Psyco has drawbacks, however: it uses a large amount of memory when running. Psyco does not support Python 2.7 and is no longer maintained or updated. Interested readers can refer to http://psyco.sourceforge.net/

PyPy

PyPy means "Python implemented in Python", but it is actually implemented in a subset of Python called RPython, which can be translated into code for targets such as C, .NET, and Java. PyPy integrates a just-in-time (JIT) compiler. Unlike many compilers and interpreters, it does not concern itself with the lexical analysis and syntax tree of Python code. Because it is written in Python, it works directly with Python's code objects. A code object is the representation of Python bytecode, which means PyPy interprets the bytecode corresponding to the Python code directly; this bytecode is not stored as characters or in some binary format in a file but lives in the Python runtime environment. At the time of writing, the current version was 1.8, and installation is supported on different platforms. To install PyPy on Windows, first download https://bitbucket.org/pypy/pypy/downloads/pypy-1.8-win32.zip, extract it to a directory, and add the extracted path to the PATH environment variable. If running pypy at the command line produces the error "MSVCR100.dll is not found, so this application does not start, reinstalling the application may fix the problem", you also need to download the Visual Studio runtime libraries from Microsoft's official website to resolve the issue. The address is http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5555

After a successful installation, running pypy at the command line produces output like the following:

C:\Documents and Settings\Administrator>pypy
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 18:31:47)
[PyPy 1.8.0 with MSC v.1500 32 bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: "PyPy is vast, and contains
multitudes"
>>>>


Running the loop example from Listing 5 with both PyPy and Python gives the following results:

C:\Documents and Settings\Administrator\Desktop\doc\python>pypy loop.py
total run time:
8.42199993134
C:\Documents and Settings\Administrator\Desktop\doc\python>python loop.py
total run time:
106.391000032


As you can see, running the program with PyPy greatly improves its efficiency.
Cython
Cython is a language, implemented in Python, for writing Python extensions. The libraries it produces can be loaded with import and run faster than pure Python code. Cython code can load Python extensions (such as import math) or the header files of C libraries (for example: cdef extern from "math.h"), and Cython can also be used to write ordinary Python code. It is one way to rewrite critical sections as C extension modules.
Installing Cython on Linux:
Step 1: Download
[root@v5254085f259 cpython]# wget -N http://cython.org/release/Cython-0.15.1.zip
--2012-04-16 22:08:35--  http://cython.org/release/Cython-0.15.1.zip
Resolving cython.org... 128.208.160.197
Connecting to cython.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response... OK
Length: 2200299 (2.1M) [application/zip]
Saving to: 'Cython-0.15.1.zip'
100%[======================================>] 2,200,299 1.96M/s in 1.1s
2012-04-16 22:08:37 (1.96 MB/s) - 'Cython-0.15.1.zip' saved [2200299/2200299]

Step 2: Unzip
[root@v5254085f259 cpython]# unzip -o Cython-0.15.1.zip

Step 3: Install
python setup.py install


When the installation is complete, type cython at the command line. If output like the following appears, the installation succeeded.

[root@v5254085f259 Cython-0.15.1]# cython
Cython (http://cython.org) is a compiler for code written in the
Cython language.  Cython is based on Pyrex by Greg Ewing.
Usage: cython [options] sourcefile.{pyx,py} ...
Options:
  -V, --version                Display version number of cython compiler
  -l, --create-listing         Write error messages to a listing file
  -I, --include-dir            Search for include files in named directory
                               (multiple include directories are allowed).
  -o, --output-file            Specify name of generated C file
  -t, --timestamps             Only compile newer source files
  -f, --force                  Compile all source files (overrides implied -t)
  -q, --quiet                  Don't print module names in recursive mode
  -v, --verbose                Be verbose, print file names on multiple compilation
  -p, --embed-positions        If specified, the positions in Cython files of each
                               function definition is embedded in its docstring.
  --cleanup
                               Release interned objects on python exit, for memory debugging.
                               Level indicates aggressiveness, default 0 releases nothing.
  -w, --working
                               Sets the working directory for Cython (the directory modules are searched from)
  --gdb                        Output debug information for cygdb
  -D, --no-docstrings
                               Strip docstrings from the compiled module.
  -a, --annotate
                               Produce a colorized HTML version of the source.
  --line-directives
                               Produce #line directives pointing to the .pyx source
  --cplus
                               Output a C++ rather than C file.
  --embed[=]
                               Generate a main() function that embeds the Python interpreter.
  -2                           Compile based on Python-2 syntax and code semantics.
  -3                           Compile based on Python-3 syntax and code semantics.
  --fast-fail                  Abort the compilation on the first error
  --warning-error, -Werror     Make all warnings into errors
  --warning-extra, -Wextra     Enable extra warnings
  -X, --directive =
  [,


For installation on other platforms, see the documentation: http://docs.cython.org/src/quickstart/install.html
Unlike Python, Cython code must be compiled first. Compilation has two stages: the .pyx file is compiled into a .c file, and the .c file is then compiled into a .so file. There are several ways to compile:
• Compile from the command line:
Suppose you have the following test code, which is compiled into a .c file using the command line.
def sum(int a, int b):
    print a + b

[root@v5254085f259 test]# cython sum.pyx
[root@v5254085f259 test]# ls
total 76
4 drwxr-xr-x 2 root root 4096 Apr 17 02:45 .
4 drwxr-xr-x 4 root root 4096 Apr 16 22:20 ..
4 -rw-r--r-- 1 root root 17 02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root root 02:45 sum.pyx


On Linux, use gcc to compile the .c file into a .so file:

[root@v5254085f259 test]# gcc -shared -pthread -fPIC -fwrapv -O2 \
    -Wall -fno-strict-aliasing -I/usr/include/python2.4 -o sum.so sum.c
[root@v5254085f259 test]# ls
total 96
4 drwxr-xr-x 2 root root 4096 Apr 17 02:47 .
4 drwxr-xr-x 4 root root 4096 Apr 16 22:20 ..
4 -rw-r--r-- 1 root root 17 02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 02:45 sum.c
4 -rw-r--r-- 1 root root 02:45 sum.pyx
20 -rwxr-xr-x 1 root root 20307 Apr 02:47 sum.so


• Compile with distutils
Create a setup.py script:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("sum", ["sum.pyx"])]
setup(
    name = 'sum app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)

[root@v5254085f259 test]# python setup.py build_ext --inplace
running build_ext
cythoning sum.pyx to sum.c
building 'sum' extension
gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3
-Wall -Wstrict-prototypes -fPIC -I/opt/ActivePython-2.7/include/python2.7
-c sum.c -o build/temp.linux-x86_64-2.7/sum.o
gcc -pthread -shared build/temp.linux-x86_64-2.7/sum.o
-o /root/cpython/test/sum.so


After compilation, the module can be imported into Python:

[root@v5254085f259 test]# python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 11:24:26)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import sum
>>> sum.sum(1, 3)


Here's a simple performance comparison:
Listing 9. Cython Test Code
from time import time
def test(int n):
    cdef int a = 0
    cdef int i
    for i in xrange(n):
        a += i
    return a
t = time()
test(10000000)
print "total run time:"
print time() - t


Test results:

[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import ctest
total run time:
0.00714015960693


Listing 10. Python Test Code
from time import time
def test(n):
    a = 0
    for i in xrange(n):
        a += i
    return a
t = time()
test(10000000)
print "total run time:"
print time() - t

[root@v5254085f259 test]# python test.py
total run time:
0.971596002579


From the comparison above, we can see that using Cython makes this code more than 100 times faster (0.9716 s versus 0.0071 s, roughly 136 times).
Summary
This article discussed common performance optimization techniques for Python, how to use tools to locate and analyze performance bottlenecks, and some tools and languages that can improve performance. We hope it is helpful to readers who need it.