20 Tips for Python performance optimization

Last Update:2014-10-25 Source: Internet

Author: User



1. Optimize the algorithm's time complexity

The time complexity of an algorithm has the greatest impact on a program's execution efficiency. In Python, you can often improve time complexity by choosing an appropriate data structure: for example, finding an element takes O(n) in a list but O(1) on average in a set. Different scenarios call for different optimization approaches; common ones include divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
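
As a minimal sketch of the list-versus-set point above (written for Python 3; the function names and sample data are made up for illustration), the two functions below return the same result, but the second builds a set first so that each membership test is O(1) on average instead of O(n):

```python
def common_items_list(xs, ys):
    """O(len(xs) * len(ys)): each `in` scans the list ys."""
    return [x for x in xs if x in ys]


def common_items_set(xs, ys):
    """O(len(xs) + len(ys)) on average: each `in` is a hash lookup."""
    lookup = set(ys)  # built once, outside the loop
    return [x for x in xs if x in lookup]


xs = list(range(0, 3000, 3))  # multiples of 3 below 3000
ys = list(range(0, 3000, 5))  # multiples of 5 below 3000
result = common_items_set(xs, ys)  # multiples of 15 below 3000
```

The set version costs extra memory for the lookup table, which is usually a worthwhile trade once ys has more than a handful of elements.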

2. Reduce redundant data

Store a large symmetric matrix in upper-triangular or lower-triangular form. Use a sparse matrix representation for matrices in which most elements are 0.
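
One way to store only a triangle of a symmetric matrix is to pack it into a flat list. The class below is a hypothetical sketch (the name SymmetricMatrix and its layout are assumptions for illustration, not an API from any library); it keeps n * (n + 1) / 2 values instead of n * n:

```python
class SymmetricMatrix:
    """Stores only the upper triangle of an n x n symmetric matrix."""

    def __init__(self, n):
        self.n = n
        # n * (n + 1) / 2 cells instead of n * n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        if i > j:  # (i, j) and (j, i) share one cell
            i, j = j, i
        # rows 0..i-1 hold n, n-1, ... cells, so row i starts here
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def __getitem__(self, pos):
        return self.data[self._index(*pos)]

    def __setitem__(self, pos, value):
        self.data[self._index(*pos)] = value


m = SymmetricMatrix(4)
m[1, 3] = 2.5  # also readable as m[3, 1]
```

For matrices that are mostly zeros, a dict keyed by (row, column) or a dedicated sparse-matrix library serves the same purpose.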

3. Proper use of copy and deepcopy

For objects such as dicts and lists, direct assignment only creates a reference. When you need to copy the whole object, you can use copy and deepcopy from the copy module; the difference between the two is that deepcopy copies recursively. Their efficiency also differs (the following was run in IPython):
import copy
a = range(100000)
%timeit -n 10 copy.copy(a)  # run copy.copy(a) 10 times
%timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop
The -n after timeit gives the number of runs, and the last two lines are the output of the two timeit calls. The latter is about two orders of magnitude slower.
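
The speed difference comes with a difference in behavior. The following sketch (Python 3; the dictionary contents are invented for illustration) shows what each kind of copy shares with the original:

```python
import copy

original = {"name": "config", "ports": [80, 443]}

alias = original                # plain assignment: the same object
shallow = copy.copy(original)   # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # new dict and a new inner list

original["ports"].append(8080)  # mutates the list shared with `shallow`
original["name"] = "changed"    # rebinds a key in `original` only
```

After these two changes, alias sees both, shallow sees only the appended port, and deep sees neither. Use deepcopy only when nested objects really must be independent.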

4. Find elements using dict or set

Python's dict and set are implemented with hash tables (similar to unordered_map in the C++11 standard library), so finding an element takes O(1) time on average.
a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop
dict is slightly faster here, but it takes up more space.

5. Use generators and yield sensibly
%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop
Using () produces a generator object whose memory footprint does not depend on the size of the list, so it is more efficient. In specific applications, for example, set(i for i in range(100000)) is faster than set([i for i in range(100000)]).

But when the whole sequence has to be traversed in a loop:
%timeit -n 10 for x in (i for i in range(100000)): pass
%timeit -n 10 for x in [i for i in range(100000)]: pass
10 loops, best of 3: 6.51 ms per loop
10 loops, best of 3: 5.54 ms per loop
The latter is more efficient, but if the loop contains a break, the benefit of using a generator is obvious. yield can also be used to create a generator:
def yield_func(ls):
    for i in ls:
        yield i + 1

def not_yield_func(ls):
    return [i + 1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop
For a list that does not take much memory, you can simply return a list, but yield is more readable (a matter of personal preference).

Python 2.x has built-in generator facilities such as the xrange function and the itertools package.
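
The benefit of a generator when a loop exits early can be made concrete. In this sketch (Python 3; expensive, produced, and first_square_over are hypothetical names introduced for illustration), the list produced records how many items were actually computed before the loop stopped:

```python
produced = []  # records which inputs were actually processed


def expensive(i):
    produced.append(i)
    return i * i


def first_square_over(limit, values):
    # generator expression: squares are computed only on demand
    for sq in (expensive(v) for v in values):
        if sq > limit:
            return sq  # early exit, the remaining values are never touched
    return None


answer = first_square_over(50, range(1000000))
```

Here only the inputs 0 through 8 are processed. With a list comprehension in place of the generator expression, all one million squares would be computed before the first comparison.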

6. Optimize loops

Do not put inside a loop what can be done outside it. For example, the following optimization roughly doubles the speed:
a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
1000 loops, best of 3: 569 µs per loop
1000 loops, best of 3: 256 µs per loop

7. Optimize the order of conditions in compound expressions

For and, put the condition that is satisfied least often first; for or, put the condition that is satisfied most often first. For example:
a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
100 loops, best of 3: 287 µs per loop
100 loops, best of 3: 214 µs per loop
100 loops, best of 3: 128 µs per loop
100 loops, best of 3: 56.1 µs per loop
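
The effect of ordering can be made visible by counting how often each condition is evaluated. This is a sketch for illustration only (Python 3; the helpers is_even and is_large, the calls counter, and the range of 2000 values are assumptions, not part of the benchmark above):

```python
calls = {"even": 0, "large": 0}


def is_even(i):
    calls["even"] += 1
    return i % 2 == 0


def is_large(i):
    calls["large"] += 1
    return i > 1900


a = range(2000)

# Frequently true condition first: the second test runs for every even i
slow = [i for i in a if is_even(i) and is_large(i)]
slow_calls = dict(calls)

calls["even"] = calls["large"] = 0

# Rarely true condition first: the second test runs only when i > 1900
fast = [i for i in a if is_large(i) and is_even(i)]
fast_calls = dict(calls)
```

Both orderings give the same list, but the first evaluates 3000 conditions in total and the second only 2099, because and stops as soon as its left operand is false.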

8. Use join to merge strings from an iterator
In [1]: %%timeit
   ...: s = ''
   ...: for i in a:
   ...:     s += i
   ...:
10000 loops, best of 3: 59.8 µs per loop

In [2]: %%timeit
   ...: s = ''.join(a)
   ...:
100000 loops, best of 3: 11.8 µs per loop
join is about 5 times faster than accumulating with += (a is assumed here to be a list of strings).

9. Choose the appropriate string formatting method
s1, s2 = 'ax', 'bx'
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop
Of the three, % is the slowest, but the differences are small (all three are very fast). (Personally, I find % the most readable.)

10. Swap the values of two variables without a temporary variable
In [3]: %%timeit -n 10000
   ....: a, b = 1, 2
   ....: c = a; a = b; b = c
   ....:
10000 loops, best of 3: 172 ns per loop

In [4]: %%timeit -n 10000
   ....: a, b = 1, 2
   ....: a, b = b, a
   ....:
10000 loops, best of 3: 86 ns per loop
Using a, b = b, a instead of c = a; a = b; b = c to swap the values of a and b is about twice as fast.

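11. Use if is
a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
100 loops, best of 3: 531 µs per loop
100 loops, best of 3: 362 µs per loop
Using if is True is about 1.5 times as fast as if == True. Note that the two are not equivalent: is tests identity rather than equality, so 1 == True holds while 1 is True does not.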
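
Because is and == answer different questions, swapping one for the other can change the result as well as the speed. A small sketch in Python 3 (the sample values are chosen for illustration):

```python
values = [0, 1, 2, True, False, 1.0]

# == compares values: 1 and 1.0 both compare equal to True
equal_to_true = [v for v in values if v == True]

# is compares identity: only the True singleton itself matches
identical_to_true = [v for v in values if v is True]

# Applied to plain integers, the two filters select different items
eq_ints = [i for i in range(10) if i == True]
is_ints = [i for i in range(10) if i is True]
```

Here equal_to_true has three elements and identical_to_true has one, while eq_ints contains 1 and is_ints is empty. Use is for identity checks against singletons such as None, True, and False, and only where that is really the intended test.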

12. Use chained comparisons x < y < z
x, y, z = 1, 2, 3
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop
x < y < z is slightly more efficient and more readable.

13. while 1 is faster than while True
def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

m, n = 1000000, 1000000
%timeit -n 100 while_1()
%timeit -n 100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop
while 1 is much faster than while True because in Python 2.x True is a global variable, not a keyword, so it has to be looked up on every iteration. (In Python 3, True is a keyword and the difference disappears.)

14. Use ** instead of pow
%timeit -n 10000 c = pow(2, 20)
%timeit -n 10000 c = 2**20
10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop
** is more than 10 times faster!

15. Use cProfile, cStringIO, and cPickle, the C implementations of profile, StringIO, and pickle
import cPickle
import pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop
The C-implemented package is more than 10 times faster!

16. Use the best way to deserialize

The following compares the efficiency of eval, cPickle, and json when deserializing the corresponding strings:
import json
import cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 µs per loop
json is nearly 3 times faster than cPickle and more than 20 times faster than eval.
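
For readers on Python 3, the same round trip can be sketched as follows. Two substitutions are assumptions of this sketch rather than part of the benchmark above: cPickle no longer exists as a separate module (pickle uses its C accelerator automatically when available), and ast.literal_eval is swapped in for eval because it parses literals without executing code:

```python
import ast
import json
import pickle

a = list(range(10000))

s1 = str(a)           # Python literal text
s2 = pickle.dumps(a)  # bytes, Python-specific format
s3 = json.dumps(a)    # text, language-neutral format

from_literal = ast.literal_eval(s1)  # parses literals only, runs no code
from_pickle = pickle.loads(s2)       # only safe for trusted input
from_json = json.loads(s3)
```

All three calls reconstruct the original list. Relative speeds depend on the Python version and the data, so it is worth re-measuring with timeit before picking a format on performance grounds alone.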

17. Use C extensions

There are currently four main approaches: the CPython (the most common implementation of Python) native API, ctypes, Cython, and cffi. All of them let a Python program call into a dynamic-link library compiled from C. Their characteristics are:

CPython native API: By including the Python.h header file, the corresponding C program can use Python's data structures directly. The implementation process is relatively tedious, but it has a relatively wide range of application.

ctypes: Typically used to wrap C programs, letting pure Python programs call functions in a dynamic-link library (a DLL on Windows or a .so file on Unix). If you want to use an existing C library from Python, ctypes is a good choice; in some benchmarks, Python 2 + ctypes is the best-performing combination.

Cython: Cython is a superset of Python that simplifies the process of writing C extensions. Its advantages are concise syntax and good compatibility with NumPy and other libraries that contain many C extensions. Cython is typically used to optimize a particular algorithm or procedure in a project. In some tests, it delivers a performance improvement of several hundred times.

cffi: cffi is the counterpart of ctypes in PyPy (see below) and is also compatible with CPython. cffi provides a way to use C libraries from Python: you can write C code directly in Python code, and it supports linking to existing C libraries.

These optimizations are generally applied to the modules that are the performance bottlenecks of an existing project; they can greatly improve the efficiency of the whole program with only minor changes to the original project.
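
As a small illustration of the ctypes approach, the sketch below calls the C standard library's strlen from Python 3. How the C library is located is platform-dependent; the lookup shown here (msvcrt on Windows, find_library elsewhere) is an assumption that works on common setups, and c_strlen is a hypothetical wrapper name:

```python
import ctypes
import ctypes.util
import sys

if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include libc
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Length of a bytes object up to its first NUL byte, via C strlen."""
    return libc.strlen(data)


length = c_strlen(b"hello")
```

Declaring argtypes and restype lets ctypes convert and check arguments instead of guessing. For a call this small the conversion overhead outweighs any gain, so ctypes pays off mainly when the C function does substantial work.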

18. Parallel programming

Because of the GIL, it is hard for Python to take full advantage of multicore CPUs. However, several parallel modes can be implemented with the built-in multiprocessing module:

Multi-process: For CPU-bound programs, you can use multiprocessing's Process, Pool, and other wrapper classes to run computations in parallel across multiple processes. However, because inter-process communication is relatively expensive, programs that exchange a lot of data between processes may not see much improvement.

Multithreading: For I/O-bound programs, the multiprocessing.dummy module wraps threading behind the multiprocessing interface, which makes multithreaded programming very easy (for example, you can use Pool's map interface, which is concise and efficient).

Distributed: The managers submodule of multiprocessing provides a way to share data between different processes, on top of which distributed programs can be developed.

Different business scenarios can use one of these, or a combination of several, to optimize program performance.
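
The process-based and thread-based pools share one interface, which the following Python 3 sketch illustrates. The function names are hypothetical, and square merely stands in for real work (CPU-bound work for the process pool, I/O-bound work for the thread pool):

```python
from multiprocessing import Pool                      # process-based pool
from multiprocessing.dummy import Pool as ThreadPool  # thread-based, same API


def square(x):
    return x * x


def squares_with_threads(values, workers=4):
    # Threads share memory, so nothing has to be pickled between workers
    with ThreadPool(workers) as pool:
        return pool.map(square, values)


def squares_with_processes(values, workers=4):
    # Call this from under `if __name__ == "__main__":` so that platforms
    # which start workers by re-importing the main module do not recurse
    with Pool(workers) as pool:
        return pool.map(square, values)


thread_result = squares_with_threads(range(5))
```

Because the two pools expose the same map method, switching between threads and processes is a one-line change, which makes it easy to measure which one suits a given workload.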

19. The ultimate weapon: PyPy

PyPy is a Python implementation written in RPython (a subset of Python). According to the benchmark data on its website, it is more than 6 times faster than the CPython implementation. The reason is its just-in-time (JIT) compiler, a dynamic compiler which, unlike a static compiler (such as gcc or javac), optimizes using data gathered while the program is running. For historical reasons PyPy still has a GIL, but the ongoing STM project is trying to turn PyPy into a Python without a GIL.

If a Python program contains C extensions (non-cffi), the effect of the JIT optimization is greatly reduced, and the program may even run slower than on CPython (NumPy is an example). Therefore, it is best to use pure Python or cffi extensions with PyPy.

As STM, NumPy support, and other projects improve, I believe PyPy will replace CPython.

20. Use performance analysis tools

In addition to the timeit module used above in IPython, there is cProfile. cProfile is also very simple to use: run python -m cProfile filename.py, where filename.py is the file name of the program to run. The standard output then shows the number of times each function was called and how long it took, so you can find the program's performance bottlenecks and optimize them in a targeted way.
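
Both tools can also be used from ordinary Python code, outside IPython and without the command line. A minimal Python 3 sketch, in which slow_sum is a made-up function standing in for the code under investigation:

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


# timeit: total seconds taken by `number` calls of the callable
elapsed = timeit.timeit(lambda: slow_sum(1000), number=100)

# cProfile: collect call counts and timings for one call
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
report_text = report.getvalue()
```

Printing report_text shows a table like the one produced by python -m cProfile, restricted to the profiled section, which helps when only one part of a long-running program is of interest.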

From: Segmentfault
Links: http://bbs.tianya.cn/list-112764-1.shtml


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
