20 Tricks to Make Your Python Fly!
The article I'm sharing today contains a lot of text and code, but it is well worth the read. It covers 20 tips for improving Python performance and shows you how to say goodbye to slow Python. The author of the original article is an open-source full-stack programmer who works with Python, Java, PHP, and C++.
1. Optimize the time complexity of the algorithm
The time complexity of the algorithm has the greatest impact on how fast a program runs. In Python, you can choose an appropriate data structure to optimize it: for example, looking up an element in a list is O(n), while looking it up in a set is O(1). Different scenarios call for different optimizations; in general, the main ideas are divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
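As a quick illustration (my own sketch, not from the original author) of how picking a better algorithm changes the complexity class, compare a naive recursive Fibonacci, which takes exponential time, with a dynamic-programming version that runs in linear time:

def fib_naive(n):
    # Naive recursion: roughly O(2**n) calls.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_dp(n):
    # Dynamic programming: O(n), reusing previously computed values.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_naive(20))  # 6765
print(fib_dp(20))     # 6765, but computed in linear time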
2. Reduce redundant data
For example, a large symmetric matrix can be stored as just its upper or lower triangle, and a sparse representation can be used for matrices in which most elements are zero.
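For instance, here is a minimal sketch (mine, not the original author's) that stores only the lower triangle of a symmetric n x n matrix in a flat list, cutting the storage roughly in half:

n = 4
tri = [0] * (n * (n + 1) // 2)  # lower triangle needs n*(n+1)/2 entries instead of n*n

def tri_index(i, j):
    # Map (i, j) to a position in the flat list, exploiting m[i][j] == m[j][i].
    if i < j:
        i, j = j, i
    return i * (i + 1) // 2 + j

tri[tri_index(1, 3)] = 7
print(tri[tri_index(3, 1)])  # 7, read back through the symmetric entry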
3. Use copy and deepcopy properly
For data structures such as dict and list, direct assignment only copies a reference. When the whole object actually needs to be duplicated, use copy.copy or copy.deepcopy from the copy module. The difference between the two functions is that deepcopy copies recursively, and their efficiency differs as well (the following programs were run in ipython):
import copy
a = range(100000)
%timeit -n 10 copy.copy(a)      # run copy.copy(a) 10 times
%timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop
The -n after timeit specifies the number of runs, and the last two lines are the output of the two timeit calls (the same applies below). As you can see, deepcopy is nearly 100 times slower.
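To see why the distinction matters for nested structures, here is a small behavioral example (mine, not from the original article):

import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)    # copies only the outer list; the inner lists are shared
deep = copy.deepcopy(a)   # recursively copies the inner lists as well

a[0][0] = 99
print(shallow[0][0])  # 99 -- the shallow copy sees the change
print(deep[0][0])     # 1  -- the deep copy is unaffected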
4. Search for elements using dict or set
Python's dict and set are implemented with hash tables (similar to unordered_map in the C++11 standard library), so the time complexity of looking up an element is O(1).
a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop
dict is slightly faster, but it occupies more space.
5. Use generator and yield reasonably
%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop
Using () produces a generator object, whose memory footprint does not depend on the size of the list, so it is more efficient. In a concrete application, for example, set(i for i in range(100000)) is faster than set([i for i in range(100000)]).
However, when loop traversal is required:
%timeit -n 10 for x in (i for i in range(100000)): pass
%timeit -n 10 for x in [i for i in range(100000)]: pass
10 loops, best of 3: 6.51 ms per loop
10 loops, best of 3: 5.54 ms per loop
Here the list comprehension is faster, but if the loop may break early, the generator's advantage becomes obvious. yield can also be used to create a generator:
def yield_func(ls):
    for i in ls:
        yield i + 1

def not_yield_func(ls):
    return [i + 1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop
For a list that is not too large to hold in memory, you can simply return a list directly, but yield reads better (a personal preference).
Python 2.x also has built-in generator facilities, such as the xrange function and the itertools package.
6. Optimize loops
Whatever can be done outside a loop should not be done inside it. For example, the following optimization roughly doubles the speed:
a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
1000 loops, best of 3: 569 µs per loop
1000 loops, best of 3: 256 µs per loop
7. Optimize the order of conditions in compound expressions
For "and", we should put those that meet fewer conditions in front, and for "or", put those that meet more conditions in front. For example:
a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
100 loops, best of 3: 287 µs per loop
100 loops, best of 3: 214 µs per loop
100 loops, best of 3: 128 µs per loop
100 loops, best of 3: 56.1 µs per loop
8. Use join to concatenate strings from an iterable
# a is assumed here to be a list of strings, e.g. a = [str(n) for n in range(100)]
In [1]: %%timeit
   ...: s = ''
   ...: for i in a:
   ...:     s += i
   ...:
10000 loops, best of 3: 59.8 µs per loop

In [2]: %%timeit
   ...: s = ''.join(a)
   ...:
100000 loops, best of 3: 11.8 µs per loop
join is about five times faster than repeated concatenation.
9. Choose the appropriate string formatting method
s1, s2 = 'ax', 'bx'
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop
Of the three, % is the slowest, but the gap is small (all of them are very fast). (Personally, I find % the most readable.)
10. Swap the values of two variables without an intermediate variable
In [3]: %%timeit -n 10000 a, b = 1, 2
   ....: c = a; a = b; b = c
   ....:
10000 loops, best of 3: 172 ns per loop

In [4]: %%timeit -n 10000 a, b = 1, 2
   ....: a, b = b, a
   ....:
10000 loops, best of 3: 86 ns per loop
Using a, b = b, a instead of c = a; a = b; b = c to swap the values of a and b is about twice as fast.
11. Use 'if is True' instead of 'if == True'
a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
100 loops, best of 3: 531 µs per loop
100 loops, best of 3: 362 µs per loop
Here, 'if i is True' is about 1.5 times faster than 'if i == True'.
12. Use chained comparisons: x < y < z
x, y, z = 1, 2, 3
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop
x < y < z is slightly more efficient and more readable.
13. while 1 is faster than while True
def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

m, n = 1000000, 1000000
%timeit -n 100 while_1()
%timeit -n 100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop
while 1 is considerably faster than while True, because in Python 2.x True is a global variable rather than a keyword.
14. Use ** instead of pow
%timeit -n 10000 c = pow(2, 20)
%timeit -n 10000 c = 2 ** 20
10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop
** is more than 10 times faster!
15. Use the C-implemented packages cProfile, cStringIO, and cPickle (corresponding to profile, StringIO, and pickle respectively)
import cPickle
import pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop
The C-implemented package is more than 10 times faster!
16. Use the best deserialization method
The following compares the efficiency of eval, cPickle, and json when deserializing their corresponding strings:
import json
import cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 µs per loop
json is nearly three times faster than cPickle and more than 20 times faster than eval.
17. Use C extensions
CPython (the most common Python implementation) currently supports writing C extensions through its native API, ctypes, Cython, and cffi. Their role is to let Python programs call dynamic link libraries compiled from C. Their characteristics are:
CPython native API: By including the Python.h header file, the C program can use Python data structures directly. The implementation is relatively cumbersome, but it has the widest applicability.
ctypes: Usually used to wrap C programs so that pure Python code can call functions in a dynamic link library (a dll on Windows or an so file on Unix). If you want to use an existing C library in Python, ctypes is a good choice; in some benchmarks, Python 2 + ctypes gives the best performance. (A minimal ctypes sketch appears at the end of this section.)
Cython: Cython is a superset of Python that simplifies the process of writing C extensions. Its advantages are concise syntax and good compatibility with numpy and other libraries that contain many C extensions. Cython is generally used to optimize a particular algorithm or process within a project; in some tests, performance improves by hundreds of times.
cffi: cffi is the ctypes equivalent in PyPy (see below) and is also supported in CPython. It lets you use C libraries from Python; you can write C declarations directly in your Python code and link against existing C libraries.
These optimization methods are generally applied to the performance-bottleneck modules of an existing project, greatly improving the running efficiency of the whole program with only small changes to the original code.
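As an illustration of the ctypes approach mentioned above, here is a minimal sketch (mine, not from the original article) that calls sqrt from the C math library; it assumes a Unix-like system where ctypes.util.find_library can locate libm:

import ctypes
import ctypes.util

# Locate and load the C math library (libm); the exact file name is platform-dependent.
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# Declare the argument and return types of the C sqrt function.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(16.0))  # 4.0, computed by the C library rather than by Python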
18. Parallel Programming
Because of the GIL, it is hard for Python to take full advantage of multi-core CPUs. However, the built-in multiprocessing module supports the following parallel modes:
Multi-process: For CPU-intensive programs, the Process and Pool classes of multiprocessing implement parallel computation across processes. However, because inter-process communication is relatively expensive, programs that need to exchange a lot of data between processes may not see a large improvement.
Multithreading: For IO-intensive programs, the multiprocessing.dummy module wraps threading behind the multiprocessing interface, which makes multithreaded programming very easy (for example, you can use the Pool.map interface, which is concise and efficient; see the sketch at the end of this section).
Distributed: The Managers class in multiprocessing provides a way to share data between different processes, on top of which distributed programs can be built.
Depending on the business scenario, you can choose one of these, or a combination of several, to optimize program performance.
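To show how little code the Pool.map interface needs, here is a minimal sketch (mine, not the original author's); cpu_task is just a placeholder workload, and 4 workers is an arbitrary choice:

from multiprocessing import Pool                        # process pool, for CPU-bound tasks
from multiprocessing.dummy import Pool as ThreadPool    # thread pool, for IO-bound tasks

def cpu_task(n):
    # Placeholder CPU-bound function.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    pool = Pool(4)                               # 4 worker processes
    results = pool.map(cpu_task, [100000] * 8)   # same map interface as the thread pool below
    pool.close()
    pool.join()

    thread_pool = ThreadPool(4)                  # 4 worker threads
    results = thread_pool.map(cpu_task, [100000] * 8)
    thread_pool.close()
    thread_pool.join()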
19. Ultimate killer: PyPy
PyPy is a Python implementation written in RPython (a subset of Python). According to the benchmark data on its official website, it is more than 6 times faster than CPython. The speedup comes from its Just-in-Time (JIT) compiler, a dynamic compiler that, unlike static compilers (such as gcc and javac), uses data gathered while the program is running to optimize it. For historical reasons, PyPy still retains the GIL, but the ongoing STM project aims to turn PyPy into a GIL-free Python.
If the Python program contains C extensions (other than cffi), the JIT's optimization effect is greatly reduced, and the program can even be slower than under CPython (for example, with Numpy). Therefore, PyPy works best with pure Python code or cffi extensions.
As projects such as STM and its Numpy support mature, PyPy may well come to replace CPython.
20. Use Performance Analysis Tools
Besides the timeit module used above in ipython, there is cProfile. cProfile is simple to use: run python -m cProfile filename.py, where filename.py is the file you want to profile. The standard output shows the number of calls and the running time of each function, so you can locate the program's performance bottlenecks and then optimize them accordingly.
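cProfile can also be invoked from inside a script; here is a minimal sketch (mine, not from the original article) in which slow_function is just a placeholder workload:

import cProfile
import pstats

def slow_function():
    # Placeholder workload to profile.
    return sum(i * i for i in range(100000))

# Profile the call, save the statistics to a file, then print the ten most expensive entries.
cProfile.run('slow_function()', 'profile.out')
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)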
That's all for this article. I hope it is helpful for your learning.