20 tips to make your Python fly!


This article is light on text and heavy on code: 20 practical tips for improving Python performance, so you can say goodbye to slow Python. The original author, Kaiyuan, is a full-stack programmer who works in Python, Java, PHP, and C++.

1. Optimize the algorithm's time complexity

An algorithm's time complexity has the greatest influence on a program's execution efficiency. In Python, time complexity can often be improved by choosing an appropriate data structure: for example, looking up an element takes O(n) in a list but O(1) in a set. Different scenarios call for different optimizations; common approaches include divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
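As a rough illustration of the list-versus-set point, the sketch below times a membership test with the standard timeit module. It is written for Python 3; the names items_list, items_set, and time_lookup are made up for this example, and the absolute numbers will vary from machine to machine.

```python
import timeit

N = 20000
items_list = list(range(N))   # membership test scans the elements one by one: O(n)
items_set = set(items_list)   # membership test is a hash lookup: O(1) on average
target = N - 1                # worst case for the list: the last element

def time_lookup(container, repeat=100):
    """Return the total seconds spent on `repeat` membership tests."""
    return timeit.timeit(lambda: target in container, number=repeat)

list_seconds = time_lookup(items_list)
set_seconds = time_lookup(items_set)
print("list: %.6f s, set: %.6f s" % (list_seconds, set_seconds))
```

On a typical machine the set lookup should come out far faster, and the gap widens as N grows.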

2. Reduce redundant data

For example, store a large symmetric matrix using only its upper or lower triangle, and use a sparse representation for matrices whose elements are mostly zero.
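One possible way to store only the lower triangle is sketched below in Python 3. The class name PackedSymmetricMatrix and its layout are assumptions made for this illustration, not something the original article prescribes.

```python
class PackedSymmetricMatrix:
    """Store an n x n symmetric matrix using only its lower triangle."""

    def __init__(self, n):
        self.n = n
        # n*(n+1)/2 entries instead of n*n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        # Map (i, j) and (j, i) to the same slot in the packed list.
        if i < j:
            i, j = j, i
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        i, j = key
        return self.data[self._index(i, j)]

    def __setitem__(self, key, value):
        i, j = key
        self.data[self._index(i, j)] = value

m = PackedSymmetricMatrix(4)
m[1, 3] = 2.5
print(m[3, 1])        # 2.5, the same slot as m[1, 3]
print(len(m.data))    # 10 stored values instead of 16
```

For a mostly-zero matrix, a similar saving could be had by keeping only the non-zero entries, for example in a dict keyed by (row, column).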

3. Use copy and deepcopy appropriately

For objects such as dicts and lists, direct assignment only copies a reference. When you need to copy the whole object, you can use copy and deepcopy from the copy module; the difference between the two is that deepcopy copies recursively. Their efficiency also differs (the following code runs in IPython):

import copy
a = range(100000)
%timeit -n 10 copy.copy(a)      # run copy.copy(a) 10 times
%timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop

The -n option of %timeit sets the number of runs, and the last two lines are the output of the two %timeit calls (the same convention is used below). In this measurement deepcopy is roughly 100 times slower, that is, two orders of magnitude.
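The speed difference comes with a difference in behavior. The following minimal Python 3 sketch (the variable names are illustrative) shows what "recursive" means in practice for a nested list:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, inner lists are shared
deep = copy.deepcopy(original)     # inner lists are copied recursively

original[0].append(99)

print(shallow[0])   # [1, 2, 99]  (shares the inner list with original)
print(deep[0])      # [1, 2]      (fully independent)
```

So copy.copy is enough when the elements are immutable or sharing them is acceptable, and deepcopy is worth its cost only when nested objects must be independent.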

4. Use dict or set to look up elements

Python's dict and set are implemented with hash tables (similar to unordered_map in the C++11 standard library), so looking up an element takes O(1) time.

a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop

The dict is slightly faster (and takes up more space).

5. Use generators and yield appropriately

%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop

Using () produces a generator object whose memory footprint is independent of the size of the sequence, so it is more efficient. In specific applications, for example, set(i for i in range(100000)) is faster than set([i for i in range(100000)]).
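The memory claim can be checked with sys.getsizeof, as in this small Python 3 sketch (the variable names are illustrative). Note that getsizeof measures only the container object itself, which is enough to show the trend.

```python
import sys

n = 100000
as_list = [i for i in range(n)]   # all n items are built and held in memory
as_gen = (i for i in range(n))    # items are produced one at a time on demand

# getsizeof reports the container itself (not the ints it refers to),
# which is enough to show that the generator's size does not grow with n.
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

total = sum(as_gen)               # consuming the generator still yields every item
```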

However, when the whole sequence has to be iterated over:

%timeit -n 10 for x in (i for i in range(100000)): pass
%timeit -n 10 for x in [i for i in range(100000)]: pass

10 loops, best of 3: 6.51 ms per loop
10 loops, best of 3: 5.54 ms per loop

The latter is faster, but if the loop contains a break, the benefit of using a generator is obvious. yield is also used to create generators:

def yield_func(ls):
    for i in ls:
        yield i + 1

def not_yield_func(ls):
    return [i + 1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass

%timeit -n 10 for i in not_yield_func(ls): pass

10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop

For a list that fits comfortably in memory, you can simply return a list, but yield is more readable (a matter of personal preference).
Python 2.x has built-in generator facilities such as the xrange function and the itertools package.
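The benefit of a generator when the loop exits early can be made visible by recording how many items are actually computed. The following is a minimal Python 3 sketch; the names squares and produced are made up for this illustration.

```python
produced = []

def squares(limit):
    """Generator: computes each square only when the consumer asks for it."""
    for i in range(limit):
        produced.append(i)        # record that this item was actually computed
        yield i * i

first_big = None
for value in squares(1000000):
    if value > 50:
        first_big = value
        break                     # stop early; the remaining items are never computed

print(first_big)       # 64
print(len(produced))   # 9 items computed (i = 0..8), not 1,000,000
```

A list-returning version would have computed all one million squares before the loop even started.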

6. Optimize loops

Do not do inside a loop what can be done outside it. For example, the following optimization roughly doubles the speed:

a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
1000 loops, best of 3: 569 µs per loop
1000 loops, best of 3: 256 µs per loop
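The same idea matters even more when the repeated work is itself expensive. In the Python 3 sketch below (the function names are hypothetical), recomputing max(values) for every element turns a linear pass into a quadratic one:

```python
def normalize_slow(values):
    # max(values) is recomputed on every iteration: O(n) work inside an O(n) loop
    return [v / max(values) for v in values]

def normalize_fast(values):
    largest = max(values)          # computed once, outside the loop
    return [v / largest for v in values]

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(normalize_fast(data) == normalize_slow(data))   # True
```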

7. Optimize the order of conditions in compound expressions

For and, put the condition that is satisfied least often first; for or, put the condition that is satisfied most often first. For example:

a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 10 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
100 loops, best of 3: 287 µs per loop
100 loops, best of 3: 214 µs per loop
100 loops, best of 3: 128 µs per loop
100 loops, best of 3: 56.1 µs per loop
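The reason the order matters is short-circuit evaluation: the second operand of and is evaluated only when the first is true. The Python 3 sketch below counts how often the second condition actually runs. It assumes the input is range(2000), and the helper count_second_calls is a name invented for this example.

```python
def count_second_calls(first, second, items):
    """Return how often `second` runs when evaluating `first(i) and second(i)`."""
    counter = [0]

    def counted(i):
        counter[0] += 1
        return second(i)

    for i in items:
        first(i) and counted(i)   # counted(i) is skipped whenever first(i) is false
    return counter[0]

def is_large(i):     # true for only 99 of the 2000 values
    return i > 1900

def is_even(i):      # true for 1000 of the 2000 values
    return i % 2 == 0

a = range(2000)
print(count_second_calls(is_large, is_even, a))   # 99
print(count_second_calls(is_even, is_large, a))   # 1000
```

Putting the rarely satisfied condition first cuts the number of second evaluations by about a factor of ten here.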

8. Use join to concatenate the strings in an iterable

In [1]: %%timeit
   ...: s = ''
   ...: for i in a:
   ...:     s += i
   ...:
10000 loops, best of 3: 59.8 µs per loop

In [2]: %%timeit
   ...: s = ''.join(a)
   ...:
100000 loops, best of 3: 11.8 µs per loop

join is about 5 times faster than repeated concatenation.
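For reference, here is a self-contained Python 3 version of the two approaches. The list words is example data assumed for this sketch, since the timing session does not show how a was defined.

```python
words = [str(i) for i in range(1000)]   # example data: a list of strings

# Repeated += may copy the partial string each time it grows.
s_concat = ""
for w in words:
    s_concat += w

# join computes the final size once and copies each piece a single time.
s_join = "".join(words)

print(s_concat == s_join)   # True
```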

9. Choose the appropriate string formatting method

s1, s2 = 'ax', 'bx'

%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop

Of the three, % is the slowest, but the differences are small (all three are very fast). (Personally, I find % the most readable.)

10. Swap the values of two variables without a temporary variable

In [3]: %%timeit -n 10000
   ...: a, b = 1, 2
   ...: c = a; a = b; b = c
   ...:
10000 loops, best of 3: 172 ns per loop

In [4]: %%timeit -n 10000
   ...: a, b = 1, 2
   ...: a, b = b, a
   ...:
10000 loops, best of 3: 86 ns per loop

Using a, b = b, a instead of c = a; a = b; b = c to swap the values of a and b is about twice as fast.

11. Use if is

a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
100 loops, best of 3: 531 µs per loop
100 loops, best of 3: 362 µs per loop

Using if i is True is noticeably faster than if i == True (about 1.5 times as fast in the measurement above).
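Before swapping one for the other, keep in mind that the two tests are not equivalent: == compares values, while is compares object identity. A small Python 3 sketch with illustrative values:

```python
values = [0, 1, 2, True, False, 1.0]

equal_to_true = [v for v in values if v == True]      # value comparison
identical_to_true = [v for v in values if v is True]  # identity comparison

print(equal_to_true)       # [1, True, 1.0]
print(identical_to_true)   # [True]
```

The faster form is therefore a safe replacement only when the values being tested are genuine booleans.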

12. Use chained comparison x < y < z

x, y, z = 1, 2, 3

%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass

1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop

x < y < z is slightly faster and also more readable.
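One reason the chained form can do less work is that the middle operand is evaluated only once. The Python 3 sketch below makes this visible by using a function call as the middle operand; the helper middle is made up for this illustration.

```python
evaluations = []

def middle():
    evaluations.append("called")
    return 2

x, z = 1, 3

chained = x < middle() < z                  # middle() runs once
n_chained = len(evaluations)

del evaluations[:]
expanded = x < middle() and middle() < z    # middle() runs twice
n_expanded = len(evaluations)

print(chained, n_chained)     # True 1
print(expanded, n_expanded)   # True 2
```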

13. while 1 is faster than while True

def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

m, n = 1000000, 1000000

%timeit -n 100 while_1()
%timeit -n 100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop

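while 1 is considerably faster than while True because in Python 2.x, True is a global variable rather than a keyword, so it has to be looked up on every iteration. (In Python 3, True is a keyword, so this difference no longer applies.)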

14. Use ** instead of pow

%timeit -n 10000 c = pow(2, 20)
%timeit -n 10000 c = 2**20

10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop

** is more than 10 times faster!

15. Use cProfile, cStringIO, and cPickle, the C implementations of the same functionality (corresponding to profile, StringIO, and pickle respectively)

import cPickle
import pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop

The package implemented in C is more than 10 times faster!

16. Use the best deserialization method

The following compares the efficiency of eval, cPickle, and json when deserializing the corresponding strings:

import json
import cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 µs per loop

As you can see, json is nearly 3 times faster than cPickle and more than 20 times faster than eval.
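To repeat the comparison outside IPython, something like the following Python 3 sketch could be used. It relies on the standard pickle module, which in Python 3 uses its C implementation automatically when available, so there is no separate cPickle import. The loaders dict is a name chosen for this example, and the relative timings may differ from the Python 2 figures above.

```python
import json
import pickle
import timeit

a = list(range(10000))
s1 = repr(a)                 # text form, parsed back with eval
s2 = pickle.dumps(a)         # bytes, parsed back with pickle.loads
s3 = json.dumps(a)           # JSON text, parsed back with json.loads

loaders = {
    "eval": lambda: eval(s1),            # eval is unsafe on untrusted input
    "pickle": lambda: pickle.loads(s2),  # so is unpickling untrusted data
    "json": lambda: json.loads(s3),
}

for name, load in loaders.items():
    assert load() == a                   # every method restores the same list
    seconds = timeit.timeit(load, number=20)
    print("%-6s %.4f s for 20 loads" % (name, seconds))
```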

17. Use C extensions

There are currently four main approaches: the CPython native API (CPython being the most common implementation of Python), ctypes, Cython, and cffi. All of them let Python programs call dynamic link libraries compiled from C. Their characteristics are:

CPython native API: by including the Python.h header file, a C program can use Python's data structures directly. The implementation process is relatively cumbersome, but it has a comparatively wide range of application.
ctypes: typically used to wrap C programs so that pure Python code can call functions in dynamic link libraries (DLLs on Windows, .so files on Unix). If you want to use an existing C library from Python, ctypes is a good choice; in some benchmarks, Python 2 + ctypes gave the best performance.
Cython: Cython is a superset of the Python language that simplifies the process of writing C extensions. Its advantages are concise syntax and good compatibility with NumPy and other libraries that contain a large number of C extensions. Cython is typically used to optimize a particular algorithm or routine within a project. In some tests it delivers performance improvements of several hundred times.
cffi: cffi plays the role of ctypes in PyPy (see below) and is also compatible with CPython. cffi provides a way to use C libraries from Python: you can write C code directly inside Python code, and it also supports linking to existing C libraries.
These optimization methods are generally aimed at the modules that are the performance bottleneck of an existing project; with a small number of changes to the original project, they can greatly improve the running efficiency of the whole program.
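As a small taste of the ctypes approach, the sketch below calls the C standard library's strlen from Python 3. How the C runtime is loaded is an assumption about the platform: CDLL(None) works on Linux and macOS, and msvcrt is used on Windows. Treat it as an illustration rather than a portable recipe.

```python
import ctypes
import sys

# Load the C runtime library. CDLL(None) exposes the symbols already linked
# into the running process on Linux and macOS; msvcrt is the Windows C runtime.
if sys.platform.startswith("win"):
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)   # 5
```

Declaring argtypes and restype is optional for simple calls, but it lets ctypes check and convert arguments instead of guessing.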

18. Parallel programming

Because of the GIL, it is hard for Python to take full advantage of multi-core CPUs. However, the built-in multiprocessing module supports the following parallel modes:

Multi-process: for CPU-bound programs, you can use wrapper classes such as Process and Pool from multiprocessing to run computations in parallel across multiple processes. However, because inter-process communication is relatively expensive, programs that need to exchange large amounts of data between processes may not see much improvement.
Multithreading: for I/O-bound programs, the multiprocessing.dummy module wraps threading behind the multiprocessing interface, which makes multithreaded programming very easy (for example, you can use Pool's map interface, which is simple and efficient).
Distributed: the Managers class in multiprocessing provides a way to share data between different processes, on top of which distributed programs can be developed.
Different business scenarios can use one of these, or a combination of several, to optimize program performance.
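The thread-backed Pool from multiprocessing.dummy can be used as in the Python 3 sketch below. The task function fake_io_task merely sleeps to stand in for a real I/O call, and the pool size of 4 is an arbitrary choice for this example.

```python
from multiprocessing.dummy import Pool as ThreadPool   # thread-backed Pool
import time

def fake_io_task(n):
    """Stand-in for an I/O-bound call such as a network request."""
    time.sleep(0.05)
    return n * n

pool = ThreadPool(4)
try:
    results = pool.map(fake_io_task, range(8))   # order of results matches the input
finally:
    pool.close()
    pool.join()

print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because multiprocessing.Pool offers the same map interface, switching the import to from multiprocessing import Pool turns this into the multi-process variant for CPU-bound work; in that case the pool should be created under an if __name__ == "__main__": guard.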

19. The ultimate weapon: PyPy

PyPy is a Python implementation written in RPython (a restricted subset of Python). According to the benchmark data on its official website, it is about 6 times faster than the CPython implementation. It is fast because of its just-in-time (JIT) compiler: a dynamic compiler that, unlike static compilers such as gcc or javac, can optimize using data gathered while the program is running. For historical reasons the GIL is still present in PyPy, but the ongoing STM project is trying to turn PyPy into a Python without a GIL.

If a Python program contains C extensions (other than cffi ones), the JIT optimization becomes much less effective, and the program can even run slower than on CPython (as with NumPy). So in PyPy it is best to use pure Python or cffi extensions.

As STM, NumPy support, and other projects improve, I believe PyPy will replace CPython.

20. Use performance analysis tools

In addition to the timeit module used above in IPython, there is cProfile. cProfile is also very simple to use: python -m cProfile filename.py, where filename.py is the file name of the program to run. The standard output shows how many times each function was called and how long it ran, so you can find the program's performance bottlenecks and then optimize them specifically.
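cProfile can also be driven from inside a program, which is handy when only one part of the code is of interest. The following Python 3 sketch uses the standard cProfile and pstats modules; the functions slow_sum and main are placeholders invented for this example.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    return [slow_sum(20000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists the call count and the time spent per function, the same information that python -m cProfile prints for a whole script.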

That is all for this article. I hope it helps with your learning, and I hope you will continue to support the Yunqi (Cloud Habitat) community.
