How to improve Python performance

Source: Internet
Author: User
This article introduces several methods for improving Python performance, with code examples for reference.

I. Function call optimization (reduce the span, avoid memory access)

The core of program optimization is to minimize the span of each operation: both the time the code takes to execute and the memory it has to reach.

1. Use sum to add up large data sets

a = range(100000)

%timeit -n 10 sum(a)
10 loops, best of 3: 3.15 ms per loop

%%timeit
   ...: s = 0
   ...: for i in a:
   ...:     s += i
   ...:
100 loops, best of 3: 6.93 ms per loop

2. Avoid sum when adding a small number of values

%timeit -n 1000 s = a + b + c + d + e + f + g + h + i + j + k  # direct addition is faster for a small amount of data
1000 loops, best of 3: 571 ns per loop

%timeit -n 1000 s = sum([a, b, c, d, e, f, g, h, i, j, k])  # calling sum on a small amount of data is less efficient
1000 loops, best of 3: 669 ns per loop

Conclusion: sum is more efficient for large data sets, and direct addition is more efficient for a small number of values.
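The comparison above can be reproduced outside IPython with the standard timeit module. The following is a minimal sketch; the helper name loop_sum is made up for illustration, and the timings it prints depend on the machine and Python version.

```python
import timeit

data = range(100_000)


def loop_sum(values):
    """Add the values one at a time in a Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


# Both approaches must produce the same result.
builtin_total = sum(data)
loop_total = loop_sum(data)

# Time each approach; no expected numbers are given because they vary.
t_builtin = timeit.timeit(lambda: sum(data), number=20)
t_loop = timeit.timeit(lambda: loop_sum(data), number=20)
print(f"sum():    {t_builtin:.4f} s for 20 runs")
print(f"for loop: {t_loop:.4f} s for 20 runs")
```

The built-in runs its loop in C, which is the likely reason it wins on large inputs.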

II. Optimizing for loops over elements (use the stack or registers, avoid memory access)

for lst in [(1, 2, 3), (4, 5, 6)]:  # extra overhead: each element must then be fetched by index
    pass

Avoid using indexes whenever possible.

for a, b, c in [(1, 2, 3), (4, 5, 6)]:  # better
    pass

Unpacking in the loop header is equivalent to assigning each element directly to its own name.

def force():
    lst = range(4)
    for a1 in [1, 2]:
        for a2 in lst:
            for a3 in lst:
                for b1 in lst:
                    for b2 in lst:
                        for b3 in lst:
                            for c1 in lst:
                                for c2 in lst:
                                    for c3 in lst:
                                        for d1 in lst:
                                            yield (a1, a2, a3, b1, b2, b3, c1, c2, c3, d1)

%%timeit -n 10
for t in force():
    sum([t[0], t[1], t[2], t[3], t[4], t[5], t[6], t[7], t[8], t[9]])
10 loops, best of 3: 465 ms per loop

%%timeit -n 10
for a1, a2, a3, b1, b2, b3, c1, c2, c3, d1 in force():
    sum([a1, a2, a3, b1, b2, b3, c1, c2, c3, d1])
10 loops, best of 3: 360 ms per loop
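A smaller version of the same comparison, written as a plain script. This is only a sketch: the data set points and the function names by_index and by_unpacking are made up for illustration, and the printed timings depend on the machine.

```python
import timeit

# Hypothetical sample data: 10,000 three-element tuples.
points = [(i, i + 1, i + 2) for i in range(10_000)]


def by_index(rows):
    """Fetch each element through an index expression."""
    total = 0
    for row in rows:
        total += row[0] + row[1] + row[2]
    return total


def by_unpacking(rows):
    """Unpack each tuple into names in the loop header."""
    total = 0
    for a, b, c in rows:
        total += a + b + c
    return total


t_index = timeit.timeit(lambda: by_index(points), number=50)
t_unpack = timeit.timeit(lambda: by_unpacking(points), number=50)
print(f"indexing:  {t_index:.4f} s for 50 runs")
print(f"unpacking: {t_unpack:.4f} s for 50 runs")
```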

III. Generator optimization (table lookup instead of computation)

def force(start, end):  # brute-force password cracking program
    for i in range(start, end):
        now = i
        sublst = []
        for j in range(10):
            sublst.append(i % 10)  # division is expensive, more so than multiplication
            i //= 10
        sublst.reverse()
        yield (tuple(sublst), now)

def force():  # better
    lst = range(5)
    for a1 in [1]:
        for a2 in lst:
            for a3 in lst:
                for b1 in lst:
                    for b2 in lst:
                        for b3 in lst:
                            for c1 in lst:
                                for c2 in lst:
                                    for c3 in lst:
                                        for d1 in lst:
                                            yield (a1, a2, a3, b1, b2, b3, c1, c2, c3, d1)

r0 = [1, 2]  # readability and flexibility
r1 = range(10)
r2 = r3 = r4 = r5 = r6 = r7 = r8 = r9 = r1
force = ((a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)
         for a0 in r0 for a1 in r1 for a2 in r2 for a3 in r3 for a4 in r4
         for a5 in r5 for a6 in r6 for a7 in r7 for a8 in r8 for a9 in r9)
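The same sequence of tuples can also be produced with itertools.product from the standard library. This is an alternative that the original article does not use, shown here only as a sketch; islice is used to look at the first few tuples without enumerating all of them.

```python
from itertools import islice, product

r0 = [1, 2]
r1 = range(10)

# One leading digit from r0 followed by nine digits from r1.
# product() yields tuples lazily, with the rightmost position varying fastest,
# which matches the order of the nested loops.
force = product(r0, *[r1] * 9)

first_three = list(islice(force, 3))
for combo in first_three:
    print(combo)
```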

IV. Power operation optimization (pow(x, y, z))

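from random import randint

# find_kq is not shown in the source; it presumably returns k and q
# such that n - 1 == 2 ** k * q with q odd.
def isprime(n):
    if n & 1 == 0:
        return False
    k, q = find_kq(n)
    a = randint(1, n - 1)
    if pow(a, q, n) == 1:
        return True
    for j in range(k):
        if pow(a, pow(2, j) * q, n) == n - 1:  # a ** ((2 ** j) * q) % n
            return True
    return False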

Conclusion: pow(x, y, z) is faster than x ** y % z, because it reduces modulo z as it goes instead of first building the full value of x ** y.
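A minimal sketch of that comparison as a script. The values of x, y and z are arbitrary and chosen only for illustration; the printed timings depend on the machine.

```python
import timeit

# Arbitrary illustrative values.
x, y, z = 1_234_567, 20_000, 998_244_353

# Both expressions compute the same remainder.
fast = pow(x, y, z)   # modular exponentiation
slow = x ** y % z     # builds the full power first, then reduces

t_fast = timeit.timeit(lambda: pow(x, y, z), number=5)
t_slow = timeit.timeit(lambda: x ** y % z, number=5)
print(f"pow(x, y, z): {t_fast:.6f} s for 5 runs")
print(f"x ** y % z:   {t_slow:.6f} s for 5 runs")
```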

V. Division optimization

In [1]: from random import getrandbits

In [2]: x = getrandbits(4096)

In [3]: y = getrandbits(2048)

In [4]: %timeit -n 10000 q, r = divmod(x, y)
10000 loops, best of 3: 10.7 us per loop

In [5]: %timeit -n 10000 q, r = x // y, x % y
10000 loops, best of 3: 21.2 us per loop

Conclusion: divmod is faster than using // and % separately when both the quotient and the remainder are needed.
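A short sketch checking that the built-in divmod returns the same quotient and remainder as the two separate operators. Setting the lowest bit of y is an addition made here to guarantee a nonzero divisor; it is not part of the original session.

```python
from random import getrandbits

x = getrandbits(4096)
y = getrandbits(2048) | 1  # set the lowest bit so that y is never zero

# One call returns both quotient and remainder.
q, r = divmod(x, y)

# The same values, computed with two separate big-integer divisions.
q2, r2 = x // y, x % y

print(q == q2 and r == r2)
```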

VI. Optimize the algorithm's time complexity

The time complexity of an algorithm has the greatest impact on a program's execution efficiency. In Python, you can improve time complexity by choosing an appropriate data structure. For example, looking up an element takes O(n) time in a list and O(1) time in a set. Different scenarios call for different optimizations; common approaches include divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
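The list and set difference can be observed with a small script. This is a sketch with an arbitrary size n; the target is placed at the end of the list so that the list has to scan every element.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Membership gives the same answer in both containers...
in_list = target in as_list
in_set = target in as_set

# ...but the list scans linearly while the set does a hash lookup.
t_list = timeit.timeit(lambda: target in as_list, number=100)
t_set = timeit.timeit(lambda: target in as_set, number=100)
print(f"list: {t_list:.6f} s for 100 lookups")
print(f"set:  {t_set:.6f} s for 100 lookups")
```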

VII. Use copy and deepcopy sensibly

For data structures such as dict and list, plain assignment only creates another reference to the same object. When the whole object needs to be copied, use copy and deepcopy from the copy module. The difference between the two functions is that deepcopy copies recursively, so they also differ in speed:

In [23]: import copy

In [24]: %timeit -n 10 copy.copy(a)
10 loops, best of 3: 606 ns per loop

In [25]: %timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.17 us per loop

The -n option of %timeit sets the number of loops per run, and the two result lines are the outputs of the two %timeit calls; the same applies below. In this measurement, deepcopy takes about twice as long as copy.
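The behavioral difference between the two functions matters as much as the speed. The sketch below uses a made-up nested dict named a, since the article does not show what a contains.

```python
import copy

# Hypothetical nested object; the article does not show its value of a.
a = {"name": "demo", "values": [1, 2, 3]}

shallow = copy.copy(a)     # new dict, but the inner list is shared with a
deep = copy.deepcopy(a)    # new dict and a new inner list

a["values"].append(4)

print(shallow["values"])   # reflects the change made through a
print(deep["values"])      # unaffected by the change
```

A shallow copy is enough when the contained objects are immutable or are never modified; otherwise deepcopy avoids accidental sharing at a higher cost.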

An example of sharing a reference instead of copying:

>>> lists = [[]] * 3
>>> lists
[[], [], []]
>>> lists[0].append(3)
>>> lists
[[3], [3], [3]]

Here is why: [[]] is a list with a single element, an empty list, so all three elements of [[]] * 3 refer to that same empty list. Modifying any one element of lists therefore modifies all of them. The operation is cheap precisely because nothing is copied.
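When three independent inner lists are wanted, a list comprehension creates a new list on every iteration. A minimal sketch contrasting the two forms (the names shared and independent are chosen here for illustration):

```python
# Three references to one inner list.
shared = [[]] * 3

# Three separate inner lists, one created per iteration.
independent = [[] for _ in range(3)]

shared[0].append(3)
independent[0].append(3)

print(shared)
print(independent)
```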

VIII. Use dict or set to look up elements

Python dictionaries and sets are implemented with hash tables (similar to unordered_map in the C++ standard library), so looking up an element takes O(1) time on average.

In [1]: r = range(10 ** 7)

In [2]: s = set(r)  # occupies 588 MB of memory

In [3]: d = dict((i, 1) for i in r)  # occupies 716 MB of memory

In [4]: %timeit -n 10000 (10 ** 7) - 1 in r
10000 loops, best of 3: 291 ns per loop

In [5]: %timeit -n 10000 (10 ** 7) - 1 in s
10000 loops, best of 3: 121 ns per loop

In [6]: %timeit -n 10000 (10 ** 7) - 1 in d
10000 loops, best of 3: 111 ns per loop

Conclusion: in this measurement the set uses less memory than the dict, and the dict has the shortest lookup time.

IX. Use generators and yield sensibly (to save memory)

In [1]: %timeit -n 10 a = (i for i in range(10 ** 7))  # a generator is normally consumed by traversing it
10 loops, best of 3: 933 ns per loop

In [2]: %timeit -n 10 a = [i for i in range(10 ** 7)]
10 loops, best of 3: 916 ms per loop

In [1]: %timeit -n 10 for x in (i for i in range(10 ** 7)): pass
10 loops, best of 3: 749 ms per loop

In [2]: %timeit -n 10 for x in [i for i in range(10 ** 7)]: pass
10 loops, best of 3: 1.05 s per loop

Conclusion: use a generator when the data only needs to be traversed.
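The memory side of this trade-off can be checked with sys.getsizeof, which reports the size of the container object itself (not of the integers it refers to). The sketch below uses an arbitrary n and a made-up generator function squares to show yield.

```python
import sys

n = 100_000


def squares(limit):
    """Yield squares one at a time instead of building a list."""
    for i in range(limit):
        yield i * i


as_list = [i * i for i in range(n)]
as_gen = squares(n)

# The list holds n references; the generator holds only its current state.
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(f"list object:      {list_size} bytes")
print(f"generator object: {gen_size} bytes")

total = sum(as_gen)        # traverses the generator once
leftover = list(as_gen)    # empty: a generator cannot be traversed twice
```

The last line shows the main limitation: if the data has to be traversed more than once or indexed, a list is still the right choice.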

These are some of the ways to improve Python performance; the list can be extended with further techniques.
