First move: Strike the snake's vital spot: locate the bottleneck
The first step is always to locate the bottleneck. Suppose one function can be optimized from 1 second down to 0.9 seconds, and another from 1 minute down to 30 seconds. If the effort is the same and you only have time for one, which do you choose? By the weakest-link principle, the second one.
An experienced programmer would hesitate here: wait, what about call counts? If the first function is called 100,000 times over the whole program while the second is called only once, the answer is no longer obvious. The point of this example is that a program's bottleneck may not be visible at a glance. Still, in most cases the intuition above holds: a function that goes from one minute to 30 seconds is usually a better target than one that goes from 1 second to 0.9 seconds, simply because there is far more room for improvement.
So, after all that preamble, here is the first move: profiling. Python ships with tools for locating bottlenecks. It actually offers three profilers, profile, cProfile, and hotshot, split between pure-Python and C implementations, but in my experience cProfile alone is enough. Use it like this:
python -m cProfile your_script.py
This command prints, for each function, the number of calls, the total time, the time spent in its sub-functions, and the time per call. The output columns mean:
filename:lineno(function): file name, line number, and function name
ncalls: how many times the function was called
tottime: total time spent in the function itself, excluding time spent in the sub-functions it calls
percall: the average time per call, i.e. tottime divided by ncalls
cumtime: cumulative time, including all the sub-functions it calls
percall: like the percall above, but cumtime divided by ncalls
Find the point that deserves optimization, and then do it.
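As a minimal sketch of how to drive cProfile from inside a script (the function name `slow_sum` and the workload are made up for illustration):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive loop so it shows up in the profile
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Sort by cumulative time and print the top entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists `slow_sum` with its ncalls, tottime, and cumtime columns, exactly as described above.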
Second move: One-Snake Zen: there should be only one way
I remember when I first encountered Python, a senior student told me that Python has a remarkable ideal: it hopes that everyone who uses it will write essentially the same program. As the Zen of Python says:
There should be one -- and preferably only one -- obvious way to do it.
So the Python Zen masters provide a single blessed way to write many common operations. I went through the legendary Python wiki page, PerformanceTips, and summarized several pairs of "don't write it like this" and "write it like this".
When concatenating strings, don't write it like this:
s = ""
for substring in slist:
    s += substring
Write it like this:
s = "".join(slist)
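To see the difference for yourself, here is a small timeit comparison (Python 3 sketch; the sample list and repeat counts are made up):

```python
import timeit

slist = ["spam"] * 1000  # made-up sample data

def concat_plus():
    # Builds the string piece by piece with +=
    s = ""
    for substring in slist:
        s += substring
    return s

def concat_join():
    # Builds the string in one pass
    return "".join(slist)

# Both build the identical string
assert concat_plus() == concat_join()

t_plus = timeit.timeit(concat_plus, number=200)
t_join = timeit.timeit(concat_join, number=200)
print(f"+=  : {t_plus:.4f}s")
print(f"join: {t_join:.4f}s")
```

Exact timings vary by machine and interpreter, but with many pieces the join form is generally the faster and more idiomatic choice.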
When formatting strings, don't write it like this:
out = "&lt;html&gt;" + head + prologue + query + tail + "&lt;/html&gt;"
Write it like this:
out = "&lt;html&gt;%s%s%s%s&lt;/html&gt;" % (head, prologue, query, tail)
When transforming a list, you may not need an explicit loop. Don't write it like this:
newlist = []
for word in oldlist:
    newlist.append(word.upper())
Write it like this:
newlist = map(str.upper, oldlist)
Or like this:
newlist = [s.upper() for s in oldlist]
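One caveat worth knowing: in Python 2, map returns a list directly, but in Python 3 it returns a lazy iterator, so you wrap it in list() if you need a list. A quick check that both forms agree (the sample list is made up):

```python
oldlist = ["spam", "eggs", "ham"]  # made-up sample data

# In Python 3, map is lazy, so materialize it with list()
mapped = list(map(str.upper, oldlist))
comprehended = [s.upper() for s in oldlist]

assert mapped == comprehended == ["SPAM", "EGGS", "HAM"]
print(mapped)
```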
To build a dictionary of word counts, the common way is:
wdict = {}
for word in words:
    if word not in wdict:
        wdict[word] = 0
    wdict[word] += 1
If repeated words dominate, you can write it like this instead and save a lot of membership tests:
wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1
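Both forms produce identical counts; the try/except form only pays for the exception on the first occurrence of each word. A runnable comparison (the word list is made up), with an aside that the standard library's collections.Counter does the same job in one line:

```python
from collections import Counter

words = ["spam", "eggs", "spam", "spam", "eggs", "ham"]  # made-up sample

# Form 1: look before you leap
wdict1 = {}
for word in words:
    if word not in wdict1:
        wdict1[word] = 0
    wdict1[word] += 1

# Form 2: ask forgiveness, cheaper when repeats dominate
wdict2 = {}
for word in words:
    try:
        wdict2[word] += 1
    except KeyError:
        wdict2[word] = 1

assert wdict1 == wdict2 == {"spam": 3, "eggs": 2, "ham": 1}
# The stdlib alternative, for exactly this job
assert dict(Counter(words)) == wdict1
```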
Reduce the number of function calls: move the loop inside the function rather than calling the function inside the loop. Don't write it like this:
import time

x = 0
def doit1(i):
    global x
    x = x + i

mylist = range(100000)
t = time.time()
for i in mylist:
    doit1(i)
Write it like this:
import time

x = 0
def doit2(mylist):
    global x
    for i in mylist:
        x = x + i

mylist = range(100000)
t = time.time()
doit2(mylist)
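A self-contained Python 3 sketch of the same experiment, timing both versions side by side (absolute numbers will vary by machine; the point is the per-call overhead of 100,000 separate function calls):

```python
import time

x = 0

def doit1(i):
    # One function call per element
    global x
    x = x + i

def doit2(seq):
    # One function call total; the loop lives inside
    global x
    for i in seq:
        x = x + i

seq = range(100_000)

x = 0
t = time.time()
for i in seq:
    doit1(i)
t_many_calls = time.time() - t
x_after_first = x

x = 0
t = time.time()
doit2(seq)
t_one_call = time.time() - t

# Both versions compute the same sum
assert x == x_after_first == sum(range(100_000))
print(f"call per element: {t_many_calls:.4f}s, single call: {t_one_call:.4f}s")
```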
Third move: Snake Sniper: high-speed search
This part draws on an IBM developerWorks article on Python code performance optimization techniques. The ideal search runs in O(1) complexity, which means a hash table. From my undergraduate data structures course I remember that a membership test on a Python list, such as if x in list_a, scans the elements one by one, so searching a large list this way is very inefficient.
Python's tuple is less relevant here, so I will not comment on it. The other two containers, set and dict, are both implemented as hash tables, so membership tests on them take O(1) on average.
So try not to write it like this:
k = [10, 20, 30, 40, 50, 60, 70, 80, 90]
for i in xrange(10000):
    if i in k:
        # Do something
        continue
Write it like this:
k = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# First convert the list to a dictionary
k_dict = {i: 0 for i in k}
for i in xrange(10000):
    if i in k_dict:
        # Do something
        continue
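In Python 3 a plain set is the natural choice when the values do not matter. A runnable sketch comparing both membership tests on the same data (counts and repeat numbers are made up; timings vary by machine):

```python
import timeit

k = [10, 20, 30, 40, 50, 60, 70, 80, 90]
k_set = set(k)  # hash-based container: average O(1) membership tests

def scan_list():
    # Linear scan of the list for every probe
    return sum(1 for i in range(10_000) if i in k)

def scan_set():
    # Hash lookup for every probe
    return sum(1 for i in range(10_000) if i in k_set)

# Both find the same 9 hits
assert scan_list() == scan_set() == 9

t_list = timeit.timeit(scan_list, number=50)
t_set = timeit.timeit(scan_set, number=50)
print(f"list: {t_list:.4f}s, set: {t_set:.4f}s")
```

The gap widens dramatically as the container being searched grows; with only 9 elements it is modest, with thousands it is decisive.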
To find the intersection of two lists, don't write it like this:
list_a = [1, 2, 3, 4, 5]
list_b = [4, 5, 6, 7, 8]
list_common = [a for a in list_a if a in list_b]
Write it like this:
list_a = [1, 2, 3, 4, 5]
list_b = [4, 5, 6, 7, 8]
list_common = set(list_a) & set(list_b)
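One detail worth noting: the & form returns a set, not a list, so convert back if downstream code expects a list. A quick demonstration:

```python
list_a = [1, 2, 3, 4, 5]
list_b = [4, 5, 6, 7, 8]

# Quadratic version: each element of list_a scans list_b
common_slow = [a for a in list_a if a in list_b]

# Hash-based version: note the result is a set, not a list
common_fast = set(list_a) & set(list_b)

assert common_slow == [4, 5]
assert common_fast == {4, 5}
assert sorted(common_fast) == common_slow  # convert back if a list is needed
```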
Fourth move: Snake, uh... I can't think of a name: other small tips
Swapping variables needs no intermediate variable: a, b = b, a. (There is a trap here that I remember vividly: in Python 2 you could even write True, False = False, True.)
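In Python 3 that particular trap is gone, since True and False are keywords and assigning to them is a SyntaxError. The swap itself works because the right-hand side is packed into a tuple first, then unpacked:

```python
# No temporary variable needed: the right-hand side is evaluated
# as a tuple before any name is rebound
a, b = 1, 2
a, b = b, a
assert (a, b) == (2, 1)

# It generalises to rotations as well
x, y, z = 1, 2, 3
x, y, z = z, x, y
assert (x, y, z) == (3, 1, 2)
```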
If you use Python 2.x, prefer xrange over range; in Python 3.x, range already behaves like the old xrange, and xrange is gone. Unlike Python 2's range, xrange does not build a full list in memory; it produces values lazily, which saves memory.
You can write x &gt; y &gt; z instead of x &gt; y and y &gt; z: it is more efficient and more readable, and in theory the two forms are equivalent when y has no side effects.
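The efficiency gain comes from the chained form evaluating the middle operand only once. A small demonstration using a hypothetical helper `noisy` that records each time it is evaluated:

```python
calls = []

def noisy(v):
    # Record every evaluation of the middle operand
    calls.append(v)
    return v

x, z = 3, 1

# Chained form: noisy(2) is evaluated exactly once
assert x > noisy(2) > z
assert calls == [2]

calls.clear()
# Unchained form evaluates it twice
assert x > noisy(2) and noisy(2) > z
assert calls == [2, 2]
```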
Is operator.add(x, y) faster than x + y? I was skeptical, so I ran an experiment. First, add cannot be used directly; you need import operator. Second, my results show that operator.add(x, y) is not at all faster than x + y, and it sacrifices readability as well.
while 1 is indeed a little faster than while True; in my two experiments, about 15% faster. (This applies to Python 2, where True is an ordinary global that must be looked up on every iteration; in Python 3, True is a keyword constant and the two forms compile identically.)
Fifth move: Winning without a snake: performance outside the code
Beyond the code itself, and apart from hardware, there is the interpreter. Here I strongly recommend PyPy. PyPy uses a just-in-time (JIT) compiler: it compiles code as it runs, unlike a static, ahead-of-time compiler. I saw a vivid metaphor for this on Zhihu:
Suppose you are a director. Static compilation is having the actors memorize the entire script thoroughly, then perform for an hour straight. Dynamic compilation is having the actors perform for two minutes, pause to think and consult the script, then perform for another two minutes...
Each approach has its advantages; it depends on whether you are shooting a film or performing a live play.
In addition, Cython lets you embed compiled C code inside Python. I use it rarely, but at critical hotspots it is genuinely effective.