If you select a scripting language, you have to endure its speed. This statement captures, to some extent, a shortcoming of Python as a scripting language: its execution efficiency and performance are not ideal, especially on machines with limited resources. It is therefore often necessary to optimize code to improve a program's execution efficiency. How to optimize Python performance is the main topic of this article, which covers common code optimization techniques, the use of performance optimization tools, and how to diagnose performance bottlenecks in code. It is hoped that this article can serve as a reference for Python developers.
Common Python code optimization skills
Code optimization makes a program run faster; it improves the program's running efficiency without changing its results. According to the 80/20 principle, refactoring, optimizing, extending, and documenting a program usually consumes 80% of the workload. Optimization usually involves two aspects: reducing the size of the code and improving its running efficiency.
Improve algorithms and select appropriate data structures
A good algorithm plays a key role in performance, so the first step in improving performance is to improve the algorithm. Common algorithmic time complexities, ordered from best to worst, are:
O(1) -> O(lg n) -> O(n) -> O(n lg n) -> O(n^2) -> O(n^3) -> O(n^k) -> O(k^n) -> O(n!)
Therefore, if an algorithm can be improved in terms of time complexity, the performance gain is self-evident. However, improving specific algorithms is beyond the scope of this article; readers can consult other materials on their own. The following content focuses on the selection of data structures.
Dictionaries and lists
Python dictionaries use a hash table, so the complexity of a lookup is O(1), while a list is effectively an array: a lookup must traverse the entire list, with complexity O(n). Therefore, operations such as searching for and accessing members are faster on a dict than on a list.
Listing 1. dict.py
from time import time
t = time()
list = ['a','b','is','python','jason','hello','hill','with','phone','test',
        'dfdf','apple','pddf','ind','basic','none','baecr','var','bana','dd','wrd']
#list = dict.fromkeys(list,True)
print list
filter = []
for i in range(1000000):
    for find in ['is','hat','new','list','old','.']:
        if find not in list:
            filter.append(find)
print "total run time:"
print time()-t
The above code takes about 16.09 seconds to run. If you uncomment the line #list = dict.fromkeys(list,True), converting the list into a dictionary, and run it again, the time drops to about 8.375 seconds, roughly halving the run time. Therefore, when frequent lookups or membership tests are needed over many members, using a dict instead of a list is a good choice.
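As a cross-check, here is a minimal sketch in Python 3 syntax (with a shortened word list) that times list membership against dict membership using the standard timeit module:

```python
from timeit import timeit

words = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with',
         'phone', 'test', 'dfdf', 'apple', 'pddf', 'ind', 'basic']
lookup = dict.fromkeys(words, True)   # hash table: O(1) average membership
targets = ['is', 'hat', 'new', 'list', 'old', '.']

# Membership in the list scans up to the whole list; in the dict it hashes once.
list_time = timeit(lambda: [w for w in targets if w not in words], number=100000)
dict_time = timeit(lambda: [w for w in targets if w not in lookup], number=100000)

print('list: %.3fs  dict: %.3fs' % (list_time, dict_time))
```

Absolute timings vary by machine, but the dict variant should come out ahead, for the same reason as in Listing 1.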
Sets and lists
The union, intersection, and difference operations of a set are faster than iterating over lists. Therefore, problems involving list intersections, unions, or differences can be converted to set operations.
Listing 2. intersection of lists
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)
print "total run time:"
print time()-t
The running time of the above program is:
total run time: 38.4070000648
Listing 3. using set to calculate the intersection
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb = [2,4,6,9,23]
intersection = []
for i in range(1000000):
    intersection = list(set(lista) & set(listb))
print "total run time:"
print time()-t
After switching to set, the running time drops to 8.75 seconds, a speedup of more than 4x. Table 1 lists the common set operations you can use for such tests.
Table 1. Common set usage
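The table itself did not survive formatting; as a stand-in, here is a minimal Python 3 sketch of the common set operations it covered:

```python
a = set([1, 2, 3, 4, 5])
b = set([4, 5, 6, 7])

print(a | b)   # union: elements in either set
print(a & b)   # intersection: elements in both sets
print(a - b)   # difference: elements in a but not in b
print(a ^ b)   # symmetric difference: elements in exactly one of the sets
```

All four operators are implemented in C on hash tables, which is why they beat the nested-loop version in Listing 2.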
Loop optimization
The principle of loop optimization is to minimize the amount of computation inside the loop; when loops are nested, computation should be hoisted from the inner loop to an outer one wherever possible. The following examples compare the performance gains brought by loop optimization. In Listing 4, without loop optimization, the running time is about 132.375 seconds.
Listing 4. before loop optimization
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
for i in range(1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x = lista[a] + listb[b]
print "total run time:"
print time()-t
Now perform the following optimizations: hoist the length calculations out of the loop, replace range with xrange, and hoist the lista[a] lookup from the innermost loop into the second loop.
Listing 5. After loop optimization
from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
len1 = len(lista)
len2 = len(listb)
for i in xrange(1000000):
    for a in xrange(len1):
        temp = lista[a]
        for b in xrange(len2):
            x = temp + listb[b]
print "total run time:"
print time()-t
The optimized program's running time drops to 102.172 seconds. In Listing 4, lista[a] is computed 1000000*10*10 times; in the optimized code it is computed only 1000000*10 times. The number of computations is greatly reduced, so performance improves.
Take full advantage of lazy (short-circuit) evaluation
In Python, conditional expressions are evaluated lazily (short-circuited): in an expression such as if x and y, if x is false, the y expression is never evaluated. This feature can be used to improve program efficiency to a certain extent.
Listing 6. using lazy (short-circuit) evaluation
from time import time
t = time()
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
for i in range(1000000):
    for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        #if w[-1] == '.' and w in abbreviations:
            pass
print "total run time:"
print time()-t
The running time before optimization is about 8.84 seconds; if the commented line replaces the first if, the running time drops to about 6.17 seconds, because the cheap w[-1] == '.' test fails for most words and the more expensive membership test is short-circuited away.
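The short-circuit behavior itself can be demonstrated directly. In this minimal Python 3 sketch, expensive() is a made-up stand-in for a costly predicate; it records whether it actually ran:

```python
calls = []

def expensive():
    # Stand-in for a costly predicate; records that it actually ran.
    calls.append('expensive')
    return True

# Because the left operand is False, the right operand is never evaluated.
result = (1 == 2) and expensive()
print(result, calls)   # False []
```

Putting the cheap, most-likely-to-fail condition first is what makes the commented line in Listing 6 faster.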
String optimization
String objects in Python are immutable, so any string operation, such as concatenation or modification, produces a new string object rather than changing the original; this continuous copying degrades Python performance to a certain extent. Optimizing strings is therefore an important aspect of performance work, especially when a lot of text is processed. String optimization mainly involves the following points:
Prefer join() over + for string concatenation. In Listing 7, concatenating with + takes about 0.125 s, while join takes about 0.016 s. join is therefore faster than + for string operations and should be used wherever possible.
Listing 7. using join instead of + to concatenate strings
from time import time
t = time()
s = ""
list = ['a','b','b','d','e','f','g','h','i','j','k','l','m','n']
for i in range(10000):
    for substr in list:
        s += substr
print "total run time:"
print time()-t
Avoid:
s = ""
for x in list:
    s += func(x)
Instead, use:
slist = [func(elt) for elt in somelist]
s = "".join(slist)
When either a regular expression or a built-in string method can do the job, choose the built-in method, such as str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), or str.endswith(('x', 'yz')).
String formatting with % is faster than concatenating the pieces directly. Therefore, prefer
out = "%s%s%s%s" % (head, prologue, query, tail)
Avoid
out = "" + head + prologue + query + tail + ""
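For example (the variable values here are invented for illustration; in Python 3 syntax), the formatting version builds the result in one pass instead of copying intermediate strings:

```python
head, prologue, query, tail = 'Q: ', 'select ', 'name from users', ';'

# One formatting pass instead of repeated concatenation and copying.
out = "%s%s%s%s" % (head, prologue, query, tail)
print(out)   # Q: select name from users;
```

Each + in the avoided form creates a fresh temporary string, which is exactly the copying cost described above.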
Use list comprehensions and generator expressions
A list comprehension is more efficient than building a new list with an explicit loop, so this feature can be used to improve running efficiency.
from time import time
t = time()
list = ['a','b','is','python','jason','hello','hill','with','phone','test',
        'dfdf','apple','pddf','ind','basic','none','baecr','var','bana','dd','wrd']
total = []
for i in range(1000000):
    for w in list:
        total.append(w)
print "total run time:"
print time()-t
Using a list comprehension instead:
for i in range(1000000):
    a = [w for w in list]
Running the loop version above takes about 17 s; with the list comprehension the running time drops to 9.29 s, nearly half. Generator expressions, introduced in Python 2.4, have syntax similar to list comprehensions, but they have a clear advantage when processing large volumes of data: they do not create a list, they return a generator, and so are more efficient. In the example above, changing a = [w for w in list] to a = (w for w in list) reduces the running time to about 2.98 s.
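The memory difference behind that speedup is easy to observe; a small Python 3 sketch:

```python
import sys

data = list(range(100000))

squares_list = [x * x for x in data]   # materializes all 100000 results at once
squares_gen = (x * x for x in data)    # lazy: yields one result at a time

print(sys.getsizeof(squares_list))     # large: grows with the data
print(sys.getsizeof(squares_gen))      # small: a constant-size generator object

print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
```

The generator is the better choice when the results are consumed once in a loop; the list is needed only when you must index it or iterate more than once.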
Other optimization skills
To exchange the values of two variables, use a, b = b, a instead of an intermediate variable t = a; a = b; b = t:
>>> from timeit import Timer
>>> Timer("t=a;a=b;b=t","a=1;b=2").timeit()
0.25154118749729365
>>> Timer("a,b=b,a","a=1;b=2").timeit()
0.17156677734181258
Use xrange rather than range in loops. xrange can save a lot of memory, because each call produces only one integer element of the sequence, whereas range() returns the complete list of elements, which is unnecessary in a loop. In Python 3, xrange no longer exists; there, range returns a lazy sequence that can traverse a range of any length.
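In Python 3 this lazy behavior can be checked directly; a minimal sketch:

```python
import sys

r = range(10**12)   # no trillion-element list is ever built

print(sys.getsizeof(r))   # a small, constant-size object
print(len(r))             # length is computed, not stored element by element
print(r[500])             # indexing works without materializing the sequence
print(500 in r)           # membership is computed arithmetically, not searched
```

Unlike a plain generator, a Python 3 range supports len(), indexing, and fast membership tests, all in constant memory.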
Use local variables and avoid the global keyword. Python accesses local variables much faster than global variables, so this feature can be exploited to improve performance.
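A common application of this is binding a frequently used global or imported name to a local variable before a hot loop; a Python 3 sketch:

```python
from math import sqrt
from timeit import timeit

def global_lookup(n):
    total = 0.0
    for i in range(n):
        total += sqrt(i)    # `sqrt` is looked up in the enclosing scopes each pass
    return total

def local_alias(n):
    local_sqrt = sqrt       # bind once; local lookups use a fast array access
    total = 0.0
    for i in range(n):
        total += local_sqrt(i)
    return total

print(timeit(lambda: global_lookup(1000), number=1000))
print(timeit(lambda: local_alias(1000), number=1000))
```

Both functions compute the same sum; on CPython the local-alias version typically times slightly faster because local variable access avoids the global dictionary lookup.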
The statement if done is not None is faster than if done != None; readers can verify this themselves;
In a time-consuming loop, consider replacing a function call with inline code to avoid call overhead;
Use chained comparisons, x < y < z, instead of x < y and y < z;
while 1 is faster than while True (though the latter is more readable);
Built-in functions, implemented in C, are usually faster than equivalent Python-level code; note, however, that the operator a + b outperforms the function call add(a, b).
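To illustrate the built-in point, here is a hedged Python 3 sketch comparing the built-in sum() (a C-level loop) against a hand-written Python loop over the same data:

```python
from timeit import timeit

data = list(range(1000))

def manual_sum(values):
    # A hand-written loop: one interpreted bytecode step per element.
    total = 0
    for v in values:
        total += v
    return total

builtin_time = timeit(lambda: sum(data), number=10000)      # C-level loop
manual_time = timeit(lambda: manual_sum(data), number=10000)

print('sum(): %.3fs  manual loop: %.3fs' % (builtin_time, manual_time))
```

Exact timings depend on the machine, but the built-in version should win comfortably, since the entire loop runs in C.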
Locate program performance bottlenecks
The prerequisite for code optimization is knowing where the performance bottleneck lies, that is, where the program's running time is mainly spent. For complicated code, tools can help locate it. A profiler is a program that describes run-time performance and provides various statistics to help you locate a program's performance bottlenecks. The Python standard library provides three profilers: profile, cProfile, and hotshot.
Using profile is very simple: you only need to import it before use. A specific example follows:
Listing 8. using profile for performance analysis
import profile

def profileTest():
    Total = 1
    for i in range(10):
        Total = Total * (i + 1)
        print Total
    return Total

if __name__ == "__main__":
    profile.run("profileTest()")
The program running result is as follows:
Figure 1. performance analysis results
The specific explanations of each output column are as follows:
ncalls: the number of times the function was called;
tottime: the total time spent in the function itself, excluding time spent in the sub-functions it calls;
percall (the first percall): equal to tottime/ncalls;
cumtime: the cumulative time spent in the function and all its sub-functions, i.e. from the function's invocation to its return;
percall (the second percall): the average time per call, equal to cumtime/ncalls;
filename:lineno(function): the location and name of each called function;
To save the output to a file, just add another parameter to the call, for example profile.run("profileTest()", "testprof").
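On Python 3, where hotshot has been removed and cProfile is the recommended profiler, the same analysis can be sketched as follows, writing the report into a string rather than a file:

```python
import cProfile
import io
import pstats

def profile_test():
    total = 1
    for i in range(10):
        total = total * (i + 1)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = profile_test()
profiler.disable()

# Render the report to a string instead of stdout.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats()
print(stream.getvalue())
print(result)   # 10! = 3628800
```

The report columns (ncalls, tottime, percall, cumtime) have the same meaning as described above.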
Profiling data saved this way is a binary file; the pstats module can turn it into text reports. It supports several report formats and is a practical tool for the text interface. It is easy to use:
import pstats
p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()
The sort_stats() method sorts the profiling data and accepts multiple sort fields; for example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort fields include calls, time, and cumulative. In addition, pstats provides a command-line interactive tool; after running python -m pstats, you can learn more about its usage via help.
For large applications, presenting performance analysis results graphically is practical and intuitive. Common visualization tools include Gprof2Dot, visualpytune, and KCacheGrind; you can check their official sites yourself. This article will not discuss them in detail.