Introduction to Python performance analysis tools
Sooner or later you will want to improve a program's execution efficiency: to find out which part of it has become the bottleneck, and to see how much CPU and memory it uses while it runs. At that point you need tools and methods for analyzing and tuning program performance.
By Context Manager
You can implement a timer yourself with a context manager. As in the earlier introduction to timeit, a class's __enter__ and __exit__ methods let you time whatever runs inside the with block, for example:

# timer.py
import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # milliseconds
        if self.verbose:
            print('elapsed time: %f ms' % self.msecs)
The usage is as follows:
from timer import Timer

with Timer() as t:
    foo()
print("=> foo() spends %s s" % t.secs)
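If you would rather not write a full class, the same idea also works with contextlib.contextmanager. Below is a minimal sketch; the timed name and its label argument are my own additions, not part of the Timer class above.

import time
from contextlib import contextmanager

@contextmanager
def timed(label="block"):
    # everything inside the with block runs between these two timestamps
    start = time.time()
    try:
        yield
    finally:
        end = time.time()
        print("%s took %.3f ms" % (label, (end - start) * 1000))

# usage:
# with timed("foo()"):
#     foo()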
By Decorator
However, I think a decorator is more elegant:

import time
from functools import wraps

def timer(function):
    @wraps(function)
    def function_timer(*args, **kwargs):
        t0 = time.time()
        result = function(*args, **kwargs)
        t1 = time.time()
        print("Total time running %s: %s seconds" % (function.__name__, str(t1 - t0)))
        return result
    return function_timer
It is easy to use:
@timer
def my_sum(n):
    return sum([i for i in range(n)])

if __name__ == "__main__":
    my_sum(10000000)
Running result:
➜ python profile.py
Total time running my_sum: 0.817697048187 seconds
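On Python 3, time.perf_counter() is usually a better clock for this kind of measurement than time.time(), because it is monotonic and has higher resolution. A variant of the decorator above using it might look like the sketch below; timer_pc is just an illustrative name.

import time
from functools import wraps

def timer_pc(function):
    @wraps(function)
    def function_timer(*args, **kwargs):
        # perf_counter is monotonic and high-resolution (Python 3.3+)
        t0 = time.perf_counter()
        result = function(*args, **kwargs)
        t1 = time.perf_counter()
        print("Total time running %s: %s seconds" % (function.__name__, str(t1 - t0)))
        return result
    return function_timer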
Built-in time command
The shell's built-in time command reports the wall-clock, user, and system time of a whole command. Example:

➜ time python profile.py
Total time running my_sum: 0.854454040527 seconds
python profile.py  0.79s user 0.18s system 98% cpu 0.977 total

The output shows that the script spent 0.79 s of CPU time in user mode and 0.18 s executing kernel (system) calls; the total elapsed time was 0.977 s.

total time - (user time + system time) = time spent waiting on I/O and on other tasks handled by the system
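If you want the same user/system split from inside Python instead of from the shell, the standard library's resource module exposes it on Unix-like systems. A minimal sketch (Unix only):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_utime: CPU time spent in user mode; ru_stime: CPU time spent in the kernel.
# These correspond to the "user" and "system" numbers printed by the time command.
print("user: %.2fs, system: %.2fs" % (usage.ru_utime, usage.ru_stime))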
Python timeit module
It is handy for benchmarking: you can easily repeat a statement a given number of times and compare how different pieces of code perform. For more information, see the previous article.
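As a quick reminder, a minimal sketch of how it is called (the statement being timed here is an arbitrary example):

import timeit

# run the statement 10000 times and return the total elapsed time in seconds
print(timeit.timeit("sum(range(1000))", number=10000))

# repeat the whole measurement 3 times to see the variation between runs
print(timeit.repeat("sum(range(1000))", number=10000, repeat=3))

The same measurement is also available from the command line via python -m timeit "sum(range(1000))".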
cProfile
cProfile is the C-implemented profiler in the standard library; the annotated example below shows how to use it:

# coding=utf8

def sum_num(max_num):
    total = 0
    for i in range(max_num):
        total += i
    return total

def test():
    total = 0
    for i in range(40000):
        total += i
    t1 = sum_num(100000)
    t2 = sum_num(200000)
    t3 = sum_num(300000)
    t4 = sum_num(400000)
    t5 = sum_num(500000)
    test2()
    return total

def test2():
    total = 0
    for i in range(40000):
        total += i
    t6 = sum_num(600000)
    t7 = sum_num(700000)
    return total

if __name__ == "__main__":
    import cProfile

    # print the analysis result directly to the console
    # cProfile.run("test()")

    # save the analysis result to a file
    # cProfile.run("test()", filename="result.out")

    # additionally specify a sort order
    cProfile.run("test()", filename="result.out", sort="cumulative")
cProfile saves the analysis results to result.out, but in a binary format. To read them, use the accompanying pstats module:

import pstats

# create a Stats object
p = pstats.Stats("result.out")

# strip_dirs(): remove irrelevant path information
# sort_stats(): sort, with the same keys as supported above
# print_stats(): print the analysis results; you can limit how much is printed

# the same output as running cProfile.run("test()") directly
p.strip_dirs().sort_stats(-1).print_stats()

# sort by function name and print only the first 3 functions; the argument can
# also be a float, meaning the first few percent of entries
p.strip_dirs().sort_stats("name").print_stats(3)

# sort by cumulative time, then function name, and print the first 50%
p.strip_dirs().sort_stats("cumulative", "name").print_stats(0.5)

# if you want to know which functions called sum_num
p.print_callers(0.5, "sum_num")

# check which functions are called inside test()
p.print_callees("test")
As an example, here is the output showing which functions test() calls:

➜ python profile.py
   Random listing order was used
   List reduced from 6 to 2 due to restriction <'test'>

Function              called...
                          ncalls  tottime  cumtime
profile.py:24(test2)  ->      2    0.061    0.077  profile.py:3(sum_num)
                              1    0.000    0.000  {range}
profile.py:10(test)   ->      5    0.073    0.094  profile.py:3(sum_num)
                              1    0.002    0.079  profile.py:24(test2)
                              1    0.001    0.001  {range}
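You can also invoke cProfile from the command line without modifying the code at all; the module accepts -o to write the binary result file and -s to choose the sort order:

# profile the whole script and save the binary result to result.out
python -m cProfile -o result.out profile.py

# or print the report directly, sorted by cumulative time
python -m cProfile -s cumulative profile.py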
cProfile.Profile

cProfile also provides a customizable Profile class for finer-grained control over profiling. For more information, see the documentation.

Signature: class cProfile.Profile(timer=None, timeunit=0.0, subcalls=True, builtins=True)
The following example is adapted from the official documentation:

import cProfile
import pstats
import io

pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()

s = io.StringIO()
sortby = "cumulative"
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
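On newer Python versions (3.8+), Profile can also be used as a context manager, which saves the explicit enable()/disable() calls. A small sketch:

import cProfile
import pstats

with cProfile.Profile() as pr:
    # ... do something ...
    total = sum(i * i for i in range(1000000))

pstats.Stats(pr).sort_stats("cumulative").print_stats(10)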
line_profiler

line_profiler is a tool for line-by-line performance analysis of functions; see the project description on GitHub: https://github.com/rkern/line...
Example
# coding=utf8

def sum_num(max_num):
    total = 0
    for i in range(max_num):
        total += i
    return total

@profile  # add @profile to mark which function to analyze
def test():
    total = 0
    for i in range(40000):
        total += i
    t1 = sum_num(10000000)
    t2 = sum_num(200000)
    t3 = sum_num(300000)
    t4 = sum_num(400000)
    t5 = sum_num(500000)
    test2()
    return total

def test2():
    total = 0
    for i in range(40000):
        total += i
    t6 = sum_num(600000)
    t7 = sum_num(700000)
    return total

test()
Run the script through the kernprof command, which makes the @profile decorator available and performs the line-by-line analysis. The result looks like this:

➜ kernprof -l -v profile.py
Wrote profile results to profile.py.lprof
Timer unit: 1e-06 s

Total time: 3.80125 s
File: profile.py
Function: test at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           @profile
    11                                           def test():
    12         1            5      5.0      0.0      total = 0
    13     40001        19511      0.5      0.5      for i in range(40000):
    14     40000        19066      0.5      0.5          total += i
    15
    16         1      2974373 2974373.0    78.2      t1 = sum_num(10000000)
    17         1        58702  58702.0      1.5      t2 = sum_num(200000)
    18         1        81170  81170.0      2.1      t3 = sum_num(300000)
    19         1       114901 114901.0      3.0      t4 = sum_num(400000)
    20         1       155261 155261.0      4.1      t5 = sum_num(500000)
    21         1       378257 378257.0     10.0      test2()
    22
    23         1            2      2.0      0.0      return total
Lines with high Hits (number of executions) and Time (time consumed) values are the ones with the most room for optimization.
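kernprof -l writes the raw data to profile.py.lprof; to re-read a saved result later without profiling again, line_profiler can format that file on its own:

python -m line_profiler profile.py.lprof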
memory_profiler

Similar to line_profiler, memory_profiler analyzes a program's memory usage line by line. GitHub: https://github.com/fabianp/me .... PS: install psutil to make the analysis faster.

Running the same code from the line_profiler example above with the python -m memory_profiler profile.py command produces the following results:
➜ python -m memory_profiler profile.py
Filename: profile.py

Line #    Mem usage    Increment   Line Contents
================================================
    10   24.473 MiB    0.000 MiB   @profile
    11                             def test():
    12   24.473 MiB    0.000 MiB       total = 0
    13   25.719 MiB    1.246 MiB       for i in range(40000):
    14   25.719 MiB    0.000 MiB           total += i
    15
    16  335.594 MiB  309.875 MiB       t1 = sum_num(10000000)
    17  337.121 MiB    1.527 MiB       t2 = sum_num(200000)
    18  339.410 MiB    2.289 MiB       t3 = sum_num(300000)
    19  342.465 MiB    3.055 MiB       t4 = sum_num(400000)
    20  346.281 MiB    3.816 MiB       t5 = sum_num(500000)
    21  356.203 MiB    9.922 MiB       test2()
    22
    23  356.203 MiB    0.000 MiB       return total
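memory_profiler also ships an mprof helper that samples the process's memory usage over time and plots it (plotting requires matplotlib). Roughly:

mprof run profile.py    # record memory usage while the script runs
mprof plot              # plot the most recent recording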