Python Performance analysis
Contents
- Introduction to Tuning
- Simple example code for the Python event-based performance analyzer
- Analysis results of the Linux Statistical Performance Analyzer oprofile (http://oprofile.sourceforge.net/news/):
- The importance of performance analysis
- Content of performance analysis
- Memory consumption and memory leaks
- Risk of premature optimization
- Complexity of running time
- Performance Analysis Best Practices
Introduction to Tuning
What is performance analysis
An unoptimized program typically spends most of its CPU cycles in a handful of subroutines. Performance analysis is the analysis of the relationship between the code and the resources it uses.
For example, profiling can tell you how much CPU time an instruction takes, or how much memory the entire program consumes.
Performance analysis is done with a tool called a profiler, which instruments the program's source code or its binary executable (if that is all one can get).
Profilers follow one of two methodologies: event-based profiling and statistical profiling.
Several mainstream programming languages support this kind of event-based profiling:
- Java: JVMTI (JVM Tool Interface) provides hooks for profilers to track events such as function calls, thread-related events, class loads, and so on.
- .NET: like Java, the .NET runtime provides event tracing functionality (https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Profiling#Methods_of_data_gathering).
- Python: developers can use the sys.setprofile function to track events such as python_[call|return|exception] or c_[call|return|exception].
An event-based profiler (also known as a tracing profiler) works by collecting specific events during program execution.
These profilers produce a lot of data; basically, the more events they listen to, the more data they generate. This makes them impractical as a first choice when starting to profile a program.
However, when other profiling methods are insufficient or not accurate enough, they are a reasonable last resort.
Simple example code for the Python event-based performance analyzer
import sys

def profiler(frame, event, arg):
    print 'PROFILER: %r %r' % (event, arg)

sys.setprofile(profiler)

# Simple (and very inefficient) example of how to calculate
# the Fibonacci sequence for a number.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

print fib_seq(2)
Execution result: one "PROFILER: ..." line per call, return, and exception event (both Python- and C-level), alongside the program's own output, the computed sequence [0, 1, 1].
A statistical profiler samples the program counter at fixed intervals. This lets the developer see how much time the target program spends in each function.
Because it samples the program counter, the result is a statistical approximation of the true values. Even so, it is enough to get a view of the profiled program's performance details and to isolate the bottlenecks.
Because it works by sampling (driven by operating-system interrupts), it collects less data and has a smaller impact on the profiled program's performance.
Analysis results of the Linux statistical performance analyzer oprofile (http://oprofile.sourceforge.net/news/):
Function name   File name                  Times encountered   Percentage
func80000       statistical_profiling.c    30760               48.96%
func40000       statistical_profiling.c    17515               27.88%
func20000       static_functions.c         7141                11.37%
func10000       static_functions.c         3572                5.69%
func5000        static_functions.c         1787                2.84%
func2000        static_functions.c         768                 1.22%
func1500        statistical_profiling.c    701                 1.12%
func1000        static_functions.c         385                 0.61%
func500         statistical_profiling.c    194                 0.31%
Below we use statprof for the same analysis:
import statprof

# Simple (and very inefficient) example of how to calculate
# the Fibonacci sequence for a number.
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

statprof.start()
try:
    print fib_seq(20)
finally:
    statprof.stop()
    statprof.display()
Execution Result:
$ python test.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
  %   cumulative      self
 time    seconds   seconds  name
100.00      0.01      0.01  test.py:15:fib
  0.00      0.01      0.00  test.py:21:fib_seq
  0.00      0.01      0.00  test.py:20:fib_seq
  0.00      0.01      0.00  test.py:27:<module>
---
Sample count: 2
Total time: 0.010000 seconds
Note that in the code above the fib_seq argument was changed from 2 to 20; with 2 the program finishes so quickly that statprof cannot collect any samples.
The importance of performance analysis
Performance analysis is not something every program needs, especially small programs, where it matters far less than it does for performance-critical embedded software or programs written specifically to demonstrate profiling. Performance analysis takes time, and it is really useful only once a problem has been found in the program. Even so, it can be performed earlier to catch potential issues, which can save debugging time later on.
Why do we need performance analysis when we already have test-driven development, code review, pair programming, and other tools that make your code more reliable and predictable?
As the programming languages we use become more advanced (over the years we have gone from assembly to JavaScript), we pay less and less attention to low-level details such as CPU cycles, memory allocation, and CPU registers. New generations of programmers learn in high-level languages because they are easier to understand and work out of the box, but those languages are still abstractions over the hardware and over our interaction with it. As this trend grows, new developers are less and less likely to treat performance analysis as a step in software development.
Today, simply releasing a piece of software can attract thousands of users. If it is promoted through social networks, the user base may suddenly grow exponentially. When that surge arrives, the program often crashes, or becomes unusably slow, and ends up abandoned by its users.
The scenario above is obviously caused by poor software design and an architecture that does not scale. After all, a single server's limited memory and CPU can themselves become a bottleneck. However, another possible cause, proven many times over, is that the program was never stress tested. We never considered its resource consumption; we only made sure the tests passed, and were satisfied with that.
Performance analysis helps us avoid these crashes, because it shows fairly accurately how the program behaves regardless of the load. If profiling reveals that, even under very low load, the software spends 80% of its time on I/O operations, that is already a warning sign. A memory leak may only cause trouble once the product is under heavy load; performance analysis can provide enough evidence to uncover such pitfalls before the load really becomes too heavy.
Content of performance analysis
Run time
If you have some experience with the kind of program you are running (say you are a web developer using a web framework), you probably already have a sense of whether the running time is too long.
For example, a simple web server that queries the database, builds the response, and sends it back to the client might take 100 milliseconds in total. If the same task takes 60 seconds, you have to start thinking about performance analysis.
import datetime

tstart = None
tend = None

def start_time():
    global tstart
    tstart = datetime.datetime.now()

def get_delta():
    global tstart
    tend = datetime.datetime.now()
    return tend - tstart

def fib(n):
    return n if n == 0 or n == 1 else fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

start_time()
print "About to calculate the fibonacci sequence for the number 30"
delta1 = get_delta()

start_time()
seq = fib_seq(30)
delta2 = get_delta()

print "Now we print the numbers:"
start_time()
for n in seq:
    print n
delta3 = get_delta()

print "====== Profiling results ======="
print "Time required to print a simple message: %(delta1)s" % locals()
print "Time required to calculate fibonacci: %(delta2)s" % locals()
print "Time required to iterate and print the numbers: %(delta3)s" % locals()
print "====== ======="
Execution Result:
$ python test.py
About to calculate the fibonacci sequence for the number 30
Now we print the numbers:
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
====== Profiling results =======
Time required to print a simple message: 0:00:00.000064
Time required to calculate fibonacci: 0:00:01.430740
Time required to iterate and print the numbers: 0:00:00.000075
====== =======
As you can see, the calculation step is by far the most time consuming.
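Instead of hand-rolling timers with datetime, the same measurement can be made with the standard timeit module. Below is a minimal sketch (written for Python 3, unlike the Python 2 code above); the numbers it prints will of course differ from machine to machine.

import timeit

setup = '''
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq
'''

# Time a single run of fib_seq(30), mirroring the measurement above.
print(timeit.timeit('fib_seq(30)', setup=setup, number=1))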
Discover bottlenecks
Once you have measured the program's running time, you can turn your attention to the slow parts and profile them. A bottleneck is generally caused by one or more of the following:
* Heavy I/O operations, such as reading and parsing large files, long-running database queries, calls to external services (e.g. HTTP requests), and so on.
* Memory leaks that consume all available memory, leaving the rest of the program without enough memory to run normally.
* Unoptimized code that is executed frequently.
* CPU-intensive operations that could be cached but are not, and therefore consume a large amount of resources.
I/O-bound code (file reads/writes, database queries, and so on) is hard to optimize, because optimizing it usually means changing how the program performs I/O (which is normally handled by the language's core functions). Conversely, it is easier (though not necessarily simple) to improve performance by optimizing CPU-bound code, for example a poor algorithm, because that mostly means rewriting the program's own code. A small caching sketch follows below.
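As an illustration of the caching point in the list above, here is a minimal sketch (Python 3, using the standard functools module; not part of the original article) that memoizes the fib function used throughout this post:

from functools import lru_cache

@lru_cache(maxsize=None)  # remember every result already computed
def fib(n):
    # Same naive recursion as before; the cache removes the
    # exponential blow-up caused by repeated sub-calls.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_seq(n):
    return [fib(i) for i in range(n + 1)]

print(fib_seq(30))  # finishes in milliseconds instead of seconds

The same idea, caching the result of an expensive but repeatable operation, applies equally well to database queries or HTTP responses.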
Memory consumption and memory leaks
Memory consumption is not just a matter of how much memory the program uses; we should also think about keeping the amount of memory in use under control. Tracking a program's memory consumption is relatively simple; the most basic approach is to use the operating system's task manager.
It displays a lot of information, including the amount of memory the program consumes or the percentage of total memory it occupies. The task manager is also a good tool for checking CPU time usage.
In a typical top listing, for example, the simple Python program shown earlier consumes almost all of the CPU (99.8%) while using only 0.1% of the memory.
When the process starts running, memory consumption rises and then settles within a certain range. If consumption keeps growing beyond that range and never comes back down, you can conclude there is a memory leak, as the sketch below helps demonstrate.
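Beyond the task manager, Python 3 ships a standard module, tracemalloc, that can attribute allocations to source lines. The following is a minimal sketch (not from the original article) of how it might be used to watch for memory growth that never comes back down:

import tracemalloc

tracemalloc.start()

# Allocate something non-trivial so there is memory to account for.
data = [list(range(1000)) for _ in range(100)]

current, peak = tracemalloc.get_traced_memory()
print("Current: %.1f KiB, peak: %.1f KiB" % (current / 1024, peak / 1024))

# Show the three source lines responsible for the most allocated memory.
for stat in tracemalloc.take_snapshot().statistics('lineno')[:3]:
    print(stat)

tracemalloc.stop()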
Risk of premature optimization
Optimization is usually considered a good habit. However, blindly optimizing in ways that conflict with the software's design principles is not. When starting to develop a new piece of software, a mistake developers often make is premature optimization. If you optimize your code too early, the result may end up quite different from what was intended: it may cover only part of the complete solution, and it may contain errors caused by optimization-driven design decisions.
A good rule of thumb: if you have not done any measurement (profiling) of the code, optimizing it is usually a bad idea. Focus on completing your code first, then find the real performance bottlenecks through profiling, and only then optimize.
Complexity of running time
Running time complexity (RTC) is used to quantify the running time of an algorithm. It is a mathematical approximation of how long the algorithm runs for a given input size. Because it is a mathematical approximation, we can use these values to classify algorithms.
The common notation for RTC is big O notation. Mathematically, big O notation describes the limiting behaviour of a function as its argument grows towards infinity (similar in spirit to a truncated Taylor expansion). Applied to computer science, it describes an algorithm's running time by its asymptotic behaviour (its order of magnitude).
The main models are:
Constant time, O(1): for example, determining whether a number is odd or even, or printing a message to standard output. Theoretically more complex operations, such as looking up a key in a dictionary (hash table), can also be done in constant time if the implementation is sound. Technically, looking up an element in a hash table takes O(1) amortized time, meaning the average time per operation (ignoring special cases) is a fixed constant.
Linear time, O(n): for example, finding the smallest element in an unsorted list, comparing two strings, or deleting the last item in a list.
Logarithmic time, O(log n): the running time of a logarithmic-time algorithm grows more and more slowly as the input size increases. The logarithm grows quickly for small inputs and then slows down; it never stops growing, but the growth becomes so slow that it can almost be ignored. Examples: binary search, computing Fibonacci numbers by matrix exponentiation.
Linearithmic time, O(n log n): the combination of the previous two types. The running time grows quickly as n increases. Examples: merge sort, heap sort, quicksort (at least in the average case).
Factorial time, O(n!): factorial-time algorithms are the worst of all. Their running time grows so fast that the curve is hard even to plot. Example: solving the traveling salesman problem by brute force (enumerating all possible routes).
Quadratic time, O(n^2): quadratic time is another fast-growing complexity class. The larger the input, the longer it takes (this is true of most algorithms, but especially so here); quadratic-time algorithms are slower than linear-time ones. Examples: bubble sort, traversing a two-dimensional array, insertion sort.
In order of speed: logarithmic > linear > linearithmic > quadratic > factorial. When classifying an algorithm, consider its best, average, and worst cases; the sketch below makes the difference between two of these classes concrete.
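Here is a small sketch (Python 3, using only the standard bisect and timeit modules; the exact timings are illustrative) that compares a linear scan, O(n), with a binary search, O(log n), over the same sorted list:

import bisect
import timeit

data = list(range(1000000))  # already sorted
target = 999999              # worst case for the linear scan

def linear_search(seq, x):   # O(n): may look at every element
    for i, value in enumerate(seq):
        if value == x:
            return i
    return -1

def binary_search(seq, x):   # O(log n): halves the search range each step
    i = bisect.bisect_left(seq, x)
    return i if i < len(seq) and seq[i] == x else -1

print(timeit.timeit(lambda: linear_search(data, target), number=10))
print(timeit.timeit(lambda: binary_search(data, target), number=10))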
Performance Analysis Best Practices
- Build a regression test suite.
- Think about your code structure.
- Be patient.
- Collect as much data as you can (useful sources include system logs for networked applications, custom logs, and system resource snapshots such as those from the operating system's task manager).
- Preprocess your data.
- Visualize your data.
The best-known profiling libraries in Python are cProfile and line_profiler.
The former is part of the standard library: https://docs.python.org/2/library/profile.html#module-cProfile.
The latter lives at: https://github.com/rkern/line_profiler.
Both focus on CPU time; a short cProfile sketch follows below.
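As a quick illustration (a minimal sketch, not taken from the original article), cProfile can be pointed at the fib_seq example used throughout this post and its report sorted by cumulative time:

import cProfile
import pstats

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_seq(n):
    return [fib(i) for i in range(n + 1)]

# Profile the call and print the five most expensive entries by cumulative time.
profiler = cProfile.Profile()
profiler.runcall(fib_seq, 25)
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)

line_profiler works differently: you decorate the functions of interest with @profile and run the script through its kernprof command to get per-line timings.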
Original link: my.oschina.net/u/1433482/blog/709219