Project Address: Ruby-prof
In the previous Profiling tool in Ruby, we listed several of the most commonly used profilers, but just a brief introduction, this time detailing how ruby-prof is used.
Ruby-prof is a powerful, support CPU, memory use, object allocation and so on performance analysis, and provides a lot of friendly output format, not only have text-based, HTML format, but also output Graphviz format dot file, and applicable with Kcachegrindcall treeformat, in fact, this format is based on Valgrind, this tool is very good, we can go to the official website to understand.
There are two ways to runruby-prof, one is theruby-profstart and stop code that needs to be inserted in the source:
require ‘ruby-prof’
RubyProf.start
# Write the code for performance analysis here result = RubyProf.stop
# Select a Printer printer = RubyProf :: FlatPrinter.new (result) printer.print (STDOUT)
Another is to run directly on the command line, after the Gem package is installed, theruby-profcommand is also installedruby-prof, using the following:
ruby-prof -p flat test.rb
This method is more flexible and we use this method to illustrateruby-profthe use of the method.
Directly runruby-prof -hgetruby-profhelp information, because too much, here is not listed, you can do their own in the system to see.
Where the-pparameters are output format, the following will describe the format of each Printer, the significance of the exponent and the use of related display tools. The use of several other parameters is also described in the process of introducing the output format.
Output format type
flat - Prints a flat profile as text (default).
flat_with_line_numbers - same as flat, with line numbers.
graph - Prints a graph profile as text.
graph_html - Prints a graph profile as html.
call_tree - format for KCacheGrind
call_stack - prints a HTML visualization of the call tree
dot - Prints a graph profile as a dot file
multi - Creates several reports in output directory
Sample Program
def m1
"string" * 1000
end
def m2
"string" * 10000
end
def start
n = 0
n = n + 1 while n < 100_000
10000.times do
m1
m2
end
end
start
This is the most basic test program, and we'llruby-profadd additional code to the presentation as we introduce the functionality.
The impact of GC on profiling
Performance analysis of the time the GC will always have a large impact on the results, here we do not consider it, we will have another article to do a special introduction.
The simplest output format-flat
ruby-prof -p flat test.rb
Measure Mode: wall_time
Thread ID: 12161840
Fiber ID: 19223800
Total: 0.206998
Sort by: self_time
%self total self wait child calls name
68.50 0.142 0.142 0.000 0.000 20000 String#*
10.45 0.207 0.022 0.000 0.185 1 Object#start
6.82 0.014 0.014 0.000 0.000 100001 Fixnum#<
6.46 0.013 0.013 0.000 0.000 100000 Fixnum#+
2.84 0.158 0.006 0.000 0.152 1 Integer#times
2.52 0.128 0.005 0.000 0.123 10000 Object#m2
2.40 0.024 0.005 0.000 0.019 10000 Object#m1
0.01 0.207 0.000 0.000 0.207 2 Global#[No method]
0.01 0.000 0.000 0.000 0.000 2 IO#set_encoding
0.00 0.000 0.000 0.000 0.000 3 Module#method_added
* indicates recursively called methods
First, one by one. Explain the meaning of the indicators:
Indicator |
explanation |
%self |
The time taken by the method itself, excluding the other method execution time of the call |
Total |
Total time of the method execution, including the execution time of the other method being called |
Self |
The time that the method itself executes, not including the other method execution time of the call |
Wait |
Multi-threaded, waiting for other threads in the time, in a single-threaded program, always 0 |
Child |
Total time of other methods called by the method |
Calls |
Number of calls to the method |
The basic relationship between them is:
total = self + wait + child
Specifically,String#*This method occupies 68.5% of the program's running time, takes 0.142 seconds, executes 20,000 times, and theObject#startmethod is defined in the codestart, which occupies 10.45% of the program run time, takes 0.022 seconds, calls the The method took 0.185 seconds, called 1 times, the total time spent is 0.022 + 0.185 = 0.207, I believe we can now name the meaning of the index white.
Now that we understand the indicator meaning of this output, if this program is a performance problem, then what is the problem with the data? Normally, we need to look at two indicators,%self and calls, simply look at%self sometimes useless, the above example, its time-consuming method isString#*that we are not likely to improve the language itself, in this case, we found that the value of calls is larger, Then try to reduce theString#*call to the method.
Using the flat output format, you can only find such a simple problem, if you want to reduceString#*the method call, you need to know who called it, and this output format is not shown, you need to choose other output format.
Simple call relationship output-graph
ruby-prof -p graph test.rb
Measure Mode: wall_time
Thread ID: 17371960
Fiber ID: 24397420
Total Time: 0.21026015281677246
Sort by: total_time
%total %self total self wait child calls Name
--------------------------------------------------------------------------------
99.99% 0.01% 0.210 0.000 0.000 0.210 2 Global#[No method]
0.210 0.022 0.000 0.188 1/1 Object#start
0.000 0.000 0.000 0.000 3/3 Module#method_added
--------------------------------------------------------------------------------
0.210 0.022 0.000 0.188 1/1 Global#[No method]
99.98% 10.34% 0.210 0.022 0.000 0.188 1 Object#start
0.161 0.006 0.000 0.155 1/1 Integer#times
0.014 0.014 0.000 0.000 100001/100001 Fixnum#<
0.014 0.014 0.000 0.000 100000/100000 Fixnum#+
--------------------------------------------------------------------------------
0.161 0.006 0.000 0.155 1/1 Object#start
76.48% 2.68% 0.161 0.006 0.000 0.155 1 Integer#times
0.130 0.005 0.000 0.125 10000/10000 Object#m2
0.025 0.005 0.000 0.020 10000/10000 Object#m1
--------------------------------------------------------------------------------
0.020 0.020 0.000 0.000 10000/20000 Object#m1
0.125 0.125 0.000 0.000 10000/20000 Object#m2
69.23% 69.23% 0.146 0.146 0.000 0.000 20000 String#*
--------------------------------------------------------------------------------
0.130 0.005 0.000 0.125 10000/10000 Integer#times
61.81% 2.28% 0.130 0.005 0.000 0.125 10000 Object#m2
0.125 0.125 0.000 0.000 10000/20000 String#*
--------------------------------------------------------------------------------
0.025 0.005 0.000 0.020 10000/10000 Integer#times
11.99% 2.28% 0.025 0.005 0.000 0.020 10000 Object#m1
0.020 0.020 0.000 0.000 10000/20000 String#*
--------------------------------------------------------------------------------
0.014 0.014 0.000 0.000 100001/100001 Object#start
6.73% 6.73% 0.014 0.014 0.000 0.000 100001 Fixnum#<
--------------------------------------------------------------------------------
0.014 0.014 0.000 0.000 100000/100000 Object#start
6.42% 6.42% 0.014 0.014 0.000 0.000 100000 Fixnum#+
--------------------------------------------------------------------------------
0.01% 0.01% 0.000 0.000 0.000 0.000 2 IO#set_encoding
--------------------------------------------------------------------------------
0.000 0.000 0.000 0.000 3/3 Global#[No method]
0.00% 0.00% 0.000 0.000 0.000 0.000 3 Module#method_added
* indicates recursively called methods
The content of this output is more abundant, but it may also make the head a bit dizzy. Let's take a slow look at the analysis.
First of all, the sorting method is different, is sorted according to Total_time, flat output format is sorted by Self_time. The entire report is divided into several points, each of which describes the number of method invocation information, but note that the leftmost two columns,%total,%self the two columns are not empty,
First look at the second part:
--------------------------------------------------------------------------------
0.210 0.022 0.000 0.188 1/1 Global#[No method]
99.98% 10.34% 0.210 0.022 0.000 0.188 1 Object#start
0.161 0.006 0.000 0.155 1/1 Integer#times
0.014 0.014 0.000 0.000 100001/100001 Fixnum#<
0.014 0.014 0.000 0.000 100000/100000 Fixnum#+
--------------------------------------------------------------------------------
Object#startThe execution of the method took 99.98% of the total time, not including the child method calls, took 10.34% of the time, was called once, andstartalso calledInteger#times,Fixnum#<andFixnum#+three methods in the method.
Then the second right number (calls), is a delimited number of/two, the number on the left is how many times this method is called at this levelObject#start, the number on the right is theObject#starttotal number of runs during this program. TheObject#startthree methods that are calledcallsare listed in theObject#startnumber of executions in, and the total number of executions.
In the first part there are two methods: theGlobal#[No method]representative does not have a caller, it can be understood that Ruby is preparing the execution environment,Module#method_addedwhen there is an instance method added, this method will be triggered.
So what can this output format explain? In the flat output format we have positionedString#*too many calls to the problem, then according to the output format of this graph format we should be able to find out who caused the problem.
First cut out the parts that can find the problem:
--------------------------------------------------------------------------------
0.020 0.020 0.000 0.000 10000/20000 Object#m1
0.125 0.125 0.000 0.000 10000/20000 Object#m2
69.23% 69.23% 0.146 0.146 0.000 0.000 20000 String#*
--------------------------------------------------------------------------------
0.130 0.005 0.000 0.125 10000/10000 Integer#times
61.81% 2.28% 0.130 0.005 0.000 0.125 10000 Object#m2
0.125 0.125 0.000 0.000 10000/20000 String#*
--------------------------------------------------------------------------------
0.025 0.005 0.000 0.020 10000/10000 Integer#times
11.99% 2.28% 0.025 0.005 0.000 0.020 10000 Object#m1
0.020 0.020 0.000 0.000 10000/20000 String#*
--------------------------------------------------------------------------------
The first part shows thatString#*inObject#m1andObject#m1in each was called 10,000 times, a total of 20,000 times, the same number of times, then look at the following, also 10,000 times, in theObject#m2time spent is 0.125 seconds, and inObject#m1The time spent is 0.020 seconds, more than 0.105 seconds, so that we can locate the problem isObject#m2here.
Graph can be output to HTML format, here just demonstrates the plain text version, HTML format more easy to interact, need to add parameter-F to specify the path and file name of the output.
GraphViz Dot-dot
ruby-prof -p dot test.rb -f dot.dot
There are tools to convert dot files to PDF viewing, as well as tools that specialize in viewing dot files, such as XDot on Ubuntu.
This picture also clearly shows that the problem isObject#m2here.
Interactive Call relationships-Call_stack
ruby-prof -p call_stack test.rb -f callstack.html
Here is really a picture win thousand words, at a glance,Object#m2in theString#*10,000 calls spent 60.52% of the time, do not have to explain, hurry up and try it yourself.
Ultimate Universal Full View-Call_tree
First installedKCacheGrind, Ubuntu under directsudo apt-get install kcachegrind
ruby-prof -p call_tree test.rb -f call_tree.out
OpenKCacheGrind, then opencall_tree.out(file type Select all), this magical tool can present to you all the truth.
With the output format described above, it is easy to understand this, we will introduce, but in another article, because this is a bit too long, the next article will detail the use of kcachegrind.
This article is compiled and collated by ONEAPM engineers. ONEAPM is the emerging leader in China's basic software industry, helping enterprise users and developers easily implement slow program code and real-time crawling of SQL statements. To read more technical articles, please visit the ONEAPM Official technology blog.
Ruby Profiler detailed ruby-prof (I)