Measure Java performance – sampling or instrumentation

Copied from https://blog.codecentric.de/en/2011/10/measure-java-performance-sampling-or-instrumentation/

In recent discussions, I noticed confusion about the differences between measuring with sampling and with instrumentation.
I hear claims about which one is better than the other, but I think it is important to understand how they work. Some tools even ask you to choose between the two directly at startup, like JProfiler 7 in the picture on the right.
But how can you choose whichever fits the given situation best without understanding these concepts?

So let's step back for a moment and think about how to measure the performance of a computer program:
a) We add measurement code into the real code. The measurement code tells us how long the real code took to execute.
b) We add an external observer, which examines the running real code and tells us what code is executing at any given time.

Both approaches work and give results, but they are fundamentally different! So I am going to explain how they work.

The examples I'll use in this post are based on Java and the JVM, but they are applicable to other languages as well, especially .NET with its CLR.


So how can we add measurement code in Java?
It turns out there are actually a few established methods for measuring by adding measurement code:

    • Manually add some System.out.println code to important methods. Execution time is printed to the log.
    • Create some kind of javax.management beans to record time. They can be recorded manually and queried with tools later on.
    • Use AOP libraries to build aspects which record code execution time.
    • Build a JVMTI agent, which uses APIs to add code and record execution time.

Those methods overlap in one way or the other. In the end, all of them put extra code on top of the real application code to calculate the time used to execute. The former approaches usually involve some kind of basic file logging or JMX (JSR-3, JSR-250). JMX was designed to gather metrics for management and to be able to change settings. The latter methods are more dynamic and do not require hardcoding during development. Adding code like this is called "instrumentation" and usually involves bytecode modification.
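To make the first, most basic approach concrete, here is a minimal sketch (my own illustration, not from the post's sources) of manually added measurement code:

public class ManualTimingDemo {

    // stands in for a real business method
    static void importantMethod() throws InterruptedException {
        Thread.sleep(100);
    }

    public static void main(final String[] args) throws InterruptedException {
        final long start = System.currentTimeMillis();
        importantMethod();
        final long end = System.currentTimeMillis();
        // the measurement code reports how long the real code took
        System.out.println("importantMethod took " + (end - start) + "ms");
    }
}

Doing this by hand scales badly across a whole codebase, which is why the more dynamic, bytecode-based variants exist.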

How can we observe externally in Java?
Ideally we would like to observe from outside the runtime (the JVM). JMX was made accessible externally in JSR-160, but JMX prevents us from seeing many details, as it is just high level. While there are other APIs that allow us to read data from the same JVM, none really tells us how fast code executes. So to do better pseudo-external observation, we create a thread with this observing duty just inside the monitored JVM. That thread looks from time to time into the other threads and records their activity. The interval between those inspections should be small enough to capture many details. This external observation is called "(time-)sampling". With time sampling, the monitoring is not continuous, but it does cover all requests or threads.
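The primitive that makes this pseudo-external observation possible in Java is Thread.getStackTrace(). A minimal sketch of just that primitive (the post's full sampling agent follows further below):

public class StackPeek {

    public static void main(final String[] args) throws InterruptedException {
        final Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(1000); // stands in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "worker");
        worker.start();
        Thread.sleep(100); // give the worker time to start its work
        // the topmost frame tells us what the worker is executing right now
        System.out.println("worker is in: " + worker.getStackTrace()[0]);
    }
}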

In this post, I am going to compare both approaches using an easy-to-understand example. Because it is designed to be easily understandable, it uses naive code and does not contain optimizations.

Example Code

So first off, this is the code:
Attached as ZIP, or samplingvsinstrumentation on my private GitHub.

We have a class Demo, which runs all our fake production code. It has a few methods named like this: method100ms().
The name includes the average execution time to allow easier reading of the results. Unfortunately, in real code no method name would carry this information.
There is a method0ms(), which does some minor code execution, so it does not take zero milliseconds, but it is much faster than one millisecond.
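The real sources are in the ZIP and GitHub repository linked above; as a rough sketch (my reconstruction, not the original code), the Demo class could look like this:

public class Demo {

    // each method sleeps roughly as long as its name promises
    public void method500ms() { sleep(500); }
    public void method100ms() { sleep(100); }
    public void method50ms()  { sleep(50); }
    public void method1ms()   { sleep(1); }

    // does a tiny bit of work; not zero, but well below one millisecond
    public void method0ms() {
        Math.sqrt(42);
    }

    private void sleep(final long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}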

DemoRunner has the methods for executing the Demo class business methods:
a) The mixed demo runs Demo with the [100, 1, 100, 500, 1, 100, 1, 50, 50] methods.
b) The mass demo runs the 0ms method a hundred million times.
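Again as a reconstruction building on the Demo sketch above (the real code is in the linked repository):

public class DemoRunner {

    public static void mixedDemo() {
        final Demo demo = new Demo();
        demo.method100ms(); demo.method1ms();  demo.method100ms();
        demo.method500ms(); demo.method1ms();  demo.method100ms();
        demo.method1ms();   demo.method50ms(); demo.method50ms();
    }

    public static void massDemo() {
        final Demo demo = new Demo();
        for (int i = 0; i < 100000000; i++) {
            demo.method0ms();
        }
    }
}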

So we can run it like this:

public static void main(final String[] args) {
    mixedDemo();
    massDemo();
}

And it will complete. But we do not learn anything about it. We can use an external tool to get some kind of result: a stopwatch.
On my machine (Dell E6420, Intel 2520 2.5GHz quad core CPU, 64-bit Windows, Java 1.6.0_27) it takes about a second to run the mixed demo and almost three seconds for the plenty of 0ms method invocations.

So let us add some outside measurement code to get more precise numbers:

public static void main(final String[] args) {
    long start = System.currentTimeMillis();
    mixedDemo();
    long end = System.currentTimeMillis();
    System.out.printf("%s Demo completed in %dms%n", DemoType.MIXED, end - start);

    start = System.currentTimeMillis();
    massDemo();
    end = System.currentTimeMillis();
    System.out.printf("%s Demo completed in %dms%n", DemoType.MASS, end - start);
}

which gives us:

Running demo with [100, 1, 100, 500, 1, 100, 1, 50, 50] methods
MIXED Demo completed in 967ms
Running demo with 100000000 0ms methods
MASS Demo completed in 2781ms

Let's Talk about Overhead
When measuring, you will distort the results. Generally speaking, the measured value differs from the true value by a so-called systematic error and a random error. Systematic errors are introduced by the measurement instruments and can be measured or estimated to a certain extent, while random errors cannot be predicted.
When the CPU executes measuring code instead of real code, we usually speak of "overhead", which results in systematic errors in the measurements. It also consumes CPU cycles which could have been used by other production code, and can as such also influence unmeasured code behaviour. Additionally, the really important overhead is the delay of the regular transactions through the system; additional system resource usage can usually be tolerated.
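Put as a formula (my summary, not from the original post):

measured time = true time + systematic error (overhead) + random error

Instrumentation mainly adds to the systematic part, which is why the agents below try to track their own overhead explicitly.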

Instrumentation

To measure better what the Demo code is doing, I will build an instrumentation agent based on AOP with AspectJ load-time weaving. This adds some extra code invocations to the methods I specify with a so-called "pointcut expression".
AspectJ enhances the bytecode of classes when they are loaded (load-time weaving is typically switched on by starting the JVM with -javaagent:aspectjweaver.jar). The pointcut expression describes the signature of the methods which shall be instrumented.
An @Around advice is used, which is passed the so-called JoinPoint, which is actually a pointer to the real code that is about to be executed. AspectJ uses a JVMTI agent and does the hard work for me. I just have to write a so-called aspect to do my measurements.

The interesting part of the code is this:

 @Around ("Call (void De.codecentric.performance.Demo.metho d* (..)) ") public void Arounddemomethodcall (final proceedingjoinpoint thisjoinpoint) throws Throwable {Long start = System.currenttimemillis (); Thisjoinpoint.proceed (); Long end = System.currenttimemillis (); String Currentmethod = Thisjoinpoint.getsignature (). toString (); if (Executionpath.size () < Max_execution_path) { Executionpath.add (Currentmethod);} Methodstatistics statistics = methodstatistics.get (Currentmethod), if (statistics = = null) {statistics = new Moremethodstatistics (Currentmethod); Methodstatistics.put (Currentmethod, statistics);}
  Statistics.addtime (End-start); overhead + = System.currenttimemillis ()-end;} 

As you can see, I give explicit method names to intercept: call(void de.codecentric.performance.Demo.method*(..)).
I record the start time at the beginning and the end time after executing the method. Additionally, I store the current method name in "executionPath" (unless it has reached its maximum length), and I record method statistics for the current method. I also record how much time I spent recording this data in a field called "overhead".

Running this instrumentation gives me this:

Running demo with [100, 1, 100, 500, 1, 100, 1, 50, 50] methods
MIXED Demo completed in 950ms
Trace Aspect recorded following results:
  void de.codecentric.performance.Demo.method500ms() 501ms (min: 501ms, max: 501ms) - 1 invocations
  void de.codecentric.performance.Demo.method100ms() 303ms (min: 101ms, max: 101ms) - 3 invocations
  void de.codecentric.performance.Demo.method50ms() 102ms (min: 51ms, max: 51ms) - 2 invocations
  void de.codecentric.performance.Demo.method1ms() 6ms (min: 2ms, max: 2ms) - 3 invocations
Code execution path:
  void de.codecentric.performance.Demo.method100ms()
  void de.codecentric.performance.Demo.method1ms()
  void de.codecentric.performance.Demo.method100ms()
  void de.codecentric.performance.Demo.method500ms()
  void de.codecentric.performance.Demo.method1ms()
  void de.codecentric.performance.Demo.method100ms()
  void de.codecentric.performance.Demo.method1ms()
  void de.codecentric.performance.Demo.method50ms()
  void de.codecentric.performance.Demo.method50ms()
Agent internal overhead 2ms
Agent overhead 91ms

Running demo with 100000000 0ms methods
MASS Demo completed in 7261ms
Trace Aspect recorded following results:
  void de.codecentric.performance.Demo.method0ms() 2892ms (min: 0ms, max: 2ms) - 100000000 invocations
Code execution path:
  void de.codecentric.performance.Demo.method0ms()
  void de.codecentric.performance.Demo.method0ms()
  [...]
  void de.codecentric.performance.Demo.method0ms()
  void de.codecentric.performance.Demo.method0ms()
Execution path incomplete!
Agent internal overhead 2836ms
Agent overhead 4ms

We can clearly see that the instrumentation caught all nine method invocations in the first example and recorded the time spent quite accurately. It can also tell us in which order these methods executed. But it has a problem, as the second output shows us: the execution path was very long. The aspect would need to keep one hundred million executions in memory; that is why I put a limit in there.

So what about overhead?

There are two kinds of overhead measured by my demo code. Neither is really accurate, but together they give a good indication of where the agent spends time.
The internal one counts the time the agent spends doing the statistics. It is internal because it cannot be differentiated externally and looks like time the actual business method takes to execute. And then there is the overhead which can be seen externally: the time required to set up the instrumentation and to print the results.
We can see that the overhead of instrumentation is low in the first case, but outputting the data to standard out took some time. In the second demo the output was faster, because there was less data, but the internal overhead was huge. However, there is a problem with the overall overhead: the internal overhead differs from the total time minus the method time. 7261ms - 2892ms = 4369ms of the time was not spent running real code, but the agent only claims to account for 2836ms. The delta is due to inaccuracy of both the external and the internal time measures. And of course there is some code execution inside the instrumentation which is not added to the overhead time (like the method invocation cost of aroundDemoMethodCall(JoinPoint thisJoinPoint)).

Sampling

My sampling code creates a daemon thread, which looks into the main thread every 10ms and records the activity. The interesting code of that agent is this:

@Override
public void run() {
    lastSample = System.currentTimeMillis();
    while (true) {
        try {
            Thread.sleep(interval);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        String currentMethod = getCurrentMethod();
        long currentSample = System.currentTimeMillis();
        addMeasurementsIfStillInMethod(currentMethod, currentSample);
        lastMethod = currentMethod;
        lastSample = currentSample;
        overhead += System.currentTimeMillis() - currentSample;
    }
}

private void addMeasurementsIfStillInMethod(final String currentMethod, final long currentSample) {
    if (currentMethod.equals(lastMethod)) {
        MethodStatistics statistics = methodStatistics.get(currentMethod);
        if (statistics == null) {
            statistics = new MethodStatistics(currentMethod);
            methodStatistics.put(currentMethod, statistics);
        }
        statistics.addTime(currentSample - lastSample);
    } else {
        if (executionPath.size() < MAX_EXECUTION_PATH) {
            executionPath.add(getParentMethod() + " > " + currentMethod);
        }
    }
}

private String getCurrentMethod() {
    StackTraceElement topOfStack = monitoredThread.getStackTrace()[0];
    return formatStackElement(topOfStack);
}

So the agent sleeps for its given interval, wakes up, and finds out what method is executing on the monitored thread using monitoredThread.getStackTrace()[0]. Then it records the current time to find out how long it has been sleeping since the last sample (this will likely be around 10ms, but might differ!). Then it finds out whether the code is still in the same method as last time. This is quite important, because the agent can only record the execution time when the same method is seen back-to-back. If the method is seen for the first time, we add it to the execution path (also respecting the same limit here). Then we update the internal state and calculate the overhead for the statistics part.

Agent monitoring thread main with sampling interval of 10ms
Running demo with [100, 1, 100, 500, 1, 100, 1, 50, 50] methods
MIXED Demo completed in 916ms
Agent stopped - results:
  void de.codecentric.performance.Demo.method500ms() 488ms
  void de.codecentric.performance.Demo.method100ms() 285ms
  void java.lang.Thread.sleep() 101ms
Code execution path:
  void de.codecentric.performance.Demo.runCode() > void de.codecentric.performance.Demo.method100ms()
  void de.codecentric.performance.Demo.runCode() > void de.codecentric.performance.Demo.method500ms()
  void de.codecentric.performance.Demo.runCode() > void de.codecentric.performance.Demo.method100ms()
  void de.codecentric.performance.Demo.method50ms() > void java.lang.Thread.sleep()
Agent internal overhead 4ms
Agent overhead 36ms

Agent monitoring thread main with sampling interval of 10ms
Running demo with 100000000 0ms methods
MASS Demo completed in 2959ms
Agent stopped - results:
  void de.codecentric.performance.Demo.method0ms() 2736ms
Code execution path:
  void de.codecentric.performance.DemoRunner.massDemo() > void de.codecentric.performance.DemoRunner.massDemo()
  void de.codecentric.performance.Demo.runCode() > void de.codecentric.performance.Demo.method0ms()
Agent internal overhead 0ms
Agent overhead 0ms

So we can clearly see that sampling has problems capturing the one millisecond methods. But we see a Thread.sleep(), which we had not seen with instrumentation. Because the sampler has much easier access to the previously executing method, using monitoredThread.getStackTrace()[1], we discover that it is method50ms which invokes Thread.sleep(). But the execution path misses a few short invocations: the invocations of 100ms, 1ms and 100ms are seen as one roughly 200ms long invocation of method100ms. The sampler kind of automatically filters out the performance-wise irrelevant 1ms execution, so this chain is presented as a 200ms execution of method100ms. This is mainly due to the fact that the agent does not see code which returns faster than the sampling interval. When doing sampling, there are further aspects to consider with respect to the sampling interval. A good paper on this topic is "Evaluating the Accuracy of Java Profilers".
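The getParentMethod() helper used in the sampling code above is not shown in the post; based on the getStackTrace()[1] trick just described, a plausible sketch (my reconstruction) is:

private String getParentMethod() {
    final StackTraceElement[] stack = monitoredThread.getStackTrace();
    // index 0 is the currently executing method, index 1 is its caller
    return stack.length > 1 ? formatStackElement(stack[1]) : "";
}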

As we can see, sampling gives the expected results for the second demo code, without any problems with the execution path length.

Overhead again

In the first example, the overhead is quite similar to instrumentation. In the second example, however, the internal overhead is drastically lower: we only miss 223ms (2959ms - 2736ms), and this does not seem to be caused by our internal measures. Also, this agent runs in a different thread, so some of its overhead does not result in longer execution time of the real code. And it easily utilizes multiple cores.

Conclusion

Both instrumentation and sampling have different characteristics with their own advantages and disadvantages.
They are caused by the fundamental difference between the approaches, and while they can be mitigated to some extent by clever construction of the agents, they can never be removed completely.

Instrumentation
    • Has access to invocation counts, precise actual/max/min/avg execution times, and the precise invocation order.
    • Needs configuration of which methods to instrument. Instrumentation needs to be balanced to exclude mass invocations or invocations where the measuring code outweighs the measured code.
    • Generally has much more data to process.
Sampling
    • Stable overhead, mainly determined by the sampling interval, not by the measured code.
    • Execution hot spots are shown instead of a fine-granular execution path and times.
    • Can discover unknown code.
    • Runs easily on a separate core.
