In the first part of this article, we compared the performance of the synchronized StringBuffer and the unsynchronized StringBuilder using a single-threaded benchmark. The initial results suggested that biased locking gives the best performance, more so than the other optimizations, and that acquiring a lock is an expensive operation. Before drawing a final conclusion, I decided to verify the results: I asked my colleagues to run the test on their machines. Although most of their results confirmed mine, some were quite different. In this second part of the article, we look more closely at the techniques used to validate the results. Finally, we answer the real question: why does the lock overhead differ so much between processors?
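To make the setup concrete, here is a minimal sketch of the kind of single-threaded microbenchmark described above; the class name, iteration count, and workload are illustrative assumptions of mine, not the original harness:

public class ConcatBenchmark {

    private static final int ITERATIONS = 10_000_000;

    // Time appends to the synchronized StringBuffer.
    static long timeStringBuffer() {
        StringBuffer sb = new StringBuffer();
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sb.append('a');
        }
        return System.nanoTime() - start;
    }

    // Time appends to the unsynchronized StringBuilder.
    static long timeStringBuilder() {
        StringBuilder sb = new StringBuilder();
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sb.append('a');
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.printf("StringBuffer : %d ms%n", timeStringBuffer() / 1_000_000);
        System.out.printf("StringBuilder: %d ms%n", timeStringBuilder() / 1_000_000);
    }
}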
Traps in benchmarking
It is very difficult to answer this question with a benchmark, especially a microbenchmark. In most cases a benchmark ends up measuring something quite different from what you intended to measure. And even when you do measure the factor in question, the results can still be influenced by other factors. One thing was clear from the start of this exercise: the benchmark needed to be reviewed by others, so that I could avoid the trap of reporting invalid benchmark data. In addition to peer review, I used a number of tools and techniques to validate the results, which I discuss in the following sections.
Statistical processing of the results
Most operations a computer performs complete within a fairly fixed amount of time. In my experience, even nondeterministic operations complete within a fixed amount of time under most conditions. It is this property of computation that gives us a tool for recognizing when things are starting to go wrong. That tool is statistics, applied to the variation in the measurements. In other words, I do not need to do much interpretation when I see some reported values above normal levels: if I give the CPU a fixed number of instructions and it does not finish them in a relatively fixed amount of time, my measurement is being affected by some external factor. If the results contain large outliers, it means I have to find that external influence and deal with it.
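As an illustration, here is a small sketch of how per-run timings might be checked statistically; the three-standard-deviation cut-off for flagging outliers is my own assumption, not a rule from the original test:

import java.util.List;

public class TimingStats {

    // Prints the mean and standard deviation of the run times (in nanoseconds)
    // and flags any run that lies far outside the mean.
    public static void report(List<Long> timesNanos) {
        double mean = timesNanos.stream().mapToLong(Long::longValue).average().orElse(0);
        double variance = timesNanos.stream()
                .mapToDouble(t -> (t - mean) * (t - mean))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance);

        System.out.printf("mean=%.0f ns, stddev=%.0f ns%n", mean, stdDev);
        for (long t : timesNanos) {
            if (Math.abs(t - mean) > 3 * stdDev) {
                // A large outlier suggests external interference with that run.
                System.out.printf("outlier: %d ns%n", t);
            }
        }
    }
}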
Although these anomalies are magnified in a microbenchmark, they tend not to affect a large-scale benchmark. In a large-scale benchmark, the various parts of the application under test can interfere with one another, and that interference itself produces some anomalies. Even so, anomalies still provide useful information that helps us judge the level of interference. Under a steady load I would not be surprised by the occasional anomaly, but there should not be many of them. Results that are noticeably larger or smaller than usual are ones I investigate, and I treat them as a signal that my benchmark has not been properly isolated or set up. Treating the same measurements differently in this way illustrates the difference between a full-scale benchmark and a microbenchmark.
Last but not least, make sure you are measuring what you intend to measure. Even then, the most that can be said is that, for the question you ultimately care about, the test is probably correct.
Warming up the method cache
The JIT compiling your code is another of the many behaviors that can affect a benchmark. HotSpot continually profiles your program looking for opportunities to apply certain optimizations. When it finds one, it asks the JIT compiler to recompile the piece of code in question. It then uses a technique called on-stack replacement (OSR) to switch execution over to the new code. Performing OSR can have all sorts of knock-on effects on the test, including pausing the execution of the thread. All of this activity interferes with our benchmark and causes the measurements to skew. We have two tools at hand to help us identify when the code is being affected by the JIT. The first, of course, is variation in the measurements; the second is the -XX:+PrintCompilation flag. Fortunately, most if not all of the code is JIT-compiled early in the run, so we can treat this like any other start-up anomaly. All we need to do is run the benchmark until all the code has been JIT-compiled before starting the measurement. This warm-up phase is often referred to as "warming up the method cache."
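A minimal sketch of such a warm-up phase might look like the following; the workload and iteration counts are placeholders of my own, and running with -XX:+PrintCompilation will show when the method actually gets compiled:

public class WarmedUpBenchmark {

    // A stand-in workload; in the real benchmark this would be the
    // StringBuffer/StringBuilder append loop under test.
    static void workload() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000; i++) {
            sb.append('a');
        }
    }

    public static void main(String[] args) {
        // Warm-up: run enough iterations to push workload() past the compile
        // threshold so that JIT compilation (and any OSR) happens here, not
        // during the measured run. These timings are thrown away.
        for (int i = 0; i < 20_000; i++) {
            workload();
        }

        // Measured run, after the code has (presumably) been JIT-compiled.
        long start = System.nanoTime();
        for (int i = 0; i < 20_000; i++) {
            workload();
        }
        System.out.printf("measured: %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}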
Most JVMs run in interpreted and native mode at the same time. This is called mixed-mode execution. Over time, HotSpot and the JIT translate interpreted code into native code based on the information they gather. To determine which optimizations should be applied, HotSpot samples calls and branches. Once a method reaches a specific threshold, HotSpot notifies the JIT to generate native code for it. This threshold can be set with the -XX:CompileThreshold flag. For example, if you set -XX:CompileThreshold=10000, HotSpot compiles the code to native code after it has been executed 10,000 times.
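One way to check that compilation activity has actually died down before measuring is to watch the JVM's own compilation counters. This sketch uses the standard CompilationMXBean; treating a stable total compilation time as "warmed up" is my own heuristic, not something prescribed here:

import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitSettled {

    public static void main(String[] args) throws InterruptedException {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return; // cannot observe JIT activity on this JVM
        }
        long previous = -1;
        while (true) {
            // ... run a batch of warm-up iterations of the benchmark here ...
            long total = jit.getTotalCompilationTime(); // ms spent compiling so far
            if (total == previous) {
                break; // no new compilation since the last batch
            }
            previous = total;
            Thread.sleep(100);
        }
        System.out.println("JIT activity has settled; start measuring.");
    }
}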
Heap Management
The next area to consider is garbage collection, or as it is more widely known, heap management. A variety of memory-management activities occur regularly during the execution of any application. They include resizing heap spaces, reclaiming memory that is no longer in use, moving data from one place to another, and so on. All of these actions cause the JVM to interfere with your application. The question we face is: should memory-maintenance or garbage-collection time be included in the benchmark? The answer depends on the kind of problem you are trying to solve. In this case, I am only interested in the cost of acquiring the lock, which means I must make sure the measurement does not include garbage-collection time. Once again, anomalies can point to the factors affecting the test, and when they appear, garbage collection is a prime suspect. The best way to pin down the problem is to turn on GC logging with the -verbose:gc flag.
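To complement the -verbose:gc log, one can also check programmatically that no collection ran during the measured interval. This is a sketch of that idea; the workload and the choice to simply discard polluted runs are my own assumptions:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcFreeMeasurement {

    // Sums the collection counts of all garbage collectors in this JVM.
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += gc.getCollectionCount();
        }
        return count;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcCount();
        long start = System.nanoTime();

        // Placeholder workload; the real benchmark's append loop goes here.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('a');
        }

        long elapsed = System.nanoTime() - start;
        if (totalGcCount() > gcBefore) {
            System.out.println("GC ran during the measurement; discard this result.");
        } else {
            System.out.printf("clean run: %d ms%n", elapsed / 1_000_000);
        }
    }
}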