To analyze CPU usage across different software, or across different versions of the same software, designers often need to profile function call stacks. Sampling the program counter, function addresses, and full stack traces on a timer interrupt is an efficient way to collect this data. Tools such as OProfile, gprof, and SystemTap use this method to produce detailed CPU usage reports. With complex profiles, however, their output is often too cumbersome and not intuitive enough to show analysts directly what they need. To address this, Brendan Gregg developed a tool that turns sampled stack traces into an intuitive picture: the flame graph. Until recently, however, limitations in the profilers and in the JDK meant that mixed-mode flame graphs could not be generated for Java programs. Brendan Gregg and Martin Spier have now found a way to solve the problem, put it into practice at Netflix, and published a very detailed write-up, which makes performance analysis of Java programs much more convenient. Starting from the cause of the problem, this article briefly introduces their reasoning and their solution.
First, a brief introduction to the flame graph. "Flame graph" is the name of both an open source tool and the type of image it produces. In this two-dimensional image, the x-axis spans the whole set of samples, while the y-axis represents stack depth. Each box represents a function in a stack, and its width shows how many samples that function appeared in, that is, how much CPU time it consumed. A wider box therefore means that the function runs more slowly or is called more often, and so consumes more CPU time. With a flame graph, designers and analysts can easily see where each application spends its CPU time.
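To make the meaning of box width concrete, here is a minimal Python sketch. It assumes input in the "folded" form produced by the FlameGraph stack-collapse scripts (frames joined by semicolons, followed by a sample count). The sample data and the name `frame_widths` are invented for illustration; this is not how the FlameGraph tool itself is implemented.

```python
from collections import Counter


def frame_widths(folded_lines):
    """Map each call path (tuple of frames, root first) to its sample count.

    A box in a flame graph corresponds to one call path; its width is
    proportional to the number of samples in which that path appeared.
    """
    widths = Counter()
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        # Every prefix of the stack is a box: the root at depth 1,
        # its callee at depth 2, and so on.
        for depth in range(1, len(frames) + 1):
            widths[tuple(frames[:depth])] += int(count)
    return widths


# Hypothetical folded samples: "root;child;leaf count"
SAMPLES = [
    "main;parse;read 3",
    "main;parse 1",
    "main;render 6",
]

widths = frame_widths(SAMPLES)
total = widths[("main",)]
for path, count in sorted(widths.items()):
    indent = "  " * (len(path) - 1)
    print(f"{indent}{path[-1]}: {count}/{total} samples")
```

In this made-up profile, `render` would be drawn wider than `parse` because it appears in more samples, which is exactly the visual cue described above.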
The flame graph tool itself, however, cannot measure performance; it relies on other profiling tools for its input. In the Java environment there are two types of stack-sampling profilers: system profilers and JVM profilers. The former, such as Linux perf_events, can show system code paths, including libjvm internals, GC, and the kernel, but not Java methods. The latter, such as hprof, Lightweight Java Profiler, and commercial profilers, can show Java methods but not system code paths. Neither type captures both system code paths and Java methods in a single stack trace, so the flame graphs produced from either one alone do not meet the need. This is the problem Brendan and others set out to solve.
In an earlier discussion, Brendan analyzed why system profilers could not show Java methods. There are two reasons. First, the JVM compiles methods on the fly and does not expose a symbol table to system profilers. Second, on x86 the JVM uses the frame pointer register as a general-purpose register, which breaks traditional stack walking. Solving the problem therefore requires addressing each of these separately. For the first, work was done on both the Java side and the Linux side. On the Java side, the open source JVMTI agent perf-map-agent can create a /tmp/perf-PID.map text file, which lists the hexadecimal address, size, and name of each symbol. On the Linux side, perf_events has supported JIT symbols since 2009: it checks for the /tmp/perf-PID.map file to resolve symbols from language virtual machines. For the second, the JVM gained a new option, -XX:+PreserveFramePointer. Thanks to the efforts of Zoltán, Oracle, and other engineers, recent builds of JDK 9 and JDK 8 include this option, which restores stack walking.
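The symbol lookup that a profiler performs against such a map file can be sketched as follows. This is a minimal Python illustration, assuming each line holds a hexadecimal start address, a hexadecimal size, and a symbol name separated by whitespace, as described above. The addresses, the class names, and the helper names `parse_perf_map` and `resolve` are hypothetical; perf itself is written in C and does not work this way internally.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, size, name)."""
    entries = []
    for line in text.splitlines():
        # The symbol name may contain spaces, so split at most twice.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start = int(parts[0], 16)
        size = int(parts[1], 16)
        entries.append((start, size, parts[2].strip()))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [entry[0] for entry in entries]
    index = bisect.bisect_right(starts, addr) - 1
    if index >= 0:
        start, size, name = entries[index]
        if addr < start + size:
            return name
    return None  # address is not inside any JIT-compiled symbol


# Invented example content for a /tmp/perf-PID.map file.
EXAMPLE_MAP = """\
7f3c1a000000 40 Lcom/example/Foo;::bar
7f3c1a000040 1a0 Lcom/example/Foo;::baz
"""

entries = parse_perf_map(EXAMPLE_MAP)
print(resolve(entries, 0x7F3C1A000050))
```

Without the map file, a sampled address inside JIT-compiled code has no name to attach to, which is why Java frames were missing from system profiles.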
With both issues resolved, users only need to install perf_events, a recent JDK, perf-map-agent, and the FlameGraph software, and configure Java (in particular, enable the -XX:+PreserveFramePointer option) to generate a system-wide, mixed-mode flame graph. To automate the generation of flame graphs, Brendan has begun prototyping the process on top of Vector, Netflix's open source instance analysis tool.
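The steps above can be summarized as a short sequence of commands. The Python sketch below only assembles and prints them; it does not run anything. The sampling rate, duration, jar name, and output file are illustrative assumptions, the script names `stackcollapse-perf.pl` and `flamegraph.pl` come from the FlameGraph repository, and the step that runs perf-map-agent to write the map file is left as a comment because its invocation depends on how the agent was installed.

```python
import shlex


def java_command(jar):
    """Command line that starts the JVM with frame pointers preserved."""
    return ["java", "-XX:+PreserveFramePointer", "-jar", jar]


def profiling_steps(seconds=30, hertz=99, output="flame.svg"):
    """Return the shell steps for a system-wide, mixed-mode profile."""
    record = ["perf", "record", "-F", str(hertz), "-a", "-g",
              "--", "sleep", str(seconds)]
    # Before running 'perf script', perf-map-agent must have written
    # /tmp/perf-PID.map for the target JVM (invocation not shown here).
    render = ("perf script | stackcollapse-perf.pl | "
              f"flamegraph.pl --color=java --hash > {shlex.quote(output)}")
    return [shlex.join(record), render]


# "app.jar" is a placeholder application name.
print(shlex.join(java_command("app.jar")))
for step in profiling_steps():
    print(step)
```

Keeping the steps as data rather than executing them makes the sketch safe to run anywhere; in practice the commands need root privileges and a Linux host with perf installed.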
Brendan and others have a good deal of further work planned. One item is regression analysis: automatically collecting flame graphs on different days and generating differential flame graphs from them, which helps to quickly understand how software changes affect CPU usage. They are also experimenting with using perf_events to record and analyze other user-level and kernel-level events, such as disk I/O, networking, scheduling, and memory allocation. Finally, they are considering improvements to Vector, including flame graphs that update in real time.
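The idea behind a differential flame graph can be illustrated with a small Python sketch that compares two profiles in folded form and reports how each stack's share of samples changed. The data and the function names `load_folded` and `diff_profiles` are invented for illustration, and normalizing by total samples is an assumption made here so that profiles of different lengths stay comparable; the actual tooling may compute the difference differently.

```python
from collections import Counter


def load_folded(lines):
    """Sum sample counts per stack from folded lines ("a;b;c count")."""
    counts = Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def diff_profiles(before, after):
    """Return {stack: change in that stack's share of all samples}."""
    before_total = sum(before.values()) or 1
    after_total = sum(after.values()) or 1
    return {
        stack: after[stack] / after_total - before[stack] / before_total
        for stack in set(before) | set(after)
    }


# Hypothetical profiles taken before and after a software change.
BEFORE = load_folded(["main;parse 50", "main;render 50"])
AFTER = load_folded(["main;parse 80", "main;render 15", "main;log 5"])

deltas = diff_profiles(BEFORE, AFTER)
for stack, delta in sorted(deltas.items(), key=lambda item: -item[1]):
    print(f"{delta:+.0%}  {stack}")
```

Stacks with a positive change grew after the software change and are the first place to look for a regression.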
Repost: Java flame graphs in practice at Netflix