Introduction: PAPI is a set of native interfaces for accessing processor hardware performance counters. Using these interfaces to monitor the hardware execution characteristics of a Java program helps to find the root cause of its performance problems at the hardware layer of the computer system. This paper introduces the key concepts of PAPI and its common interfaces, analyzes the main points and difficulties of applying it to the performance evaluation of Java programs, and proposes a JVMTI-based method that uses the PAPI interface to monitor the runtime hardware execution characteristics of Java programs.
PAPI Interface Overview
Evaluating and analyzing the performance of a Java program at the hardware layer of the computer system helps to locate the source of its performance problems. Most current mainstream processors define a class of events that record details of the processor's behavior while a program runs, and provide dedicated hardware performance counters to count those events. The event counts collected by hardware performance counters directly reflect the hardware execution characteristics of the program, such as the number of instructions executed by the processor or the number of L1 cache misses, and are a reliable basis for evaluating and analyzing program performance.
PAPI (Performance Application Programming Interface) is a set of standard interfaces for accessing hardware performance counters on multiple processor platforms. Its goal is to make it easy for users to monitor and collect the processor event information recorded by hardware performance counters while a program is running.
Different processors define different sets of processor events according to their architectural features; PAPI refers to these as native events (Native event). Processors also differ in the number of hardware performance counters they provide, and at any given time a counter can monitor only one specified native event. Because the needs of event monitoring and performance analysis are similar across platforms, the native event sets of different processors tend to overlap functionally (for example, events related to memory hierarchy access, cache coherence protocols, cycle and instruction counts, functional units, and pipeline status), although the corresponding native event names may differ. To make events easier to select, PAPI abstracts the native events that share a common function across processors into uniformly named PAPI preset events (Preset event). A preset event is not always a simple mapping of a single native event: depending on how native events are defined on a given architecture, it may be composed of several native events. For example, the preset event that records the number of L1 cache misses must, on some processors, be derived from two native events, the number of L1 D-cache misses and the number of L1 I-cache misses. Preset events give the PAPI interface a degree of portability, but they may not fully cover the set of native events defined on a particular processor.
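The relationship between preset events and native events can be illustrated with a small, self-contained Python model. This is only a sketch of the idea, not PAPI's API or its real mapping tables: the processor names `cpu_a` and `cpu_b` and all native event names are hypothetical, and the preset is assumed to be the plain sum of its native events.

```python
# Toy model only: the processor names and native event names below are
# hypothetical, and this is not PAPI's real mapping table or API.
PRESET_TABLE = {
    # On this processor one native event already counts all L1 misses.
    "cpu_a": {"PAPI_L1_TCM": ["L1_TOTAL_MISS"]},
    # On this processor the preset is derived from two native events.
    "cpu_b": {"PAPI_L1_TCM": ["L1D_MISS", "L1I_MISS"]},
}


def read_preset(cpu, preset, native_counts):
    """Return the preset's value as the sum of its native events' counts."""
    try:
        natives = PRESET_TABLE[cpu][preset]
    except KeyError:
        # The preset is not mapped on this processor.
        raise LookupError(f"{preset} is not available on {cpu}")
    return sum(native_counts[name] for name in natives)
```

With this model, the same preset name yields a total L1 miss count on both hypothetical processors, even though one of them needs two counters to produce it.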
PAPI provides two types of interface for accessing hardware performance counters: one is a simpler high-level interface for basic counting, and the other is a programmable low-level interface that meets more complex monitoring needs.
The PAPI high-level interface provides the basic functionality needed to access hardware performance counters, such as configuring counters, starting and stopping counting, and reading counter values. It can only use PAPI preset events and cannot be configured to monitor native events that the preset events do not cover. However, the high-level interface can directly return some of the performance metrics most commonly used in program evaluation, such as the number of instructions executed per cycle, the number of floating-point instructions or floating-point operations executed per second, and the running time of the program. It can also obtain some system information, for example, the number of hardware performance counters that the processor supports.
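The metrics mentioned above are simple ratios of raw counts. The following Python sketch shows the arithmetic only; the function names are hypothetical and not part of PAPI, and the inputs are assumed to be counts already read from counters (for instance total instructions, total cycles, and floating-point operations) together with a measured elapsed time.

```python
def instructions_per_cycle(total_instructions, total_cycles):
    """Instructions executed per cycle (IPC) from two raw counts."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return total_instructions / total_cycles


def mflops(fp_operations, elapsed_seconds):
    """Millions of floating-point operations executed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return fp_operations / elapsed_seconds / 1e6
```

For example, 3,000,000 instructions over 2,000,000 cycles gives an IPC of 1.5, and 500,000,000 floating-point operations in 2 seconds gives 250 MFLOPS.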
Unlike the high-level interface, which can only use PAPI preset events, the PAPI low-level interface can use native events directly to monitor the processor hardware behavior of a program at run time. A user can combine one or more native events into an event set, configure the hardware performance counters so that all native events in the event set are monitored simultaneously, and then analyze the program's performance problems from the monitoring results. For example, capturing the number of floating-point instructions executed per second together with the number of L1 cache misses helps to determine whether a low L1 cache hit rate is causing a drop in the program's floating-point performance. Note that the number of native events in an event set cannot exceed the number of hardware performance counters that the processor supports.
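The constraint on event set size can be sketched with a toy Python class. This is an illustrative model under stated assumptions, not the PAPI low-level API: the class `EventSet`, its methods, and the native event names used with it are all hypothetical.

```python
class EventSet:
    """Toy event set that holds at most `num_counters` native events."""

    def __init__(self, num_counters):
        self.num_counters = num_counters  # counters the processor offers
        self.events = []                  # native events, in insertion order

    def add(self, native_event):
        """Add a native event, failing when no counter is left for it."""
        if native_event in self.events:
            raise ValueError(native_event + " is already in the event set")
        if len(self.events) >= self.num_counters:
            raise RuntimeError("no free hardware counter for " + native_event)
        self.events.append(native_event)

    def read(self, raw_counts):
        """Return the counts of all events in the set, in insertion order."""
        return [raw_counts[name] for name in self.events]
```

On a hypothetical processor with two counters, floating-point instructions and L1 D-cache misses can be monitored together, but adding a third event to the same set is rejected.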
The PAPI low-level interface is more flexible than the high-level interface and allows more comprehensive monitoring of processor events. The rest of this paper therefore focuses on how to use the PAPI low-level interface to monitor and capture the hardware performance characteristics of Java programs.