Performance Monitor Unit (PMU) of 64-bit ARMv8-A in Linux


The Performance Monitor is an optional feature of the ARMv8-A architecture. It includes a 64-bit cycle counter, a number of 32-bit event counters, and control components.

From a programmer's perspective, it is a handy tool for performance monitoring and tuning. From these PMU event counters we can get processor statistics such as cycles, instructions executed, branches taken, cache misses/hits, and memory reads/writes.

Support for performance counters has been in the Linux kernel since 3.6. The kernel provides a utility named perf to view CPU PMU event statistics. perf accepts either raw event IDs or named events. Because CPU architectures differ, only a few events are defined generically in the kernel; all other events specific to a CPU architecture can be accessed by raw event ID. For detailed usage of perf, refer to the perf wiki tutorial page.

perf works well for measuring a whole program. But if only a piece of code is of interest, how can we monitor the CPU performance event counters for just that piece? Some articles describe this for ARMv7, but few mention ARMv8. This article tries to cover the ARMv8 PMU.

There are two ways I know of so far.

Access PMU Registers by Assembly Code Directly

The basic way is to write assembly code that accesses the PMU registers directly. Note that the ARMv8-A architecture allows the PMU counters to be accessed from EL0 (that is, from Linux user space), provided that access has been enabled from a higher exception level. This article won't cover every register in detail; please refer to the ARMv8-A Architecture Reference Manual.

So the first step is to create a kernel module that enables user-mode access to the PMU counters. The code below sets the PMU register PMUSERENR_EL0 to enable user-mode access.

/* Enable user-mode access to counters. */
asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)(ARMV8_PMUSERENR_EN_EL0 | ARMV8_PMUSERENR_ER | ARMV8_PMUSERENR_CR)));

/* Performance Monitors Count Enable Set register: bit 31 enables the cycle
 * counter, bits 30:0 enable event counters. Other event counters can also be
 * enabled here. */
asm volatile("msr pmcntenset_el0, %0" : : "r"((u64)ARMV8_PMCNTENSET_EL0_ENABLE));

/* Enable counters */
u64 val = 0;
asm volatile("mrs %0, pmcr_el0" : "=r"(val));
asm volatile("msr pmcr_el0, %0" : : "r"(val | ARMV8_PMCR_E));

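After this kernel module is loaded, user-space applications can access the PMU event counters. Because the PMU registers are per-CPU, the module has to apply the settings above on every core (for example with on_each_cpu()), otherwise a thread migrated to another core will fault on its next PMU access.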

/* Access cycle counter */
uint64_t r = 0;
asm volatile("mrs %0, pmccntr_el0" : "=r"(r));

/* Set up PMU counter 0 to record a specific event; evtCount is the event ID */
evtCount &= ARMV8_PMEVTYPER_EVTCOUNT_MASK;
asm volatile("isb");
/* Just use counter 0 */
asm volatile("msr pmevtyper0_el0, %0" : : "r"(evtCount));
/* Performance Monitors Count Enable Set register: bit 31 is the cycle
 * counter, bits 30:0 are event counters. Set bit 0 to enable counter 0. */
asm volatile("mrs %0, pmcntenset_el0" : "=r"(r));
asm volatile("msr pmcntenset_el0, %0" : : "r"(r | 1));

/* Read counter 0 */
asm volatile("mrs %0, pmevcntr0_el0" : "=r"(r));

/* Disable PMU counter 0. Writing 0 to a PMCNTENSET_EL0 bit has no effect,
 * so write 1 to bit 0 of the Count Enable Clear register instead. */
asm volatile("msr pmcntenclr_el0, %0" : : "r"((uint64_t)1));

This is a simple way to access the PMU, but it has a limitation: it could conflict with other performance tools running in the background (like perf).

Using the perf_event_open System Call

Another way is to use the Linux perf infrastructure. Software can use the perf_event_open system call to get PMU event counts from the kernel, so the somewhat ugly kernel module above is not needed. PAPI is a tool for accessing hardware performance counters, but unfortunately it did not support ARMv8-A at the time of writing. Austin Seipp suggests using GNU C's __attribute__((constructor)) and __attribute__((destructor)) routines. The constructor invokes the system call, which returns a file descriptor; we can later read from that file descriptor to get the cycle count from the processor.

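#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

static int fddev = -1;

__attribute__((constructor)) static void
init(void)
{
        static struct perf_event_attr attr;   /* static, so zero-initialized */
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        /* pid 0, cpu -1: count this process on any CPU */
        fddev = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

__attribute__((destructor)) static void
fini(void)
{
        if (fddev >= 0)
                close(fddev);
}

static inline long long
cpucycles(void)
{
        long long result = 0;
        /* read() returns -1 on error, so compare against the exact size */
        if (read(fddev, &result, sizeof(result)) != (ssize_t)sizeof(result))
                return 0;
        return result;
}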

In the sample above, attr.type can be any of the types below. Since this article is about the processor's PMU, the relevant hardware types are PERF_TYPE_HARDWARE, PERF_TYPE_HW_CACHE, and PERF_TYPE_RAW.

/*
 * attr.type
 */
enum perf_type_id {
        PERF_TYPE_HARDWARE    = 0,
        PERF_TYPE_SOFTWARE    = 1,
        PERF_TYPE_TRACEPOINT  = 2,
        PERF_TYPE_HW_CACHE    = 3,
        PERF_TYPE_RAW         = 4,
        PERF_TYPE_BREAKPOINT  = 5,

        PERF_TYPE_MAX,        /* non-ABI */
};

attr.config can be a value from enum perf_hw_id, a combination of enum perf_hw_cache_id, enum perf_hw_cache_op_id and enum perf_hw_cache_op_result_id, or a raw hardware PMU event ID such as 0x11. See include/uapi/linux/perf_event.h in the kernel source for details.

However, the system call adds latency compared with accessing the PMU registers directly, because it has to switch from user context to kernel context and back, and perf's infrastructure is complicated.

There are other ways to get PMU events as well. For example, JTAG tools such as ARM DS-5 with DSTREAM can use the PMU hardware to record cycles per instruction. OProfile provides the ocount tool for collecting raw event counts on a per-application, per-process, per-CPU, or system-wide basis.

Based on Austin Seipp's work, I added ARMv8 PMU support. My sample code is hosted on GitHub in the dev branch.

References

http://neocontra.blogspot.sg/2013/05/user-mode-performance-counters-for.html
http://stackoverflow.com/questions/30709432/how-to-get-cpu-performance-counter-for-a-piece-of-code
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/299228.html
https://community.arm.com/groups/embedded/blog/2015/03/08/using-the-arm-performance-monitor-unit-pmu-linux-driver
https://zhiyisun.github.io/2016/03/02/how-to-use-performance-monitor-unit-(PMU)-of-64-bit-armv8-a-in-linux.html
