In the multi-core era, the x86 rdtsc instruction should no longer be used to measure instruction cycles or time


Chen Shuo
Blog.csdn.net/solstice

 

Ever since the Intel Pentium introduced the rdtsc instruction, it has been a powerful tool for micro-benchmarking: it reads the high-precision time stamp counter (TSC) at very low cost. Many articles [1] and books on optimization use it to compare the speed of two pieces of code. Some code even uses rdtsc for timing, in place of system calls such as gettimeofday. In the multi-core era, however, rdtsc has become much less accurate, for three reasons:

 

  1. The TSCs of the cores on the same motherboard are not guaranteed to be synchronized;
  2. The CPU clock frequency may change, for example under the power-saving features of a laptop;
  3. Out-of-order execution makes the cycle counts measured by rdtsc inaccurate. This problem has existed since the Pentium Pro.

 

These problems affect both major uses of rdtsc: micro-benchmarking and timing.

 

The usual pattern is to execute rdtsc twice and record the two 64-bit integers as start and end; end - start is then the number of CPU clock cycles that elapsed in between.
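
The start/end pattern might look roughly like the sketch below. It assumes GCC or Clang on x86, where the __rdtsc() intrinsic from <x86intrin.h> is available; sum_to and tsc_ticks_for_sum are hypothetical names made up for this illustration, not something from the references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang targeting x86 */

/* Hypothetical workload, used only to have something to measure. */
static uint64_t sum_to(uint64_t n) {
    volatile uint64_t acc = 0;  /* volatile keeps the loop from being optimized away */
    for (uint64_t i = 1; i <= n; ++i) {
        acc += i;
    }
    return acc;
}

/* Reads the TSC before and after one call of sum_to(n) and returns
 * end - start. The workload's result is stored through `result`. */
static uint64_t tsc_ticks_for_sum(uint64_t n, uint64_t *result) {
    assert(result != NULL);
    uint64_t start = __rdtsc();
    *result = sum_to(n);
    uint64_t end = __rdtsc();
    return end - start;  /* only meaningful if both reads hit the same core's TSC */
}
```

Calling tsc_ticks_for_sum(1000, &r) leaves 500500 in r and returns a tick count. That count is exactly the number the rest of this article argues should not be trusted.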

 

In a multi-core environment, the two reads may execute on two different CPUs, and the two CPUs' counters do not necessarily start from the same value, because the exact moment of power-on reset differs between them (see [3] for details). The micro-benchmark result therefore contains this offset as an error, which can be positive or negative depending on whether the counter of the CPU that takes the second reading is ahead of or behind the other.
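
The sign of that error can be seen in a toy model. The sketch below is plain arithmetic, not a measurement: it assumes each core's counter equals the true cycle count plus a fixed per-core offset, and measured_delta is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a core's TSC reading is (true cycle count + that core's offset).
 * If start is read on one core and end on another, then
 *   end - start = true_cycles + (end_core_offset - start_core_offset). */
static int64_t measured_delta(int64_t true_cycles,
                              int64_t start_core_offset,
                              int64_t end_core_offset) {
    assert(true_cycles >= 0);
    return true_cycles + (end_core_offset - start_core_offset);
}
```

With 1000 true cycles and one core's counter 300 ticks ahead of the other, measured_delta(1000, 0, 300) gives 1300 and measured_delta(1000, 300, 0) gives 700, so the same code appears slower or faster depending only on where the two reads happened to run.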

 

For timing, time = number of cycles / frequency, but the frequency may change. For example, the CPU in my laptop usually runs at half speed, 800 MHz, and switches to its full speed of 1.6 GHz when the system is busy, so the measured time is inaccurate. On some newer CPUs the TSC counts at a constant rate; that makes it accurate as a clock, but it makes micro-benchmark results inaccurate, because the count no longer corresponds to the cycles the core actually executed (see [2]). Another possibility is that the TSC is reset when the machine resumes from a power-down state such as sleep. In short, timing with rdtsc is unreliable.
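
To put numbers on the frequency problem, here is a small worked example using the 800 MHz and 1.6 GHz figures from the laptop above. cycles_to_seconds is a hypothetical helper that simply applies time = cycles / frequency.

```c
#include <assert.h>
#include <stdint.h>

/* time = number of cycles / frequency (frequency in Hz). */
static double cycles_to_seconds(uint64_t cycles, double hz) {
    assert(hz > 0.0);
    return (double)cycles / hz;
}
```

A count of 1,600,000,000 cycles converts to 1.0 second if the code assumes 1.6 GHz, but if the CPU was actually running at 800 MHz the real elapsed time was 2.0 seconds. The conversion is off by a factor of two, and code that only sees the cycle count has no way to tell which frequency applied.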

 

The out-of-order execution problem is relatively simple to deal with [1], but its implications are far-reaching: on the complex architecture of a modern CPU, measuring the time taken by a handful or a few dozen instructions is meaningless, because the act of observation itself disturbs the CPU's execution (cache, pipeline, superscalar issue, out-of-order execution, and speculation). It sounds a bit like a quantum-mechanical system. Either use more macroscopic metrics to describe performance, replacing "XXX clock cycles" with "YYY messages processed per second" or "a message processing latency of ZZZ milliseconds", or use a dedicated profiler to reduce the effect of observation on the result, whether that is a virtual-CPU tool such as callgrind or a sampling profiler such as oprofile.

 

Although rdtsc is no longer usable, there are still ways to get high-precision timing for performance testing [2]. On Windows, use QueryPerformanceCounter and QueryPerformanceFrequency. On Linux, use the POSIX function clock_gettime with the CLOCK_MONOTONIC parameter. Alternatively, synchronize the TSCs using the method described in [3] before using them. (I do not know whether the latest official Linux kernel includes this synchronization algorithm, nor whether the "clocks" of two CPUs can drift apart again after calibration.)
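
On Linux, the clock_gettime route combined with a macroscopic "messages per second" metric might look like the sketch below. clock_gettime and CLOCK_MONOTONIC are standard POSIX; monotonic_ns, process_message and messages_per_second are hypothetical names, and the "message" here is a trivial stand-in for real work.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c99/c11 */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading, in nanoseconds. */
static int64_t monotonic_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is defined */
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-in for handling one message. */
static void process_message(volatile uint64_t *sink, uint64_t i) {
    *sink += i;
}

/* Processes n messages and returns the throughput in messages per second.
 * The accumulated checksum is stored through `checksum`. */
static double messages_per_second(uint64_t n, uint64_t *checksum) {
    volatile uint64_t sink = 0;
    int64_t start = monotonic_ns();
    for (uint64_t i = 0; i < n; ++i) {
        process_message(&sink, i);
    }
    int64_t elapsed = monotonic_ns() - start;
    *checksum = sink;
    if (elapsed <= 0) {
        elapsed = 1;  /* guard against a coarse clock reporting 0 ns */
    }
    return (double)n * 1e9 / (double)elapsed;
}
```

In practice n should be large enough that the run lasts far longer than the clock's resolution; the result is then a rate for the workload as a whole rather than a cycle count for a few instructions.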

 

[1] http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
[2] http://en.wikipedia.org/wiki/Time_Stamp_Counter
[3] x86: unify/rewrite SMP TSC sync code, http://lwn.net/Articles/211051/
