High-precision timing using CPU timestamps
Http://www.3800hk.com/Article/cxsj/vc/zhlbcvc/2005-08-06/Article_32674.html
For performance-focused program developers, a good timing component is both a mentor and a mentor. Timers can be used as program components to help programmers precisely control program processes, and are also a powerful debugging weapon. experienced programmers can determine program performance bottlenecks as soon as possible, or make a convincing performance comparison for different algorithms.
On Windows, there are two commonly used Timers: timegettime multimedia timer, which provides millisecond-level timer. However, this accuracy is still too rough for many applications. The other is the queryperformancecount counter, which provides a microsecond-level count as the system differs. For real-time graphics processing, multimedia data stream processing, or real-time system construction programmers, using queryperformancecount/queryperformancefrequency is a basic skill.
This article introduces another high-precision timing method that uses the internal timestamp of the Pentium CPU directly. The following discussion mainly benefited from the book Windows graphic programming, page 1-page 17. Interested readers can directly refer to the book. For more information about the rdtsc commands, see the Intel product manual. This article is only used for throwing bricks.
Among Intel Pentium-level CPUs, there is a part called "Time Stamp", which is in the format of a 64-bit unsigned integer, records the number of clock cycles that have elapsed since CPU power-on. Because the current CPU clock speed is very high, this component can achieve the time precision of the nanosecond level. This accuracy is incomparable to the above two methods.
In a CPU above Pentium, a machine command rdtsc (read time stamp counter) is provided to read the timestamp number and save it in The edX: eax register pair. Since the edX: eax register is the register that stores the function return value in the C ++ language on the Win32 platform, we can regard this instruction as a common function call. Like this:
Inline unsigned _ int64 getcyclecount ()
{
_ ASM rdtsc
}
But no, because rdtsc is not directly supported by the C ++ Embedded Assembler, we need to use the _ emit pseudo command to directly embed the machine code form 0x0f, 0x31 of the command, as shown below:
Inline unsigned _ int64 getcyclecount ()
{
_ ASM _ emit 0x0f
_ ASM _ emit 0x31
}
In the future, when a counter is required, you can call the getcyclecount function twice like using a common Win32 API to compare the difference between the two return values, as shown in the following code:
Unsigned long T;
T = (unsigned long) getcyclecount ();
// Do Something time-intensive
T-= (unsigned long) getcyclecount ();
On page 15th of Windows graphic programming, a class is written to encapsulate this counter. Interested readers can refer to the code of that class. For more precise timing, the author makes a small improvement by calculating and saving the time for executing the rdtsc command by calling the getcyclecount function twice in a row, to get more accurate timing numbers. But I personally think this improvement is of little significance. According to the test on my machine, this command took about dozens to 100 cycles. It was only a tenth of microsecond in the time on the celon MHZ machine. For most applications, this time is completely negligible, and for those applications that are indeed accurate to the order of nanoseconds, this compensation is too rough.
The advantages of this method are:
1. High precision. The timing accuracy can be achieved directly in nanoseconds (each clock cycle on a 1 GHz CPU is One nanosecond), which is hard to achieve by other timing methods.
2. low cost. The timegettime function needs to be linked to the multi-media library winmm. the Lib and queryperformance * functions are supported by hardware (although I have not seen any machines that are not supported) and the kernel library according to msdn instructions, therefore, both of them can only be used on the Windows platform (for precise timing on the DOS platform, refer to the graphic program developer Guide, which provides detailed instructions on the control timer 8253 ). However, the rdtsc command is a CPU command, which is supported by any machine above the Pentium on the i386 platform, or even without platform restrictions (I believe that the i386 UNIX and Linux methods are also applicable, but there is no conditional test), and the function call overhead is the smallest.
3. There is a direct rate relationship with the CPU clock speed. One count is equivalent to 1/second (CPU clock speed Hz), so that as long as you know the CPU clock speed, you can directly calculate the time. This is different from queryperformancecount. The latter must use queryperformancefrequency to obtain the number of times the current counter is counted per second before it can be converted to time.
The disadvantage of this method is:
1. Most of the existing C/C ++ compilers do not directly support the use of rdtsc commands. You need to program the code by embedding the machine code directly, which is troublesome.
2. High Data jitter. In fact, accuracy and stability are always a conflict for any measurement method. If low-precision timegettime is used for timing, the results are basically the same each time. The rdtsc command has different results each time, with hundreds or even thousands of gaps. This is a contradiction inherent in this method of high precision.
We can use the following formula to calculate the maximum length of timing in this method:
Number of seconds since CPU power-on = number of cycles read by rdtsc/CPU clock speed (HZ)
The maximum number that a 64-bit unsigned integer can express is 1.8 × 10 ^ 19. On my celon 800, it can be timed around (the book says it can be timed on a MHz Pentium in, I don't know how this number is obtained, but it is different from my calculations ). In any case, we don't have to worry about overflow.
The following is a few small examples, which briefly compares the usage and accuracy of the three timing methods.
// Timer1.cpp Timer class that uses the rdtsc command // ktimer class definition can be found in Windows graphic programming p15
// Compilation line: CL timer1.cpp/link user32.lib
# Include <stdio. h>
# Include "ktimer. H"
Main ()
{
Unsigned T;
Ktimer timer;
Timer. Start ();
Sleep (1000 );
T = timer. Stop ();
Printf ("lasting time: % d \ n", t );
}
// Timer2.cpp uses the timegettime Function
// <Mmsys. h> must be included, but the Windows header file is complex.
// Simple inclusion <windows. h> is relatively lazy :)
// Compilation line: CL timer2.cpp/link winmm. Lib
# Include <windows. h>
# Include <stdio. h>
Main ()
{
DWORD T1, T2;
T1 = timegettime ();
Sleep (1000 );
T2 = timegettime ();
Printf ("begin time: % u \ n", T1 );
Printf ("End Time: % u \ n", T2 );
Printf ("lasting time: % u \ n", (t2-t1 ));
}
// Timer3.cpp uses the queryperformancecounter Function
// Compilation line: CL timer3.cpp/link kernel32.lib
# Include <windows. h>
# Include <stdio. h>
Main ()
{
Large_integer T1, T2, TC;
Queryperformancefrequency (& TC );
Printf ("frequency: % u \ n", TC. quadpart );
Queryperformancecounter (& T1 );
Sleep (1000 );
Queryperformancecounter (& T2 );
Printf ("begin time: % u \ n", t1.quadpart );
Printf ("End Time: % u \ n", t2.quadpart );
Printf ("lasting time: % u \ n", (t2.quadpart-t1.quadpart ));
}
//////////////////////////////////////// ////////
// The above three examples are the time required to test the sleep for 1 second.
File: // test environment: celeon 800 MHz/256 M SDRAM
// Windows 2000 Professional SP2
// Microsoft Visual C ++ 6.0 SP5
//////////////////////////////////////// ////////
The following are the running results of timer1, using the high-precision rdtsc command.
Lasting Time: 804586872
The following is the running result of timer2, using the rough timegettime API
Begin time: 20254254
Endtime: 20255255
Lasting Time: 1001
The following is the running result of timer3, using the queryperformancecount API
Frequency: 3579545
Begin time: 3804729124
Endtime: 3808298836
Lasting Time: 3569712
The ancients said that the class was accessible. I am very happy to get such a useful real-time processing knowledge from this introduction to Graphic programming. I hope everyone will like this light and effective timer like me.
References:
[Yuan 2002] Feng Yuan, Ying Yu studio, Windows graphics programming, Mechanical Industry Press, 2002.4., P15-17