Using the CPU timestamp for high-precision timing on Windows

Source: Internet
Author: User
Tags: intel pentium

For developers who care about performance, a good timing component is both a good teacher and a helpful friend. A timer can serve as a program component that helps the programmer control the flow of a program precisely, and it is also a powerful debugging weapon: with it, an experienced programmer can locate performance bottlenecks quickly or make a convincing performance comparison between algorithms.

On Windows, two timers are commonly used. One is the timeGetTime multimedia timer, which provides millisecond-level timing; for many applications this is still too coarse. The other is the QueryPerformanceCounter counter, which provides microsecond-level counting, with the exact resolution depending on the system. For programmers working on real-time graphics, multimedia data streams, or real-time systems, using QueryPerformanceCounter/QueryPerformanceFrequency well is a basic skill.

This article introduces another high-precision timing method that reads the internal timestamp of the Pentium CPU directly. The discussion draws mainly on the book Windows Graphics Programming, pages 1-17, which interested readers can consult directly. For details on the RDTSC instruction, see the Intel product manuals. This article is offered only as a starting point, in the hope of drawing out better ideas.

Intel CPUs of the Pentium class and later contain a component called the Time Stamp, a 64-bit unsigned integer that records the number of clock cycles elapsed since the CPU was powered on. Because current CPU clock speeds are very high, this component can reach nanosecond-level timing resolution, which neither of the two methods above can match.

CPUs from the Pentium onward provide a machine instruction, RDTSC (Read Time Stamp Counter), that reads this timestamp and stores it in the EDX:EAX register pair. Because EDX:EAX is also the register pair in which C++ on the Win32 platform returns 64-bit function values, we can treat this instruction as an ordinary function call.
Written as a function, it would look like this:

    inline unsigned __int64 GetCycleCount()
    {
        __asm RDTSC
    }

However, this does not work, because RDTSC is not directly supported by the C++ inline assembler. We therefore need the _emit pseudo-instruction to embed the machine code of the instruction, 0x0F 0x31, directly:

    inline unsigned __int64 GetCycleCount()
    {
        __asm _emit 0x0F
        __asm _emit 0x31
    }

After that, whenever a timer is needed, we can call GetCycleCount twice, just like an ordinary Win32 API, and take the difference between the two return values:

    unsigned long t;
    t = (unsigned long)GetCycleCount();
    // Do something time-intensive ...
    t = (unsigned long)GetCycleCount() - t;   // elapsed cycles: end minus start

On page 15 of Windows Graphics Programming, a class is written to encapsulate this counter; interested readers can refer to its code. For more precise timing, the book's author makes a small improvement: the time taken by the RDTSC instruction itself is measured by calling GetCycleCount twice in a row and saved, and that overhead is then subtracted from each measurement to obtain a more accurate figure. Personally, I think this improvement is of little significance. On my machine the instruction took from a few dozen to about 100 cycles, which on an 800 MHz Celeron is only about a tenth of a microsecond. For most applications this is completely negligible, and for applications that really need nanosecond accuracy the compensation is too crude.

The advantages of this method are:

1. High precision. Timing resolution reaches the nanosecond level directly (on a 1 GHz CPU each clock cycle is one nanosecond), which is hard to achieve with other timing methods.

2. Low cost. The timeGetTime function requires linking the multimedia library winmm.lib, and according to MSDN the QueryPerformance* functions require hardware support (although I have never seen a machine without it) and the kernel library, so both can only be used on the Windows platform. (For precise timing on DOS, see the book Graphic Program Developer's Guide, which describes in detail how to control the 8253 timer.) RDTSC, by contrast, is a CPU instruction: it is supported by any machine on the i386 platform from the Pentium onward, regardless of operating system (I believe the method also applies to i386 UNIX and Linux, but I have not had the chance to test it), and its function call overhead is minimal.

3. A direct proportional relationship to the CPU clock speed. One count corresponds to 1/(CPU clock frequency in Hz) seconds, so once the CPU clock frequency is known, the time can be computed directly. This differs from QueryPerformanceCounter, which requires a call to QueryPerformanceFrequency to obtain the number of counts per second before the count can be converted to time.

The disadvantages of this method are:

1. Most existing C/C++ compilers do not directly support the RDTSC instruction, so it has to be programmed by embedding machine code directly, which is more troublesome.

2. High jitter in the data. For any measurement method, precision and stability are always in tension. With the low-precision timeGetTime, the results are basically the same every time, whereas the RDTSC results differ on every run, often by hundreds or even thousands of cycles. This is a contradiction inherent in the high precision of the method.

The maximum length of time the timer can cover follows from this formula:

    seconds since CPU power-on = cycles read by RDTSC / CPU clock frequency (Hz)

The largest number a 64-bit unsigned integer can represent is about 1.8 × 10^19.
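The conversion and the overflow formula above can be sketched in a few lines of C++. This is only an illustration: the helper names cycles_to_seconds, max_timer_seconds and seconds_to_years are made up here, and the sketch assumes that the counter ticks at a constant, known clock frequency.

```cpp
#include <cstdint>
#include <limits>

// Convert a cycle count read from the time stamp counter into seconds.
// Assumption: the counter advances at a constant rate of cpu_hz ticks per second.
inline double cycles_to_seconds(std::uint64_t cycles, double cpu_hz) {
    return static_cast<double>(cycles) / cpu_hz;
}

// Longest interval, in seconds, that a 64-bit cycle counter can cover
// before it wraps around at the given clock frequency.
inline double max_timer_seconds(double cpu_hz) {
    return cycles_to_seconds(std::numeric_limits<std::uint64_t>::max(), cpu_hz);
}

// Express a number of seconds in years of 365 days.
inline double seconds_to_years(double seconds) {
    return seconds / (365.0 * 24.0 * 3600.0);
}
```

For example, cycles_to_seconds(804586872, 800e6) converts a count of that size on an 800 MHz machine to about 1.006 seconds.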
On my Celeron 800, the counter can run for roughly 2.3 × 10^10 seconds, more than 700 years, before overflowing. (The book quotes a figure for a Pentium that differs from my calculation; I do not know how it was obtained.) In any case, we do not need to worry about overflow.

Below are a few small examples that briefly compare the usage and precision of the three timing methods.

    // Timer1.cpp: uses the timer class built on the RDTSC instruction
    // The KTimer class definition can be found in Windows Graphics Programming, p. 15
    // Compile line: cl Timer1.cpp /link user32.lib
    #include <windows.h>   // for Sleep
    #include <stdio.h>
    #include "KTimer.h"

    int main()
    {
        unsigned t;
        KTimer timer;

        timer.Start();
        Sleep(1000);
        t = (unsigned)timer.Stop();   // elapsed CPU cycles
        printf("Lasting Time: %u\n", t);
        return 0;
    }

    // Timer2.cpp: uses the timeGetTime function
    // Strictly speaking <mmsystem.h> must be included, but the Windows headers
    // are intricately interdependent, so simply including <windows.h> is the lazy way :)
    // Compile line: cl Timer2.cpp /link winmm.lib
    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        DWORD t1, t2;

        t1 = timeGetTime();
        Sleep(1000);
        t2 = timeGetTime();
        printf("Begin Time: %lu\n", t1);
        printf("End Time: %lu\n", t2);
        printf("Lasting Time: %lu\n", (t2 - t1));   // milliseconds
        return 0;
    }

    // Timer3.cpp: uses the QueryPerformanceCounter function
    // Compile line: cl Timer3.cpp /link kernel32.lib
    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        LARGE_INTEGER t1, t2, tc;

        QueryPerformanceFrequency(&tc);
        // QuadPart is a 64-bit integer, so print it with the MSVC-specific %I64d, not %u
        printf("Frequency: %I64d\n", tc.QuadPart);
        QueryPerformanceCounter(&t1);
        Sleep(1000);
        QueryPerformanceCounter(&t2);
        printf("Begin Time: %I64d\n", t1.QuadPart);
        printf("End Time: %I64d\n", t2.QuadPart);
        printf("Lasting Time: %I64d\n", (t2.QuadPart - t1.QuadPart));   // counter ticks
        return 0;
    }

The three examples above all measure the time taken by a one-second sleep.

Test environment: Celeron 800 MHz / 256 MB SDRAM; Windows 2000 Professional SP2; Microsoft Visual C++ 6.0 SP5

Result of Timer1, using the high-precision RDTSC instruction:

    Lasting Time: 804586872

Result of Timer2, using the coarse timeGetTime API:

    Begin Time: 20254254
    End Time: 20255255
    Lasting Time: 1001

Result of Timer3, using the QueryPerformanceCounter API:

    Frequency: 3579545
    Begin Time: 3804729124
    End Time: 3808298836
    Lasting Time: 3569712

As the ancients said, understanding one thing lets you grasp others by analogy. I was delighted to pick up such a practical piece of real-time processing knowledge from a book that introduces graphics programming, and I hope everyone will come to like this lightweight and effective timer as much as I do.

Original post: http://down.dns.sh.cn/article/236/417/2008/2008090187669.asp
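As a footnote to the two drawbacks listed earlier, here is a minimal sketch for newer compilers than the one tested in this article. It relies on the __rdtsc() intrinsic, which MSVC declares in <intrin.h> and GCC/Clang declare in <x86intrin.h>, so no machine code has to be embedded by hand. To damp the jitter, it times the same work several times and reports the median; that choice is a suggestion made here, not the method of the book. The names get_cycle_count and median_cycles are hypothetical, and on non-x86 targets the sketch falls back to std::chrono ticks purely so that it still compiles.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>
  #define SKETCH_HAVE_RDTSC 1
#else
  #define SKETCH_HAVE_RDTSC 0
#endif

// Read the time stamp counter through the compiler intrinsic.
// On non-x86 targets, return steady-clock ticks instead (not CPU cycles).
inline std::uint64_t get_cycle_count() {
#if SKETCH_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Run `work` several times and return the median elapsed count,
// which is less sensitive to occasional outliers than a single reading.
template <typename F>
std::uint64_t median_cycles(F work, int runs = 9) {
    if (runs < 1) runs = 1;   // always take at least one sample
    std::vector<std::uint64_t> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t start = get_cycle_count();
        work();
        samples.push_back(get_cycle_count() - start);
    }
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    return samples[mid];
}
```

A call such as median_cycles([] { Sleep(1000); }) would repeat the one-second sleep test nine times and return the middle reading.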
