More accurate measurement of program execution time

Last Update:2018-12-07 Source: Internet

Author: User


GetTickCount() and GetCurrentTime() are both accurate only to 55 ms (one tick is 55 ms). For millisecond accuracy, use the timeGetTime function or the QueryPerformanceCounter function. For specific examples, see qa001022 "Using a high-precision timer in VC++", qa001813 "How to implement accurate timing in Windows", and qa004842 "timeGetTime function delay inaccuracy".

GetTickCount is not accurate enough to implement true millisecond-level timing.

Although GetTickCount returns a value in units of 1 ms, its precision is only about 10 ms. (Searching online for GetTickCount() turns up two figures for its precision: some sources say 55 ms and others say about 10 ms. Which one is right is unclear; the granularity follows the system timer tick, which would explain why the figure differs between systems.)

To improve accuracy, you can use QueryPerformanceCounter and QueryPerformanceFrequency. Not every system supports these two functions; on systems that do, the resolution is finer than 1 ms. Windows has a very high-precision timer with microsecond-level resolution, but its frequency differs between systems, depending on the hardware and the operating system. Call QueryPerformanceFrequency to get the timer frequency and QueryPerformanceCounter to get the timer's current value. From the required delay and the timer frequency, you can calculate how many timer counts the delay corresponds to. Then call QueryPerformanceCounter in a loop, reading the timer continuously until that number of counts has elapsed. This achieves a high-precision delay.
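The delay loop described above can be sketched roughly as follows. This is only an illustration: the helper names counter_now, counter_frequency and busy_wait_us are made up for this sketch, and the non-Windows branch is an assumed stand-in (a monotonic clock counted in nanoseconds) so that the same code also builds where QueryPerformanceCounter is unavailable.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
// Windows: read the performance counter and its frequency (counts per second).
static std::int64_t counter_now() {
    LARGE_INTEGER value;
    QueryPerformanceCounter(&value);
    return value.QuadPart;
}
static std::int64_t counter_frequency() {
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    return freq.QuadPart;
}
#else
#include <chrono>
// Assumed fallback for non-Windows builds: a monotonic clock counted in nanoseconds.
static std::int64_t counter_now() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}
static std::int64_t counter_frequency() {
    return 1000000000;  // counts per second
}
#endif

// Spin until at least `microseconds` have passed; returns the counts actually waited.
std::int64_t busy_wait_us(std::int64_t microseconds) {
    // Number of counter increments that correspond to the requested delay.
    const std::int64_t target = microseconds * counter_frequency() / 1000000;
    const std::int64_t start = counter_now();
    std::int64_t elapsed = 0;
    while ((elapsed = counter_now() - start) < target) {
        // keep polling the counter
    }
    return elapsed;
}
```

A call such as busy_wait_us(500) would spin for about half a millisecond. The loop keeps one CPU core fully busy while it waits, so it only suits short delays.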

The function I wrote is as follows:

// Returns the seconds elapsed since the first call (the first call returns 0).
float time()
{
    static __int64 start = 0;
    static __int64 frequency = 0;

    if (start == 0)
    {
        QueryPerformanceCounter((LARGE_INTEGER*)&start);
        QueryPerformanceFrequency((LARGE_INTEGER*)&frequency);
        return 0.0f;
    }

    __int64 counter = 0;
    QueryPerformanceCounter((LARGE_INTEGER*)&counter);
    return (float)((counter - start) / double(frequency));
}

Hope this helps,
Kenshin

The post below was found on a forum; its author's name is unknown.

For developers who care about program performance, a good timing component is both a good teacher and a helpful friend. A timer can serve as a program component that helps the programmer control program flow precisely, and it is also a powerful debugging weapon: with it, an experienced programmer can locate performance bottlenecks quickly or make a convincing performance comparison between different algorithms.
On Windows, there are two commonly used timers. One is the timeGetTime multimedia timer, which provides millisecond-level timing; for many applications, however, that is still too coarse. The other is the QueryPerformanceCounter counter, which provides counts at up to microsecond level, depending on the system. For programmers who do real-time graphics processing, process multimedia data streams, or build real-time systems, using QueryPerformanceCounter/QueryPerformanceFrequency is a basic skill.
This article introduces another high-precision timing method that uses the Pentium CPU's internal timestamp directly. The discussion below draws mainly on the book Windows Graphics Programming, page 1 to page 17; interested readers can consult the book directly. For more information about the RDTSC instruction, see the Intel product manuals. This article is offered only as a starting point for discussion.
Intel CPUs from the Pentium onward contain a component called the time stamp counter, a 64-bit unsigned integer that records the number of clock cycles elapsed since the CPU was powered on. Because today's CPU clock frequencies are very high, this component can reach nanosecond-level timing precision, which neither of the two methods above can match.
CPUs from the Pentium onward provide a machine instruction, RDTSC (Read Time Stamp Counter), that reads this timestamp and stores it in the EDX:EAX register pair. Since EDX:EAX is exactly the register pair that holds a 64-bit function return value in C++ on the Win32 platform, we can treat this instruction as an ordinary function call, like this:
inline unsigned __int64 GetCycleCount()
{
    __asm RDTSC
}
Except that this does not work: the C++ inline assembler does not support RDTSC directly, so we need the _emit pseudo-instruction to embed the instruction's machine code, 0x0F 0x31, directly, as shown below:
inline unsigned __int64 GetCycleCount()
{
    __asm _emit 0x0F
    __asm _emit 0x31
}
From then on, whenever a counter is needed, call GetCycleCount twice, just like an ordinary Win32 API, and take the difference between the two return values, as in the following code:
unsigned long t;
t = (unsigned long)GetCycleCount();
// Do something time-intensive...
t = (unsigned long)GetCycleCount() - t;  // end reading minus start reading
Page 15 of Windows Graphics Programming gives a class that encapsulates this counter; interested readers can refer to the class code in the book. For more precise timing, the book's author makes a small improvement: calling GetCycleCount twice in a row measures the time taken by the RDTSC instruction itself, and that value is saved so that it can be compensated for. Personally I think this improvement is of little significance. In tests on my machine, the instruction took from a few dozen to about 100 cycles, which is only about a tenth of a microsecond on my Celeron machine. For most applications that is completely negligible, and for applications that really do need nanosecond-level accuracy, this compensation is too crude.
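The back-to-back measurement described above might look like the sketch below. It is not the class from the book. It assumes a newer compiler than the one used in this article, one that offers the __rdtsc intrinsic (declared in <intrin.h> for Microsoft compilers and <x86intrin.h> for GCC and Clang). The names read_cycles and min_read_overhead are invented for the sketch, and the non-x86 branch is an assumed fallback that counts nanoseconds instead of cycles.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
static inline std::uint64_t read_cycles() { return __rdtsc(); }
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
static inline std::uint64_t read_cycles() { return __rdtsc(); }
#else
#include <chrono>
// Assumed fallback for non-x86 targets: nanoseconds from a monotonic clock.
static inline std::uint64_t read_cycles() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
#endif

// Smallest difference between two back-to-back reads over `samples` tries.
// The minimum is used because interrupts and task switches can only inflate a sample.
std::uint64_t min_read_overhead(int samples) {
    std::uint64_t best = UINT64_MAX;
    for (int i = 0; i < samples; ++i) {
        const std::uint64_t a = read_cycles();
        const std::uint64_t b = read_cycles();
        if (b - a < best) {
            best = b - a;
        }
    }
    return best;
}
```

The value returned by min_read_overhead(1000) is the kind of figure that could be subtracted from later measurements, subject to the reservations given above.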
The advantages of this method are:
1. High precision. Timing precision reaches the nanosecond level directly (on a 1 GHz CPU each clock cycle is one nanosecond), which is hard for other timing methods to achieve.
2. Low cost. The timeGetTime function requires linking the multimedia library winmm.lib, and according to MSDN the QueryPerformance* functions require hardware support (although I have never seen a machine that lacks it) as well as the kernel library, so both can only be used on the Windows platform (for precise timing on DOS, see the Graphic Program Developer's Guide, which describes in detail how to control the 8253 timer). The RDTSC instruction, by contrast, is a CPU instruction supported by any i386-platform machine from the Pentium onward, with no operating-system restriction at all (I believe the method also works on i386 UNIX and Linux, but I have had no opportunity to test it), and its function call overhead is the smallest.
3. A direct proportional relationship with the CPU clock frequency. One count equals 1/(CPU clock frequency in Hz) seconds, so as long as you know the CPU clock frequency you can calculate the time directly. This differs from QueryPerformanceCounter, which needs QueryPerformanceFrequency to obtain the counter's counts per second before a count can be converted to time.
The disadvantages of this method are:
1. Most existing C/C++ compilers do not support the RDTSC instruction directly, so it has to be embedded as machine code, which is more troublesome.
2. Severe jitter in the data. In fact, for any measurement method, precision and stability are always in conflict. With the low-precision timeGetTime, the result is basically the same on every run; with the RDTSC instruction, the result differs on every run, often by hundreds or even thousands of cycles. This is a contradiction inherent in the method's high precision.
The maximum duration this method can time can be calculated with the following formula:
Seconds since CPU power-on = number of cycles read by RDTSC / CPU clock frequency (Hz)
The largest number a 64-bit unsigned integer can represent is about 1.8 × 10^19. On my Celeron 800 that is enough to time a very long span (the book quotes a figure for a Pentium; I do not know how it was obtained, but it differs from my calculation). In any case, we need not worry about overflow.
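As a rough check of the formula, the small sketch below works out how long a 64-bit cycle counter runs before wrapping at a given clock frequency. The function names are invented for this illustration, and a year is assumed to be 365.25 days.

```cpp
#include <cmath>

// Seconds until a 64-bit cycle counter wraps at the given clock frequency.
double seconds_until_wrap(double clock_hz) {
    return std::ldexp(1.0, 64) / clock_hz;  // 2^64 cycles / (cycles per second)
}

// The same span in years, assuming 365.25-day years.
double years_until_wrap(double clock_hz) {
    return seconds_until_wrap(clock_hz) / (365.25 * 24.0 * 3600.0);
}
```

For an 800 MHz clock, years_until_wrap(800e6) comes to roughly 730 years, which is consistent with the conclusion that overflow is not a practical concern.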
Below are a few small examples that briefly compare the usage and precision of the three timing methods.
// timer1.cpp: timer class that uses the RDTSC instruction
// The KTimer class definition can be found in Windows Graphics Programming, p. 15
// Compile with: cl timer1.cpp /link user32.lib
#include <windows.h>   // for Sleep
#include <stdio.h>
#include "ktimer.h"
int main()
{
    unsigned t;
    KTimer timer;
    timer.Start();
    Sleep(1000);
    t = timer.Stop();
    printf("Lasting time: %u\n", t);
    return 0;
}
// timer2.cpp: uses the timeGetTime function
// Strictly, <mmsystem.h> is the header that is needed, but the Windows header files are intricate,
// so simply including <windows.h> is the lazy way out :)
// Compile with: cl timer2.cpp /link winmm.lib
#include <windows.h>
#include <stdio.h>
int main()
{
    DWORD t1, t2;
    t1 = timeGetTime();
    Sleep(1000);
    t2 = timeGetTime();
    printf("Begin time: %u\n", t1);
    printf("End time: %u\n", t2);
    printf("Lasting time: %u\n", (t2 - t1));
    return 0;
}
// timer3.cpp: uses the QueryPerformanceCounter function
// Compile with: cl timer3.cpp /link kernel32.lib
#include <windows.h>
#include <stdio.h>
int main()
{
    LARGE_INTEGER t1, t2, tc;
    QueryPerformanceFrequency(&tc);
    // QuadPart is a 64-bit value, so print it with the Microsoft-specific %I64d
    printf("Frequency: %I64d\n", tc.QuadPart);
    QueryPerformanceCounter(&t1);
    Sleep(1000);
    QueryPerformanceCounter(&t2);
    printf("Begin time: %I64d\n", t1.QuadPart);
    printf("End time: %I64d\n", t2.QuadPart);
    printf("Lasting time: %I64d\n", (t2.QuadPart - t1.QuadPart));

    // The time in seconds has to be calculated from the counts and the frequency.
    double dTotalTime = (double)(t2.QuadPart - t1.QuadPart) / (double)tc.QuadPart;  // seconds

    printf("Time consumed: %f\n", dTotalTime);
    return 0;
}
////////////////////////////////////////////////
// The three examples above all measure the time taken by a 1-second Sleep.
// Test environment: Celeron 800 MHz / 256 MB SDRAM
// Windows 2000 Professional SP2
// Microsoft Visual C++ 6.0 SP5
////////////////////////////////////////////////
The following is the output of timer1, which uses the high-precision RDTSC instruction:
Lasting time: 804586872
The following is the output of timer2, which uses the coarse timeGetTime API:
Begin time: 20254254
End time: 20255255
Lasting time: 1001
The following is the output of timer3, which uses the QueryPerformanceCounter API:
Frequency: 3579545
Begin time: 3804729124
End time: 3808298836
Lasting time: 3569712
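To make the raw counts above easier to compare, the following sketch converts them into seconds and into an implied clock frequency. The function names are invented for this illustration; the numbers fed to them are the ones printed above.

```cpp
// Convert a tick difference to seconds, given the number of ticks per second.
double ticks_to_seconds(unsigned long long ticks, double ticks_per_second) {
    return static_cast<double>(ticks) / ticks_per_second;
}

// Estimate a clock frequency in Hz from a cycle count taken over a known interval.
double estimate_hz(unsigned long long cycles, double seconds) {
    return static_cast<double>(cycles) / seconds;
}
```

With the timer3 figures, ticks_to_seconds(3569712, 3579545.0) gives about 0.9973 seconds. With the timer1 figure, ticks_to_seconds(804586872, 800e6) gives about 1.0057 seconds if the CPU runs at exactly 800 MHz. Going the other way, estimate_hz(804586872, 1.001) suggests a clock of roughly 804 MHz; this is purely illustrative, since the cycle count and the 1001 ms reading come from separate runs.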
Some people online say that the calculation

double dTotalTime = (double)(t2.QuadPart - t1.QuadPart) / (double)tc.QuadPart;

may be unreliable: many motherboards now adjust the CPU frequency automatically, mainly to save energy and especially in notebooks, so accuracy cannot be guaranteed. I am not sure whether this claim is correct; readers may wish to investigate it themselves.

The above is taken mainly from "High-precision timing using CPU timestamps". Besides the three methods described there, there is one more common method, though it is less accurate than those above: the GetTickCount function, which returns the time in milliseconds. It is used as follows:

DWORD startTime = GetTickCount();
// Do something
DWORD totalTime = GetTickCount() - startTime;
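One detail worth knowing about this pattern: GetTickCount returns a 32-bit value that wraps back to zero after about 49.7 days of uptime. Because the subtraction is done in unsigned arithmetic, the difference is still correct across a single wrap. The sketch below shows this with plain 32-bit integers standing in for DWORD; elapsed_ms is a name made up for the illustration.

```cpp
#include <cstdint>

// Elapsed milliseconds between two 32-bit tick readings.
// Unsigned subtraction is modulo 2^32, so one wraparound between the readings is harmless.
std::uint32_t elapsed_ms(std::uint32_t start, std::uint32_t now) {
    return now - start;
}
```

For example, elapsed_ms(0xFFFFFF00u, 0x00000100u) yields 512 even though the second reading is numerically smaller than the first.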

The Sleep() function is used differently depending on the compiler. I searched the Internet and found the following explanation:

Function name: sleep

Function: suspends execution for a period of time.

Usage: unsigned sleep(unsigned seconds);

Header file to use in VC:

#include <windows.h>

With the GCC compiler, the header file used varies with the GCC version:

#include <unistd.h>

Note:

In VC, the first letter of Sleep is an uppercase "S".

Outside VC, the function is sleep, all lowercase (it comes from POSIX rather than standard C). The description below uses the uppercase form. In short, use Sleep with VC and sleep everywhere else.

The general form of the Sleep function:

Sleep(unsigned long);

The unit of Sleep() is milliseconds, so to pause for 1 second, write Sleep(1000);

On Linux, the "s" in sleep is not capitalized.

There, sleep() is measured in seconds rather than milliseconds. Example (for VC, using Sleep):

#include <windows.h>

int main()
{
    int a;
    a = 1000;
    Sleep(a);   // pause for 1000 ms
    return 0;
}
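Since the unit differs between the two platforms, a small wrapper that always takes milliseconds can avoid mistakes. The sketch below is one possible way to write it, not something taken from the references. On Windows it calls Sleep; elsewhere it uses the POSIX nanosleep function from <time.h> in place of sleep(), because sleep() only accepts whole seconds. The names sleep_ms and timed_sleep_ms are invented here.

```cpp
#include <chrono>

#ifdef _WIN32
#include <windows.h>
// Windows: Sleep() takes milliseconds.
void sleep_ms(unsigned long ms) {
    Sleep(ms);
}
#else
#include <time.h>
// POSIX: nanosleep() takes whole seconds plus nanoseconds.
void sleep_ms(unsigned long ms) {
    timespec ts;
    ts.tv_sec = static_cast<time_t>(ms / 1000);
    ts.tv_nsec = static_cast<long>(ms % 1000) * 1000000L;
    nanosleep(&ts, nullptr);
}
#endif

// Sleep for `ms` milliseconds and return how long the call really took, in milliseconds.
long long timed_sleep_ms(unsigned long ms) {
    using namespace std::chrono;
    const steady_clock::time_point begin = steady_clock::now();
    sleep_ms(ms);
    return duration_cast<milliseconds>(steady_clock::now() - begin).count();
}
```

Calling timed_sleep_ms(1000) reproduces the 1-second test used in the timer examples, with the measured duration usually slightly above the requested one.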

References:

High-precision timing using CPU timestamps, by zhangyan_qd

Windows Graphics Programming, by Feng Yuan

Time in milliseconds in VC, http://www.cppblog.com/humanchao/archive/2008/04/22/43322.html

From: http://www.dakaren.com/index.php/archives/768.htm

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
