I wrote this article a long time ago and am now migrating it over here. Although it is not comprehensive, it covers the essentials.
Measuring program running time
Some time ago I worked on a program performance optimization project. To evaluate how much was gained, I learned some techniques for measuring program running time. There is surprisingly little reference material on this topic, so the following are my notes, recorded here for future reference. My main sources were:
1. Computer Systems: A Programmer's Perspective
2. Software Optimization for High Performance Computing: Creating Faster Applications
3. IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide

As we all know, it is impossible to measure the exact running time of a program; any measured runtime is only an approximation. All of the methods summarized here target the IA-32 architecture on Win32 and Unix/Linux platforms.
Currently there are two main ways to measure running time: one based on timers and the other based on counters.
1. Timer-based measurement method.
Disadvantage: the accuracy is limited; it cannot be used to measure programs whose running time is shorter than a millisecond.
Advantage: accuracy does not depend heavily on system load, and the error relative to the true value is very low when the execution time exceeds 1 s.
Method: read the timer at the beginning of the program and read it again before the program ends; the difference is the elapsed time. The main interface functions are:
Unix/Linux:
clock_t times(struct tms *buf);
// Return value: the number of clock ticks that have elapsed since the system started.
// The constant CLK_TCK (obtainable as sysconf(_SC_CLK_TCK)) gives the number of clock ticks per second.
// Parameter: a pointer to a tms structure that receives the process's CPU times.
// This function requires the header file <sys/times.h>.

Win32:
DWORD GetTickCount(void);
// Return value: the number of milliseconds that have elapsed since the system started.
// Include <windows.h> when using it, and link against kernel32.lib.

To write code that is portable across platforms, you can use the following function:
clock_t clock(void);
// Dividing the returned value by the constant CLOCKS_PER_SEC converts it to seconds of CPU time.
// This function requires the header file <time.h>.

2. Counter-based measurement method.

Disadvantage: the counter can only be read with assembly code, so portability cannot be guaranteed. When the system load is very high, accuracy suffers greatly.
Advantage: high accuracy; and because the result is the number of clock cycles elapsed during execution, the program's execution time on different hardware platforms can be roughly estimated from it.
Method: on the IA-32 architecture, the CPU has a 64-bit unsigned counter called the time-stamp counter, which holds the number of clock cycles that have elapsed since the CPU was powered on.
i. On Win32, the QueryPerformanceCounter function reads a 64-bit counter.
ii. Older compilers do not support the rdtsc instruction. With such a compiler, you can use the _emit pseudo-instruction to emit the opcode bytes directly, adding the following to the file header:
#define cpuid __asm _emit 0fh __asm _emit 0a2h
#define rdtsc __asm _emit 0fh __asm _emit 031h
Microsoft's C/C++ compiler supports the cpuid and rdtsc instructions starting from version 6.0, so the assembly code can be embedded directly in the program. The following is a simple example:

#include <stdio.h>

int main()
{
    unsigned int cycle, i;
    __asm
    {
        cpuid            ; serialize: keep instructions from drifting across the measurement
        rdtsc            ; read the time-stamp counter (low 32 bits in eax)
        mov cycle, eax   ; save the starting count
    }
    for (i = 0; i < 10000; i++)
        ;
    __asm
    {
        cpuid
        rdtsc
        sub eax, cycle   ; current count minus starting count
        mov cycle, eax
    }
    printf("the program duration: cycle = %u\n", cycle);
    return 0;
}
The counter-based measurement method is affected by many factors, chiefly context switches and the instruction cache, so both effects must be eliminated for high-precision timing. Context switches are handled by measuring many times on a lightly loaded machine and averaging; the instruction cache is handled by loading the code under test in advance (a warm-up run) and only then performing the measurement. For more information, see:
http://www.cs.usfca.edu/~cruse/cs210/rdtscpm1-1.pdf
Computer Systems: A Programmer's Perspective (Chapter 7)