Measure the running time

I wrote this article a long time ago and am now moving it here. It is not comprehensive, but it is enough for general use.

Some time ago I worked on a project to optimize program performance. To measure how much the optimizations helped, I picked up some techniques for measuring program running time, and found that there is very little reference material on the subject. The materials I used are recorded below for future reference:

1: Computer Systems: A Programmer's Perspective (www.amazon.co.uk/computer-systems-programmers-Randal-Bryant/dp/013034074x)
2: Software Optimization for High-Performance Computing: Creating Faster Applications
3: IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide

As we all know, it is impossible to measure the exact running time of a program; any runtime measurement is only an approximation. All of the methods summarized here apply to the IA-32 architecture on Win32 and Unix/Linux platforms.

There are currently two main ways to measure running time: one based on a timer and the other based on a counter.

1. Timer-based measurement method.

Disadvantage: The resolution is limited, so it cannot be used to measure programs that run for only a few milliseconds or less.
Advantage: The accuracy depends little on system load, and when the execution time exceeds 1 s, the error relative to the true value is very small.
Method: Read the timer when the program starts and read it again just before the program ends; the difference between the two readings is the running time. The main interface functions are:
Unix/Linux:
clock_t times(struct tms *buf);
// Return value: the elapsed real time, in clock ticks, since a fixed point in the past (typically system start-up). The constant CLK_TCK gives the number of ticks per second; on current systems, obtain it with sysconf(_SC_CLK_TCK).
// Parameter: a pointer to a struct tms, which receives the CPU times used by the process.
// Include the header file <sys/times.h> when using this function.
Win32:
DWORD GetTickCount(void);
// Return value: the number of milliseconds that have elapsed since the system was started.
// Include <windows.h> when using this function, and link with kernel32.lib.
To write code that is portable across platforms, you can use the following function:
clock_t clock(void);
// Return value: the C standard defines it as the processor time used by the program, in clock ticks; divide it by the constant CLOCKS_PER_SEC to convert it to seconds.
// Include the header file <time.h> when using this function.
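As a minimal sketch of the timer-based method, assuming a POSIX system (needed for times() and sysconf()), the following C code reads the timer before and after a workload and converts the difference to seconds. The busy_work function, its loop, and the names time_with_clock and time_with_times are hypothetical stand-ins for illustration. Note that clock() reports processor time used by the program, while times() as used here returns elapsed real time in ticks.

```c
#define _POSIX_C_SOURCE 200809L  /* expose times() and sysconf() under strict C */
#include <time.h>       /* clock, clock_t, CLOCKS_PER_SEC */
#include <sys/times.h>  /* times, struct tms (POSIX) */
#include <unistd.h>     /* sysconf, _SC_CLK_TCK (POSIX) */

/* Hypothetical workload standing in for the code being measured.
   Storing the result in a volatile keeps the optimizer from deleting the loop. */
static volatile unsigned long sink;

static void busy_work(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i ^ (acc >> 3);
    sink = acc;
}

/* Portable version: clock() reports processor time used by the program,
   converted to seconds by dividing by CLOCKS_PER_SEC. */
double time_with_clock(unsigned long n)
{
    clock_t start = clock();   /* read the timer at the start */
    busy_work(n);
    clock_t end = clock();     /* read it again at the end */
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* POSIX version: times() returns elapsed real time in clock ticks;
   sysconf(_SC_CLK_TCK) gives the number of ticks per second. */
double time_with_times(unsigned long n)
{
    struct tms buf;            /* receives the process CPU times (unused here) */
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    clock_t start = times(&buf);
    busy_work(n);
    clock_t end = times(&buf);
    return (double)(end - start) / (double)ticks_per_sec;
}
```

Calling time_with_clock(50000000UL) or time_with_times(50000000UL) returns the elapsed time in seconds. The tick rate reported by sysconf(_SC_CLK_TCK) is commonly 100 on Linux, so a workload shorter than about 10 ms will often measure as 0 with times(), which is exactly the resolution limit described above.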

2. Counter-based measurement method.

Disadvantage: The counter can only be read with assembly language, so the code is not portable, and when the system load is very high, the accuracy suffers greatly.
Advantage: High accuracy. And because it yields the number of clock cycles that elapsed during execution, it can be used to roughly estimate the program's running time on other hardware platforms.
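The rough estimate mentioned in this advantage is the cycle count divided by the clock frequency, t = cycles / f. A minimal C sketch, assuming the counter ticks at a known, constant frequency (the function name cycles_to_seconds is hypothetical; on many modern processors the time stamp counter runs at a fixed rate that can differ from the actual core clock, so the rate has to be looked up or measured):

```c
#include <stdint.h>

/* Convert a cycle count to seconds: t = cycles / hz.
   Assumes the counter ticks at a known, constant rate `hz` (cycles per
   second) on the machine in question. */
double cycles_to_seconds(uint64_t cycles, double hz)
{
    return (double)cycles / hz;
}
```

For instance, a routine measured at 2,400,000 cycles takes cycles_to_seconds(2400000, 2.4e9), about 1 ms, on a 2.4 GHz machine, and about 0.8 ms on a 3.0 GHz one. The estimate assumes the cycle count itself stays the same on the other machine, which holds only roughly, because memory latency does not scale with clock speed.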
Method: On IA-32, the CPU has a 64-bit unsigned counter called the time stamp counter, which holds the number of clock cycles that have elapsed since the CPU was powered on. It can be read in two ways:
i. On Win32, the QueryPerformanceCounter function reads a 64-bit counter.
ii. The rdtsc instruction reads the time stamp counter directly. Some compilers do not support the rdtsc instruction in inline assembly; with such a compiler, you can use the _emit directive to insert the instruction's opcode bytes yourself, bypassing the compiler, by adding the following to the top of the file:
#define cpuid __asm _emit 0fh __asm _emit 0a2h
#define rdtsc __asm _emit 0fh __asm _emit 31h
Microsoft's C/C++ compiler supports the cpuid and rdtsc instructions starting from version 6.0, so the assembly can be embedded directly in the program. Here is a simple example (32-bit x86 builds only):
#include <stdio.h>

int main()
{
    unsigned int cycle, i;

    // Serialize with cpuid, then read the low 32 bits of the time stamp counter
    __asm
    {
        xor eax, eax    ; cpuid leaf 0
        cpuid
        rdtsc           ; EDX:EAX = time stamp counter
        mov cycle, eax
    }

    // Code being measured: an empty loop
    for (i = 0; i < 10000; i++)
        ;

    // Read the counter again and subtract the starting value
    __asm
    {
        xor eax, eax
        cpuid
        rdtsc
        sub eax, cycle
        mov cycle, eax
    }

    printf("the program duration cycle = %u\n", cycle);
    return 0;
}
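The __asm block syntax in the example above is specific to Microsoft's compiler, and 64-bit MSVC does not accept inline assembly at all. As a sketch assuming GCC or Clang on x86-64 (this uses GNU extended inline assembly, not the article's own method), the same cpuid-then-rdtsc sequence can be written as follows. Unlike the MSVC example, it keeps the full 64-bit counter value, so long intervals do not wrap around. The names serialized_rdtsc and cycles_for_loop are hypothetical.

```c
#include <stdint.h>

/* Serialize with CPUID (leaf 0), then read the time stamp counter with RDTSC.
   GNU extended inline assembly for x86-64; CPUID also overwrites EBX and ECX,
   so they are declared as (ignored) outputs. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi, ebx, ecx;
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi), "=b"(ebx), "=c"(ecx)
                         : "0"(0u)
                         : "memory");
    (void)ebx;
    (void)ecx;
    return ((uint64_t)hi << 32) | lo;   /* EDX:EAX -> 64-bit value */
}

/* The measured code mirrors the empty loop in the MSVC example; the volatile
   counter keeps the optimizer from removing it. */
uint64_t cycles_for_loop(unsigned int iterations)
{
    volatile unsigned int i;
    uint64_t start = serialized_rdtsc();
    for (i = 0; i < iterations; i++)
        ;
    uint64_t end = serialized_rdtsc();
    return end - start;                 /* full 64-bit difference */
}
```

GCC and Clang also provide a __rdtsc() intrinsic in <x86intrin.h>, and Microsoft's compiler offers __rdtsc() in <intrin.h>, which also works in 64-bit builds; these read the counter without the CPUID serialization.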
Counter-based measurement is affected by many factors, chiefly context switches and the instruction cache, so high-precision timing must eliminate the effects of both. To reduce the effect of context switches, run the measurement several times on a lightly loaded machine and take the average. To reduce the effect of the instruction cache, execute the code under test once beforehand so that its instructions are already cached, and then take the measurement. For more information, see:
http://www.cs.usfca.edu/~cruse/cs210/rdtscpm1-1.pdf

Computer Systems: A Programmer's Perspective (chapter 7)
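The two countermeasures described above, warming up the instruction cache and averaging several runs, can be combined in a short C sketch. It assumes GCC or Clang on x86-64 and reads the counter with a CPUID + RDTSC inline-assembly sequence; average_cycles, code_under_test and the run count are hypothetical names and parameters for illustration.

```c
#include <stdint.h>

/* Serialized time stamp counter read: CPUID (leaf 0), then RDTSC.
   GNU extended inline assembly, x86-64 assumed. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo, hi, ebx, ecx;
    __asm__ __volatile__("cpuid\n\trdtsc"
                         : "=a"(lo), "=d"(hi), "=b"(ebx), "=c"(ecx)
                         : "0"(0u)
                         : "memory");
    (void)ebx;
    (void)ecx;
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical code under test. */
static volatile unsigned int sink;

static void code_under_test(void)
{
    for (unsigned int i = 0; i < 10000u; i++)
        sink = i;
}

/* Run fn once untimed so its instructions are already in the instruction
   cache, then time it `runs` times and return the average cycle count.
   Averaging over several runs on a lightly loaded machine dilutes the cost
   of an occasional context switch. Returns 0.0 if runs <= 0. */
double average_cycles(void (*fn)(void), int runs)
{
    uint64_t total = 0;

    if (runs <= 0)
        return 0.0;

    fn();                                   /* warm-up run, not measured */
    for (int r = 0; r < runs; r++) {
        uint64_t start = serialized_rdtsc();
        fn();
        total += serialized_rdtsc() - start;
    }
    return (double)total / runs;
}
```

A call such as average_cycles(code_under_test, 10) returns the mean cycle count of ten timed runs. Because interrupts and context switches can only add cycles, some people report the minimum of the runs instead of, or alongside, the average.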
