Does the rep;nop instruction execute multiple NOPs or a single NOP?
Normally, a rep prefix repeats the instruction that follows it, decrementing ECX each time, until ECX reaches 0. Kernel code, for example the spin_lock implementation, contains the statement rep;nop, so it is easy to assume that it executes NOP several times. In fact it does not. Consider the demo program below:
#include <stdio.h>

/* rep;nop with ECX loaded from times; ECX is read back into result. */
#define NOPS(times) __asm__ __volatile__ ("rep;nop" : "=c" (result) : "c" (times))

/* cld; rep;movsb copies times bytes from src to des; ECX is read back into result.
   The "memory" clobber tells the compiler that des is written by the asm. */
#define MOVSTR(src, des) __asm__ __volatile__ ("cld\n\t" \
        "rep;movsb" \
        : "=c" (result) \
        : "S" (src), "D" (des), "c" (times) \
        : "memory")

int main()
{
    unsigned int times, result;
    char src[5] = {'a', 'b', 'c', 'd', 'e'};
    char des[5];
    int i;

    times = 5;
    result = 5;
    MOVSTR(src, des);
    printf("result = %u\n", result);
    for (i = 0; i < times; i++)
        printf("%c ", des[i]);
    printf("\n");

    times = 5;
    NOPS(times);
    printf("result = %u\n", result);
    return (0);
}
Output of a run:
[beyes@slinux c]$ ./rep
result = 0
a b c d e
result = 5
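The first result line can be understood with a rough plain-C model of what rep;movsb does once cld has cleared the direction flag. This is only an illustrative sketch, not a description of how the processor implements the instruction; rep_movsb_model() and rep_movsb_model_check() are hypothetical names introduced here, with count standing in for ECX, src for ESI and dst for EDI.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Plain-C model of "rep; movsb" with the direction flag cleared.
 * count plays the role of ECX, src of ESI and dst of EDI.
 * Returns the final counter value, i.e. what the demo reads back as result.
 */
unsigned int rep_movsb_model(char *dst, const char *src, unsigned int count)
{
    assert(dst != NULL && src != NULL);
    while (count != 0) {    /* rep: stop once the counter reaches zero */
        *dst++ = *src++;    /* movsb: copy one byte, advance both pointers */
        count--;            /* rep: decrement the counter after each copy */
    }
    return count;
}

/* Mirrors the first half of the demo: copy 5 bytes, counter ends at 0. */
void rep_movsb_model_check(void)
{
    char src[5] = {'a', 'b', 'c', 'd', 'e'};
    char des[5] = {0};
    unsigned int result = rep_movsb_model(des, src, 5);

    assert(result == 0);
    assert(des[0] == 'a' && des[4] == 'e');
}
```

If rep;nop followed the same pattern, the second result line would also be 0; the fact that it stays at 5 is what the rest of the article explains.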
In the program above, the MOVSTR() macro demonstrates the ordinary use of the rep prefix: inline assembly copies 5 characters from the src array to the des array. The result variable starts at 5, but after MOVSTR() it is 0. Its value is read back from the ECX register, and because of the rep prefix ECX has been counted down to 0 by the time the copy finishes.
Next we execute the NOPS() macro, which tests whether rep;nop executes NOP 5 times. If it did, result would again become 0, but the final value is still 5. This shows that rep;nop is not equivalent to executing NOP 5 times.
So what is rep;nop? Disassembling the program shows that rep;nop is decoded as the pause instruction; both have the machine code F3 90. So what does the pause instruction do? The Intel manual explains it as follows:
PAUSE-Spin Loop Hint
Description
Improves the performance of spin-wait loops. When executing a "spin-wait loop," a Pentium 4
processor suffers a severe performance penalty when exiting the loop because it detects a
possible memory order violation. The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order
violation in most situations, which greatly improves processor performance. For this reason, it
is recommended that a PAUSE instruction be placed in all spin-wait loops.
That is, PAUSE improves the performance of spin-wait loops (the loops in which a spin lock waits). When a spin-wait loop is executed, the Pentium 4 processor suffers a severe performance penalty. The PAUSE instruction gives the processor a hint that the code sequence being executed is a spin-wait loop. Based on this hint the processor avoids the memory order violation; in other words, it does not apply caching, instruction reordering or similar optimizations to the spin-wait loop. This can greatly improve processor performance, and for this reason the PAUSE instruction is recommended in spin-wait loops.
An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium
4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop
extremely quickly, causing the processor to consume a lot of power while it waits for the
resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop
greatly reduces the processor's power consumption.
Another function of the PAUSE instruction is to reduce the power the Pentium 4 processor consumes while executing a spin-wait loop. While a spin lock waits for a resource, the Pentium 4 processor spins extremely quickly and consumes a lot of power; inserting a PAUSE instruction greatly reduces that power consumption.
This instruction was introduced in the Pentium 4 processors, but is backward compatible with
all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP
instruction.
The PAUSE instruction was introduced with the Pentium 4 processor, but it is backward compatible: on earlier IA-32 processors it simply behaves like a NOP instruction.
The Pentium 4 processor implements the PAUSE instruction as a pre-defined delay. The delay
is finite and can be zero for some processors. This instruction does not change the architectural
state of the processor (that is, it performs essentially a delaying no-op operation).
The Pentium 4 processor implements the PAUSE instruction as a pre-defined delay. The delay is finite and may be zero on some processors. The instruction does not change the architectural state of the processor.
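The earlier observation that rep;nop and pause share the encoding F3 90 can also be checked without a disassembler. The following is only a sketch, assuming GCC or Clang on an x86 ELF target such as Linux; the labels enc_rep_nop, enc_pause and enc_end and the functions rep_nop_is_pause() and rep_nop_encoding_check() are hypothetical names chosen for illustration. It assembles both spellings into read-only data, so they are never executed, and compares the resulting bytes. On any other target it cannot check anything and returns -1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__) && defined(__ELF__)
/* Assemble both spellings into .rodata: the bytes are only read, never run. */
__asm__(
    ".pushsection .rodata\n"
    "enc_rep_nop:\n\t"
    "rep; nop\n"
    "enc_pause:\n\t"
    "pause\n"
    "enc_end:\n"
    ".popsection\n"
);
extern const unsigned char enc_rep_nop[], enc_pause[], enc_end[];

/* Returns 1 if both spellings assembled to the two bytes F3 90, else 0. */
int rep_nop_is_pause(void)
{
    uintptr_t len_rep_nop = (uintptr_t)enc_pause - (uintptr_t)enc_rep_nop;
    uintptr_t len_pause = (uintptr_t)enc_end - (uintptr_t)enc_pause;

    if (len_rep_nop != 2 || len_pause != 2)
        return 0;
    return enc_rep_nop[0] == 0xF3 && enc_rep_nop[1] == 0x90 &&
           memcmp(enc_rep_nop, enc_pause, 2) == 0;
}
#else
/* Not an x86 ELF build with GNU-style asm: nothing to compare here. */
int rep_nop_is_pause(void)
{
    return -1;
}
#endif

/* Passes when the encodings match, or when the check is unavailable. */
void rep_nop_encoding_check(void)
{
    assert(rep_nop_is_pause() != 0);
}
```

A return value of 1 means the assembler produced identical bytes for both spellings, which matches what the disassembly showed.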
The rep_nop() function in the kernel wraps the rep;nop instruction:
static inline void rep_nop(void)
{
	__asm__ __volatile__("rep;nop" : : : "memory");
}
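Outside the kernel, a wrapper of the same shape can be placed in a spin-wait loop, as the Intel text recommends. The following is a minimal user-space sketch, assuming a C11 compiler with GNU-style inline assembly (GCC or Clang); demo_cpu_relax(), demo_spinlock and the demo_spin_* functions are hypothetical names for illustration and are not the kernel's spin_lock implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same idea as rep_nop(): PAUSE on x86, a plain compiler barrier elsewhere. */
static inline void demo_cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

typedef struct {
    atomic_flag locked;
} demo_spinlock;

/* test_and_set returns the previous value: false means we took the lock. */
static inline bool demo_spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static inline void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        demo_cpu_relax();   /* hint to the CPU that this is a spin-wait loop */
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Single-threaded walk through the lock states. */
void demo_spinlock_check(void)
{
    demo_spinlock l = { ATOMIC_FLAG_INIT };

    demo_spin_lock(&l);
    assert(!demo_spin_trylock(&l));  /* already held, so trylock fails */
    demo_spin_unlock(&l);
    assert(demo_spin_trylock(&l));   /* free again, so trylock succeeds */
    demo_spin_unlock(&l);
}
```

The pause hint only matters while another thread holds the lock; in the uncontended case the loop body is never entered.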
The kernel also contains examples that use the rep_nop() function:
static void delay_tsc(unsigned long loops)
{
	unsigned long bclock, now;

	preempt_disable();
	rdtscl(bclock);		/* TSC value at the start of the delay */
	do {
		rep_nop();
		rdtscl(now);
	} while ((now - bclock) < loops);
	preempt_enable();
}
The function above implements a delay by repeatedly reading the TSC and checking whether the required number of loops (TSC ticks) has elapsed; while it waits, it keeps calling rep_nop() to pause.
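The same pattern can be tried in user space. The sketch below assumes GCC or Clang; on x86 it reads the TSC with __rdtsc() from <x86intrin.h> and pauses with rep;nop, and on other targets it falls back to clock() ticks and a compiler barrier, which is an assumption made only so that the example builds everywhere. The names demo_read_ticks(), demo_relax(), demo_delay_ticks() and demo_delay_check() are hypothetical. Unlike delay_tsc(), user-space code cannot disable preemption, so the actual delay may be longer than requested.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
static inline uint64_t demo_read_ticks(void) { return __rdtsc(); }
static inline void demo_relax(void)
{
    __asm__ __volatile__("rep; nop" ::: "memory");
}
#else
#include <time.h>
/* Assumed fallback for non-x86 builds: clock() ticks and a compiler barrier. */
static inline uint64_t demo_read_ticks(void) { return (uint64_t)clock(); }
static inline void demo_relax(void)
{
    __asm__ __volatile__("" ::: "memory");
}
#endif

/* Busy-wait until at least `loops` ticks have passed; returns ticks waited. */
uint64_t demo_delay_ticks(uint64_t loops)
{
    uint64_t start = demo_read_ticks();
    uint64_t now;

    do {
        demo_relax();               /* pause between counter reads */
        now = demo_read_ticks();
    } while ((now - start) < loops);
    return now - start;
}

/* The loop can only exit once the requested number of ticks has elapsed. */
void demo_delay_check(void)
{
    assert(demo_delay_ticks(100000) >= 100000);
}
```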
http://www.groad.net/bbs/read.php?tid-3373-ds-1-toread-1.html