Optimization barrier and memory barrier


From: http://blog.chinaunix.net/u3/93713/showart_2061476.html

Optimization barrier

When the compiler compiles source code, it optimizes it and may reorder instructions so that they suit the CPU's parallel execution. Kernel synchronization code, however, must prevent this kind of reordering. An optimization barrier stops the compiler from moving instructions across it: instructions that precede the barrier in the source are never placed after it in the compiled code, and vice versa.
Linux implements the optimization barrier with the barrier() macro. For the GCC compiler it is defined as follows (in include/linux/compiler-gcc.h):
#define barrier() __asm__ __volatile__("": : :"memory")

In this definition, __asm__ inserts an inline assembly statement; the assembly template is empty, so no instruction is actually emitted. __volatile__ forbids the compiler from optimizing the statement away or moving it. The "memory" clobber tells the compiler that the statement may read or modify memory, so the compiler must not keep memory values cached in registers across the barrier and must not move memory accesses from one side of it to the other.
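A minimal user-space sketch of what barrier() does, assuming GCC or Clang (both accept this inline-assembly syntax). The names stage, advance_stages() and read_stage_twice() are hypothetical. The effect shows up in the generated assembly (for example with gcc -O2 -S), not in the program's output, and it constrains only the compiler: other CPUs may still observe the stores in a different order.

```c
#include <assert.h>

/* Compiler barrier, spelled the way GCC's barrier() is defined above: the
 * empty template emits no instruction, and the "memory" clobber tells the
 * compiler that any memory may be read or written at this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Non-static global, so the compiler must assume the asm statement can see it. */
int stage;

/* Without barrier(), the first two stores are dead and an optimizing compiler
 * may keep only "stage = 3". With barrier() between them, all three stores
 * must be emitted, in source order. */
void advance_stages(void)
{
    stage = 1;
    barrier();
    stage = 2;
    barrier();
    stage = 3;
}

/* barrier() also forces the second read to be a fresh load from memory
 * instead of reusing the value already held in a register. */
int read_stage_twice(void)
{
    int first = stage;
    barrier();
    int second = stage;
    assert(second == first); /* single-threaded here, so nothing changed */
    return first + second;
}
```

In kernel code the same pattern appears in polling loops, where barrier() makes each iteration reload a variable that another context may change.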
Memory barrier

Software can enforce the order of memory accesses with read/write barriers. A barrier works like a wall: every memory access issued before the barrier must complete before any memory access issued after it, so memory accesses complete in program order.
Read/write barriers are implemented with special processor instructions: mfence (memory barrier), lfence (read barrier), and sfence (write barrier). For details, see Chapter 1 of the x86-64 architecture specification. In addition, on x86-64 processors, instructions that are "serializing" also act as memory barriers, for example: all instructions that operate on I/O ports, instructions with the lock prefix, and all instructions that write to control registers, system registers, or debug registers (such as cli and sti).
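The wall described above is usually used in pairs: the writer orders its stores and the reader orders its loads. The sketch below assumes user space with POSIX threads and uses C11 release and acquire fences as rough analogues of wmb() and rmb() (the C11 fences are not identical; a release fence, for example, also orders earlier loads). The kernel macros themselves are not used, and the names payload, ready and message_passing_round() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;      /* ordinary data, written before the flag */
static atomic_int ready; /* publication flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                           /* write the data */
    atomic_thread_fence(memory_order_release);              /* like wmb()     */
    atomic_store_explicit(&ready, 1, memory_order_relaxed); /* raise the flag */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                      /* spin until the flag is set */
    atomic_thread_fence(memory_order_acquire); /* like rmb() */
    *out = payload;                            /* guaranteed to read 42 */
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer read. */
int message_passing_round(void)
{
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Without the two fences, the compiler or a weakly ordered CPU could make the flag visible before payload, and the consumer could read 0.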
Table 2 describes the memory barrier API provided by the Linux kernel. These barriers work on both multiprocessor and uniprocessor systems. When a barrier is needed only on multiprocessor systems, use the smp_xxx() variants; on a uniprocessor kernel they reduce to a compiler barrier and emit no barrier instruction.
Table 2: Memory barrier API

Macro        Description
mb()         Memory barrier for multiprocessor and uniprocessor systems
rmb()        Read memory barrier for multiprocessor and uniprocessor systems
wmb()        Write memory barrier for multiprocessor and uniprocessor systems
smp_mb()     Memory barrier for multiprocessor systems only
smp_rmb()    Read memory barrier for multiprocessor systems only
smp_wmb()    Write memory barrier for multiprocessor systems only
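A full barrier such as mb() is needed when a store must be ordered before a later load. Read or write barriers alone do not provide this, and even x86 lets a load complete before an earlier store to a different address has become visible to other CPUs. Below is a sketch of the classic store-buffering test, assuming POSIX threads and using C11 atomic_thread_fence(memory_order_seq_cst) as a stand-in for mb() (on x86-64, compilers typically emit mfence or a locked instruction for it). All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y; /* the two flags */
static atomic_int go;   /* start signal, so both threads race closely */
static int r1, r2;      /* what each thread observed */

static void *thread0(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* full barrier, like mb() */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* full barrier, like mb() */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iterations` times; returns how often both loads saw 0. */
int count_both_zero(int iterations)
{
    int both_zero = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        atomic_store(&go, 0);
        int rc = pthread_create(&t0, NULL, thread0, NULL);
        assert(rc == 0);
        rc = pthread_create(&t1, NULL, thread1, NULL);
        assert(rc == 0);
        (void)rc;
        atomic_store_explicit(&go, 1, memory_order_release);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

With both fences in place, at least one thread must see the other's store, so count_both_zero() returns 0. Removing the fences allows both r1 and r2 to be 0, even on x86.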

The memory barrier macros for both multiprocessor and uniprocessor systems are defined as follows (in include/asm-x86/system.h):
#ifdef CONFIG_X86_32
/*
 * "lock; addl $0,0(%esp)" performs a locked add of 0 to the memory word at
 * the top of the stack. The addition itself changes nothing, but the locked
 * instruction acts as a full memory barrier, so all preceding memory accesses
 * complete first. On CPUs with the XMM2 (SSE2) feature, alternative() patches
 * in the dedicated fence instruction at boot instead.
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
/* x86-64 CPUs always have the fence instructions */
#define mb() asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")
#define wmb() asm volatile("sfence":::"memory")
#endif

/* Flush all pending reads that subsequent reads depend on; not needed on x86-64 */
#define read_barrier_depends() do { } while (0)

The read_barrier_depends() macro flushes all pending reads on which subsequent reads depend, that is, later reads that use data returned by a read still in progress. It is not needed on x86-64, where it expands to nothing. No data-dependent reads from memory are reordered across this barrier: every read that precedes the primitive is guaranteed to access memory (but not necessarily other CPUs' caches) before any following read that depends on the data it returned. On most CPUs this primitive is much lighter than rmb().
The local CPU and the compiler both obey the ordering constraints of memory barriers, and only the memory barrier primitives guarantee ordering; not even data dependencies guarantee it on their own. For example, the following code does force ordering, because the read of *q depends on the read of p and the two reads are separated by read_barrier_depends(). The statements executed on CPU 0 and CPU 1 are:

CPU 0                   CPU 1
b = 2;
memory_barrier();
p = &b;                 q = p;
                        read_barrier_depends();
                        d = *q;

The following code does not force ordering, because there is no data dependency between the read of a and the read of b. Therefore, on some CPUs, such as Alpha, y can be set to 3 while x is still set to 0, the old value of a. For reads without a data dependency, use rmb() to enforce ordering.

CPU 0                   CPU 1
a = 2;
memory_barrier();
b = 3;                  y = b;
                        read_barrier_depends();
                        x = a;
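Both CPU 0/CPU 1 examples can be approximated in user-space C11, assuming POSIX threads. The mapping is a hypothetical sketch: memory_order_release stands in for memory_barrier() followed by the store, memory_order_consume is the closest C11 counterpart of read_barrier_depends() (current compilers treat it as memory_order_acquire), and memory_order_acquire plays the role of rmb() for the read that has no data dependency. The variable and function names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { int val; };

static struct node node_b;       /* the "b" of the first example  */
static _Atomic(struct node *) p; /* the "p" of the first example  */
static int a;                    /* the "a" of the second example */
static atomic_int b;             /* the "b" of the second example */
static int d, x;                 /* what CPU 1 ends up reading    */

static void *cpu0(void *arg)
{
    (void)arg;
    node_b.val = 2;                                           /* b = 2;           */
    atomic_store_explicit(&p, &node_b, memory_order_release); /* barrier; p = &b; */
    a = 2;                                                    /* a = 2;           */
    atomic_store_explicit(&b, 3, memory_order_release);       /* barrier; b = 3;  */
    return NULL;
}

/* Dependent read: the address used for *q comes from the value of p,
 * so consume ordering is enough. */
static void *cpu1_dependent(void *arg)
{
    (void)arg;
    struct node *q;
    while ((q = atomic_load_explicit(&p, memory_order_consume)) == NULL)
        ;            /* q = p;  */
    d = q->val;      /* d = *q; */
    return NULL;
}

/* Independent read: reading a does not use the value read from b, so there
 * is no dependency to rely on; acquire (the analogue of rmb()) is needed. */
static void *cpu1_independent(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&b, memory_order_acquire) != 3)
        ;            /* y = b; */
    x = a;           /* x = a; */
    return NULL;
}

void run_both_examples(void)
{
    pthread_t t0, t1, t2;
    int rc = pthread_create(&t1, NULL, cpu1_dependent, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, cpu1_independent, NULL);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, cpu0, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

If the acquire load in cpu1_independent() were relaxed, a weakly ordered CPU such as Alpha could let x read 0 even after y has read 3.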

The memory barrier macros for multiprocessor systems are defined as follows (in include/asm-x86/system.h):
#ifdef CONFIG_SMP
#define smp_mb() mb()
#ifdef CONFIG_X86_PPRO_FENCE
# define smp_rmb() rmb()        /* CPUs that may reorder reads need a real fence */
#else
# define smp_rmb() barrier()    /* x86 does not reorder reads: compiler barrier suffices */
#endif
#ifdef CONFIG_X86_OOSTORE
# define smp_wmb() wmb()        /* CPUs with out-of-order stores need a real fence */
#else
# define smp_wmb() barrier()    /* x86 does not reorder stores: compiler barrier suffices */
#endif
#define smp_read_barrier_depends() read_barrier_depends()
/* xchg with a memory operand is implicitly locked, so it also acts as a full barrier */
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else
/* Uniprocessor: only the compiler can reorder, so barrier() is enough */
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif
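A hypothetical user-space analogue of the SMP set_mb(), assuming C11 atomics: a seq_cst atomic exchange stands in for the kernel's xchg and gives the same "store, then full barrier" effect. The sleeping/condition pair sketches the kind of "announce, then re-check" pattern such a barrier protects; the names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space set_mb(): store the value, then act as a full
 * barrier. The kernel version uses xchg, which on x86 is implicitly locked. */
#define set_mb(var, value) \
    do { (void)atomic_exchange_explicit(&(var), (value), memory_order_seq_cst); } while (0)

static atomic_int sleeping;  /* hypothetical: 1 once the waiter plans to sleep */
static atomic_int condition; /* hypothetical: 1 once the event has happened */

/* Waiter: announce the intent to sleep, then re-check the condition.
 * Returns 1 if it should go to sleep, 0 if the event already happened. */
int waiter_should_sleep(void)
{
    set_mb(sleeping, 1);
    return atomic_load(&condition) == 0;
}

/* Waker: publish the event, then check whether the waiter needs waking.
 * Returns 1 if a wake-up must be sent. */
int waker_must_wake(void)
{
    set_mb(condition, 1);
    return atomic_load(&sleeping) == 1;
}
```

Because both sides store and then load with full ordering, it cannot happen that waiter_should_sleep() returns 1 while a concurrent waker_must_wake() returns 0, which would be a lost wake-up.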

The rdtsc_barrier() function inserts a memory barrier to stop speculative execution of rdtsc. When a defined code region reads the time-stamp counter with rdtsc (or through get_cycles() or vread(), which may read the TSC), a barrier must be added so that rdtsc is not executed speculatively out of order. It is defined as follows:

static inline void rdtsc_barrier(void)
{
        /* Patched at boot: mfence or lfence if the CPU feature flag calls for it, otherwise a 3-byte NOP */
        alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
        alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
}
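A user-space sketch of the same idea, assuming GCC or Clang on x86-64 and the __rdtsc() and _mm_lfence() intrinsics from <x86intrin.h>. It places lfence on both sides of rdtsc, which is one common choice; the kernel instead patches in mfence or lfence depending on a CPU feature flag. The names fenced_rdtsc() and time_busy_loop() are hypothetical, and the fallback branch for other architectures is an assumption of this sketch that reads a clock, not a TSC.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* The leading lfence keeps rdtsc from executing before earlier instructions
 * have completed locally; the trailing lfence keeps later instructions from
 * starting before the counter has been read. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
#else
#include <time.h>

/* Fallback (assumption): nanoseconds from the C11 wall clock, not a TSC. */
static inline uint64_t fenced_rdtsc(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Ticks spent in a small busy loop, measured between two fenced reads. */
uint64_t time_busy_loop(unsigned iterations)
{
    volatile unsigned sink = 0;
    uint64_t start = fenced_rdtsc();
    for (unsigned i = 0; i < iterations; i++)
        sink += i;
    uint64_t end = fenced_rdtsc();
    return end - start;
}
```

Without the fences, the CPU could read the counter before the loop's work has finished, or start the next instructions early, skewing short measurements.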
