Golang 1.3 sync/atomic Source Code Analysis


In the previous article we looked at the source code of sync.Mutex, whose core is the CPU's CAS (compare-and-swap) instruction. Under concurrency, atomic operations are more efficient than a mutex: the mutex does a lot of additional work, whereas an atomic operation is tied closely to the processor and completes in one or two instructions. Let's look at some details of sync/atomic in Golang, starting with the package directory:
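To make the comparison concrete, here is a minimal sketch of the same counter written both ways. The function names countWithMutex and countWithAtomic are made up for this illustration. The sketch only shows that both approaches give the same result; it does not measure which one is faster.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithMutex (hypothetical helper) increments a shared counter from
// several goroutines, guarding every increment with a sync.Mutex.
func countWithMutex(goroutines, perGoroutine int) uint64 {
	var (
		mu sync.Mutex
		n  uint64
		wg sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				mu.Lock()
				n++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return n
}

// countWithAtomic (hypothetical helper) does the same work, but each
// increment is a single call to atomic.AddUint64 with no lock.
func countWithAtomic(goroutines, perGoroutine int) uint64 {
	var (
		n  uint64
		wg sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				atomic.AddUint64(&n, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&n)
}

func main() {
	fmt.Println(countWithMutex(8, 1000))  // 8000
	fmt.Println(countWithAtomic(8, 1000)) // 8000
}
```

To compare the actual cost of the two versions, wrap each function in a benchmark with the standard testing package.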

64bit_arm.go    asm_amd64p32.s     asm_linux_arm.s           atomic_test.go            race.go
asm_386.s       asm_arm.s          asm_netbsd_arm.s          doc.go
asm_amd64.s     asm_freebsd_arm.s  atomic_linux_arm_test.go  export_linux_arm_test.go

As the listing shows, Golang relies mainly on assembly to implement atomic operations, and each CPU architecture has its own .s assembly file. Let's focus on asm_amd64.s, the implementation for the x86-64 architecture:

TEXT ·CompareAndSwapUint64(SB),NOSPLIT,$0-25
	// http://godoc.org/sync/atomic#CompareAndSwapUint64
	// The function prototype is:
	//   func CompareAndSwapUint64(addr *uint64, old, new uint64) (swapped bool)

	// Load the addr parameter into the BP register. Why +0(FP)? As far as I
	// can tell, FP is the assembler's pseudo frame pointer, and each parameter
	// is addressed as an offset from it. addr is the first parameter, and a
	// pointer occupies 8 bytes here.
	MOVQ	addr+0(FP), BP
	// Load the old parameter into the AX register. The offset is 8 because
	// the pointer before it occupies 8 bytes.
	MOVQ	old+8(FP), AX
	// Load the new parameter into the CX register. The offset is now 16
	// because old, a uint64, also occupies 8 bytes.
	MOVQ	new+16(FP), CX
	// The LOCK prefix tells a multicore system that the next instruction must
	// be atomic, either by locking the bus or by relying on the MESI protocol.
	// Both are discussed in the next section.
	LOCK
	// CMPXCHG r/m, r compares the value in the accumulator (AL/AX/EAX/RAX)
	// with the first operand (the destination). If they are equal, the second
	// operand (the source) is stored into the destination and ZF is set to 1.
	// Otherwise the destination is loaded into the accumulator and ZF is
	// cleared to 0. In Go's assembler the operands are written source first,
	// so here AX holds the expected old value, 0(BP) is the destination in
	// memory, and CX holds the new value.
	CMPXCHGQ	CX, 0(BP)
	// Finally, SETEQ checks whether ZF is 1 to decide whether the swap
	// succeeded, and writes that result to the return value swapped at
	// offset 24.
	SETEQ	swapped+24(FP)
	RET
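The CMPXCHG semantics described in the comments can be written out as a small software model. This is only a sketch: cmpxchgModel is a name made up for this illustration, and unlike the real instruction with the LOCK prefix it is not atomic. It is compared against the real atomic.CompareAndSwapUint64 to show that both report the same outcome.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cmpxchgModel is a hypothetical, NON-atomic model of CMPXCHG.
// dest plays the role of the memory operand 0(BP), acc plays the role of
// the accumulator AX, and src plays the role of CX. The returned bool
// stands for the ZF flag.
func cmpxchgModel(dest, acc *uint64, src uint64) (zf bool) {
	if *dest == *acc {
		*dest = src // equal: store the source into the destination, ZF = 1
		return true
	}
	*acc = *dest // not equal: load the destination into the accumulator, ZF = 0
	return false
}

func main() {
	var mem uint64 = 10
	ax := uint64(10)

	ok := cmpxchgModel(&mem, &ax, 20)
	fmt.Println(ok, mem, ax) // true 20 10

	ax = 10 // stale expectation: mem is 20 by now
	ok = cmpxchgModel(&mem, &ax, 30)
	fmt.Println(ok, mem, ax) // false 20 20

	// The real, atomic operation from sync/atomic behaves the same way
	// as far as the caller can observe.
	var v uint64 = 10
	swapped := atomic.CompareAndSwapUint64(&v, 10, 20)
	fmt.Println(swapped, v) // true 20
	swapped = atomic.CompareAndSwapUint64(&v, 10, 30)
	fmt.Println(swapped, v) // false 20
}
```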

That is the entire routine. The other functions in sync/atomic follow a similar pattern, so they are not analyzed here.
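On the Go side, CAS is usually used in a retry loop: read the current value, compute the new one, and try again if another goroutine got there first. Below is a minimal sketch of that pattern. The names casAdd and parallelAdd are made up for this illustration, and this is not meant to show how sync/atomic itself implements AddUint64.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casAdd (hypothetical helper) adds delta to *addr using only an atomic
// load and CompareAndSwapUint64, and returns the new value.
func casAdd(addr *uint64, delta uint64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		next := old + delta
		// The swap succeeds only if *addr still holds old. If another
		// goroutine changed it in the meantime, loop and try again.
		if atomic.CompareAndSwapUint64(addr, old, next) {
			return next
		}
	}
}

// parallelAdd (hypothetical helper) calls casAdd(&total, 1) from several
// goroutines and returns the final total.
func parallelAdd(goroutines, perGoroutine int) uint64 {
	var (
		total uint64
		wg    sync.WaitGroup
	)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				casAdd(&total, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&total)
}

func main() {
	fmt.Println(parallelAdd(8, 1000)) // 8000
}
```

No increment is lost even though no lock is taken, because a failed CAS simply means the loop runs again with a fresh value.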

Bus Locking & MESI

Let's look at what the MESI protocol and bus locking actually are. My main reference is the "Intel 64 and IA-32 Architectures Software Developer's Manual", specifically chapter 8 on multiple-processor management and chapter 11 on memory cache control.

8.1.2 Bus Locking: Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or equivalent link. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked. Software can specify other occasions when the LOCK semantics are to be followed by prepending the LOCK prefix to an instruction.

Section 8.1.2 says that Intel 64 and IA-32 processors provide a LOCK# signal to lock the bus. While it is asserted, requests for control of the bus from other processors or bus agents are blocked. I wondered whether this meant the whole system stalls (if only for a moment, since a single instruction is fast), which did not feel elegant. I was then pleasantly surprised to find that Intel provides a better way:

For the P6 and more recent processor families, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor's caches (see Section 8.1.4, "Effects of a LOCK Operation on Internal Processor Caches").

For P6 and later processor families, if the memory area being accessed is already held in the processor's caches (L1/L2/L3), the LOCK# signal is generally not asserted (that is, the bus is not locked). Instead, the cache coherency protocol guarantees the atomicity of the instruction. Section 11.2 introduces some cache-related terms:

Cache line fill: When the processor recognizes that data being read from memory is cacheable, it reads an entire cache line from memory into the L1/L2/L3 cache;

Cache hit: When the data at a memory address is still in the cache, the processor can fetch it directly from L1/L2/L3 rather than reading it from memory;

Write hit: When the processor tries to write to a cacheable memory address, it first checks whether a cache line for that address exists in the cache. If a valid line does exist, the processor (depending on the write policy in force) writes to the L1/L2/L3 cache instead of writing directly to memory;

Modified state (Modified): This cache line has been modified (a dirty line). Its content differs from main memory and it is held only by this cache;

Exclusive state (Exclusive): This cache line has the same content as main memory and does not appear in any other cache;

Shared state (Shared): This cache line has the same content as main memory and may also appear in other caches;

Invalid state (Invalid): This cache line's content is invalid (an empty line);

Given the L1/L2/L3 CPU caches, can these cache features and states be used to guarantee that an atomic operation completes? The answer is yes:

In multiprocessor systems, IA-32 processors (starting with the i486) and Intel 64 processors can snoop other processors' accesses to system memory and to their internal caches, and they use this ability to keep the caches consistent. For example, in the Pentium and P6 family processors, if a processor detects through snooping that another processor intends to write to a memory address that it currently holds in the Shared state, the snooping processor immediately invalidates its own cache line, which forces a cache line fill the next time it accesses that address.

When a processor detects through snooping that another processor is trying to access a memory address that it holds in the Modified state (modified in its cache but not yet written back to system memory), the snooping processor sends the HITM# signal to the other processor to say that the cache line is in the Modified state, and then performs an implicit write-back of the modified data.

The implicit write-back is transferred directly to the processor that made the initial request, and it is snooped by the memory controller to make sure system memory gets updated. In other words, the processor holding the valid data can pass it straight to the other processor without writing it to system memory itself, because updating system memory is the responsibility of the memory controller that snoops the transfer.
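The snooping behaviour can be sketched as a toy model with one memory word and one cache line per processor. This is a heavily simplified illustration, not how real hardware is organized: the types State, line, and system and all their methods are made up for this example, and signals such as HITM# are represented only by comments.

```go
package main

import "fmt"

// State is one of the four MESI cache line states.
type State int

const (
	Invalid State = iota
	Shared
	Exclusive
	Modified
)

func (s State) String() string { return [...]string{"I", "S", "E", "M"}[s] }

// line is one processor's cached copy of a single memory location.
type line struct {
	state State
	value int
}

// system is a hypothetical model: one memory word, one line per processor.
type system struct {
	memory int
	caches []line
}

func newSystem(processors, initial int) *system {
	return &system{memory: initial, caches: make([]line, processors)}
}

// read models a load by processor p.
func (s *system) read(p int) int {
	if s.caches[p].state != Invalid {
		return s.caches[p].value // cache hit
	}
	shared := false
	for q := range s.caches {
		if q == p || s.caches[q].state == Invalid {
			continue
		}
		if s.caches[q].state == Modified {
			// Snoop hit on a modified line: implicit write-back,
			// which also brings system memory up to date.
			s.memory = s.caches[q].value
		}
		s.caches[q].state = Shared
		shared = true
	}
	// Cache line fill for the requesting processor.
	st := Exclusive
	if shared {
		st = Shared
	}
	s.caches[p] = line{state: st, value: s.memory}
	return s.memory
}

// write models a store by processor p: every other copy is invalidated.
func (s *system) write(p, v int) {
	for q := range s.caches {
		if q == p {
			continue
		}
		if s.caches[q].state == Modified {
			s.memory = s.caches[q].value // write back before invalidating
		}
		s.caches[q].state = Invalid
	}
	s.caches[p] = line{state: Modified, value: v}
}

// states returns the state of the line in every cache.
func (s *system) states() []State {
	out := make([]State, len(s.caches))
	for i, c := range s.caches {
		out[i] = c.state
	}
	return out
}

func main() {
	s := newSystem(2, 1)
	s.read(0)
	fmt.Println(s.states()) // [E I]
	s.read(1)
	fmt.Println(s.states()) // [S S]
	s.write(0, 42)
	fmt.Println(s.states(), s.memory) // [M I] 1
	v := s.read(1)
	fmt.Println(v, s.states(), s.memory) // 42 [S S] 42
}
```

The last two steps are the interesting ones: after the write, memory still holds the stale value 1, and it is only the second processor's read that triggers the write-back and brings memory up to 42.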

11.4: The following section describes the cache control protocol currently defined for the Intel 64 and IA-32 architectures. In the L1 data cache and in the L2/L3 unified caches, the MESI (modified, exclusive, shared, invalid) cache protocol maintains consistency with caches of other processors. The L1 data cache and the L2/L3 unified caches have two MESI status flags per cache line. Each line can be marked as being in one of the states defined in Table 11-4. In general, the operation of the MESI protocol is transparent to programs.

Section 11.4 introduces the MESI (modified, exclusive, shared, invalid) cache protocol, which keeps the caches of different processors consistent. Each cache line carries two MESI status flag bits. The states are summarized in the following table:

Cache line state | M (Modified) | E (Exclusive) | S (Shared) | I (Invalid)
Is this cache line valid? | Yes | Yes | Yes | No
The copy in system memory is... | Out of date | Valid | Valid | -
Do copies exist in the caches of other processors? | No | No | Maybe | Maybe
A write to this cache line... | Does not go to the system bus | Does not go to the system bus | Causes the processor to gain exclusive ownership of the line | Goes directly to the system bus

That covers the CPU background and source code related to sync/atomic :)
