Learning Go assembly, part 2: deconstructing bytes.Equal on AMD64

Source: Internet
Author: User

The previous article covered the basics of memory layout. This article looks at symbols and at what each statement means. I like to learn by example, so I will start from bytes·Equal in src/runtime/asm_amd64.s :)

Corresponding code

TEXT bytes·Equal(SB),NOSPLIT,$0-49
    MOVQ    a_len+8(FP), BX
    MOVQ    b_len+32(FP), CX
    CMPQ    BX, CX
    JNE     eqret
    MOVQ    a+0(FP), SI
    MOVQ    b+24(FP), DI
    LEAQ    ret+48(FP), AX
    JMP     runtime·memeqbody(SB)
eqret:
    MOVB    $0, ret+48(FP)
    RET

Background

The SB (static base) pseudo-register

The following is taken from the introduction to Go's assembler:

The SB pseudo-register can be thought of as the origin of memory, so the symbol foo(SB) is the name foo as an address in memory. This form is used to name global functions and data. Adding <> to the name, as in foo<>(SB), makes the name visible only in the current source file, like a top-level static declaration in a C file. Adding an offset to the name refers to that offset from the symbol's address, so foo+4(SB) is four bytes past the start of foo.

In short: a symbol such as foo(SB) stands for the address of a global function or piece of data and is visible globally. Adding <> restricts its visibility to the current file, like a static declaration in a C file, and adding an offset reaches addresses relative to the start of the symbol.

Instruction format

The TEXT directive in the example defines a symbol named bytes·Equal (note the middle dot ·). The instructions that follow form the function body, and the final RET returns to the caller. The frame size is normally followed by the argument size, separated by a minus sign (it is syntax, not subtraction): $0-49 declares a 0-byte stack frame and 49 bytes of arguments. NOSPLIT tells the assembler not to insert the preamble that checks whether the stack must be split, so the function's frame, plus anything it calls, must fit in the space remaining on the current stack. But why 49 bytes?

The answer is in the declaration of bytes.Equal:

func Equal(a, b []byte) bool

a and b are of type []byte (byte slices of arbitrary length), and a slice is laid out as:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

On AMD64 unsafe.Pointer has the same size as uintptr, i.e. 8 bytes (64 bits), and int is 64 bits as well. A slice therefore occupies 3 qwords (a word is 2 bytes and a quadword is 4 words, so 2x4 = 8 bytes = 64 bits), that is 3x8 = 24 bytes. With two slices as parameters plus one byte for the bool result, the argument area is 24x2 ([]byte) + 1 (bool) = 49 bytes. The frame size is 0 because the function needs no local variables.

$0-49

Deconstructing the function's instructions

Assembly is a blunt language full of aggressive abbreviations. Below, the function is annotated statement by statement; once the mnemonics are decoded, each statement is very simple.

    MOVQ    a_len+8(FP), BX         // move qword: load the length of slice a into BX
    MOVQ    b_len+32(FP), CX        // load the length of slice b into CX
    CMPQ    BX, CX                  // compare qword: compare BX with CX
    JNE     eqret                   // jump if not equal: go to label eqret (equal ret) when the lengths differ
    MOVQ    a+0(FP), SI             // load a's data pointer into SI
    MOVQ    b+24(FP), DI            // load b's data pointer into DI
    LEAQ    ret+48(FP), AX          // load effective address: put the address of the return value in AX
    JMP     runtime·memeqbody(SB)   // jump to the code at runtime·memeqbody(SB)
eqret:
    MOVB    $0, ret+48(FP)          // move byte: store $0 (the constant 0, i.e. false) in the return value: the slices are not equal
    RET
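For readers more at home in Go than in assembly, the same control flow can be sketched in plain Go. This is only an illustration: the memeqbody function below is a hypothetical byte-by-byte stand-in, not the runtime's vectorized routine, and equal is an illustrative name.

```go
package main

import "fmt"

// memeqbody is a hypothetical pure-Go stand-in for runtime·memeqbody.
// It compares the first n bytes and assumes both slices hold at least n bytes.
func memeqbody(a, b []byte, n int) bool {
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// equal follows the same steps as the assembly version of bytes·Equal.
func equal(a, b []byte) bool {
	if len(a) != len(b) { // CMPQ BX, CX ; JNE eqret
		return false // eqret: MOVB $0, ret+48(FP) ; RET
	}
	return memeqbody(a, b, len(a)) // JMP runtime·memeqbody(SB)
}

func main() {
	fmt.Println(equal([]byte("gopher"), []byte("gopher")))  // true
	fmt.Println(equal([]byte("gopher"), []byte("gophers"))) // false
	fmt.Println(equal([]byte("gopher"), []byte("golang")))  // false
}
```

Note that the assembly uses JMP rather than CALL: memeqbody writes the result through the address in AX and returns directly to the caller of bytes·Equal.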

There are two new concepts here:

Offsets, for example a_len+8(FP). Recall from the previous article that FP points at the low-address end of the arguments. a_len+8(FP) therefore gives the name a_len to the location 8 bytes above FP; given the layout of a slice (pointer first, then length), that is exactly where the length of a is stored. If you don't remember, refer to the diagram:

FP               +------------> b pointer
                 |
+                |    +-------> b length
|                |    |
v                |    |     +-> b capacity
Low              |    |     |
+----+----+----+-+--+-+--+--+-+-+
|    |    |    |    |    |    | |
+-+--+-+--+-+--+----+----+----+++
  |    |    |                  |
  |    |    +-> a capacity     +-> return value
  |    |
  |    +------> a length
  |
  +-----------> a uint64-pointer

Each wide cell corresponds to one 8-byte word; the narrow last cell is the 1-byte return value.

The other concept is the label. Unlike high-level languages, assembly implements conditional jumps almost entirely with labels; eqret in the example is one.

AVX

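Next comes the heart of the routine, hugeloop.

AVX is a SIMD instruction set that Intel is proud of. For byte comparison Go chooses among SSE, AVX, and AVX2 according to what the CPU supports, and that is exactly why this routine is written in assembly.

// 64 bytes at a time using ymm registers (64 bytes are compared per iteration instead of one)
hugeloop_avx2:
    CMPQ        BX, $64             // compare the remaining length with 64
    JB          bigloop_avx2        // fewer than 64 bytes left: use another loop
    VMOVDQU     (SI), Y0            // unaligned vector load: the first 32 bytes at SI go into Y0 (a 256-bit register)
    VMOVDQU     (DI), Y1            // the first 32 bytes at DI go into Y1
    VMOVDQU     32(SI), Y2          // the next 32 bytes at SI go into Y2
    VMOVDQU     32(DI), Y3          // the next 32 bytes at DI go into Y3
    VPCMPEQB    Y1, Y0, Y4          // compare Y0 and Y1 byte by byte; equal bytes become 0xff in Y4
    VPCMPEQB    Y2, Y3, Y5          // same for Y2 and Y3, result in Y5
    VPAND       Y4, Y5, Y6          // bitwise AND of the two results
    VPMOVMSKB   Y6, DX              // move byte mask: collect the top bit of each byte of Y6 into DX (0xffffffff when all 64 bytes matched)
    ADDQ        $64, SI             // advance SI by 64 bytes
    ADDQ        $64, DI             // advance DI by 64 bytes
    SUBQ        $64, BX             // decrease the remaining length by 64
    CMPL        DX, $0xffffffff     // compare the low 32 bits of DX with the all-ones mask
    JEQ         hugeloop_avx2       // everything matched: keep looping
    VZEROUPPER                      // clear the upper halves of the Y registers
    MOVB        $0, (AX)            // a difference was found: store false in the return value
    RET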
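To make the mask logic concrete, here is a scalar Go sketch that imitates what the loop does, one 64-byte chunk per iteration. It is not vectorized and is not the runtime's code; eqMask32 and equalHuge are hypothetical names, the function assumes both slices have the same length, and the tail handling is simplified to a byte loop.

```go
package main

import "fmt"

// eqMask32 imitates VPCMPEQB followed by VPMOVMSKB on one 32-byte block:
// bit i of the result is set when x[i] == y[i].
func eqMask32(x, y []byte) uint32 {
	var mask uint32
	for i := 0; i < 32; i++ {
		if x[i] == y[i] {
			mask |= 1 << uint(i)
		}
	}
	return mask
}

// equalHuge compares a and b 64 bytes per iteration, like hugeloop_avx2.
// It assumes len(a) == len(b).
func equalHuge(a, b []byte) bool {
	n := len(a)   // plays the role of BX
	for n >= 64 { // CMPQ BX, $64 ; JB bigloop_avx2
		lo := eqMask32(a[:32], b[:32])     // Y0 vs Y1
		hi := eqMask32(a[32:64], b[32:64]) // Y2 vs Y3
		// The assembly ANDs the vectors before extracting the mask;
		// ANDing the two masks gives the same result.
		if lo&hi != 0xffffffff { // CMPL DX, $0xffffffff
			return false // MOVB $0, (AX) ; RET
		}
		a, b = a[64:], b[64:] // ADDQ $64, SI ; ADDQ $64, DI
		n -= 64               // SUBQ $64, BX
	}
	// Simplified tail: compare the remaining bytes one at a time.
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a := make([]byte, 130)
	b := make([]byte, 130)
	fmt.Println(equalHuge(a, b)) // true
	b[100] = 1
	fmt.Println(equalHuge(a, b)) // false
}
```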

Summary

By now Go assembly should look familiar. The next article will excerpt and translate some notes.
