Plan 9 Assembly Language Reference
2013-02-21
This article is a partial translation of the manual for the Plan 9 assembler.
Machine
This assembly language can be used for MIPS, SPARC, Intel 386, Intel 960, AMD 29000, Motorola 68020 and 68000, Motorola PowerPC, AMD64, DEC Alpha, and ARM.
Registers
All predefined symbols in the assembler are upper case. The data registers are R0 to R7, the address registers are A0 to A7, and the floating-point registers are F0 to F7.
The C compiler uses the A6 register to point to data. The value of A6 is constant and must be set to the address of an externally defined symbol when the C program is initialized.
Next come the hardware registers, for example on the 68020: CAAR, CACR, CCR, DFC, ISP, MSP, SFC, SR, USP, and VBR.
The assembler also defines the pseudo registers FP, SP, and TOS for stack operations. FP is the frame pointer: 0(FP) is the first parameter, 4(FP) is the second, and so on. SP is the local stack pointer: 0(SP) is the first automatic variable. TOS is the top-of-stack register, used for pushing parameters to procedures, saving temporary values, and so on. (Note: the examples here use the 68020 hardware; it seems that this hardware is a bit like the Lisp language.)
A7 is the hardware stack pointer register. Note that mixing A7 and the pseudo register SP can cause problems. The assembler accepts notation such as p+0(FP), where p names the first parameter. The name goes into the symbol table and has no effect on the final program.
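The FP offsets described above follow directly from the sizes of the arguments. Here is a minimal sketch in Go, assuming every argument is padded to a whole number of fixed-size slots (4 bytes in the 68020 example). The helper name argOffsets is made up for illustration and is not part of any assembler or toolchain.

```go
package main

import "fmt"

// argOffsets returns the offset from the FP pseudo register of each
// argument, given the argument sizes in bytes. Hypothetical helper: it
// assumes every argument is padded up to a multiple of slot bytes.
func argOffsets(sizes []int, slot int) []int {
	offsets := make([]int, len(sizes))
	off := 0
	for i, size := range sizes {
		offsets[i] = off
		// Round the size up to a whole number of slots.
		off += (size + slot - 1) / slot * slot
	}
	return offsets
}

func main() {
	// Two 4-byte arguments, as in p+0(FP) and q+4(FP).
	for i, off := range argOffsets([]int{4, 4}, 4) {
		fmt.Printf("argument %d is at %d(FP)\n", i+1, off)
	}
}
```

The slot size is a parameter because, as the AMD64 notes later in this article say, other targets reserve larger slots.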
Data references
All external references must be made relative to a pseudo register, either PC (the program counter) or SB (the static base). For example:
BRA 2(PC)
Labels can also be used, such as
BRA return
NOP
return:
RTS
When a label is used, no (PC) annotation is needed.
The pseudo register SB refers to the start of the program's address space. Global data is referenced as an offset from SB, such as
MOVL $array(SB), TOS
which pushes the address of a global array onto the stack, or
MOVL array+4(SB), TOS
which pushes the second (4-byte) element of the array onto the stack. Note the use of the offset. Similarly, subroutine calls must use SB:
BSR exit(SB)
File-static variables are written with <> appended to the symbol name:
local<>+4(SB)
The <> is filled in with a unique integer at load time.
When a program starts, it must execute the following before accessing any global data:
MOVL $a6base(SB), A6
Expressions
Source files are preprocessed as in the C compiler, so #define and #include work correctly.
Addressing mode
Here o denotes an offset and d denotes a displacement, which is a constant between -128 and 127.
Placing data
To place data in the instruction stream:
LONG $12345
To place data in the data segment, use the DATA pseudo-instruction, which takes two arguments: the address at which to place the item, including its size, and the value to place there. For example, to define the string "abc":
DATA array+0(SB)/1, $'a'
DATA array+1(SB)/1, $'b'
DATA array+2(SB)/1, $'c'
GLOBL array(SB), $4
Or
DATA array+0(SB)/4, $"abc\z"
GLOBL array(SB), $4
The /1 gives the number of bytes to define, GLOBL makes the symbol global, and $4 says how many bytes the symbol occupies. Uninitialized data is automatically zeroed. The character \z is equivalent to \0 in C. A string in a DATA statement may contain at most 8 bytes.
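To make the effect of the two pseudo-instructions concrete, here is a small Go model of them. It is only a sketch of the behaviour described above, not how the assembler or loader is implemented. The type symbol and the functions globl and data are names invented for this example.

```go
package main

import "fmt"

// symbol models the bytes reserved by a GLOBL declaration.
type symbol struct {
	bytes []byte
}

// globl mimics "GLOBL name(SB), $size". make zeroes the slice, which
// mirrors the rule that uninitialized data is cleared to 0.
func globl(size int) *symbol {
	return &symbol{bytes: make([]byte, size)}
}

// data mimics "DATA name+off(SB)/width, $value" by copying exactly
// width bytes of value into the symbol at offset off.
func (s *symbol) data(off, width int, value []byte) error {
	if len(value) != width {
		return fmt.Errorf("value has %d bytes, want %d", len(value), width)
	}
	if off < 0 || off+width > len(s.bytes) {
		return fmt.Errorf("bytes %d..%d fall outside the %d-byte symbol",
			off, off+width, len(s.bytes))
	}
	copy(s.bytes[off:], value)
	return nil
}

func main() {
	array := globl(4)
	for i, c := range []byte("abc") {
		if err := array.data(i, 1, []byte{c}); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	// The fourth byte was never written, so it is still 0.
	fmt.Printf("%q\n", array.bytes)
}
```

Writing the three bytes one at a time and writing them in a single 4-byte item with a trailing zero byte leave the symbol with the same contents, which is why the two forms above are interchangeable.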
Defining functions
Entry points are defined with the TEXT pseudo-operation, which takes as arguments the name of the function and the number of bytes of automatic storage to pre-allocate on the stack. The latter is usually 0 when writing assembly by hand. Here is a function that returns the sum of two numbers:
TEXT sum(SB), $0
MOVL arg1+0(FP), R0
ADDL arg2+4(FP), R0
RTS
TEXT also accepts an optional middle argument, a bit field of options for the loader. Setting the 1 bit suppresses profiling of the function, so:
TEXT sum(SB), 1, $0
MOVL arg1+0(FP), R0
ADDL arg2+4(FP), R0
RTS
will not be profiled, whereas the first version above will be. Subroutines with special state, such as system calls, should not be profiled.
The return value is placed in R0. A floating-point return value is placed in F0. A function that returns a struct to a C program receives, as its first parameter, the address where the result is to be stored, and the calling protocol does not use R0 for such functions. The calling function is responsible for saving its own registers (caller saves).
Instruction Set
NOP is eliminated by the loader; it is not the do-nothing instruction. To generate an instruction that does nothing, use the WORD pseudo-instruction.
i386
The assembler assumes 32-bit protected mode. The register names are SP, AX, BX, CX, DX, BP, DI, and SI. The stack pointer is SP (the real register, not a pseudo register). The return value register is AX. There is no hardware frame pointer, but FP can be used as a frame pointer pseudo register.
Opcode names are mostly the same as in the Intel manual, with L, W, or B appended for 32-bit, 16-bit, and 8-bit operations respectively. The exceptions are loads, stores, and conditionals. All loads and stores to and from general registers, special registers (such as CR0, CR3, GDTR, IDTR, SS, CS, DS, ES, FS, and GS), or memory are written as:
MOVx src, dst
where x is L, W, or B.
Conditional instructions follow the 68020 naming rather than Intel's, using JOS, JOC, JCS, JCC, JEQ, JNE, JLS, JHI, JMI, JPL, JPS, JPC, JLT, JGE, JLE, JGT instead of JO, JNO, JB, JNB, JZ, JNZ, JBE, JNBE, JS, JNS, JP, JNP, JL, JNL, JLE, JNLE.
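When porting Intel-syntax code, the renaming can be kept in a lookup table. The sketch below pairs the two lists above position by position. The names intelToPlan9 and translate are invented for this example and do not come from any tool.

```go
package main

import "fmt"

// intelToPlan9 maps each Intel conditional jump mnemonic to the
// 68020-style name used by the Plan 9 386 assembler, pairing the two
// lists from the text position by position.
var intelToPlan9 = map[string]string{
	"JO": "JOS", "JNO": "JOC", "JB": "JCS", "JNB": "JCC",
	"JZ": "JEQ", "JNZ": "JNE", "JBE": "JLS", "JNBE": "JHI",
	"JS": "JMI", "JNS": "JPL", "JP": "JPS", "JNP": "JPC",
	"JL": "JLT", "JNL": "JGE", "JLE": "JLE", "JNLE": "JGT",
}

// translate returns the Plan 9 name for an Intel mnemonic and reports
// whether the mnemonic is one of the conditional jumps in the table.
func translate(intel string) (string, bool) {
	name, ok := intelToPlan9[intel]
	return name, ok
}

func main() {
	for _, m := range []string{"JZ", "JNBE", "JMP"} {
		if name, ok := translate(m); ok {
			fmt.Printf("%s -> %s\n", m, name)
		} else {
			fmt.Printf("%s is not in the table\n", m)
		}
	}
}
```

Only the sixteen mnemonics listed above are covered; Intel's alternate spellings of the same conditions are deliberately left out.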
The addressing modes use syntax such as AX, (AX), (AX)(BX*4), 10(AX), and 10(AX)(BX*4). Offsets relative to AX can be replaced by offsets from FP or SB to access names, such as extern+5(SB)(AX*2).
Note: non-relative JMP and CALL have a * added to the syntax. Only LOOP, LOOPEQ, and LOOPNE are legal loop instructions. Only REP and REPN are recognized as repeaters.
AMD64
The assembler assumes 64-bit mode. To change to 32-bit mode, use the MODE pseudo-operation:
MODE $32
Its main effect is to diagnose instructions that are illegal in the given mode, but the loader will also assume 32-bit operands and addresses, and 32-bit PCs for calls and returns.
Most conventions are the same as for the 386 above. The architecture adds the registers R8 to R15. All registers are 64 bits, but instructions can access the low 8, 16, and 32 bits. For example, MOVL to AX sets the low 32 bits and clears the high 32 bits to 0. Use MOVQ for 64-bit moves. Plan 9's C allocates its external registers from R15 down.
There are additional instructions, including the MMX and XMM instructions. The MMX registers are M0 to M7 and the XMM registers are X0 to X15. These instructions uniformly use L for 'long word' (32 bits) and Q for 'quad word' (64 bits). Some instructions use O ('octword') for 128 bits.
C's long long type is 64 bits, but it is passed and returned by value, not by reference. Note also that C pointers are 64 bits. AX is still the return value register but, unlike on the 386, the floating-point return value is in X0. All parameters smaller than 8 bytes are aligned to 8 bytes on the stack.
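The difference between a 32-bit and a 64-bit move into the same register can be modelled with ordinary integer conversions. This is only an illustrative sketch; movl and movq are Go functions named after the instructions, not bindings to them.

```go
package main

import "fmt"

// movl models MOVL into a 64-bit register: the low 32 bits take the new
// value and the high 32 bits are cleared, so the old contents do not
// survive at all.
func movl(old uint64, value uint32) uint64 {
	return uint64(value)
}

// movq models MOVQ into a 64-bit register: all 64 bits are replaced.
func movq(old uint64, value uint64) uint64 {
	return value
}

func main() {
	ax := uint64(0xFFFFFFFFFFFFFFFF)

	ax = movl(ax, 0x12345678)
	fmt.Printf("after MOVL: %#x\n", ax)

	ax = movq(ax, 0x1122334455667788)
	fmt.Printf("after MOVQ: %#x\n", ax)
}
```

The old register value is passed in only to make the point that MOVL ignores it, including its upper half.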
My original purpose was to learn Go's assembly. After going through all of the above, it turned out not to help much; it is better to look directly at the assembly code that Go generates in various cases and learn by practice.
func f(x, y int32) int32 { return x }
After compilation this becomes
--- prog list "f" ---
0000 (test.go:3) TEXT f+0(SB),$0-12
0001 (test.go:4) MOVL x+0(FP),BX
0002 (test.go:4) MOVL BX,.noname+8(FP)
0003 (test.go:4) RET ,
In x+0(FP), x is the variable name and has no real effect; 0(FP) is what refers to the first parameter. Likewise, .noname is just a name. Note that the return value ends up at 8(FP). 0(FP) is the parameter x and 4(FP) is the parameter y, so we can see Go's function call protocol: the return value is placed right after the arguments on the stack. This makes it easy to explain how multiple return values are supported.
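The layout read off the listing can be reproduced with a few lines of Go. This is a sketch of the stack-based convention seen in that listing only, assuming every value is a 4-byte int32 with no padding. The slot type and the layout function are invented for illustration.

```go
package main

import "fmt"

// slot records where one parameter or result lives relative to FP.
type slot struct {
	name   string
	offset int
}

// layout places the parameters first and the results directly after
// them, each taking size bytes, and returns the slots together with the
// total number of bytes used by arguments and results.
func layout(params, results []string, size int) ([]slot, int) {
	var slots []slot
	offset := 0
	for _, name := range params {
		slots = append(slots, slot{name, offset})
		offset += size
	}
	for _, name := range results {
		slots = append(slots, slot{name, offset})
		offset += size
	}
	return slots, offset
}

func main() {
	// func f(x, y int32) int32, with the unnamed result shown as .noname.
	slots, total := layout([]string{"x", "y"}, []string{".noname"}, 4)
	for _, s := range slots {
		fmt.Printf("%s+%d(FP)\n", s.name, s.offset)
	}
	fmt.Printf("arguments and results: %d bytes\n", total)
}
```

The 12-byte total agrees with the 12 in the $0-12 on the TEXT line, which in Go's assembler notation gives the frame size followed by the size of the arguments and results.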
This function is short, so it is inlined at the call site. If the function is made a bit longer, we can look at the assembly generated for the call
f(3, 4)
After compilation
0034 (test.go:14) MOVL $3,(SP)
0035 (test.go:14) MOVL $4,4(SP)
0036 (test.go:14) CALL ,f+0(SB)
0037 (test.go : RET ,
Here we can see the order of the parameters on the stack: starting at SP and going up, the first parameter, the second parameter ...