Chapter 3: Machine-Level Representation of Programs
I. A historical perspective
Intel processors (x86)
II. Program encodings
gcc -O1 -o p p1.c p2.c
① The option -O1 tells the compiler to apply level-one optimizations.
② The option -O2 tells the compiler to apply level-two optimizations (usually the better choice).
③ The option -o p names the executable built from p1.c and p2.c "p".
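Steps GCC takes to convert the source code into executable code:
C preprocessor: expands the source code, inserting all files named by #include directives and expanding macros, and produces .i files.
Compiler: generates assembly code for the two source files and produces .s files.
Assembler: converts the assembly code into binary object code and produces .o files.
Linker: produces the executable file p.
Review: compiling and running a C program with gcc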
Preprocessing phase: converts the *.c file into a preprocessed *.i C program.
Compilation phase: compiles the *.i file into a *.s assembly code file.
Assembly phase: converts the *.s file into a *.o binary object code file.
Linking phase: converts the *.o files into an executable file.
Execution: runs the resulting executable.
(1) Machine-level code
Two kinds of abstraction:
(1) The instruction set architecture (ISA)
(2) The memory addresses used by a machine-level program are virtual addresses (the actual implementation combines multiple hardware memories with operating system software).
Processor state:
- Program counter (%eip on IA32; CS:IP on the 16-bit 8086)
- Integer registers (%eax, %ebx, %ecx, %edx, and so on; AX, BX, CX, DX on the 8086)
- Condition code register (OF, SF, ZF, AF, PF, CF)
- Floating-point registers
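(2) Code example
C source file code.c:
int accum = 0;
int sum(int x, int y)
{
    int t = x + y;
    accum += t;
    return t;
}
Generate the assembly code code.s:
gcc -O1 -S code.c
Compile and assemble the code to produce the object file code.o:
gcc -O1 -c code.c
Disassembler: inspect the object code file
objdump -d code.o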
III. Data formats
Data types:
Word: 16 bits
Double word: 32 bits
Quad word: 64 bits
Basic data types (IA32):
C declaration | Intel data type | Assembly code suffix | Size (bytes)
char | Byte | b | 1
short | Word | w | 2
int | Double word | l | 4
long int | Double word | l | 4
long long int | none | none | 8
char * | Double word | l | 4
float | Single precision | s | 4
double | Double precision | l | 8
long double | Extended precision | t | 10/12
IA32 has no hardware support for 64-bit integer arithmetic, so long long int has no Intel data type or suffix; the compiler handles it in 32-bit pieces.
Data movement instructions:
movb: moves a byte
movw: moves a word
movl: moves a double word
IV. Accessing information
(i) Operand specifiers
Operand: specifies the source value to use in an operation and the destination location in which to place the result.
Types of operand:
- Immediate (constant value). Example: $0x1f
- Register. Example: %ax
- Memory reference, accessed through an effective address
Addressing modes:
(1) Immediate addressing
(2) Register addressing
(3) Memory addressing
(4) Direct (absolute) addressing
(5) Register indirect addressing
(6) Register relative addressing (base register plus displacement)
(7) Indexed addressing
(8) Relative based indexed addressing (displacement plus base register plus index register)
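As a rough illustration of how the most general memory operand is evaluated, the sketch below computes an effective address as Imm + R[Eb] + R[Ei] * s. The function name effective_address and its parameters are hypothetical, chosen only for this illustration; the simpler modes are special cases (for example, register indirect addressing has a zero displacement and no index).

```c
#include <stdint.h>

/* Hypothetical helper: models the address computed for an operand written
 * Imm(Eb, Ei, s), that is Imm + R[Eb] + R[Ei] * s.
 * scale is expected to be 1, 2, 4, or 8; arithmetic wraps at 32 bits. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return imm + base + index * scale;
}
```

For instance, if %edx holds 0xf000 and %ecx holds 0x100, the operand 0x80(%edx,%ecx,4) refers to address effective_address(0x80, 0xf000, 0x100, 4), which is 0xf480.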
(ii) Data movement instructions
1. mov instructions: copy the value of the source operand to the destination operand
movb, movw, movl
2. The stack
- Follows the "last in, first out" (LIFO) principle
- The stack pointer points to the top element of the stack
- The top element has the lowest address of all elements in the stack
push: pushes data onto the stack
pop: pops data off the stack
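The stack behavior described above can be sketched with a small simulation in C. The names stack_mem, stack_top, push, and pop are hypothetical, and the array index stands in for %esp; this is a model of the idea rather than of real hardware, and it does no bounds checking.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Simulated memory: index 0 is the lowest address. */
uint32_t stack_mem[STACK_WORDS];

/* Simulated stack pointer as a word index; one past the end means empty. */
int stack_top = STACK_WORDS;

/* push: move the stack pointer to a lower address, then store the value. */
void push(uint32_t value)
{
    stack_top -= 1;
    stack_mem[stack_top] = value;
}

/* pop: read the top element, then move the stack pointer back up. */
uint32_t pop(void)
{
    uint32_t value = stack_mem[stack_top];
    stack_top += 1;
    return value;
}
```

Pushing 1 and then 2 leaves 2 at the lower index, so the first pop returns 2 and the second returns 1, which is the LIFO order.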
3. Data movement example
A pointer in C is really just an address. Dereferencing a pointer means placing the pointer in a register and then using that register in a memory reference.
Local variables are usually kept in registers.
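A small C function makes the two points above concrete. The sketch below is in the spirit of the textbook's exchange example; the comments describe what a compiler targeting IA32 would typically do, and the exact registers chosen are an assumption.

```c
/* Stores y at the location xp points to and returns the old value. */
int exchange(int *xp, int y)
{
    int x = *xp;   /* load: xp sits in a register, memory is read through it */
    *xp = y;       /* store: memory is written through the same register */
    return x;      /* the local x can live entirely in a register */
}
```

Calling exchange(&a, 3) with a equal to 4 leaves a equal to 3 and returns 4.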
V. Arithmetic and logical operations
Byte addition: addb
Word addition: addw
Double-word addition: addl
1. Load effective address
The load effective address instruction, leal, is a variant of the movl instruction; it corresponds to the LEA instruction in Intel assembly syntax.
Instead of reading memory, it writes the effective address itself to the destination operand, which must be a register.
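Because leal only computes an address and never touches memory, compilers often use it for plain arithmetic. The following sketch shows C expressions of the shape that fits an effective address; the function names are hypothetical, and the instructions in the comments are what a compiler might plausibly emit, not guaranteed output.

```c
/* x + 4*y + 7 has the form displacement + base + index * scale. */
int scale_sum(int x, int y)
{
    return x + 4 * y + 7;   /* could map to: leal 7(%eax,%edx,4), %eax */
}

/* 9*x can be written as x + 8*x, again a single address computation. */
int times_nine(int x)
{
    return x + 8 * x;       /* could map to: leal (%eax,%eax,8), %eax */
}
```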
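2. Unary and binary operations
(1) Unary operations
There is only one operand, which serves as both source and destination. It can be a register or a memory location.
Example: incl and decl, which correspond to the C increment (++) and decrement (--) operators.
(2) Binary operations
The second operand is both a source and the destination.
The first operand can be an immediate, a register, or a memory location.
The second operand can be a register or a memory location.
The two operands cannot both be memory locations.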
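3. Shift operations
Left shifts fill with 0 on the right: sal (arithmetic left shift) and shl (logical left shift)
Right shift that fills with the sign bit: sar (arithmetic right shift)
Right shift that fills with 0: shr (logical right shift)
The shift amount (the source operand) can be an immediate or the single-byte register %cl.
The destination operand of a shift can be a register or a memory location.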
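The difference between the two right shifts is easiest to see on a value with the top bit set. The sketch below models 32-bit shr and sar on unsigned values so that it does not depend on how a particular C compiler treats >> on negative numbers; the function names are hypothetical and the shift amount k is assumed to be between 0 and 31.

```c
#include <stdint.h>

/* shr: logical right shift, vacated high bits are filled with 0. */
uint32_t shr32(uint32_t x, unsigned k)
{
    return x >> k;
}

/* sar: arithmetic right shift, vacated high bits copy the sign bit. */
uint32_t sar32(uint32_t x, unsigned k)
{
    uint32_t shifted = x >> k;
    if ((x & 0x80000000u) && k > 0)
        shifted |= ~(0xFFFFFFFFu >> k);   /* set the top k bits */
    return shifted;
}

/* sal and shl behave identically: vacated low bits are filled with 0. */
uint32_t shl32(uint32_t x, unsigned k)
{
    return x << k;
}
```

With x = 0xFFFFFFF0 (the bit pattern of -16) and k = 4, shr32 gives 0x0FFFFFFF while sar32 gives 0xFFFFFFFF (the bit pattern of -1).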
4. Special arithmetic operations
VI. Control
1. Condition codes
CF: carry flag
ZF: zero flag
SF: sign flag
OF: overflow flag
Note:
- leal does not change the condition code register.
- cmp differs from sub: cmp also sets the condition codes according to the difference of its two operands, but it only sets the condition codes and does not update the destination register.
- Conditional jumps test the status register (which the textbook calls the condition code register).
Common instructions:
mov: does not affect the flags
push, pop: do not affect the flags
xchg (exchange): does not affect the flags
xlat (table look-up translation): does not affect the flags
lea (load effective address into a register): does not affect the flags
pushf (push the flags onto the stack): does not affect the flags
popf (pop the flags off the stack): the flags are set from the popped value
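2. Accessing the condition codes
The condition codes are usually not read directly. There are three common ways to use them:
- Set a single byte to 0 or 1 depending on some combination of the condition codes;
- Conditionally jump to some other part of the program;
- Conditionally transfer data.
The set instructions belong to the first group: they set a byte to 0 or 1 according to the condition codes, which typically come from a comparison that computes t = a - b.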
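To make the link between a comparison and the set instructions concrete, here is a minimal C model of the four condition codes after computing t = a - b. The struct flags type and the function names are hypothetical; the flag rules (CF as unsigned borrow, OF as signed overflow) and the combinations used by setl, setb, and sete follow the standard x86 definitions.

```c
#include <stdint.h>

/* Hypothetical record of the four condition codes discussed above. */
struct flags { int cf, zf, sf, of; };

/* Models a comparison of a with b: computes t = a - b, keeps only flags. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t ut = ua - ub;                  /* wraps modulo 2^32 */
    int sa = (int)(ua >> 31), sb = (int)(ub >> 31), st = (int)(ut >> 31);
    struct flags f;
    f.cf = ua < ub;                         /* unsigned borrow */
    f.zf = (ut == 0);                       /* result is zero */
    f.sf = st;                              /* result is negative */
    f.of = (sa != sb) && (st != sa);        /* signed overflow */
    return f;
}

int setl(struct flags f) { return f.sf ^ f.of; }  /* signed less than */
int setb(struct flags f) { return f.cf; }         /* unsigned below */
int sete(struct flags f) { return f.zf; }         /* equal */
```

Comparing -1 with 1 gives setl = 1 but setb = 0, because the same bit pattern 0xFFFFFFFF is a large value when read as unsigned.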
3. Jump instructions and their encodings
A jump instruction causes execution to switch to a completely new position in the program; the destination is usually indicated by a label.
The jmp instruction is an unconditional jump. It can be a direct jump or an indirect jump:
Direct jump: the jump target is given by a label that follows the instruction.
Indirect jump: the jump target is given by "*" followed by an operand specifier.
The difference between the two indirect forms:
jmp *%eax uses the value in register %eax as the jump target.
jmp *(%eax) reads the jump target from memory, using the value in %eax as the read address.
The most important instructions in control are the jumps:
Conditional jumps (used to implement if, switch, while, and for)
Unconditional jumps with jmp (used for goto)
With PC-relative addressing, the value of the program counter is the address of the instruction that follows the jump, not the address of the jump instruction itself.
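The PC-relative rule above amounts to a small piece of arithmetic. The sketch below resolves the target of a jump encoded with a one-byte signed offset; jump_target and its parameters are hypothetical names, and the addresses in the usage note are made-up values for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: jump_addr is the address of the jump instruction,
 * jump_len is its length in bytes (2 for a short jump: opcode + offset),
 * and offset is the signed displacement stored in the instruction. */
uint32_t jump_target(uint32_t jump_addr, uint32_t jump_len, int8_t offset)
{
    uint32_t next_instr = jump_addr + jump_len;   /* PC value that is used */
    return next_instr + (uint32_t)(int32_t)offset;
}
```

A 2-byte jump at address 0x8 with offset 0x0d targets 0xa + 0xd = 0x17, and a 2-byte jump at address 0x10 with offset 0xf8 (that is, -8) targets 0x12 - 8 = 0xa.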
4. Translating conditional branches
The most common way to translate conditional expressions and statements from C into machine code is to combine conditional and unconditional jumps.
Unconditional jumps correspond to goto. The example in the book first rewrites the if-else statement in goto form, and that form is then translated into assembly language.
The general form of an if-else statement in C:
if (test-expr)
    then-statement
else
    else-statement
Goto form, which mirrors the assembly structure:
t = test-expr;
if (!t)
    goto false;
then-statement
goto done;
false:
else-statement
done:
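Applied to a concrete function, the goto template above looks as follows. This is a sketch along the lines of the textbook's absolute-difference example; the label false_branch is used in place of false only to avoid clashing with the boolean constant in newer C standards.

```c
/* Ordinary if-else version. */
int absdiff(int x, int y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Same logic in goto form, mirroring the control flow of the assembly. */
int gotodiff(int x, int y)
{
    int result;
    int t = x < y;          /* t = test-expr */
    if (!t)
        goto false_branch;
    result = y - x;         /* then-statement */
    goto done;
false_branch:
    result = x - y;         /* else-statement */
done:
    return result;
}
```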
5. Loops
Assembly has no dedicated loop construct; loops are built by combining conditional tests and jumps. Most compilers first convert the other loop forms into do-while form.
(1) do-while loops
General form:
do
    body-statement
while (test-expr);
The loop body body-statement is executed at least once.
It can be translated into the following conditionals and goto statements:
loop:
    body-statement
    t = test-expr;
    if (t)
        goto loop;
That is, the loop body is executed first and the test expression is evaluated afterwards.
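A factorial function gives a small worked instance of the do-while translation. The function names are hypothetical; both versions compute the same result for n >= 1.

```c
/* do-while version: the body runs before the first test. */
int fact_do(int n)
{
    int result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);
    return result;
}

/* goto version: body first, then a conditional jump back to the top. */
int fact_do_goto(int n)
{
    int result = 1;
loop:
    result *= n;            /* body-statement */
    n = n - 1;
    if (n > 1)              /* t = test-expr; if (t) goto loop; */
        goto loop;
    return result;
}
```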
(2) while loops
General form of a while statement in C:
while (test-expr)
    body-statement
Goto form, which mirrors the assembly structure:
t = test-expr;
if (!t)
    goto done;
loop:
    body-statement
    t = test-expr;
    if (t)
        goto loop;
done:
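The same factorial computation written as a while loop shows the extra initial test that guards the do-while style body. The function names are hypothetical.

```c
/* while version: the test comes first, so the body may run zero times. */
int fact_while(int n)
{
    int result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* goto version: an initial test skips the loop, then do-while structure. */
int fact_while_goto(int n)
{
    int result = 1;
    if (!(n > 1))           /* t = test-expr; if (!t) goto done; */
        goto done;
loop:
    result *= n;            /* body-statement */
    n = n - 1;
    if (n > 1)
        goto loop;
done:
    return result;
}
```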
(3) for loops
The general form of a for statement in C:
for (init-expr; test-expr; update-expr)
    body-statement
Goto form, which mirrors the assembly structure:
init-expr;
t = test-expr;
if (!t)
    goto done;
loop:
    body-statement
    update-expr;
    t = test-expr;
    if (t)
        goto loop;
done:
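Here is the for-loop template applied to factorial, as a sketch with hypothetical function names. The comments mark which line plays which role in the template.

```c
/* for version. */
int fact_for(int n)
{
    int result = 1;
    int i;
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* goto version following the template. */
int fact_for_goto(int n)
{
    int result = 1;
    int i = 2;              /* init-expr */
    if (!(i <= n))          /* initial test */
        goto done;
loop:
    result *= i;            /* body-statement */
    i++;                    /* update-expr */
    if (i <= n)             /* test-expr */
        goto loop;
done:
    return result;
}
```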
6. Conditional move instructions (see the book for details)
7. switch statements
A switch statement is the typical multi-way branch (this is already well understood; see the book for details).
VII. Procedures
Stack frame structure
The portion of the stack allocated for a single procedure is called a stack frame. Register %ebp serves as the frame pointer and register %esp as the stack pointer. The stack pointer can move while the procedure executes, so most information is accessed relative to the frame pointer.
The stack grows toward lower addresses, and the stack pointer %esp points to the top element of the stack.
Transferring control
call: the target is the address of the first instruction of the called procedure. The effect is to push the return address onto the stack and jump to the start of the called procedure.
ret: pops an address off the stack and jumps to that location.
The function return value is stored in %eax.
1. call
The call instruction resembles the jump instructions: it can be direct or indirect. The target of a direct call is a label; the target of an indirect call is "*" followed by an operand specifier, as with jmp.
The effect of the call instruction is to push the return address onto the stack and jump to the start of the called procedure. The return address is the address of the instruction immediately following the call in the program.
The called procedure later returns with ret.
2. ret
ret pops an address off the stack and jumps to that location.
In last semester's assembly language course, call and ret were often used to call subroutines and submodules.
3. leave
This instruction prepares the stack for returning. It is equivalent to:
movl %ebp, %esp
popl %ebp
GDB commands for viewing function call stack information:
- backtrace n (or bt n)
n is a positive integer; only the innermost n frames (the top of the stack) are printed.
- backtrace -n (or bt -n)
-n is a negative integer; only the outermost n frames (the bottom of the stack) are printed.
- frame n
n is an integer starting from 0 that gives the frame number in the stack. The command selects frame n and prints information about it. If n is omitted, information about the current frame is printed.
- up n
Moves n frames up the stack (toward the callers). If n is omitted, it moves up one frame.
- down n
Moves n frames down the stack (toward the most recently called function). If n is omitted, it moves down one frame.
Register usage conventions
Registers %eax, %edx, and %ecx are caller-save registers.
Registers %ebx, %esi, and %edi are callee-save registers.
Procedure example
GCC follows the convention that the total stack space used by a function must be a multiple of 16 bytes, including the 4 bytes for the saved %ebp value and the 4 bytes for the return address.
Recursive procedures
A recursive call is handled just like a call to any other function. The stack discipline gives each function invocation its own private storage for state information (the saved return location, the saved frame pointer, and the values of the callee-save registers).
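A recursive factorial is a minimal example of this. In the sketch below (the name rfact is only illustrative), every active call has its own stack frame and therefore its own copy of n, which is why the multiplication after the recursive call still sees the right value.

```c
/* Each invocation keeps its own n, either in its frame or in a
 * callee-save register that is saved and restored around the call. */
int rfact(int n)
{
    if (n <= 1)
        return 1;               /* base case: no further call */
    return n * rfact(n - 1);    /* n must survive the recursive call */
}
```

Calling rfact(5) creates five nested invocations before the results are multiplied back together on the way out, giving 120.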
Supplementary assembly language instructions:
Arithmetic instructions:
add (addition): affects the flags
adc (add with carry): affects the flags
inc (increment by one): does not affect CF, affects the other flags
sub (subtraction): affects the flags
sbb (subtract with borrow): affects the flags
dec (decrement by one): does not affect CF, affects the other flags
neg (negate): affects the flags; CF is 0 only when the operand is 0 and 1 otherwise; OF is 1 only when the operand is the most negative value (for example, negating -128 in a byte operation) and 0 otherwise
cmp (compare): performs a subtraction but does not store the result; it only sets the condition flags based on the result
mul (unsigned multiplication), imul (signed multiplication): the condition codes other than CF and OF are undefined (that is, in an indeterminate state)
div (unsigned division), idiv (signed division): all condition codes are undefined after a division
Bit operation instructions:
and (logical AND), or (logical OR), xor (exclusive OR), test: these four set CF and OF to 0, leave AF undefined, and set SF, ZF, and PF based on the result
not (logical NOT): does not affect the flags
Shift instructions:
shl (logical left shift), shr (logical right shift): set SF, ZF, and PF based on the result
rol (rotate left), ror (rotate right): the rotate instructions do not affect the condition codes other than CF and OF
String processing instructions:
movs (string move), stos (store string), lods (load string): do not affect the condition codes
cmps (string compare), scas (string scan): do not save the result; they only set the condition codes based on it
Control transfer instructions:
jmp (unconditional jump): does not affect the condition codes
References:
20135202 Kang Jiaxin, summary of Chapter 3, "Machine-Level Representation of Programs"
20135223 He Weizin, Fundamentals of Information Security System Design, week 5 study summary