Chapter 3: Machine-Level Representation of Programs
1. Historical Perspective
The GCC C compiler produces its output in the form of assembly code, a textual representation of the machine code that gives each instruction in the program. The x86 architecture first appeared in the Intel 8086 CPU, launched in 1978. The 8086 grew out of Intel's earlier 8-bit processors, the 8008 and its successor the 8080, which in turn followed the 4-bit 4004. Three years later, in 1981, IBM chose the 8088 (a variant of the 8086 with an 8-bit external data bus) for the original IBM PC. x86 then became the standard PC platform and one of the most commercially successful CPU architectures ever.
2. Program code
- The format and behavior of a machine-level program are defined by the instruction set architecture (ISA). The ISA specifies the processor state, the format of the instructions, and the effect of each instruction on the state. It describes the program's behavior as if the instructions were executed one at a time, in sequence.
- The memory model that a machine-level program sees appears to be one very large array of bytes. The actual memory system combines several kinds of hardware memory with operating system software.
- IA32 follows this sequential model: the ISA describes program behavior as if each instruction completes before the next one begins, even though the actual hardware may execute many instructions concurrently.
Binary files can be inspected with the od command, or with GDB's x (examine) command. Because the output can be very long, you can pipe it into more or less, or redirect it to a file:
od code.o | more
od code.o > code.txt
- Program memory contains:
  - the program's executable machine code;
  - some information required by the operating system;
  - the run-time stack;
  - the heap (memory allocated at run time).

  Program memory is addressed using virtual addresses. The operating system manages this virtual address space and translates virtual addresses into physical addresses in the actual processor memory.
3. Data format
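Instruction suffixes for data sizes:
b: byte, 8 bits
w: word, 16 bits
l: double word, 32 bits
IA32 representation of the basic C data types: the most common data types (int, long, and pointers) are stored as double words (4 bytes).
Floating-point types: single precision (4 bytes) is float, double precision (8 bytes) is double, and extended precision (10 bytes) is long double.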
4. Accessing Information
Global variables are stored at fixed addresses (fixed, at least, from the program's point of view). The most common way to access such a variable is to give its fixed address explicitly in the instruction:
MOV EAX, [12341234h]          ; Load EAX with the value stored at address 12341234h
INC DWORD PTR Test2!_ncount   ; Increment the DWORD variable ncount
Heap variables live on the heap and are accessed through pointers, so accessing one typically takes more than one instruction:
MOV ESI, test2!mpfilelist     ; Load the pointer
MOV EAX, [ESI+4]              ; Read the second DWORD (pszname) of the heap block
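- The stack follows the "last in, first out" (LIFO) principle. On IA32 the stack grows toward lower addresses, so the top element of the stack has the lowest address of all the elements in the stack.
- Note: a MOV instruction cannot have both its source and its destination in memory. To copy a value from one memory location to another, first load it into a register, then store it from that register.
- Effective address calculation: for the operand form $\mathit{Imm}(E_b, E_i, s)$, the effective address is $\mathit{Imm} + R[E_b] + R[E_i] \cdot s$, where the scale factor $s$ must be 1, 2, 4, or 8.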
JMP instruction
JMP sets the EIP register to the jump target address, so execution continues at that target. It stores nothing on the stack and does not change any flags. JMP implements unconditional branches. Most if-then-else constructs need at least one JMP, so that the code for the then part can skip over the else part.
INT instruction
INT n raises a software interrupt with interrupt number n. It is similar to CALL, except that it also pushes the EFLAGS register onto the stack, along with CS and the return EIP. If INT is executed in user mode, the processor switches to the kernel-mode stack and additionally pushes the user-mode SS and ESP. When the interrupt handler finishes, it executes IRET, which restores EIP, CS, and EFLAGS from the stack (and SS and ESP as well when returning to user mode).
Conditional jump instructions
The LOOP instruction is used to implement counted loops from high-level languages. It first decrements ECX (the counter) by one. If ECX is then nonzero, it jumps to the target address; otherwise, execution falls through to the next instruction.
XOR EAX, EAX   ; Clear EAX register
MOV ECX, 5     ; Load loop count
START:
ADD EAX, 1     ; Add one to EAX
LOOP START     ; Decrement ECX; jump to START while ECX != 0
Conditional jump instructions test whether a specified condition holds and jump only if it does. For example, JNZ (jump if not zero) transfers control to the operand address when the zero flag ZF is 0, and falls through when ZF is 1. These instructions are mainly used to implement if statements. They also implement loops, as in the following code, which has the same effect as the LOOP example:
XOR EAX, EAX   ; Clear EAX
MOV ECX, 5     ; Load loop count
START:
ADD EAX, 1     ; Add one to EAX
DEC ECX        ; Decrement loop counter (sets ZF when ECX reaches 0)
JNZ START      ; Jump back while ZF = 0
5. Arithmetic and logic operations
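- Four groups of operations (operands are written in the AT&T order used by GCC, source first):
  - Load effective address (leal): computes an address-style expression and writes the result to a register without accessing memory.
  - Unary operations: there is a single operand, which can be a register or a memory location.
  - Binary operations: the first operand is the source, which can be an immediate, a register, or a memory location. The second operand is the destination, which can be a register or a memory location. The two operands cannot both be memory locations.
  - Shifts: the first operand is the shift amount, encoded as a single byte, and only shift amounts from 0 to 31 are possible. It is given either as an immediate or in the single-byte register %cl. The second operand is the value to be shifted.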
6. Control
The most important control mechanism is the jump instruction:
Conditional jumps (used to implement if, switch, while, and for)
Unconditional jumps, JMP (used to implement goto)
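- Condition code registers: single-bit flags that describe attributes of the most recent arithmetic or logical operation. They can be tested to perform conditional branches. The most common condition codes are:
CF: carry flag, set when the most recent operation produced a carry out of the most significant bit (detects unsigned overflow)
ZF: zero flag, set when the result is zero
SF: sign flag, set when the result is negative
OF: overflow flag, set when the operation caused a two's-complement (signed) overflow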
7. Procedures
- The stack is used to pass arguments, store return information, save registers, and provide local storage.
- The topmost stack frame is delimited by two pointers: register %ebp serves as the frame pointer, and register %esp serves as the stack pointer. The stack pointer can move while the procedure executes, so most information is accessed relative to the frame pointer.
The program registers are a single resource shared by all procedures.
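By convention, registers %eax, %edx, and %ecx are caller-saved registers, while %ebx, %esi, and %edi are callee-saved registers. %ebp and %esp must also be maintained according to the conventions. %eax is used to hold the return value.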
- The CALL instruction has a target that gives the address of the first instruction of the called procedure. Its effect is to push the return address (the address of the instruction immediately following the CALL) onto the stack and then jump to the start of the called procedure.
- The RET instruction pops an address from the stack and jumps to that location. For this to work correctly, the stack pointer must point to the location where the corresponding CALL stored the return address.
20135304 Liu Xipeng: Fundamentals of Information Security System Design, Week 4 study summary