Section 4: Machine-Level Representation of Programs
I. A Historical View of x86
The x86 architecture first appeared in the Intel 8086 CPU, launched in 1978. The 8086 descended from the Intel 8008 (by way of the 8080), and the 8008 in turn followed the Intel 4004. Three years later, in 1981, IBM chose the 8088, a variant of the 8086, for the IBM PC. x86 then became the standard PC platform and one of the most successful CPU architectures.
II. Program Code
Machine-Level Code
Computer systems use several different forms of abstraction, hiding implementation details behind simpler abstract models.
For machine-level programming, two of these abstractions are particularly important:
① The instruction set architecture (ISA)
It defines the processor state, the format of the instructions, and the effect each instruction has on the state.
The IA32 ISA describes a program's behavior as if each instruction were executed in sequence, with one instruction finishing before the next one begins. (In reality, the processor executes many instructions concurrently, but it takes steps to ensure that the overall behavior exactly matches the sequential order specified by the ISA.)
② The memory addresses used by a machine-level program are virtual addresses
The memory model this provides looks like one very large byte array. The actual memory system is implemented by combining several kinds of hardware memory with operating system software.
③ Program memory contains the program's executable machine code, some information the operating system needs, the stack, and the heap. Program memory is addressed using virtual addresses. The operating system manages this virtual address space and translates virtual addresses into physical addresses in the actual processor memory.
III. Data Formats
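Assembly code has no data-type declarations; instead, the size of each operand is indicated by a suffix on the instruction (in AT&T syntax).
The suffixes include b (byte), w (word), l (double word, or double precision for floating point), s (single precision), and so on.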
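To make the suffix rule concrete, here is a small C sketch that maps an integer operand size to its AT&T suffix. The helper name suffix_for_size is hypothetical, it covers integer sizes only (the floating-point meanings of s and l are left out), and q (quad word, 8 bytes) exists only on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map an integer operand size in bytes to the
   AT&T-syntax instruction suffix (movb, movw, movl, movq). */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1: return 'b';   /* byte */
    case 2: return 'w';   /* word */
    case 4: return 'l';   /* double word ("long") */
    case 8: return 'q';   /* quad word, x86-64 only */
    default: return '?';  /* no single integer suffix for other sizes */
    }
}
```

For example, on typical x86 platforms sizeof(int) is 4, so assigning an int usually compiles to a movl; the exact instruction is up to the compiler.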
IV. Accessing Information
As in higher-level programming languages, assembly code can access variables in several ways. There are three basic places to store variables:
1. Global/static variables: allocated in the program's data section
2. Local variables/parameters: allocated on the stack
3. Heap variables: allocated on the heap
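The three storage places can be seen side by side in a short C sketch. The names g_counter and storage_demo are made up for illustration; where each variable actually ends up (for instance, whether local lives in the stack frame or only in a register) is the compiler's decision.

```c
#include <assert.h>
#include <stdlib.h>

int g_counter = 10;                     /* global: data section, fixed address */

int storage_demo(int param) {           /* param: passed on the stack under IA32 cdecl */
    static int s_calls = 0;             /* static: also in the data section, keeps its value */
    int local = param * 2;              /* local: in this call's stack frame (or a register) */
    int *heap = malloc(sizeof *heap);   /* heap: allocated at run time, reached via a pointer */
    if (heap == NULL) return -1;
    *heap = local + g_counter;
    s_calls++;
    int result = *heap + s_calls;
    free(heap);
    return result;
}
```

Calling storage_demo(1) twice returns 13 and then 14: the static counter keeps its value between calls, while the local and heap variables are created anew each time.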
① Global and static variables
Global variables are stored at fixed addresses (fixed, at least, from the program's point of view). The most common way to access them is to give that fixed address explicitly in the instruction:
MOV EAX, [12341234h] ; Load EAX with the value stored at address 12341234h
INC DWORD PTR Test2!_ncount ; Increment the DWORD variable ncount
Note that a debugger shows the symbolic name (such as Test2!_ncount) instead of the raw address when symbol information is available.
② Local variables and parameters
Local variables and parameters live on the stack and are accessed relative to EBP (sometimes ESP). Optimized code often eliminates the dependency on the base pointer (frame pointer); in that case ESP is used to access local variables, and EBP is freed up as an additional general-purpose register. When a standard frame pointer is used, the instructions look like this:
MOV EAX, [EBP+8] ; Load EAX with an argument
MOV EAX, [EBP-4] ; Load EAX with a local variable
A handy rule of thumb, valid whenever EBP serves as the frame pointer rather than as a general-purpose register (which is most of the time): a positive displacement from EBP accesses a parameter, and a negative displacement accesses a local variable.
Note that the first argument passed to a function is typically at [EBP+8], because [EBP] holds the caller's saved EBP and [EBP+4] holds the return address.
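The [EBP+8] rule can be checked with a toy model in C. It is only a simulation under stated assumptions: a 256-byte array stands in for memory, esp and ebp are plain variables, and build_frame (a hypothetical name) performs the pushes that a cdecl call f(a, b) followed by the usual PUSH EBP / MOV EBP, ESP / SUB ESP, 4 prologue would perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model (hypothetical): 256 bytes of memory; the stack grows toward address 0. */
static uint8_t mem[256];
static uint32_t esp = sizeof mem;   /* stack starts empty at the top */
static uint32_t ebp = 0;            /* caller's frame pointer (0 = none) */

static void store32(uint32_t addr, uint32_t v) { memcpy(&mem[addr], &v, 4); }
static uint32_t load32(uint32_t addr) { uint32_t v; memcpy(&v, &mem[addr], 4); return v; }
static void push(uint32_t v) { esp -= 4; store32(esp, v); }

/* Simulate the caller's pushes for f(a, b), the CALL, and a standard prologue. */
void build_frame(uint32_t a, uint32_t b, uint32_t ret_addr, uint32_t local) {
    push(b);                 /* cdecl pushes arguments right to left... */
    push(a);                 /* ...so the first argument ends up lowest */
    push(ret_addr);          /* CALL pushes the return address */
    push(ebp);               /* PUSH EBP: save the caller's frame pointer */
    ebp = esp;               /* MOV EBP, ESP */
    esp -= 4;                /* SUB ESP, 4: room for one local */
    store32(ebp - 4, local); /* MOV DWORD PTR [EBP-4], local */
}
```

After build_frame(11, 22, 0x1000, 99), load32(ebp + 8) yields 11 (the first argument), load32(ebp + 12) yields 22, load32(ebp + 4) yields the return address 0x1000, and load32(ebp - 4) yields the local value 99.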
③ Heap variables
Heap variables live on the heap and are accessed through pointers. Typically, accessing a heap variable takes more than one instruction: one to load the pointer and one to read through it.
MOV ESI, Test2!_m_pfilelist ; Load the pointer
MOV EAX, [ESI+4] ; Read the second DWORD (pszname) of the heap block
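In C, the same two-step access looks like the sketch below. struct file_entry, make_list, and read_second_dword are hypothetical; the struct uses two 32-bit fields so that the second one sits at offset 4, matching [ESI+4]. Whether a compiler really emits exactly the two MOVs shown above depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap record: two 32-bit fields. */
struct file_entry {
    uint32_t size;      /* first DWORD, offset 0 */
    uint32_t name_id;   /* second DWORD, offset 4 (stands in for pszname) */
};

/* Global pointer to a heap block, playing the role of Test2!_m_pfilelist. */
struct file_entry *m_pfilelist = NULL;

int make_list(uint32_t size, uint32_t name_id) {
    m_pfilelist = malloc(sizeof *m_pfilelist);
    if (m_pfilelist == NULL) return 0;
    m_pfilelist->size = size;
    m_pfilelist->name_id = name_id;
    return 1;
}

uint32_t read_second_dword(void) {
    struct file_entry *p = m_pfilelist;  /* step 1: load the pointer (MOV ESI, ...) */
    return p->name_id;                   /* step 2: read at offset 4 (MOV EAX, [ESI+4]) */
}
```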
Another important point is that most compilers keep frequently accessed variables in registers for faster access. This is especially true on RISC (reduced instruction set computer) architectures.
Execution Flow Control
Control-flow instructions are either conditional (they transfer control only when a condition is met) or unconditional. They are used to implement function calls, if-then-else, switch-case, and other high-level language constructs.
① Unconditional jump instructions
1. JMP
This instruction simply loads EIP with the target address, so execution continues there. Nothing is stored on the stack and no flags are modified. JMP is used for unconditional branches; most if-then-else constructs require at least one JMP.
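A common way to see how if-then-else turns into jumps is to rewrite it in C with goto. The function absdiff_goto below is an illustrative example, not code from the original source: the if (...) goto line plays the role of a conditional jump, and the plain goto done plays the role of the JMP that skips over the else branch.

```c
#include <assert.h>

/* if (x < y) result = y - x; else result = x - y;  written in "goto form". */
int absdiff_goto(int x, int y) {
    int result;
    if (x >= y) goto x_ge_y;   /* conditional jump to the else branch */
    result = y - x;            /* then branch */
    goto done;                 /* unconditional JMP over the else branch */
x_ge_y:
    result = x - y;            /* else branch */
done:
    return result;
}
```

Both absdiff_goto(3, 10) and absdiff_goto(10, 3) return 7.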
2. CALL
This instruction first pushes the return address (the address of the instruction after the CALL, which is the current value of EIP) onto the stack, and then sets EIP to the address of the called function. Keeping the return address on the stack lets the program resume at the instruction following the CALL once the called function has finished.
For JMP and CALL, the operand can be a fixed address, a register, or a memory location holding the target address.
3. RET
RET pops the value at the top of the stack into EIP. An optional immediate operand (RET n) additionally adds n bytes to ESP after the pop, which removes parameters that were passed on the stack.
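The effect of CALL and RET n on EIP and ESP can be traced with a toy C model. The registers are ordinary variables, slots is a made-up 64-entry stack, and the 5-byte call length assumes a near CALL with a 32-bit displacement; do_call and do_ret are hypothetical names. Having the callee clean up its own arguments with RET n is what the stdcall convention does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CPU: ESP is a byte address over an array of 4-byte slots. */
static uint32_t slots[64];
static uint32_t esp = 64 * 4;   /* stack top; the stack grows downward */
static uint32_t eip = 0;

static void push(uint32_t v) { esp -= 4; slots[esp / 4] = v; }
static uint32_t pop(void) { uint32_t v = slots[esp / 4]; esp += 4; return v; }

/* CALL target: push the address of the instruction after the CALL, then jump. */
void do_call(uint32_t target, uint32_t call_len) {
    push(eip + call_len);
    eip = target;
}

/* RET n: pop the return address into EIP, then add n bytes to ESP,
   discarding the arguments the caller pushed. */
void do_ret(uint16_t n) {
    eip = pop();
    esp += n;
}
```

Pushing two 4-byte arguments, calling do_call(0x200, 5) from EIP = 0x100, and then executing do_ret(8) leaves EIP at 0x105 and ESP back where it started, since RET 8 discarded both arguments.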
4. INT
INT n raises software interrupt number n. It behaves like CALL, except that EFLAGS is pushed onto the stack along with the return address (CS:EIP). If INT is executed in user mode, the processor also switches to kernel mode and to the kernel stack, and the user-mode SS and ESP are pushed as well. When the interrupt handler finishes, it executes IRET, which restores EIP, CS, and EFLAGS (and, after a privilege change, SS and ESP) from the stack.
② Conditional jump instructions
1. LOOP address
The LOOP instruction is used to implement loops from high-level languages, with ECX as the counter. It first decrements ECX by one; if ECX is then nonzero, it jumps to the target address, and if ECX has reached 0, execution falls through to the next instruction.
XOR EAX, EAX ; Clear EAX
MOV ECX, 5 ; Load loop count
START:
ADD EAX, 1 ; Add one to EAX
LOOP START ; Decrement ECX, jump to START while ECX != 0 (EAX ends up as 5)
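The LOOP semantics above (decrement first, then test) correspond to a do-while loop. In this C sketch, eax and ecx are ordinary variables named after the registers, and loop_example is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

uint32_t loop_example(uint32_t count) {
    uint32_t eax = 0;        /* XOR EAX, EAX */
    uint32_t ecx = count;    /* MOV ECX, count */
    do {
        eax += 1;            /* START: ADD EAX, 1 */
        ecx -= 1;            /* LOOP START: decrement ECX first... */
    } while (ecx != 0);      /* ...then jump back while ECX != 0 */
    return eax;
}
```

loop_example(5) returns 5, matching the assembly. As with the real instruction, starting with a count of 0 would wrap ECX around to 0xFFFFFFFF and run 2^32 iterations.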
2. JNZ, JE, etc.
A conditional jump tests whether its condition holds and jumps only if it does. For example, JNZ (jump if not zero) transfers control to the operand address only when the zero flag (ZF) is 0; when ZF is 1, execution falls through to the next instruction. These instructions are mainly used to implement if statements and loops.
XOR EAX, EAX ; Clear EAX
MOV ECX, 5 ; Load loop count
START:
ADD EAX, 1 ; Add one to EAX
DEC ECX ; Decrement loop counter (sets ZF when ECX reaches 0)
JNZ START ; Jump back to START while ZF = 0
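Both loops above jump backward to START. Relative jumps such as JMP, JNZ, and LOOP do not store START's absolute address; they store the displacement from the end of the jump instruction to the target. Here is a small C sketch of that arithmetic, plus a check for whether the displacement fits the one-byte short form (LOOP only has the short form). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Displacement stored in a relative jump: target minus the address of the
   instruction that follows the jump (jump address + jump length). */
int32_t jump_displacement(uint32_t jmp_addr, uint32_t jmp_len, uint32_t target) {
    return (int32_t)(target - (jmp_addr + jmp_len));  /* wraps, then read as signed */
}

/* Short forms (JMP EB cb, JNZ 75 cb, LOOP E2 cb) hold one signed byte. */
bool fits_short_jump(int32_t disp) {
    return disp >= -128 && disp <= 127;
}
```

For example, a 2-byte short jump at address 0x100 that targets itself has displacement -2, which is why a jump to itself assembles as EB FE.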
V. Control
A program does not always run straight through; it needs branching and flow control. High-level languages have branches and loops; assembly has jumps, either unconditional or conditional. The jump instruction itself is very simple: JMP works like goto in C, with the syntax "label: ... JMP label". Jumps come in three kinds: short jumps (target within -128 to +127 bytes), near jumps (anywhere in the current segment), and far jumps (to another segment in segmented mode). An AT&T-syntax assembler chooses the right encoding from the operand automatically, whereas with MASM the type may need to be specified explicitly, as in JMP NEAR PTR label or JMP FAR PTR label. However, if JMP were the only kind of jump, a backward jump would always produce an infinite loop, so there are also conditional jumps (Jcond) that jump only under certain conditions. These conditions are expressed in terms of the flag bits in EFLAGS, as follows:
| Instruction | Jump condition | EFLAGS flags |
|---|---|---|
| JA | Jump if above | CF=0 and ZF=0 |
| JAE | Jump if above or equal | CF=0 |
| JB | Jump if below | CF=1 |
| JBE | Jump if below or equal | CF=1 or ZF=1 |
| JC | Jump if carry | CF=1 |
| JCXZ | Jump if CX=0 | Register CX=0 (not a flag) |
| JE (same as JZ) | Jump if equal | ZF=1 |
| JG | Jump if greater (signed) | ZF=0 and SF=OF |
| JGE | Jump if greater or equal (signed) | SF=OF |
| JL | Jump if less (signed) | SF≠OF |
| JLE | Jump if less or equal (signed) | ZF=1 or SF≠OF |
| JMP | Unconditional jump | - |
| JNA | Jump if not above | CF=1 or ZF=1 |
| JNAE | Jump if not above or equal | CF=1 |
| JNB | Jump if not below | CF=0 |
| JNBE | Jump if not below or equal | CF=0 and ZF=0 |
| JNC | Jump if not carry | CF=0 |
| JNE | Jump if not equal | ZF=0 |
| JNG | Jump if not greater (signed) | ZF=1 or SF≠OF |
| JNGE | Jump if not greater or equal (signed) | SF≠OF |
| JNL | Jump if not less (signed) | SF=OF |
| JNLE | Jump if not less or equal (signed) | ZF=0 and SF=OF |
| JNO | Jump if not overflow (signed) | OF=0 |
| JNP | Jump if not parity | PF=0 |
| JNS | Jump if not sign (result non-negative) | SF=0 |
| JNZ | Jump if not zero | ZF=0 |
| JO | Jump if overflow (signed) | OF=1 |
| JP | Jump if parity | PF=1 |
| JPE | Jump if parity even | PF=1 |
| JPO | Jump if parity odd | PF=0 |
| JS | Jump if sign (result negative) | SF=1 |
| JZ | Jump if zero | ZF=1 |
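The flag conditions in the table can be checked with a C sketch that computes CF, ZF, SF, and OF the way CMP a, b sets them (from a - b on 32 bits) and then evaluates a few of the jump conditions. The struct and function names are hypothetical, and PF is omitted. The example shows why JA and JG can disagree on the same operands: one treats the bits as unsigned, the other as signed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the four status flags used below. */
struct flags { bool cf, zf, sf, of; };

/* Flags as CMP a, b would set them: computed from a - b on 32 bits. */
struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                       /* wraps modulo 2^32, like the hardware */
    struct flags f;
    f.cf = a < b;                             /* unsigned borrow */
    f.zf = r == 0;                            /* result is zero */
    f.sf = (r >> 31) & 1;                     /* top bit of the result */
    /* signed overflow: a and b differ in sign, and r's sign differs from a's */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

bool ja(struct flags f) { return !f.cf && !f.zf; }         /* unsigned a > b */
bool jb(struct flags f) { return f.cf; }                   /* unsigned a < b */
bool jg(struct flags f) { return !f.zf && f.sf == f.of; }  /* signed a > b */
bool jl(struct flags f) { return f.sf != f.of; }           /* signed a < b */
bool je(struct flags f) { return f.zf; }                   /* a == b */
```

With a = 0xFFFFFFFF and b = 1, JA is taken (4294967295 > 1 as unsigned) but JG is not, and JL is taken (-1 < 1 as signed).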
VII. Procedures
A procedure can be thought of as a function in C. When the caller invokes the callee, space is allocated on the stack for the callee; this space is called its stack frame. The structure of the stack is roughly as follows:
The program stack grows toward lower addresses. Like the stack data structure, it is last-in, first-out. Register %esp (the stack pointer) holds the address of the top of the stack, and register %ebp (the frame pointer) holds the address of the current frame. While the program runs, the stack pointer moves to grow or shrink the stack, whereas the frame pointer stays fixed, because most data in the frame is accessed relative to it (frame pointer + offset).
When a caller calls another procedure:
- First, if the called procedure takes parameters, the caller builds them in its own stack frame (this is why the figure above shows parameter n ... parameter 1 at the end of the caller's frame);
- Next, the return address is pushed onto the stack. The return address is the address of the instruction the caller should continue with after the called procedure finishes; it belongs to the caller's stack frame and forms its end;
- From this step on we are in the callee's stack frame, the so-called current frame. The callee first saves the caller's frame pointer so that the caller's frame can be restored later;
- The procedure body then runs. A procedure typically executes something like sub $0xN, %esp to allocate space in the current frame for temporary (local) variables, saved register values, and so on;
- If the callee in turn calls another procedure, the process repeats from the first step;
- When the procedure finishes, the stack pointer and frame pointer are restored; in disassembly this often appears as a leave instruction (equivalent to movl %ebp, %esp followed by popl %ebp). Then ret pops the return address into the PC;
- Execution continues in the caller at that address.
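The steps above can be traced end to end with a toy C model of the stack. It is a simulation under stated assumptions: the stack is an array of 4-byte slots indexed by esp and ebp, the "return address" is just eip + 1, and enter_proc / leave_proc are hypothetical names for "push one argument, CALL, prologue" and for "LEAVE, RET, cdecl argument cleanup".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: one array index = one 4-byte stack slot. */
static uint32_t stk[128];
static uint32_t esp = 128, ebp = 128, eip = 0;   /* ebp = 128 means "no caller frame" */

static void push(uint32_t v) { stk[--esp] = v; }
static uint32_t pop(void) { return stk[esp++]; }

/* Caller side + callee prologue: one argument, then n_locals slots of locals. */
void enter_proc(uint32_t arg, uint32_t target, uint32_t n_locals) {
    push(arg);            /* argument built at the end of the caller's frame */
    push(eip + 1);        /* CALL: push the return address (toy: next "instruction") */
    eip = target;
    push(ebp);            /* PUSH EBP: save the caller's frame pointer */
    ebp = esp;            /* MOV EBP, ESP: the callee's frame starts here */
    esp -= n_locals;      /* SUB ESP, n: space for locals */
}

/* Callee epilogue + caller cleanup. */
void leave_proc(void) {
    esp = ebp;            /* LEAVE, part 1: MOV ESP, EBP */
    ebp = pop();          /* LEAVE, part 2: POP EBP */
    eip = pop();          /* RET: pop the return address into EIP */
    esp += 1;             /* caller removes the argument (ADD ESP, 4) */
}
```

Starting from eip = 10, a call enter_proc(42, 100, 2) followed by a nested enter_proc(7, 200, 0) and two leave_proc() calls brings eip back to 101 and then to 11, and restores esp and ebp to their starting values, because each frame's saved EBP links back to the caller's frame.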
To sum up, the disassembly of a procedure (function) always has three parts: setup (initialization), body (execution), and finish (return). I used to confuse the stack and the heap (as memory regions, not as data structures), and found a good article on the topic to share: Stack and heap differences. It has reportedly been reposted countless times, which suggests it is well written. Procedure calls and returns are implemented in assembly with CALL and RET (return), respectively. What CALL and RET do is not obvious from the instructions themselves:
- CALL pushes the return address onto the stack and sets the PC to the starting address of the called procedure;
- RET, the counterpart of CALL, pops the return address from the stack and loads it into the PC.
Questions and Answers
This week's content overlaps with last semester's assembly language course, but the explanation goes deeper.
My main problem this week: assembly language has no data-type declarations, so it is easy to make mistakes when writing instructions.