Section Four: The Machine Language of Programs

Last Update:2015-10-11 Source: Internet

Author: User


I. A Historical View of x86

The x86 architecture first appeared in the Intel 8086 CPU, launched in 1978. The 8086 is a 16-bit successor to Intel's 8-bit 8080, which followed the 8008; Intel's first microprocessor was the 4-bit 4004. Three years later, in 1981, IBM chose the 8088, a variant of the 8086 with an 8-bit external data bus, for the IBM PC. x86 then became the standard PC platform and one of the most successful CPU architectures.

II. Program Code

Machine-Level Code

A computer system uses several forms of abstraction, hiding implementation details behind simpler abstract models.

For machine-level programming, two of these abstractions are particularly important:

①, the instruction set architecture (ISA)

It defines the processor state, the format of the instructions, and the effect each instruction has on that state.

IA32 describes the behavior of a program as if each instruction were executed in sequence, with one instruction finishing before the next begins. (In fact, the processor executes many instructions concurrently, but it takes steps to ensure that the overall behavior matches the sequential execution specified by the ISA.)

②, virtual addresses: the memory addresses used by a machine-level program are virtual addresses

The memory model they provide appears to be one very large byte array. The actual memory system is implemented by a combination of several hardware memories and operating system software.

Program memory contains the program's executable machine code, some information needed by the operating system, the run-time stack, and the heap. Program memory is addressed with virtual addresses. The operating system manages this virtual address space and, with the help of the memory-management hardware, translates virtual addresses into the physical addresses of locations in the actual memory.
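As a rough illustration of the translation step described above, the following C sketch maps a virtual address to a physical one through a single-level page table. The 4 KiB page size, the table contents, and the names page_table and translate are assumptions made for this example; real IA32 paging uses multi-level tables that the hardware walks.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u

/* Hypothetical one-level page table: the index is the virtual page
   number, the value is the physical frame number. */
static const uint32_t page_table[NUM_PAGES] = { 7, 3, 12, 5 };

/* Translate a virtual address into a physical address. */
uint32_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr / PAGE_SIZE;     /* virtual page number */
    uint32_t offset = vaddr % PAGE_SIZE;  /* byte offset inside the page */
    assert(vpn < NUM_PAGES);              /* an unmapped page would fault on real hardware */
    return page_table[vpn] * PAGE_SIZE + offset;
}
```

The program only ever sees the value passed in as vaddr; the frame numbers in the table stay hidden behind the abstraction.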

III. Data Formats

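Assembly code has no data type declarations; instead, the operand size is given by a suffix on the instruction.

The suffixes include b (byte), w (word, 16 bits), l (double word, 32 bits), and, for floating-point instructions, s (single precision), among others.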
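To make the suffix rule concrete, here is a small C sketch that maps an integer operand size to its AT&T-style suffix. The helper name suffix_for_size is hypothetical and only the three IA32 integer sizes named above are covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an integer operand size in bytes to the
   suffix used on IA32 instructions such as movb, movw and movl. */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1:  return 'b';  /* byte */
    case 2:  return 'w';  /* word, 16 bits */
    case 4:  return 'l';  /* double word, 32 bits */
    default: return '?';  /* not an IA32 integer operand size */
    }
}
```

For example, suffix_for_size(sizeof(int16_t)) yields 'w', which is why a 16-bit move is written movw.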

IV. Accessing Information

As in higher-level programming languages, assembly language can access variables in several ways. There are three basic places to store variables.

1. Global and static variables: allocated in the program's data section

2. Local variables and parameters: allocated on the stack

3. Heap variables: allocated on the heap
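The three kinds of storage can be seen side by side in a short C sketch. The names g_count and sum_storage_kinds are made up for illustration, and the comments describe where a typical unoptimized IA32 build would place each variable.

```c
#include <assert.h>
#include <stdlib.h>

int g_count = 10;                      /* global: fixed address in the data section */

int sum_storage_kinds(int param) {     /* parameter: passed on the stack */
    static int s_calls = 0;            /* static: also in the data section */
    int local = 1;                     /* local: on the stack (or in a register) */
    int *heap = malloc(sizeof *heap);  /* heap: reached only through a pointer */
    assert(heap != NULL);
    *heap = 100;
    s_calls++;                         /* persists across calls, unlike local */
    int sum = g_count + s_calls + param + local + *heap;
    free(heap);
    return sum;
}
```

Calling the function twice with the same argument gives results that differ by one, because only the static counter keeps its value between calls.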

① Global and static variables

Global variables are stored at a fixed address (fixed at least from the program's point of view). The most common way to access such a variable is to give that fixed address explicitly in the instruction.

MOV EAX, [12341234h] ; Load EAX with the value stored at location 12341234h

INC DWORD PTR Test2!_ncount ; Increment the DWORD variable ncount

Note that the debugger shows the symbolic form, as in the second line, when symbol information is available.

② Local variables and parameters

Local variables and parameters live on the stack and are accessed through EBP (sometimes ESP). Optimized code often drops the dependency on the stack base pointer (frame pointer); in that case local variables are accessed through ESP, and EBP becomes available as an additional general-purpose register. With a standard stack base pointer, the instructions look like this.

MOV EAX, [EBP+8] ; Load EAX with an argument

MOV EAX, [EBP-4] ; Load EAX with a local variable

A useful rule of thumb when EBP serves as the frame pointer rather than as a general-purpose register: most of the time, a positive displacement accesses a parameter and a negative displacement accesses a local variable.

Note that the first argument passed to a function is typically at [EBP+8], because [EBP] holds the caller's saved EBP and [EBP+4] holds the return address.

③ Heap variables

Heap variables live on the heap and are accessed through pointers. Accessing a heap variable typically takes more than one instruction.

MOV ESI, Test2!_m_pfilelist ; Load the pointer

MOV EAX, [ESI+4] ; Read the second DWORD (pszname) of the heap object
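The two-instruction pattern above corresponds to following a pointer and then reading a field at a fixed offset. The C sketch below shows the same idea with a hypothetical FileList structure; its second field is a 32-bit stand-in for pszname so that the offset is 4 on any platform, not only on IA32 where pointers are 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two 32-bit fields, so the second sits at offset 4. */
typedef struct {
    uint32_t count;    /* first DWORD, at [ESI+0] */
    uint32_t name_id;  /* second DWORD, at [ESI+4] */
} FileList;

/* Step 1 (load the pointer) happens in the caller; step 2 reads the field. */
uint32_t read_second_dword(const FileList *list) {
    assert(list != NULL);
    return list->name_id;  /* compiles to a load from the pointer plus 4 */
}
```

The displacement in the assembly is simply offsetof(FileList, name_id), computed by the compiler.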

Another important point is that most compilers keep frequently accessed variables in registers for faster access. This is especially true on reduced instruction set computers (RISC), which typically provide more general-purpose registers.

Execution Flow Control

Control flow instructions are either conditional (the transfer happens only when a condition is met) or unconditional. They are what function calls, if-then-else, switch-case, and other high-level language constructs are built from.

① Unconditional jump instructions

1. JMP instruction

This instruction simply sets the EIP register to the target address. Nothing is stored on the stack, and no flags are changed. JMP is used for branches that are always taken. Most if-then-else constructs need at least one JMP instruction.

2. CALL instruction

This instruction first pushes the return address (the address of the instruction that follows the CALL) onto the stack and then sets EIP to the target address. Saving that address on the stack lets the program return to the instruction following the CALL once the called function has finished.

For JMP and CALL, the operand can be a fixed address, a register value, or a memory location that holds the branch address.

3. RET instruction

The RET instruction pops the value on top of the stack into the EIP register. With an immediate operand (RET n), it also adds n to ESP after popping, which removes parameters that were passed on the stack.

4. INT instruction

The operand of the INT instruction is an interrupt number, and the instruction raises the corresponding software interrupt. It is similar to CALL, except that the EFLAGS register (along with CS) is also pushed onto the stack. In addition, if it is executed in user mode, the user-mode stack pointer (SS and ESP) is pushed as well when switching to kernel mode. At the end of the interrupt handler, the IRET instruction restores EIP and EFLAGS from the stack.
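The effect of JMP, CALL and RET on EIP and the stack can be modeled in a few lines of C. This is a toy model, not real hardware: the Cpu structure, the 16-slot stack, and the function names are assumptions, and ESP is kept as a slot index rather than a byte address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint32_t eip;                 /* address of the current instruction */
    uint32_t esp;                 /* index of the top-of-stack slot (grows down) */
    uint32_t stack[STACK_WORDS];
} Cpu;

void cpu_init(Cpu *c, uint32_t entry) { c->eip = entry; c->esp = STACK_WORDS; }

void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->stack[--c->esp] = v; }
uint32_t pop(Cpu *c) { assert(c->esp < STACK_WORDS); return c->stack[c->esp++]; }

/* JMP: only EIP changes; nothing is pushed. */
void do_jmp(Cpu *c, uint32_t target) { c->eip = target; }

/* CALL: push the address of the following instruction, then jump.
   insn_len is the encoded length of the CALL itself. */
void do_call(Cpu *c, uint32_t target, uint32_t insn_len) {
    push(c, c->eip + insn_len);
    c->eip = target;
}

/* RET n: pop the return address into EIP, then drop n bytes of arguments. */
void do_ret(Cpu *c, uint32_t arg_bytes) {
    c->eip = pop(c);
    c->esp += arg_bytes / 4;
}
```

Pushing one argument, calling with a 5-byte CALL from address 0x1000, and executing RET 4 leaves EIP at 0x1005 and the stack empty again.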

② Conditional jump instructions

1. LOOP address

The LOOP instruction is used to implement loops from high-level languages. It decrements ECX (the counter) by one and then jumps to the branch address if ECX is not 0. When ECX reaches 0, execution falls through to the next instruction and the loop ends.

XOR EAX, EAX ; Clear the EAX register

MOV ECX, 5 ; Load the loop count

START:

ADD EAX, 1 ; Add one to EAX

LOOP START ; Decrement ECX and repeat until it reaches 0
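In C terms, the LOOP example behaves like a do-while loop whose condition decrements the counter first. The function name count_with_loop is made up for this sketch; the variables are named after the registers they stand for.

```c
#include <assert.h>
#include <stdint.h>

/* C equivalent of the LOOP example, with the loop count as a parameter. */
uint32_t count_with_loop(uint32_t ecx) {
    assert(ecx != 0);        /* with ECX = 0, LOOP would wrap around and run 2^32 times */
    uint32_t eax = 0;        /* XOR EAX, EAX */
    do {
        eax += 1;            /* ADD EAX, 1 */
    } while (--ecx != 0);    /* LOOP START: decrement ECX, jump if nonzero */
    return eax;
}
```

Because the test comes after the body, the body always runs at least once, which is why a zero count has to be guarded against separately.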

2. JNZ, JE, etc.

A conditional jump instruction checks whether the specified condition holds and jumps only if it does. For example, JNZ (jump if not zero) transfers control to the address given in its operand as long as the zero flag is 0; once the zero flag is set to 1, execution falls through. These instructions are mainly used to implement if statements and loops.

XOR EAX, EAX ; Clear EAX

MOV ECX, 5 ; Load the loop count

START:

ADD EAX, 1 ; Add one to EAX

DEC ECX ; Decrement the loop counter (sets ZF when ECX reaches 0)

JNZ START ; Repeat while ECX is not 0
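For the if-statement case, the following C sketch shows how an if-then-else is typically lowered to one conditional jump plus one unconditional jump. The goto version is written by hand to mirror the shape of compiled code; the function names and labels are illustrative, and an actual compiler may arrange the branches differently.

```c
/* Structured version. */
int absdiff(int x, int y) {
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Same logic in goto form, mirroring compare + conditional jump + JMP. */
int absdiff_goto(int x, int y) {
    int result;
    if (x >= y) goto x_ge_y;  /* CMP then JGE x_ge_y */
    result = y - x;           /* then-branch */
    goto done;                /* JMP done: skip the else-branch */
x_ge_y:
    result = x - y;           /* else-branch */
done:
    return result;
}
```

The unconditional goto done is the JMP that an if-then-else needs so that the then-branch does not fall into the else-branch.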

V. Control

A program cannot simply execute straight through; it needs some way to branch. High-level languages have branches and loops, while assembly has jumps, either unconditional or selective. The jump instruction itself is very simple, a single jmp instruction, similar to goto in C. The syntax is:

Label:

...

JMP Label

Jumps are divided into short jumps (the target is within -128 to +127 bytes), far jumps (jumps to another segment in segmented mode), and near jumps (everything else). An AT&T-syntax assembler chooses the machine code based on the operand, whereas in MASM the kind can be specified explicitly: JMP NEAR PTR label, JMP FAR PTR label.

With only this kind of jmp, however, a backward jump would always produce an infinite loop. Hence there are conditional jumps (Jcc), which jump only under certain conditions. The conditions are, once again, the flag bits in EFLAGS, as follows:

Instruction	Jump condition	EFLAGS condition
JA	Jump if above	CF=0 and ZF=0
JAE	Jump if above or equal	CF=0
JB	Jump if below	CF=1
JBE	Jump if below or equal	CF=1 or ZF=1
JC	Jump if carry	CF=1
JCXZ	Jump if CX=0	Register CX=0
JE (same as JZ)	Jump if equal	ZF=1
JG	Jump if greater (signed)	ZF=0 and SF=OF
JGE	Jump if greater or equal (signed)	SF=OF
JL	Jump if less (signed)	SF!=OF
JLE	Jump if less or equal (signed)	ZF=1 or SF!=OF
JMP	Unconditional jump	-
JNA	Jump if not above	CF=1 or ZF=1
JNAE	Jump if not above or equal	CF=1
JNB	Jump if not below	CF=0
JNBE	Jump if not below or equal	CF=0 and ZF=0
JNC	Jump if not carry	CF=0
JNE	Jump if not equal	ZF=0
JNG	Jump if not greater (signed)	ZF=1 or SF!=OF
JNGE	Jump if not greater or equal (signed)	SF!=OF
JNL	Jump if not less (signed)	SF=OF
JNLE	Jump if not less or equal (signed)	ZF=0 and SF=OF
JNO	Jump if not overflow (signed)	OF=0
JNP	Jump if no parity	PF=0
JNS	Jump if not sign (non-negative)	SF=0
JNZ	Jump if not zero	ZF=0
JO	Jump if overflow (signed)	OF=1
JP	Jump if parity	PF=1
JPE	Jump if parity even	PF=1
JPO	Jump if parity odd	PF=0
JS	Jump if sign (negative)	SF=1
JZ	Jump if zero	ZF=1
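The flag conditions in the table can be checked with a small C model of a 32-bit CMP. The Flags structure and the function names are assumptions made for this sketch; only CF, ZF, SF and OF are modeled, and only a handful of the jumps are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags as set by CMP a, b, which computes a - b and discards the result. */
typedef struct {
    bool cf;  /* carry: unsigned borrow, i.e. a < b as unsigned values */
    bool zf;  /* zero: a == b */
    bool sf;  /* sign: bit 31 of the result */
    bool of;  /* overflow: signed overflow in a - b */
} Flags;

Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.cf = a < b;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    /* Signed overflow: a and b differ in sign, and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

bool ja(Flags f) { return !f.cf && !f.zf; }       /* unsigned above */
bool jb(Flags f) { return f.cf; }                 /* unsigned below */
bool je(Flags f) { return f.zf; }                 /* equal */
bool jg(Flags f) { return !f.zf && f.sf == f.of; } /* signed greater */
bool jl(Flags f) { return f.sf != f.of; }         /* signed less */
```

Comparing 0xFFFFFFFF with 1 shows why there are two families of jumps: JA is taken because the value is large as an unsigned number, while JL is taken because the same bits mean -1 as a signed number.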

VII. Procedures

A procedure can be thought of as a function in C. When the caller invokes the callee, the system allocates space on the stack for the callee; this space is called a stack frame. The stack is organized roughly as follows:

The program stack grows toward lower addresses. Like the stack in data structures, it is last-in, first-out. The register %esp (the stack pointer) holds the address of the top of the stack, and the register %ebp (the frame pointer) holds the address of the base of the current frame. While the program executes, the stack pointer moves to grow or shrink the stack, whereas the frame pointer stays fixed, because most data stored in the stack is accessed relative to the frame pointer (frame pointer + offset).

When the caller calls another procedure:

1. First, if the called procedure takes parameters, they are built by the caller and stored in the caller's stack frame (this is why stack diagrams show argument n down to argument 1 at the end of the caller's frame);
2. The return address is pushed onto the stack. The return address is the address of the instruction at which the caller should resume once the called procedure finishes. It belongs to the caller's stack frame and forms the end of that frame;
3. At this point the callee's stack frame, the so-called current stack frame, begins. The caller's frame pointer is saved so that the caller's stack frame can be restored later;
4. Finally the body of the procedure runs. The procedure typically executes an instruction such as subl $N, %esp to allocate the current stack frame, which holds temporary variables, saved register values, and so on;
5. If the callee calls another procedure in turn, go back to the first step;
6. When the procedure finishes, the stack pointer and the frame pointer are restored, which in disassembly typically appears as leave (or movl %ebp, %esp followed by popl %ebp) before ret. The return address is then restored to the PC, which is where the caller resumes execution.
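The steps above can be traced with a small C model of the stack. This is a sketch under simplifying assumptions: the Stack structure, its 32-word size, and the function names are invented for the example, addresses are byte offsets into a small array, and the callee takes exactly two arguments.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 32

typedef struct {
    uint32_t mem[WORDS];  /* stack memory, one slot per 4-byte word */
    uint32_t esp;         /* byte offset of the top of the stack */
    uint32_t ebp;         /* byte offset of the current frame base */
} Stack;

void stack_init(Stack *s) { *s = (Stack){ .esp = WORDS * 4, .ebp = WORDS * 4 }; }

void push32(Stack *s, uint32_t v) {
    assert(s->esp >= 4);
    s->esp -= 4;                    /* the stack grows toward lower addresses */
    s->mem[s->esp / 4] = v;
}

uint32_t load32(const Stack *s, uint32_t addr) {
    assert(addr / 4 < WORDS);
    return s->mem[addr / 4];
}

/* Caller side plus callee prologue for a call f(arg1, arg2). */
void enter_frame(Stack *s, uint32_t arg1, uint32_t arg2,
                 uint32_t ret_addr, uint32_t local_bytes) {
    push32(s, arg2);                /* step 1: arguments, last one first */
    push32(s, arg1);
    push32(s, ret_addr);            /* step 2: return address (done by call) */
    push32(s, s->ebp);              /* step 3: pushl %ebp */
    s->ebp = s->esp;                /*         movl %esp, %ebp */
    assert(s->esp >= local_bytes);
    s->esp -= local_bytes;          /* step 4: subl $N, %esp */
}

/* Callee epilogue (leave, then ret); returns the popped return address. */
uint32_t leave_frame(Stack *s) {
    s->esp = s->ebp;                /* movl %ebp, %esp */
    s->ebp = load32(s, s->esp);     /* popl %ebp */
    s->esp += 4;
    uint32_t ret = load32(s, s->esp);  /* ret pops the return address */
    s->esp += 4;
    return ret;
}
```

After enter_frame, the first argument is found at ebp + 8, the return address at ebp + 4, and the locals below ebp, which matches the positive and negative displacements seen in disassembly. The two arguments are still on the stack after leave_frame; in this model removing them is left to the caller.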

To generalize the above: a disassembled procedure (function) consists of setup (initialization), a body (execution), and a finish (return). I used to confuse the stack and the heap (in the memory sense, not the data structures) and found a good article to share: "Stack and heap differences". It is said to have been reposted countless times, which suggests it is well written. Procedure calls and returns are implemented in assembly language with call and ret (return) respectively. What call and ret actually do is not very transparent:

call pushes the return address onto the stack and sets the PC to the starting address of the called procedure;
ret, the opposite of call, pops the return address from the stack and jumps to it by loading it into the PC.


Questions and Answers

This week's content is somewhat similar to last semester's assembly language course, but the explanation goes deeper.

This week's main difficulty is that assembly language has no data type declarations, so it is easy to make mistakes when writing it.
