Chapter 3 Machine-Level Representation of Programs
3.1 Historical View
The Intel processor family, commonly known as x86, has undergone a long evolutionary development.
Each successive processor has been designed to be backward compatible, so code compiled for earlier versions can run on newer processors.
3.2 Program Encodings
Suppose we write a C program as two files, p1.c and p2.c, and compile it on an IA32 machine with the following Unix command line:
unix> gcc -O1 -o p p1.c p2.c
The command gcc invokes the GCC C compiler, and -o p names the resulting executable p.
The flag -O1 tells the compiler to apply level-one optimization. Raising the optimization level generally makes the final program run faster, at the cost of longer compilation time and code that is harder to debug.
In fact, the gcc command invokes a sequence of programs that turn the source code into executable code:
First, the C preprocessor expands the source code, inserting any files named in #include directives and expanding any #define macros.
Next, the compiler generates an assembly-code version of each of the two source files, p1.s and p2.s.
Then, the assembler translates the assembly code into binary object-code files p1.o and p2.o.
Finally, the linker merges the two object-code files with the code implementing library functions (e.g., printf) and produces the final executable file p.
3.2.1 Machine-Level Code
Computer systems employ several forms of abstraction, hiding the details of an implementation behind a simpler abstract model.
There are two important abstractions for machine-level programming:
1. The format and behavior of a machine-level program are defined by the instruction set architecture (ISA), which specifies the processor state, the format of the instructions, and the effect each instruction has on the state.
Most ISAs, including IA32, describe program behavior as if instructions execute sequentially, with one instruction completing before the next begins.
2. The memory addresses used by a machine-level program are virtual addresses, which provide a memory model that appears to be a very large byte array.
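You can make the byte-array memory model concrete in C by viewing an object through an unsigned char pointer. The sketch below assumes a little-endian machine such as IA32 and a 4-byte unsigned. The names bytes_of and from_little_endian are hypothetical helpers chosen for illustration.

```c
#include <stddef.h>

/* Copy the bytes of `value` into `out` (which must hold sizeof(unsigned)
   bytes), reading them in increasing address order, and return the count.
   Byte i of the object lives at address (unsigned char *)&value + i. */
size_t bytes_of(unsigned value, unsigned char *out) {
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; i++)
        out[i] = p[i];
    return sizeof value;
}

/* Rebuild a value from bytes stored in memory order.
   Assumes little-endian order (as on IA32): the byte at the lowest
   address is the least significant one. */
unsigned from_little_endian(const unsigned char *b, size_t n) {
    unsigned v = 0;
    for (size_t i = n; i-- > 0;)
        v = (v << 8) | b[i];
    return v;
}
```

For example, on a little-endian machine bytes_of(0x01234567u, buf) leaves 0x67, 0x45, 0x23, 0x01 in buf[0] through buf[3].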
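The compiler does most of the work in the overall compilation sequence, transforming programs expressed in C's relatively abstract execution model into very elementary processor instructions. The assembly-code representation is very close to machine code; its main feature is that it is a more readable textual format than the binary format of machine code.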
IA32 machine code looks very different from the original C code. Parts of the processor state that are normally hidden from the C programmer are visible:
• Program counter (commonly called the PC, and named %eip in IA32): indicates the address in memory of the next instruction to be executed.
• Integer register file: contains 8 named locations, each storing a 32-bit value. These registers can hold addresses (corresponding to C pointers) or integer data.
• Condition code registers: hold status information about the most recently executed arithmetic or logical instruction. They are used to implement conditional changes in control or data flow, such as those required by if and while statements.
• Floating-point registers: storing floating-point data.
3.2.2 Code Example
To inspect the contents of machine-code files, a class of programs known as disassemblers is invaluable. On Linux, the program objdump run with the -d command-line flag serves this role.
Several features of machine code and its disassembled representation are worth noting:
• IA32 instructions range in length from 1 to 15 bytes.
• The instruction format is designed so that, from a given starting position, there is a unique decoding of the bytes into machine instructions.
• The disassembler determines the assembly code based purely on the byte sequences in the machine-code file; it does not need access to the program's source code or assembly-code versions.
• The disassembler uses slightly different naming conventions for instructions than the assembly code generated by GCC; for example, it omits the size suffix "l" from many instructions.
3.3 Data Formats
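On IA32, the basic C data types have the following sizes (with the GCC assembly suffix in parentheses):
• char: 1 byte (b)
• short: 2 bytes (w)
• int, long int, and pointers such as char *: 4 bytes (l)
• long long int: 8 bytes
• float: 4 bytes (s)
• double: 8 bytes (l)
• long double: 10 bytes (t), usually stored in 12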
Most assembly-code instructions generated by GCC have a single-character suffix denoting the size of the operand. For example, the data movement instruction has three variants:
movb  move byte (1 byte)
movw  move word (2 bytes)
movl  move double word (4 bytes)
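3.4 Accessing Information
An IA32 central processing unit (CPU) contains a set of 8 registers storing 32-bit values: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, and %ebp. The last two hold pointers into the stack: %esp is the stack pointer and %ebp is the frame pointer.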
3.4.1 Operand Specifiers
IA32 supports a number of operand forms, which fall into three types:
• Immediate: a constant value, written as $ followed by an integer, e.g., $-577 or $0x1F
• Register: denotes the contents of a register
• Memory reference: accesses a memory location according to a computed address, called the effective address
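The most general memory operand form is Imm(Eb,Ei,s). Its effective address is Imm + R[Eb] + R[Ei] * s, where Eb is a base register, Ei is an index register, and the scale factor s is 1, 2, 4, or 8. The sketch below shows that computation in C. effective_address is a hypothetical helper, and register contents are passed in as plain values.

```c
#include <stdint.h>

/* Effective address of the memory operand Imm(Eb,Ei,s):
   Imm + R[Eb] + R[Ei] * s, with s in {1, 2, 4, 8}.
   IA32 addresses are 32 bits, so uint32_t arithmetic reproduces the
   wrap-around modulo 2^32 (a negative Imm works as a subtraction). */
uint32_t effective_address(int32_t imm, uint32_t base, uint32_t index, uint32_t scale) {
    return (uint32_t)imm + base + index * scale;
}
```

For example, with %eax = 0x100 and %edx = 0x3, the operand 9(%eax,%edx) corresponds to effective_address(9, 0x100, 0x3, 1), which is 0x10C.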
3.4.2 Data Movement Instructions
Data movement instructions are grouped into instruction classes: the instructions in a class perform the same operation, differing only in operand size.
The mov class copies the value of the source operand to the destination operand. IA32 does not allow both operands of a move to be memory locations.
The movs and movz classes copy a smaller source value to a larger destination, filling the upper bits by sign extension (movs) or zero extension (movz):
Sign extension: every high-order bit of the destination is set to the most significant (sign) bit of the source value.
Zero extension: the high-order bits of the destination are filled with 0.
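Sign and zero extension of a byte to 32 bits can be sketched in C as follows. The function names are hypothetical; they model what movsbl and movzbl do. The sign-extending version copies the sign bit explicitly rather than relying on C's implementation-defined conversion of out-of-range values to signed types.

```c
#include <stdint.h>

/* Like movsbl: widen a byte to 32 bits, replicating its sign bit. */
uint32_t sign_extend_byte(uint8_t b) {
    uint32_t v = b;
    if (v & 0x80u)          /* is the source's most significant bit set? */
        v |= 0xFFFFFF00u;   /* copy it into all 24 high-order bits */
    return v;
}

/* Like movzbl: widen a byte to 32 bits, filling the high bits with 0. */
uint32_t zero_extend_byte(uint8_t b) {
    return (uint32_t)b;
}
```

The byte 0xF0 (-16 as a signed byte) becomes 0xFFFFFFF0 under sign extension but 0x000000F0 (240) under zero extension.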
The pushl and popl instructions push data onto the program stack and pop data from it. The stack grows toward lower addresses, and %esp holds the address of its top element.
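On IA32, pushl S first decrements %esp by 4 and then writes S to the new top of stack. popl D reads the top of stack into D and then increments %esp by 4. The toy C model below follows that behavior. It assumes a tiny 64-byte stack region whose "addresses" are array offsets; stack_mem, esp, pushl, and popl are illustrative names, not a real API.

```c
#include <stdint.h>
#include <string.h>

/* Toy stack region (illustrative assumption). esp starts one past the end
   because the stack grows toward lower addresses. No bounds checking. */
#define STACK_BYTES 64
static unsigned char stack_mem[STACK_BYTES];
static uint32_t esp = STACK_BYTES;

/* pushl src:  %esp <- %esp - 4;  M[%esp] <- src */
void pushl(uint32_t src) {
    esp -= 4;
    memcpy(&stack_mem[esp], &src, sizeof src);
}

/* popl dest:  dest <- M[%esp];  %esp <- %esp + 4 */
uint32_t popl(void) {
    uint32_t v;
    memcpy(&v, &stack_mem[esp], sizeof v);
    esp += 4;
    return v;
}
```

Values come back in last-in, first-out order, and after a matching number of pops esp returns to its starting value.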
3.5 Arithmetic and Logical Operations
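3.5.1 Load Effective Address
The instruction leal S, D has the effect D ← &S.
It does not read memory. Instead, it writes the effective address of the source operand to the destination, which must be a register.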
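Because leal only computes an address, compilers also use it for simple arithmetic. For instance, if x is in %edx, leal 7(%edx,%edx,4), %eax sets %eax to 7 + x + 4x = 5x + 7. The sketch below writes that arithmetic in C. The function names are hypothetical, and whether GCC actually emits a single leal for five_x_plus_7 depends on the compiler version and flags.

```c
#include <stdint.h>

/* Mirrors leal 7(%edx,%edx,4), %eax with x in %edx: the "address"
   imm + base + index * scale is computed and kept; memory is never read.
   uint32_t models IA32's 32-bit wrap-around. */
uint32_t lea_5x_plus_7(uint32_t x) {
    uint32_t imm = 7, base = x, index = x, scale = 4;
    return imm + base + index * scale;
}

/* A C function of the kind GCC may compile to one leal instruction. */
int five_x_plus_7(int x) {
    return 5 * x + 7;
}
```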
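3.5.2 Unary and Binary Operations
Unary operations have a single operand that serves as both source and destination. It can be a register or a memory location (e.g., incl, decl, negl, notl).
Binary operations: the second operand is both a source and the destination. For example, subl %eax, %edx decrements %edx by the value in %eax. The first operand can be an immediate value, a register, or a memory location. The second can be a register or a memory location, but the two operands cannot both be memory locations.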
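3.5.3 Shift Operations
Shift instructions take the shift amount as the first operand and the value to be shifted as the second. Both arithmetic right shifts (sar, which fills with copies of the sign bit) and logical right shifts (shr, which fills with zeros) are available. The shift amount is given either as an immediate or in the single-byte register %cl. Only its low-order 5 bits are used, so shifts range from 0 to 31 bits.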
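The three shifts can be modeled on 32-bit unsigned values in C. In this illustrative sketch, the function names mirror the AT&T mnemonics and operand order (shift amount first). The amount is masked to its low 5 bits as on IA32, which also keeps the C shifts well defined. The arithmetic shift fills the high bits explicitly because C leaves >> on negative signed values implementation-defined.

```c
#include <stdint.h>

/* sarl k, x: arithmetic right shift; vacated high bits copy the sign bit. */
uint32_t sarl(uint32_t k, uint32_t x) {
    k &= 31;                          /* only the low 5 bits of the amount count */
    uint32_t r = x >> k;              /* logical shift first */
    if ((x & 0x80000000u) && k != 0)
        r |= ~(0xFFFFFFFFu >> k);     /* set the k vacated high bits to 1 */
    return r;
}

/* shrl k, x: logical right shift; vacated high bits are 0. */
uint32_t shrl(uint32_t k, uint32_t x) {
    return x >> (k & 31);
}

/* sall k, x: left shift; vacated low bits are 0. */
uint32_t sall(uint32_t k, uint32_t x) {
    return x << (k & 31);
}
```

For example, shifting 0xFFFFFFF0 (-16) right by 2 gives 0xFFFFFFFC (-4) with sarl but 0x3FFFFFFC with shrl.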
3.6 Control
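3.6.1 Condition Codes
CF  Carry flag: the most recent operation generated a carry out of the most significant bit; used to detect overflow for unsigned operations.
ZF  Zero flag: the most recent operation yielded zero.
SF  Sign flag: the most recent operation yielded a negative value.
OF  Overflow flag: the most recent operation caused a two's-complement overflow, either negative or positive.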
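To illustrate what the four flags mean, the sketch below computes them for a 32-bit addition t = a + b, as addl would set them. The struct flags type and the add_flags function are hypothetical. Unsigned arithmetic keeps wrap-around well defined in C.

```c
#include <stdint.h>

/* Hypothetical container for the four condition codes. */
struct flags { int cf, zf, sf, of; };

/* Flags as set by addl for t = a + b on 32-bit operands. */
struct flags add_flags(uint32_t a, uint32_t b) {
    uint32_t t = a + b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.cf = t < a;                            /* carry out: unsigned overflow */
    f.zf = t == 0;                           /* result is zero */
    f.sf = (int)((t >> 31) & 1);             /* sign bit of the result */
    /* signed overflow: a and b share a sign and t's sign differs from both */
    f.of = (int)((((a ^ t) & (b ^ t)) >> 31) & 1);
    return f;
}
```

For instance, 0x7FFFFFFF + 1 sets SF and OF but not CF, while 0xFFFFFFFF + 1 sets CF and ZF but not OF.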
3.6.2 Accessing the Condition Codes
Rather than reading the condition codes directly, programs commonly use them in three ways:
1. Set a single byte to 0 or 1 depending on some combination of the condition codes;
2. Conditionally jump to some other part of the program;
3. Conditionally transfer data.
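The set instructions derive their result from the condition codes. For example, cmpl b, a computes a - b only to set the flags. setl then writes SF ^ OF, which is 1 exactly when a < b as signed integers. The sketch below writes that logic in C; the function name is hypothetical.

```c
#include <stdint.h>

/* Models cmpl b, a followed by setl: the byte result is SF ^ OF
   computed from t = a - b (the difference itself is discarded). */
uint8_t setl_after_cmpl(uint32_t a, uint32_t b) {
    uint32_t t = a - b;
    unsigned sf = (t >> 31) & 1;
    /* subtraction overflows when a and b have different signs
       and the result's sign differs from a's */
    unsigned of = (((a ^ b) & (a ^ t)) >> 31) & 1;
    return (uint8_t)(sf ^ of);   /* 1 iff a < b as signed 32-bit values */
}
```

A C comparison such as return a < b; is the kind of expression GCC typically compiles to a cmpl, a setl, and a movzbl that zero-extends the byte result.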
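3.6.3 Jump Instructions and Their Encodings
A jump instruction (jmp) causes execution to switch to a completely new position in the program. In assembly code, the jump destination is generally indicated by a label.
A jump can be conditional: depending on some combination of the condition codes, it either jumps or continues executing the next instruction in the code sequence.
The jmp instruction itself is unconditional. It can be a direct jump, whose target is encoded as part of the instruction (e.g., jmp .L1), or an indirect jump, whose target is read from a register or memory location (e.g., jmp *%eax).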
3.6.4 Translating Conditional Branches
The most general way to translate conditional expressions and statements from C into machine code is to combine conditional and unconditional jumps.
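The general template tests the condition, jumps over the then-branch when the test fails, and jumps past the else-branch once the then-branch finishes. The sketch below expresses that structure in C with goto, using an absolute-difference function as an illustrative example. The goto version is hand-written to mirror the control flow of compiled code; it is not actual compiler output.

```c
/* Source form. */
int absdiff(int x, int y) {
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Goto form: a conditional jump skips the then-branch,
   and an unconditional jump skips the else-branch. */
int gotodiff(int x, int y) {
    int result;
    if (x >= y)            /* condition false: jump to else-branch */
        goto x_ge_y;
    result = y - x;        /* then-branch */
    goto done;
x_ge_y:
    result = x - y;        /* else-branch */
done:
    return result;
}
```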
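3.6.5 Loops
1. do-while
The body executes first and the test is evaluated afterward, so the body always runs at least once.
2. while
The test is evaluated first, so the loop may terminate before the body ever executes.
3. for
A loop for (init; test; update) body behaves like init; while (test) { body; update; }, apart from the handling of continue.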
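The loop forms and their goto-based translations can be sketched with a factorial function; the function names are illustrative. The while version is shown as a guarded do-while, one common translation in which the test runs once before entering a loop whose test sits at the bottom. Compilers may choose other layouts depending on the optimization level.

```c
/* do-while: the body runs once before the first test. */
int fact_do(int n) {
    int result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);
    return result;
}

/* Goto form of the do-while: body, then a conditional jump back. */
int fact_do_goto(int n) {
    int result = 1;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)
        goto loop;
    return result;
}

/* while as a guarded do-while: if the initial test fails, skip the loop. */
int fact_while_goto(int n) {
    int result = 1;
    if (!(n > 1))
        goto done;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)
        goto loop;
done:
    return result;
}

/* for (init; test; update) body  ~  init; while (test) { body; update; } */
int fact_for(int n) {
    int i, result = 1;
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

The calls fact_do(0) and fact_while_goto(0) return 0 and 1 respectively, because the do-while body runs once before its test.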
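3.6.6 Conditional Move Instructions
The conventional way to implement conditional operations is through a conditional transfer of control: the program follows one execution path when a condition holds and another when it does not.
Conditional transfer of data is an alternative strategy. It computes both outcomes of a conditional operation and then selects one based on whether the condition is satisfied. This avoids branch misprediction penalties on modern pipelined processors, but it can only be used when evaluating both outcomes is safe and free of side effects.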
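The sketch below shows the conditional-data-transfer style in C, using a hypothetical cmovdiff function. Both candidate results are computed, and a single conditional assignment picks one. With optimization enabled and a target that supports conditional moves, GCC may compile the final selection to a conditional move instruction rather than a jump, but this is not guaranteed.

```c
/* Both outcomes are evaluated unconditionally; only the selection
   depends on the condition, which is what a conditional move implements. */
int cmovdiff(int x, int y) {
    int rval = y - x;      /* value if x < y */
    int eval = x - y;      /* value otherwise */
    int ntest = x >= y;
    if (ntest)             /* candidate for a conditional move, not a branch */
        rval = eval;
    return rval;
}
```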
3.6.7 switch Statements
A switch statement provides a multiway branching capability based on the value of an integer index. It is particularly useful for tests with a large number of possible outcomes. It makes C code more readable, and it allows an efficient implementation using a data structure called a jump table: an array whose entry i holds the address of the code to run when the index equals i.
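The jump-table idea can be sketched in portable C by storing function pointers in the table instead of code addresses. This is an analog, not compiler output: GCC's generated code uses a table of code addresses and an indirect jmp. The example switch and the names switch_eg, switch_eg_table, jt, and loc_* are illustrative. The sketch biases the index so the smallest case maps to 0 and sends gaps to the default. A single unsigned comparison rejects out-of-range values on both sides.

```c
/* Source form, including a fall-through (102 into 103) and a shared case. */
int switch_eg(int x, int n) {
    int result = x;
    switch (n) {
    case 100: result *= 13; break;
    case 102: result += 10; /* fall through */
    case 103: result += 11; break;
    case 104:
    case 106: result *= result; break;
    default:  result = 0;
    }
    return result;
}

typedef int (*case_fn)(int);

static int loc_def(int x) { (void)x; return 0; }      /* default */
static int loc_A(int x) { return x * 13; }            /* case 100 */
static int loc_C(int x) { return x + 11; }            /* case 103 */
static int loc_B(int x) { return loc_C(x + 10); }     /* case 102, falls into 103 */
static int loc_D(int x) { return x * x; }             /* cases 104 and 106 */

/* Entry i handles n == 100 + i; missing cases 101 and 105 go to default. */
static const case_fn jt[7] = {
    loc_A, loc_def, loc_B, loc_C, loc_D, loc_def, loc_D
};

int switch_eg_table(int x, int n) {
    unsigned index = (unsigned)n - 100u;   /* bias so the first case is 0 */
    if (index > 6)                         /* catches n < 100 and n > 106 */
        return loc_def(x);
    return jt[index](x);                   /* indirect dispatch through the table */
}
```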
3.7 Procedures
A procedure call passes both data (procedure parameters and return values) and control from one part of a program to another. In addition, it must allocate space for the procedure's local variables on entry and deallocate that space on exit.
3.7.1 Stack Frame Structure
Machine programs use the stack to pass procedure arguments, store return information, save registers for later restoration, and provide local storage. The portion of the stack allocated for a single procedure call is called a stack frame.
Suppose procedure P (the caller) calls procedure Q (the callee). The arguments to Q are contained within P's stack frame. When P calls Q, the return address within P is pushed onto the stack, forming the end of P's stack frame. The return address is where the program should resume execution when Q returns. Q's stack frame starts with the saved value of the frame pointer (a copy of %ebp), followed by copies of any other saved register values.
Procedure Q also uses the stack for any local variables that cannot be stored in registers. This can occur for the following reasons:
• There are not enough registers to hold all of the local variables.
• Some local variables are arrays or structures and hence must be accessed by array or structure references.
• The address operator & is applied to a local variable, so the compiler must be able to generate an address for it.
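Consider a caller that passes the addresses of two locals to a swap function. Because &arg1 and &arg2 are taken, those variables need memory locations, so the compiler keeps them in the caller's stack frame rather than only in registers. The function names in this sketch are illustrative.

```c
/* Swaps *xp and *yp and returns the sum of the original values. */
int swap_add(int *xp, int *yp) {
    int x = *xp;
    int y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

int caller(void) {
    int arg1 = 534;
    int arg2 = 1057;
    /* Taking &arg1 and &arg2 forces both locals into the stack frame. */
    int sum = swap_add(&arg1, &arg2);
    int diff = arg1 - arg2;    /* arg1 and arg2 were swapped by swap_add */
    return sum * diff;
}
```

Here caller() returns (534 + 1057) * (1057 - 534) = 1591 * 523 = 832093.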
3.7.2 Transferring Control
The following instructions support procedure calls and returns: call Label (direct call), call *Operand (indirect call), leave (prepare the stack for return), and ret (return from call).
The call instruction has a target indicating the address of the first instruction of the called procedure. Calls can be direct or indirect. In assembly code, the target of a direct call is given as a label, while the target of an indirect call is given by * followed by an operand specifier.
The call instruction pushes a return address onto the stack and jumps to the start of the called procedure. The return address is the address of the instruction immediately following the call in the calling procedure.
The ret instruction pops an address off the stack and jumps to that location.
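3.7.3 Register Usage Conventions
The conventions must guarantee that when one procedure (the caller) calls another (the callee), the callee does not overwrite a register value that the caller plans to use later. By convention, IA32 registers %eax, %edx, and %ecx are caller-save registers, while %ebx, %esi, and %edi are callee-save registers.
Suppose procedure P computes a value y, then calls procedure Q, and still needs y after Q returns. There are two ways to meet this requirement:
• P stores y in its own stack frame before calling Q and retrieves it from the stack after Q returns; that is, the caller saves y.
• P keeps y in a callee-save register. If Q, or any procedure Q calls, wants to use that register, it must save the register's value in its stack frame and restore it before returning.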
Information Security System Design Foundation Fourth Week study summary