1 Data Representation
Number notation: a suffix letter, in upper or lower case, marks the radix (Intel/MASM-style syntax):
Decimal: ends with D or d.
Binary: ends with B or b.
Hexadecimal: ends with H or h.
Octal: ends with Q or q.
2 Character Representation: ASCII
The standard ASCII character set (codes 00h to 7Fh) is divided into four groups of 32 characters each.
Group 1 (00h to 1Fh) consists of non-printable characters, called control characters.
Group 2 (20h to 3Fh) contains the space, various punctuation marks and special characters, and the digits 0 to 9.
Group 3 (40h to 5Fh) contains the 26 uppercase letters (41h to 5Ah) and 6 special characters.
Group 4 (60h to 7Fh) contains the 26 lowercase letters (61h to 7Ah), 5 special characters, and one control character (DEL, 7Fh).
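3 BCD
BCD (binary-coded decimal) represents each decimal digit in binary. It comes in two formats:
A Packed BCD format
Four binary digits (one nibble) represent one decimal digit, so one byte holds two decimal digits.
B Unpacked BCD format
Each decimal digit occupies a whole byte: the digit is stored in the low four bits, and the high four bits are usually 0.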
4 CPU
The CPU consists mainly of the arithmetic logic unit, the control unit, and registers. Its task is to execute the sequence of instructions stored in memory.
5 Assembly source files conventionally use .s as the filename suffix. A program is assembled and linked like this:
as hello.s -o hello.o
ld hello.o -o hello
6 Names beginning with "." in an assembly program are not instruction mnemonics and are not translated into machine instructions; they are directives to the assembler.
.section divides the code into several sections. When the operating system loads and runs the program, each section is loaded at a different address and given its own read, write, and execute permissions.
.data: the program's data, which is readable and writable. Initialized global variables of a C program belong to the .data section.
A symbol stands for an address in assembly and can be used in instructions.
movl $1, %eax
This is a data transfer instruction: the value 1 is encoded in the instruction itself rather than read from a data location, and the CPU moves it into the eax register. The l suffix on mov stands for long, indicating a 32-bit transfer.
A value supplied by the instruction in this way is called an immediate. In assembly, an immediate is prefixed with $ and a register name with %, to distinguish them from symbol names.
int $0x80
a The int instruction is called a software interrupt instruction; it is used to raise an exception deliberately. Exception handling is similar to interrupt handling: the CPU switches from user mode to privileged mode and then jumps to kernel code to run the exception handler.
b In int $0x80, the value 0x80 is a parameter: the exception handler uses it to decide how to handle the exception. In the Linux kernel, the int $0x80 exception is called a system call.
c The values in eax and ebx are two parameters passed to the system call. eax holds the system call number (1 means the _exit call), and ebx holds the argument passed to _exit, namely the exit status.
7 x86 registers
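The general-purpose registers of x86 are eax, ebx, ecx, edx, edi, and esi.
The 32-bit division instruction (divl, or idivl for signed division) takes a 64-bit dividend in edx:eax, so edx must be cleared (or, for idivl, sign-extended) before dividing a 32-bit value. The quotient is placed in eax and the remainder in edx.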
The x86 special-purpose registers include ebp, esp, eip, and eflags.
eip: the program counter (instruction pointer).
eflags: stores the flags produced by computations, such as:
carry CF, overflow OF, zero ZF, negative (sign) SF
ebp and esp are used to maintain the function call stack.
data_items:
.long 34,222
.long declares a sequence of numbers, each occupying 32 bits, similar to an array in C.
The array begins at data_items: the assembler uses the address of the array's first element as the address represented by the symbol data_items.
Besides .long, common data declarations include:
.byte declares a sequence of numbers, each occupying 8 bits.
.ascii declares a string of characters, for example: .ascii "Hello world"
edi (the destination index register) holds the current position in the array.
ebx holds the largest value found so far.
movl data_items(,%edi,4), %eax
This loads element number edi of the array into eax; the first time it runs, edi is 0, so it loads element 0. data_items is the starting address of the array, edi is the array index, and 4 means each element occupies four bytes, so the address of element edi is data_items + edi * 4.
start_loop labels the start of the loop, and loop_exit labels the loop's exit.
cmpl $0, %eax
cmpl subtracts the two operands but does not keep the result; it only updates the flags in eflags according to the result. If the two operands are equal, the result is 0 and ZF in eflags is set to 1.
je is a conditional jump instruction. It checks ZF in eflags: if ZF is 1, it jumps; if ZF is 0, execution continues with the next instruction.
Compare instructions and conditional jumps are used together: the former sets the flags, and the latter decides whether to jump based on them.
incl %edi    # increment edi by 1
cmpl %ebx, %eax
jle start_loop
This compares the current array element in eax with the largest value found so far in ebx. If eax is less than or equal to ebx, the maximum does not change, so the program jumps back to the start of the loop to examine the next number; otherwise, it continues with the next instruction. jle is also a conditional jump instruction; le stands for "less than or equal".
jmp is an unconditional jump instruction: it jumps without testing any condition.
8 Registers
1 eax accumulator
2 ebx base register
3 ecx count register
4 edx data register
Registers 1 to 4 are the data registers.
5 esi source index register
6 edi destination index register
Registers 5 and 6 are the index registers.
Registers 1 to 6 are the general-purpose registers.
7 ebp base pointer
8 esp stack pointer
Registers 7 and 8 are the pointer registers.
9 eflags flags register
10 eip instruction pointer
Registers 9 and 10 are the special-purpose registers.
11 cs code segment register
12 ds data segment register
13 ss stack segment register
14 es extra segment register
Registers 11 to 14 are the segment registers.
15 fs additional segment register
16 gs additional segment register
9 Addressing modes
A memory address can be written in an instruction in the following general form:
address_or_offset(%base_or_offset, %index, multiplier)
The address it denotes is computed as:
final address = address_or_offset + base_or_offset + multiplier * index
address_or_offset and multiplier must be constants (the multiplier must be 1, 2, 4, or 8), and base_or_offset and index must be registers. Parts of the form may be omitted; the modes below differ in which parts they use.
Direct addressing mode: only address_or_offset is used. For example, movl address, %eax loads the 32-bit value stored at memory location address into the eax register.
Indexed addressing mode: movl data_items(,%edi,4), %eax uses this mode, which is convenient for accessing array elements.
Indirect addressing mode: only base_or_offset is used. For example, movl (%eax), %ebx treats the value in eax as an address and loads the 32-bit value stored at that address into ebx. Note the difference from movl %eax, %ebx, which copies the value in eax itself.
Base pointer addressing mode: only address_or_offset and base_or_offset are used. For example, movl 4(%eax), %ebx is convenient for accessing struct members: if eax holds the base address of a struct and a member lies at an offset of 4 bytes within it, this instruction reads that member.
Immediate mode: one operand of the instruction is an immediate, for example $12 in movl $12, %eax. This has nothing to do with addressing, but it is conventionally counted as an addressing mode.
Register addressing mode: one operand of the instruction is a register, for example %eax in movl $12, %eax. This has nothing to do with memory addressing either, but it is also counted as an addressing mode.
10 ELF files
There are three types of ELF file:
Relocatable files (object files)
Executable files
Shared objects (shared libraries)
The ELF format provides two different views of a file. To the assembler and the linker, an ELF file is a set of sections described by the section header table. To the loader, which runs the file, it is a set of segments described by the program header table.
Sections declared with .section in an assembly program become sections in the object file, and the assembler also adds some sections automatically (such as the symbol table).
A segment is a region with uniform attributes that is loaded into memory when the program runs; it consists of one or more sections. For example, if two sections both need to be loaded into memory and both are readable and writable, they belong to the same segment.
An object file still has to be processed by the linker, so it must have a section header table. An executable has to be loaded and run, so it must have a program header table. A shared library has to be loaded and run and is also dynamically linked at load time, so it has both a section header table and a program header table.
If a global variable in C is not initialized in the code, it is initialized to 0 when the program is loaded. Such data belongs to the .bss section. Like .data, .bss holds readable and writable data, but in the ELF file .data occupies space to store the initial values, whereas .bss does not. In other words, .bss takes up only a section header and has no contents in the file; the amount of memory it occupies when the program is loaded is recorded in its section header.
.rel.text tells the linker which locations in the instructions (the .text section) need to be relocated.
The objdump tool can disassemble the machine instructions in a program, for example objdump -d hello.o.