The first column of the output is the address of each instruction. These addresses can be used to set breakpoints for program execution:
    ald> break 0x08048088
    Breakpoint 1 set for 0x08048088
After the breakpoint is set, use the run command to start the program. When the breakpoint is reached, ALD automatically suspends the program and displays the current values of all registers:
    ald> run
    Starting program: hello
    Breakpoint 1 encountered at 0x08048088
    eax = 0x00000004 ebx = 0x00000001 ecx = 0x08049098 edx = 0x0000000f
    esp = 0xbffff6c0 ebp = 0x00000000 esi = 0x00000000 edi = 0x00000000
    ds  = 0x0000002b es  = 0x0000002b fs  = 0x00000000 gs  = 0x00000000
    ss  = 0x0000002b cs  = 0x00000023 eip = 0x08048088 eflags = 0x00000246
    Flags: PF ZF IF
    08048088  CD80                 int 0x80
To single-step through the assembly code, use the next command:
    ald> next
    Hello, world!
    eax = 0x0000000f ebx = 0x00000000 ecx = 0x08049098 edx = 0x0000000f
    esp = 0xbffff6c0 ebp = 0x00000000 esi = 0x00000000 edi = 0x00000000
    ds  = 0x0000002b es  = 0x0000002b fs  = 0x00000000 gs  = 0x00000000
    ss  = 0x0000002b cs  = 0x00000023 eip = 0x0804808f eflags = 0x00000346
    Flags: PF ZF TF IF
    0804808f  B801000000           mov eax, 0x1
To get a detailed list of all the debugging commands that ALD supports, use the help command:
    ald> help
    Commands may be abbreviated.
    If a blank command is entered, the last command is repeated.
    Type 'help <command>' for more specific information on <command>.

    General commands
    attach         clear          continue       detach         disassemble
    enter          examine        file           help           load
    next           quit           register       run            set
    step           unload         window         write

    Breakpoint related commands
    break          delete         disable        enable         ignore
    lbreak         tbreak
IV. System Calls
Even the simplest assembly program inevitably needs operations such as input, output, and exit. To perform them, it must call services provided by the operating system, that is, make system calls. Unless your program only does addition, subtraction, multiplication, division, and similar arithmetic, system calls are hard to avoid. In fact, apart from the system calls themselves, assembly programming is often very similar across operating systems.
On Linux there are two ways to make system calls: through the wrapper functions in the C library (libc), or directly from assembly. Invoking system calls directly from assembly is the most efficient way to use Linux kernel services, because the resulting program does not need to be linked against any library and communicates with the kernel directly.
As in DOS, system calls in Linux are implemented through an interrupt (int 0x80). When the int 0x80 instruction is executed, register eax holds the system call number, and the arguments passed to the system call must be placed, in order, in registers ebx, ecx, edx, esi, and edi. When the system call completes, its return value is available in register eax.
All system call numbers can be found in the file /usr/include/bits/syscall.h. For convenience, they are defined as macros of the form SYS_<name>, such as SYS_write and SYS_exit. For example, the frequently used write function is declared as follows:
    ssize_t write(int fd, const void *buf, size_t count);
This function is ultimately implemented by the sys_write system call. Following the convention above, the parameters fd, buf, and count are placed in registers ebx, ecx, and edx respectively, and the system call number SYS_write is placed in register eax. After the int 0x80 instruction executes, the return value can be read from register eax.
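As a rough C-level illustration of this convention, the sketch below issues write by its system call number through libc's generic syscall() function, passing the three arguments in the same order in which the registers are listed. Note that it goes through libc rather than executing int 0x80 itself, and that the helper name raw_write is made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#endif
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: invoke the write system call by number.
 * On IA-32 the same four values would sit in eax, ebx, ecx and edx. */
long raw_write(int fd, const void *buf, unsigned long count)
{
    return syscall(SYS_write, fd, buf, count);
}
```

Calling raw_write(1, "Hello, world!\n", 14) writes the string to standard output and returns the number of bytes written. On failure, syscall() returns -1 and sets errno, whereas the raw int 0x80 interface returns a negative error code in eax.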
You may have noticed that at most five registers are available for passing arguments to a system call. Does that mean no system call takes more than five arguments? Of course not. For example, the mmap function takes six parameters, all of which must ultimately be passed to the sys_mmap system call:
    void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
When a system call needs more than five arguments, the system call number is still placed in register eax before int 0x80 is executed. The difference is that all the arguments are placed in a contiguous area of memory, and a pointer to that area is stored in register ebx. When the system call completes, the return value is still found in register eax.
Because all that is needed is a contiguous area of memory to hold the arguments, the stack can be used to pass them, just as in an ordinary function call. Note that Linux uses the C calling convention, which means the arguments must be pushed in reverse order: the last argument is pushed first and the first argument is pushed last. If the stack is used to pass the arguments of a system call, the current value of the stack pointer must also be copied into register ebx before int 0x80 is executed.
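To make the resulting memory layout concrete, here is a small C sketch that simulates pushing six arguments in reverse order onto a downward-growing stack and returns the final stack pointer, that is, the address that would be copied into ebx. It only models the layout and makes no system call. The names push_args_reversed and NARGS, and the use of an ordinary array as the stack, are assumptions made for illustration.

```c
#include <stddef.h>

#define NARGS 6   /* e.g. the six parameters of mmap */

/* Simulate "push the last argument first" on a stack that grows downward.
 * stack_top points one past the highest usable slot.
 * Returns the final stack pointer: a contiguous block in which
 * block[0] is the first argument and block[NARGS - 1] is the last. */
long *push_args_reversed(long *stack_top, const long args[NARGS])
{
    long *sp = stack_top;
    for (int i = NARGS - 1; i >= 0; i--) {
        sp--;            /* a push first decrements the stack pointer */
        *sp = args[i];   /* and then stores the value */
    }
    return sp;
}
```

Because the last argument is pushed first, the block reads in natural order (first argument, second argument, and so on) from the returned pointer upward.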
V. Command Line Parameters
In Linux, when an executable is started from the command line, its arguments are placed on the stack: first argc, then the argv array of pointers to the individual command line arguments, and finally the envp array of pointers to the environment variables. Assembly programs often need to process these arguments. The following code shows how to handle command line arguments in assembly:
Example 3. Processing command line parameters
    # args.s
    .text
    .globl _start

    _start:
            popl    %ecx            # argc
    vnext:
            popl    %ecx            # argv
            test    %ecx, %ecx      # NULL pointer indicates end
            jz      exit
            movl    %ecx, %ebx
            xorl    %edx, %edx      # edx counts the bytes to write
    strlen:
            movb    (%ebx), %al
            inc     %edx
            inc     %ebx
            test    %al, %al
            jnz     strlen
            movb    $10, -1(%ebx)   # replace the terminating NUL with '\n'
            movl    $4, %eax        # system call number (sys_write)
            movl    $1, %ebx        # file descriptor (stdout)
            int     $0x80
            jmp     vnext
    exit:
            movl    $1, %eax        # system call number (sys_exit)
            xorl    %ebx, %ebx      # exit code
            int     $0x80

            ret
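For readers more comfortable with C, the logic of Example 3 can be sketched roughly as follows. The function names length_with_terminator and print_args are invented for this illustration. Unlike the assembly version, which overwrites each terminating NUL with a newline in place, this sketch leaves the strings untouched and writes the newline separately.

```c
#include <stddef.h>
#include <unistd.h>

/* Mirrors the strlen loop in Example 3: counts the bytes of s up to and
 * including the terminating NUL, just as edx does in the assembly. */
size_t length_with_terminator(const char *s)
{
    size_t n = 0;
    char c;
    do {
        c = *s++;
        n++;
    } while (c != '\0');
    return n;
}

/* Walks a NULL-terminated argv-style array and writes each entry,
 * followed by a newline, to fd. Returns the number of entries written,
 * or -1 if a write fails. */
int print_args(char *const argv[], int fd)
{
    int printed = 0;
    for (char *const *p = argv; *p != NULL; p++) {
        size_t n = length_with_terminator(*p);
        if (write(fd, *p, n - 1) < 0 || write(fd, "\n", 1) < 0)
            return -1;
        printed++;
    }
    return printed;
}
```

Called from main as print_args(argv, 1), this prints the program name and then each argument on its own line, which matches what Example 3 does after discarding argc.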
VI. GCC Inline Assembly
Although programs written in assembly language run fast, they are slow and inefficient to develop. If the goal is only to optimize critical sections of code, a better approach may be to embed assembly instructions in a C program, taking full advantage of the respective strengths of the high-level language and assembly language. In general, however, embedding assembly statements in C code is considerably more complicated than writing "pure" assembly, because you must deal with how registers are allocated and how operands are tied to variables in the C code.
GCC provides good support for inline assembly. The most basic format is:
    __asm__("asm statements");
For example:
    __asm__("nop");
To execute several assembly statements together, separate them with "\n\t", for example:
    __asm__("pushl %eax\n\t"
            "movl $0, %eax\n\t"
            "popl %eax");
Assembly statements embedded in C code can rarely be independent of the rest of the program, so the complete inline assembly format is needed more often:
    __asm__("asm statements" : outputs : inputs : registers-modified);
An assembly statement inserted into C code is divided into four parts separated by ":". The first part is the assembly code itself, usually called the instruction part; its format is basically the same as in ordinary assembly language. The instruction part is required, while the other parts may be omitted as circumstances allow.
When embedding assembly statements in C code, a major problem is how to tie operands to variables in the C code. GCC solves it as follows: the programmer supplies the concrete instructions but, for register usage, gives only "template" operands and constraints. GCC and GAS are then responsible for deciding how registers and variables are bound together.
In the instruction part of a GCC inline assembly statement, a number prefixed with '%' (for example %0, %1) denotes a template operand that needs a register. The number of template operands used in the instruction part is the number of variables that need to be bound to registers, and GCC and GAS handle them appropriately during compilation and assembly according to the given constraints. Because template operands also use '%' as a prefix, a register name must be written with two '%' characters whenever a specific register is involved, to avoid confusion.
The instruction part is followed by the output part, which specifies the conditions under which output variables are bound to template operands. Each condition is called a "constraint"; several constraints may be given when necessary, separated by commas. Each output constraint begins with '=', followed by a letter describing the operand type, and finally the variable it is bound to. Registers or operands bound to the operands described in the output part do not retain their previous contents once the embedded assembly code has executed; GCC relies on this when scheduling registers.
The output part is followed by the input part. The format of an input constraint is similar to that of an output constraint, except that it has no '=' sign. If an input constraint requires a register, GCC allocates one during preprocessing and inserts the instructions needed to load the operand into it. Registers or operands bound to the operands described in the input part likewise do not retain the contents they held before the embedded assembly code was executed.
Some operations need additional registers, beyond those used for input and output, to hold intermediate results, which inevitably destroys the original contents of those registers. The last part of the GCC inline assembly format lets you list the registers that are affected in this way, so that GCC can take appropriate measures.
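As a small sketch of how the four parts fit together, the hypothetical function below adds two integers while using eax as a scratch register, and therefore lists eax in the last part. The function name add_via_eax is made up for this example. The extra "cc" entry is GCC's documented name for the condition flags, which addl modifies. The preprocessor guard is an assumption that lets the same source compile on non-x86 targets.

```c
/* Hypothetical helper: returns a + b, using eax as a scratch register. */
int add_via_eax(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl %1, %%eax\n\t"   /* eax = a */
        "addl %2, %%eax\n\t"   /* eax += b */
        "movl %%eax, %0"       /* sum = eax */
        : "=r"(sum)            /* outputs */
        : "r"(a), "r"(b)       /* inputs */
        : "eax", "cc");        /* modified: eax and the condition flags */
#else
    sum = a + b;               /* plain C fallback on non-x86 targets */
#endif
    return sum;
}
```

Because eax appears in the last part, GCC will not place a, b, or sum in that register, so the scratch use cannot collide with any operand.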
The following is a simple example of inline assembly:
Example 4. Inline assembly
    /* inline.c */
    #include <stdio.h>

    int main()
    {
        int a = 10, b = 0;
        __asm__ __volatile__("movl %1, %%eax;\n\t"
                             "movl %%eax, %0;"
                             : "=r"(b)     /* output */
                             : "r"(a)      /* input */
                             : "%eax");    /* clobbered registers */
        printf("Result: %d, %d\n", a, b);
        return 0;
    }
The program above copies the value of variable a into variable b. A few points are worth noting:
- Variable b is the output operand, referenced as %0, and variable a is the input operand, referenced as %1.
- Both the input and the output operand use the r constraint, which tells GCC to hold variables a and b in registers. The only difference between the two constraints is that the output constraint carries the additional modifier '='.
- To use register eax in an inline assembly statement, write the register name with two '%' characters, that is, %%eax. Inline assembly identifies variables as %0, %1, and so on, so any identifier with a single '%' is treated as an operand rather than a register.
- The last part of the inline assembly statement tells GCC that the statement changes the value in register eax, so GCC should not use that register to hold any other value while processing the statement.
- Because variable b is specified as the output operand, its stored value is updated once the inline assembly statement has executed.
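For contrast with Example 4, a variant that never names eax can leave register selection entirely to GCC, in which case no entry is needed in the last part. The sketch below is a hedged illustration of that idea; the function name copy_value and the fallback branch for non-x86 targets are assumptions, not part of the original example.

```c
/* Hypothetical variant of Example 4: copy a into b without naming
 * a specific register, so there is nothing to list as modified. */
int copy_value(int a)
{
    int b;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %1, %0"      /* b = a, registers chosen by GCC */
            : "=r"(b)          /* output, referenced as %0 */
            : "r"(a));         /* input, referenced as %1 */
#else
    b = a;                     /* plain C fallback on non-x86 targets */
#endif
    return b;
}
```

Here %0 and %1 are replaced by whichever registers GCC picks for b and a, which is usually preferable to hard-coding a register unless an instruction requires a particular one.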
Operands used in inline assembly are numbered starting from 0, beginning with the first constraint in the output part, and each constraint counts once. To reference an operand, the instruction part simply prefixes its number with '%'. Note that the instruction part always treats a referenced operand as a 32-bit long word, whereas a word or a byte may actually be needed, so the appropriate qualifier should be specified in the constraint:
Qualifier | Meaning
"m", "V", "o" | Memory operand
"r" | Any register
"q" | One of the registers eax, ebx, ecx, or edx
"i", "h" | Immediate operand
"E", "F" | Floating-point number
"g" | Any
"a", "b", "c", "d" | Registers eax, ebx, ecx, and edx, respectively
"S", "D" | Registers esi and edi
"I" |