1 Exceptions and System Calls
1.1 Traps
Traps are intentional exceptions that occur as a result of executing an instruction. The most important use of traps is to provide a procedure-like interface between user programs and the kernel, known as a system call. User programs often need to request services from the kernel, such as reading a file (read), creating a new process (fork), loading a new program (execve), or terminating the current process (exit). To allow controlled access to these kernel services, the processor provides a special "syscall n" instruction that a user program executes when it wants to request service n. Executing the syscall instruction causes a trap to an exception handler, which decodes the argument and calls the appropriate kernel routine.
1.1.1 System Call N
A system call runs in kernel mode, which allows it to execute privileged instructions and to access a stack defined in the kernel.
Examples of popular system calls (table not reproduced here). Linux provides hundreds of system calls.
Source: /usr/include/sys/syscall.h.
1.1.2 Implementation
1. On IA32 systems, system calls are provided through a trapping instruction called int n, where n can be the index of any of the 256 entries in the IA32 exception table. Historically, system calls are provided through exception 128 (0x80).
2. All arguments to Linux system calls are passed in registers. By convention, register %eax contains the system call number, and registers %ebx, %ecx, %edx, %esi, %edi, and %ebp contain up to six arbitrary arguments. The stack pointer %esp cannot be used because the kernel overwrites it when it enters kernel mode.
3. C programs can use the syscall function to invoke directly any system call that exists in the system call table. In practice this is rarely necessary. For most system calls, the standard C library provides a convenient set of wrapper functions. A wrapper function packages up the arguments, traps to the kernel with the appropriate system call number, and then passes the return status of the system call back to the calling program. We refer to system calls and their associated wrapper functions as system-level functions.
4. The kernel records the list of all registered system calls in the system call table, stored in sys_call_table (a table of function pointers to the kernel functions that implement the various system calls). The table is architecture-specific and is typically defined in entry.S. It assigns a unique system call number to each valid system call.
1.1.2.1 The int Instruction
The int instruction deserves a closer look.
Interrupts can originate outside the CPU, for example as an interrupt request from an I/O device, or inside the CPU, where executing a specific instruction raises an internal interrupt. The int n instruction is one of the instructions that generate an internal interrupt; n ranges from 0 to 255.
- The int n instruction takes an interrupt vector as its operand, which allows a program to call any interrupt handler.
- When the CPU executes an int n instruction, the effect is the same as if interrupt n had been raised, so a program can use the int instruction to invoke any interrupt handler.
1.1.2.2 int 0x80 (Trapping into the Kernel)
When a process makes a system call, it first calls a function defined in the system call library. That function is usually expanded into the _syscallN form and traps into the kernel through INT 0x80, with its arguments passed to the kernel in registers. The interrupt handler for INT 0x80 is named system_call. To ensure that the user code continues from the calling point once the kernel has finished the system call and returned to user mode, a context layer is pushed onto the kernel stack on entry to kernel mode and popped on return from the kernel, so that the user process can continue to run.
Stack switch on a call to a different privilege level:
When the CPU executes the int instruction, it actually performs the following actions:
(1) Because the int instruction transfers control between privilege levels, the stack information (SS and ESP) of the higher-privilege kernel stack is first read from the TSS (Task State Segment);
(2) The stack information (SS and ESP) of the lower-privilege level is saved on the higher-privilege stack (that is, the kernel stack);
(3) EFLAGS and the outer CS and EIP are pushed onto the higher-privilege stack (the kernel stack);
(4) CS and EIP are loaded through the IDT (control transfers to the interrupt handler).
Control then enters system_call, the handler for interrupt 0x80, which first uses a macro named SAVE_ALL. The macro is defined as follows:
#define SAVE_ALL \
	cld; \
	pushl %es; \
	pushl %ds; \
	pushl %eax; \
	pushl %ebp; \
	pushl %edi; \
	pushl %esi; \
	pushl %edx; \
	pushl %ecx; \
	pushl %ebx; \
	movl $(__KERNEL_DS),%edx;	/* set the kernel data segment address */ \
	movl %edx,%ds;			/* %ds is a data segment register */ \
	movl %edx,%es;
The macro has two purposes. First, it pushes the register context onto the kernel stack so that user state can be restored when the system call returns. Second, it pushes the arguments that were passed to the system call in user mode onto the kernel stack, where the system call function can use them. The arguments are placed in registers before the user process traps into the kernel; after entry, SAVE_ALL pushes those registers onto the kernel stack, which is how the kernel gets access to the arguments supplied by the user.
Part of the source code of system_call:
/* Before system_call is entered, while still in user mode, the arguments for the system call function have already been placed in registers. */
ENTRY(system_call)
	pushl %eax			# save orig_eax (%eax will later hold the return value)
	SAVE_ALL			# save the register context
	GET_CURRENT(%ebx)
	cmpl $(NR_syscalls),%eax	# check that the system call number in %eax is legal
	jae badsys
	testb $0x20,flags(%ebx)		# PF_TRACESYS
	jne tracesys
	call *SYMBOL_NAME(sys_call_table)(,%eax,4)	# use the system call number to find the function pointer in sys_call_table and jump to the system call function
	......
For background on the assembly instructions, see Chapter 3 of "In-depth Understanding of the Operating System".
The work done here is:
1. Save the %eax register. %eax is also the register that holds the return value of the system call, so its contents will later be overwritten by the return value.
2. Call SAVE_ALL to save the register context.
3. Check whether the call is a legitimate system call: the system call number stored in %eax must be less than NR_syscalls, which is defined by a macro in unistd.h (#define NR_syscalls 294).
4. If the PF_TRACESYS flag is set (PF_TRACESYS is a Linux process flag indicating that the process is being traced), jump to tracesys, where the current process is suspended and SIGTRAP is sent to its parent process.
5. If the PF_TRACESYS flag is not set, look up the function pointer of the system call handler: using %eax*4 as the offset (%eax holds the system call number), find the entry address of the system call function in the system call table sys_call_table and jump to it.
1.1.3 Examples
1) Open
2) Write
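#include <stdlib.h>   /* exit */
#include <unistd.h>   /* write */

int main()
{
    /* The first argument, 1, means write to stdout; the second argument is
       the sequence of bytes to write; the third gives the number of bytes
       to write. */
    write(1, "hello, world\n", 13);
    exit(0);
}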
code/ecf/hello-asm.sa
.section .data
string:
	.ascii "hello, world\n"
string_end:
	.equ len, string_end - string
.section .text
.globl main
main:
	# First, call write(1, "hello, world\n", 13)
	movl $4, %eax		# System call number 4
	movl $1, %ebx		# stdout has descriptor 1
	movl $string, %ecx	# Hello world string
	movl $len, %edx		# String length
	int $0x80		# System call code

	# Next, call exit(0)
	movl $1, %eax		# System call number 1
	movl $0, %ebx		# Argument is 0
	int $0x80		# System call code
Notes:
- The number n in int n is an exception number: it is the index used to locate the corresponding entry in the exception table.
- ESP is a pointer register commonly used for stack operations. It holds the offset of the top of the stack, and SS is its default segment register (the stack base).
- EIP holds the offset of an instruction. Together with CS, the default code segment register, it points to the next instruction to be executed.
- IDT is the interrupt descriptor table. For each exception number it holds the entry address of the corresponding interrupt handler.
- DS is the data segment register. Together with ESI it points to the memory an instruction will operate on, and memory-access instructions use DS as their default segment.
- ESI is typically used as the source address pointer in memory operations; DS is its default data segment register.