1. Overview
A system call is the only way for a user-space program to request service from the kernel. glibc (the C library) provides a set of wrapper routines: besides the library functions required by the C standard, it wraps the system calls so that user programs can call them as ordinary functions. System calls are therefore part of the API as well. So how can a user-space program invoke a system call?
a. Using wrapper functions
#include <unistd.h>
...
getpid();
...
getpid is the wrapper function that glibc provides for the system call sys_getpid. It is declared in /usr/include/unistd.h. (This also shows that an #include <...> directive starts its search in the /usr/include directory.)
extern __pid_t getpid (void) __THROW;
__THROW is a glibc macro. In C it expands to nothing or, with GCC, to a nothrow attribute. In C++ it expands to an empty exception specification such as throw(), which declares that the function does not throw C++ exceptions.
b. Using the generic syscall function
Besides the individual wrappers, glibc provides one generic function for issuing any system call. Consider this case: the kernel is upgraded and the new version adds a system call, but the installed glibc does not yet provide a wrapper for it. This is where the generic syscall function comes in handy.
extern long int syscall (long int sysno, ...) __THROW;
#include <unistd.h>
...
syscall(__NR_getpid);
...
The definition of __NR_getpid is located in /usr/include/asm/unistd_32.h.
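Methods a and b can be compared side by side. The following is a minimal sketch, assuming a Linux system with glibc; the helper names pid_via_wrapper, pid_via_syscall and check_pid_paths_agree are made up for this illustration, and SYS_getpid from <sys/syscall.h> is used as the portable spelling of __NR_getpid.

```c
#define _GNU_SOURCE        /* makes <unistd.h> declare syscall() */
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid, an alias for __NR_getpid */
#include <sys/types.h>
#include <unistd.h>

/* Method a: the dedicated glibc wrapper. */
long pid_via_wrapper(void)
{
    return (long)getpid();
}

/* Method b: the generic syscall() function with an explicit call number. */
long pid_via_syscall(void)
{
    return syscall(SYS_getpid);
}

/* Both paths end in the same kernel service, so the results must agree. */
void check_pid_paths_agree(void)
{
    assert(pid_via_wrapper() == pid_via_syscall());
}
```

The wrapper is the better choice whenever it exists; syscall() is mainly useful for calls that the installed glibc does not wrap yet.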
c. Using inline assembly
For example, getpid can be implemented with inline assembly:
......
int i;
asm volatile ("int $0x80"
: "=a" (i)
: "0" (__NR_getpid)
);
......
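A self-contained version of the fragment above might look as follows. This is only a sketch: the function names raw_getpid and check_raw_getpid are hypothetical, the int $0x80 path is compiled only for 32-bit x86 (where the system call numbers match this entry point), and the fallback to syscall() on every other target is an assumption added here so that the example builds elsewhere.

```c
#define _GNU_SOURCE        /* makes <unistd.h> declare syscall() */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the caller's PID by entering the kernel directly. */
long raw_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* eax carries the system call number in and the return value out. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "0" ((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Assumption: on other targets, defer to the generic syscall(). */
    return syscall(SYS_getpid);
#endif
}

/* The raw path should report the same PID as the glibc wrapper. */
void check_raw_getpid(void)
{
    assert(raw_getpid() == (long)getpid());
}
```

The "0" constraint ties the input to the same register as output operand 0, which is how one register, eax, serves for both the call number and the result.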
2. System gates
When a system call is implemented with inline assembly, what actually runs is an int $0x80 instruction: the input, __NR_getpid, is passed in the eax register, and the result comes back in eax as well. Next we analyze what int $0x80 does. To understand this instruction, we first need the concept of a "gate" as used by Linux on x86. As the name suggests, a gate is an entrance to the kernel: when kernel code has to run on behalf of a program (interrupts, system calls, and so on), control must first pass through a gate.
The commonly used gate types in Linux are:
enum {
GATE_INTERRUPT = 0xE, /* interrupt gate */
GATE_TRAP = 0xF, /* trap gate */
GATE_CALL = 0xC, /* call gate */
GATE_TASK = 0x5, /* task gate */
};
This article focuses on the trap gate. The gate descriptor formats are defined as follows (the type field of each descriptor matches the values above; a trap gate, for example, has type 0xF). The interrupt gate and the trap gate are almost identical. The only difference is how the IF (interrupt enable) flag is handled: a control transfer through an interrupt gate clears IF, whereas a trap gate leaves IF unchanged.
So here is a brief look at how interrupts are processed in Linux.
On 32-bit x86, the CPU has a 48-bit IDTR register: the high 32 bits hold the location of the IDT in memory and the low 16 bits hold the size (limit) of the table. The IDTR register gives the location of the IDT, and the interrupt number then selects a specific entry in it. The layout of each entry is given above. The entry supplies an offset (the address of the interrupt handler's entry point within its segment) and a segment selector, which is used to look up the segment base in the GDT/LDT. The address of the interrupt handler is then the segment base plus the offset. So when does the Linux kernel fill in each entry? Here we look specifically at how the trap gate used by system calls is filled in, so that int $0x80 leads to the entry address of the system call path.
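The lookup described above (IDTR base plus vector times entry size) can be modelled in ordinary C. This is a user-space sketch, not kernel code: the names idtr32, GATE_DESC_SIZE and idt_entry_address are invented for the illustration. It relies on two x86 facts: a 32-bit gate descriptor is 8 bytes, and the 16-bit limit holds the table size in bytes minus one.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 48-bit IDTR: 16-bit limit in the low bits, 32-bit base above. */
struct idtr32 {
    uint16_t limit;   /* size of the IDT in bytes, minus one */
    uint32_t base;    /* linear address of the first descriptor */
} __attribute__((packed));

enum { GATE_DESC_SIZE = 8 };   /* each 32-bit gate descriptor is 8 bytes */

/* Linear address of the descriptor that belongs to a given vector. */
uint32_t idt_entry_address(const struct idtr32 *idtr, unsigned vector)
{
    uint32_t offset = (uint32_t)vector * GATE_DESC_SIZE;

    /* The whole descriptor must lie inside the table limit. */
    assert(offset + GATE_DESC_SIZE - 1 <= idtr->limit);
    return idtr->base + offset;
}
```

For a full 256-entry table, vector 0x80 lands 0x400 bytes past the base; the base address itself depends on the running kernel.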
The call chain is as follows:
start_kernel
------>trap_init
------>set_system_trap_gate(SYSCALL_VECTOR, &system_call)
set_system_trap_gate is the function that fills in the gate descriptor for system calls; SYSCALL_VECTOR is 0x80.
# define SYSCALL_VECTOR 0x80
This 0x80 (decimal 128) is the index of the gate descriptor in the IDT.
Next, let's look at the details:
static inline void set_system_trap_gate(unsigned int n, void *addr)
{
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_TRAP, addr, 0x3, 0, __KERNEL_CS);
}
set_system_trap_gate simply calls _set_gate. GATE_TRAP is the gate type; as the name says, it is a trap gate. addr is the entry address of the system call path, here the address of the kernel function system_call.
DPL is the descriptor privilege level. For this gate DPL = 3, which means that user-mode programs are allowed to go through this trap gate. __KERNEL_CS is the segment selector stored in the gate; it selects the kernel code segment. For the segment descriptor, refer to the following figure:
That is, __KERNEL_CS (GDT index 12) locates the corresponding entry in the GDT, from which the segment base is read. In fact, in Linux every segment starts at 0x00000000 and covers a 4 GB space.
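How a GDT index such as 12 turns into a selector value can be shown with a small sketch. It uses the standard x86 selector layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0); the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* bits 15..3: entry number in the descriptor table */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1..0: requested privilege level */
};

/* Build a 16-bit segment selector from its three fields. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti < 2 && rpl < 4);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Split a selector back into its fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 0x1;
    f.rpl = sel & 0x3;
    return f;
}
```

With index 12, the GDT and privilege level 0, make_selector yields 0x60, which is the value a selector for GDT entry 12 takes at kernel privilege.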
static inline void _set_gate(int gate, unsigned type, void *addr,
unsigned dpl, unsigned ist, unsigned seg)
{
gate_desc s;
pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg);
/*
 * does not need to be atomic because it is only done once at
 * setup time
 */
write_idt_entry(idt_table, gate, &s);
}
static inline void pack_gate(gate_desc *gate, unsigned char type,
unsigned long base, unsigned dpl, unsigned flags,
unsigned short seg)
{
gate->a = (seg << 16) | (base & 0xffff);
gate->b = (base & 0xffff0000) | (((0x80 | type | (dpl << 5)) & 0xff) << 8);
}
A careful look at the code of pack_gate shows that it matches the descriptor layout in the figure above.
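To make the bit layout concrete, the packing can be replayed in user space and then decoded again. This is a sketch under stated assumptions: gate32 and the gate_* helpers are hypothetical names, and the handler address used in the usage note is an arbitrary illustrative value, not the real address of system_call.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words of a gate descriptor, as filled in by pack_gate. */
struct gate32 {
    uint32_t a;
    uint32_t b;
};

/* Same arithmetic as pack_gate, using fixed-width types. */
struct gate32 pack_gate32(unsigned type, uint32_t base, unsigned dpl,
                          uint16_t seg)
{
    struct gate32 g;

    assert(type <= 0xF && dpl <= 3);
    g.a = ((uint32_t)seg << 16) | (base & 0xffff);
    g.b = (base & 0xffff0000u) | (((0x80u | type | (dpl << 5)) & 0xff) << 8);
    return g;
}

/* Decoders that read the individual fields back out. */
uint32_t gate_offset(struct gate32 g)   { return (g.b & 0xffff0000u) | (g.a & 0xffff); }
uint16_t gate_selector(struct gate32 g) { return (uint16_t)(g.a >> 16); }
unsigned gate_type(struct gate32 g)     { return (g.b >> 8) & 0xF; }
unsigned gate_dpl(struct gate32 g)      { return (g.b >> 13) & 0x3; }
unsigned gate_present(struct gate32 g)  { return (g.b >> 15) & 0x1; }
```

Packing type 0xF, DPL 3 and selector 0x60 with the illustrative address 0xC0103A40 gives a = 0x00603A40 and b = 0xC010EF00. The 0x80 constant in the formula is the present bit, which is why gate_present returns 1 for every descriptor built this way.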
The analysis above shows that int $0x80 transfers execution to the kernel function system_call. Before analyzing this function, we introduce the kernel stack. Every user process has a kernel stack and a user stack. They hold function call parameters, local variables, and other auxiliary data while the process executes in kernel space and user space, respectively.
In the kernel, the fields of the pt_regs data structure are laid out in the same order in which the register values are pushed onto the kernel stack when a user-space program enters kernel space. The correspondence is listed below (offsets are in hexadecimal).
0(%esp) - %ebx
4(%esp) - %ecx
8(%esp) - %edx
C(%esp) - %esi
10(%esp) - %edi
14(%esp) - %ebp
18(%esp) - %eax
1C(%esp) - %ds
20(%esp) - %es
24(%esp) - %fs
28(%esp) - %gs (saved iff !CONFIG_X86_32_LAZY_GS)
2C(%esp) - orig_eax
30(%esp) - %eip
34(%esp) - %cs
38(%esp) - %eflags
3C(%esp) - %oldesp
40(%esp) - %oldss
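The stack layout above can be checked mechanically by writing it as a C structure and asserting the field offsets. The structure below is a model for illustration only: it is named pt_regs32 here and uses uint32_t fields so that the 32-bit offsets hold on any host, and the field names are chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 32-bit register frame, lowest stack address first. */
struct pt_regs32 {
    uint32_t bx, cx, dx, si, di, bp, ax;   /* 0x00 .. 0x18 */
    uint32_t ds, es, fs, gs;               /* 0x1C .. 0x28 */
    uint32_t orig_ax;                      /* 0x2C: system call number */
    uint32_t ip, cs, flags, sp, ss;        /* 0x30 .. 0x40: pushed by the CPU */
};

/* Compile-time checks that the model matches the listed offsets. */
static_assert(offsetof(struct pt_regs32, bx) == 0x00, "%ebx at 0(%esp)");
static_assert(offsetof(struct pt_regs32, ax) == 0x18, "%eax at 18(%esp)");
static_assert(offsetof(struct pt_regs32, orig_ax) == 0x2C, "orig_eax at 2C(%esp)");
static_assert(offsetof(struct pt_regs32, ip) == 0x30, "%eip at 30(%esp)");
static_assert(offsetof(struct pt_regs32, ss) == 0x40, "%oldss at 40(%esp)");
```

The last five fields are the ones the processor pushes itself when int $0x80 crosses from user mode into the kernel; everything below them is saved by the entry code.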
arch/x86/kernel/entry_32.S contains the complete assembly code that carries out a system call. Here is a brief analysis.
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
pushl_cfi %eax # save orig_eax
SAVE_ALL
GET_THREAD_INFO(%ebp)
# system call tracing in operation / emulation
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
jnz syscall_trace_entry
cmpl $(nr_syscalls), %eax
jae syscall_badsys
syscall_call:
call *sys_call_table(,%eax,4)
movl %eax,PT_EAX(%esp) # store the return value
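As we can see, system_call first does some preparatory work (saving the registers and validating the system call number in eax), then transfers execution through sys_call_table to the entry address of the corresponding system call handler, and finally stores the return value in the saved eax slot on the stack.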
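The range check and the indirect call through the table can be mirrored in plain C. This is a toy model, not kernel code: the two demo handlers, demo_call_table and demo_dispatch are invented for the illustration, and returning -ENOSYS for an out-of-range number follows what the syscall_badsys path does in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

/* Two stand-in "system calls" for the toy table. */
static long demo_double(long arg) { return 2 * arg; }
static long demo_negate(long arg) { return -arg; }

static const syscall_fn demo_call_table[] = { demo_double, demo_negate };
enum { DEMO_NR_SYSCALLS = sizeof demo_call_table / sizeof demo_call_table[0] };

/* Validate the number, then jump through the table. */
long demo_dispatch(unsigned long nr, long arg)
{
    /* Like cmpl/jae: an unsigned compare, so "negative" numbers fail too. */
    if (nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;

    assert(demo_call_table[nr] != NULL);
    return demo_call_table[nr](arg);   /* call *sys_call_table(,%eax,4) */
}
```

In the assembly the scale factor 4 plays the role of the array indexing here, because each table entry is a 4-byte function pointer on 32-bit x86.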