Linux 0.11 Kernel Series 2: System Call Mechanism Analysis

Source: Internet
Author: User


"All rights reserved. Please cite the source when reprinting." Source: http://www.cnblogs.com/joey-hua/p/5570691.html

Having read a number of Linux kernel source files covering boot through initialization, this time we look at system_call.s in the kernel folder, which mainly handles system calls. System calls involve far more than this one file, though, so this note records the complete mechanism, from setting up the interrupt to finally reaching the system call function.

The write function is used as the example system call throughout.

In essence, a system call lets a user process run kernel-level code. A user process has the lowest privilege level and kernel code the highest, so direct access is not allowed; an interrupt gate serves as the intermediary that performs the privilege transition. Put simply, the user process triggers an interrupt, and the interrupt handler runs the kernel code. The rest of this note shows how the Linux kernel does this.

1. Creating the Interrupt Descriptor Table (IDT)

Because the mechanism relies on an interrupt, the first step is to set up the interrupt descriptor table (IDT).

The IDT is set up in head.s. For example, when int 0x80 is executed, the CPU locates entry 0x80 of the table that starts at _idt and runs the handler that entry points to.

    .align 3                    # align to an 8-byte memory boundary
    _idt:   .fill 256,8,0       # IDT is uninitialized: 256 entries, 8 bytes each, filled with 0

    idt_descr:                  # the next two lines are the 6-byte operand of lidt: limit, base address
        .word 256*8-1           # IDT contains 256 entries
        .long _idt

        lidt idt_descr          # load the interrupt descriptor table register
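The arithmetic behind the listing can be sketched in C. This is only an illustration, assuming the 256-entry, 8-byte layout shown above; the names idt_entry_offset and idt_limit are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants mirroring the head.s listing: 256 gates, 8 bytes each. */
enum { IDT_ENTRIES = 256, IDT_ENTRY_SIZE = 8 };

/* Byte offset of gate n from the start of the table (_idt in head.s). */
uint32_t idt_entry_offset(uint32_t n)
{
    assert(n < IDT_ENTRIES);
    return n * IDT_ENTRY_SIZE;
}

/* Limit field of the lidt operand: table size in bytes minus one. */
uint16_t idt_limit(void)
{
    return (uint16_t)(IDT_ENTRIES * IDT_ENTRY_SIZE - 1);
}
```

For interrupt 0x80 the offset works out to 0x80 * 8 = 0x400 bytes past _idt, and the limit matches the 256*8-1 in the .word line.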

2. Setting Up Interrupt 0x80

All system calls go through interrupt 0x80, so the next step is to set up interrupt 0x80, in sched.c:

    // Set the system call interrupt gate.
    set_system_gate(0x80, &system_call);

The set_system_gate macro associates interrupt 0x80 with the function system_call. Leaving system_call aside for now, look first at set_system_gate, in system.h:

    // Set a system call gate. Parameters: n - interrupt number; addr - offset address of the interrupt handler.
    // &idt[n] is the address of the descriptor for interrupt n in the interrupt descriptor table;
    // the descriptor type is 15 and the privilege level is 3.
    #define set_system_gate(n,addr) \
        _set_gate(&idt[n],15,3,addr)

    // Set a gate descriptor. Parameters: gate_addr - descriptor address; type - descriptor type field value;
    // dpl - descriptor privilege level; addr - offset address.
    // %0 - type flag word built from dpl and type; %1 - low 4 bytes of the descriptor;
    // %2 - high 4 bytes of the descriptor; %3 - edx (handler offset address addr);
    // %4 - eax (segment selector in the high word).
    // movw %%dx,%%ax : combine the low word of the offset with the selector into the descriptor's low 4 bytes (eax).
    // movw %0,%%dx   : combine the type flag word with the high word of the offset into the descriptor's high 4 bytes (edx).
    // The two movl instructions store the low and high 4 bytes of the gate descriptor.
    #define _set_gate(gate_addr,type,dpl,addr) \
    __asm__ ("movw %%dx,%%ax\n\t" \
        "movw %0,%%dx\n\t" \
        "movl %%eax,%1\n\t" \
        "movl %%edx,%2" \
        : \
        : "i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
        "o" (*((char *) (gate_addr))), \
        "o" (*(4+(char *) (gate_addr))), \
        "d" ((char *) (addr)),"a" (0x00080000))

In terms of the interrupt gate descriptor layout, the privilege level is set to 3, the same as a user process, so user code can invoke this interrupt directly. The offset address is that of system_call above, so executing int 0x80 transfers control to the system_call function. Note that n here is 0x80, so the descriptor is &idt[0x80] in the idt array. idt is declared in head.h; after compilation it becomes the symbol _idt, which is defined in head.s, and that is how the two are linked.
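To make the bit layout concrete, here is a minimal C sketch of the two 32-bit words that the _set_gate macro builds. It assumes the layout implied by the macro (selector 0x0008 in the high word of the low 4 bytes, flag word 0x8000 + (dpl << 13) + (type << 8) in the low word of the high 4 bytes). The struct gate_words and the function pack_gate are hypothetical names for illustration; the kernel writes the words directly with inline assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for illustration only. */
struct gate_words {
    uint32_t low;   /* selector in bits 31..16, handler offset bits 15..0 */
    uint32_t high;  /* handler offset bits 31..16, flag word in bits 15..0 */
};

struct gate_words pack_gate(uint32_t addr, unsigned type, unsigned dpl)
{
    const uint32_t selector = 0x0008;  /* taken from "a" (0x00080000) in the macro */
    uint32_t flags = 0x8000u + (dpl << 13) + (type << 8);  /* present bit, DPL, type */
    struct gate_words g;

    g.low  = (selector << 16) | (addr & 0x0000ffffu);
    g.high = (addr & 0xffff0000u) | (flags & 0xffffu);
    return g;
}
```

With type 15 and dpl 3, as set_system_gate passes, the flag word is 0x8000 + 0x6000 + 0x0f00 = 0xef00.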

3. Declaring the System Call Function

Taking the write system call as an example, the function is declared in write.c:

    _syscall3(int,write,int,fd,const char *,buf,off_t,count)

_syscall3 is also a macro, defined in unistd.h:

    // System call macro for functions with 3 parameters: type name(atype a, btype b, ctype c).
    // %0 - eax (__res), %1 - eax (__NR_name), %2 - ebx (a), %3 - ecx (b), %4 - edx (c).
    #define _syscall3(type,name,atype,a,btype,b,ctype,c) \
    type name(atype a,btype b,ctype c) \
    { \
    long __res; \
    __asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(a)),"c" ((long)(b)),"d" ((long)(c))); \
    if (__res>=0) \
        return (type) __res; \
    errno=-__res; \
    return -1; \
    }

After macro expansion, the line in write.c is equivalent to:

    int write(int fd,const char * buf,off_t count)
    {
        long __res;
        __asm__ volatile ("int $0x80"
            : "=a" (__res)
            : "0" (__NR_write),"b" ((long)(fd)),"c" ((long)(buf)),"d" ((long)(count)));
        if (__res>=0)
            return (int) __res;
        errno=-__res;
        return -1;
    }

Now it becomes clear: when a user process calls write, it executes int 0x80 with the three parameters fd, buf and count placed in the ebx, ecx and edx registers. The other key item is __NR_write, whose value is placed in eax; what it is used for comes up shortly. It is defined in unistd.h:

    #define __NR_write 4
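The tail of the _syscall3 body, which turns the raw value returned in eax into a C return value, can be tried out in user space without issuing any interrupt. The sketch below assumes only what the macro shows: a non-negative value is returned as is, and a negative value is stored negated in errno with -1 returned. The function name syscall_result is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: a negative raw value is a negated error code, as in the _syscall3 body. */
long syscall_result(long raw)
{
    if (raw >= 0)
        return raw;        /* success: pass the kernel's value through */
    errno = (int)-raw;     /* failure: store the positive error code */
    return -1;
}
```

A raw value of 5 comes back unchanged, while a raw value of -9 yields -1 with errno set to 9.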

OK, all the setup and declarations are now in place; everything is ready for the call itself!

4. System Call Procedure

When a user process calls write, int 0x80 is executed. As step 2 established, int 0x80 transfers control to system_call, which sched.c declares as:

    extern int system_call(void);    // system call interrupt handler (kernel/system_call.s, 80)

It is defined in system_call.s under the label _system_call (the compiler prepends an underscore to C symbol names). The listing below shows only the first half:

    _system_call:
        cmpl $nr_system_calls-1,%eax   # if the call number is out of range, set eax to -1 and exit
        ja bad_sys_call
        push %ds                       # save the original segment register values
        push %es
        push %fs
        pushl %edx                     # ebx, ecx, edx hold the parameters for the corresponding C function
        pushl %ecx                     # push %ebx,%ecx,%edx as parameters
        pushl %ebx                     # to the system call
        movl $0x10,%edx                # set up ds,es to kernel space
        mov %dx,%ds                    # ds, es point to the kernel data segment (data segment descriptor in the global descriptor table)
        mov %dx,%es
        movl $0x17,%edx                # fs points to local data space
        mov %dx,%fs                    # fs points to the local data segment (data segment descriptor in the local descriptor table)
    # The operand below means: call address = _sys_call_table + %eax * 4.
    # sys_call_table is defined in include/linux/sys.h as an array holding the
    # addresses of the 72 system call C handler functions.
        call _sys_call_table(,%eax,4)
        pushl %eax                     # push the function's return value (not the system call number)
        movl _current,%eax             # eax = address of the current task (process) data structure
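The selector values 0x10 and 0x17 loaded above follow the standard x86 segment selector layout: bits 15..3 are the descriptor index, bit 2 selects the table (0 for the GDT, 1 for the LDT), and bits 1..0 are the requested privilege level. A small C sketch of that decoding follows; the struct and function names are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* descriptor index within the table */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

Decoded this way, 0x10 is entry 2 of the GDT at privilege level 0, and 0x17 is entry 2 of the LDT at privilege level 3, which matches the comments describing a kernel data segment and a local data segment.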

Note that the three pushl instructions starting with pushl %edx in the _system_call listing push the three parameters from step 3 in right-to-left order. The key instruction is call _sys_call_table(,%eax,4), which is equivalent to call [eax*4 + _sys_call_table]. From step 3, eax holds __NR_write, which is 4. Because _sys_call_table is declared in sys.h as an array of int (*)() function pointers holding the addresses of all system call functions, the instruction calls sys_call_table[4], that is, the sys_write function:

    // System call function pointer table, used as a jump table by the system call interrupt handler (int 0x80).
    fn_ptr sys_call_table[] = { sys_setup, sys_exit, sys_fork, sys_read, sys_write, ... };
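The same dispatch pattern can be sketched in portable C: a bounds check on the call number followed by an indexed call through a table of function pointers. The handlers below are hypothetical stand-ins that only return recognizable values; the real kernel handlers take their arguments from the stack set up in _system_call.

```c
#include <assert.h>

typedef int (*fn_ptr)(void);   /* same shape as the fn_ptr type in the table above */

/* Hypothetical stand-in handlers, one per table slot. */
int demo_setup(void) { return 100; }
int demo_exit(void)  { return 101; }
int demo_fork(void)  { return 102; }
int demo_read(void)  { return 103; }
int demo_write(void) { return 104; }

fn_ptr demo_call_table[] = { demo_setup, demo_exit, demo_fork, demo_read, demo_write };

#define DEMO_NR_CALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

int demo_dispatch(unsigned long nr)
{
    /* Mirrors: cmpl $nr_system_calls-1,%eax ; ja bad_sys_call */
    if (nr > DEMO_NR_CALLS - 1)
        return -1;
    /* Mirrors: call _sys_call_table(,%eax,4) */
    return demo_call_table[nr]();
}
```

Calling demo_dispatch(4) reaches the fifth slot, just as eax = 4 selects sys_write, while any number past the end of the table is rejected with -1.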

sys_write is in read_write.c under the fs directory:

    int sys_write(unsigned int fd, char *buf, int count)
    {
        struct file *file;
        struct m_inode *inode;
        ...
    }

So the whole path is now clear, ending with the call to the elusive sys_write function. This concludes the analysis!
