How Linux system calls are executed


This article was reposted from: http://blog.csdn.net/kernel_learner/article/details/7331505

In Linux, system calls are the only means by which user space can access the kernel; apart from exceptions and traps, they are the only legitimate entry point into the kernel.

In general, applications are written against an application programming interface (API) rather than directly against system calls, and this interface need not correspond one-to-one with the system calls the kernel provides. An API defines a set of programming interfaces used by applications. Each interface can be implemented as a single system call, through multiple system calls, or without using any system calls at all. In fact, the same API can exist on a variety of operating systems, presenting an identical interface to applications while being implemented differently on each system.

In the Unix world, the most popular application programming interface is based on the POSIX standard, and Linux is POSIX-compatible.

From the programmer's point of view, only the API matters. The kernel, in turn, deals only with system calls; how library functions and applications use those system calls is not the kernel's concern.

System calls (often called syscalls in Linux) are usually invoked through functions. They typically take one or more arguments (inputs) and may have side effects. A system call also reports success or failure through a return value of type long: usually zero indicates success and a negative value indicates an error. When a system call fails, the C library writes the error code to the global variable errno. Calling perror() translates that variable into an error string that the user can understand.

Two details of how system calls are implemented deserve mention:

1) The function declaration carries the asmlinkage qualifier, which tells the compiler to look for the function's arguments only on the stack.

2) The system call getxxx() is defined in the kernel as sys_getxxx(). This is the naming convention that all system calls in Linux should follow.

System call number: In Linux, each system call is assigned a system call number, and this unique number identifies the system call. When a user-space process executes a system call, it uses the system call number, not the name, to indicate which system call to execute. Once a system call number is assigned, it must never change (otherwise already-compiled applications would break), and if a system call is removed, its number must not be reused. Linux has a "not implemented" system call, sys_ni_syscall(), which does nothing except return -ENOSYS, the error code for an invalid system call. It is rare, but if a system call is removed, this function is responsible for "filling the gap".

The kernel keeps the list of all registered system calls in the system call table, stored in sys_call_table. The table is architecture-specific and is typically defined in entry.S. It assigns a unique system call number to each valid system call.

User-space programs cannot execute kernel code directly. They cannot call kernel-space functions because the kernel resides in a protected address space. Instead, the application must somehow notify the system that it needs to perform a system call, so that the system switches to kernel mode and the kernel executes the system call on behalf of the application. This notification mechanism is implemented with a software interrupt. On x86, the software interrupt is generated by the int $0x80 instruction. This instruction triggers an exception that causes the system to switch to kernel mode and execute exception handler number 128, which is the system call handler, system_call(). The handler is closely tied to the hardware architecture and is usually written in assembly language in the entry.S file.

All system calls trap into the kernel in the same way, so merely entering kernel space is not enough: the system call number must also be passed to the kernel. On x86, this is done by loading the number into the eax register before triggering the software interrupt, so the system call handler can read it from eax as soon as it runs. system_call() checks the validity of the given system call number by comparing it with NR_syscalls. If the number is greater than or equal to NR_syscalls, the function returns -ENOSYS. Otherwise, the corresponding system call is executed: call *sys_call_table(,%eax,4)

Because entries in the system call table are stored as 32-bit (4-byte) values, the kernel multiplies the given system call number by 4 and uses the result as the offset of the entry in the table. See Figure A.

Besides the system call number, most system calls also need some external parameters as input. The simplest way to pass them is the same way the system call number is passed: in registers. On x86, the registers ebx, ecx, edx, esi, and edi hold the first five parameters, in that order. It is rare to need six or more parameters; in that case, a single register holds a pointer to the user-space memory where all the parameters are stored. The return value is also passed back to user space in a register. On x86, it is stored in eax.

System calls must carefully check that all of their arguments are valid. System calls run in kernel space, so if a user can pass invalid input to the kernel unchecked, the security and stability of the system are at risk. One of the most important checks is whether a user-supplied pointer is valid. Before accepting a pointer into user space, the kernel must ensure that:

1) The memory area pointed to by the pointer belongs to user space.
2) The memory area pointed to by the pointer is in the address space of the process.
3) If the memory is to be read, it is marked readable; if it is to be written, it is marked writable.

The kernel provides two functions that perform the necessary checks and copy data back and forth between kernel space and user space. One of these two functions must be used whenever such data is transferred.

copy_to_user(): takes 3 parameters and writes data to user space. The first parameter is the destination memory address in the process's address space. The second is the source address in kernel space. The third is the length, in bytes, of the data to copy.
copy_from_user(): takes 3 parameters and reads data from user space. The first parameter is the destination memory address in kernel space. The second is the source address in the process's address space. The third is the length, in bytes, of the data to copy.
Note: Both functions may block. This happens when the page containing the user data has been swapped out to disk and is not in physical memory. In that case, the process sleeps until the page-fault handler has brought the page back from disk into physical memory.

The kernel is in process context when it executes a system call, and the current pointer points to the current task, which is the process that issued the system call. In process context, the kernel can sleep (for example, when a system call blocks or explicitly calls schedule()) and can be preempted. When the system call returns, control goes back to system_call(), which ultimately switches to user space and lets the user process continue.

Adding a system call to Linux is easy; designing and implementing one well is the hard part. The first step in implementing a system call is to decide on its purpose, which should be clear and singular. Do not attempt to write multipurpose system calls; ioctl() is a cautionary example. The parameters, return value, and error codes of the new system call are all critical. Once a system call has been written, registering it as an official system call is trivial and usually takes the following steps:

1) Add an entry at the end of the system call table (typically located in entry.S). Counting from 0, an entry's position in the table is its system call number. For example, the tenth system call is assigned system call number 9.
2) For every supported architecture, define the system call number in include/asm/unistd.h.
3) Compile the system call into the kernel image (it cannot be built as a module). Placing it in a relevant file under kernel/ is sufficient.

To recap, a user program cannot execute kernel code directly or call kernel functions, because the kernel resides in a protected address space. The application must therefore notify the kernel that it needs to execute a system call, so that the system switches to kernel mode and the kernel executes the system call on behalf of the application.

This notification is implemented with a software interrupt: the program raises an exception that causes the system to switch to kernel mode and run the exception handler, which in this case is the system call handler.

Typically, system calls are supported by the C library: by including the standard header files and linking against the C library, a user program can use system calls directly, or use library functions that in turn invoke system calls. A newly written system call, however, has no such library support. Fortunately, Linux itself provides a set of macros for accessing system calls directly. Each macro sets up the registers and executes the int $0x80 instruction. These macros are named _syscalln(), where n ranges from 0 to 6 and is the number of arguments passed to the system call. The number is needed because the macro must know exactly how many parameters to load into registers and in what order. Take the open system call as an example:

The open() system call is defined as follows:
long open(const char *filename, int flags, int mode)
The macro that calls this system call directly takes the form:
#define __NR_open 5
_syscall3(long, open, const char *, filename, int, flags, int, mode)

With this in place, the application can call open() directly: to use the open() system call, simply put the macro above into the application. Each macro takes 2+2*n parameters: the return type, the name of the system call, and then the type and name of each of the n parameters in order.
