Process Analysis of Linux system call

Source: Internet
Author: User


"Linux kernel design and implementation"

0 Summary

The Linux system call path passes through the following layers:
User program ------> C library (i.e. the API): int 0x80 ------> system_call ------> system call service routine ------> kernel code
First, what we usually call the user API is really the C library that the system provides.
On x86, a system call is made with the software interrupt instruction int 0x80, and this instruction is wrapped inside the C library functions.

(A software interrupt differs from a hardware interrupt in that it is triggered by an instruction rather than by a peripheral device.)
Executing int 0x80 makes the processor jump to a preset address in kernel space, which points to the system call handler, the system_call function.
(Note: the system call handler system_call is not a system call service routine. A service routine is the kernel function that implements one specific system call. The handler is the common entry code that runs before any service routine; it is bound to the int 0x80 instruction and is shared by all system calls.)

Simply put, every system call follows the same path. The program first calls a C library function, which contains the int 0x80 software interrupt instruction. Control then enters the system call handler system_call, which dispatches to the specific service routine that carries out the call.
How does system_call find the right service routine? It looks it up in the system call table, sys_call_table, using the system call number. Before int 0x80 is executed, the system call number is placed in the EAX register. system_call reads EAX, multiplies the value by 4 to form a byte offset, and adds that offset to the base address of sys_call_table. The table entry at base plus offset holds the address of the service routine.

The handler then calls the service routine.

One point needs explaining. A service routine takes its arguments only from the stack, so before system_call runs, the arguments are first placed in registers, and system_call begins by pushing those registers onto the stack.

After system_call exits, the user program can read the (possibly modified) values back from the registers.

In addition, although a system call enters the kernel through the software interrupt int 0x80, jumps to system_call, and runs the corresponding service routine, it does so on behalf of the user process. The kernel is therefore running in process context, not interrupt context. As a result, the system call code can access much of the calling process's information, can be preempted by other processes, and can sleep.
When the system call completes and control is about to be handed back to the process that made the call, the kernel gets a chance to reschedule.

If a higher-priority process is runnable, or the current process has exhausted its time slice, the scheduler selects another process to run; otherwise the calling process resumes.

1 The meaning of system calls
The Linux kernel provides a set of subroutines that implement system functions; these are called system calls.

System calls resemble ordinary library function calls, except that system calls are provided by the operating system kernel and execute in kernel mode, whereas ordinary functions are provided by a library or by the user and execute in user mode.

In general, a process cannot access the kernel: it can neither touch the kernel's memory nor call kernel functions. The CPU hardware enforces this (which is why it is called "protected mode"). Instead, the kernel provides a set of interfaces through which processes running in user space interact with the system. Through them, applications can access hardware devices and other operating system resources. These interfaces act as messengers between applications and the kernel: applications send requests, and the kernel satisfies them (or makes the application wait).

This set of interfaces exists mainly to ensure the stability and reliability of the system, so that applications cannot act recklessly and cause serious trouble.

System calls add an intermediate layer between user-space processes and hardware devices. This layer serves three main purposes:
(1) It provides user space with a uniform hardware abstraction. For example, when an application reads a file, it need not care about the type of disk or medium, or even which file system the file resides on.
(2) System calls ensure the stability and security of the system. As the intermediary between hardware devices and applications, the kernel can arbitrate each requested access based on permissions and other rules. This prevents applications from, for example, using hardware incorrectly, stealing resources from other processes, or otherwise harming the system.
(3) Each process runs in its own virtual system, and a common interface between user space and the rest of the system is provided with that in mind. If applications could access hardware freely without the kernel knowing, multitasking and virtual memory would be almost impossible to implement, let alone good stability and security.

In Linux, system calls are the only means by which user space can access the kernel. Apart from exceptions and interrupts, they are the only legitimate entry point into the kernel.

2 Relationship between APIs, POSIX, and the C library
In general, applications are written against an application programming interface (API) rather than directly against system calls.

This matters because the programming interface an application uses need not correspond one-to-one with the system calls the kernel provides. An API defines a set of programming interfaces used by applications. An interface may be implemented with one system call, with several system calls, or with no system call at all. In fact, the same API can exist on several operating systems, presenting the same interface to applications while being implemented quite differently on each.

In the Unix world, the most popular application programming interface is based on the POSIX standard, whose goal is to provide a portable operating system standard largely based on Unix. POSIX is a good example of the relationship between APIs and system calls: on most Unix systems, the API functions defined by POSIX correspond closely to system calls.

On Linux, as on most Unix systems, the system call interface is provided in part by the C library.

The C library implements the main API of Unix systems, including the standard C library functions and the system call interface. All C programs can use the C library, and because of the nature of the C language, other languages can easily wrap it for their own use.

From the programmer's point of view, system calls are irrelevant; programmers only deal with the API. Conversely, the kernel deals only with system calls; how library functions and applications use them is not the kernel's concern.

A common maxim in Unix interface design is "provide mechanism, not policy".

In other words, a Unix system call abstracts a function that serves a specific purpose; how that function is used is of no concern to the kernel. Separating mechanism from policy is a highlight of Unix design. Most programming problems can be split into two parts: what capability must be provided (mechanism) and how that capability is used (policy).

3 Implementation of system calls
3.1 System call handlers
You may wonder: "When I type cat /proc/cpuinfo, how does the cpuinfo() function get called?" Once the kernel has booted, control flow changes from the relatively intuitive "which function is called next?" to something driven by system calls, exceptions, and interrupts.

User-space programs cannot execute kernel code directly. They cannot call functions in kernel space, because the kernel resides in a protected address space.

If processes could read and write the kernel's address space directly, system security would be out of control.

So the application must signal the kernel in some way that it wants to execute a system call and have the system switch to kernel mode, so that the kernel can run the system call on the application's behalf.

The mechanism for signaling the kernel is a software interrupt. First, the user program sets up the arguments for the system call, one of which is the system call number. Once the arguments are in place, the program executes the system call instruction. On x86, the software interrupt is generated by the int instruction.

This instruction raises an exception: an event that makes the processor switch to kernel mode, jump to a new address, and start executing the exception handler there. Here the exception handler is the system call handler, which is closely tied to the hardware architecture.

The code at the new address saves the state of the user program, works out which system call is requested, calls the kernel function that implements it, restores the user program's state, and then returns control to the user program.

A system call is one of the ways in which a function defined in a device driver ultimately gets called.
3.2 System call number
In Linux, each system call is assigned a system call number.

This unique number identifies the system call. When a user-space process executes a system call, the number, not the name, indicates which call is meant.

The system call number is critical. Once assigned, it must not change; otherwise already-compiled applications would crash. Linux has a "not implemented" system call, sys_ni_syscall(), which does nothing except return -ENOSYS, the error code reserved for invalid system calls.

Because all system calls enter the kernel the same way, merely trapping into kernel space is not enough; the system call number must also be passed to the kernel. On x86, it is passed in the EAX register: before trapping into the kernel, user space places the number of the desired system call in EAX.

The system call handler can then read the number from EAX once it runs. Other architectures do something similar.

The kernel keeps the list of all registered system calls in the system call table, stored in sys_call_table. The table is architecture-dependent and is typically defined in entry.S.

This table assigns a unique system call number to each valid system call. sys_call_table is a table of function pointers to the kernel functions that implement the individual system calls:
ENTRY(sys_call_table)
        .long SYMBOL_NAME(sys_ni_syscall)       /* 0 - old "setup()" system call */
        .long SYMBOL_NAME(sys_exit)
        .long SYMBOL_NAME(sys_fork)
        .long SYMBOL_NAME(sys_read)
        .long SYMBOL_NAME(sys_write)
        .long SYMBOL_NAME(sys_open)             /* 5 */
        .long SYMBOL_NAME(sys_close)
        .long SYMBOL_NAME(sys_waitpid)

        .long SYMBOL_NAME(sys_capget)
        .long SYMBOL_NAME(sys_capset)           /* 185 */
        .long SYMBOL_NAME(sys_sigaltstack)
        .long SYMBOL_NAME(sys_sendfile)
        .long SYMBOL_NAME(sys_ni_syscall)       /* STREAMS1 */
        .long SYMBOL_NAME(sys_ni_syscall)       /* STREAMS2 */
        .long SYMBOL_NAME(sys_vfork)            /* 190 */

The system_call() function checks the validity of the given system call number by comparing it with NR_syscalls.

If the number is greater than or equal to NR_syscalls, the function returns -ENOSYS.

Otherwise, the corresponding system call is invoked:

call *sys_call_table(,%eax,4)
Because each entry in the system call table is 32 bits (4 bytes) wide, the kernel multiplies the system call number by 4 to find the entry's position in the table.

3.3 Parameter passing
In addition to the system call number, most system calls need some external input. So when the exception occurs, these parameters must be passed from user space to the kernel. The simplest way is the same as for the system call number: store the parameters in registers.

On x86, EBX, ECX, EDX, ESI, and EDI hold the first five parameters in order. In the rare case of six or more parameters, a single register holds a pointer to a block of user-space memory containing all the parameters.

The return value is passed to user space through a register as well; on x86 it is stored in EAX. Much of this description of the system call handler is specific to x86.

But don't worry: the implementations on all architectures are very similar.

3.4 Parameter validation
System calls must carefully check that all of their parameters are valid.

For example, system calls related to file I/O must check that the file descriptor is valid.

Process-related functions must check that the supplied PID is valid. Every parameter must be checked to ensure that it is not only valid, but also correct.

One of the most important checks is whether a user-supplied pointer is valid. Imagine a process could pass any pointer to the kernel unchecked: it could hand over a pointer to memory it has no right to access and trick the kernel into copying data it should not be able to read, such as data belonging to another process.

Before following a pointer into user space, the kernel must ensure that:
(1) The memory area the pointer refers to belongs to user space. The process must not trick the kernel into reading data in kernel space.
(2) The memory area the pointer refers to is in the process's own address space. The process must not trick the kernel into reading another process's data.
(3) If reading, the memory is marked readable; if writing, the memory is marked writable. The process must not bypass memory access restrictions.

The kernel provides two functions that perform the necessary checks and copy data between kernel space and user space. Note that the kernel must never blindly follow a pointer from user space; one of these two functions must always be used.

To write data to user space, the kernel provides copy_to_user(), which takes three parameters. The first is the destination address in the process's address space.

The second is the source address in kernel space.

The third is the length, in bytes, of the data to copy.

To read data from user space, the kernel provides copy_from_user(), which is analogous to copy_to_user(). It copies data from the location given by the second parameter to the location given by the first; the third parameter gives the length.

On failure, both functions return the number of bytes they could not copy.

On success, they return 0. When such an error occurs, the system call conventionally returns -EFAULT.

Note that both copy_to_user() and copy_from_user() may block. This happens when the page containing the user data has been swapped out to disk and is not in physical memory. The process then sleeps until the page-fault handler brings the page back from disk into physical memory.

3.5 Return values for system calls
System calls (often called syscalls in Linux) are usually invoked through functions.

They typically take one or more parameters (inputs) and may have side effects, such as writing a file or copying data to a given pointer.

To avoid confusion with normal return values, the C library wrapper does not return the error code directly; it stores the error in a global variable named errno.

A negative return value (usually -1) indicates an error, and a return value of 0 usually indicates success. If a system call fails, you can read errno to find out what went wrong. The perror() library function translates the variable into an error string the user can understand.

The error conditions represented by the different values of errno are defined in errno.h; you can also view them with the command "man 3 errno".

Note that errno is set only when a function fails. If the function succeeds, the value of errno is undefined; it is not reset to 0. It is also best to save errno in a local variable before handling the error, because a function called during error handling, such as printf(), may itself fail and change errno.

Of course, a system call ultimately has a well-defined behavior.

Take the getpid() system call, for example. By definition it returns the PID of the current process, and its implementation in the kernel is very simple:
asmlinkage long sys_getpid(void)
{
        return current->tgid;
}

Simple as this system call is, two things stand out. First, note the asmlinkage qualifier in the function definition. It is a compiler directive that tells the compiler to take the function's arguments only from the stack, and it is required for all system calls. Second, note that the getpid() system call is defined in the kernel as sys_getpid().

This is the naming convention that all system calls in Linux follow: the kernel function is named sys_ followed by the name of the system call.

4 Adding a new system call
Adding a new system call to Linux is relatively easy. Designing and implementing the system call is the hard part; hooking it into the kernel takes little effort. Let's look at the steps needed to implement a new Linux system call.

The first step in implementing a new system call is to decide its purpose.

What will it do? Every system call should have one clear purpose.

Multiplexed system calls (a single call that does different work depending on a flag argument) are discouraged in Linux. ioctl() should be taken as a cautionary example.

What are the new system call's parameters, return value, and error codes? The interface should be concise, with as few parameters as possible.

When designing the interface, think about the future as much as possible.

Are you needlessly limiting the function? The more general the system call, the better.

Do not assume that the way the system call is used today is the way it will always be used.

The purpose of the system call may stay the same while its usage changes.

Is this system call portable?

Do not make assumptions about the machine's word size or byte order.

When you write a system call, always keep portability and robustness in mind, considering not only the present but also the future.

Once the system call has been written, registering it as an official system call is trivial:
(1) Add an entry at the end of the system call table. Every architecture that supports the system call must do this. Counting from 0, the system call's position in the table is its system call number.
(2) For each supported architecture, define the system call number in <asm/unistd.h>.
(3) Compile the system call into the kernel image (it cannot be built as a module). This only requires placing it in an appropriate file under kernel/.

Let's take a closer look at these steps with a fictitious system call, foo().

First, we add sys_foo to the system call table. For most architectures, the table is located in the entry.S file and looks like this:
ENTRY(sys_call_table)
        .long sys_restart_syscall       /* 0 */
        .long sys_exit
        .long sys_fork
        .long sys_read
        .long sys_write
We add the new system call to the end of the table:
        .long sys_foo
Although the number is not stated explicitly, our system call is assigned the next number in sequence, 283. For every architecture we want to support, we must add the system call to that architecture's system call table. The system call number need not be the same on every architecture.

Next, we add the system call number to <asm/unistd.h>, which looks like this:
/* This file contains the system call numbers. */
#define __NR_restart_syscall    0
#define __NR_exit               1
#define __NR_fork               2
#define __NR_read               3
#define __NR_write              4
...
#define __NR_mq_getsetattr      282
We then add the following line at the end of the list:
#define __NR_foo                283

Finally, we implement the foo() system call. Regardless of configuration, the system call must be compiled into the core kernel image, so we put it in kernel/sys.c. You could also put it in whatever code is most closely related to its function.

asmlinkage long sys_foo(void)
{
        return THREAD_SIZE;     /* size of the per-process kernel stack */
}
That's it! Strictly speaking, the foo() system call can now be invoked from user space.

Creating a new system call is easy, but it is not encouraged. Often an alternative, such as a module that exposes a device node, does the job better.

5 Accessing system calls
5.1 System Call Context
The kernel is in process context while it executes a system call.

The current pointer points to the current task, which is the process that issued the system call.

In process context, the kernel can sleep and can be preempted. Both points matter. First, the ability to sleep means that a system call can use most of the functionality the kernel provides.

The ability to sleep greatly simplifies kernel programming. Second, being preemptible in process context means that, just as in user space, the current task can be preempted by another task. Because the new task may then execute the same system call, care must be taken to make system calls reentrant.

Of course, the same concern also arises on symmetric multiprocessing systems.

When the system call returns, control is still in system_call(), which is ultimately responsible for switching back to user space and letting the user process continue.

5.2 An example of system call access
The operating system uses the system call table to translate a system call number into a specific system call. The table contains the address of the function that implements each call. For example, the read() system call is implemented by the function sys_read(). The system call number of read() is 3, so sys_read() is the fourth entry in the table (because numbering starts at 0). Reading the data at address sys_call_table + (3 * word_size) yields the address of sys_read().

Once the correct address is found, the handler transfers control to that system call.

Let's look at where sys_read() is defined, in fs/read_write.c. This function looks up the file structure associated with the fd number (passed to read()). That structure contains pointers to the functions that read data for the particular type of file. After a few checks, sys_read() calls the file-specific read() function to actually read the data from the file, and then returns.

The file-specific functions are defined elsewhere, for example in socket code, file system code, or device driver code. This is one of the places where a particular kernel subsystem cooperates with the rest of the kernel.

When the read function finishes, control returns from sys_read() to ret_from_sys, which checks for tasks that must be completed before switching back to user space. If nothing needs doing, the state of the user process is restored and control returns to the user program.
5.3 Direct access to system calls from user space
Usually, system calls are supported by the C library.

By including the standard header files and linking with the C library, a user program can use system calls (or call library functions that in turn make system calls). But if you have just written a system call, glibc is unlikely to support it. Fortunately, Linux provides a set of macros for accessing system calls directly. They set up the registers and issue the trap instruction. These macros are _syscalln(), where n ranges from 0 to 6 and is the number of parameters passed to the system call, since the macro must know exactly how many parameters to place in registers and in what order.

For example, the open() system call is defined as:
long open(const char *filename, int flags, int mode)
Without library support, the macro that invokes this system call directly has the form:
#define __NR_open 5
_syscall3(long, open, const char *, filename, int, flags, int, mode)
The application can then simply call open().

Each macro takes 2 + 2 * n parameters. The first is the return type of the system call, and the second is its name. These are followed by the type and name of each parameter, in the order the system call takes them.

__NR_open, defined in <asm/unistd.h>, is the system call number. The macro expands into a C function with inline assembly; the assembly performs the steps discussed in the previous section, placing the system call number and parameters in registers and issuing the software interrupt to trap into the kernel.

Placing this macro in an application is all that is needed to use the open() system call.

Let's write the macro for the foo() system call we created earlier, and then some test code to show off our work.
#include <stdio.h>

#define __NR_foo 283
_syscall0(long, foo)

int main()
{
        long stack_size;

        stack_size = foo();
        printf("The kernel stack size is %ld\n", stack_size);
        return 0;
}

6 Practical considerations

(1) A system call must be compiled into the kernel image in advance and must be officially assigned a system call number.

(2) The system call must be registered for each supported architecture.

(3) System calls cannot be invoked directly from scripts.

(4) Avoid creating new system calls where possible; creating a device node can often serve instead.
