Transferred from: http://www.ibm.com/developerworks/cn/linux/l-system-calls/
Explore the SCI and add your own calls
Linux® system calls: we use them every day. But do you know how a system call is carried out between user space and the kernel? This article explores the Linux system call interface (SCI), shows how to add new system calls (and other ways to achieve the same result), and introduces some utilities related to the SCI.
M. Tim Jones, consultant engineer, Emulex
September 21, 2010
A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a mechanism that crosses the boundary between user space and the kernel. The way this is done differs from one architecture to another. For this reason, this article sticks to the most common architecture, i386.
In this article, I explore the Linux SCI, demonstrate adding a system call to the 2.6.20 kernel, and then use the new call from user space. I also look at some of the functions that are useful when writing system calls, and at alternatives to system calls. Finally, I cover some of the ancillary mechanisms related to system calls, such as tracing the system calls that a process makes.
The SCI
The implementation of system calls in Linux varies by architecture, and can vary even within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user space to kernel space, but newer IA-32 processors provide instructions that optimize this transition (the sysenter and sysexit instructions). Because so many options exist and the end result is quite complicated, this article stays at a surface-level view of the interface details. For more detail, see the resources at the end of this article.
You do not need to fully understand the internals of the SCI to modify it, so I use a simplified version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The EAX register identifies the particular system call to invoke; its value is set in the C library, according to the call made by the user-space application. Once the C library has loaded the system call index and any arguments, it issues a software interrupt (interrupt 0x80), which causes the system_call function to run (through the interrupt handler). This function handles all system calls, as identified by the contents of EAX. After a few simple tests, the actual system call is invoked through sys_call_table, using the index contained in EAX. On return from the system call, syscall_exit is eventually reached, and a call to resume_userspace transitions back to user space. Execution then resumes in the C library, which returns to the user application.
Figure 1. The simplified flow of a system call using the interrupt method
At the core of the SCI is the system call demultiplexing table. As shown in Figure 2, the index provided in EAX determines which system call in the table (sys_call_table) to invoke. The figure also gives a sample of the table's contents and shows where those entries are located. (For more about demultiplexing, see the sidebar "System call demultiplexing.")
Figure 2. The system call table and various linkages
Add a Linux system call
Adding a new system call is mostly procedural, although there are a few things to watch out for. This section walks through the construction of a few system calls to demonstrate their implementation and their use from a user-space application.
To add a new system call to the kernel, you need to perform 3 basic steps:
- Add a new function.
- Update the header file.
- Update the system call table for this new function.
Note: This process ignores user-space needs, which I cover later.
Most often, you create a new file for your own functions. For the sake of simplicity, however, I add my new functions to an existing source file. The first two functions, shown in Listing 1, are simple examples of system calls. Listing 2 provides a slightly more complicated function that uses pointer arguments.
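Listing 1. Simple kernel functions for the system call example
/* Return the current jiffies value. */
asmlinkage long sys_getjiffies( void )
{
  return (long)get_jiffies_64();
}

/* Return the difference between the current jiffies value and ujiffies. */
asmlinkage long sys_diffjiffies( long ujiffies )
{
  return (long)get_jiffies_64() - ujiffies;
}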
Listing 1 provides two functions for jiffies monitoring. (For more information about jiffies, see the sidebar "Kernel jiffies.") The first function returns the current jiffies value, and the second returns the difference between the current value and the value that the caller passes in. Note the use of the asmlinkage modifier. This macro (defined in linux/include/asm-i386/linkage.h) tells the compiler to pass all function arguments on the stack.
Listing 2. Final kernel function for the system call example
asmlinkage long sys_pdiffjiffies( long ujiffies, long __user *presult )
{
  long cur_jiffies = (long)get_jiffies_64();
  long result;
  int  err = 0;

  if (presult) {
    result = cur_jiffies - ujiffies;
    /* put_user returns nonzero if the user-space write fails. */
    err = put_user( result, presult );
  }

  return err ? -EFAULT : 0;
}
Listing 2 shows the third function. It takes two arguments: a long, and a pointer to a long that is marked __user. The __user macro simply tells the compiler (through noderef) that the pointer should not be dereferenced, because it is not meaningful in the current address space. This function calculates the difference between two jiffies values and then hands the result to the user through a user-space pointer. The put_user function places the result value into the user-space location specified by presult. If an error occurs during this operation, it is returned, so the user-space caller is notified as well.
For step 2, I update the header file to make room for the new functions in the system call table. For this example, I update the header file linux/include/asm/unistd.h with the new system call numbers. The updates are shown in Listing 3.
Listing 3. Updates to unistd.h to make room for the new system calls
You now have the kernel system calls and the numbers that represent them. What remains is to tie these numbers (the table indexes) to the functions themselves. This is step 3, updating the system call table. As shown in Listing 4, I update the file linux/arch/i386/kernel/syscall_table.S with the new functions, which populate the particular indexes shown in Listing 3.
Listing 4. Updating the system call table with the new functions
.long sys_getcpu
.long sys_epoll_pwait
.long sys_getjiffies      /* 320 */
.long sys_diffjiffies
.long sys_pdiffjiffies
Note: The size of this table is defined by the symbolic constant NR_syscalls.
This completes the kernel changes. The kernel must now be recompiled and the new image booted before you can test the user-space application.
Read and write to user memory
The Linux kernel provides several functions for moving system call arguments to and from user space. Options include simple functions for basic types (such as get_user and put_user). To move blocks of data (such as structures or arrays), you can use another set of functions: copy_from_user and copy_to_user. Null-terminated strings have their own calls: strncpy_from_user and strlen_user. You can also test whether a user-space pointer is valid by calling access_ok. These functions are defined in linux/include/asm/uaccess.h.
You use the access_ok macro to validate a user-space pointer for a given operation. It takes three arguments: the type of access (VERIFY_READ or VERIFY_WRITE), the pointer to the user-space memory block, and the size of the block in bytes. It returns nonzero if the block may be valid and 0 if it is definitely invalid:
int access_ok( type, address, size );
To move simple types (such as int or long) between the kernel and user space, get_user and put_user are the easy option. Each of these macros takes a value and a pointer to a variable. get_user moves the value at the user-space address (ptr) into the specified kernel variable (var). put_user moves the value of the kernel variable (var) to the user-space address (ptr). Both return 0 on success:
int get_user( var, ptr );
int put_user( var, ptr );
To move larger objects, such as structures or arrays, you can use the copy_from_user and copy_to_user functions. These functions move an entire block of data between user space and the kernel. copy_from_user moves a block of data from user space into kernel space, and copy_to_user moves a block of data from kernel space into user space:
unsigned long copy_from_user( void *to, const void __user *from, unsigned long n );
unsigned long copy_to_user( void __user *to, const void *from, unsigned long n );
Finally, you can use the strncpy_from_user function to copy a NULL-terminated string from user space into the kernel. Before calling it, you can obtain the size of the user-space string by calling the strlen_user macro:
long strncpy_from_user( char *dst, const char __user *src, long count );
strlen_user( str );
These functions provide the basics for moving memory between the kernel and user space. Additional functions are available (such as variants that reduce the amount of checking performed). You can find them in uaccess.h.
Using System calls
Now that the kernel is updated with the new system calls, let's look at what is needed to use them from a user-space application. There are two ways to use a new kernel system call. The first is a convenience method (probably not something you want in production code), and the second is the traditional method, which requires a bit more work.
With the first method, you call the new function, identified by its index, through the syscall function. With syscall, you invoke a system call by specifying its call index and a set of arguments. For example, the short application shown in Listing 5 calls sys_getjiffies by its index.
Listing 5. Using syscall to invoke a system call
#include <stdio.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <sys/syscall.h>

#define __NR_getjiffies    320

int main()
{
  long jiffies;

  /* Invoke the new system call by its table index. */
  jiffies = syscall( __NR_getjiffies );

  printf( "Current jiffies is %lx\n", jiffies );

  return 0;
}
As you can see, the syscall function takes the index into the system call table as its first argument. Any further arguments are added after the call index. Most system calls have a SYS_ symbolic constant that maps to their __NR_ index. For example, you invoke the index __NR_getpid with syscall as:
syscall( SYS_getpid )
The syscall function is architecture specific, and uses a mechanism to transfer control to the kernel. Its argument relies on the mapping between __NR_ indexes and the SYS_ symbols provided by /usr/include/bits/syscall.h (generated when libc is compiled). Never reference this file directly; use /usr/include/sys/syscall.h instead.
The traditional method requires that you create function calls that match those in the kernel in terms of system call index (so that the right kernel service is called), and that the arguments match as well. Linux provides a set of macros for this purpose. The _syscallN macros are defined in /usr/include/linux/unistd.h and have the following format:
_syscall0( ret-type, func-name )
_syscall1( ret-type, func-name, arg1-type, arg1-name )
_syscall2( ret-type, func-name, arg1-type, arg1-name, arg2-type, arg2-name )
The _syscall macros are defined for up to six arguments (although only three macros are shown here).
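Now, let's look at how to use the _syscall macros to make the new system calls visible to user space. The application shown in Listing 6 uses each of the system calls, as defined by the _syscall macros.
Listing 6. Using the _syscall macros for user-space application development
#include <stdio.h>
#include <linux/unistd.h>
#include <sys/syscall.h>

#define __NR_getjiffies    320
#define __NR_diffjiffies   321
#define __NR_pdiffjiffies  322

_syscall0( long, getjiffies );
_syscall1( long, diffjiffies, long, ujiffies );
_syscall2( long, pdiffjiffies, long, ujiffies, long*, presult );

int main()
{
  long jifs, result;
  int err;

  jifs = getjiffies();

  printf( "Difference is %lx\n", diffjiffies(jifs) );

  /* The result comes back through the user-space pointer. */
  err = pdiffjiffies( jifs, &result );

  if (!err) {
    printf( "Difference is %lx\n", result );
  } else {
    printf( "error\n" );
  }

  return 0;
}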
Note that the __NR indexes are required in this application, because the _syscall macro uses func-name to construct the __NR index (getjiffies becomes __NR_getjiffies). The result is that you can call the kernel functions by name, just like any other system call.
Other options for user/kernel interaction
System calls are an efficient way to request services in the kernel. Their biggest drawback is that they form a standardized interface: it would be difficult to get a new system call accepted into the kernel, so additions are usually served through other means. If you have no intention of mainlining your system calls into the public Linux kernel, however, system calls are a convenient and efficient way to make kernel services available to user space.
Another way to make your service visible to user space is through the /proc file system. /proc is a virtual file system through which you can present a directory and files to the user, and then offer an interface to your new kernel service through the file system interface (read, write, and so on).
Using strace to trace system calls
The Linux kernel provides a useful way to trace the system calls that a process invokes (as well as the signals that the process receives). The utility is called strace; it is run from the command line, with the application you want to trace as its argument. For example, to find out which system calls are invoked while the date command runs, type the following command:
strace date
The result is a sizable dump showing the various system calls made during the execution of date. You will see shared libraries being loaded and memory being mapped, and at the end of the trace, the date information written to standard output:
...
write(1, "Fri Feb  9 23:06:41 MST 2007\n", 29Fri Feb  9 23:06:41 MST 2007
) = 29
munmap(0xb747a000, 4096)                = 0
exit_group(0)                           = ?
$
The tracing is done in the kernel: when the current system call request has a special field named syscall_trace set, the do_syscall_trace function is called. You can also find the tracing calls as part of the system call request in ./linux/arch/i386/kernel/entry.S (see syscall_trace_entry).
Conclusion
System calls are an efficient way of crossing between user space and the kernel to request services in kernel space. They are also tightly controlled, however, and it is much easier simply to add a new /proc file system entry to provide the user/kernel interaction. When speed matters, though, a system call is the better choice.