This morning I heard someone say that part of a program runs in kernel mode and part in user mode, and it made me wonder: can a user program call a kernel function directly? (In hindsight the question is a bit silly. If user code could call into the kernel at will, the system would be chaos.) I found the following article online, which explains the matter thoroughly.
My understanding now is that a user program cannot invoke kernel functions directly; it can only go through the system call interface. So what do you do if you want to reach a particular kernel function (or one you wrote yourself)? You add a system call.
The original text reads as follows:
Linux system calls
As the name implies, a system call is one of a set of "special" interfaces that the operating system provides for user programs to invoke. Through these interfaces a user program obtains services from the kernel: for example, it can use file-system calls to ask the system to open, close, read, or write a file, or clock-related calls to get or set the system time.
Logically, a system call can be viewed as an interface between the kernel and a user-space program. It acts as a middleman: it conveys the user process's request to the kernel and, once the kernel has handled the request, sends the result back to user space.
The fundamental reason system services are offered to user space through system calls is to protect the system. Linux's address space is divided into kernel space and user space, which run at different privilege levels and are logically isolated from each other. User processes therefore normally cannot access kernel data or call kernel functions; they can only operate on user data and call user-space functions. The familiar "Hello World" program, for example, runs as an ordinary user-space process: the printf function it calls is a user-space function, and the "Hello World" string it prints is user-space data.
In many cases, however, a user process needs a system service (that is, it needs kernel code to run on its behalf), and it must then use the "special" interface the system provides: the system call. What makes this interface special is that it fixes the exact point at which a user process enters the kernel. In other words, the path from user space into the kernel is predetermined: a process can enter only at the designated entry point and cannot jump into the kernel wherever it likes. Restricting entry to this single, uniform path is what keeps the kernel safe. An analogy: as a tourist you can buy a ticket to enter a safari park, but you must ride the sightseeing bus and follow the prescribed route. And of course you may not get off the bus, because that is too dangerous: either you lose your life or you frighten the animals.
System calls in Linux
For modern operating systems, system calls are a universal means of communication between user space and the kernel, and Linux is no exception. Compared with many Unix systems and with Windows, however, Linux system calls have a character of their own, and the essence of Linux design, simplicity and efficiency, shows everywhere in them.
Linux inherited most (though not all) of its system calls from Unix, but it pruned the traditional Unix set heavily, dropping calls that were redundant in many Unix systems and keeping only the most basic and useful ones. As a result, Linux has only about 250 system calls in total (some operating systems have more than 1000).
These system calls can be roughly divided into the following categories: process control, file system control, system control, memory management, network management, socket control, user management, and interprocess communication. Details can be found in the article "System call list".
For a detailed description of the system calls, use the man 2 syscalls command, or go straight to the source file <kernel source directory>/include/asm-i386/unistd.h to see their original definitions.
Familiarity with these system calls is essential for a system programmer, but for someone developing the kernel or developing on top of it [1], memorizing the calls is not enough. If you only know that a call exists without knowing why, or only know how to use it without knowing what purpose it serves in the system, you are still a long way from mastering the system.
To bridge this gap you must first understand what system calls are mainly used for in the kernel. Many classifications were given above, but broadly speaking system calls serve the following purposes:
• Controlling hardware: system calls often serve as the abstract interface between hardware resources and user space, for example the write/read calls used to write and read files.
• Setting system state or reading kernel data: because system calls are the only means of communication between user space and the kernel [2], a user who wants to set system state (for example, switch a kernel service on or off by setting a kernel variable) or read kernel data must go through a system call. Examples are getpgid, getpriority, setpriority, and sethostname.
• Process management: a family of calls ensures that the processes in the system can run as multiple tasks in a virtual memory environment. Examples are fork, clone, execve, and exit.
Second, you must understand which services belong in the kernel, that is, which functions should be implemented in the kernel rather than in user space. There is no definitive answer; some services can be implemented in either place. The choice to implement a service in the kernel usually rests on the following considerations:
• The service must access kernel data, for example a service that needs interrupt information or the system time.
• Security: a service provided in the kernel is generally better protected than one in user space and is harder to access illegitimately.
• Efficiency: implementing a service in the kernel avoids passing data back and forth with user space and the associated saving and restoring of context, so it is often much more efficient than a user-space implementation. An httpd service is one example.
• If both the kernel and user space need the service, it is best implemented in kernel space. Random number generation is one example.
Understanding these reasons matters a great deal for mastering system calls. I hope readers will reflect on them and draw their own conclusions as they use the calls.
The relationship between system calls, the user programming interface (API), system commands, and kernel functions
Programmers and system administrators rarely deal with system calls directly. A system call is simply an interface through which a request for kernel service is submitted to the kernel via the software interrupt mechanism (described later). In practice, programmers mostly use the user programming interface (API), while administrators mostly use system commands.
The user programming interface is a set of function definitions that specify how to obtain a given service, such as read(), malloc(), free(), and abs(). An API function may correspond to a system call, as the read() interface corresponds to the read system call, but the correspondence is not one to one. Often several different API functions use the same system call internally; malloc() and free(), for example, both use the brk() system call to grow or shrink the process heap. Or a single API function may combine several system calls to provide its service. Some API functions need no system call at all because they require no kernel service; abs(), which computes the absolute value of an integer, is one.
It is also worth adding that the Linux user programming interface follows POSIX, the most popular application programming interface standard in the Unix world, which defines a series of APIs. In Linux (as in Unix) these APIs are implemented mainly by the C library (libc). Besides defining the standard C functions, the library has another very important job: it provides a set of wrapper routines that package the system calls for use by user programs.
Wrappers are not mandatory, however. If you prefer to make the call directly, Linux also provides a syscall() function (itself part of the C library) for that purpose. The following example compares a C library call with a direct call.
#include <syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    long id1, id2;

    /*-----------------------------*/
    /* Direct system call          */
    /* SYS_getpid (func No. is 20) */
    /*-----------------------------*/
    id1 = syscall(SYS_getpid);
    printf("syscall(SYS_getpid) = %ld\n", id1);

    /*-----------------------------*/
    /* System call through the     */
    /* "libc" wrapper              */
    /* SYS_getpid (func No. is 20) */
    /*-----------------------------*/
    id2 = getpid();
    printf("getpid() = %ld\n", id2);

    return 0;
}
System commands sit one layer above the programming interface: they are executable programs that use the API internally, such as the common commands ls and hostname. Linux system commands follow the System V tradition in format, and most of them live under /bin and /sbin (see the shell chapters for related content).
If you are interested, you can see which system calls they use by running strace ls or strace hostname; you will find that system commands use calls such as open, brk, fstat, and ioctl.
The next thing to explain is the relationship between kernel functions and system calls. Kernel functions are nothing mysterious: they are much like ordinary functions, except that they are implemented in the kernel and must therefore satisfy certain kernel programming requirements [3]. A system call is the interface layer through which a user enters the kernel; it is not itself a kernel function. Once inside the kernel, each system call is routed to its own kernel function, known more formally as the system call service routine. It is the kernel function, not the calling interface, that actually services the request.
For example, the getpid system call actually invokes the kernel function sys_getpid.
asmlinkage long sys_getpid(void)
{
    return current->tgid;
}
Linux has a large number of kernel functions. Some are used only within a single kernel file, while others can be exported for use by other parts of the kernel, depending on the situation.
The kernel functions that the kernel exports can be viewed with the ksyms command or with cat /proc/ksyms. There is also a book that catalogs kernel functions, "The Linux Kernel API"; interested readers can consult it.
In summary, the path from the user down to the kernel passes through system commands, the programming interface, system calls, and kernel functions, in that order. After describing how system calls are implemented, we will look back over the whole execution path.
System Call Implementation
The implementation of system calls in Linux relies on software interrupts in the x86 architecture [4]. A software interrupt differs from what we usually call an interrupt (a hardware interrupt) in that it is triggered by a software instruction rather than by a peripheral. It is an exception raised deliberately by the programmer, specifically by executing the int $0x80 assembly instruction, which produces a programmed exception with vector 128.
System calls have to be implemented with an exception because, when a user-mode process invokes a system call, the CPU must switch to kernel mode to execute the kernel function [5]. As described in the part on the i386 architecture, entering the kernel, that is, moving to a higher privilege level, must go through a system gate, and the exception here enters the kernel through just such a gate. (Besides int 0x80, user space can also enter the kernel with the exception instructions int3 (vector 3), into (vector 4), and bound (vector 5); the other exceptions cannot be raised by user-space programs and are reserved for the system.)
Let us go through the process in more detail. The int $0x80 instruction raises a programmed exception numbered 128, which corresponds to entry 128 of the interrupt descriptor table (IDT), a system gate descriptor. That gate descriptor contains a preset kernel-space address pointing to the system call handler, system_call() (not to be confused with the system call service routines), which is written in assembly language in the entry.S file.
Clearly, every system call is transferred to this one address. But Linux has 200 to 300 system calls in total, all entering the kernel here, so how are they dispatched to their own service routines? The solution is simple. Linux assigns each system call a number (counting up from 0, bounded by NR_syscalls), and the kernel keeps a system call table that maps each number to its service routine. Before entering the kernel through the system gate, the caller must pass the system call number to the kernel; on x86 this is done by loading the number into the EAX register before executing int 0x80. Once the system call handler is running, it reads the number from EAX and looks up the corresponding service routine in the system call table.
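The lookup described above can be imitated in ordinary user-space C. This is only a sketch: the demo_* names, the two sample routines, and their numbering are hypothetical, and the real kernel performs the lookup in assembly inside system_call().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical service routines standing in for kernel functions. */
static long demo_getpid(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return 1234;
}

static long demo_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

typedef long (*demo_service_t)(long, long, long);

/* The "system call table": the array index is the call number. */
static const demo_service_t demo_call_table[] = {
    demo_getpid,   /* call number 0 */
    demo_add,      /* call number 1 */
};

#define DEMO_NR_CALLS (sizeof demo_call_table / sizeof demo_call_table[0])

/* Plays the role of the common entry point: validate the number,
 * then jump to the matching service routine. */
long demo_dispatch(unsigned long nr, long a, long b, long c)
{
    if (nr >= DEMO_NR_CALLS)
        return -ENOSYS;                    /* unknown call number */
    assert(demo_call_table[nr] != NULL);
    return demo_call_table[nr](a, b, c);
}
```

Calling demo_dispatch(1, 2, 3, 0) selects demo_add by its number, much as a value in EAX selects a service routine; an out-of-range number is rejected before the table is touched.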
Besides the system call number, many system calls must pass parameters to the kernel. For example, sys_write(unsigned int fd, const char *buf, size_t count) needs the file descriptor fd, the data to write buf, and the byte count count. For this purpose Linux uses six registers: EAX holds the system call number, and EBX, ECX, EDX, ESI, and EDI hold the additional parameters, in that order. The SAVE_ALL macro in system_call() stores the values of these registers on the kernel stack.
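From user space, the same number-plus-arguments convention can be exercised with the C library's syscall() function, which takes the call number first and the parameters after it. The helper name raw_write is hypothetical; how the library loads the registers is architecture-specific and not visible in this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke the write system call by number,
 * passing fd, buf and count as its three parameters. */
ssize_t raw_write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL);
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

A call such as raw_write(1, "hi\n", 3) should behave like write(1, "hi\n", 3), since both end up in the same service routine.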
When the service routine finishes, system_call() takes the system call's return value from EAX and stores it in the kernel-stack slot where the user-mode EAX was saved. It then jumps to ret_from_sys_call(), which ends the execution of the system call handler.
When the process resumes execution in user mode, the RESTORE_ALL macro restores the register values that were saved on the stack on entry to the kernel. EAX now carries the return code of the system call. (A negative value indicates an error; 0 or a positive value indicates normal completion.)
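The error convention can be observed from user space. In the sketch below the kernel's negative return code is not seen directly, because the C library's syscall() converts it into a return value of -1 and stores the positive error number in errno. The helper name errno_from_bad_close is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the close system call on a descriptor
 * that is not open and report the error number that came back. */
int errno_from_bad_close(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);
    assert(ret == -1);    /* the library reports failure as -1 */
    return errno;         /* EBADF: bad file descriptor */
}
```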
To make these ideas concrete, we can analyze the path of the getpid system call. One way is to read the code in entry.S and trace through the source step by step; another is to use a kernel debugging tool and trace the execution path dynamically.
Suppose our program's source file is named getpid.c and contains:
#include <syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>

int main(void) {
    long id;

    /* Call getpid through the libc wrapper. */
    id = getpid();
    printf("getpid() = %ld\n", id);

    return 0;
}
After compiling it into the executable getpid with "gcc -o getpid <path>/getpid.c", we use KDB to trace its execution path once it enters the kernel.
• Activate KDB (press the Pause key; the kernel must of course have the KDB patch applied), set a kernel breakpoint with "bp sys_getpid", leave KDB with "go", and then run ./getpid. The system immediately drops into the kernel debugger, with execution stopped at the sys_getpid breakpoint.
• At the kdb> prompt, run the bt command to inspect the stack and see the nesting of calls; sys_getpid is shown nested inside the kernel function system_call.
• At the kdb> prompt, run the rd command to view the register values; EAX holds the getpid call number, 0x00000014 (= 20).
• At the kdb> prompt, run the ssb (or ss) command to step through the kernel code; after sys_getpid finishes, execution returns to system_call and then passes to the ret_from_sys_call routine. (Some further routines follow, which we do not discuss here.)
Combined with the user-space part of the path, the program's execution can be summarized in the following steps:
1. The program calls the libc wrapper function getpid. The wrapper loads the system call number __NR_getpid (number 20) into the EAX register.
2. The wrapper executes the software interrupt int 0x80 to enter the kernel.
(The following steps run in kernel mode.)
3. The kernel first executes system_call, which uses the system call number to look up the corresponding service routine, sys_getpid, in the system call table.
4. The sys_getpid service routine executes.
5. When it finishes, control passes to the ret_from_sys_call routine, which returns from the system call.
Kernel debugging is an interesting topic with many approaches. Personally I find two tools the most useful: UML (User Mode Linux) combined with gdb, and KDB. KDB in particular is handy for debugging small kernel modules or for tracing kernel execution paths; for how to use it, see the article "Inside the Linux kernel debugger".
Thoughts on system calls
The internal process of a system call is not complicated, so we will say no more about it. The sections below discuss and analyze some important issues that system calls involve, in the hope of giving a better grasp of their essence.
Call Context Analysis
Although a system call executes in the kernel, it is not a pure kernel routine. It acts on behalf of the user process, so even though execution has trapped into the kernel, the context is still process context. It can therefore access a good deal of information about the user process (for example the current structure, the control structure of the current process), it can be preempted by other processes (on return from a system call, the system_call function checks whether rescheduling is needed), it can sleep, and it can receive signals [6].
All of these features touch on process scheduling, which we do not explore here. It is enough to know that when a system call completes and control is about to return to the user process that made the call, the kernel makes a scheduling decision. If a higher-priority process is found, or the current process has used up its time slice, the higher-priority process, or another process chosen afresh, is selected to run. Besides rescheduling, the kernel also checks for pending signals. If the current process has a pending signal, execution first returns to user space to run the signal handler (which lives in user space), then re-enters the kernel, and then returns to user space again. This back-and-forth is tedious but necessary.
System call performance
A system call must trap from user space into kernel space and, after processing, return to user space. Besides the time the service routine itself takes, the trap and return and the system call handler (looking up the system call table, saving and restoring the user context) also take time; together these make up the response time of a system call. Because system calls must trap into the kernel, they are held to a higher standard than ordinary user code: like other kernel code, they should be short and fast. Fortunately, Linux context switching is very fast, entry to the kernel has been optimized for simplicity and efficiency, and the system call handler and the individual system calls are all very concise.
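One rough way to get a feel for this cost is to time a loop of trivial system calls. The sketch below is an assumption-laden measurement aid, not a benchmark: the helper name time_getpid_calls is hypothetical, and the figure it returns includes loop and library overhead and varies from machine to machine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: total nanoseconds spent issuing n getpid
 * system calls through syscall(), which enters the kernel each time. */
long long time_getpid_calls(long n)
{
    struct timespec start, end;

    assert(n > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++)
        syscall(SYS_getpid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Dividing the result by n gives an approximate per-call figure, which can be compared with the same loop around an ordinary user-space function.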
In most cases the performance of Linux system calls is acceptable. Some applications with very high performance requirements, however, want the services that system calls provide but also want to avoid the cost of trapping in and out and of the system call handler. They therefore call the service routines directly from inside the kernel. The best example is httpd: to avoid this overhead, it calls the socket and other service routines from within the kernel.
When to add a system call
System calls are the only means by which user space and kernel space interact, but that does not mean every interaction calls for a new system call. Adding a system call requires modifying the kernel source and recompiling the kernel, so if you want to exchange information with the kernel more flexibly, consider the following methods.
• Write a character driver
A character driver lets you exchange data with the kernel. Its biggest advantages are that it can be loaded as a module, which avoids recompiling the kernel, and that its calling interface is fixed and easy to use.
• Use the proc file system
Changing system state through the proc file system is very common. For example, by modifying the system parameter files under /proc/sys we can change kernel parameters dynamically at run time; the command echo 1 > /proc/sys/net/ipv4/ip_forward, for instance, turns on the kernel switch that controls IP forwarding. Many other kernel options can be queried and adjusted directly through the proc file system in the same way.
• Use a virtual file system
Some kernel developers feel that using the ioctl() system call (the interface of character device drivers) tends to make system calls opaque and hard to control, while putting information into the proc file system makes its organization messy, so they discourage overusing either. They recommend implementing a standalone virtual file system in place of ioctl() and /proc, because a file system interface is clear and convenient for user-space access, and a virtual file system also makes it easier and more efficient to perform system administration tasks with scripts.
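To illustrate the proc file system approach from a program rather than from the shell, a kernel parameter can be read with ordinary file I/O. The sketch below assumes /proc is mounted; the helper name read_proc_value is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a /proc entry into buf.
 * Returns the number of characters stored, or -1 on failure. */
int read_proc_value(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL && size > 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return (int)strlen(buf);
}
```

For example, read_proc_value("/proc/sys/net/ipv4/ip_forward", buf, sizeof buf) would fetch the current state of the IP forwarding switch mentioned above. Underneath, fopen and fgets still reach the kernel through the open and read system calls.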
Experimental section
Introduction to Code Features
We want to collect information about the system calls executed while a Linux system is running, that is, to obtain a system call log in real time. The log is to be returned to user space in real time and in readable form, so that users can watch it or analyze it further (for intrusion detection, for example).
A simple set of experimental code therefore needs to provide the following basic functions:
First, record the system call log and write it to a buffer (in the kernel) so that the user can read it.
Second, create a new system call that returns the log held in the kernel buffer to user space.
Third, invoke that system call in a loop so that the log is returned dynamically and in real time.
Introduction to the code structure
Basic functions
The basic functions described in the previous section correspond to three routines in the code: syscall_audit, sys_audit, and auditd. Their structure is described below.
Log routine syscall_audit
syscall_audit is a kernel-mode service routine responsible for recording the log of system call activity.
The log is recorded by modifying the system call handler system_call [7] in the kernel: after each call that is to be monitored has executed (in our example Clock 222 system calls are monitored; you can of course monitor selectively to suit your needs), a logging instruction is inserted that calls the kernel service function syscall_audit to record information about that call [8].
The syscall_audit routine creates a kernel buffer to hold the log records. When the amount of collected data reaches a threshold (set at 80% of the buffer's total size, so that new calls are not lost), it wakes the process waiting in the system call so that it can retrieve the data. Until then, the caller blocks on a wait queue; in other words, if the buffer is not nearly full, the system call waits (sleeps) until it fills.
System call sys_audit
Because system calls execute in the kernel, their log must also be collected in the kernel, so we need a new system call to bring that information back to user space. sys_audit is the system call we add; its job is very simple: fetch the data from the buffer and return it to user space.
To keep the data continuous and avoid losing any, we create a kernel buffer that holds the log data as it is collected. When the amount of data reaches a threshold (for example, 80% of the buffer's total size), the process waiting in the system call is woken [9] to retrieve it. Until then, the caller blocks on the wait queue; in other words, if the buffer is not nearly full, the system call waits for it to fill.
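The threshold logic can be tried out in user space before any kernel code is written. The sketch below is a simplified stand-in, not the article's kernel code: the demo_* names, the record layout, and the buffer size of 100 are assumptions, and the wake-up is reduced to a return value instead of a wait queue.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_SIZE  100                        /* records in the buffer */
#define DEMO_THRESHOLD (DEMO_BUF_SIZE * 8 / 10)   /* 80% full */

struct demo_record {
    int  syscall_nr;
    int  pid;
    char comm[16];
};

static struct demo_record demo_buf[DEMO_BUF_SIZE];
static int demo_pos;   /* number of records currently stored */

/* Store one record; a full buffer drops it. Returns 1 when the
 * threshold is reached and the reader should be woken, else 0. */
int demo_log(int syscall_nr, int pid, const char *comm)
{
    assert(comm != NULL);
    if (demo_pos < DEMO_BUF_SIZE) {
        struct demo_record *r = &demo_buf[demo_pos++];
        r->syscall_nr = syscall_nr;
        r->pid = pid;
        strncpy(r->comm, comm, sizeof r->comm - 1);
        r->comm[sizeof r->comm - 1] = '\0';
    }
    return demo_pos >= DEMO_THRESHOLD;
}

/* Copy out up to max collected records and reset the buffer.
 * Returns the number copied; records beyond max are discarded. */
int demo_drain(struct demo_record *out, int max)
{
    int n = demo_pos < max ? demo_pos : max;
    memcpy(out, demo_buf, (size_t)n * sizeof *out);
    demo_pos = 0;
    return n;
}
```

Waking the reader at 80% rather than at 100% leaves 20 free slots, so calls that arrive while the reader is being scheduled still have somewhere to go.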
User-space service program auditd
Needless to say, we need a user-space service process that keeps invoking the audit system call to fetch the call log collected in the system. Long sequences of call logs are valuable for analyzing intrusions or system behavior.
Integrating the code into the kernel
In addition to the code described above, some supporting but essential work is needed to tie the pieces together and deliver the required functions.
• First, modify the entry.S assembly file, which contains the system call table and the system call entry code system_call. Add the new system call to the system call table (named sys_audit, number 223: .long SYMBOL_NAME(sys_audit)), then add a jump to the logging service in the system call entry path (the jump is "je auditsys"; the auditsys code fragment actually invokes the logging routine syscall_audit).
• Second, add the source file audit.c, which contains the bodies of the two functions syscall_audit and sys_audit. We say bodies rather than full implementations because we do not want to hard-code the implementation in the kernel. Instead we use function pointers, that is, two hooks, and leave the actual implementation to a module, which can be loaded dynamically and is easier to debug (see the next section).
• Third, modify the i386_ksyms.c file and append the following at the end:
extern void (*my_audit)(int, int);
EXPORT_SYMBOL(my_audit);
extern int (*my_sysaudit)(unsigned char, unsigned char *, unsigned short, unsigned char);
EXPORT_SYMBOL(my_sysaudit);
This exports the function pointers in the kernel symbol table so that the module code can attach to them.
• Fourth, modify the Makefile in the kernel directory of the source tree. This is very simple: append audit.o to the end of the obj-y := ... line to tell the build to compile audit.o into the kernel.
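The hook arrangement used for audit.c can be sketched in user space. The code below only imitates the idea: all demo_* names are hypothetical, the meaning of the two int parameters is an assumption, and a real module would reach the pointer through the exported kernel symbol rather than through a file-local variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hook pointer in the style of my_audit; NULL means no module loaded. */
static void (*demo_audit_hook)(int, int) = NULL;

static int demo_calls_seen;   /* updated by the sample implementation */

/* Fixed stub built into the "kernel": forwards only if a hook is set. */
void demo_syscall_audit(int syscall_nr, int ret)
{
    if (demo_audit_hook != NULL)
        demo_audit_hook(syscall_nr, ret);
}

/* Implementation that a loadable module would supply. */
static void demo_mod_audit(int syscall_nr, int ret)
{
    (void)syscall_nr;
    (void)ret;
    demo_calls_seen++;
}

/* What module initialization and cleanup would do. */
void demo_module_init(void)
{
    assert(demo_audit_hook == NULL);   /* not already attached */
    demo_audit_hook = demo_mod_audit;
}

void demo_module_exit(void)
{
    demo_audit_hook = NULL;
}

int demo_seen(void)
{
    return demo_calls_seen;
}
```

Because the stub checks the pointer on every call, the built-in code keeps working whether or not a module is loaded, and the implementation can be replaced without rebuilding it.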
Key code explanation
Both key functions, the log collection routine and the system call that fetches the log, are implemented in a kernel module. The points that need explanation are:
1. The usual rules of module programming apply: initialization, cleanup, and so on must be implemented. The difference is that our initialization and cleanup routines attach and detach [10] the implementations of the two hook functions.
2. Each system call log entry is a structure, syscall_buf, containing fields such as the system call number (syscall), the process ID (pid), and the caller's name (comm[COMM_SIZE]), 52 bytes in all. Our kernel buffer is audit_buf, an array of up to 100 syscall_buf entries.
3. The system call itself is very simple: all it does is use __copy_to_user [11] to copy the log data from the kernel buffer to user space. For efficiency, the system call sleeps while the buffer is below the 80% threshold, using wait_event_interruptible(buffer_wait, current_pos >= AUDIT_BUF_SIZE*8/10), and the collection routine wakes it with wake_up_interruptible(&buffer_wait) when the buffer is nearly full.
4. Finally, before calling our new system call in the auditd user service program, you must "declare" it with the macro _syscall4(int, audit, u8, type, u8 *, buf, u16, len, u8, reset). The macro expands into the audit function, taking care of format conversion and parameter passing; without it the call is not recognized.
Step by step
Here is how to add the call.
1. Modify entry.S: add the audit call to it and add the collection routine to system_call. (The file is located under <kernel source code>/arch/i386/kernel/.)
2. Add the audit.c file to <kernel source code>/arch/i386/kernel/. This file defines the two functions sys_audit and syscall_audit, which are used in entry.S and which rely on the hook functions my_audit and my_sysaudit.
3. Modify the <kernel source code>/arch/i386/kernel/i386_ksyms.c file to export the two hook functions my_audit and my_sysaudit. Only symbols exported in the kernel symbol table can be used by other kernel code, which is what allows a module to attach to them.
4. Modify the <kernel source code>/arch/i386/kernel/Makefile file so that audit.c is compiled into the kernel.
The kernel can now be recompiled, and the new kernel contains the instrumentation points. The next step is to write the module that implements the system call and the kernel collection routine.
1. Write a module named audit. Besides the module load and unload functions, it mainly implements two functions, mod_sys_audit and mod_syscall_audit, which are attached to the my_sysaudit and my_audit hooks respectively.
2. Compile the module and load it with insmod audit.o. (You can check the load messages with dmesg.)
3. Modify /usr/include/asm/unistd.h and add the system call number for audit, so that user space can find the audit system call.
4. Finally, write a user daemon that calls the audit system call in a loop and prints the collected information to the screen.
[1] By "developing the kernel" we mean developing the kernel itself, such as the driver module mechanism or the system call mechanism. By "developing on the kernel" we mean building on top of it, such as driver development, system call development, file system development, and network protocol development. Our magazine focuses on the latter level, that is, development that uses the mechanisms the kernel provides.
[2] On Linux, system calls are the only means by which a user program accesses the kernel; access through /proc or through device files is ultimately carried out with system calls too.
[3] Kernel programming differs from user-space programming in several ways. Briefly: kernel code cannot use C library functions (unless the kernel implements them itself, as it does for a number of C library string functions); there is no memory protection; the stack is limited (so calls cannot nest too deeply); and, because of scheduling, you must consider the continuity of the kernel execution path and must not sleep for long periods.
[4] Software interrupts, although called interrupts, are actually exceptions (more precisely, traps). They are raised by the CPU itself, and they are a special kind of exception in that the programmer triggers them deliberately.
[5] The system call process can be understood this way: the kernel performs a task on behalf of the application in kernel mode.
[6] Besides process context, Linux has another context, called interrupt context. Interrupt context differs from process context in that it executes on behalf of an interrupt, so it is asynchronous with respect to processes and can be regarded as unrelated to them. Code in this context must avoid sleeping, because it cannot be preempted.
[7] system_call is the general-purpose system call handler, or the system call entry routine. Because every system call passes through system_call for uniform processing (looking up the system call table and jumping to the corresponding service routine), information about any system call can be recorded there by syscall_audit.
[8] Here we mainly record the call time, the caller's PID, the program name, and similar information, which can be obtained from the global variables xtime and current.
[9] This requires a wait queue; for the declaration, see DECLARE_WAIT_QUEUE_HEAD(buffer_wait).
[10] Attaching or detaching simply means pointing the function pointer at the function implemented in the module or at a null function. Note that these function pointers must be exported in the kernel symbol table; otherwise the module cannot find them.
[11] This is a kernel function provided by the system for copying data from the kernel to user space.