I recently studied the Linux kernel's time subsystem to prepare for my next article, "Date and Time in Server Programs". Along the way I happened to notice how several new Linux system calls affect the way server code is written, and I am recording that here. This post can also be read as a footnote to my previous article, "Common programming models for multi-threaded servers".
1. The style of server programs may change
The newer system calls that create file descriptors generally take an additional flags parameter, so O_NONBLOCK and FD_CLOEXEC can be specified directly at creation. For example:
- accept4 - 2.6.28
- eventfd2 - 2.6.27
- inotify_init1 - 2.6.27
- pipe2 - 2.6.27
- signalfd4 - 2.6.27
- timerfd_create - 2.6.25
Apart from the last one, which is a new facility, the other five syscalls enhance an existing call; remove the trailing digit and you get the original syscall.
O_NONBLOCK enables non-blocking IO; file descriptors are blocking by default.
That these system calls can set O_NONBLOCK directly at creation may reflect the current trend in Linux (server) development: the "one loop per thread + (non-blocking IO with IO multiplexing)" model I recommended in my previous post, "Common programming models for multi-threaded servers". These kernel changes show that non-blocking IO has become mainstream to the point that syscalls were added to the kernel just to save one fcntl(2) call.
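To make the saved fcntl(2) call concrete, here is a minimal sketch in C that creates the same non-blocking, close-on-exec pipe in the old and the new style. The function names (make_pipe_old, make_pipe_new, is_nonblock_cloexec) are made up for this illustration. Note that the exact flag names vary by call: pipe2 takes O_NONBLOCK and O_CLOEXEC, while accept4 takes SOCK_NONBLOCK and SOCK_CLOEXEC, and eventfd takes EFD_NONBLOCK and EFD_CLOEXEC.

```c
#define _GNU_SOURCE           /* needed for pipe2() */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Old style: create the pipe, then adjust each end with fcntl(2). */
int make_pipe_old(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    for (int i = 0; i < 2; ++i) {
        int fl = fcntl(fds[i], F_GETFL);
        if (fl == -1 ||
            fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) == -1 ||
            fcntl(fds[i], F_SETFD, FD_CLOEXEC) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* New style: one system call, both options set at creation time. */
int make_pipe_new(int fds[2])
{
    return pipe2(fds, O_NONBLOCK | O_CLOEXEC);
}

/* Returns 1 if fd is both non-blocking and close-on-exec, else 0. */
int is_nonblock_cloexec(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    int fdfl = fcntl(fd, F_GETFD);
    return fl != -1 && fdfl != -1 &&
           (fl & O_NONBLOCK) && (fdfl & FD_CLOEXEC);
}
```

Both functions should leave the two ends in the same state. The new style also avoids the window between creation and fcntl in which another thread could fork and exec while the descriptor is still inheritable.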
In addition, the FD_CLOEXEC option can be enabled during file descriptor creation for the following new system calls:
- dup3 - 2.6.27
- epoll_create1 - 2.6.27
FD_CLOEXEC makes the child process close the file descriptor automatically when the program calls fork() (see the correction below). By default, file descriptors are inherited by the child process (this is the basis of typical traditional Unix IPC, such as one-way communication between parent and child over pipe(2)).
That all eight new syscalls above allow FD_CLOEXEC to be specified directly may indicate that the main purpose of fork() is no longer to create worker processes that keep talking to the parent through shared file descriptors. Instead it is used, like CreateProcess on Windows, to create a "clean" process that has little to do with its parent.
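As a small sketch of the two calls listed above, the following C code creates an epoll instance and a duplicated descriptor with close-on-exec already set, and checks the flag with fcntl(F_GETFD). The helper names (has_cloexec, make_epoll_cloexec, dup_cloexec) are hypothetical and exist only for this example.

```c
#define _GNU_SOURCE           /* needed for dup3() */
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if fd has close-on-exec set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Create an epoll instance with close-on-exec set from the start. */
int make_epoll_cloexec(void)
{
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Duplicate oldfd onto newfd and mark the copy close-on-exec in one
 * step. Unlike dup2(), dup3() fails with EINVAL if oldfd == newfd. */
int dup_cloexec(int oldfd, int newfd)
{
    return dup3(oldfd, newfd, O_CLOEXEC);
}
```

With plain epoll_create or dup2 the same result needs a separate fcntl(fd, F_SETFD, FD_CLOEXEC) call after creation.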
In my opinion, these two flags show that the mainstream model for Linux server development is shifting from fork() + worker processes to the multi-threaded model I recommended earlier, and fork() will be used far less often. In the future, fork() may be called only by a "watchdog program" dedicated to starting other processes, while ordinary server programs (for the definition of "server program", see my previous article) will no longer fork() child processes. One reason is that fork() generally should not be called in multi-threaded programs: Linux fork() clones only the thread of control of the calling thread, not the other threads. In other words, fork() cannot produce a multi-threaded child that mirrors the parent, and Linux has no system call like forkall(). In fact forkall() would be very hard to define (semantically), because the other threads may be waiting on a condition variable, blocked in a system call, waiting for a mutex to enter a critical section, or in the middle of intensive computation, none of which can easily be moved wholesale into a child process.
It follows that the "watchdog program" should be single-threaded, and it needs to catch SIGCHLD. If signals could be read like a "file", development would be greatly simplified. Point 2 below confirms exactly this.
Given this, closing irrelevant file descriptors after fork() becomes a common requirement, so it may as well be built into the system calls themselves.
2. The signalfd added by Kernel 2.6.22 gives signal handling a new approach.
Signal handling is a hard part of Unix programming: signals are asynchronous and are delivered in the "current thread", so handlers run into reentrancy problems. In fact, threads were only added to Unix in 1993; for the 20-plus years before that there was no such thing as a "main thread". What I mean is that a signal handler is invoked like a coroutine rather than a subroutine. Raymond Chen has an article about this issue.
Once Unix/Linux supported threads, signals became even harder to handle and the rules became obscure (consider which thread a signal is delivered to). Moreover, signals do not fit the Unix philosophy of "everything is a file": signal events could not be read like a file. The signalfd added in 2.6.22 is a turning point. A program can now handle signals the way it handles files: it can read, select, poll, and epoll them, and fold them into the standard IO multiplexing framework, instead of using an extra pipe inside the program to convert signals into IO events. (libev seems to do this, and there is also a GHC ticket: http://hackage.haskell.org/trac/ghc/ticket/1520)
This makes signals much easier to deal with in multi-threaded programs: a single event loop can handle IO, timers, and signals together.
3. The timerfd added by Kernel 2.6.25 provides a new method for the program's "scheduled tasks".
In my next post I will analyze date and time in Linux server programs in detail. One part of that is "timing", that is, using a timer to do something at a specific point in the future. Linux offers many ways to do this: sleep/nanosleep/clock_nanosleep, which block; rtsignal/timer_create, which use signals; and poll/epoll, which use IO multiplexing. However, the theoretical timing precision of poll/epoll is only one millisecond (the timeout parameter is a number of milliseconds, so no finer precision can be specified), and the actual waiting accuracy depends on the kernel HZ.
If you need high-precision timing in an event loop without blocking, you can now use timerfd. And since it is an fd, it integrates naturally with non-blocking IO and IO multiplexing. Of course, file descriptors are a scarce resource. Having every event loop use a timerfd for its timers/timeouts seems somewhat wasteful (and one timerfd per timer is a huge waste, because not every timer needs high precision). I would rather manage pending timers with the traditional priority queue approach (millisecond-level accuracy already meets my needs) and use timerfd only in special cases.
4. The eventfd added by Kernel 2.6.22 provides a new method for "inter-thread event notification".
"Common programming models for multi-threaded servers" said that inter-process communication should use only TCP, and that the only job left for pipe is to wake up an event loop asynchronously. With eventfd, pipe loses even that job. eventfd is a more efficient inter-thread event notification mechanism than pipe. For one thing, it uses one fewer file descriptor than a pipe, which saves resources. For another, its buffer management is much simpler: the entire "buffer" is just 8 bytes, unlike a pipe, which may have a real buffer of indeterminate length.
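The wake-up pattern can be sketched in C roughly as follows. The names wakeup and wait_for_wakeup are made up for this example; the 8-byte "buffer" mentioned above is the uint64_t counter that every write adds to and every read resets.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Called from any thread: add 1 to the counter, which makes the
 * eventfd readable and wakes a loop sleeping in poll/epoll. */
int wakeup(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Called from the loop thread: wait until woken, then drain the
 * counter. Returns how many wakeups were coalesced into this read,
 * or 0 on timeout or error. */
uint64_t wait_for_wakeup(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;
    uint64_t count = 0;
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return 0;
    return count;
}
```

The descriptor itself would be created once with eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC) and shared between threads. Several wakeup calls made before the loop gets around to reading show up as a single read returning their sum, so there is no buffer to overflow or drain in a loop as with a pipe.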
In the future, the main role of pipe may be to let the "watchdog program" capture the stdout/stderr of its child processes.
To sum up, the "one loop per thread + (non-blocking IO with IO multiplexing)" server model promoted in my previous post relies on a high-quality Reactor-based network library. If you write one yourself, you may as well target a kernel newer than 2.6.22; programming should become much simpler (at least eventfd and signalfd can play a major role). I plan to write a simple one as an experiment.
Finally, I study the Linux kernel in order to write better Linux server applications. I am not a kernel expert and do not plan to become one.
2010-Feb-27 correction: the statement "FD_CLOEXEC makes the child process close the file descriptor automatically when the program calls fork()" is incorrect. As the name suggests, FD_CLOEXEC closes the file descriptor when exec() is called, which prevents the descriptor from leaking to the program started by the child process. My instinct is that fork() is immediately followed by exec(), which led to the mistake.
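The corrected behavior can be checked with a small C experiment. This is only a sketch, and the helper names (make_cloexec_pipe, fd_open_in_forked_child) are hypothetical. It shows the fork() half of the correction: a close-on-exec descriptor is still open in the child right after fork(). It does not exercise exec() itself.

```c
#define _GNU_SOURCE           /* needed for pipe2() */
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a pipe whose two ends are both marked close-on-exec. */
int make_cloexec_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC);
}

/* Fork a child that checks whether fd is still open and reports the
 * answer through its exit status. No exec() happens in the child.
 * Returns 1 if the child sees fd open, 0 if closed, -1 on error. */
int fd_open_in_forked_child(int fd)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(fcntl(fd, F_GETFD) != -1 ? 1 : 0);
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For a descriptor from make_cloexec_pipe, fd_open_in_forked_child should report 1: fork() alone leaves it open. The descriptor only disappears once the child calls one of the exec() functions.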