I. Preface
This article describes the software framework of the kernel time subsystem. It first explains where the new time subsystem came from and what advantages it has over the old one. Section III then covers the files that make up the time subsystem and the related kernel configuration options. Finally, it describes the data flow and control flow of the time subsystem under the various kernel configurations.
II. Background
1. Software architecture of traditional kernel time subsystem
Let's go back to the ancient 2.4 kernel era, when the software framework of the kernel time subsystem looked as follows:
First, the clock event and clock source modules are implemented in architecture-specific code. The names sound grand, but they are simply borrowed from the new time subsystem. In the old kernel, a clock event is nothing more than the handler of the timer hardware interrupt. The tick module is built on top of it and maintains the system tick. Suppose the system has a 10 ms tick: each time a tick arrives, the timekeeping module advances the system time. If timekeeping were driven entirely by the tick, its precision would be only 10 ms. To achieve higher precision, the clock source module provides an interface function that returns the time offset elapsed since the last tick.
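The tick-plus-offset scheme can be illustrated with a small user-space C sketch. This is not kernel code: the struct old_timekeeper and the functions timekeeper_init, on_tick and read_time_ns are hypothetical names chosen for illustration, and the offset_ns argument stands in for whatever the architecture code would read from its hardware counter.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical model of the old scheme: time advances only on ticks. */
struct old_timekeeper {
    uint64_t tick_ns;   /* tick period, e.g. 10 ms when HZ = 100 */
    uint64_t jiffies;   /* number of ticks seen so far */
    uint64_t now_ns;    /* system time as of the last tick */
};

static void timekeeper_init(struct old_timekeeper *tk, unsigned int hz)
{
    assert(hz > 0);
    tk->tick_ns = NSEC_PER_SEC / hz;
    tk->jiffies = 0;
    tk->now_ns = 0;
}

/* Called from the timer interrupt handler: one tick has elapsed. */
static void on_tick(struct old_timekeeper *tk)
{
    tk->jiffies++;
    tk->now_ns += tk->tick_ns;
}

/*
 * Read the current time. offset_ns is the time elapsed since the last
 * tick, as reported by the (assumed) architecture-specific counter.
 * It is clamped so that time never runs past the next tick boundary.
 */
static uint64_t read_time_ns(const struct old_timekeeper *tk,
                             uint64_t offset_ns)
{
    if (offset_ns >= tk->tick_ns)
        offset_ns = tk->tick_ns - 1;
    return tk->now_ns + offset_ns;
}
```

With hz set to 100, two calls to on_tick advance the time to 20 ms, and a read with an offset of 3.5 ms returns 23.5 ms, which is finer than the tick alone could provide.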
The early Linux kernel supported only low-precision timers, that is, tick-based timers. With a 10 ms tick, the best resolution a low-precision timer can offer is 10 ms; microsecond or even nanosecond accuracy is out of reach. Each kernel driver module can use low-precision timers to implement its own timing functions. Per-process time accounting is also tick-based, and the kernel scheduler makes its decisions based on this information. The system load and kernel profiling modules are tick-based as well, for computing the load and for profiling the kernel.
From the user-space perspective there are two requirements. One is a set of interface functions for getting or setting the current system time, such as time, stime and gettimeofday. The other is a set of timer-related interface functions, such as setitimer and alarm; when the timer expires, the kernel sends a signal to the process.
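As a rough user-space illustration of these two kinds of interfaces, the sketch below reads the time with gettimeofday and arms a one-shot timer with setitimer, then waits for the resulting SIGALRM. The function name wait_for_itimer and the handler on_sigalrm are hypothetical; the system calls themselves are standard POSIX.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

static volatile sig_atomic_t alarm_fired;

static void on_sigalrm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/*
 * Arm a one-shot ITIMER_REAL timer and sleep until SIGALRM arrives.
 * Stores the elapsed wall-clock time in *elapsed_us.
 * Returns 0 on success, -1 on error.
 */
static int wait_for_itimer(long delay_us, long *elapsed_us)
{
    struct sigaction sa;
    struct itimerval it;
    struct timeval start, end;
    sigset_t block, old, wait_mask;

    assert(delay_us > 0);   /* a zero value would disarm the timer */

    sa.sa_handler = on_sigalrm;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM so it cannot be delivered before sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    alarm_fired = 0;
    it.it_interval.tv_sec = 0;      /* one-shot: no reload value */
    it.it_interval.tv_usec = 0;
    it.it_value.tv_sec = delay_us / 1000000;
    it.it_value.tv_usec = delay_us % 1000000;

    gettimeofday(&start, NULL);
    if (setitimer(ITIMER_REAL, &it, NULL) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    /* Atomically unblock SIGALRM and wait for it. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&wait_mask);

    gettimeofday(&end, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);

    *elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
                + (end.tv_usec - start.tv_usec);
    return 0;
}
```

On a tick-based kernel the elapsed time reported here would be rounded up to the tick boundary, which is exactly the precision limit discussed above.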
2. Why introduce a new software architecture for the time subsystem?
As technology developed, two new requirements emerged:
(1) Embedded devices need a better power management strategy. Traditional Linux has a periodic tick that wakes the system up even when it has nothing to do, so the system keeps leaving the low-power (idle) state for a high-power state. Such a design does not meet the requirements of power management.
(2) Multimedia applications require very precise timers, for example to avoid dropped frames in video and jitter in audio playback. This requires the system to provide timers with sufficient accuracy.
Unlike the low-precision timer, the high-precision timer uses the time unit that is most intuitive for humans, the nanosecond (the tick used by the low-precision timer depends on the kernel configuration and is less direct). In principle, once the Linux kernel provides high-precision timers it no longer needs low-precision ones. However, low-precision timers have a long history and have penetrated every part of the kernel, so removing them could easily harm the stability and robustness of the kernel. The current Linux kernel therefore keeps low-precision and high-precision timers side by side.
Driven by these new requirements, kernel developers reworked the software framework of the Linux time subsystem to make the code hierarchy clearer, more flexible and configurable. A schematic block diagram looks like this:
With the arrival of multi-core systems, the function of the HW timer was split into two parts. One is a free-running system counter, which is global and does not belong to any CPU. The other is the HW block that generates timed events, which we call the timer. This timer hardware is embedded in each CPU core, so it is more accurately called the CPU local timer, and it operates on the basis of the global counter. At the driver layer, a clock source chip driver module drives this hardware; this module is tied to the hardware architecture. If the system has more than one HW timer and counter block, it may have multiple clock source chip drivers.
Faced with this variety of timer and counter hardware, the Linux kernel abstracts a generic clock event layer and a generic clock source module, both of which are hardware independent. The underlying clock source chip driver calls the interface functions of the generic clock event and clock source modules to register clock source and clock event devices. A clock source device corresponds to a free-running hardware counter and provides a basic timeline. A real timeline is, of course, like a straight line that extends infinitely. The kernel's timeline, however, is built on a free-running counter, so the timeline of a clocksource has an overflow problem. If you choose a 64-bit HW counter and the input frequency is not too high, the overflow time may be 50 years or more, and from the application's point of view a timeline that lasts that long is acceptable. If the clock source is a timeline, then a clock event device is the device that generates a clock event at a specified point on that timeline. Such asynchronous events rely on the interrupt subsystem: the clock source chip driver requests an interrupt and, when it fires, invokes the callback function of the generic clock event module to deliver the event.
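The overflow behaviour of a free-running counter can be worked through with a short sketch. The helpers counter_wrap_seconds and counter_delta are hypothetical names, not kernel functions; the masked subtraction is a common way to compute an elapsed cycle count that stays correct across a single wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps. */
static double counter_wrap_seconds(unsigned int bits, double freq_hz)
{
    double span = 1.0;

    assert(bits >= 1 && bits <= 64 && freq_hz > 0.0);
    for (unsigned int i = 0; i < bits; i++)
        span *= 2.0;            /* span = 2^bits cycles */
    return span / freq_hz;
}

/*
 * Cycles elapsed between two reads of the counter. Masking with the
 * counter width keeps the result correct if the counter wrapped once
 * between the two reads.
 */
static uint64_t counter_delta(uint64_t now, uint64_t last,
                              unsigned int bits)
{
    uint64_t mask;

    assert(bits >= 1 && bits <= 64);
    mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return (now - last) & mask;
}
```

For example, a 32-bit counter clocked at 1 MHz wraps after about 4295 seconds (roughly 72 minutes), whereas a 64-bit counter at the same frequency lasts far longer than any system's lifetime.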
The tick device layer works on top of clock event devices. In general, each CPU forms its own small system with its own scheduling, its own process statistics and so on, and this small system has exactly one tick device of its own. Clock event devices are different: as many clock event devices are registered as there are timer hardware blocks, and each CPU's tick device selects the clock event device that suits it best. A tick device can work in periodic mode or in one-shot mode, depending on the system configuration. At the tick device layer there are therefore as many tick devices as there are CPUs, and we call them local tick devices. Some work (such as computing the load of the whole system) is not suitable for a local tick device, so one of the local tick devices is selected as the global tick device. It is responsible for maintaining the jiffies of the whole system, updating the wall clock, calculating the global load and so on.
High-precision timers require high-precision clock events, and a tick device working in one-shot mode provides them. On top of a one-shot tick device the system implements high-precision timers, and the various modules of the system can use the high-precision timer interface to obtain timer services. Despite the advent of high-precision timers, the kernel did not abandon the old low-precision timer mechanism (kernel developers tried to merge high-precision and low-precision timers but failed, so in the current kernel the two kinds of timers coexist). When the system runs in high-precision timer mode (the tick device is in one-shot mode), it sets up a special high-precision timer (which can be called the sched timer). This timer is triggered periodically and thereby simulates the traditional periodic tick, which in turn drives the traditional low-precision timers. As a result, traditional kernel modules can still call the interface of the classic low-precision timer module.
III. Files and kernel configuration of the time subsystem
1. File overview
The source files of the Linux kernel time subsystem are located in the linux/kernel/time/ directory and are summarized below:
Filename | Description
time.c, timeconv.c | time.c provides the time interfaces exposed to user space, including time, stime, gettimeofday, settimeofday and adjtime. It also provides time format conversion functions used by other kernel modules, such as conversions between jiffies and microseconds and between calendar time (Gregorian date) and xtime. The xtime format holds the seconds and nanoseconds elapsed since the Linux epoch. timeconv.c contains the conversion functions from calendar time to broken-down time.
timer.c | Traditional low-precision timer module, based on the tick.
timer_list.c, timer_stats.c | Debug interfaces provided to user space. Information about the time subsystem in the kernel can be read from /proc/timer_list, for example the clock source device, clock event devices and tick devices currently in use. Timer statistics can be read from /proc/timer_stats.
hrtimer.c | High-precision timer module.
itimer.c | Interval timer module.
posix-timers.c, posix-cpu-timers.c, posix-clock.c | POSIX timer module and POSIX clock module.
alarmtimer.c | Alarmtimer module.
clocksource.c, jiffies.c | clocksource.c is the generic clocksource driver. The system tick can itself be regarded as a particular clocksource, whose code is in jiffies.c.
timekeeping.c, timekeeping_debug.c | Timekeeping module.
ntp.c | NTP module.
clockevents.c | Clockevent module.
tick-common.c, tick-oneshot.c, tick-sched.c | These three files belong to the tick device layer. tick-common.c is the periodic tick module and manages periodic tick events. tick-oneshot.c serves the high-precision timer and manages high-precision tick events. tick-sched.c is used for the dynamic tick.
tick-broadcast.c, tick-broadcast-hrtimer.c | Broadcast tick module.
sched_clock.c | Generic sched clock module. It mainly provides the sched_clock interface function, which returns the number of nanoseconds between system start and the current point in time. The underlying HW counters are quite diverse. Some platforms provide a 64-bit HW counter; such a platform does not need the generic sched clock module (the CONFIG_GENERIC_SCHED_CLOCK kernel option is not set) and provides sched_clock directly in its own clock source chip driver. The advantage of the generic sched clock module is that it extends the counter to 64 bits even if the underlying HW counter is narrow (on some platforms the HW counter has only 32 bits).
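The idea of extending a narrow hardware counter to a 64-bit nanosecond value, as described for sched_clock.c, can be sketched as follows. This is a simplified illustration and not the kernel's implementation: struct clock_ext and its functions are hypothetical, and the conversion assumes a whole number of nanoseconds per cycle to keep the arithmetic simple.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for extending a narrow counter to 64-bit ns. */
struct clock_ext {
    uint64_t mask;        /* mask matching the counter width */
    uint64_t ns_per_cyc;  /* simplification: integer ns per cycle */
    uint64_t epoch_cyc;   /* raw counter value at the last update */
    uint64_t epoch_ns;    /* nanoseconds accumulated at that point */
};

static void clock_ext_init(struct clock_ext *c, unsigned int bits,
                           uint64_t ns_per_cyc, uint64_t raw_now)
{
    assert(bits >= 1 && bits <= 64 && ns_per_cyc > 0);
    c->mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    c->ns_per_cyc = ns_per_cyc;
    c->epoch_cyc = raw_now & c->mask;
    c->epoch_ns = 0;
}

/* Nanoseconds since init, valid if at most one wrap since the epoch. */
static uint64_t clock_ext_read(const struct clock_ext *c, uint64_t raw_now)
{
    uint64_t delta = (raw_now - c->epoch_cyc) & c->mask;

    return c->epoch_ns + delta * c->ns_per_cyc;
}

/* Move the epoch forward; must run at least once per wrap period. */
static void clock_ext_update(struct clock_ext *c, uint64_t raw_now)
{
    c->epoch_ns = clock_ext_read(c, raw_now);
    c->epoch_cyc = raw_now & c->mask;
}
```

As long as clock_ext_update runs more often than the raw counter wraps, the accumulated nanosecond value keeps growing monotonically even though the hardware value repeatedly falls back to zero.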
2. Kernel configuration for the generic clock source and clock event
(1) CONFIG_GENERIC_CLOCKEVENTS and CONFIG_GENERIC_CLOCKEVENTS_BUILD: select the new time subsystem architecture. If they are not set, the old time subsystem architecture described in Section II is used.
(2) There used to be a CONFIG_GENERIC_TIME option for the clocksource, but it was removed in some kernel version. The current kernel therefore always uses the generic clocksource module and can no longer go back to the era of arch-specific clocksources. To stay compatible with the old-style timekeeping interface, the kernel still provides the CONFIG_ARCH_USES_GETTIMEOFFSET option. The lesson is that when a software framework evolves and the component in question is a basic one used by other modules, it cannot simply be torn down and rebuilt. Compatibility with old software must be considered; it is a heavy burden, but it has to be carried.
3. Tick device configuration
If you select the software architecture of the new time subsystem (CONFIG_GENERIC_CLOCKEVENTS is set), the kernel exposes the configuration options of the timers subsystem, which mainly concern the tick and high-precision timers. There are three tick-related choices:
(1) Keep the periodic tick at all times, even when the system is idle. This corresponds to the CONFIG_HZ_PERIODIC option.
(2) Stop the periodic tick when the system is idle. The corresponding option is CONFIG_NO_HZ_IDLE. Configuring a tickless idle system also enables the NO_HZ_COMMON option.
(3) Full dynticks system. The tick may be stopped even in a non-idle state, that is, while the CPU is running a task. This option is relevant for real-time applications. The corresponding option is CONFIG_NO_HZ_FULL. Configuring a full dynticks system also enables the NO_HZ_COMMON option. This article does not describe the full dynticks system; interested readers can study it on their own.
Only one of the three options above can be selected. This is the configuration method of newer kernels. In older kernels, CONFIG_NO_HZ was used to configure the dynamic tick, also called a tickless idle system (periodic tick when not idle; in the idle state the timer interrupt is no longer triggered periodically, only as needed). Newer kernels still support this option for compatibility with old systems.
4. Timer module configuration
The configuration of high-precision timers is relatively simple: there is only one option, CONFIG_HIGH_RES_TIMERS. If high-precision timers are configured, or if the NO_HZ_COMMON option is set, CONFIG_TICK_ONESHOT must also be set to indicate that the system supports one-shot tick devices.
5. How to configure the time subsystem in the kernel
As described in the previous sections, the Linux kernel can use one of the following two time subsystem architectures:
(1) The new generic time subsystem software framework (CONFIG_GENERIC_CLOCKEVENTS is set)
(2) The traditional time subsystem software framework (CONFIG_GENERIC_CLOCKEVENTS is not set, CONFIG_ARCH_USES_GETTIMEOFFSET is set)
For us engineers, unless you are maintaining an old system, you will of course use the new generic time subsystem software framework. The possible configurations are then:
(1) Low-precision timer and periodic tick. Traditional Linux users should be fond of this configuration, which stays consistent with traditional UNIX.
(2) Low-precision timer and dynamic tick
(3) High-precision timer and periodic tick
(4) High-precision timer and dynamic tick. Trendy Linux users should like this configuration; in a word, cool.
Note: this article mainly describes the normal dynamic tick system (tickless idle system). A separate article will describe the full dynamic tick system.
IV. Data flow and control flow of the time subsystem
1. Low-precision timer + periodic tick
We first look at how the periodic tick is implemented. The starting point is the underlying clock source chip driver, which calls an interface function that registers a clock event device (clockevents_config_and_register or clockevents_register_device). Once a clock event device is added, the tick device layer above must be notified, because the newly registered device may be a better fit for some tick device (this is done by calling tick_check_new_device). If the clock event device is adopted by a tick device (either the tick device has no clock event device yet, or the new clock event device suits the tick device better), the tick device is configured (see tick_setup_device). Given the current system configuration (periodic tick), tick_setup_periodic is called, which sets the event handler of the tick device's clock event device to tick_handle_periodic. The underlying hardware then interrupts periodically, and each interrupt calls tick_handle_periodic to drive the whole system. Note that even if CONFIG_NO_HZ and CONFIG_TICK_ONESHOT are set, the whole system still runs in periodic tick mode when no one-shot capable clock event device is available.
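The registration flow just described can be modelled with a short sketch. The structures and functions below are simplified, hypothetical stand-ins (a single CPU, a rating field as the only selection criterion) rather than the kernel's actual definitions; the comments note which kernel function each one loosely corresponds to.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical clock event device. */
struct clock_event_dev {
    const char *name;
    int rating;                                   /* higher is better */
    void (*event_handler)(struct clock_event_dev *dev);
};

/* Simplified tick device: one per CPU; only one CPU is modelled. */
struct tick_dev {
    struct clock_event_dev *evtdev;   /* device driving this tick */
    unsigned long ticks;              /* stands in for per-tick work */
};

static struct tick_dev cpu_tick;

/* Plays the role of tick_handle_periodic in this sketch. */
static void handle_periodic(struct clock_event_dev *dev)
{
    (void)dev;
    cpu_tick.ticks++;
}

/* Loosely models tick_check_new_device plus tick_setup_device. */
static int check_new_device(struct clock_event_dev *newdev)
{
    struct clock_event_dev *cur = cpu_tick.evtdev;

    /* Keep the current device unless the new one is better. */
    if (cur != NULL && cur->rating >= newdev->rating)
        return 0;
    if (cur != NULL)
        cur->event_handler = NULL;     /* release the old device */
    cpu_tick.evtdev = newdev;
    newdev->event_handler = handle_periodic;
    return 1;
}

/* Loosely models clockevents_register_device. */
static int register_device(struct clock_event_dev *dev)
{
    assert(dev != NULL);
    return check_new_device(dev);
}

/* What a chip driver's interrupt handler would do on each interrupt. */
static void timer_interrupt(struct clock_event_dev *dev)
{
    if (dev->event_handler != NULL)
        dev->event_handler(dev);
}
```

Registering devices with ratings 100, 300 and 200 in that order leaves the tick device bound to the one rated 300; interrupts from the other two no longer reach the tick handler.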
Now for the low-precision timer module. Even when high-precision timers are not configured, the kernel still compiles the high-precision timer module into the kernel image, as the Makefile shows:
obj-y += time.o timer.o hrtimer.o itimer.o posix-timers.o posix-cpu-timers.o
The high-precision timer code is always compiled into the final kernel. In this configuration, kernel modules can still call the interface functions of the high-precision timer module, but the module runs in low-precision mode. In other words, although the hrtimers are still organized in the red-black tree of the high-precision timer, the system only calls hrtimer_run_queues when each periodic tick arrives to check whether any hrtimer has expired. A high-precision timer used this way clearly offers no precision benefit.
Because the periodic tick is present, low-precision timers work effortlessly, just as they always have.
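The effect of checking hrtimers only on tick boundaries can be seen in a small sketch. It uses a sorted array as a stand-in for the red-black tree, and the names soft_hrtimer, run_queues and lowres_fire_time are hypothetical; ticks are assumed to occur at whole multiples of the tick period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer entry; the array must be sorted by expiry. */
struct soft_hrtimer {
    uint64_t expires_ns;
    int fired;
};

/*
 * Called once per periodic tick: fire every timer whose expiry is not
 * later than now. Returns the number of timers fired by this call.
 */
static size_t run_queues(struct soft_hrtimer *timers, size_t n,
                         uint64_t now_ns)
{
    size_t fired = 0;

    for (size_t i = 0; i < n; i++) {
        if (timers[i].expires_ns > now_ns)
            break;                      /* the rest expire later */
        if (!timers[i].fired) {
            timers[i].fired = 1;
            fired++;
        }
    }
    return fired;
}

/* The time at which a timer really fires when checked only on ticks. */
static uint64_t lowres_fire_time(uint64_t expires_ns, uint64_t tick_ns)
{
    assert(tick_ns > 0);
    return (expires_ns + tick_ns - 1) / tick_ns * tick_ns;
}
```

With a 10 ms tick, a timer requested for 3.2 ms is only noticed at the 10 ms tick, so its effective resolution is that of the tick regardless of how precisely the expiry was specified.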
2. Low-precision timer + dynamic tick
When the system starts, it does not enter dynamic tick mode directly but goes through a switching process. At first the system runs in periodic tick mode, and the event handler of each CPU's tick device (clock event device) is tick_handle_periodic. In the timer softirq context, tick_check_oneshot_change is called to check whether a switch to one-shot mode is possible. If the system has a clock event device that supports one-shot mode and high-precision timers are not configured, the tick mode is switched (by calling tick_nohz_switch_to_nohz): the tick device switches to one-shot mode and its event handler is set to tick_nohz_handler. Because the clock event device now works in one-shot mode, the event handler must reprogram the clock event device during normal operation so that ticks keep being generated. When the CPU runs the idle process, the clock event device is no longer reprogrammed for the next tick, so the periodic tick stops.
High-precision timers and low-precision timers work on the same principle as in the previous configuration.
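The reprogramming behaviour of a one-shot tick handler can be sketched as follows. This is a deliberately simplified model with hypothetical names (nohz_tick, nohz_handler, NO_EVENT): it folds the idle decision into the handler and, when idle, programs the device for the next pending timer if there is one, which is an assumption made for illustration rather than a description of the kernel's exact code path.

```c
#include <assert.h>
#include <stdint.h>

#define NO_EVENT UINT64_MAX   /* device left unprogrammed */

/* Hypothetical one-shot clock event device: fires once at next_event_ns. */
struct oneshot_dev {
    uint64_t next_event_ns;
};

struct nohz_tick {
    struct oneshot_dev dev;
    uint64_t period_ns;       /* tick period while the CPU is busy */
    uint64_t ticks;           /* number of ticks handled */
};

/*
 * One-shot event handler: account for the tick, then decide what to
 * program next. A busy CPU gets the next periodic tick. An idle CPU
 * is woken only for the next pending timer, or not at all.
 */
static void nohz_handler(struct nohz_tick *t, uint64_t now_ns,
                         int cpu_idle, uint64_t next_timer_ns)
{
    assert(t->period_ns > 0);
    t->ticks++;
    if (cpu_idle)
        t->dev.next_event_ns = next_timer_ns;   /* may be NO_EVENT */
    else
        t->dev.next_event_ns = now_ns + t->period_ns;
}
```

While the CPU is busy the handler keeps re-arming the device one period ahead; once the CPU is idle and no timer is pending, the device is left unprogrammed and no further tick interrupts occur.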
3. High-precision timer + dynamic tick
Again, the system does not enter dynamic tick mode directly but goes through a switching process. At startup the system runs in periodic tick mode and the event handler is tick_handle_periodic. In the softirq context of the periodic tick (see run_timer_softirq), if the conditions are met, hrtimer_switch_to_hres is called to switch hrtimer from low-precision mode to high-precision mode. The system then performs the following actions:
(1) The clock event device of the tick device switches to one-shot mode (see the tick_init_highres function)
(2) The event handler of the tick device's clock event device is changed to hrtimer_interrupt (see the tick_init_highres function)
(3) The sched timer is set up (that is, the high-precision timer that simulates the periodic tick; see the tick_setup_sched_timer function)
Thus, when the next tick arrives, the system calls hrtimer_interrupt to handle it (the tick is now produced by the sched timer).
In dynamic tick mode, the tick device of each CPU works in one-shot mode, and so does the clock event device behind it, so the hardware timer interrupt is no longer generated periodically. But many modules in the Linux kernel depend on a periodic tick, so the system uses an hrtimer to simulate one. This high-precision timer is initialized when the system switches to dynamic tick mode, and its callback function is tick_sched_timer. The callback does roughly what the event handler does for a periodic tick, and at the end it reprograms the high-precision timer so that the clock event keeps being generated periodically. When the system enters idle, it stops this high-precision timer, so that the CPU can stay in the idle state as long as there are no pending events, which reduces power consumption.
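The way a one-shot timer can simulate a periodic tick by re-arming itself in its own callback can be sketched like this. The names sched_timer, timer_forward and sched_timer_fire are hypothetical, and counting one tick per elapsed period is a simplification used to stand in for the per-tick work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a timer that emulates the periodic tick. */
struct sched_timer {
    uint64_t expires_ns;   /* next expiry */
    uint64_t period_ns;    /* emulated tick period */
    uint64_t ticks;        /* emulated ticks delivered so far */
    int active;            /* cleared when the CPU goes idle */
};

/*
 * Push the expiry forward in whole periods until it lies after now.
 * Returns how many periods have elapsed (0 if not yet expired).
 */
static uint64_t timer_forward(struct sched_timer *t, uint64_t now_ns)
{
    uint64_t periods;

    assert(t->period_ns > 0);
    if (now_ns < t->expires_ns)
        return 0;
    periods = (now_ns - t->expires_ns) / t->period_ns + 1;
    t->expires_ns += periods * t->period_ns;
    return periods;
}

/*
 * Timer callback: do the per-tick work (here just counting) and re-arm
 * the timer, so a one-shot device keeps producing periodic ticks.
 */
static void sched_timer_fire(struct sched_timer *t, uint64_t now_ns)
{
    if (!t->active)
        return;            /* stopped while idle: no tick emulation */
    t->ticks += timer_forward(t, now_ns);
}
```

If the callback runs late, for example at 35 ms for a timer that expired at 10 ms with a 10 ms period, three periods are accounted for and the next expiry is set to 40 ms, so the emulated tick stays aligned to its original phase.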
4. High-precision timer + periodic tick
This configuration is rare. It mostly occurs because the hardware does not provide a one-shot capable clock event device, in which case the whole system still runs in periodic tick mode.
Linux time Subsystem (ii) software architecture