Linux Time Subsystem, Part 8: The Dynamic Clock Framework (CONFIG_NO_HZ, tickless)


The discussion in the previous sections rested on one assumption: clock events in Linux are delivered by a periodic clock. Whether the clock_event_device works in periodic mode or one-shot mode, and whether the timer system runs in low-resolution or high-resolution mode, the kernel does its best to produce periodic tick events in one way or another. These tick events drive global time management (jiffies and time updates), per-CPU process statistics, the timer-wheel framework, and so on. The periodic clock is simple and effective, but it has drawbacks, especially for power consumption: even when the system has nothing to do, the clock event must still fire periodically and wake the system. To address this, kernel developers introduced the dynamic clock, which is enabled with the kernel configuration option CONFIG_NO_HZ. The feature is sometimes called tickless, but dynamic clock is the more accurate name, because tick events do not disappear entirely. Instead, the periodic clock is stopped only while the system is idle with nothing to do, which reduces power consumption; as long as some process is active, clock events are still issued periodically.

/*****************************************************************************************************/
Statement: This article is original content from http://blog.csdn.net/droidphone. Please credit the source when reproducing it. Thank you!
/*****************************************************************************************************/

Before the dynamic clock can work, the system must first switch to dynamic clock mode, and that switch has prerequisites. The main one is that the CPU's clock event device must support one-shot mode. Once the conditions are met, the system switches to dynamic clock mode. From then on, the idle process decides whether the periodic clock can be stopped, and the periodic clock must be restored when the idle process is exited.

1. Data structure

The previous chapter mentioned that, after switching to high-resolution mode, the timer system uses a high-resolution timer to emulate the traditional periodic clock, and that this relies on some fields of the tick_sched structure. In fact, tick_sched is also the key data structure for implementing the dynamic clock. On SMP systems the kernel defines one tick_sched structure per CPU through the per-CPU global variable tick_cpu_sched, which is defined in kernel/time/tick-sched.c:

    /*
     * Per cpu nohz control structure
     */
    static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);

The tick_sched structure itself is defined in include/linux/tick.h. Its full definition is:

    struct tick_sched {
        struct hrtimer sched_timer;
        unsigned long check_clocks;
        enum tick_nohz_mode nohz_mode;
        ktime_t idle_tick;
        int inidle;
        int tick_stopped;
        unsigned long idle_jiffies;
        unsigned long idle_calls;
        unsigned long idle_sleeps;
        int idle_active;
        ktime_t idle_entrytime;
        ktime_t idle_waketime;
        ktime_t idle_exittime;
        ktime_t idle_sleeptime;
        ktime_t iowait_sleeptime;
        ktime_t sleep_length;
        unsigned long last_jiffies;
        unsigned long next_jiffies;
        ktime_t idle_expires;
        int do_timer_last;
    };

sched_timer: The hrtimer used to emulate the periodic clock in high-resolution mode; see Linux Time Subsystem, Part 6: Principle and Implementation of the High-Resolution Timer (hrtimer).

check_clocks: Used to implement the asynchronous notification mechanism for clock_event_device and clocksource changes, which helps the system switch to high-resolution mode or dynamic clock mode.

nohz_mode: Stores the operating mode of the dynamic clock. The implementation differs slightly between low-resolution and high-resolution mode, and the field takes one of the following values:

    • NOHZ_MODE_INACTIVE: the dynamic clock has not been activated
    • NOHZ_MODE_LOWRES: the dynamic clock is running in low-resolution mode
    • NOHZ_MODE_HIGHRES: the dynamic clock is running in high-resolution mode

idle_tick: Holds the kernel time at which the periodic clock was stopped. When the periodic clock is restored on exit from idle, this time is needed to keep the system timeline (jiffies) correct.

tick_stopped: Indicates that the periodic clock has been stopped in the idle state.

idle_jiffies: The jiffies value when the system entered idle, used for statistics.

idle_calls: The number of times the system has entered idle.

idle_sleeps: The number of times the system has entered idle and successfully stopped the periodic clock.

idle_active: Indicates whether the system is currently in the idle state.

idle_entrytime: The time at which the system entered idle.

idle_waketime: The time at which the idle state was interrupted.

idle_exittime: The time at which the system left idle.

idle_sleeptime: The accumulated time for which the periodic clock has been stopped across idle periods.

sleep_length: The length of time the periodic clock is stopped in the current idle period.

last_jiffies: The jiffies value at the last periodic tick.

next_jiffies: The jiffies value at which the next periodic tick is expected.

idle_expires: The expiry time of the first timer due after entering idle.
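
To make the relationship between the counters concrete, here is a minimal user-space sketch. It is not kernel code: struct toy_tick_sched, toy_idle_enter and toy_idle_exit are hypothetical names invented for illustration, and the model keeps only three of the fields described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a few fields of struct tick_sched (illustration only). */
struct toy_tick_sched {
    bool tick_stopped;          /* periodic clock currently stopped */
    unsigned long idle_calls;   /* times idle was entered */
    unsigned long idle_sleeps;  /* times idle was entered and the tick was stopped */
};

/* Hypothetical helper: account for one entry into idle. */
static void toy_idle_enter(struct toy_tick_sched *ts, bool can_stop_tick)
{
    assert(ts != NULL);
    ts->idle_calls++;
    if (can_stop_tick) {
        ts->idle_sleeps++;
        ts->tick_stopped = true;
    }
}

/* Hypothetical helper: leaving idle restores the periodic clock. */
static void toy_idle_exit(struct toy_tick_sched *ts)
{
    assert(ts != NULL);
    ts->tick_stopped = false;
}
```

In this model idle_sleeps can never exceed idle_calls, and the ratio between the two gives a rough idea of how often an idle entry actually managed to stop the tick.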

As we know, the way the system provides the periodic clock (tick) depends on its current operating mode: in low-resolution mode the CPU's tick_device provides it, while in high-resolution mode a high-resolution timer provides it. The dynamic clock implementation is discussed separately for the two modes below.

2. Dynamic clock at low resolution

Recall the tick_device section of Linux Time Subsystem, Part 4: Timer Engine: clock_event_device. Whether the tick_device works in periodic mode or one-shot mode, the event callback of its associated clock_event_device is tick_handle_periodic, which delivers periodic tick events at exactly HZ per second regardless of whether the CPU is idle. That does not meet the needs of the dynamic clock, so the system must first switch to an operating mode that supports it: NOHZ_MODE_LOWRES.

2.1 Switching to Dynamic clock mode

The first half of the switch to dynamic clock mode follows the same path as the switch to high-resolution timer mode; see Linux Time Subsystem, Part 6: Principle and Implementation of the High-Resolution Timer (hrtimer). In brief: the system starts in periodic clock mode and receives periodic tick interrupts. Each tick interrupt raises the timer softirq TIMER_SOFTIRQ, whose handler run_timer_softirq calls hrtimer_run_pending:

    void hrtimer_run_pending(void)
    {
        if (hrtimer_hres_active())
            return;
        ......
        if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
            hrtimer_switch_to_hres();
    }

The argument passed to tick_check_oneshot_change determines whether the system switches to low-resolution dynamic clock mode or to high-resolution timer mode. Assume for now that the system does not support high-resolution timer mode: hrtimer_is_hres_enabled returns false, so the argument to tick_check_oneshot_change is true, which indicates that the switch should be to dynamic clock mode. After tick_check_oneshot_change has verified that both the timekeeper and the clock_event_device meet the conditions for the dynamic clock, it calls tick_nohz_switch_to_nohz to perform the switch.
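
The decision described above can be condensed into a small user-space sketch. This is a simplified model under stated assumptions, not the kernel's tick_check_oneshot_change: the enum, the function name toy_check_oneshot_change and its three boolean parameters are all hypothetical, and the real function performs additional checks that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_switch_target {
    TOY_STAY_PERIODIC,   /* prerequisites not met, keep the periodic tick */
    TOY_SWITCH_NOHZ,     /* switch to the low-resolution dynamic clock */
    TOY_SWITCH_HRES      /* switch to high-resolution timer mode */
};

static enum toy_switch_target
toy_check_oneshot_change(bool hres_enabled, bool timekeeping_ok,
                         bool oneshot_capable)
{
    /* Mirrors the argument !hrtimer_is_hres_enabled() in hrtimer_run_pending(). */
    bool allow_nohz = !hres_enabled;

    /* Both the timekeeper and the clock event device must qualify. */
    if (!timekeeping_ok || !oneshot_capable)
        return TOY_STAY_PERIODIC;

    return allow_nohz ? TOY_SWITCH_NOHZ : TOY_SWITCH_HRES;
}
```

For example, a system without high-resolution support whose clock event device supports one-shot mode ends up in TOY_SWITCH_NOHZ, which corresponds to the case the text goes on to discuss.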

First, the function calls tick_switch_to_oneshot to put the tick_device into one-shot mode and to replace its interrupt event callback with tick_nohz_handler. It then sets the nohz_mode field of the tick_sched structure to NOHZ_MODE_LOWRES:

    static void tick_nohz_switch_to_nohz(void)
    {
        struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
        ktime_t next;

        if (!tick_nohz_enabled)
            return;

        local_irq_disable();
        if (tick_switch_to_oneshot(tick_nohz_handler)) {
            local_irq_enable();
            return;
        }

        ts->nohz_mode = NOHZ_MODE_LOWRES;

Next, it initializes the sched_timer timer in the tick_sched structure, then calls tick_init_jiffy_update to obtain the time of the next tick event and to initialize the global variable last_jiffies_update, so that the jiffies count can be updated correctly later. Finally, the time of the next tick event is programmed into the tick_device, which completes the switch to the low-resolution dynamic clock.

        hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
        /* Get the next period */
        next = tick_init_jiffy_update();

        for (;;) {
            hrtimer_set_expires(&ts->sched_timer, next);
            if (!tick_program_event(next, 0))
                break;
            next = ktime_add(next, tick_period);
        }
        local_irq_enable();
    }

The code above raises a question: the system has not switched to high-resolution mode, so why initialize the high-resolution timer in the tick_sched structure? The reason is not its timing capability. The timer is there so that the hrtimer_forward function from the hrtimer code can be reused to calculate the time of the next tick event.
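
The for loop at the end of tick_nohz_switch_to_nohz keeps pushing the expiry forward by one tick period until programming the device succeeds. The following user-space sketch models that retry pattern with a fake clock. Everything in it is an assumption made for illustration: the toy_ names, the 10 ms tick, and the rule that programming fails when the expiry is not in the future.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TICK_NS 10000000LL   /* assumed 10 ms tick period (HZ = 100) */

/* Fake clock: every read advances time by step_ns to model code latency. */
struct toy_clock {
    int64_t now_ns;
    int64_t step_ns;
};

static int64_t toy_ktime_get(struct toy_clock *clk)
{
    clk->now_ns += clk->step_ns;
    return clk->now_ns;
}

/* Hypothetical stand-in for tick_program_event(): rejects past expiries. */
static int toy_program_event(struct toy_clock *clk, int64_t expires_ns)
{
    return expires_ns <= toy_ktime_get(clk) ? -1 : 0;
}

/* Push the expiry forward one tick at a time until programming succeeds. */
static int64_t toy_program_next_tick(struct toy_clock *clk, int64_t next_ns)
{
    /* If the clock advanced a full tick per attempt, the loop could not catch up. */
    assert(clk->step_ns < TOY_TICK_NS);
    for (;;) {
        if (!toy_program_event(clk, next_ns))
            break;
        next_ns += TOY_TICK_NS;
    }
    return next_ns;
}
```

Starting at 25 ms with a stale expiry of 10 ms, the loop rejects 10 ms and 20 ms and settles on 30 ms, the first tick boundary that is still in the future when the device is programmed.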

2.2 The event handler under the low-resolution dynamic clock

As described in the previous section, the switch to low-resolution dynamic clock mode sets the tick_device event handler to tick_nohz_handler. Broadly, it does the same work as tick_handle_periodic, the event handler of periodic clock mode: it updates the time and the jiffies count, calls update_process_times to update process accounting, raises the timer softirq, and finally reprograms the tick_device so that the handler fires again at the next correct tick time:

    static void tick_nohz_handler(struct clock_event_device *dev)
    {
        ......
        dev->next_event.tv64 = KTIME_MAX;

        if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
            tick_do_timer_cpu = cpu;

        /* Check, if the jiffies need an update */
        if (tick_do_timer_cpu == cpu)
            tick_do_update_jiffies64(now);
        ......
        if (ts->tick_stopped) {
            touch_softlockup_watchdog();
            ts->idle_jiffies++;
        }

        update_process_times(user_mode(regs));
        profile_tick(CPU_PROFILING);

        while (tick_nohz_reprogram(ts, now)) {
            now = ktime_get();
            tick_do_update_jiffies64(now);
        }
    }
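
The first few lines of the handler deal with which CPU is responsible for advancing jiffies. As far as I understand the kernel code, tick_do_timer_cpu is set to TICK_DO_TIMER_NONE when the CPU in charge stops its tick, and the next CPU that takes a tick picks the duty up. The sketch below is a user-space toy of that handoff only: the toy_ names are hypothetical, and the real update depends on elapsed time rather than adding one per call.

```c
#include <assert.h>

#define TOY_DO_TIMER_NONE (-1)   /* no CPU currently owns the jiffies update */

static int toy_do_timer_cpu = TOY_DO_TIMER_NONE;
static unsigned long toy_jiffies;

/* Called from each CPU's tick; only the owning CPU advances jiffies. */
static void toy_tick(int cpu)
{
    assert(cpu >= 0);

    /* The duty was dropped by its previous owner: take it over. */
    if (toy_do_timer_cpu == TOY_DO_TIMER_NONE)
        toy_do_timer_cpu = cpu;

    /* Ticks on other CPUs leave the global counter alone. */
    if (toy_do_timer_cpu == cpu)
        toy_jiffies++;
}
```

With this arrangement exactly one CPU at a time keeps the global time moving, while the others are free to stop their ticks without affecting jiffies.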

Because the system now runs in dynamic clock mode, the tick may have been stopped for several tick periods in the idle process, so when this function fires again, more than one tick period may have passed since the last time it ran. tick_nohz_reprogram must handle this correctly when programming the tick_device, and it does so with the hrtimer_forward function mentioned earlier:

    static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now)
    {
        hrtimer_forward(&ts->sched_timer, now, tick_period);
        return tick_program_event(hrtimer_get_expires(&ts->sched_timer), 0);
    }
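
The arithmetic that hrtimer_forward contributes here can be sketched in user space as follows. This is a simplified model of the idea, not the kernel function: toy_forward is a hypothetical name, times are plain nanosecond integers instead of ktime_t, and the kernel's handling of timer resolution is omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance *expires_ns by whole multiples of interval_ns until it lies
 * strictly after now_ns. Returns the number of intervals added, or 0 if
 * the expiry was already in the future and nothing was changed.
 */
static uint64_t toy_forward(int64_t *expires_ns, int64_t now_ns,
                            int64_t interval_ns)
{
    int64_t delta;
    uint64_t overruns;

    assert(interval_ns > 0);

    delta = now_ns - *expires_ns;
    if (delta < 0)
        return 0;   /* expiry is still ahead of now */

    /* Skip every interval that already elapsed, plus one to get past now. */
    overruns = (uint64_t)(delta / interval_ns) + 1;
    *expires_ns += interval_ns * (int64_t)overruns;
    return overruns;
}
```

Because the expiry only ever moves by whole intervals, the next tick stays on the same grid as the ticks before the stop, no matter how many periods were skipped while the CPU was idle.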

2.3 Dynamic clock: stopping the periodic tick event

Once dynamic clock mode is enabled, the periodic clock is stopped and restarted by the idle process. The idle process is essentially a loop. At the start of each iteration it calls tick_nohz_idle_enter to check whether the periodic clock may be stopped for a while, and then it enters the low-power idle state. When an interrupt brings the CPU out of the low-power idle state, the loop checks whether a process has been woken and rescheduling is needed. If so, it re-enables the periodic clock through tick_nohz_idle_exit and reschedules, then waits for the next idle period. The following figure illustrates this:

Figure 2.3.1 Dynamic clock processing in the idle process

The periodic clock is stopped in tick_nohz_idle_enter, which hands the main work to tick_nohz_stop_sched_tick. The kernel does not stop the periodic clock on every call to tick_nohz_stop_sched_tick, so when does it stop it? Consider the situation: the idle process is running, so every other process in the system is waiting for some event and the system has nothing to do. The only thing left to handle is interrupts. We cannot predict when interrupts will arrive, with the exception of timer interrupts: we do know the expiry time of the first timer due to expire. No periodic clock is needed before that time, so we can work out how many ticks the periodic clock can be stopped for and reprogram the tick_device so that no tick is generated until the earliest timer expires.

tick_nohz_stop_sched_tick applies some further restrictions. When the expiry time of the next timer is only 1 jiffy away from the current jiffies value, the periodic clock is not stopped. When the distance between the timer's expiry and the current jiffies value exceeds the maximum idle time the timekeeper allows, the next tick is set to that maximum idle time instead. This prevents the system time in the timekeeper from going without an update for too long, which could cause the clocksource to overflow. The body of tick_nohz_stop_sched_tick looks long, but it implements exactly this logic, so its code is not reproduced here; interested readers can read it in kernel/time/tick-sched.c.
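
The two restrictions can be captured in a few lines. The sketch below is a user-space illustration of the rules as described above, not an excerpt of tick_nohz_stop_sched_tick: toy_ticks_to_stop and its parameters are hypothetical, all values are counted in ticks, jiffies wraparound is ignored, and the handling of an already expired timer is my own assumption.

```c
#include <assert.h>

/*
 * Return the number of ticks the periodic clock may be stopped for,
 * or 0 if it should keep running. max_idle_ticks models the longest
 * idle period the timekeeper allows.
 */
static unsigned long toy_ticks_to_stop(unsigned long now_jiffies,
                                       unsigned long next_timer_jiffies,
                                       unsigned long max_idle_ticks)
{
    unsigned long delta;

    assert(max_idle_ticks > 0);

    /* Assumption: a timer that is already due keeps the tick running. */
    if (next_timer_jiffies <= now_jiffies)
        return 0;

    delta = next_timer_jiffies - now_jiffies;

    /* The next timer is a single tick away: stopping gains nothing. */
    if (delta == 1)
        return 0;

    /* Never sleep longer than the timekeeper can go without an update. */
    if (delta > max_idle_ticks)
        delta = max_idle_ticks;

    return delta;
}
```

With the current jiffies at 100 and a limit of 500 ticks, a timer due at 150 allows a stop of 50 ticks, while a timer due at 10000 is clamped to 500.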

Looking at how the dynamic clock is stopped and at the implementation of tick_nohz_handler, one case is still unhandled. Suppose the system enters the idle process and the periodic clock is stopped for several tick periods. When those tick periods have elapsed, a tick event occurs, tick_nohz_handler runs, and the first expired timer is processed. At the end of tick_nohz_handler, however, the tick_device is programmed to fire one tick period later. If processing the timer did not wake any process, we would expect the stop time to be recalculated from the next pending timer rather than from the next tick. But tick_nohz_handler simply sets the tick_device expiry to the next tick, which effectively restores the periodic clock, and that is clearly not what we want. The kernel handles this with a small trick. Timers are executed in softirq context, so the kernel adds a short piece of code to irq_exit, after softirq processing, in kernel/softirq.c:

    void irq_exit(void)
    {
        ......
        if (!in_interrupt() && local_softirq_pending())
            invoke_softirq();

    #ifdef CONFIG_NO_HZ
        /* Make sure that timer wheel updates are propagated */
        if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
            tick_nohz_irq_exit();
    #endif
        ......
    }

The key call is tick_nohz_irq_exit:

    void tick_nohz_irq_exit(void)
    {
        struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

        if (!ts->inidle)
            return;

        tick_nohz_stop_sched_tick(ts);
    }

tick_nohz_irq_exit calls tick_nohz_stop_sched_tick again, which gives the system another opportunity to stop the periodic clock for several tick periods.
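
Taken together, the checks in irq_exit and tick_nohz_irq_exit amount to a single condition. The sketch below merely restates that condition as a user-space predicate; struct toy_cpu_state and toy_should_restop_tick are hypothetical names, and each flag stands in for the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cpu_state {
    bool idle_task_running;  /* models idle_cpu(smp_processor_id()) */
    bool in_interrupt;       /* models in_interrupt() */
    bool need_resched;       /* models need_resched() */
    bool inidle;             /* models ts->inidle */
};

/* True when interrupt exit should try to stop the periodic clock again. */
static bool toy_should_restop_tick(const struct toy_cpu_state *s)
{
    assert(s != NULL);
    return s->idle_task_running && !s->in_interrupt &&
           !s->need_resched && s->inidle;
}
```

If the timer that just ran woke a process, need_resched is set and the predicate is false, so the tick is left running for the scheduler; otherwise the CPU gets another chance to stop it.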

2.4 Dynamic clock: restarting the periodic tick event

Returning to Figure 2.3.1: while the periodic clock is stopped in the idle process, a process is woken at some point. Before it is scheduled, tick_nohz_idle_exit is called, and this function is responsible for restoring the stopped periodic clock. tick_nohz_idle_exit eventually calls tick_nohz_restart, which does the actual work. The function is not complicated. It first sets the expiry of the sched_timer timer in the tick_sched structure to the time at which the periodic clock was last stopped, then uses hrtimer_forward to move that expiry to the first tick after the current time. In high-resolution mode it starts the timer; in low-resolution mode it programs the tick_device with that time. Because the current time may sit right on a tick boundary and may have just passed the computed expiry, the function runs in a while loop: if arming fails, it rereads the time, updates jiffies through tick_do_update_jiffies64, and tries again:

    static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
    {
        hrtimer_cancel(&ts->sched_timer);
        hrtimer_set_expires(&ts->sched_timer, ts->idle_tick);

        while (1) {
            /* Forward the time to expire in the future */
            hrtimer_forward(&ts->sched_timer, now, tick_period);

            if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
                hrtimer_start_expires(&ts->sched_timer,
                                      HRTIMER_MODE_ABS_PINNED);
                /* Check, if the timer was already in the past */
                if (hrtimer_active(&ts->sched_timer))
                    break;
            } else {
                if (!tick_program_event(
                        hrtimer_get_expires(&ts->sched_timer), 0))
                    break;
            }
            /* Reread time and update jiffies */
            now = ktime_get();
            tick_do_update_jiffies64(now);
        }
    }
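
Why restart from idle_tick instead of simply using the current time plus one period? A small worked example makes the reason visible. The function below is a hypothetical user-space sketch with times in nanoseconds, not kernel code; it computes the expiry that the restart would arm, under the assumption that the saved idle_tick lies on the original tick grid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the saved tick time from when the periodic clock was stopped,
 * return the first tick boundary after now_ns on the same tick grid.
 */
static int64_t toy_restart_expiry(int64_t idle_tick_ns, int64_t now_ns,
                                  int64_t period_ns)
{
    int64_t elapsed;

    assert(period_ns > 0);

    /* The saved tick has not been reached yet: reuse it unchanged. */
    if (now_ns < idle_tick_ns)
        return idle_tick_ns;

    elapsed = now_ns - idle_tick_ns;
    return idle_tick_ns + (elapsed / period_ns + 1) * period_ns;
}
```

With a 10 ms period, a saved tick at 10 ms and a wake-up at 47.3 ms, the restarted tick is armed for 50 ms rather than 57.3 ms, so tick boundaries stay where the jiffies accounting expects them.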

3. Dynamic clock in high-resolution mode

The main difference between high-resolution mode and low-resolution mode lies in the switching process, that is, in how the system switches to high-resolution mode, which was explained in the previous article. Once the system is in high-resolution mode, stopping and restarting the dynamic clock works much as it does in low-resolution mode: both are controlled by tick_nohz_stop_sched_tick and tick_nohz_restart, and each of these two functions checks which of the two modes is active:

    • NOHZ_MODE_HIGHRES
    • NOHZ_MODE_LOWRES

If the mode is NOHZ_MODE_HIGHRES, the sched_timer timer of the tick_sched structure is programmed; if it is NOHZ_MODE_LOWRES, the tick_device is operated on directly.

4. Impact of the dynamic clock on interrupts

Because of the dynamic clock, the interrupt subsystem has to cooperate on interrupt entry and exit. Consider first an interrupt that arrives while the periodic clock is stopped. Without special handling, an interrupt service routine that reads the jiffies count could see a stale value, because normally jiffies is only brought up to date when the periodic clock is restored. To prevent this, tick_check_idle is called from irq_enter:

    void tick_check_idle(int cpu)
    {
        tick_check_oneshot_broadcast(cpu);
        tick_check_nohz(cpu);
    }

The most important job of tick_check_nohz is to update the jiffies count:

    static inline void tick_check_nohz(int cpu)
    {
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
        ktime_t now;

        if (!ts->idle_active && !ts->tick_stopped)
            return;
        now = ktime_get();
        if (ts->idle_active)
            tick_nohz_stop_idle(cpu, now);
        if (ts->tick_stopped) {
            tick_nohz_update_jiffies(now);
            tick_nohz_kick_tick(cpu, now);
        }
    }
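
The catch-up that brings a stale jiffies value up to date can be modelled in user space as follows. This is a sketch of the idea only: struct toy_jiffies_state and toy_update_jiffies are hypothetical names, times are plain nanoseconds, and the locking the kernel needs around the real update is left out.

```c
#include <assert.h>
#include <stdint.h>

struct toy_jiffies_state {
    uint64_t jiffies;         /* tick counter */
    int64_t last_update_ns;   /* time of the last tick that was accounted */
};

/* Account for every whole tick period that elapsed since the last update. */
static void toy_update_jiffies(struct toy_jiffies_state *st, int64_t now_ns,
                               int64_t period_ns)
{
    int64_t delta, ticks;

    assert(period_ns > 0);

    delta = now_ns - st->last_update_ns;
    if (delta < period_ns)
        return;   /* less than one tick has elapsed: nothing to add */

    ticks = delta / period_ns;
    st->jiffies += (uint64_t)ticks;
    st->last_update_ns += ticks * period_ns;
}
```

An interrupt arriving 35 ms after the last accounted tick, with a 10 ms period, advances the counter by three ticks at once and leaves the remaining 5 ms to be accounted on a later update.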

The other case arises on exit from a timer interrupt, where the state of the periodic clock has to be re-evaluated. This was explained in section 2.3 and is not repeated here.
