Personalized clocks in the Linux kernel: nohz and hres (1)

Source: Internet
Author: User

The developers who design the Linux kernel think things through remarkably well. As I have said before, the character of the Linux kernel is passion: as long as the hardware design is flexible enough, its designers will exploit every bit of freedom it offers, never caring about the consequences, and sometimes they even set aside the hardware vendors' recommendations. The nohz feature of recent kernels is a pioneering piece of work. A computer system needs clock interrupts, just as a human needs a heartbeat. The human heartbeat is periodic, and so is the "heartbeat" of a computer system, which is why clock interrupts occur at fixed intervals.

Is that really true? The Linux kernel designers reasoned that when the CPU is idle, there is no need for a heartbeat. After all, a computer is not a self-organizing system: its energy is supplied by an external power source. Humans are self-organizing entities and must keep a periodic heartbeat to sustain their own energy. As long as the computer's external power keeps flowing and its clock is programmable, the heartbeat can be made non-periodic or even stopped entirely, and the Linux kernel implements exactly this. Before kernel 2.6.21, clock interrupts were always periodic. After that, two new clock abstractions, clock_event_device and clocksource, were introduced, which make it possible to implement personalized clocks much more flexibly; these personalized clocks are nohz and hres. Of course, clock interrupts are still periodic when the system boots. When timer_interrupt runs, it raises the timer softirq, and the softirq handler then looks for a chance to switch to nohz or hres. The code is as follows:

 
 
  void run_local_timers(void)
  {
      hrtimer_run_queues();          // process the high-resolution timer queues first
      raise_softirq(TIMER_SOFTIRQ);  // raise the timer softirq; its handler is run_timer_softirq below
      softlockup_tick();
  }

  // Timer softirq handler
  static void run_timer_softirq(struct softirq_action *h)
  {
      struct tvec_base *base = __get_cpu_var(tvec_bases);

      hrtimer_run_pending();         // this is where we get a chance to switch to nohz or hres
      if (time_after_eq(jiffies, base->timer_jiffies))
          __run_timers(base);
  }

  void hrtimer_run_pending(void)
  {
      struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);

      if (hrtimer_hres_active())     // already in hres mode: no switch is needed
          return;
      // this check decides whether to switch to hres or to nohz
      if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
          hrtimer_switch_to_hres();
      run_hrtimer_pending(cpu_base);
  }

  int tick_check_oneshot_change(int allow_nohz)
  {
      struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

      // the checks below are the preconditions for any switch
      if (!test_and_clear_bit(0, &ts->check_clocks))
          return 0;
      if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
          return 0;
      if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
          return 0;
      if (!allow_nohz)               // hres is enabled: return 1 so the caller switches to hres
          return 1;
      // hres is not enabled but every check passed: switch to at least (low-resolution) nohz mode
      tick_nohz_switch_to_nohz();
      return 0;
  }
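The chain of checks in hrtimer_run_pending and tick_check_oneshot_change boils down to a small decision table. The following is a minimal userspace sketch of that table, not kernel code: the type and function names (switch_inputs, tick_mode_choice, choose_tick_mode) are hypothetical, each boolean stands in for the kernel call named in its comment, and the side effect of test_and_clear_bit (clearing the check_clocks bit) is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one pass through hrtimer_run_pending(). */
enum tick_mode_choice {
    NO_SWITCH,       /* nothing changes on this pass */
    SWITCH_TO_HRES,  /* caller would run hrtimer_switch_to_hres() */
    SWITCH_TO_NOHZ   /* tick_nohz_switch_to_nohz() would run (low-res nohz) */
};

/* Each field models the result of the kernel call named in its comment. */
struct switch_inputs {
    bool hres_active;          /* hrtimer_hres_active() */
    bool hres_enabled;         /* hrtimer_is_hres_enabled(); allow_nohz == !hres_enabled */
    bool check_clocks_pending; /* bit 0 of ts->check_clocks was set */
    bool nohz_inactive;        /* ts->nohz_mode == NOHZ_MODE_INACTIVE */
    bool clocksource_hres_ok;  /* timekeeping_valid_for_hres() */
    bool oneshot_available;    /* tick_is_oneshot_available() */
};

enum tick_mode_choice choose_tick_mode(const struct switch_inputs *in)
{
    assert(in != NULL);
    if (in->hres_active)                 /* already high-resolution */
        return NO_SWITCH;
    if (!in->check_clocks_pending)       /* no new clock device to evaluate */
        return NO_SWITCH;
    if (!in->nohz_inactive)              /* already switched to nohz earlier */
        return NO_SWITCH;
    if (!in->clocksource_hres_ok || !in->oneshot_available)
        return NO_SWITCH;                /* hardware cannot do one-shot timing */
    /* hres wins whenever it is enabled; otherwise fall back to nohz */
    return in->hres_enabled ? SWITCH_TO_HRES : SWITCH_TO_NOHZ;
}
```

With hres enabled and every precondition met, the model picks SWITCH_TO_HRES; with hres disabled it falls back to SWITCH_TO_NOHZ; any failed precondition leaves the periodic tick in place.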

The actual switches to hres mode and to nohz mode are performed by hrtimer_switch_to_hres and tick_nohz_switch_to_nohz respectively; I won't trace that code here. What is the relationship between hres and nohz? hres is not really a periodic interrupt at all but a precisely timed one: it programs the clock with the expiry time of an hrtimer so that the interrupt fires exactly at that moment. nohz only means that the clock may be programmed with non-periodic intervals; it makes no demands on precision.
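To make the contrast concrete, here is a simplified sketch of how the next clock interrupt might be chosen in each mode. It assumes HZ = 1000 (one tick = 1,000,000 ns) and models low-resolution nohz as "skip ticks, but still fire on a tick boundary", so the earliest expiry is rounded up to the next tick. The names SKETCH_TICK_NSEC, next_event_hres and next_event_nohz are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ = 1000, so one tick lasts 1,000,000 ns. */
#define SKETCH_TICK_NSEC 1000000ULL

/* hres: program the device for exactly the earliest hrtimer expiry. */
uint64_t next_event_hres(uint64_t earliest_expiry_ns)
{
    return earliest_expiry_ns;
}

/* Low-resolution nohz (simplified): ticks may be skipped, but the next
 * interrupt still lands on a tick boundary, so round the expiry up to
 * the next multiple of the tick period. */
uint64_t next_event_nohz(uint64_t earliest_expiry_ns)
{
    uint64_t ticks = (earliest_expiry_ns + SKETCH_TICK_NSEC - 1) / SKETCH_TICK_NSEC;
    return ticks * SKETCH_TICK_NSEC;
}
```

For a timer due at 2,500,123 ns, hres fires at exactly that time, while this nohz model fires at 3,000,000 ns, the next tick boundary.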

In hres mode, everything is handled by hrtimers. For example, the periodic scheduling and the accounting of the current process, which used to be done directly in timer_interrupt, each get a dedicated hrtimer in hres mode. All of this work is wrapped into the event_handler of the clock_event_device (event_handler is assigned when switching to hres or nohz). When it runs, this function walks the hrtimers, which are organized in a red-black tree. hrtimers fall into two classes: urgent ones, which must not be run in softirq context, and non-urgent ones, which may be. Expired non-urgent hrtimers are moved onto a linked list whose callbacks are executed later in the softirq, while the urgent ones are executed immediately. In pure nohz mode without hres, the event_handler still works the traditional way; the only difference is that the next interrupt time can be programmed freely. In this way, time measurement can reach nanosecond precision.
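The event_handler's walk over expired hrtimers can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's hrtimer_interrupt: the kernel keeps hrtimers in a red-black tree, while this sketch swaps in a small array kept sorted by expiry, which yields the same earliest-first order. The types and functions (sketch_hrtimer, sketch_queue, queue_add, hres_event_handler) are hypothetical, and "running" a callback is modeled by recording the timer's id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 16

/* Hypothetical simplified hrtimer. */
struct sketch_hrtimer {
    uint64_t expires_ns;
    bool softirq_ok;   /* true: non-urgent, may be deferred to softirq */
    int id;
};

/* Stand-in for the red-black tree: an array sorted by expires_ns. */
struct sketch_queue {
    struct sketch_hrtimer timers[MAX_TIMERS];
    size_t count;
};

/* Insert while keeping the array sorted by expiry. */
bool queue_add(struct sketch_queue *q, struct sketch_hrtimer t)
{
    if (q->count == MAX_TIMERS)
        return false;
    size_t i = q->count;
    while (i > 0 && q->timers[i - 1].expires_ns > t.expires_ns) {
        q->timers[i] = q->timers[i - 1];
        i--;
    }
    q->timers[i] = t;
    q->count++;
    return true;
}

/* Model of the hres event_handler: remove every expired timer, "run"
 * urgent ones at once (ids go to ran_now) and queue non-urgent ones for
 * the softirq (ids go to deferred). Both output arrays must hold
 * MAX_TIMERS entries. Returns the earliest remaining expiry, i.e. the
 * time the clock event device would be reprogrammed for, or UINT64_MAX
 * if no timer is left. */
uint64_t hres_event_handler(struct sketch_queue *q, uint64_t now_ns,
                            int *ran_now, size_t *n_now,
                            int *deferred, size_t *n_deferred)
{
    size_t expired = 0;
    *n_now = 0;
    *n_deferred = 0;
    while (expired < q->count && q->timers[expired].expires_ns <= now_ns) {
        const struct sketch_hrtimer *t = &q->timers[expired];
        if (t->softirq_ok)
            deferred[(*n_deferred)++] = t->id;  /* linked onto the softirq list */
        else
            ran_now[(*n_now)++] = t->id;        /* callback runs immediately */
        expired++;
    }
    /* Drop the expired prefix of the sorted array. */
    for (size_t i = expired; i < q->count; i++)
        q->timers[i - expired] = q->timers[i];
    q->count -= expired;
    return q->count ? q->timers[0].expires_ns : UINT64_MAX;
}
```

Because the handler sees all timers at once, the value it returns is the natural candidate for the next clock interrupt, which is exactly what hres exploits.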

Whenever the CPU runs cpu_idle, the kernel looks for a chance to stop the system heartbeat and to fire the next heartbeat at a suitable moment instead of periodically. What is that moment? If everything is done by hrtimers, it is the expiry time of the earliest-expiring timer. Although the periodic clock is stopped, other hardware interrupts are not, and a hardware interrupt may trigger events such as a reschedule or the arming of a new timer. Therefore, after every hardware interrupt the kernel must re-check the earliest hrtimer expiry and look for a pending reschedule request; if one is pending, it leaves nohz immediately and switches away from the idle task. The following code shows this; irq_enter is called every time a hardware interrupt is handled:

 
 
  void irq_enter(void)
  {
  #ifdef CONFIG_NO_HZ
      int cpu = smp_processor_id();
      if (idle_cpu(cpu) && !in_interrupt())
          tick_nohz_stop_idle(cpu);
  #endif
      __irq_enter();
  #ifdef CONFIG_NO_HZ
      /*
       * Catch up on time. This is done only when the CPU is idle, because
       * the periodic tick may have been stopped while the CPU was idle;
       * the ticks missed during that period must be accounted for here,
       * otherwise that time would be lost.
       */
      if (idle_cpu(cpu))
          tick_nohz_update_jiffies();
  #endif
  }
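The catch-up done by tick_nohz_update_jiffies is essentially one division: how many whole tick periods passed since jiffies was last updated? A minimal sketch, assuming HZ = 1000 and using hypothetical names (sketch_timekeeping, sketch_update_jiffies) in place of the kernel's own state and helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ = 1000, so one tick lasts 1,000,000 ns. */
#define SKETCH_TICK_NSEC 1000000ULL

/* Hypothetical timekeeping state: the jiffies counter plus the time of
 * the last tick that has been accounted for. */
struct sketch_timekeeping {
    uint64_t jiffies;
    uint64_t last_update_ns;
};

/* Advance jiffies by the number of whole ticks that elapsed while the
 * periodic tick was stopped. The leftover fraction of a tick is kept,
 * because last_update_ns only moves forward by whole ticks. Returns the
 * number of ticks added. */
uint64_t sketch_update_jiffies(struct sketch_timekeeping *tk, uint64_t now_ns)
{
    if (now_ns < tk->last_update_ns + SKETCH_TICK_NSEC)
        return 0;                                  /* less than one tick elapsed */
    uint64_t ticks = (now_ns - tk->last_update_ns) / SKETCH_TICK_NSEC;
    tk->jiffies += ticks;
    tk->last_update_ns += ticks * SKETCH_TICK_NSEC;
    return ticks;
}
```

For example, if jiffies was 100 at 5,000,000 ns and an interrupt arrives at 9,500,000 ns, four ticks are added and the extra 500,000 ns are carried over to the next update.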

In nohz mode, interrupts are still "almost" periodic. Literally, nohz means non-periodic, but it still keeps a basic period, because it has no precise time point on which to base the next clock interrupt. hres, by contrast, fires clock interrupts at completely arbitrary times: its event_handler operates on the red-black tree of hrtimers, so it can use the expiry time of the next hrtimer to expire as the moment to trigger the next clock interrupt. Remember that in hres mode all time-related work, such as timekeeping and periodic scheduling, is carried out by hrtimers. The decision of when to fire the next clock interrupt therefore cannot be made inside any single hrtimer's callback; it has to be made in the event_handler, which handles all hrtimers globally. That is the whole story. Now let's look at cpu_idle:

 
 
  void cpu_idle(void)
  {
      int cpu = smp_processor_id();

      current_thread_info()->status |= TS_POLLING;

      /* endless idle loop with no priority at all */
      while (1) {
          tick_nohz_stop_sched_tick(1);      // try to stop the periodic tick before idling
          while (!need_resched()) {
              check_pgt_cache();
              rmb();
              if (rcu_pending(cpu))
                  rcu_check_callbacks(cpu, 0);
              if (cpu_is_offline(cpu))
                  play_dead();
              local_irq_disable();
              __get_cpu_var(irq_stat).idle_timestamp = jiffies;
              /* Don't trace irqs off for idle */
              stop_critical_timings();
              pm_idle();                     // enter the low-power idle state until an interrupt arrives
              start_critical_timings();
          }
          tick_nohz_restart_sched_tick();    // work has arrived: restart the periodic tick
          preempt_enable_no_resched();
          schedule();
          preempt_disable();
      }
  }

Inside tick_nohz_stop_sched_tick, next_jiffies = get_next_timer_interrupt(last_jiffies) is called. This finds the earliest-expiring timer or hrtimer and uses its expiry time as the time of the next clock interrupt. tick_nohz_stop_sched_tick also checks the reschedule flag; if it is set, the function returns immediately without entering nohz. In addition, irq_exit calls tick_nohz_stop_sched_tick after every hardware interrupt that arrives while the CPU is idle, so that the clock can be reprogrammed whenever possible.
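The decision tick_nohz_stop_sched_tick makes can be summarized in a few lines. The sketch below is a simplified userspace model, not the kernel function: the names stop_decision and sketch_stop_sched_tick are hypothetical, next_jiffies is passed in as if it had already been returned by get_next_timer_interrupt(last_jiffies), and the rule "do not stop the tick when the next timer is at most one jiffy away" is a simplification modeled loosely on the kernel's own check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one call to the tick-stop logic. */
struct stop_decision {
    bool stop_tick;          /* true: the periodic tick is stopped */
    uint64_t program_jiffy;  /* jiffy the next clock interrupt is set for */
};

struct stop_decision sketch_stop_sched_tick(bool need_resched,
                                            uint64_t last_jiffies,
                                            uint64_t next_jiffies)
{
    /* Default: keep ticking, next interrupt on the following jiffy. */
    struct stop_decision d = { false, last_jiffies + 1 };

    if (need_resched)        /* the idle task must leave at once */
        return d;

    uint64_t delta = next_jiffies > last_jiffies ? next_jiffies - last_jiffies : 0;
    if (delta <= 1)          /* timer due within one tick: stopping gains nothing */
        return d;

    d.stop_tick = true;      /* skip the ticks in between */
    d.program_jiffy = next_jiffies;
    return d;
}
```

Because irq_exit re-runs this logic after each interrupt taken while idle, a newly armed timer or a reschedule request is picked up on the very next evaluation.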

