Timer mechanism and process scheduling under nohz


Before kernel 2.6.21, the clock interrupt was strictly periodic: it fired at a fixed frequency of HZ, and the system passively received each interrupt and then ran the handler. If there was no task to run, the CPU executed the idle loop; the periodic tick kept breaking into idle so the kernel could check whether there was anything to do, and if not, it went straight back to idle.
In that world, a process ran within a fixed time slice, and the periodic clock interrupt is what policed the slice. Everything looked harmonious, but the kernel itself had no sovereignty: everything was done on the hardware's schedule.
Starting with kernel 2.6.21, nohz appeared. Instead of accepting the system's unconditional HZ-periodic interrupt, nohz dynamically sets the time of the next interrupt.
In this way, the CFS scheduler no longer needs to submit to the underlying clock and its fixed time-slice granularity: in Linux, the length of a time slice can be set dynamically and scheduling happens on the kernel's own terms.
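
To make "dynamically sets the time of the next interrupt" concrete, here is a minimal sketch, condensed from the logic of clockevents_program_event() in kernels of this era (the helper name reprogram_next_event is mine, and the clamping of the delta to the device's min_delta_ns/max_delta_ns is omitted for brevity). It relies on the clock_event_device abstraction introduced just below:

#include <linux/clockchips.h>
#include <linux/errno.h>

/*
 * Sketch: arm the event device to fire once, at "expires", instead of
 * letting the hardware interrupt unconditionally every 1/HZ seconds.
 */
static int reprogram_next_event(struct clock_event_device *dev,
				ktime_t expires, ktime_t now)
{
	unsigned long long clc;
	int64_t delta = ktime_to_ns(ktime_sub(expires, now));

	if (delta <= 0)
		return -ETIME;		/* the requested time has already passed */

	dev->next_event = expires;

	clc = delta * dev->mult;	/* nanoseconds -> device cycles ... */
	clc >>= dev->shift;		/* ... using the device's mult/shift */

	return dev->set_next_event((unsigned long)clc, dev);	/* one shot */
}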

 

Nohz is really built on top of two abstractions, clocksource and clock_event_device. These two structures abstract the clock hardware and the clock's event behavior, respectively.
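
For reference, abridged forms of the two structures (fields trimmed to the essentials; see include/linux/clocksource.h and include/linux/clockchips.h in 2.6.24 for the full definitions):

/* Abridged: a clocksource answers "what time is it now?" */
struct clocksource {
	char *name;
	int rating;			/* quality of the source; higher is better */
	cycle_t (*read)(void);		/* read the raw hardware counter */
	cycle_t mask;
	u32 mult, shift;		/* cycles -> nanoseconds conversion */
	/* ... */
};

/* Abridged: a clock_event_device answers "interrupt me at time X" */
struct clock_event_device {
	const char *name;
	unsigned int features;		/* e.g. CLOCK_EVT_FEAT_ONESHOT */
	int (*set_next_event)(unsigned long evt,
			      struct clock_event_device *);
	void (*set_mode)(enum clock_event_mode mode,
			 struct clock_event_device *);
	void (*event_handler)(struct clock_event_device *);	/* tick handler hook */
	ktime_t next_event;
	/* ... */
};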

 

First, two data structures to be familiar with:
struct timer_list: a software clock; it records the software clock's expiration time and the operation to perform once it expires.
struct tvec_base: the structure that organizes and manages software clocks. On an SMP system, each CPU has one.
struct timer_list {
	struct list_head entry;			// links the timer into a bucket list
	unsigned long expires;			// expiration time, in ticks
	void (*function)(unsigned long);	// callback executed on expiry
	unsigned long data;			// argument passed to the callback
	struct tvec_t_base_s *base;		// the tvec_base this software clock lives on
#ifdef CONFIG_TIMER_STATS
	void *start_site;
	char start_comm[16];
	int start_pid;
#endif
};
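
As a quick usage refresher for struct timer_list (the 2.6-era API; my_timer and my_timer_fn are placeholder names), arming a software clock looks like this:

#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list my_timer;

/* Runs in softirq context once the timer expires (see run_timer_softirq below). */
static void my_timer_fn(unsigned long data)
{
	/* ... handle the expiry; re-arm if the timer should be periodic ... */
	mod_timer(&my_timer, jiffies + HZ);	/* fire again one second later */
}

static void arm_my_timer(void)
{
	setup_timer(&my_timer, my_timer_fn, 0);	/* fills function, data, base */
	mod_timer(&my_timer, jiffies + HZ);	/* expires is expressed in ticks */
}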

struct tvec_t_base_s {
	spinlock_t lock;
	struct timer_list *running_timer;	// software clock currently being processed
	unsigned long timer_jiffies;		// expiry time of the software clocks currently being processed
	tvec_root_t tv1;	// holds software clocks expiring within [timer_jiffies, timer_jiffies + 2^8)
	tvec_t tv2;		// holds software clocks expiring within [timer_jiffies + 2^8, timer_jiffies + 2^14)
	tvec_t tv3;		// 2^14 ~ 2^20
	tvec_t tv4;		// 2^20 ~ 2^26
	tvec_t tv5;		// 2^26 ~ 2^32
} ____cacheline_aligned;

typedef struct tvec_t_base_s tvec_base_t;
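
The tv1 ... tv5 split is the classic "timer wheel": a new timer's bucket is chosen by how far in the future it expires. A condensed sketch of the bucket selection done by internal_add_timer() in kernel/timer.c (with the default config, TVR_BITS is 8 and TVN_BITS is 6, which gives the 2^8/2^14/2^20/2^26 boundaries above):

/* Condensed from internal_add_timer(); already-expired handling omitted. */
unsigned long expires = timer->expires;
unsigned long idx = expires - base->timer_jiffies;	/* distance in ticks */
struct list_head *vec;

if (idx < TVR_SIZE)					/* closer than 2^8 ticks */
	vec = base->tv1.vec + (expires & TVR_MASK);
else if (idx < 1 << (TVR_BITS + TVN_BITS))		/* closer than 2^14 */
	vec = base->tv2.vec + ((expires >> TVR_BITS) & TVN_MASK);
else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS))		/* closer than 2^20 */
	vec = base->tv3.vec + ((expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK);
/* ... tv4 and tv5 continue the same pattern, 6 more bits per level ... */

list_add_tail(&timer->entry, vec);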

// Now start tracing the timer code, kernel version 2.6.24
/*
 * This function runs timers and the timer-tq in bottom half context.
 */
static void run_timer_softirq(struct softirq_action *h)	// bottom half of the timer interrupt
{
	tvec_base_t *base = __get_cpu_var(tvec_bases);	// get this CPU's tvec_base_t

	hrtimer_run_queues();	// a chance to switch to nohz or hres

	if (time_after_eq(jiffies, base->timer_jiffies))	// if current jiffies >= base->timer_jiffies, timers have expired
		__run_timers(base);	// run the timer callbacks
}
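
Before following hrtimer_run_queues(), note how run_timer_softirq() gets invoked in the first place: it is registered as the TIMER_SOFTIRQ handler at boot, and every tick raises that softirq through run_local_timers() (both in kernel/timer.c, lightly abridged):

void __init init_timers(void)
{
	/* ... per-CPU tvec_base setup elided ... */
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
}

/*
 * Called by the local, per-CPU timer interrupt.
 */
void run_local_timers(void)
{
	raise_softirq(TIMER_SOFTIRQ);	/* defer expiry work to softirq context */
	softlockup_tick();
}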

 

/*
 * Called from timer softirq every jiffy, expire hrtimers:
 *
 * For HRT its the fall back code to run the softirq in the timer
 * softirq context in case the hrtimer initialization failed or has
 * not been done yet.
 */
void hrtimer_run_queues(void)
{
	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
	int i;

	if (hrtimer_hres_active())
		return;

	/*
	 * This _is_ ugly: we have to check in the softirq context,
	 * whether we can switch to highres and/or nohz mode. The
	 * clocksource switch happens in the timer interrupt with
	 * xtime_lock held. Notification from there only sets the
	 * check bit in the tick_oneshot code, otherwise we might
	 * deadlock vs. xtime_lock.
	 */
	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))	// this if is where the switch to hres or nohz happens
		if (hrtimer_switch_to_hres())
			return;

	hrtimer_get_softirq_time(cpu_base);

	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
		run_hrtimer_queue(cpu_base, i);
}
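
For contrast with struct timer_list, the high-resolution timers that hrtimer_run_queues() drains are armed through the hrtimer API (2.6-era signatures; my_hrtimer and my_hrtimer_fn are placeholder names):

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer my_hrtimer;

static enum hrtimer_restart my_hrtimer_fn(struct hrtimer *t)
{
	/* ... handle the expiry ... */
	return HRTIMER_NORESTART;	/* or HRTIMER_RESTART after hrtimer_forward() */
}

static void arm_my_hrtimer(void)
{
	hrtimer_init(&my_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	my_hrtimer.function = my_hrtimer_fn;
	/* 10 ms from now; nanosecond resolution if highres mode is active */
	hrtimer_start(&my_hrtimer, ktime_set(0, 10 * 1000 * 1000),
		      HRTIMER_MODE_REL);
}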

/**
 * Check, if a change happened, which makes oneshot possible.
 *
 * Called cyclic from the hrtimer softirq (driven by the timer
 * softirq) allow_nohz signals, that we can switch into low-res nohz
 * mode, because high resolution timers are disabled (either compile
 * or runtime).
 */
int tick_check_oneshot_change(int allow_nohz)
{
	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

	if (!test_and_clear_bit(0, &ts->check_clocks))
		return 0;

	if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
		return 0;

	if (!timekeeping_is_continuous() || !tick_is_oneshot_available())
		return 0;

	if (!allow_nohz)
		return 1;

	tick_nohz_switch_to_nohz();	// switch to nohz if the conditions are met
	return 0;
}
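
The check_clocks bit tested above is set asynchronously when a clocksource or event-device change makes oneshot mode possible; the notification side (kernel/time/tick-sched.c) is tiny:

/*
 * Async notification about clocksource changes
 */
void tick_oneshot_notify(void)
{
	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

	set_bit(0, &ts->check_clocks);	/* picked up by tick_check_oneshot_change() */
}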

/**
 * tick_nohz_switch_to_nohz - switch to nohz mode
 */
static void tick_nohz_switch_to_nohz(void)
{
	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
	ktime_t next;

	if (!tick_nohz_enabled)
		return;

	local_irq_disable();
	if (tick_switch_to_oneshot(tick_nohz_handler)) {	// put the event device into oneshot mode (one-shot timer) with callback tick_nohz_handler
		local_irq_enable();
		return;
	}

	ts->nohz_mode = NOHZ_MODE_LOWRES;

	/*
	 * Recycle the hrtimer in ts, so we can share the
	 * hrtimer_forward with the highres code.
	 */
	hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	/* Get the next period */
	next = tick_init_jiffy_update();

	for (;;) {
		ts->sched_timer.expires = next;
		if (!tick_program_event(next, 0))
			break;
		next = ktime_add(next, tick_period);
	}
	local_irq_enable();

	printk(KERN_INFO "Switched to NOHz mode on CPU #%d\n",
	       smp_processor_id());
}
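
Switching to oneshot mode only enables the machinery; the tick is actually stopped from the idle loop. The shape of cpu_idle() in 2.6.24 is roughly the following (a condensed sketch, not verbatim; the idle() call stands for default_idle or a platform-specific routine):

/* Condensed sketch of the idle loop; architecture details omitted. */
void cpu_idle(void)
{
	for (;;) {
		tick_nohz_stop_sched_tick();	/* stop periodic ticks while idle */
		while (!need_resched())
			idle();			/* halt until the next programmed event */
		tick_nohz_restart_sched_tick();	/* ticks resume; idle accounting fixed up */
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}
}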

/*
 * The nohz low res interrupt handler
 */
static void tick_nohz_handler(struct clock_event_device *dev)
{
	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
	struct pt_regs *regs = get_irq_regs();
	int cpu = smp_processor_id();
	ktime_t now = ktime_get();

	dev->next_event.tv64 = KTIME_MAX;

	/*
	 * Check if the do_timer duty was dropped. We don't care about
	 * concurrency: this happens only when the CPU in charge went
	 * into a long sleep. If two CPUs happen to assign themselves
	 * this duty, then the jiffies update is still serialized by
	 * xtime_lock.
	 */
	if (unlikely(tick_do_timer_cpu == -1))
		tick_do_timer_cpu = cpu;

	/* Check, if the jiffies need an update */
	if (tick_do_timer_cpu == cpu)
		tick_do_update_jiffies64(now);

	/*
	 * When we are idle and the tick is stopped, we have to touch
	 * the watchdog as we might not schedule for a really long
	 * time. This happens on complete idle SMP systems while
	 * waiting on the login prompt. We also increment the "start
	 * of idle" jiffy stamp so the idle accounting adjustment we
	 * do when we go busy again does not account too much ticks.
	 */
	if (ts->tick_stopped) {		// the idle tick has been stopped, so feed the watchdog
		touch_softlockup_watchdog();
		ts->idle_jiffies++;
	}

	update_process_times(user_mode(regs));	// drives the process-scheduling callback
	profile_tick(CPU_PROFILING);

	/* Do not restart, when we are in the idle loop */
	if (ts->tick_stopped)
		return;

	while (tick_nohz_reprogram(ts, now)) {	// re-arm the timer
		now = ktime_get();
		tick_do_update_jiffies64(now);	// jiffies is updated here
	}
}
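
The tick_nohz_reprogram() used in that final loop is essentially a two-liner: push the sched timer forward by one tick_period and program the event device; the caller loops because the target time may already have passed (kernel/time/tick-sched.c):

static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now)
{
	hrtimer_forward(&ts->sched_timer, now, tick_period);
	return tick_program_event(ts->sched_timer.expires, 0);
}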

// the tick_sched structure
/**
 * struct tick_sched - sched tick emulation and no idle tick control/stats
 * @sched_timer:	hrtimer to schedule the periodic tick in high
 *			resolution mode
 * @idle_tick:		Store the last idle tick expiry time when the tick
 *			timer is modified for idle sleeps. This is necessary
 *			to resume the tick timer operation in the timeline
 *			when the CPU returns from idle
 * @tick_stopped:	Indicator that the idle tick has been stopped
 * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
 * @idle_calls:		Total number of idle calls
 * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
 * @idle_entrytime:	Time when the idle call was entered
 * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
 * @sleep_length:	Duration of the current idle sleep
 */
struct tick_sched {
	struct hrtimer		sched_timer;
	unsigned long		check_clocks;
	enum tick_nohz_mode	nohz_mode;
	ktime_t			idle_tick;
	int			tick_stopped;
	unsigned long		idle_jiffies;
	unsigned long		idle_calls;
	unsigned long		idle_sleeps;
	ktime_t			idle_entrytime;
	ktime_t			idle_sleeptime;
	ktime_t			sleep_length;
	unsigned long		last_jiffies;
	unsigned long		next_jiffies;
	ktime_t			idle_expires;
};
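
To see the fields in motion: when the idle loop calls tick_nohz_stop_sched_tick(), the bookkeeping goes roughly as follows (heavily condensed, not verbatim; delta_jiffies, last_jiffies, expires, dev and now are locals of the real function):

/* Heavily condensed from tick_nohz_stop_sched_tick(), 2.6.24. */
ts->idle_calls++;				/* counts every idle entry */
if (delta_jiffies > 1) {			/* next timer far enough away to stop the tick */
	if (!ts->tick_stopped) {
		ts->idle_tick = ts->sched_timer.expires;  /* remembered for resume */
		ts->tick_stopped = 1;
		ts->idle_jiffies = last_jiffies;
	}
	ts->idle_sleeps++;			/* counts idles with the tick stopped */
	ts->idle_expires = expires;		/* latest moment we must wake up */
	ts->sleep_length = ktime_sub(dev->next_event, now);
}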

/*
 * Called from the timer interrupt handler to charge one tick to the current
 * process. user_tick is 1 if the tick is user time, 0 for system.
 */
void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id();

	/* Note: this timer irq context must be accounted for as well. */
	account_process_tick(p, user_tick);
	run_local_timers();
	if (rcu_pending(cpu))
		rcu_check_callbacks(cpu, user_tick);
	scheduler_tick();	// process scheduling on every tick
	run_posix_cpu_timers(p);
}
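
The account_process_tick() call above is where the one-jiffy charge lands in the task's utime or stime (condensed; the scaled-time accounting present in the real function is omitted):

void account_process_tick(struct task_struct *p, int user_tick)
{
	if (user_tick)
		account_user_time(p, jiffies_to_cputime(1));
	else
		account_system_time(p, HARDIRQ_OFFSET, jiffies_to_cputime(1));
}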

/*
 * This function gets called by the timer code, with HZ frequency.
 * We call it with interrupts disabled.
 * It also gets called by the fork code, when changing the parent's
 * timeslices.
 */
void scheduler_tick(void)
{
	int cpu = smp_processor_id();
	struct rq *rq = cpu_rq(cpu);
	struct task_struct *curr = rq->curr;
	u64 next_tick = rq->tick_timestamp + TICK_NSEC;

	spin_lock(&rq->lock);
	__update_rq_clock(rq);
	/*
	 * Let rq->clock advance by at least TICK_NSEC:
	 */
	if (unlikely(rq->clock < next_tick))
		rq->clock = next_tick;
	rq->tick_timestamp = rq->clock;
	update_cpu_load(rq);
	if (curr != rq->idle) /* FIXME: needed? */
		curr->sched_class->task_tick(rq, curr);	// per-class tick hook: task_tick_fair for normal processes, task_tick_rt for real-time, or task_tick_idle
	spin_unlock(&rq->lock);

#ifdef CONFIG_SMP
	rq->idle_at_tick = idle_cpu(cpu);
	trigger_load_balance(rq, cpu);
#endif
}

// Each scheduling policy has a struct like this. Below is fair_sched_class, the policy for normal (non-real-time) processes.
/*
 * All the scheduling class methods:
 */
static const struct sched_class fair_sched_class = {
	.next			= &idle_sched_class,
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,

	.check_preempt_curr	= check_preempt_wakeup,

	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,

#ifdef CONFIG_SMP
	.load_balance		= load_balance_fair,
	.move_one_task		= move_one_task_fair,
#endif

	.set_curr_task		= set_curr_task_fair,
	.task_tick		= task_tick_fair,	// tick-time scheduling of normal processes
	.task_new		= task_new_fair,
};
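
As a preview of where .task_tick leads for normal processes, task_tick_fair (kernel/sched_fair.c, 2.6.24) simply walks the sched_entity hierarchy and lets entity_tick() update vruntime and test whether the current task should be preempted:

static void task_tick_fair(struct rq *rq, struct task_struct *curr)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &curr->se;

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		entity_tick(cfs_rq, se);
	}
}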

// The trace stops here; task_tick_fair and task_tick_rt deserve further analysis of their own.
