Linux time management: data structures and the traditional low-resolution clock implementation


The previous article gave a rough overview of Linux time management. After reading some experts' blogs I feel that my own writing is still rather thin, but there is no way around it; I can only improve myself this way... Enough chatter. This section describes the important data structures used in time management.

Device-related data structures

Clock source structure

struct clocksource{}
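The empty braces above stand in for the full definition. For orientation, here are a few representative fields of struct clocksource as they appear in the 3.10-era include/linux/clocksource.h; this is a sketch, not the complete structure.

struct clocksource {
	const char *name;			/* name of the clock source, e.g. "tsc" or "hpet" */
	struct list_head list;			/* links the source into the global clocksource list */
	int rating;				/* quality rating; the best-rated source is used for timekeeping */
	cycle_t (*read)(struct clocksource *cs);	/* read the free-running hardware counter */
	cycle_t mask;				/* mask of valid counter bits */
	u32 mult;				/* counter cycles -> nanoseconds multiplier */
	u32 shift;				/* counter cycles -> nanoseconds shift */
	/* ... flags, max_idle_ns, suspend/resume callbacks, watchdog fields, etc. ... */
};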

Clock device structure

struct tick_device {
	struct clock_event_device *evtdev;
	enum tick_device_mode mode;	/* records the mode of the corresponding clock event device */
};

enum tick_device_mode {
	TICKDEV_MODE_PERIODIC,	/* periodic mode */
	TICKDEV_MODE_ONESHOT,	/* one-shot (single-trigger) mode */
};

Clock event Device Structure

struct clock_event_device {}
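Again the braces are left empty here. For orientation, a few representative fields of struct clock_event_device from the 3.10-era include/linux/clockchips.h; a sketch, not the complete structure.

struct clock_event_device {
	void (*event_handler)(struct clock_event_device *);	/* invoked from the interrupt handler when the event fires */
	int (*set_next_event)(unsigned long evt, struct clock_event_device *);	/* program the next expiry (one-shot mode) */
	void (*set_mode)(enum clock_event_mode mode, struct clock_event_device *);	/* switch between periodic and one-shot mode */
	u32 mult;			/* nanoseconds -> device ticks multiplier */
	u32 shift;			/* nanoseconds -> device ticks shift */
	int rating;			/* quality rating of the device */
	const char *name;
	/* ... min/max delta, feature flags, broadcast hooks, etc. ... */
};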

Timer-related data structures

Low-Resolution timers

struct timer_list{}

struct tvec_base{}

struct timerqueue_head {}

struct timerqueue_node {}

High-Resolution Timers

struct hrtimer_cpu_base{}

struct hrtimer_clock_base{}

Time-related definitions

union ktime {
	s64	tv64;
#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
	struct {
# ifdef __BIG_ENDIAN
	s32	sec, nsec;
# else
	s32	nsec, sec;
# endif
	} tv;
#endif
};
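ktime values are normally not manipulated field by field but through helpers such as ktime_set, ktime_add_ns and ktime_to_ns from include/linux/ktime.h. A minimal usage sketch; the particular values (1 s, 500,000 ns, 100 ms) are just arbitrary examples.

#include <linux/kernel.h>
#include <linux/ktime.h>

static void ktime_example(void)
{
	ktime_t t = ktime_set(1, 500000);		/* 1 s + 500,000 ns */

	t = ktime_add_ns(t, 100 * NSEC_PER_MSEC);	/* add another 100 ms */
	pr_info("interval = %lld ns\n", ktime_to_ns(t));
}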

Low-resolution clock implementation

In low-resolution mode, both the periodic tick and the dynamic tick (which requires one-shot mode) can be implemented. However, the current low-resolution dynamic tick has been folded into the high-resolution processing framework, so this section only describes the implementation of the periodic tick at low resolution.

As mentioned earlier, when the tick device is in TICKDEV_MODE_PERIODIC mode, it runs in periodic mode. Timers built on top of this are called low-resolution timers. In this mode the tick event occurs periodically, HZ times per second. HZ is commonly 250, meaning the interval between two interrupts is 4 ms, which is rather coarse for a computer. HZ can of course be set at compile time through the configuration option CONFIG_HZ. A larger HZ means more clock interrupts per second, so tasks can be handled more promptly, which suits systems with higher interactivity requirements. However, more interrupts also means the CPU is interrupted more often and more kernel events need processing, which is a non-trivial performance overhead. Low-resolution timers built on the low-resolution clock are simpler to implement, because the clock device raises interrupts periodically on its own and there is no need to program the next event time by hand.
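Kernel code usually converts between jiffies and wall-clock durations with msecs_to_jiffies()/jiffies_to_msecs() rather than dividing by HZ directly. A small sketch of the arithmetic; the 4 ms figure simply restates the HZ=250 example above, and the 100 ms timeout is arbitrary.

#include <linux/jiffies.h>
#include <linux/kernel.h>

static void tick_period_example(void)
{
	unsigned long timeout;

	/* with CONFIG_HZ=250 each tick lasts 1000 / HZ = 4 ms */
	pr_info("tick period: %d ms\n", 1000 / HZ);

	/* a 100 ms timeout expressed in ticks; with HZ=250 this is 25 jiffies */
	timeout = msecs_to_jiffies(100);
	pr_info("100 ms = %lu jiffies\n", timeout);
}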

In low-resolution mode, the clock interrupt handler is timer_interrupt (on the IA-32 architecture). This function mainly updates global information, primarily jiffies, and updates the per-process time accounting; see the functions xtime_update(ticks) and update_process_times.

xtime_update

void xtime_update(unsigned long ticks)
{
	write_seqlock(&jiffies_lock);
	do_timer(ticks);
	write_sequnlock(&jiffies_lock);
}

The function calls do_timer, where ticks is the number of ticks to account for.

do_timer

void do_timer(unsigned long ticks)
{
	/* update jiffies */
	jiffies_64 += ticks;
	/* update the wall time */
	update_wall_time();
	/* calculate the global load */
	calc_global_load(ticks);
}

do_timer not only updates jiffies but also updates the wall time; jiffies and the wall time will be described in detail later. The global load is also calculated at the end. In the update_process_times function, the time accounting of the current process is updated, the local timers are handled, and the periodic scheduler is invoked through scheduler_tick. The code is as follows.

update_process_times
void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id();

	/* Note: this timer irq context must be accounted for as well. */
	account_process_tick(p, user_tick);	/* update process time accounting */
	run_local_timers();			/* handle local timers */
	rcu_check_callbacks(cpu, user_tick);
#ifdef CONFIG_IRQ_WORK
	if (in_irq())
		irq_work_run();
#endif
	scheduler_tick();
	run_posix_cpu_timers(p);
}

The local timers are handled in a softirq, that is, the actual timer processing happens during softirq handling. Because a softirq is not a hardware interrupt, it cannot be triggered at an arbitrary moment and must wait for the system to schedule it, so the execution of a timer may be delayed, but it will never run early. Recall the earlier article on softirqs: among the softirq types, TIMER_SOFTIRQ is the one corresponding to ordinary timer processing. The handling here is very simple; look at the code.

run_local_timers

void run_local_timers(void)
{
	hrtimer_run_queues();
	raise_softirq(TIMER_SOFTIRQ);
}

hrtimer_run_queues handles high-resolution timers while the system is still in low-resolution mode, i.e. before high-resolution mode has been started; once high-resolution mode is active the function becomes a no-op. A softirq of type TIMER_SOFTIRQ is then raised, so the timers will actually be processed the next time softirqs are handled.
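For reference, the association between TIMER_SOFTIRQ and its handler is established during timer initialisation: roughly speaking, init_timers() registers run_timer_softirq via open_softirq. A sketch, abbreviated from the 3.10 source.

void __init init_timers(void)
{
	/* ... per-CPU tvec_base setup omitted ... */
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
}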

Finally the periodic scheduler, scheduler_tick, is called. It eventually calls into the periodic tick handler of the concrete scheduling class, for example the CFS periodic scheduler, which updates the run time of the current scheduling entity and the virtual run time (vruntime) of the entity and its queue. If the scheduling entity is a process, the time information of the group it belongs to also needs to be updated (cgroup related; not covered in depth here). Finally it checks whether the current task's time is still sufficient; if not, it tries to extend the runtime, and if extending fails and the run queue is not empty, the reschedule flag is set. It also checks whether other processes are waiting to run and, if so, checks for preemption. The high-resolution clock is not considered here.

Handling of ordinary Timers

The function run_timer_softirq is the handler of the timer softirq.

run_timer_softirq

static void run_timer_softirq(struct softirq_action *h)
{
	struct tvec_base *base = __this_cpu_read(tvec_bases);

	hrtimer_run_pending();

	if (time_after_eq(jiffies, base->timer_jiffies))
		__run_timers(base);
}

hrtimer_run_pending checks, on every tick, whether the ordinary timer subsystem can be switched to high-resolution mode, and performs the switch if possible. The current time is then compared with the base's timer time. Before describing the processing itself, let us look at how ordinary timers are organised.

Organization of ordinary Timers

Because timers are local to each CPU, every CPU maintains its own timer management structure.

static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;

The structure is described as follows

struct tvec_base {
	spinlock_t lock;
	struct timer_list *running_timer;	/* records the timer currently being processed */
	unsigned long timer_jiffies;		/* timers before this time have been processed,
						   so the value is incremented for every tick handled */
	unsigned long next_timer;
	unsigned long active_timers;
	/* the timer vectors on this CPU */
	struct tvec_root tv1;
	struct tvec tv2;
	struct tvec tv3;
	struct tvec tv4;
	struct tvec tv5;
} ____cacheline_aligned;

struct tvec {
	struct list_head vec[TVN_SIZE];
};

struct tvec_root {
	struct list_head vec[TVR_SIZE];
};

There are two important structures for recording timers: tvec_root and tvec. Timers are mainly taken for execution from the first structure; the latter serves as backing storage. As can be seen, tvec_root and tvec are arrays of list heads. The former has TVR_SIZE entries, generally 256, corresponding to timers that expire within 0-255 clock ticks; if several timers expire at the same tick, they are kept on the same list. tv2 through tv5 are backing storage; their capacities are shown in the following table.

group    time interval (clock ticks)    capacity per slot (ticks)
tv1      0 ~ 255                        1
tv2      256 ~ 2^14 - 1                 2^8
tv3      2^14 ~ 2^20 - 1                2^14
tv4      2^20 ~ 2^26 - 1                2^20
tv5      2^26 ~ 2^32 - 1                2^26

Thus, one slot of a successor group covers the same span of clock ticks as the entire predecessor group; when the predecessor group has been used up, one slot of the successor group can be taken to refill the whole predecessor group. For example, when tv1 has been exhausted, the next slot can be taken out of tv2 and spread over tv1, and so on. There is also a timer_jiffies field in tvec_base which indicates that all timers before that time have already been processed, so this value must be incremented each time one slot of tv1 has been handled. The structure of an ordinary timer is timer_list; we only focus on a few of its fields.

struct timer_list {
	/*
	 * All fields that change during normal runtime are grouped
	 * into the same cacheline.
	 */
	struct list_head entry;
	unsigned long expires;
	struct tvec_base *base;

	void (*function)(unsigned long);
	unsigned long data;

	......
};

The first field, entry, is the node that links the timer into a doubly linked list. expires records the expiry time in jiffies; base points to the tvec_base the timer belongs to; then come a function pointer and a data field: function is the callback registered for the timer and data is its argument. A short usage sketch follows, after which we turn to the actual processing in the __run_timers function.
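As a quick illustration of how these fields are normally filled in, here is a minimal sketch of arming a low-resolution timer with the classic 3.10-era API (setup_timer/mod_timer/del_timer_sync). my_timer_fn is a hypothetical callback and the 100 ms delay is arbitrary.

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/timer.h>

static struct timer_list my_timer;

/* callback invoked from softirq context when the timer expires */
static void my_timer_fn(unsigned long data)
{
	pr_info("timer fired, data=%lu\n", data);
	/* re-arm for another 100 ms if periodic behaviour is wanted */
	mod_timer(&my_timer, jiffies + msecs_to_jiffies(100));
}

static void my_timer_start(void)
{
	/* fills in function and data, and initialises entry/base */
	setup_timer(&my_timer, my_timer_fn, 42UL);
	/* sets expires and queues the timer on this CPU's tvec_base */
	mod_timer(&my_timer, jiffies + msecs_to_jiffies(100));
}

static void my_timer_stop(void)
{
	del_timer_sync(&my_timer);	/* wait for a running callback to finish */
}

Now to the processing path, the __run_timers function: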

static inline void __run_timers(struct tvec_base *base)
{
	struct timer_list *timer;

	spin_lock_irq(&base->lock);
	while (time_after_eq(jiffies, base->timer_jiffies)) {
		struct list_head work_list;
		struct list_head *head = &work_list;
		int index = base->timer_jiffies & TVR_MASK;

		/*
		 * Cascade timers:
		 */
		if (!index &&
			(!cascade(base, &base->tv2, INDEX(0))) &&
				(!cascade(base, &base->tv3, INDEX(1))) &&
					!cascade(base, &base->tv4, INDEX(2)))
			cascade(base, &base->tv5, INDEX(3));
		++base->timer_jiffies;
		/* get the timer list for this jiffy */
		list_replace_init(base->tv1.vec + index, &work_list);
		while (!list_empty(head)) {
			void (*fn)(unsigned long);
			unsigned long data;
			bool irqsafe;

			/* get the timer */
			timer = list_first_entry(head, struct timer_list, entry);
			/* get the timer's callback function */
			fn = timer->function;
			/* get the timer's argument */
			data = timer->data;
			irqsafe = tbase_get_irqsafe(timer->base);

			timer_stats_account_timer(timer);

			base->running_timer = timer;
			detach_expired_timer(timer, base);

			if (irqsafe) {
				spin_unlock(&base->lock);
				/* run the timer */
				call_timer_fn(timer, fn, data);
				spin_lock(&base->lock);
			} else {
				spin_unlock_irq(&base->lock);
				call_timer_fn(timer, fn, data);
				spin_lock_irq(&base->lock);
			}
		}
	}
	base->running_timer = NULL;
	spin_unlock_irq(&base->lock);
}

The function body is a large while loop; the loop condition is that the current time is greater than or equal to base->timer_jiffies, i.e. the timers for this span have not yet been processed. There may well be no timers in this span, but it always has to be checked. As already mentioned, the entries of the tv1 array correspond to 0-255 clock ticks, one slot per tick, so the index is obtained with base->timer_jiffies & TVR_MASK. The following if statement refills the vectors; note that on most iterations it does nothing, because refilling is only triggered when base->timer_jiffies reaches a multiple of 256 (index becomes 0); this is discussed below. base->timer_jiffies is then incremented, which is nothing unusual, the list for this tick is fetched via index, and another loop handles all timers for that tick. There is nothing special here: from each timer_list structure the callback function and its argument are fetched, and the callback is executed through call_timer_fn. The timer is removed from the list before being processed. (I have a question here: why is it not removed only after processing has completed?)

The filling problem of timer vectors

Refer to the code again


		int index = base->timer_jiffies & TVR_MASK;	/* it appears the cascade runs once per 256 jiffies processed */

		/*
		 * Cascade timers:
		 */
		if (!index &&
			(!cascade(base, &base->tv2, INDEX(0))) &&
				(!cascade(base, &base->tv3, INDEX(1))) &&
					!cascade(base, &base->tv4, INDEX(2)))
			cascade(base, &base->tv5, INDEX(3));

......

After the system starts, base->timer_jiffies keeps incrementing, and the timer vectors are refilled once for every 256 clock ticks it advances. Those 256 ticks may have been processed in several batches, or, if timers have not been handled for a long time and many ticks have accumulated, a large number may be processed in one go; that does not matter here. What matters is that whenever base->timer_jiffies advances past a multiple of 256, index becomes 0, and the first level is then refilled from the next-level vector group, and so on.

#define INDEX(N) ((base->timer_jiffies >> (TVR_BITS + (N) * TVN_BITS)) & TVN_MASK)

The INDEX macro computes the slot index within the source (higher-level) vector group. Why this works is not obvious, so let us analyse an example.

Suppose base->timer_jiffies has just reached 0x0000c300; this triggers refilling of the first vector group. tv1 has a capacity of 256 ticks, so the INDEX macro is invoked with argument 0: the right shift by 8 bits discards the 256 ticks that tv1 handles. When the second vector group needs refilling in turn, each of its slots covers 2^(8+6) ticks, so the shift is 8+6=14 bits, and so on. In this example, base->timer_jiffies in the previous round must have been 0x0000c2xx; after that slot was processed it was incremented to 0x0000c300. Plugging this into the formula above gives (0x0000c300 >> 8) & 0x3f = 3, i.e. refilling starts from slot 3 of the source vector group, because the earlier slots were already cascaded down to the previous level and have already been processed. Refilling of the next level is triggered whenever index wraps around to 0. One more thing to note: since jiffies increases monotonically and the slots of a vector group are laid out along the timeline, the first slot of tv4, for instance, is certainly empty by then, because its contents have already been dispersed into tv3-tv1; likewise the first slot of tv3 is empty, its contents having been dispersed into tv1-tv2. Therefore each refill does not target one fixed lower-level tv; instead the common function __internal_add_timer is used, which re-inserts each timer according to its expiry time, so timers taken from a higher-level group may end up in any of the lower-level groups.
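To make the slot arithmetic concrete, here is a small stand-alone user-space sketch that mimics the level/slot selection performed by __internal_add_timer. The constants mirror TVR_BITS=8 and TVN_BITS=6; this is only an illustration of the bucket math (it ignores the overflow and already-expired cases the kernel handles specially), not the kernel code itself.

#include <stdio.h>

#define TVR_BITS 8
#define TVN_BITS 6
#define TVR_SIZE (1UL << TVR_BITS)
#define TVR_MASK (TVR_SIZE - 1)
#define TVN_SIZE (1UL << TVN_BITS)
#define TVN_MASK (TVN_SIZE - 1)

/* print which vector group and slot a timer with the given expiry would land in */
static void place_timer(unsigned long timer_jiffies, unsigned long expires)
{
	unsigned long delta = expires - timer_jiffies;

	if (delta < TVR_SIZE)
		printf("expires=%lu -> tv1, slot %lu\n", expires, expires & TVR_MASK);
	else if (delta < 1UL << (TVR_BITS + TVN_BITS))
		printf("expires=%lu -> tv2, slot %lu\n", expires,
		       (expires >> TVR_BITS) & TVN_MASK);
	else if (delta < 1UL << (TVR_BITS + 2 * TVN_BITS))
		printf("expires=%lu -> tv3, slot %lu\n", expires,
		       (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK);
	else if (delta < 1UL << (TVR_BITS + 3 * TVN_BITS))
		printf("expires=%lu -> tv4, slot %lu\n", expires,
		       (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK);
	else
		printf("expires=%lu -> tv5, slot %lu\n", expires,
		       (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK);
}

int main(void)
{
	unsigned long now = 0x0000c300;

	place_timer(now, now + 10);	/* within 256 ticks  -> tv1 */
	place_timer(now, now + 5000);	/* within 2^14 ticks -> tv2 */
	place_timer(now, now + 100000);	/* within 2^20 ticks -> tv3 */
	return 0;
}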

Keep at it!

Resources:

Linux 3.10.1 source code

Professional Linux Kernel Architecture
