Linux Time Subsystem 5: Principle and Implementation of the Low-Resolution Timer

Source: Internet
Author: User

A timer lets us trigger a specific event at some point in the future. A low-resolution timer counts time in units of jiffies, so its precision is only 1/HZ: if the kernel is configured with HZ = 1000, the precision of low-resolution timers in the system is 1 ms. Early kernel versions had no high-resolution timers, so this low-resolution timer was the only option. This HZ-based timer mechanism is sometimes called a time wheel. High-resolution timers were added later, but only as an optional kernel configuration item, so the low-resolution timer is still widely used even in the latest kernel versions.
/*************************************** **************************************** **********************/
Statement: the content of this blog is original work from http://blog.csdn.net/droidphone. Please credit the source when reposting. Thank you!
/*************************************** **************************************** **********************/

1. How to Use the timer

Before discussing how the timer is implemented, let's look at how to use it. To use a timer in kernel code, we first define a timer_list structure, which is declared in include/linux/timer.h:

struct timer_list {
    /*
     * All fields that change during normal runtime grouped to the
     * same cacheline
     */
    struct list_head entry;
    unsigned long expires;
    struct tvec_base *base;

    void (*function)(unsigned long);
    unsigned long data;

    int slack;
    ......
};

entry: This field links the timer into a list of timers. How the kernel groups timers into lists is explained later.

expires: This field holds the expiration time of the timer, that is, the jiffies count at which the timer is expected to expire.

base: Each CPU has its own tvec_base structure for managing timers. This field points to the tvec_base structure of the CPU that the timer belongs to.

function: This field is a function pointer. When the timer expires, the system calls this callback function to respond to the expiration event.

data: This field is passed as the argument to the callback function above.

slack: For timers that are not sensitive to the exact expiration time, expiration may be delayed by a short period. This field is used to calculate how many jiffies each such delay may be.

A timer_list can be defined either statically or dynamically. The static method uses the DEFINE_TIMER macro:

#define DEFINE_TIMER(_name, _function, _expires, _data)

The macro defines a timer_list named _name and fills its fields with the _function, _expires, and _data arguments.

If you want to use a dynamic method, you can declare a timer_list structure and manually initialize its fields:

struct timer_list timer;
......
init_timer(&timer);
timer.function = _function;
timer.expires = _expires;
timer.data = _data;

To activate a timer, we only need to call add_timer:

add_timer(&timer);

To modify the timer expiration time, we only need to call mod_timer:

mod_timer(&timer, jiffies+50);

To remove a timer, we only need to call del_timer:

del_timer(&timer);

The timer system also provides the following APIs for our use:

  • void add_timer_on(struct timer_list *timer, int cpu); // add a timer on the specified CPU
  • int mod_timer_pending(struct timer_list *timer, unsigned long expires); // modify the expiration time only if the timer is already active (pending)
  • int mod_timer_pinned(struct timer_list *timer, unsigned long expires); // when
  • void set_timer_slack(struct timer_list *time, int slack_hz); // set the maximum delay allowed for a timer that is not sensitive to accuracy
  • int del_timer_sync(struct timer_list *timer); // remove the timer; if its callback is currently running, wait for it to finish before returning

2. Software Architecture of Timer

The low-resolution timer is based on HZ, which means that timers may expire in any tick. For how ticks are generated, see part 4 of this series, Linux time subsystem 4: the timer engine clock_event_device. A system may contain hundreds or even thousands of timers. Should the kernel traverse all of them in every tick interrupt to check whether they have expired? It certainly does not use such a naive method. Instead, it groups timers by expiration time. Multi-core processors are now in wide use (even smartphone processors have four cores) and the kernel supports them well, so the low-resolution timer was designed with multi-core support and optimization in mind. To make better use of cache lines and to avoid lock contention between CPUs, the kernel allocates a separate set of timer-management data structures and resources for each CPU in a multi-core processor, and each CPU manages its own timers independently.

2.1 timer grouping

First, the kernel defines a tvec_base structure pointer for each CPU:

static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;

The tvec_base structure is defined as follows:

struct tvec_base {
    spinlock_t lock;
    struct timer_list *running_timer;
    unsigned long timer_jiffies;
    unsigned long next_timer;
    struct tvec_root tv1;
    struct tvec tv2;
    struct tvec tv3;
    struct tvec tv4;
    struct tvec tv5;
} ____cacheline_aligned;

running_timer: This field points to the timer_list structure of the timer that the CPU is currently processing.

timer_jiffies: This field records the jiffies value up to which this CPU's timer system has processed timers. In most cases it equals the jiffies count. If the CPU stays idle for several consecutive jiffies, then on leaving idle the jiffies count is greater than this field; after the next tick interrupt, the timer system makes this field catch up with jiffies.

next_timer: This field holds the expiration time, in jiffies, of the next timer due to expire on this CPU.

tv1 -- tv5: These five fields group the timers. Each of tv1 -- tv5 is an array of linked lists; the tv1 array has TVR_SIZE entries, and the tv2, tv3, tv4, and tv5 arrays each have TVN_SIZE entries. The sizes depend on the CONFIG_BASE_SMALL configuration item:

#define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
#define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)

struct tvec {
    struct list_head vec[TVN_SIZE];
};

struct tvec_root {
    struct list_head vec[TVR_SIZE];
};

By default, CONFIG_BASE_SMALL is not enabled, so TVR_SIZE is 256 and TVN_SIZE is 64. To save memory, CONFIG_BASE_SMALL can be enabled, in which case TVR_SIZE is 64 and TVN_SIZE is 16. The following discussion assumes CONFIG_BASE_SMALL is not enabled. When a new timer is added, the system decides which of the tv1 to tv5 arrays it goes into based on the difference between the timer's expiration jiffies value and the timer_jiffies field. The organization of all timers in the system is shown in the following figure:

Figure 2.1.1 Organizational structure of the timers in the system
2.2 Add a timer

To add a new timer, we can use the API functions add_timer or mod_timer. Either way, the actual work is done by the internal_add_timer function, which proceeds as follows:

  • Calculate the difference between the timer's expiration time and the timer_jiffies field of the CPU's tvec_base structure, and call it idx;
  • Based on idx, select which of the list arrays tv1 -- tv5 the timer goes into. tv1 -- tv5 can be thought of as occupying different bit ranges of a 32-bit number: tv1 takes the lowest 8 bits, tv2 the next 6 bits, tv3 the 6 bits after that, and so on, with the highest 6 bits assigned to tv5. The resulting selection rules are shown in the following table:

Linked list array    idx range
tv1                  0 -- 255                 (idx < 2^8)
tv2                  256 -- 16383             (idx < 2^14)
tv3                  16384 -- 1048575         (idx < 2^20)
tv4                  1048576 -- 67108863      (idx < 2^26)
tv5                  67108864 -- 4294967295   (idx < 2^32)

Once the array is chosen, the next step is to decide which linked list in that array receives the timer. If idx is less than 256, the timer goes into tv1. Because tv1 contains 256 linked lists, we can simply use the low 8 bits of timer_list.expires as the array index and link the timer into the corresponding list in tv1. If idx is between 256 and 16383, the timer goes into tv2; similarly, bits 8-13 of timer_list.expires are used as the array index, and the timer is linked into the corresponding list in tv2. Timers that belong in tv3, tv4, or tv5 are handled on the same principle. With the timers grouped this way, the system can easily locate and retrieve the expired timers in later tick events. The discussion above is reflected in the code of internal_add_timer:

static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
{
    unsigned long expires = timer->expires;
    unsigned long idx = expires - base->timer_jiffies;
    struct list_head *vec;

    if (idx < TVR_SIZE) {
        int i = expires & TVR_MASK;
        vec = base->tv1.vec + i;
    } else if (idx < 1 << (TVR_BITS + TVN_BITS)) {
        int i = (expires >> TVR_BITS) & TVN_MASK;
        vec = base->tv2.vec + i;
    } else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS)) {
        int i = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
        vec = base->tv3.vec + i;
    } else if (idx < 1 << (TVR_BITS + 3 * TVN_BITS)) {
        int i = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
        vec = base->tv4.vec + i;
    } else if ((signed long) idx < 0) {
        ......
    } else {
        ......
        i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
        vec = base->tv5.vec + i;
    }
    list_add_tail(&timer->entry, vec);
}

2.3 Timer expiration processing

After the processing described in the previous section, the timers in the system are distributed by expiration time across the list arrays tv1 -- tv5, and tv1 holds the lists of timers that will expire within the next 256 jiffies. Note that tv1.vec[0] does not hold the timers that expire immediately, nor does tv1.vec[1] hold the timers that expire at jiffies + 1. The value of base->timer_jiffies keeps increasing as the system runs (in principle by 1 on every tick event) and represents the current time of this CPU's timer system, while timers are also continually being added to the 256 lists of tv1. As discussed in the previous section, the index a timer uses in tv1 is the low 8 bits of its expiration time expires. So if the current value of base->timer_jiffies is 0x34567826, the timers that expire immediately are in tv1.vec[0x26]. If the system then adds a timer that expires at jiffies value 0x34567828, it goes into tv1.vec[0x28]; two ticks later, base->timer_jiffies becomes 0x34567828. Clearly, on each tick event the timer system only needs to use the low 8 bits of base->timer_jiffies as the index into tv1 to retrieve the corresponding list, which contains exactly the timers whose expires value has been reached.

So when are the timers in tv2 -- tv5 processed? Every time the low 8 bits of base->timer_jiffies become 0, bits 8-13 of base->timer_jiffies have just been incremented. These 6 bits belong to tv2, so we use the value of bits 8-13 as the index, take the corresponding timer list out of tv2, and add its timers back into the timer system with internal_add_timer. Because these timers will all expire within the next 256 ticks, they are added to the tv1 array, which completes the migration from tv2 to tv1. Similarly, every time bits 8-13 of base->timer_jiffies become 0, bits 14-19 have just been incremented. These 6 bits belong to tv3, so we use the value of bits 14-19 as the index, take the corresponding timer list out of tv3, and add its timers back with internal_add_timer. They will be added to tv2 (or directly to tv1 if they expire within the next 256 ticks), which completes the migration from tv3 to tv2. tv4 and tv5 are handled in the same way. The migration code is shown below; the index parameter is the precomputed index, in the higher-level tv array, of the list to be migrated:

static int cascade(struct tvec_base *base, struct tvec *tv, int index)
{
    /* cascade all the timers from tv up one level */
    struct timer_list *timer, *tmp;
    struct list_head tv_list;

    list_replace_init(tv->vec + index, &tv_list);  // take out the linked list to be migrated

    /*
     * We are removing _all_ timers from the list, so we
     * don't have to detach them individually.
     */
    list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
        BUG_ON(tbase_get_base(timer->base) != base);
        // re-add to the timer system; in effect the timer migrates to a lower-level tv array
        internal_add_timer(base, timer);
    }

    return index;
}

When each tick event arrives, the kernel raises the timer softirq, TIMER_SOFTIRQ, during tick interrupt processing. For softirqs, refer to another article in this blog: Linux interrupt subsystem 5: software interrupts (softirq). The work of TIMER_SOFTIRQ is done by __run_timers, which implements the logic discussed in this section: it retrieves the expired timers from tv1 and executes their callback functions. Callback functions of low-resolution timers therefore run in softirq context, which must be kept in mind when writing them. The code of __run_timers is as follows:

static inline void __run_timers(struct tvec_base *base)
{
    struct timer_list *timer;

    spin_lock_irq(&base->lock);
    /* Synchronize with jiffies; under NO_HZ, base->timer_jiffies may lag behind by more than one tick */
    while (time_after_eq(jiffies, base->timer_jiffies)) {
        struct list_head work_list;
        struct list_head *head = &work_list;
        /* Calculate the index of the expired timer list in tv1 */
        int index = base->timer_jiffies & TVR_MASK;

        /* Migrate timer lists from tv2 -- tv5 */
        if (!index &&
            (!cascade(base, &base->tv2, INDEX(0))) &&
            (!cascade(base, &base->tv3, INDEX(1))) &&
            !cascade(base, &base->tv4, INDEX(2)))
            cascade(base, &base->tv5, INDEX(3));
        /* Advance this CPU's timer system by one tick */
        ++base->timer_jiffies;
        /* Take out the list of expired timers */
        list_replace_init(base->tv1.vec + index, &work_list);
        /* Walk through all expired timers */
        while (!list_empty(head)) {
            void (*fn)(unsigned long);
            unsigned long data;

            timer = list_first_entry(head, struct timer_list, entry);
            fn = timer->function;
            data = timer->data;

            timer_stats_account_timer(timer);

            base->running_timer = timer;    /* mark the timer being processed */
            detach_timer(timer, 1);

            spin_unlock_irq(&base->lock);
            call_timer_fn(timer, fn, data); /* call the timer callback function */
            spin_lock_irq(&base->lock);
        }
    }
    base->running_timer = NULL;
    spin_unlock_irq(&base->lock);
}

From the discussion above we can see that the kernel's low-resolution timer implementation is quite ingenious. It manages a large number of timers while still finding expired timers in O(1) time, and thanks to its array structure it only has to perform a migration once every 256 ticks. The five arrays are like five gears that keep turning as base->timer_jiffies grows. On each tick only one tooth of the first gear needs to be handled; when a lower gear completes a full turn, the next higher gear advances by one tooth, and the timers that are about to expire are migrated down to the lower gear. This is why the low-resolution timer is usually called a time wheel. Its implementation is a good example of a software algorithm that trades space for time.

3. The timer softirq

During system initialization, start_kernel calls the initialization function init_timers of the timer system:

void __init init_timers(void)
{
    int err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
                               (void *)(long)smp_processor_id());

    init_timer_stats();

    BUG_ON(err != NOTIFY_OK);
    /* Register a CPU notifier so timers can be migrated between CPUs during hotplug */
    register_cpu_notifier(&timers_nb);
    open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
}

As shown, open_softirq registers run_timer_softirq as the handler of TIMER_SOFTIRQ. In addition, whenever a tick event arrives on a CPU, update_process_times is called during interrupt handling; it in turn calls run_local_timers, which raises the TIMER_SOFTIRQ softirq:

void run_local_timers(void)
{
    hrtimer_run_queues();
    raise_softirq(TIMER_SOFTIRQ);
}

The handler of TIMER_SOFTIRQ is run_timer_softirq:

static void run_timer_softirq(struct softirq_action *h)
{
    struct tvec_base *base = __this_cpu_read(tvec_bases);

    hrtimer_run_pending();

    if (time_after_eq(jiffies, base->timer_jiffies))
        __run_timers(base);
}

Now we finally reach the __run_timers function. As described in the section on timer expiration processing, this function handles the expired timers and keeps the time wheel turning.
