The previous chapters introduced the implementation principles of the low-resolution timer and the high-resolution timer (hrtimer). For the convenience of other subsystems, the time subsystem provides a number of APIs for delaying or scheduling, such as msleep and hrtimer_nanosleep. These APIs are built on either the low-resolution timer or the hrtimer. This chapter discusses how these convenient APIs use the timer system to do their work.
/*************************************** **************************************** **********************/
Statement: the content of this blog is original work from http://blog.csdn.net/droidphone. Please credit the source when reposting. Thank you!
/*************************************** **************************************** **********************/
1. msleep
msleep is probably familiar to everyone; it may be one of the most widely used delay functions in the kernel. It schedules the current process out and yields the CPU for a period of time. Because of this, it cannot be used in interrupt context, only in process context. To delay in interrupt context, use mdelay, the non-scheduling version that busy-waits on the CPU. The prototype of msleep is:
void msleep(unsigned int msecs)
The delay is specified by the msecs parameter, in milliseconds. msleep is implemented on top of the low-resolution timer, so its actual precision can be no better than 1/HZ seconds (one jiffy). The kernel also provides a similar delay function, msleep_interruptible:
unsigned long msleep_interruptible(unsigned int msecs)
Its delay is also specified in milliseconds. The two functions differ as follows:
Function | Unit of delay | Return value | Can be interrupted by signals? |
msleep | Milliseconds | None | No |
msleep_interruptible | Milliseconds | Remaining milliseconds | Yes |
The main difference is that msleep guarantees that the requested delay is completed, while msleep_interruptible can be interrupted by a signal partway through the delay. In that case it stops waiting and returns the number of milliseconds that remained. Both functions eventually reach the schedule_timeout function. Their call sequence is shown in Figure 1.1:
Figure 1.1 Call sequence of the two delay functions
Let's take a look at the implementation of the schedule_timeout function. The function first handles two special cases. In the first, the number of jiffies passed in is negative: the function prints an error message and returns immediately. In the second, the number of jiffies is MAX_SCHEDULE_TIMEOUT, which means the sleep has no time limit, so the function calls schedule directly without arming a timer:
signed long __sched schedule_timeout(signed long timeout)
{
	struct timer_list timer;
	unsigned long expire;

	switch (timeout)
	{
	case MAX_SCHEDULE_TIMEOUT:
		schedule();
		goto out;
	default:
		if (timeout < 0) {
			printk(KERN_ERR "schedule_timeout: wrong timeout "
				"value %lx\n", timeout);
			dump_stack();
			current->state = TASK_RUNNING;
			goto out;
		}
	}
The function then calculates the jiffies value at which the delay expires, creates a low-resolution timer on the stack, sets its expiry time, and starts it. Finally it calls schedule to take the current process off the CPU:
	expire = timeout + jiffies;

	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
	schedule();
At this point the process has been scheduled out, so how does it get back to continue execution? The timer's expiry callback is process_timeout, and its argument is the task_struct pointer of the current process. Here is its implementation:
static void process_timeout(unsigned long __data)
{
	wake_up_process((struct task_struct *)__data);
}
As expected, once the timer expires the process is woken up and continues to run:
	del_singleshot_timer_sync(&timer);

	/* Remove the timer from the object tracker */
	destroy_timer_on_stack(&timer);

	timeout = expire - jiffies;

 out:
	return timeout < 0 ? 0 : timeout;
}
When schedule returns, either the timer has expired or the process has been woken up by some other event. The function then deletes the timer it created on the stack and returns the number of jiffies remaining.
Now that the key function schedule_timeout has been covered, let's see how msleep is implemented:
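The return-value bookkeeping described above can be sketched in user space. This is only a model, not kernel code: model_schedule_timeout and MODEL_MAX_SCHEDULE_TIMEOUT are hypothetical names, the two jiffies values are passed in as plain arguments, and no timer or scheduler is involved.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's "no time limit" sentinel (LONG_MAX in the kernel). */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Hypothetical model of schedule_timeout()'s special cases and return value.
 *   timeout:  requested delay in jiffies
 *   now:      jiffies value when the call is made
 *   woken_at: jiffies value when the task runs again (timer or other wakeup)
 * Returns the jiffies still left, clamped at 0.
 */
long model_schedule_timeout(long timeout, unsigned long now,
                            unsigned long woken_at)
{
    unsigned long expire;
    long remaining;

    if (timeout == MODEL_MAX_SCHEDULE_TIMEOUT)
        return timeout;   /* no timer armed; only a wakeup ends the sleep */
    if (timeout < 0)
        return 0;         /* invalid request: report it and return at once */

    /* The task cannot run again before it went to sleep (wrap-safe check). */
    assert((long)(woken_at - now) >= 0);

    expire = now + (unsigned long)timeout;
    remaining = (long)(expire - woken_at);
    return remaining < 0 ? 0 : remaining;
}
```

With a timeout of 100 jiffies starting at jiffies 1000, an early wakeup at 1040 leaves 60 jiffies, while a wakeup at or after 1100 leaves 0.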
signed long __sched schedule_timeout_uninterruptible(signed long timeout)
{
	__set_current_state(TASK_UNINTERRUPTIBLE);
	return schedule_timeout(timeout);
}

void msleep(unsigned int msecs)
{
	unsigned long timeout = msecs_to_jiffies(msecs) + 1;

	while (timeout)
		timeout = schedule_timeout_uninterruptible(timeout);
}
msleep first converts the number of milliseconds to jiffies, then uses a while loop to make sure the whole delay is completed. The delay itself is carried out by schedule_timeout_uninterruptible, which simply sets the process state to TASK_UNINTERRUPTIBLE and then calls schedule_timeout. The TASK_UNINTERRUPTIBLE state ensures that msleep is not woken up by signals, which also means that the process cannot be killed while it is in msleep.
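To make the effect of HZ on msleep concrete, here is a small user-space sketch of the arithmetic. The function names are hypothetical, and model_msecs_to_jiffies is a simplified round-up stand-in for the kernel's msecs_to_jiffies, which has additional overflow handling and HZ-specific fast paths.

```c
#include <assert.h>

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
unsigned long long model_msecs_to_jiffies(unsigned int msecs, unsigned int hz)
{
    assert(hz > 0);
    return ((unsigned long long)msecs * hz + 999) / 1000;
}

/* The timeout msleep() passes down: the converted value plus one jiffy. */
unsigned long long model_msleep_timeout(unsigned int msecs, unsigned int hz)
{
    return model_msecs_to_jiffies(msecs, hz) + 1;
}

/* Upper bound, in whole milliseconds, on how long j full ticks last. */
unsigned long long model_jiffies_to_max_msecs(unsigned long long j,
                                              unsigned int hz)
{
    assert(hz > 0);
    return (j * 1000 + hz - 1) / hz;
}
```

In this model, msleep(1) with HZ=100 arms a 2-jiffy timer, which can take up to 20 ms to expire. The extra jiffy is usually explained by the fact that the current tick is already partly used when the timer is armed, so an n-jiffy timer may fire after slightly more than n-1 tick periods.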
Let's take a look at the implementation of msleep_interruptible:
signed long __sched schedule_timeout_interruptible(signed long timeout)
{
	__set_current_state(TASK_INTERRUPTIBLE);
	return schedule_timeout(timeout);
}

unsigned long msleep_interruptible(unsigned int msecs)
{
	unsigned long timeout = msecs_to_jiffies(msecs) + 1;

	while (timeout && !signal_pending(current))
		timeout = schedule_timeout_interruptible(timeout);
	return jiffies_to_msecs(timeout);
}
msleep_interruptible sleeps through schedule_timeout_interruptible. The only difference between schedule_timeout_interruptible and schedule_timeout_uninterruptible is that the former sets the process state to TASK_INTERRUPTIBLE, which means that a signal can wake the process before the timer expires. When that happens the loop exits early, and the number of remaining jiffies is converted to milliseconds and returned. You can also use schedule_timeout_interruptible or schedule_timeout_uninterruptible directly to build your own delay function. The kernel provides one more similar function, which needs little explanation because its meaning is clear from the code:
signed long __sched schedule_timeout_killable(signed long timeout)
{
	__set_current_state(TASK_KILLABLE);
	return schedule_timeout(timeout);
}
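The difference between the msleep loop and the msleep_interruptible loop can be illustrated with a scripted user-space model. Everything here is hypothetical: fake_schedule_timeout replays a list of wakeups instead of sleeping, and a flag in struct script stands in for signal_pending(current). In the uninterruptible model the scripted early wakeups represent explicit wakeups, and the signal flag is simply ignored.

```c
#include <assert.h>
#include <stddef.h>

/* One scripted wakeup: jiffies slept before the task ran again, and
 * whether a signal became pending at that moment. */
struct wakeup {
    unsigned long slept;
    int raises_signal;
};

struct script {
    const struct wakeup *events;
    size_t count;
    size_t next;
    int signal_pending;   /* models signal_pending(current) */
};

/* Stand-in for schedule_timeout(): replay the next scripted wakeup and
 * return the jiffies left; with no events left, the timer simply expires. */
static unsigned long fake_schedule_timeout(struct script *s,
                                           unsigned long timeout)
{
    const struct wakeup *w;

    assert(s != NULL);
    if (s->next >= s->count)
        return 0;
    w = &s->events[s->next++];
    if (w->raises_signal)
        s->signal_pending = 1;
    return w->slept >= timeout ? 0 : timeout - w->slept;
}

/* msleep-style loop: keeps sleeping until nothing is left.
 * Returns how many times it had to go back to sleep. */
unsigned long model_uninterruptible_sleep(struct script *s,
                                          unsigned long timeout)
{
    unsigned long calls = 0;

    while (timeout) {
        timeout = fake_schedule_timeout(s, timeout);
        calls++;
    }
    return calls;
}

/* msleep_interruptible-style loop: stops once a signal is pending.
 * Returns the jiffies that were still outstanding. */
unsigned long model_interruptible_sleep(struct script *s,
                                        unsigned long timeout)
{
    while (timeout && !s->signal_pending)
        timeout = fake_schedule_timeout(s, timeout);
    return timeout;
}
```

Given a 10-jiffy request and two early wakeups after 3 and 4 jiffies, the second of which carries a signal, the interruptible model stops with 3 jiffies outstanding, while the uninterruptible model goes back to sleep until the remainder is used up.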
2. hrtimer_nanosleep
The msleep function discussed in section 1 is based on the timer-wheel system and can provide at best millisecond-level precision. In fact, its precision depends on the configured HZ value: if HZ is less than 1000, it cannot even achieve millisecond-level precision. To get a more precise delay, the natural choice is the high-resolution timer. Linux provides a user-space API for this, nanosleep, which specifies the delay in nanoseconds. The kernel entry point for this user-space function is sys_nanosleep. Its work is done by the hrtimer_nanosleep function of the hrtimer system, and most of the actual work is finally done by do_nanosleep. The call process is shown in Figure 2.1.
The call process of nanosleep is similar to that of msleep. hrtimer_nanosleep first creates a high-resolution timer on the stack and sets its expiry time, then calls do_nanosleep to carry out the delay. Once the current process has been suspended for the requested time, do_nanosleep returns, the timer on the stack is destroyed, and 0 is returned to indicate success. However, do_nanosleep may also return for other reasons before the requested delay has elapsed. In that case, the last part of hrtimer_nanosleep records the remaining delay in the process's restart_block and returns the error code -ERESTART_RESTARTBLOCK. The system or user space can then decide, based on the return value, whether to call nanosleep again to finish the remaining delay. The following is the hrtimer_nanosleep code:
long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
		       const enum hrtimer_mode mode, const clockid_t clockid)
{
	struct restart_block *restart;
	struct hrtimer_sleeper t;
	int ret = 0;
	unsigned long slack;

	slack = current->timer_slack_ns;
	if (rt_task(current))
		slack = 0;

	hrtimer_init_on_stack(&t.timer, clockid, mode);
	hrtimer_set_expires_range_ns(&t.timer, timespec_to_ktime(*rqtp), slack);
	if (do_nanosleep(&t, mode))
		goto out;

	/* Absolute timers do not update the rmtp value and restart: */
	if (mode == HRTIMER_MODE_ABS) {
		ret = -ERESTARTNOHAND;
		goto out;
	}

	if (rmtp) {
		ret = update_rmtp(&t.timer, rmtp);
		if (ret <= 0)
			goto out;
	}

	restart = &current_thread_info()->restart_block;
	restart->fn = hrtimer_nanosleep_restart;
	restart->nanosleep.clockid = t.timer.base->clockid;
	restart->nanosleep.rmtp = rmtp;
	restart->nanosleep.expires = hrtimer_get_expires_tv64(&t.timer);

	ret = -ERESTART_RESTARTBLOCK;
out:
	destroy_hrtimer_on_stack(&t.timer);
	return ret;
}
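Seen from user space, the "resume the remaining delay" behavior looks roughly like the sketch below. It uses the standard POSIX nanosleep call; sleep_fully and elapsed_ns are hypothetical helper names. As far as I understand it, the kernel restarts the call by itself when no signal handler ran, and user space only sees EINTR (with the remaining time filled in) when a handler did run, so the loop below covers that second case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep for the whole requested interval. If nanosleep() is interrupted
 * by a signal handler, continue with the remaining time it reported.
 * Returns 0 on success, or -1 with errno set for any other error.
 */
int sleep_fully(struct timespec req)
{
    struct timespec rem;

    assert(req.tv_sec >= 0 && req.tv_nsec >= 0 && req.tv_nsec < 1000000000L);
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* carry on with what is left */
    }
    return 0;
}

/* Nanoseconds elapsed between two timestamps from the same clock. */
long long elapsed_ns(struct timespec start, struct timespec end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

A caller would take a CLOCK_MONOTONIC timestamp before and after sleep_fully and compare elapsed_ns against the request to see how much the actual delay overshoots.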
Next, let's take a look at the implementation of do_nanosleep. It first calls hrtimer_init_sleeper to set the timer's callback function to hrtimer_wakeup and to store the task_struct pointer of the current process in the task field of the hrtimer_sleeper structure:
void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, struct task_struct *task)
{
	sl->timer.function = hrtimer_wakeup;
	sl->task = task;
}
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper);

static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
{
	hrtimer_init_sleeper(t, current);
Then, inside a do/while loop, it starts the timer, suspends the current process, and waits for the timer or some other event to wake the process up. The loop body is a little unusual. Right after starting the timer, it uses hrtimer_active to check indirectly whether the timer has already expired. If hrtimer_active returns false, the timer has expired, so the task field of the hrtimer_sleeper structure is set to NULL and schedule is skipped, which ends the loop. When the timer expires during the sleep instead, the callback hrtimer_wakeup clears the task field and wakes the process, with the same result. The other exit condition is that the current process has received a signal. Therefore do_nanosleep returns true if it exits because the timer expired and false otherwise, and hrtimer_nanosleep above relies on this to decide its own return value. The following code shows the do_nanosleep loop body:
	do {
		set_current_state(TASK_INTERRUPTIBLE);
		hrtimer_start_expires(&t->timer, mode);
		if (!hrtimer_active(&t->timer))
			t->task = NULL;

		if (likely(t->task))
			schedule();

		hrtimer_cancel(&t->timer);
		mode = HRTIMER_MODE_ABS;

	} while (t->task && !signal_pending(current));

	__set_current_state(TASK_RUNNING);

	return t->task == NULL;
}
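The exit conditions of this loop can be traced with a tiny user-space model. The names are hypothetical and nothing actually sleeps: a scripted list of wake reasons replaces schedule(), and an exhausted script is treated as the timer expiring. The model assumes, as in the kernel source, that the timer callback clears the task field.

```c
#include <assert.h>
#include <stddef.h>

enum wake_reason { WAKE_TIMER, WAKE_SIGNAL, WAKE_OTHER };

/*
 * Hypothetical model of the do_nanosleep() loop.
 * Returns 1 if the loop ended because the timer expired, 0 if a signal
 * ended it first (the same sense as "return t->task == NULL").
 */
int model_do_nanosleep(const enum wake_reason *wakeups, size_t count)
{
    int task_set = 1;        /* models t->task != NULL */
    int signal_pending = 0;  /* models signal_pending(current) */
    size_t i = 0;

    assert(wakeups != NULL || count == 0);
    do {
        /* "schedule()": block until the next scripted wakeup. */
        enum wake_reason r = (i < count) ? wakeups[i++] : WAKE_TIMER;

        if (r == WAKE_TIMER)
            task_set = 0;        /* the timer callback clears t->task */
        else if (r == WAKE_SIGNAL)
            signal_pending = 1;
        /* WAKE_OTHER: spurious wakeup, the loop re-arms and sleeps again */
    } while (task_set && !signal_pending);

    return !task_set;
}
```

A spurious wakeup followed by the timer yields 1, while a signal arriving before the timer yields 0, which is the case where hrtimer_nanosleep goes on to fill in the restart_block.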
In addition to hrtimer_nanosleep, the high-precision timer system also provides several APIs for delaying/suspending processes:
- schedule_hrtimeout puts the current process to sleep for a specified time, using the CLOCK_MONOTONIC clock;
- schedule_hrtimeout_range puts the current process to sleep for a specified time range, using the CLOCK_MONOTONIC clock;
- schedule_hrtimeout_range_clock puts the current process to sleep for a specified time range, using a clock that the caller chooses;
- usleep_range puts the current process to sleep for a specified range of microseconds, using the CLOCK_MONOTONIC clock;
The call relationships between them are as follows:
Figure 2.2 Call relationships of the schedule_hrtimeout_xxxx functions
All of these functions end up in schedule_hrtimeout_range_clock. Note that before calling any of the schedule_hrtimeout_xxxx functions, you should use set_current_state to set the state of the process. Before these functions return, the process state is set back to TASK_RUNNING. If you set the state to TASK_UNINTERRUPTIBLE beforehand, the required delay is completed before the function returns. If you set the state to TASK_INTERRUPTIBLE beforehand, a signal may wake the process before the timer expires, causing the function to return early. The implementation of schedule_hrtimeout_range_clock is basically the same as that of do_nanosleep described above; for details, refer to the kernel source file kernel/hrtimer.c.
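The idea of sleeping for a time range rather than a single point in time can be sketched as follows. This is a user-space model with hypothetical names; it assumes that a range timer has an earliest (soft) and a latest (hard) expiry, and that usleep_range(min, max) maps to a soft expiry of min microseconds from now with a window of max - min microseconds.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSEC_PER_USEC 1000ULL

/* Expiry window of a range timer, in nanoseconds. */
struct expiry_window {
    uint64_t soft_ns;   /* earliest point at which the timer may fire */
    uint64_t hard_ns;   /* latest point at which it must fire */
};

/* Model of the window that a usleep_range(min_us, max_us) call asks for. */
struct expiry_window model_usleep_range(uint64_t now_ns, uint64_t min_us,
                                        uint64_t max_us)
{
    struct expiry_window w;

    assert(min_us <= max_us);
    w.soft_ns = now_ns + min_us * MODEL_NSEC_PER_USEC;
    w.hard_ns = w.soft_ns + (max_us - min_us) * MODEL_NSEC_PER_USEC;
    return w;
}

/* Two sleepers whose windows overlap could be served by one wakeup. */
int windows_overlap(struct expiry_window a, struct expiry_window b)
{
    return a.soft_ns <= b.hard_ns && b.soft_ns <= a.hard_ns;
}
```

A wider window gives the timer system more freedom to handle several expiries with a single wakeup, which is the usual motivation for passing a range instead of an exact value.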