Linux kernel: timers, kernel threads, and system calls

Last Update:2018-07-26 Source: Internet

Author: User

I. Kernel timers

1. Basic concepts

In some scenarios we need to perform an action after a certain time without busy-waiting and wasting the CPU; a timer is the appropriate mechanism for this. A timer runs a function at a future point in time to accomplish a specific task.
A kernel timer tells the kernel to invoke a particular function with specific arguments at a specified point in time. The timer runs asynchronously to the code that registered it: when the timer runs, the task that registered it may be running on another processor or may already have exited.
Kernel timers in Linux are implemented on top of (soft) interrupts, so the handler runs in interrupt context rather than process context. Code in a non-process context must follow some rules:
- Access to user space is not allowed.
- current is meaningless and must not be used.
- Sleeping and scheduling are not allowed. Do not call schedule or any wait_event variant, or any function that might sleep. Semaphores are also off limits, because acquiring one may sleep.

Kernel code can check whether it is in interrupt context by calling in_interrupt(), which returns nonzero in interrupt context. It can check whether scheduling is currently allowed by calling in_atomic(). Scheduling is not allowed in interrupt context or while a spinlock is held.

Because the timer runs asynchronously, the timer handler must take care of mutual exclusion.

2. Timers supported by the Linux kernel

The Linux kernel supports two types of timers:
- Classic timer: its precision depends on the frequency of the periodic clock interrupt. The precision is generally low, namely 1000/HZ ms. The tick is generated at a fixed rate, once every 1000/HZ ms. Without dynamic ticks, a tick may fire when no timer is actually due. For example, if the system holds only timers that expire at 11 ms, 52 ms, and 78 ms and the tick period is 4 ms, the tick still fires at every multiple of 4 ms (4, 8, 12, ...), so most ticks have no timer event to handle.
- High-resolution timer: the classic timer is too coarse for some uses, such as multimedia applications, so the kernel introduced this type. In principle such a timer can expire at any point in time.

Two further concepts need explaining:
- Dynamic ticks: the periodic tick is activated only when there is work to do and is disabled otherwise. When the idle task is about to run, the periodic tick is disabled until the next timer expires or an interrupt occurs. One-shot clock events are a prerequisite for dynamic ticks, because the key feature is that the clock can be stopped and restarted as needed, which a purely periodic clock cannot do.
- Periodic tick: a clock that generates clock events at a fixed period.

From the application's point of view, timers have two main uses:
- Timeout: an event that should happen after a certain amount of time. In most cases the timeout is not expected to occur, and the timer is canceled before it expires. Even when it is not canceled, the timeout rarely needs to be exact. The various timeouts in networking are typical: they mean "if ... has not happened by this time, assume ...", and the time is usually an empirical value or an estimate, not a precise requirement. The classic timer is sufficient here.
- Timer: used to produce a time series. When playing sound, for example, data must be sent to the sound card at regular intervals, and the requirement is strict: if data is not sent by a given point in time, the sound is distorted. A high-resolution timer is used in this case.

Linux can be configured to work in the following modes:
- High resolution with dynamic ticks
- High resolution with periodic ticks
- Low resolution with dynamic ticks
- Low resolution with periodic ticks
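The tick granularity of the classic timer can be checked with a little arithmetic. The sketch below is plain userspace C, not kernel code; the helper names ms_to_ticks and ticks_to_ms are made up for illustration, and it ignores where the current time falls within a tick. It rounds a delay in milliseconds up to whole ticks, which is the earliest a tick-driven timer can be serviced.

```c
/* Round a delay in milliseconds up to whole ticks at a tick rate of hz. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Convert a tick count back to milliseconds (hz must divide 1000 evenly). */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    return ticks * 1000 / hz;
}
```

With a 4 ms tick (hz = 250), the 11 ms, 52 ms, and 78 ms timers from the example above round to 3, 13, and 20 ticks, that is 12 ms, 52 ms, and 80 ms.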

3. Low resolution kernel timer

The low resolution timer is the most common kernel timer. The kernel uses the processor's clock interrupt, or any other suitable periodic clock source, as its time base. Clock interrupts occur regularly, HZ times per second. The handler for this interrupt is generally timer_interrupt, which eventually calls do_timer and update_process_times. do_timer handles the system-wide, global tasks such as updating jiffies. update_process_times does the per-process accounting, raises TIMER_SOFTIRQ, and gives the scheduler its sense of time.

When a timer expires, it is removed from the active list before its handler is invoked, so a handler that should run again after some interval must add the timer again. On an SMP system, the timer function runs on the CPU that registered it.
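The one-shot behaviour described above can be modeled in a few lines of userspace C. This is only a sketch: struct sim_timer, sim_tick, sim_stopping, and the other sim_ names are hypothetical stand-ins, not kernel APIs, and the expiry comparison ignores counter wraparound. It shows a handler that re-adds its own timer on every run until a stop flag is set.

```c
#include <stdbool.h>

/* Userspace stand-in for a kernel timer; not the real struct timer_list. */
struct sim_timer {
    unsigned long expires;            /* tick at which the timer fires */
    void (*function)(unsigned long);  /* handler */
    unsigned long data;               /* argument passed to the handler */
    bool pending;                     /* true while the timer is queued */
};

static unsigned long sim_jiffies;     /* simulated tick counter */
static unsigned long sim_fired;       /* how many times the handler ran */
static bool sim_stopping;             /* hypothetical stop flag */
static struct sim_timer sim_heartbeat;

/* Queue the timer to fire at the given tick. */
static void sim_add_timer(struct sim_timer *t, unsigned long expires)
{
    t->expires = expires;
    t->pending = true;
}

/* Handler: count the call and re-arm unless a stop was requested. */
static void sim_heartbeat_fn(unsigned long period)
{
    sim_fired++;
    if (!sim_stopping)
        sim_add_timer(&sim_heartbeat, sim_jiffies + period);
}

/* Advance one tick; an expired timer is dequeued before its handler runs. */
static void sim_tick(void)
{
    sim_jiffies++;
    if (sim_heartbeat.pending && sim_heartbeat.expires <= sim_jiffies) {
        sim_heartbeat.pending = false;
        sim_heartbeat.function(sim_heartbeat.data);
    }
}
```

To use it, set sim_heartbeat.function and sim_heartbeat.data, call sim_add_timer once, and then call sim_tick repeatedly. Setting sim_stopping lets the handler run one last time without re-arming.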

The implementation of the kernel timer should meet the following requirements and assumptions:
- Timer management must be as lightweight as possible.
- The design must scale well when the number of active timers grows substantially.
- Most timers expire within seconds or at most a few minutes; timers with long delays are rare.
- A timer should run on the same CPU that registered it.

The implementation of the low resolution kernel timer is ingenious. It is built on a per-CPU data structure. The base field of timer_list points to that structure. If base is NULL, the timer is not scheduled to run; otherwise the pointer tells which data structure (that is, which CPU) runs it. Whenever kernel code registers a timer (via add_timer or mod_timer), the work is ultimately done by internal_add_timer (in kernel/timer.c), which adds the new timer to a doubly linked list in the cascading table associated with the current CPU.
How the cascading table works:
- If the timer expires within the next 0 to 255 jiffies, it is added to one of 256 lists for short timers; the low 8 bits of expires select the list.
- If it expires later than that but within 16,384 jiffies, it is added to one of 64 lists; bits 8-13 of expires (6 bits) select the list.
- The same technique is applied to bits 14-19, 20-25, and 26-31 of expires.
- If the timer expires even further in the future, it is placed in the group for bits 26-31; the list is selected by the top 6 bits of 0xffffffff + base->timer_jiffies.

The 8:6:6:6:6 grouping described here is not exact in every case (the number of bits in each group sets the size of the cascading table); depending on the configuration, a 6:4:4:4:4 grouping may be used instead. Only the design principle is described here.
When __run_timers runs, it executes all timers pending on the current tick. If the current jiffies value is a multiple of 256, the function rehashes a list from the next level into the 256 short lists, and depending on the bit groups of jiffies it may cascade higher levels as well. This works because the cascading table is indexed by expiry time in jiffies, while base->timer_jiffies records the current position. __run_timers first uses the lowest bit group of base->timer_jiffies to pick the list whose timers it runs; if that bit group is 0, it starts checking the next bit group. For every bit group except the lowest, the logic is: the timers in the list selected by that group are added to the system again, which moves them to a lower level; if the group's value is nonzero, no further group is checked, otherwise the next group is checked as well.
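The list selection described above can be sketched in userspace C. This is a simplified model of the 8:6:6:6:6 layout and not the kernel's internal_add_timer; struct wheel_slot and wheel_slot_for are names invented for the illustration, and 32-bit counters are assumed. Timers further away than the table can represent are not capped here.

```c
#include <stdint.h>

#define TVR_BITS 8                       /* lowest group: 256 lists */
#define TVN_BITS 6                       /* each higher group: 64 lists */
#define TVR_MASK ((1u << TVR_BITS) - 1)
#define TVN_MASK ((1u << TVN_BITS) - 1)

struct wheel_slot {
    int level;       /* 0 = nearest group, 4 = farthest group */
    unsigned index;  /* list within that group */
};

/* Pick the group and list for a timer, given the wheel's current position. */
static struct wheel_slot wheel_slot_for(uint32_t expires, uint32_t timer_jiffies)
{
    uint32_t delta = expires - timer_jiffies;  /* ticks until expiry, mod 2^32 */
    struct wheel_slot s;
    unsigned shift = TVR_BITS;

    if ((int32_t)delta < 0) {
        /* Already expired: queue it on the list processed next. */
        s.level = 0;
        s.index = timer_jiffies & TVR_MASK;
        return s;
    }
    if (delta < (1u << TVR_BITS)) {
        /* Due within 256 ticks: the low 8 bits of expires pick the list. */
        s.level = 0;
        s.index = expires & TVR_MASK;
        return s;
    }
    /* Otherwise find the first 6-bit group that can hold the delay. */
    for (s.level = 1; s.level < 4; s.level++) {
        if (delta < (1u << (shift + TVN_BITS)))
            break;
        shift += TVN_BITS;
    }
    s.index = (expires >> shift) & TVN_MASK;
    return s;
}
```

For example, with the wheel at position 0, a timer expiring at jiffy 100 lands in level 0, list 100, and one expiring at jiffy 300 lands in level 1, list 1.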

This technique uses many list heads and therefore some extra memory, but it satisfies the requirements and assumptions of the kernel timer well. Because timers are affected by other interrupts, their timing is not absolutely accurate.

[Figure: schematic of the cascading timer table]

4. Low resolution timer API

    struct timer_list
    {
        /* ... */
        unsigned long expires;
        void (*function)(unsigned long);
        unsigned long data;
    };

Not all members of timer_list are listed here, only the more important ones:
- expires: the expiry time, in jiffies.
- function: the timer handler.
- data: the argument passed to the timer handler.

The structure must be initialized before use; initialization ensures that all fields are set correctly, including those not visible to the user. A dynamically allocated timer is initialized with init_timer; a static timer can be initialized by assigning TIMER_INITIALIZER to it. After initialization, set the three fields listed above before calling add_timer. This is the interface of older kernels; from Linux 4.15 on, timers are set up with timer_setup() and the handler receives a pointer to the timer_list itself.

    void init_timer(struct timer_list *timer);
Initializes a dynamically allocated timer structure.
    struct timer_list TIMER_INITIALIZER(_function, _expires, _data);
Initializes a static timer structure.
    void add_timer(struct timer_list *timer);
Adds the timer to the system.
    int del_timer(struct timer_list *timer);
Removes the timer from the system.
    int mod_timer(struct timer_list *timer, unsigned long expires);
Updates the expiry time of a timer.
    int del_timer_sync(struct timer_list *timer);
Works like del_timer, but guarantees that when it returns, the timer function is not running on any CPU. Using del_timer_sync on SMP systems avoids race conditions, and in most cases it should be preferred. Keep the following in mind when using it:
- Called from a non-atomic context, it may sleep.
- Called from an atomic context, it busy-waits.
- If it is called from interrupt context, the caller must not hold a lock that could block the timer handler from completing.
- Be especially careful when calling del_timer_sync while holding a lock: if the timer function tries to acquire the same lock, the result is a deadlock.
- The timer handler must not re-register itself once this function has been called. If the handler normally re-registers itself, special handling is needed to make sure it does not do so while the timer is being removed. One way is to set a "close" flag that the handler checks before re-registering.

    int timer_pending(const struct timer_list *timer);

Returns true or false to indicate whether the timer is currently scheduled to run.

5. High resolution timers

The conventional low resolution kernel timer is optimized for "timeout" uses, and its implementation is tied to jiffies, so it cannot serve cases that need high-precision timing. For these, Linux provides the high resolution timer, hrtimer.

hrtimer is built on per-CPU clock event devices. If an SMP system has only one global clock event device, it cannot support high-resolution timers; they need a local clock event device for each CPU. To support hrtimer, the kernel must be configured with CONFIG_HIGH_RES_TIMERS=y.

hrtimer has two modes of operation: low-resolution mode and high-resolution mode. Although the hrtimer subsystem was designed for high-precision timers, the system may switch dynamically between clock source devices of different precision while running, so hrtimer must be able to switch freely between the two modes. Because low-resolution mode is built on the high-resolution code, some of the code that supports high-resolution mode is compiled into the kernel even when the system supports only low-resolution mode.

High-resolution timers are organized in a red-black tree. They are independent of the periodic tick and use nanoseconds as the time unit. They can be based on two kinds of clocks:
- Monotonic clock, CLOCK_MONOTONIC: starts at 0 and increases monotonically.
- Real-time clock, CLOCK_REALTIME: the actual system time; it may jump, for example when the system time is changed.

In high-resolution mode, the clock event device raises an interrupt when a timer expires, and the interrupt is handled by hrtimer_interrupt. If no high-resolution clock is available, expired high-resolution timers are processed by hrtimer_run_queues. In recent code, if the handler of a high-resolution timer does not return HRTIMER_NORESTART, the timer framework automatically restarts the timer once the handler has finished. The relevant code fragment is as follows:

        restart = fn(timer);
        ...
        /*
         * Note: We clear the CALLBACK bit after enqueue_hrtimer and
         * we do not reprogramm the event hardware. Happens either in
         * hrtimer_start_range_ns() or in hrtimer_interrupt()
         */
        if (restart != HRTIMER_NORESTART) {
                BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
                enqueue_hrtimer(timer, base);
        }
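A periodic handler usually moves its own expiry forward before asking to be restarted, which is the job of hrtimer_forward (listed in the API section). The arithmetic can be modeled in userspace C as below. This is a sketch and not the kernel implementation: sim_forward is a hypothetical name, times are plain nanosecond counts, and interval is assumed to be positive.

```c
#include <stdint.h>

/*
 * Advance *expires by whole intervals until it lies after now.
 * Returns the number of intervals added, or 0 if the timer has not
 * expired yet (in which case *expires is left unchanged).
 */
static uint64_t sim_forward(int64_t *expires, int64_t now, int64_t interval)
{
    int64_t delta = now - *expires;
    uint64_t periods;

    if (delta < 0)
        return 0;                          /* still in the future */

    /* Intervals that have fully elapsed, plus one to get past now. */
    periods = (uint64_t)(delta / interval) + 1;
    *expires += (int64_t)periods * interval;
    return periods;
}
```

A return value greater than 1 means the handler ran late and one or more periods were skipped.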

6. High resolution timer data structures and API

hrtimer_bases is the core data structure of the hrtimer implementation; through it, hrtimer manages all timers queued on each CPU. The per-CPU timer list no longer uses the multilevel lists of the timer wheel; it is managed with a red-black tree. The definition is as follows:

    DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases)

The struct hrtimer_clock_base represents a clock base. It includes:
- cpu_base: the per-CPU base that this clock base belongs to
- get_time: the function that returns the current time
- resolution: the resolution of the timer
- active: the root of the red-black tree and related information

The struct hrtimer_cpu_base represents the clock bases of one CPU. It mainly includes:
- expires_next: the absolute time of the next event to expire
- hres_active: whether high-resolution mode is enabled

The struct hrtimer represents a high-resolution timer. It includes:
- node: links the timer into the red-black tree
- _softexpires: the absolute expiry time of the timer
- function: the function to run when the timer expires
- base: pointer to the clock base
- state: the timer state, one of:
  - HRTIMER_STATE_INACTIVE: the timer is inactive
  - HRTIMER_STATE_ENQUEUED: the timer is queued on the clock base, waiting to expire
  - HRTIMER_STATE_CALLBACK: the timer has expired and its callback is running
  - HRTIMER_STATE_MIGRATE: the timer is being migrated to another CPU

    void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, enum hrtimer_mode mode);
Initializes the timer.

    int hrtimer_cancel(struct hrtimer *timer);
Cancels the timer; if its handler is currently running, waits for it to finish.
    int hrtimer_try_to_cancel(struct hrtimer *timer);
Tries to cancel the timer. Returns 1 if the timer was active, 0 if it was inactive, and -1 if its handler is currently running. In the first two cases the timer is canceled; in the last case it is not.
    int hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode);
Starts the timer on the current CPU.
    u64 hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);
Moves the timer's expiry forward in steps of interval until it lies after now, and returns the number of intervals added.

II. Kernel threads

1. Basic concepts

Kernel threads are processes that are started directly by the kernel itself. Some of the work in a running system is done by kernel threads:
- Periodically synchronizing modified memory pages with the block device they come from
- Writing rarely used memory pages to the swap area
- Managing deferred actions
- Implementing the transaction log of file systems
- Executing soft interrupts (ksoftirqd)

Kernel threads are used in two ways:
- A kernel thread is started and waits until it is woken up to perform a service.
- A kernel thread is started that runs periodically, checks the usage of a specific resource, and reacts appropriately.

Kernel threads are created by the kernel itself and have these characteristics:
- They execute in kernel mode, not user mode.
- They can access only the kernel portion of the virtual address space (all addresses above TASK_SIZE); they cannot access user space.

The task_struct process descriptor contains two fields related to the process address space: mm and active_mm. For ordinary user processes, mm points to the user-space portion of the virtual address space; for kernel threads, mm is NULL. active_mm exists mainly as an optimization: because a kernel thread is not associated with any particular user process, the kernel does not need to switch the user portion of the virtual address space and can leave the old settings in place. Since any user process may have been running before the kernel thread, the contents of user space are essentially arbitrary and the kernel thread must not modify them; that is why mm is set to NULL. If the task being switched out is a user process, the kernel stores that process's mm in the active_mm of the incoming kernel thread. If the process that runs after the kernel thread is the same one that ran before it, the kernel does not need to change the user-space address tables and the TLB contents are still valid. Only when a different user process runs afterwards does the kernel need to switch and flush the corresponding TLB entries.

2. Signal handling in kernel threads

Kernel threads, like threads that have called daemonize, block all signals by default. A kernel thread that wants to receive a signal must call allow_signal to permit that signal to be delivered to it. Signal handling in a kernel thread differs from user space, however. For a user thread, the system invokes the signal handler automatically; the entry point is do_signal, which runs when the thread is scheduled again and heads back to user mode. A kernel thread never takes that path, so it must handle signals in a different way. Put simply, a kernel thread handles the signals sent to it as follows:

    do {
        /* the processing done by the kernel thread */

3. Implementation and related API

Kernel threads can be created in two ways:
- Pass a function to kernel_thread.
- The more common way is the helper kthread_create, which creates a new kernel thread. The new thread starts out stopped and must be started with wake_up_process. Alternatively, use kthread_run, which, unlike kthread_create, wakes the new thread as soon as it is created.

    long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)

Its parameters:
- fn: pointer to the function to execute
- arg: the argument passed to the function
- flags: thread flags

    struct task_struct *kthread_create(int (*threadfn)(void *data), void *data, const char namefmt[], ...)

Its parameters:
- threadfn: the thread's entry function
- data: the argument passed to threadfn
- namefmt: the name of the thread

This function creates a kernel thread that is initially stopped. When the thread is woken up, threadfn is called with data as its argument. threadfn can end in two ways:
- It calls do_exit directly.
- It returns after kthread_stop has been called on the thread from somewhere else (the thread sees this through kthread_should_stop()).

    kthread_run(threadfn, data, namefmt, ...)
A macro that creates a thread and wakes it up immediately. It simply calls wake_up_process right after kthread_create returns.
    int kthread_stop(struct task_struct *k)
Stops a thread created by kthread_create. If the thread exits on its own by calling do_exit, kthread_stop must not be called on it.
    void kthread_bind(struct task_struct *k, unsigned int cpu)
Binds a newly created thread to the specified CPU. The CPU does not have to be online, but the thread must be stopped when this function is called (that is, kthread_create has just returned).
III. System calls

1. Basic concepts

System calls are a set of "special" interfaces that the operating system offers to applications. Applications use these interfaces to obtain services provided by the operating system kernel.
Logically, a system call can be viewed as an interface between the kernel and a user-space program: it passes the application's request to the kernel, invokes the corresponding kernel function to do the required processing, and returns the result to the application.
The fundamental reason that system services are offered to user space only through system calls is to protect the system: to keep malicious users from damaging it, and to prevent damage caused by carelessness. What makes a system call special is that it fixes the points at which a user process can enter the kernel.

Linux provides only the most basic and useful system calls. They can be listed with the man 2 syscalls command or found in the include/linux/syscalls.h source file.

2. Functions of system calls

System calls mainly cover the following tasks:

- Process management
- Time operations
- Signal handling
- Scheduling
- Modules
- File system
- Memory management
- Interprocess communication
- Network functions
- System information and settings
- System security

Seen from another angle, whatever the task, a system call falls into one of these categories:

- Controlling hardware: system calls often serve as the abstract interface between hardware resources and user space, such as the write and read calls used to write and read files.
- Setting system state or reading kernel data: because system calls are the only means of communication between user space and the kernel, setting system state (for example, switching a kernel service on or off by setting a kernel variable) or reading kernel data must go through a system call.
- Process management: creating and executing processes and obtaining information about their status.

Services are implemented in the kernel because:
- Some services must obtain kernel data, such as interrupts or the system time.
- From a security standpoint, services provided in the kernel are more secure than those in user space and are hard to access illegally.
- In terms of efficiency, implementing a service in the kernel avoids passing data back and forth with user space and saving and restoring context, so it is often much more efficient than a user-space implementation. httpd is one example.
- If both the kernel and user space need a service, it is best implemented in kernel space. Random number generation is one example.

3. Relationship with the C library

In most cases an application uses the API of the C library rather than calling system calls directly. Some C library functions are implemented entirely in user mode, some use one system call, and some use several.

4. Restarting system calls

If a system call is interrupted while it is executing, the kernel must tell the caller so, and the call returns the -EINTR error. Note that "interrupted" here means interrupted by a signal sent to the process, not by an interrupt in the general sense (hardware and soft interrupts do not by themselves make a system call return -EINTR; if they did, interrupted system calls would be the norm). This adds work for the caller, which must check the return value and restart the system call if it got this error. Setting the SA_RESTART flag for the signal makes the system call restart automatically after it is interrupted by that signal.

5. Implementation of system calls

When a system call is made, a system call number is passed to the kernel; each number identifies one system call. The CPU then enters kernel mode. However the call is entered (for example, through int $0x80 on x86 or through the sysenter instruction), the kernel obtains the system call number, runs the corresponding system call, and returns an integer to the user program when it completes: 0 (or, for calls that return a value, another non-negative number) indicates success, and a negative number indicates failure.

System call handling consists of these steps:
- Save the contents of most registers on the kernel-mode stack.
- Call the appropriate service routine.
- Exit the system call handler: reload the registers with the values saved on the kernel stack and switch back to user mode.

Linux defines NR_syscalls system calls. Arguments can be passed to a system call, subject to two requirements:
- Each argument must be no longer than a register.
- There can be at most 6 arguments, because 6 registers are used to pass them. If more are needed, one register holds the address of a memory area that contains the argument values.

If you use the C library, you do not need to care about these details. The kernel also performs validity checks, such as argument verification, before executing a system call. If a system call returns a large amount of data to its caller, the data must be exchanged through a memory area specified by the caller, and that area must be accessible to the caller.
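To make the system call number and the return convention concrete, here is a small userspace sketch that uses the C library's generic syscall() wrapper on Linux. The helper names raw_getpid and raw_close_errno are invented for the example. The C library turns the kernel's negative error value into a return value of -1 plus errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke getpid by its system call number instead of the getpid() wrapper. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/*
 * Invoke close by number. Returns 0 on success, otherwise the errno
 * value that the C library derived from the kernel's error return.
 */
static int raw_close_errno(int fd)
{
    if (syscall(SYS_close, fd) == -1)
        return errno;
    return 0;
}
```

Calling raw_getpid() gives the same result as getpid(), and raw_close_errno(-1) reports EBADF because -1 is not a valid file descriptor.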
A system call executes in the kernel, but it is not a pure kernel thread. It runs in the kernel on behalf of the user process, so it can access much of the process's information (such as the current structure, the control structure of the running process). It can be preempted by other processes (on return from the system call, the system_call function decides whether to reschedule), it can sleep, and it can receive signals. Note that when the system call completes, before control returns to the user process that made the call, the kernel makes a scheduling decision. If a higher-priority process is found or the time slice of the current process is used up, the higher-priority process or another process is selected to run. Besides rescheduling, the kernel also checks for pending signals. If the current process has a pending signal, the kernel first returns to user space to run the signal handler (which lives in user space), then comes back into the kernel, and then returns to user space again. This round trip is somewhat cumbersome, but it is necessary.
System call performance

System calls involve switching back and forth between user space and kernel space, which costs extra time. In most cases this cost is acceptable. If an application has high performance demands but still needs services provided by the system, the program can be moved into the kernel.

System calls can be traced with ptrace or strace.

This article is an English version of an article originally written in Chinese on aliyun.com.
