Introduction to the timer subsystem in the Linux subsystem series (2)

Source: Internet
Author: User


I have never translated an article on this blog before. My English is limited, but working through an article slowly with the help of Google Translate is still a worthwhile exercise. Let's take a look at the hrtimer documentation.


hrtimers - subsystem for high-resolution kernel timers
----------------------------------------------------
This patch introduces a new subsystem for high-resolution kernel timers.
One might ask: we already have a timer subsystem (kernel/timers.c), so why do we need two timer subsystems? After a lot of back and forth trying to integrate high-resolution and high-precision features into the existing timer framework, and after testing various high-resolution timer implementations in practice, we came to the conclusion that the timer wheel code is fundamentally not suitable for such an approach. We initially did not believe this ("there must be a way to solve this") and spent considerable effort trying to integrate things into the timer wheel, but we failed. In hindsight, there are several reasons why such integration is hard or impossible:
1. Forcing low-resolution and high-resolution timers to be handled in the same way leads to a lot of compromises, macro magic, and #ifdef clutter. The timers.c code is tightly coded around jiffies and 32-bit assumptions, and it has been honed and micro-optimized for a relatively narrow use case over many years, so even small extensions easily break the wheel concept and lead to even worse compromises. The timer wheel code is very good and compact, and there are no problems with it in its current usage, but it is simply not suitable to be extended for high-resolution timers.
2. The unpredictable O(N) overhead of cascading leads to delays that require more complex handling of high-resolution timers, which in turn reduces robustness. Such a design still led to rather large timing inaccuracies. Cascading is a fundamental property of the timer wheel concept; it cannot be designed out without degrading other parts of the timers.c code in an unacceptable way.
3. The current posix-timer subsystem, implemented on top of the timer wheel, has already introduced quite complex handling for readjusting absolute CLOCK_REALTIME timers when the time is changed by settimeofday or NTP. This is a further example supporting our experience that the timer wheel data structure is too rigid for high-resolution timers.
4. The timer wheel code is optimal for use cases that can be described as "timeouts". Such timeouts are usually set up to cover error conditions in various I/O paths, such as networking and block I/O. The vast majority of these timers never expire and are rarely recascaded, because the expected event arrives in time and the timer can be removed from the wheel before any further processing becomes necessary. The users of these timeouts can therefore accept the granularity and precision trade-offs of the timer wheel, and they largely expect the timer subsystem to have near-zero overhead. Accurate timing is not a core goal for them; in fact, most of the timeout values used are ad hoc. For them, handling an actual timeout expiry is at most a necessary evil (most timeouts are deleted before they complete), so it should be as cheap and unobtrusive as possible.
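To make the granularity trade-off in the list above concrete, here is a minimal sketch of what rounding to ticks does to a short delay. It is a user-space illustration, not kernel code: the helper names and the constant are made up for this example, and the tick rate `hz` is passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: round a delay in nanoseconds up to whole ticks,
 * as a tick-based timeout has to. hz is the number of ticks per second. */
uint64_t sketch_ns_to_ticks(uint64_t delay_ns, unsigned int hz)
{
    assert(hz > 0 && hz <= SKETCH_NSEC_PER_SEC);
    uint64_t tick_ns = SKETCH_NSEC_PER_SEC / hz;   /* length of one tick */
    return (delay_ns + tick_ns - 1) / tick_ns;     /* round up */
}

/* Delay that a tick-based timer would actually wait, in nanoseconds. */
uint64_t sketch_quantized_ns(uint64_t delay_ns, unsigned int hz)
{
    return sketch_ns_to_ticks(delay_ns, hz) * (SKETCH_NSEC_PER_SEC / hz);
}
```

With a tick rate of 100 per second, a requested delay of 1.5 ms becomes one full 10 ms tick. That is harmless for an I/O timeout, but far too coarse for a precise timer.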
The primary users of precision timers are user-space applications that use the nanosleep, posix-timers, and itimer interfaces. In-kernel users such as drivers and subsystems that need precisely timed events (for example, multimedia) can also benefit from a separate high-resolution timer subsystem.
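As an illustration of the user-space side, the following sketch sleeps with the POSIX clock_nanosleep() call on CLOCK_MONOTONIC and measures how long the sleep really took. The helper name `sketch_sleep_ns` is hypothetical; how close the measured time comes to the request depends on the kernel and hardware in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sleep for delay_ns on CLOCK_MONOTONIC and return
 * the time that actually elapsed, in nanoseconds. */
int64_t sketch_sleep_ns(int64_t delay_ns)
{
    assert(delay_ns >= 0);
    struct timespec req = {
        .tv_sec  = (time_t)(delay_ns / 1000000000LL),
        .tv_nsec = (long)(delay_ns % 1000000000LL),
    };
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* Relative sleep (flags == 0). On EINTR the remaining time is written
     * back into req, so the loop simply resumes the sleep. */
    while (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &req) == EINTR)
        ;
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Calling `sketch_sleep_ns(2000000)` requests a 2 ms sleep. The return value is never smaller than the request, and the overshoot gives a rough idea of the timer precision on the machine.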
Although this subsystem does not offer high-resolution clock sources yet, the hrtimer subsystem can easily be extended with high-resolution clock capabilities; patches for that exist and are maturing quickly. The growing demand for real-time and multimedia applications, along with other potential users of precise timers, is another reason to separate the "timeout" and "precise timer" subsystems.
Another potential benefit is that this separation allows even more special-purpose optimization of the existing timer wheel for the low-resolution, low-precision use cases, once the precision-sensitive APIs have been separated from the timer wheel and migrated to hrtimers. For example, we could lower the tick frequency of the timeout subsystem to 100 HZ (or even lower).

Implementation Details of the hrtimer Subsystem
----------------------------------------------------
The basic design considerations are:
- Simplicity
- Data structures that are not bound to jiffies or any other granularity. All the kernel logic works at 64-bit nanosecond resolution, with no compromises.
- Simplification of existing timing-related kernel code

Another basic requirement was the immediate enqueueing and ordering of timers at activation time. After looking at several possible solutions, such as radix trees and hashes, we chose the red-black tree as the basic data structure. Rbtrees are available as a library in the kernel and are used in various performance-critical areas such as memory management and file systems. The rbtree is used solely for time-sorted ordering, while a separate list gives the expiry code fast access to the queued timers without having to walk the rbtree.
(This separate list will also be useful later, when we introduce high-resolution clocks, where we need separate pending and expired queues while keeping the time order intact.)
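The idea of time-ordered enqueueing with a shortcut to the next timer can be sketched in a few lines of user-space C. This is only an illustration under simplifying assumptions: it uses a plain, unbalanced binary search tree as a stand-in for the kernel's rbtree (so there is no rebalancing), and a single cached pointer instead of a separate list. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_timer {
    int64_t expires_ns;                 /* absolute expiry time */
    struct sketch_timer *left, *right;  /* tree links */
};

struct sketch_queue {
    struct sketch_timer *root;
    struct sketch_timer *first;         /* cached earliest timer */
};

/* Insert a timer in expiry order and keep the shortcut to the
 * earliest timer up to date. */
void sketch_enqueue(struct sketch_queue *q, struct sketch_timer *t)
{
    assert(q != NULL && t != NULL);
    struct sketch_timer **link = &q->root;
    int leftmost = 1;

    t->left = t->right = NULL;
    while (*link != NULL) {
        if (t->expires_ns < (*link)->expires_ns) {
            link = &(*link)->left;
        } else {                        /* equal expiry goes right: FIFO */
            link = &(*link)->right;
            leftmost = 0;
        }
    }
    *link = t;
    if (leftmost)                       /* only ever went left: new minimum */
        q->first = t;
}

/* O(1) answer to "which timer expires next?" without walking the tree. */
struct sketch_timer *sketch_next(const struct sketch_queue *q)
{
    return q->first;
}
```

A timer becomes the new earliest entry only if its insertion path never turned right, which is why tracking one flag during the descent is enough to maintain the shortcut.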
Time-ordered enqueueing is not purely for the benefit of high-resolution clocks, though; it also simplifies the handling of absolute timers based on a low-resolution CLOCK_REALTIME. The existing implementation needed to keep an extra list of all armed absolute CLOCK_REALTIME timers, along with complex locking. On settimeofday and NTP adjustments, all of these timers had to be dequeued, the time-changing code had to fix them up one by one, and all of them had to be enqueued again. Time-ordered enqueueing, together with storing the expiry time in absolute time units, removes all of this complex and poorly scaling code from the posix-timer implementation: the clock can simply be set without having to touch the rbtree. This also makes the handling of posix-timers simpler in general.
The locking and per-CPU behavior of hrtimers was mostly taken from the existing timer wheel code, because it is mature and well suited. Sharing code was not really a win, because the data structures differ. In addition, the hrtimer functions now have clearer behavior and clearer names, such as hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly equivalent to del_timer() and del_timer_sync()], so there is no direct 1:1 mapping between them at the algorithmic level and therefore no real potential for code sharing either.
Basic data types: every time value, absolute or relative, is held in a special nanosecond-resolution type: ktime_t. The kernel-internal representation of ktime_t values and operations is implemented via macros and inline functions, and it can be switched at compile time between a "hybrid union" type and a plain "scalar" 64-bit nanoseconds representation. The hybrid union type optimizes time conversions on 32-bit CPUs. This build-time-selectable ktime_t storage format was implemented to avoid the performance impact of 64-bit multiplications and divisions on 32-bit CPUs. Such operations are frequently needed to convert between the storage formats used by the kernel and user-space interfaces and the internal time format. (For more information, see include/linux/ktime.h.)
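A minimal user-space sketch of the scalar representation, assuming a signed 64-bit nanosecond count, is shown below. The names are hypothetical and are not the kernel's ktime API; the point is only to show where the 64-bit multiplication and division appear when converting to and from a seconds/nanoseconds pair.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef int64_t sketch_ktime_t;   /* scalar form: signed 64-bit nanoseconds */

/* seconds/nanoseconds -> scalar: one 64-bit multiplication. */
sketch_ktime_t sketch_timespec_to_ktime(struct timespec ts)
{
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (sketch_ktime_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* scalar -> seconds/nanoseconds: a 64-bit division and remainder, which is
 * the expensive step on 32-bit CPUs. Negative values are left out to keep
 * the sketch short. */
struct timespec sketch_ktime_to_timespec(sketch_ktime_t kt)
{
    assert(kt >= 0);
    struct timespec ts = {
        .tv_sec  = (time_t)(kt / 1000000000LL),
        .tv_nsec = (long)(kt % 1000000000LL),
    };
    return ts;
}
```

In the scalar form, adding or comparing two time values is ordinary integer arithmetic; only the conversions at the interface boundary need the multiplication or division.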

hrtimers - rounding of timer values
----------------------------------------------------
The hrtimer code rounds timer events to lower-resolution clocks because it has to; otherwise it does no artificial rounding at all.
One question is what resolution value the clock_getres() interface should return to the user. It returns whatever real resolution a given clock has, be it low-resolution, high-resolution, or artificially low-resolution.
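From user space, the resolution a clock reports can be queried with the POSIX clock_getres() call. The sketch below wraps it in a hypothetical helper that returns the result as a nanosecond count; the value it reports depends on the kernel configuration and hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: report a clock's resolution in nanoseconds,
 * or -1 if the clock cannot be queried. */
int64_t sketch_clock_resolution_ns(clockid_t clk)
{
    struct timespec res;

    if (clock_getres(clk, &res) != 0)
        return -1;
    assert(res.tv_sec >= 0 && res.tv_nsec >= 0);   /* sanity check */
    return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

For example, `sketch_clock_resolution_ns(CLOCK_MONOTONIC)` typically yields a very small value on a system with high-resolution timers enabled and a tick-sized value on one without them.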

hrtimers - testing and verification
----------------------------------------------------
We used the high-resolution clock subsystem on top of hrtimers to verify the hrtimer implementation details in practice, and we also ran the posix timer tests to ensure specification compliance. We also ran tests on low-resolution clocks.
The hrtimer patch converts the following kernel functionality to use hrtimers:
- nanosleep
- itimers
- posix-timers

The conversion of nanosleep and posix-timers made it possible to unify nanosleep and clock_nanosleep.

The code was compiled successfully for the following platforms:
i386, x86_64, ARM, PPC, PPC64, IA64

hrtimers were also integrated into the -rt tree, along with an hrtimers-based high-resolution clock implementation, so the hrtimers code has received a healthy amount of testing and use in practice.
 

 
