A familiar and unfamiliar udelay

Last Update:2015-07-05 Source: Internet

Author: User



My application to become a CSDN blog certified expert was approved, which honestly flatters me. I know my own limits, and I am still a long way from words like "expert" or "guru".
Still, the certification gives me motivation to share more of what I learn on this blog and to exchange ideas with everyone.

Delay functions are used all the time in kernel development, and the most familiar ones are mdelay and msleep. Although I use them often, I never understood how they are implemented, so today I will study them.

These two functions have vastly different implementations.

msleep is based on scheduling: during the delay it calls schedule_timeout, which gives up the CPU, and execution continues once the timeout has expired. The function is implemented in kernel/timer.c.

Because the Linux kernel is not a real-time system, msleep, which involves scheduling, cannot be precise.

I will not go into msleep today and will analyze it when I have time. Today the focus is on mdelay.
mdelay is the most commonly used delay function. It is implemented as a busy loop built on the kernel's loops_per_jiffy, so its delay is more accurate than that of msleep.

mdelay and ndelay are both built on udelay. In include/linux/delay.h they are defined as follows:

#ifndef MAX_UDELAY_MS
#define MAX_UDELAY_MS   5
#endif

#ifndef mdelay
#define mdelay(n) (\
    (__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \
    ({unsigned long __ms=(n); while (__ms--) udelay(1000);}))
#endif

#ifndef ndelay
static inline void ndelay(unsigned long x)
{
    udelay(DIV_ROUND_UP(x, 1000));
}
#define ndelay(x) ndelay(x)
#endif

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

GCC's built-in function __builtin_constant_p determines whether n is a compile-time constant: it returns 1 if n is a constant and 0 otherwise.
In mdelay, if the argument is a compile-time constant no greater than MAX_UDELAY_MS (5 by default), udelay is called directly, which indicates that udelay supports a delay of at most 5000 us. Otherwise udelay(1000) is called in a loop to build up the delay.
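The two branches of mdelay can be observed in user space with a small sketch. It assumes GCC or Clang (for __builtin_constant_p and statement expressions) and replaces the kernel's udelay with a hypothetical stub that only counts calls; the helper function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's udelay(): records calls, does not delay. */
unsigned long udelay_calls;
unsigned long udelay_total_us;

void udelay(unsigned long us)
{
    udelay_calls++;
    udelay_total_us += us;
}

#define MAX_UDELAY_MS 5

/* Same shape as the macro in include/linux/delay.h. */
#define mdelay(n) ( \
    (__builtin_constant_p(n) && (n) <= MAX_UDELAY_MS) ? udelay((n) * 1000) : \
    ({ unsigned long ms_ = (n); while (ms_--) udelay(1000); }))

/* Constant argument within the limit: expected to take the single-call branch. */
unsigned long calls_for_constant_3ms(void)
{
    udelay_calls = 0;
    udelay_total_us = 0;
    mdelay(3);
    return udelay_calls;
}

/* Run-time argument: volatile keeps the compiler from treating it as a constant. */
unsigned long calls_for_runtime_ms(unsigned long ms)
{
    volatile unsigned long v = ms;

    udelay_calls = 0;
    udelay_total_us = 0;
    mdelay(v);
    return udelay_calls;
}
```

In both cases the total requested delay is the same; only the number of udelay calls differs.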

The ndelay implementation is clearly very imprecise: it is computed by calling udelay with the nanosecond count rounded up to whole microseconds, so any nonzero ndelay lasts at least 1 us.
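The rounding is easy to check by hand. The sketch below reuses the DIV_ROUND_UP definition shown earlier; ndelay_as_us is a hypothetical helper that returns the value ndelay would hand to udelay.

```c
/* Same rounding helper as the kernel's DIV_ROUND_UP. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: the microsecond count ndelay(ns) would pass to udelay(). */
unsigned long ndelay_as_us(unsigned long ns)
{
    return DIV_ROUND_UP(ns, 1000);
}
```

A request for 1 ns and a request for 1000 ns both become 1 us, and 1001 ns becomes 2 us.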

So let's look at the udelay implementation. This discussion is based on the ARM architecture, where udelay is implemented in arch/arm/include/asm/delay.h.

#define MAX_UDELAY_MS 2

#define udelay(n)                                                   \
    (__builtin_constant_p(n) ?                                      \
      ((n) > (MAX_UDELAY_MS * 1000) ? __bad_udelay() :              \
                    __const_udelay((n) * ((2199023U*HZ)>>11))) :    \
      __udelay(n))

Either way, __const_udelay or __udelay ends up being called. Both are implemented in arch/arm/lib/delay.S, as follows:

.LC0:       .word   loops_per_jiffy
.LC1:       .word   (2199023*HZ)>>11

/*
 * r0  <= 2000
 * lpj <= 0x01ffffff (max. 3355 bogomips)
 * HZ  <= 1000
 */

ENTRY(__udelay)
        ldr     r2, .LC1
        mul     r0, r2, r0
ENTRY(__const_udelay)               @ 0 <= r0 <= 0x7fffff06
        mov     r1, #-1
        ldr     r2, .LC0
        ldr     r2, [r2]            @ max = 0x01ffffff
        add     r0, r0, r1, lsr #32-14
        mov     r0, r0, lsr #14     @ max = 0x0001ffff
        add     r2, r2, r1, lsr #32-10
        mov     r2, r2, lsr #10     @ max = 0x00007fff
        mul     r0, r2, r0          @ max = 2^32-1
        add     r0, r0, r1, lsr #32-6
        movs    r0, r0, lsr #6
        moveq   pc, lr

The arithmetic in this assembly can be summarized by the following formula, where n is the argument passed in:

loops = (((n * ((2199023*HZ)>>11)) >> 14) * (loops_per_jiffy >> 10)) >> 6

/*
 * loops = r0 * HZ * loops_per_jiffy / 1000000
 *
 * Oh, if only we had a cycle counter...
 */

@ Delay routine
ENTRY(__delay)
        subs    r0, r0, #1
        bhi     __delay
        mov     pc, lr
ENDPROC(__udelay)
ENDPROC(__const_udelay)
ENDPROC(__delay)
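To follow the register arithmetic more easily, here is a rough C transliteration of the __udelay path as I read the assembly. The function name arm_udelay_loops is hypothetical, HZ and loops_per_jiffy are passed in as parameters, and the constants 0x3fff, 0x3ff and 0x3f are the rounding terms produced by the `r1, lsr #32-k` operands (r1 holds 0xffffffff).

```c
#include <stdint.h>

/* Hypothetical transliteration: returns the loop count that __delay would receive. */
uint32_t arm_udelay_loops(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    uint32_t r0 = usecs * ((2199023u * hz) >> 11);  /* __udelay: scale the argument */
    uint32_t r2 = lpj;                              /* ldr r2, [r2] */

    r0 = (r0 + 0x3fffu) >> 14;  /* add r0, r0, r1, lsr #32-14 ; mov r0, r0, lsr #14 */
    r2 = (r2 + 0x3ffu) >> 10;   /* add r2, r2, r1, lsr #32-10 ; mov r2, r2, lsr #10 */
    r0 = r2 * r0;               /* mul r0, r2, r0 */
    r0 = (r0 + 0x3fu) >> 6;     /* add r0, r0, r1, lsr #32-6  ; movs r0, r0, lsr #6 */

    return r0;                  /* 0 means "return immediately" (moveq pc, lr) */
}
```

With the assumed values HZ = 100 and loops_per_jiffy = 1000000 (100 loops per microsecond), a 10 us request yields 1008 loops, slightly above the ideal 1000 because each step rounds up.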

The __udelay implementation relies on loops_per_jiffy, a kernel global variable that is computed at boot by calibrate_delay and represents the number of loops the processor executes in one jiffy.
I wrote an earlier article analyzing the calibrate_delay implementation; the link is:
http://blog.csdn.net/skyflying2012/article/details/16367983

The kernel converts loops_per_jiffy into BogoMIPS and reports it to the user. Running cat /proc/cpuinfo shows the BogoMIPS value, which nominally indicates how many millions of instructions per second the processor executes and serves as a rough CPU performance figure.
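As a sketch of that conversion: to my knowledge the kernel prints the value as loops_per_jiffy / (500000 / HZ) for the integer part and (loops_per_jiffy / (5000 / HZ)) % 100 for the two decimals. Treat these expressions as an assumption to verify against your kernel source; the function names below are hypothetical.

```c
/* Integer part of the BogoMIPS figure (assumed formula, see lead-in). */
unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000UL / hz);
}

/* Two decimal digits of the BogoMIPS figure (assumed formula, see lead-in). */
unsigned long bogomips_hundredths(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000UL / hz)) % 100;
}
```

For example, with HZ = 100 a loops_per_jiffy of 4980736 would be reported as 996.14 BogoMIPS under this assumption.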

The assembly above first computes the number of loops needed for the requested microseconds and then calls __delay, which counts that number down to complete the delay. Clearly the udelay implementation is ultimately a processor busy loop.

One detail deserves attention here: calibrate_delay also works by calling __delay, and that is how loops_per_jiffy is measured.
So the unit of loops_per_jiffy is one iteration of __delay; in other words, one loop is one pass through __delay.
__delay simply decrements its argument with subs and branches back until it reaches zero.
So, as I understand it, one loop is one ARM decrement instruction plus one branch instruction.

The biggest questions about __udelay, however, are what the strange number (2199023*HZ)>>11 means and what the various shifts in the assembly calculation are for.

The most obvious way to compute the number of loops for a delay in microseconds from loops_per_jiffy is the formula given in the assembly comment:
loops = n * HZ * loops_per_jiffy / 1000000
HZ is the number of jiffies per second, so HZ * loops_per_jiffy / 1000000 is the number of loops in 1 us.
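Written directly in C, the formula looks like the sketch below. The function names are hypothetical; the 32-bit variant is included only to show that the intermediate product n * HZ * loops_per_jiffy does not fit in a 32-bit register for realistic values.

```c
#include <stdint.h>

/* Reference version: 64-bit intermediate, so the product cannot wrap here. */
uint64_t naive_loops_64(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return (uint64_t)usecs * hz * lpj / 1000000u;
}

/* Same formula in 32-bit arithmetic: the product wraps modulo 2^32. */
uint32_t naive_loops_32(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return usecs * hz * lpj / 1000000u;
}
```

With the assumed values HZ = 100 and loops_per_jiffy = 1000000, a 2000 us delay needs 200000 loops, but the 32-bit version returns a wrong result because 2000 * 100 * 1000000 exceeds 2^32.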

After digging through various references I found the reason. This formula has a serious flaw on a processor without a floating-point unit (an integer-only processor): computed with integers, it can easily come out as 0.
Because the divisor 1000000 is so large, the integer division loops_per_jiffy * HZ / 1000000 truncates to 0 whenever loops_per_jiffy * HZ is smaller than 1000000. Then the delay is always 0, no matter how many microseconds you ask for.
The kernel's solution is to replace the division by 1000000 with a multiplication by 1/1000000. To preserve precision, 1/1000000 is first shifted left by 30 bits, which gives
(1/1000000) << 30 = 2^30/1000000 ≈ 2199023U >> 11
where 2199023 is approximately 2^41/1000000.

This explains where (2199023*HZ)>>11 comes from.
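The truncation problem and the scaled reciprocal can both be checked numerically. In this sketch the function names are hypothetical, and the fixed-point version uses a 64-bit intermediate purely for clarity, which is not how the ARM assembly does it.

```c
#include <stdint.h>

/* Loops per microsecond with plain integer division: truncates toward zero. */
uint32_t loops_per_us_truncated(uint32_t hz, uint32_t lpj)
{
    return (uint32_t)((uint64_t)hz * lpj / 1000000u);
}

/* The kernel's scale factor, approximately 2^30 * hz / 1000000. */
uint32_t udelay_scale(uint32_t hz)
{
    return (2199023u * hz) >> 11;
}

/* Multiply by the scaled reciprocal, then shift the 30 bits back out. */
uint32_t fixed_point_loops(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return (uint32_t)(((uint64_t)usecs * udelay_scale(hz) * lpj) >> 30);
}
```

On a slow machine with the assumed values HZ = 100 and loops_per_jiffy = 5000, the truncated per-microsecond figure is 0, while the fixed-point form gives 4 loops for a 10 us delay (the ideal value is 5). The result lands slightly low because the constant is truncated, which is presumably why the assembly adds rounding terms before each shift.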

The repeated shifts in the assembly move the result back right by the 30 bits that 2199023U>>11 was shifted left. To avoid overflow, the shift is split into >>14, >>10 and >>6, which together are equivalent to >>30.
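A small sketch shows why the shift has to be split when only 32-bit arithmetic is available. The function names are hypothetical, the rounding additions of the real assembly are left out, and scaled_us stands for n * ((2199023*HZ)>>11).

```c
#include <stdint.h>

/* One multiply and one final shift in 32 bits: the product overflows. */
uint32_t single_shift_32(uint32_t scaled_us, uint32_t lpj)
{
    return (scaled_us * lpj) >> 30;
}

/* Shifts split as 14 + 10 + 6 = 30 so every 32-bit intermediate fits. */
uint32_t split_shift_32(uint32_t scaled_us, uint32_t lpj)
{
    return ((scaled_us >> 14) * (lpj >> 10)) >> 6;
}

/* 64-bit reference result for comparison. */
uint32_t wide_shift_64(uint32_t scaled_us, uint32_t lpj)
{
    return (uint32_t)(((uint64_t)scaled_us * lpj) >> 30);
}
```

For a 2000 us delay with the assumed values HZ = 100 and loops_per_jiffy = 1000000 (scaled_us = 214748000), the split version gives 199881 loops against a 64-bit reference of 199999, while the single-shift version can never return more than 3 because a 32-bit value shifted right by 30 has only two bits left.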

At this point we fully understand the ingenious loop-count formula in the assembly, and with it how udelay is implemented on ARM.

As we have seen, the kernel does not divide by a large number directly but uses multiplication and shifts instead. I can think of two possible reasons:
(1) The precision problem described above: when the divisor is very large, the result of the calculation may come out as 0.
(2) I once ran into this while developing a driver: when building kernel code, the compiler replaced a division with a call to the libgcc arithmetic routine __aeabi_ldivmod, but the kernel is not linked against any library, so the build failed. The kernel-provided do_div can be used instead.
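The calling convention of do_div is unusual, so here is a user-space model of its semantics as I understand them: the 64-bit dividend is replaced in place by the quotient, and the 32-bit remainder is returned. The macro do_div_model and the helper ns_to_us are hypothetical names for illustration; in kernel code you would use the real do_div from asm/div64.h. The sketch relies on GNU C statement expressions.

```c
#include <stdint.h>

/* User-space model of do_div semantics (not the kernel macro itself). */
#define do_div_model(n, base) ({                    \
    uint32_t base_ = (base);                        \
    uint32_t rem_ = (uint32_t)((n) % base_);        \
    (n) /= base_;                                   \
    rem_;                                           \
})

/* Hypothetical helper: convert nanoseconds to microseconds, keeping the remainder. */
uint64_t ns_to_us(uint64_t ns, uint32_t *rem_ns)
{
    *rem_ns = do_div_model(ns, 1000u);  /* ns now holds the quotient */
    return ns;
}
```

Note that the first argument must be a variable, because the macro overwrites it with the quotient.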

That concludes the udelay analysis. Two small takeaways:
(1) The kernel's delay functions really are busy loops, which makes them different from the sleep functions.
(2) Think carefully before using division in kernel development.
