A familiar and unfamiliar udelay

Source: Internet
Author: User
Tags: mul, sleep, function

My application to become a CSDN blog certified expert was approved, which honestly flattered me. I know my own limits, and I am still a long way from words like "expert" or "guru".
Still, the certification gives me motivation to share more of what I learn on this blog and to learn from everyone through discussion.

Delay functions are used all the time in kernel development; the most familiar are mdelay and msleep. Although I use them often, I never understood how they are implemented, so today I studied them.

These two functions are implemented in completely different ways.


msleep is built on the scheduler: during the delay it calls schedule_timeout, which gives up the CPU, and execution resumes after the timeout expires. The function is implemented in kernel/timer.c.

Because the Linux kernel is not a real-time system, msleep, which depends on scheduling, cannot be precise.

I will not go into msleep today; I will analyze it when I have time. Today the focus is on mdelay.
mdelay is the most commonly used delay function. It is implemented as a busy loop based on the kernel's loops_per_jiffy, so its delay is more accurate than that of msleep.

mdelay and ndelay are both built on udelay. In include/linux/delay.h:

#ifndef MAX_UDELAY_MS
#define MAX_UDELAY_MS   5
#endif

#ifndef mdelay
#define mdelay(n) (\
    (__builtin_constant_p(n) && (n) <= MAX_UDELAY_MS) ? udelay((n) * 1000) : \
    ({unsigned long __ms = (n); while (__ms--) udelay(1000);}))
#endif

#ifndef ndelay
static inline void ndelay(unsigned long x)
{
    udelay(DIV_ROUND_UP(x, 1000));
}
#define ndelay(x) ndelay(x)
#endif

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

GCC's built-in function __builtin_constant_p determines whether n is a compile-time constant: it returns 1 if it is, and 0 otherwise.
In mdelay, if the argument is a compile-time constant no greater than MAX_UDELAY_MS (5), udelay is called directly, which implies that udelay supports delays of up to 5000 us. Otherwise, udelay(1000) is called in a loop to build up the delay.
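The two paths through mdelay can be observed in user space with a minimal sketch. It assumes GCC or Clang (for __builtin_constant_p and statement expressions); udelay_stub, mdelay_model and the calls_for_* helpers are hypothetical names introduced here, and the stub only counts calls instead of waiting.

```c
#include <assert.h>

#define MAX_UDELAY_MS 5

static unsigned long udelay_calls;   /* how many times the stub was invoked */

/* Stand-in for udelay(): records the call, does not actually wait. */
void udelay_stub(unsigned long usecs)
{
    /* Models the limit implied by mdelay: never more than 5000 us at once. */
    assert(usecs <= MAX_UDELAY_MS * 1000);
    udelay_calls++;
}

/* Same shape as the kernel macro, with udelay() swapped for the stub. */
#define mdelay_model(n) (\
    (__builtin_constant_p(n) && (n) <= MAX_UDELAY_MS) ? udelay_stub((n) * 1000) : \
    ({ unsigned long __ms = (n); while (__ms--) udelay_stub(1000); }))

/* Small constant: one direct call with 3000 us. */
unsigned long calls_for_constant_3ms(void)
{
    udelay_calls = 0;
    mdelay_model(3);
    return udelay_calls;
}

/* Constant above MAX_UDELAY_MS: falls back to the 1000 us loop. */
unsigned long calls_for_constant_7ms(void)
{
    udelay_calls = 0;
    mdelay_model(7);
    return udelay_calls;
}

/* Run-time value: looped in 1000 us steps. */
unsigned long calls_for_variable_ms(unsigned long ms)
{
    udelay_calls = 0;
    mdelay_model(ms);
    return udelay_calls;
}
```

A constant of 3 produces a single stub call, while a constant of 7 or a run-time value of 10 produces 7 and 10 calls of 1000 us each.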

The ndelay implementation shown here is clearly imprecise: it rounds the request up to whole microseconds and calls udelay, so any non-zero ndelay request delays for at least 1 us.
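The rounding can be checked in isolation. This is a small user-space sketch; ndelay_to_usecs is a hypothetical helper that returns the value the generic ndelay fallback would pass to udelay.

```c
#include <assert.h>
#include <limits.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Microseconds that the generic ndelay() fallback would hand to udelay(). */
unsigned long ndelay_to_usecs(unsigned long nsecs)
{
    assert(nsecs <= ULONG_MAX - 999);   /* keep (n + d - 1) from wrapping */
    return DIV_ROUND_UP(nsecs, 1000);
}
```

A request of 1 ns becomes 1 us and a request of 1001 ns becomes 2 us, so the error can approach a full microsecond.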


Now let's look at the udelay implementation. This discussion is based on the ARM architecture, where udelay is defined in arch/arm/include/asm/delay.h.

#define MAX_UDELAY_MS 2

#define udelay(n)                                               \
    (__builtin_constant_p(n) ?                                  \
      ((n) > (MAX_UDELAY_MS * 1000) ? __bad_udelay() :          \
            __const_udelay((n) * ((2199023U*HZ)>>11))) :        \
      __udelay(n))

Either __const_udelay or __udelay is eventually called. Both are implemented in arch/arm/lib/delay.S, as follows:

.LC0:       .word   loops_per_jiffy
.LC1:       .word   (2199023*HZ)>>11

/*
 * r0  <= 2000
 * lpj <= 0x01ffffff (max. 3355 bogomips)
 * HZ  <= 1000
 */

ENTRY(__udelay)
        ldr     r2, .LC1
        mul     r0, r2, r0
ENTRY(__const_udelay)               @ 0 <= r0 <= 0x7fffff06
        mov     r1, #-1
        ldr     r2, .LC0
        ldr     r2, [r2]            @ max = 0x01ffffff
        add     r0, r0, r1, lsr #32-14
        mov     r0, r0, lsr #14     @ max = 0x0001ffff
        add     r2, r2, r1, lsr #32-10
        mov     r2, r2, lsr #10     @ max = 0x00007fff
        mul     r0, r2, r0          @ max = 2^32-1
        add     r0, r0, r1, lsr #32-6
        movs    r0, r0, lsr #6
        moveq   pc, lr

The arithmetic performed by this assembly can be summarized by the following formula, where n is the argument passed in:
loops = (((n * ((2199023*HZ) >> 11)) >> 14) * (loops_per_jiffy >> 10)) >> 6

/*
 * loops = r0 * HZ * loops_per_jiffy / 1000000
 *
 * Oh, if only we had a cycle counter...
 */

@ Delay routine
ENTRY(__delay)
        subs    r0, r0, #1
        bhi     __delay
        mov     pc, lr
ENDPROC(__udelay)
ENDPROC(__const_udelay)
ENDPROC(__delay)
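The loop-count arithmetic of the listing can be modelled in C, as a rough user-space sketch rather than kernel code. arm_udelay_loops is a hypothetical name, and HZ and loops_per_jiffy are passed in as parameters (an assumption made so the function can be run outside the kernel). Note that each `add ..., r1, lsr #32-k` adds 2^k - 1 before the shift, so every shift rounds up, which the summary formula leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* C model of __udelay / __const_udelay: returns the count handed to __delay. */
uint32_t arm_udelay_loops(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    /* Bounds taken from the comment at the top of the assembly listing. */
    assert(usecs <= 2000 && hz <= 1000 && lpj <= 0x01ffffffu);

    uint32_t r0 = usecs * ((2199023u * hz) >> 11);  /* mul r0, r2, r0 */
    r0 = (r0 + 0x3fffu) >> 14;                      /* add + lsr #14 (rounds up) */
    uint32_t r2 = (lpj + 0x3ffu) >> 10;             /* add + lsr #10 (rounds up) */
    r0 = r0 * r2;                                   /* mul r0, r2, r0 */
    r0 = (r0 + 0x3fu) >> 6;                         /* add + movs lsr #6 */
    return r0;                                      /* 0 means return at once */
}

/* Reference: n * HZ * lpj / 1000000 with a 64-bit intermediate, truncated. */
uint32_t exact_udelay_loops(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return (uint32_t)(((uint64_t)usecs * hz * lpj) / 1000000u);
}
```

For inputs such as usecs = 1000, HZ = 100 and loops_per_jiffy = 4980736 (illustrative values, not taken from the article), the model lands slightly above the exact quotient because of the round-up shifts, so the delay errs on the long side.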


__udelay relies on loops_per_jiffy, a kernel global variable computed at boot by calibrate_delay. It represents the number of delay loops the processor executes in one jiffy.
I analyzed the calibrate_delay implementation in an earlier article; the link is:
http://blog.csdn.net/skyflying2012/article/details/16367983

The kernel converts loops_per_jiffy to BogoMIPS and reports it to the user. Running cat /proc/cpuinfo shows the BogoMIPS value, which nominally expresses how many millions of instructions the processor executes per second and serves as a rough CPU performance figure.
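The conversion itself is short. The sketch below follows the expression the kernel uses when printing the value, lpj/(500000/HZ) for the integer part and (lpj/(5000/HZ)) % 100 for the two decimals; the function names are hypothetical and HZ is passed as a parameter so the code runs in user space. The divisor 500000 rather than 1000000 reflects counting each delay loop as two instructions.

```c
#include <assert.h>

/* Integer part of the BogoMIPS value for a given loops_per_jiffy and HZ. */
unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    assert(hz > 0 && hz <= 5000);   /* keeps both divisors non-zero */
    return lpj / (500000 / hz);
}

/* Two decimal digits of the BogoMIPS value. */
unsigned long bogomips_hundredths(unsigned long lpj, unsigned long hz)
{
    assert(hz > 0 && hz <= 5000);
    return (lpj / (5000 / hz)) % 100;
}
```

With the illustrative values loops_per_jiffy = 4980736 and HZ = 100, this yields 996.14.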

From the assembly above we can see that it first computes the number of loops needed for the requested delay in microseconds and then falls into __delay, which decrements in a loop until the delay is complete. Clearly, udelay is ultimately a processor busy loop.

One detail needs attention here: calibrate_delay also works by calling __delay, and that is how loops_per_jiffy is obtained.
So the unit of loops_per_jiffy is the __delay loop: one loop is one iteration of __delay.
__delay repeatedly decrements its argument with subs and branches back.
So, as I understand it, one loop is one ARM decrement instruction plus one branch instruction.

The biggest questions about __udelay are what the strange number (2199023*HZ)>>11 means, and what the various shifts in the assembly computation are for.

The most obvious approach is to use loops_per_jiffy to compute the number of loops from the delay in microseconds. The formula would be the one in the assembly comment:
loops = n * HZ * loops_per_jiffy / 1000000
HZ is the number of jiffies per second, so HZ * loops_per_jiffy / 1000000 is the number of loops in 1 us.

After searching through various sources I found the reason. This formula has a serious flaw on a processor without a floating-point unit, that is, one that does integer arithmetic only: the result can easily become 0.
The divisor 1000000 is large, so as soon as the division is applied to a small intermediate value (for example HZ / 1000000), the integer result truncates to 0, and no matter how many microseconds you ask for, the delay is always 0.
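A short user-space sketch makes the problem concrete. The function names are hypothetical, and the sketch also shows a second hazard not spelled out in the text above: multiplying first in 32-bit arithmetic can wrap around before the division happens.

```c
#include <assert.h>
#include <stdint.h>

/* Divide first: hz / 1000000 truncates to 0 for any realistic HZ. */
uint32_t loops_divide_first(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return usecs * (hz / 1000000u) * lpj;
}

/* Multiply first in 32 bits: the product may wrap before the division. */
uint32_t loops_multiply_first_32(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    return usecs * hz * lpj / 1000000u;
}

/* Reference value using a 64-bit intermediate and a 64-bit division. */
uint32_t loops_reference_64(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    uint64_t product = (uint64_t)usecs * hz * lpj;
    assert(product / 1000000u <= UINT32_MAX);   /* result must fit in 32 bits */
    return (uint32_t)(product / 1000000u);
}
```

For usecs = 1000, HZ = 100 and loops_per_jiffy = 4980736 (illustrative values), the divide-first version returns 0, the 32-bit multiply-first version returns a wrapped value, and only the 64-bit version returns the expected 498073.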
The kernel's solution is to replace the division by 1000000 with a multiplication by 1/1000000. To preserve precision, 1/1000000 is first shifted left by 30 bits, giving
(1/1000000) << 30 = 2^30 / 1000000 = 2199023U >> 11
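The constant can be verified numerically: 2199023 is 2^41 / 1000000 rounded down, so shifting it right by 11 leaves 2^30 / 1000000. The sketch below (hypothetical function names, HZ passed as a parameter) also compares multiplying by HZ before and after the shift, which suggests why the source keeps the extra 11 bits until HZ has been multiplied in.

```c
#include <assert.h>
#include <stdint.h>

/* floor(2^41 / 10^6): the magic number that appears in the ARM source. */
uint32_t magic_constant(void)
{
    return (uint32_t)((UINT64_C(1) << 41) / 1000000u);
}

/* HZ * 2^30 / 10^6 as the source computes it: multiply, then shift. */
uint32_t scale_shift_last(uint32_t hz)
{
    assert(hz <= 1000);   /* 2199023 * 1000 still fits in 32 bits */
    return (2199023u * hz) >> 11;
}

/* The same factor if the shift were applied before multiplying by HZ. */
uint32_t scale_shift_first(uint32_t hz)
{
    return (2199023u >> 11) * hz;
}
```

For HZ = 100 the exact factor is about 107374.18; shifting last gives 107374, while shifting first gives 107300.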


This explains where (2199023*HZ)>>11 comes from.

The repeated shifts in the assembly undo the 30-bit left shift that 2199023U>>11 represents. To avoid overflow, the shift is split into >>14, >>10 and >>6, which together are equivalent to >>30.

At this point we fully understand the ingenious loop-count formula in the assembly, and how udelay is implemented on ARM.

We can see that the kernel does not perform the division by a large number directly but uses shifts instead. As I understand it, there are two likely reasons:
(1) The precision problem described above: with a very large divisor, the result of the calculation may be 0.
(2) A situation I ran into earlier while developing a driver: when building kernel code, the compiler replaces a (64-bit) division with a call to the libgcc arithmetic helper __aeabi_ldivmod, but the kernel is not linked against any library, so the build fails. The kernel-provided do_div can be used instead.
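As a rough illustration of the calling convention, the kernel's do_div(n, base) divides a 64-bit value in place by a 32-bit divisor and returns the remainder. The user-space stand-in below mimics that behaviour; do_div_model and loops_with_do_div are hypothetical names, and the model takes a pointer because it is a plain function rather than a macro.

```c
#include <assert.h>
#include <stdint.h>

/* Model of do_div(): *n becomes the quotient, the remainder is returned. */
uint32_t do_div_model(uint64_t *n, uint32_t base)
{
    assert(base != 0);
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Example use: loops = usecs * hz * lpj / 1000000 via the model. */
uint32_t loops_with_do_div(uint32_t usecs, uint32_t hz, uint32_t lpj)
{
    uint64_t n = (uint64_t)usecs * hz * lpj;
    (void)do_div_model(&n, 1000000u);   /* remainder not needed here */
    return (uint32_t)n;
}
```

In real kernel code the call would be written do_div(n, 1000000) with n a 64-bit lvalue; the model only shows the in-place quotient and returned remainder.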


That concludes the udelay analysis. Two small takeaways:
(1) The kernel's delay functions really are busy loops, unlike the sleep functions.
(2) Think carefully before using division in kernel development.




Copyright notice: This is an original article by the blogger. Do not reproduce it without the blogger's permission.
