Brennan's Guide to Inline Assembly

by Brennan "Bas" Underwood

Document version 1.1.2.2

OK. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times.

Hopefully if you already know Intel syntax, the examples will be helpful to you. I've put variable names, register names and other literals in bold type.

The Syntax

So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

  • Register naming:
    Register names are prefixed with "%". To reference eax:

    AT&T:  %eax
    Intel: eax
  • Source/Destination Ordering:
    In AT&T syntax (which is the UNIX standard, BTW) the source is always on the left, and the destination is always on the right.
    So let's load ebx with the value in eax:

    AT&T:  movl %eax, %ebx
    Intel: mov ebx, eax
  • Constant value/immediate value format:
    You must prefix all constant/immediate values with "$".
    Let's load eax with the address of the "C" variable booga, which is static.

    AT&T:  movl $_booga, %eax
    Intel: mov eax, _booga

    Now let's load ebx with 0xd00d:

    AT&T:  movl $0xd00d, %ebx
    Intel: mov ebx, d00dh
  • Operator size specification:
    You must suffix the instruction with one of b, w, or l to specify the width of the destination register as a byte, word or longword. If you omit this, GAS (GNU assembler) will attempt to guess. You don't want GAS to guess, and guess wrong! Don't forget it.

    AT&T:  movw %ax, %bx
    Intel: mov bx, ax

    The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...

  • Referencing memory:
    DJGPP uses 386-protected mode, so you can forget all that real-mode addressing junk, including the restrictions on which register has what default segment and which registers can be base or index pointers. Now, we just get 6 general purpose registers. (7 if you use ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.)
    Here is the canonical format for 32-bit addressing:

    AT&T:  immed32(basepointer,indexpointer,indexscale)
    Intel: [basepointer + indexpointer*indexscale + immed32]

    You could think of the formula to calculate the address as:

      immed32 + basepointer + indexpointer * indexscale

    You don't have to use all those fields, but you do have to have at least 1 of immed32, basepointer and you MUST add the size suffix to the operator!
    Let's see some simple forms of memory addressing:

    • Addressing a particular C variable:

      AT&T:  _booga
      Intel: [_booga]

      Note: The underscore ("_") is how you get at static (global) C variables from assembler. This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down.

    • Addressing what a register points to:
      AT&T:  (%eax)
      Intel: [eax]
    • Addressing a variable offset by a value in a register:
      AT&T:  _variable(%eax)
      Intel: [eax + _variable]
    • Addressing a value in an array of integers (scaling up by 4):
      AT&T:  _array(,%eax,4)
      Intel: [eax*4 + _array]
    • You can also do offsets with the immediate value:
      C code: *(p+1) where p is a char *
      AT&T:  1(%eax) where eax has the value of p
      Intel: [eax + 1]
    • You can do some simple math on the immediate value:
      AT&T: _struct_pointer+8

      I assume you can do that with Intel format as well.

    • Addressing a particular char in an array of 8-character records:
      eax holds the number of the record desired. ebx has the wanted char's offset within the record.

      AT&T:  _array(%ebx,%eax,8)
      Intel: [ebx + eax*8 + _array]

    Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp into the address, but only as the base register.
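
The address formula above can be checked with a few lines of plain C. This is only a sketch: effective_address is a made-up helper name (nothing in DJGPP or GCC provides it), it assumes an unused base or index field counts as 0 and an unused scale as 1, and it models the 386's 32-bit wraparound with uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of DJGPP or GCC): computes the address that
   immed32(basepointer,indexpointer,indexscale) refers to, using 32-bit
   wraparound arithmetic. */
static uint32_t effective_address(uint32_t immed32, uint32_t basepointer,
                                  uint32_t indexpointer, uint32_t indexscale)
{
    /* Only scale factors of 1, 2, 4 or 8 can be encoded in an instruction. */
    assert(indexscale == 1 || indexscale == 2 ||
           indexscale == 4 || indexscale == 8);
    return immed32 + basepointer + indexpointer * indexscale;
}
```

For instance, if a hypothetical _array lived at 0x1000 and eax held 3, then _array(,%eax,4) would correspond to effective_address(0x1000, 0, 3, 4), which is 0x100c.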

Basic inline assembly

The format for basic inline assembly is very simple, and much like Borland's method.

asm ("statements");

Pretty simple, no? So

asm ("nop");

will do nothing of course, and

asm ("cli");

will stop interrupts, with

asm ("sti");

of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back.

asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)
It's really meant for issuing instructions for which there is no equivalent in C and which don't touch the registers.

But if you do touch the registers, and don't fix things at the end of your asm statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register, and might plan on using later. For that, you need:

Extended inline assembly

The basic format of the inline assembly stays much the same, but now gets Watcom-like extensions to allow input arguments and output arguments.

Here is the basic format:

asm ( "statements" : output_registers : input_registers : clobbered_registers);

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest.

Let's look at this bit by bit.

asm ("cld\n\t"

We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.

     "rep\n\t"
     "stosl"

Notice that GAS requires the rep prefix to occupy a line of its own. Notice also that stos has the l suffix to make it move longwords.

     : /* no output registers */

Well, there aren't any in this function.

     : "c" (count), "a" (fill_value), "D" (dest)

Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax thru the loop, and save a movl once per loop.

     : "%ecx", "%edi" );

And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist.

Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.
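
A note for readers on a current GCC rather than the DJGPP of this guide's era: the GCC manual states that clobbers may not overlap input or output operands, so loading ecx and edi as inputs and also listing them as clobbers is rejected. The sketch below shows one way the same fill could be written today. fill_longs is a made-up name, the read-write "+c" and "+D" operands stand in for the clobber entries, "memory" and "cc" are added because the code writes memory and touches the flags, and the plain C fallback loop is an assumption added so the snippet still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores fill_value into count consecutive 32-bit
   slots starting at dest. */
static void fill_longs(uint32_t *dest, uint32_t fill_value, size_t count)
{
    assert(dest != NULL || count == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ (
        "cld\n\t"
        "rep\n\t"
        "stosl"
        : "+c" (count), "+D" (dest)   /* rep stosl changes both registers */
        : "a" (fill_value)
        : "memory", "cc");            /* memory is written, flags change */
#else
    /* Portable fallback for targets without rep stosl. */
    while (count--)
        *dest++ = fill_value;
#endif
}
```

Calling fill_longs(buf, 0xd00d, 4) on a five-element array sets the first four elements and leaves the fifth alone.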

Here's the list of register loading codes that you'll be likely to use:

a        eax
b        ebx
c        ecx
d        edx
S        esi
D        edi
I        constant value (0 to 31)
q,r      dynamically allocated register (see below)
g        eax, ebx, ecx, edx or variable in memory
A        eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a variable, you must include "memory" as one of the clobbered. This is in case you wrote to a variable that GCC thought it had in a register. This is the same as clobbering all registers. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values? It's possible to let GCC pick the register(s). You do this:

asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say, eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r", that it would be possible to use esi or edi in that instruction. If not, use "q".

Now, you might wonder how to determine how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're messing with it.

Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above.

asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:

  • Note that we don't have to put ebx on the clobberlist, GCC knows it goes into x. Therefore, since it can know the value of ebx, it isn't considered clobbered.
  • Notice that in extended asm, you must prefix registers with %% instead of just %. Why, you ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a %e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you meant anyway.
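
To try the lea examples outside of a larger program, they can be wrapped in small functions. This is a sketch only: times5 and times5_in_place are made-up names, overflow is ignored, and the plain C fallback is an assumption added so the code also builds where x86 inline assembly isn't available.

```c
/* Hypothetical wrapper: separate input and output registers, both picked
   by GCC. */
static int times5(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    /* %0 is the output register, %1 is the input register. */
    __asm__ ("leal (%1,%1,4), %0"
             : "=r" (result)
             : "r" (x));
    return result;
#else
    return x * 5;   /* portable fallback */
#endif
}

/* Hypothetical wrapper: one register used for both input and output. */
static int times5_in_place(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "0" ties the input to the same register as output operand 0. */
    __asm__ ("leal (%0,%0,4), %0"
             : "=r" (x)
             : "0" (x));
    return x;
#else
    return x * 5;   /* portable fallback */
#endif
}
```

Neither asm statement names a specific register, so there is nothing to put on the clobberlist and no %% prefix is needed.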

Important note: If your assembly statement must execute where you put it (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. To be ultra-careful, use

__asm__ __volatile__ (...whatever...);

However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.

Some useful examples

#define disable() __asm__ __volatile__ ("cli");
#define enable() __asm__ __volatile__ ("sti");

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

These multiply arg1 by 3, 5, or 9 and put them in arg2. You should be OK to do:

times5(x,x);

as well.

#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "movsl" \
  : : "S" (src), "D" (dest), "c" (numwords) \
  : "%ecx", "%esi", "%edi" )

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.
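
If you'd rather have type checking than a macro, the same variable-length copy can be sketched as a static function. copy_longs is a made-up name. The three operands are marked read-write with "+" because rep movsl advances esi and edi and counts ecx down, which is the form a current GCC expects instead of clobber entries for those registers. The "memory" and "cc" clobbers and the plain C fallback are assumptions added for safety and portability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies numwords 32-bit values from src to dest.
   The two regions are assumed not to overlap. */
static void copy_longs(uint32_t *dest, const uint32_t *src, size_t numwords)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__ (
        "cld\n\t"
        "rep\n\t"
        "movsl"
        : "+S" (src), "+D" (dest), "+c" (numwords)
        : /* no pure inputs */
        : "memory", "cc");
#else
    /* Portable fallback for targets without rep movsl. */
    while (numwords--)
        *dest++ = *src++;
#endif
}
```

A call such as copy_longs(b, a, 3) copies three dwords from a to b.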

#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "stosl" \
  : : "a" (value), "D" (dest), "c" (numwords) \
  : "%ecx", "%edi" )

Same as above but for memset(), which doesn't get inlined no matter what (for now.)

#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" \
        : "=A" (llptr) \
        : : "eax", "edx"); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr.

The End

"The End"?! Yah, I guess so.

If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a decent number of general registers.) It might seem weird to you at first, but it's really more logical than Intel format, and has no ambiguities.

If I still haven't answered a question of yours, look in the Info pages for more information, particularly on the input/output registers. You can do some funky stuff like use "A" to allocate two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that aren't really used as much as "q" and "r".

Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please, e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck, mail me to say "boogabooga."

It's the least you can do.
