AT&T Assembly Study Notes

Source: Internet
Author: User
Differences between AT&T assembly and Intel assembly

(1) Intel format mostly uses uppercase letters, while AT&T format uses lowercase letters.

(2) In AT&T format, register names must be prefixed with "%"; Intel format uses no prefix.

(3) In AT&T 386 assembly language, the order of the source and destination operands is the reverse of Intel 386 assembly language. In Intel format the destination comes first and the source second; in AT&T format the source comes first and the destination second. For example, copying the contents of register eax to ebx is "MOV EBX, EAX" in Intel format and "mov %eax, %ebx" in AT&T format.

(4) In AT&T format, the size (width) of the operand of a memory-access instruction is determined by the last letter of the opcode name (the opcode suffix). The suffixes are b (8 bits), w (16 bits), and l (32 bits). In Intel format, "BYTE PTR", "WORD PTR", or "DWORD PTR" is placed before the memory operand instead. For example, loading the byte at the memory location foo into the 8-bit register al is written in the two formats as follows:

MOV AL, BYTE PTR foo (Intel format)

movb foo, %al (AT&T format)

(5) In AT&T format, immediate operands must be prefixed with "$"; Intel format uses no prefix. Thus "PUSH 4" in Intel format becomes "pushl $4" in AT&T format.

(6) In AT&T format, the operand of an absolute jump or call instruction (jmp/call), that is, the target address of the jump or call, must be prefixed with "*" (do not mistake it for a C pointer dereference); Intel format uses no prefix.

(7) The opcodes for the far jump and far subroutine call instructions are "ljmp" and "lcall" in AT&T format, and "JMP FAR" and "CALL FAR" in Intel format. When the jump or call target is an immediate operand, the two notations are:

CALL FAR section:offset (Intel format)

JMP FAR section:offset (Intel format)

lcall $section, $offset (AT&T format)

ljmp $section, $offset (AT&T format)

The corresponding far return instruction is:

RET FAR stack_adjust (Intel format)

lret $stack_adjust (AT&T format)

(8) The general form of indirect addressing differs between the two as follows:

section:[base + index*scale + disp] (Intel format)

section:disp(base, index, scale) (AT&T format)

This addressing mode is often used to access a field of a particular element in an array of data structures. base is the starting address of the array, scale is the size of each array element, and index is the subscript. If the array element is a data structure, disp is the offset of the field within the structure.

Note that in AT&T format the calculation is implicit. For example, if section, index, and scale are all omitted, base is ebp, and disp (the offset) is -4, the operand is written as follows:

[ebp-4] (Intel format)

-4(%ebp) (AT&T format)

If the parentheses in AT&T format contain only a base, the commas can be omitted; otherwise they cannot. Thus (%ebp) is equivalent to (%ebp,,), and in turn to (%ebp,0,0). As another example, if index is eax, scale is 4 (32 bits), disp is foo, and everything else is omitted, the operand is:

[foo + eax*4] (Intel format)

foo(,%eax,4) (AT&T format)
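The address arithmetic behind item (8) can be sketched in plain C. This is only an illustration: struct record, effective_address, and value_field are hypothetical names that do not appear in the notes, and the sketch assumes sizeof(struct record) is 8, since the hardware accepts only 1, 2, 4, or 8 as the scale.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for this illustration. */
struct record {
    int id;
    int value;
};

/* The sum the CPU forms for disp(base, index, scale):
 * base + index * scale + disp. */
static uintptr_t effective_address(uintptr_t base, size_t index,
                                   size_t scale, ptrdiff_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* Address of table[i].value, expressed with the four components. */
static int *value_field(struct record *table, size_t i)
{
    return (int *)effective_address(
        (uintptr_t)table,                 /* base:  start of the array      */
        i,                                /* index: subscript               */
        sizeof(struct record),            /* scale: size of one element     */
        offsetof(struct record, value));  /* disp:  offset of the field     */
}
```

With these definitions, value_field(table, 2) yields the same pointer as &table[2].value, which is what an operand such as 4(%ebx,%eax,8) would address if ebx held table and eax held 2.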


386 assembly language embedded in C code

When you need to embed a segment of assembly language in a C program, you can use the "asm" statement provided by GCC. For example:

#define __SLOW_DOWN_IO __asm__ __volatile__ ("outb %al, $0x80")

This is an 8-bit output instruction. As mentioned above, the suffix "b" on the opcode indicates an 8-bit operation, and 0x80 is a constant, a so-called "immediate operand", so it carries the "$" prefix; the register name al likewise carries the "%" prefix.

The assembly statement above is easy to understand. Now consider a slightly harder example:
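Because outb is a privileged instruction, it cannot be tried in an ordinary user program. The following sketch shows the same suffix and "$" prefix rules with stores that are safe to run. The function names are hypothetical, the "+m" constraint (a memory operand that is both read and written) is GCC syntax not covered in these notes, and the plain C fallback is an assumption added so the code also builds on non-x86 targets.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Suffix "b": only the low 8 bits of *p are overwritten. */
static void store_low_byte(uint32_t *p)
{
    __asm__ __volatile__("movb $0x11, %0" : "+m" (*p));
}

/* Suffix "w": only the low 16 bits of *p are overwritten. */
static void store_low_word(uint32_t *p)
{
    __asm__ __volatile__("movw $0x2222, %0" : "+m" (*p));
}
#else
/* Portable fallback with the same effect (assumption, not from the notes). */
static void store_low_byte(uint32_t *p)
{
    *p = (*p & 0xFFFFFF00u) | 0x11u;
}

static void store_low_word(uint32_t *p)
{
    *p = (*p & 0xFFFF0000u) | 0x2222u;
}
#endif
```

Starting from 0xAABBCCDD, store_low_byte leaves 0xAABBCC11 and a following store_low_word leaves 0xAABB2222: the suffix alone decides how many bytes are written.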

static __inline__ void atomic_add(int i, atomic_t *v)
{
    __asm__ __volatile__(
        LOCK "addl %1,%0"
        : "=m" (v->counter)
        : "ir" (i), "m" (v->counter));
}

In general, inserting assembly code into C code is considerably more complicated than writing "pure" assembly code, because the questions of how registers are allocated and used, and how they are bound to the variables in the C code, must be addressed. For this purpose, the assembly language used must be extended to give guidance to the tools. As a result, the syntax is in effect an intermediate language that differs from both assembly language and C.

An assembly segment inserted into C code can be divided into four parts separated by ":". The general form is:

instruction part : output part : input part : clobber part

The first part is the assembly statement itself. Its format is basically the same as in assembly language programs, with some differences. This part can be called the "instruction part" and is mandatory, while the other parts can be omitted depending on the situation. In the simplest case, therefore, the statement looks essentially like a conventional assembly statement.
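As a minimal sketch of the four-part layout, the following function adds two integers. The name add_ints is hypothetical; the constraint letters are explained later in these notes, and two details come from GCC rather than from the notes: the "&" in "=&r" (the output is written before all inputs have been read, so it must not share a register with them) and the "cc" entry in the fourth part (the condition flags are modified). The plain C branch is an assumed fallback for non-x86 targets.

```c
static int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int sum;
    __asm__ ("movl %1, %0\n\t"      /* part 1: the instructions        */
             "addl %2, %0"
             : "=&r" (sum)          /* part 2: output operand, %0      */
             : "r" (a), "r" (b)     /* part 3: input operands, %1, %2  */
             : "cc");               /* part 4: what else is modified   */
    return sum;
#else
    return a + b;                   /* fallback: assumption            */
#endif
}
```

A call such as add_ints(2, 3) returns 5; GCC chooses the actual registers for %0, %1, and %2.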

When assembly code is embedded in C code, how to bind its operands to the variables in the C code is an obvious problem. The programmer writing the embedded assembly knows exactly which instructions the program logic requires, but cannot know which registers GCC will have assigned to which variables before and after the insertion point, or which registers are free. Moreover, passively knowing GCC's register allocation would not be enough: there must also be a way to tell GCC what registers the assembly code needs, so that this in turn influences its allocation. Of course, if GCC were powerful enough, it could in principle analyze the embedded assembly code, deduce these requirements, and optimize accordingly. Even so, the uncertainty this would introduce is a problem, quite apart from the fact that it is not easy to do. GCC therefore adopts a compromise: the programmer supplies the specific instructions, but for register use generally supplies only a "template" and some constraints, leaving the question of how to bind variables to GCC and gas.

In the instruction part, a number prefixed with "%", such as %0 or %1, denotes a "template" operand that needs a register. How many such operands can be used depends on the number of general-purpose registers in the CPU. The number of distinct template operands used in the instruction part thus indicates how many variables need to be bound to registers; GCC and gas substitute them during compilation and assembly according to the constraints that follow. Because template operands also use the "%" prefix, a specific register name must be written with two "%" characters to avoid confusion.

So how are the constraints on variable binding expressed? That is the role of the other parts. Immediately after the instruction part comes the "output part", which specifies the constraints on output variables, that is, how the destination operands are bound. Each such condition is called a "constraint". When necessary, the output part may contain several constraints separated by commas. Each output constraint starts with "=", followed by a letter describing the operand type, followed by the variable it is bound to. In the preceding example, the output part is

: "=m" (v->counter)

Here there is a single constraint. "=m" indicates that the destination operand (%0 in the instruction part) is a memory location, v->counter. Whatever registers or memory operands are bound to the operands described in the output part, their contents from before the embedded assembly code runs are not preserved afterward. This gives GCC a basis for scheduling the use of these registers.

 

The output part is followed by the "input part". An input constraint has the same format as an output constraint, but without the "=" sign. In the preceding example, the input part has two constraints. The first is "ir" (i), indicating that %1 in the instruction may be either a register or an immediate operand ("i" stands for immediate), and that the operand comes from the C variable i (here a function parameter). The second is "m" (v->counter), with the same meaning as in the output constraint. If an input constraint requires a register, GCC allocates one while processing the statement and automatically inserts the instructions needed to load the value of the operand, that is, of the variable, into it. As with outputs, the previous contents of registers or operands bound to the operands described in the input part are not preserved after the embedded assembly code runs. For example, if %1 here requires a register, GCC allocates one and automatically inserts a movl instruction to load the value of parameter i into it, and the register's original value is lost. That does not matter if the register was idle; but if all registers are in use and one must be borrowed temporarily, its original contents have to be restored afterward. In that case GCC automatically inserts a pushl instruction at the start to save the register on the stack and a popl instruction at the end to restore it.

 

In some operations, besides the registers used for input and output operands, additional registers hold intermediate results, and their original contents are destroyed. These side effects must be declared in the clobber part so that GCC can take appropriate measures. Sometimes, however, such registers are simply listed in the output part, and then a clobber entry is unnecessary.
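The following sketch uses a scratch register in the way just described. It names eax explicitly, so the register is written with two "%" characters and is declared in the fourth part to tell GCC that its old contents are destroyed. The function name triple_via_eax is hypothetical, "cc" (condition flags) is a GCC clobber name not mentioned in these notes, and the C fallback is an assumption for non-x86 targets.

```c
static int triple_via_eax(int a)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    __asm__ ("movl %1, %%eax\n\t"     /* eax = a                      */
             "addl %%eax, %%eax\n\t"  /* eax = 2 * a                  */
             "addl %1, %%eax\n\t"     /* eax = 3 * a                  */
             "movl %%eax, %0"         /* out = eax                    */
             : "=r" (out)
             : "r" (a)
             : "eax", "cc");          /* eax and the flags are lost   */
    return out;
#else
    return 3 * a;                     /* fallback: assumption         */
#endif
}
```

Because eax is listed as clobbered, GCC will not place either operand in it and will save any live value it held there.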

 

Operands are numbered starting from the first constraint in the output part (number 0) and counting consecutively, one number per constraint. When you reference these operands in the instruction part, or refer to the registers assigned to them, put "%" before the number. An operand referenced in the instruction part is always treated as a 32-bit long word, although the operation performed on it may be a byte or word operation as needed. By default, a byte operation acts on the lowest byte, and a word operation likewise on the low word. In some special cases, a byte operation can specify which byte it acts on: inserting "b" between "%" and the number selects the lowest byte, and inserting "h" selects the second-lowest byte.

The following letters indicate constraints:

"m", "V", "o" -- a memory location
"r" -- any register
"q" -- one of the registers eax, ebx, ecx, edx
"i" and "h" -- an immediate operand
"E" and "F" -- a floating-point number
"g" -- any
"a", "b", "c", "d" -- the registers eax, ebx, ecx, and edx respectively
"S", "D" -- the registers esi and edi respectively
"I" -- a constant (0 to 31)
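A short sketch of the "b" and "h" modifiers follows. The function names are hypothetical. The constraint "q" restricts the operand to a register that has a byte form; "Q" is taken from GCC's x86 machine constraints (a, b, c, or d, the registers that have a high-byte form such as ah) and is an assumption beyond what these notes cover, as is the plain C fallback.

```c
/* Lowest byte of x, read through the %b modifier. */
static unsigned int low_byte(unsigned int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int out;
    __asm__ ("movzbl %b1, %0" : "=r" (out) : "q" (x));
    return out;
#else
    return x & 0xFFu;                 /* fallback: assumption */
#endif
}

/* Second-lowest byte of x, read through the %h modifier. */
static unsigned int second_byte(unsigned int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int out;
    __asm__ ("movzbl %h1, %0" : "=Q" (out) : "Q" (x));
    return out;
#else
    return (x >> 8) & 0xFFu;          /* fallback: assumption */
#endif
}
```

If x is held in eax, %b1 is emitted as %al and %h1 as %ah; movzbl zero-extends the selected byte into the 32-bit result.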

In addition, if an operand must use the same register as an operand in an earlier constraint, the number of that earlier operand is written as the constraint. In the clobber part, "memory" is often used, indicating that memory contents have changed once the operation completes; if a register held a value that was loaded from memory, the two may now be inconsistent.
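Both ideas can be sketched briefly. In add_in_place, the input constraint "0" ties the initial value of acc to the same register as output %0, so the single addl both reads and writes it. compiler_barrier is an empty statement whose only effect is the "memory" clobber, which tells GCC not to keep memory values cached in registers across it. Both function names are hypothetical, and "cc" and the C fallback are assumptions not taken from the notes.

```c
static int add_in_place(int acc, int delta)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %2, %0"
             : "=r" (acc)                /* %0: result                     */
             : "0" (acc), "r" (delta)    /* %1 shares the register of %0   */
             : "cc");
    return acc;
#else
    return acc + delta;                  /* fallback: assumption           */
#endif
}

static void compiler_barrier(void)
{
#if defined(__GNUC__)
    /* No instructions; GCC must assume all memory may have changed. */
    __asm__ __volatile__("" : : : "memory");
#endif
}
```

A call such as add_in_place(40, 2) returns 42.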

 

Note that even when the output part is empty, that is, when there are no output constraints, the separator ":" must be kept if any input constraint is present.

 

Back to the example above: this code adds the value of parameter i to v->counter. The keyword LOCK in the code indicates that the system bus should be locked while the addl instruction executes, so that other CPUs cannot interfere. Adding two numbers is a simple operation, and C clearly has a construct for it, for example "v->counter += i;". Why use assembly, then? The reason is that the operation must complete in a single instruction with the bus locked, which guarantees its "atomicity". By contrast, the number of instructions a C statement compiles to is not guaranteed, and the bus lock cannot be requested during the computation.

Here is another piece of embedded assembly code:

/* From include/asm-i386/bitops.h */
static inline void set_bit(int nr, volatile void *addr)
{
    asm volatile("lock; btsl %1,%0"
                 : "=m" (*(volatile long *) addr)
                 : "Ir" (nr)
                 : "memory");
}
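A version of this example that can be compiled outside the kernel might look as follows. The names my_atomic_t and my_atomic_add are hypothetical stand-ins for the kernel's atomic_t and atomic_add, the LOCK macro is assumed to expand to the "lock" prefix and is written out literally, and the __atomic_fetch_add fallback for non-x86 targets is an addition that is not in the notes.

```c
/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct {
    volatile int counter;
} my_atomic_t;

static inline void my_atomic_add(int i, my_atomic_t *v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* One locked read-modify-write instruction on v->counter. */
    __asm__ __volatile__("lock; addl %1,%0"
                         : "=m" (v->counter)
                         : "ir" (i), "m" (v->counter));
#elif defined(__GNUC__)
    __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
#else
    v->counter += i;   /* not atomic; keeps the sketch compilable */
#endif
}
```

Starting from a counter of 10, my_atomic_add(5, &a) leaves a.counter at 15, and no other CPU can observe a partially completed update.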

The instruction btsl sets one bit of a 32-bit operand to 1; the parameter nr gives the position of that bit.


Now consider a more complicated example:

/* From include/asm-i386/string.h */
static __always_inline void *__memcpy(void *to, const void *from, size_t n)
{
    int d0, d1, d2;
    __asm__ __volatile__(
        "rep; movsl\n\t"
        "testb $2,%b4\n\t"
        "je 1f\n\t"
        "movsw\n"
        "1:\ttestb $1,%b4\n\t"
        "je 2f\n\t"
        "movsb\n"
        "2:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n/4), "q" (n), "1" ((long) to), "2" ((long) from)
        : "memory");
    return (to);
}
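The effect of btsl can be checked with a small user-space sketch. The name my_set_bit is hypothetical; unlike the kernel function it takes a pointer to a single 32-bit word, uses the GCC read-write constraint "+m" (not covered in these notes), and assumes nr lies in the range 0 to 31. The C fallback is an assumption and is not atomic.

```c
static inline void my_set_bit(int nr, volatile unsigned int *addr)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Set bit nr of *addr with a single locked instruction. */
    __asm__ __volatile__("lock; btsl %1,%0"
                         : "+m" (*addr)
                         : "Ir" (nr)
                         : "memory");
#else
    *addr |= 1u << nr;   /* same result, but not atomic */
#endif
}
```

Setting bit 3 of a zeroed word yields 8, and then setting bit 0 yields 9.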

__memcpy is the underlying implementation of memcpy() in the kernel. It copies the contents of one block of memory to another without regard to data structure. It is used very frequently, so its efficiency matters a great deal.

 

First look at how the constraints bind variables to registers. The output part has three constraints, corresponding to operands %0 through %2. Variable d0 is operand %0 and must be placed in register ecx; the reason will become clear shortly. Likewise, d1 (%1) must be placed in register edi, and d2 (%2) in register esi. Now look at the input part. There are four constraints, corresponding to operands %3 through %6. Operand %3 uses the same register as operand %0, so it must also be ecx, and GCC automatically inserts the instructions needed to set it to n/4, which converts the copy length from n bytes to n/4 long words. As for n itself (operand %4), GCC is asked to allocate a register to hold it. Operands %5 and %6, the parameters to and from, use the same registers as %1 and %2 respectively, so they must be edi and esi.

 

Now look at the instruction part. The first instruction is "rep", which indicates that the next instruction, movsl, is to be executed repeatedly. On each repetition the value in register ecx is decremented by 1, until it reaches 0. The instruction is therefore executed n/4 times in total. What does movsl do? It copies one long word from the location pointed to by esi to the location pointed to by edi, and adds 4 to both esi and edi. Thus, when "rep; movsl\n\t" in the code has finished, all the long words have been copied and at most three bytes remain. This process uses the registers ecx, edi, and esi. That is, the three operands %0 (also %3), %2 (also %6), and %1 (also %5) are implicit in the instruction rather than written out literally. This also explains why these operands must be stored in the specified registers.

 

The next step handles the remaining bytes, at most three. The first testb tests operand %4, checking bit 1 of the lowest byte of n. If this bit is 1, at least two bytes remain, so movsw copies a short word (adding 2 to esi and edi); otherwise it is skipped. The second testb tests bit 0 of operand %4. If this bit is 1, one more byte remains, so movsb copies it; otherwise it is skipped. When label 2 is reached, execution is complete.
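The control flow of the assembly can be restated in portable C as a reading aid. This is a sketch of the logic only, not a replacement for the kernel routine; memcpy_steps and copy_unit are hypothetical names, and the local variables merely play the roles of ecx, esi, and edi.

```c
#include <stddef.h>

/* Copy `width` bytes and advance both pointers, as movsl/movsw/movsb do. */
static void copy_unit(unsigned char **dst, const unsigned char **src,
                      size_t width)
{
    size_t k;
    for (k = 0; k < width; k++)
        *(*dst)++ = *(*src)++;
}

static void *memcpy_steps(void *to, const void *from, size_t n)
{
    unsigned char *d = to;            /* role of edi */
    const unsigned char *s = from;    /* role of esi */
    size_t count = n / 4;             /* role of ecx */

    while (count-- > 0)               /* rep; movsl            */
        copy_unit(&d, &s, 4);
    if (n & 2)                        /* testb $2,%b4 ; movsw  */
        copy_unit(&d, &s, 2);
    if (n & 1)                        /* testb $1,%b4 ; movsb  */
        copy_unit(&d, &s, 1);
    return to;                        /* label 2: done         */
}
```

For n = 11, the loop copies two long words (8 bytes), bit 1 of n is set so two more bytes are copied, and bit 0 is set so the final byte is copied.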


