GCC Embedded Assembly

Hedgehog @ http://blog.csdn.net/littlehedgehog

I have organized these notes from the AT&T-syntax manual to make them easier to read.

 

Most kernel code is written in C, and only a small part is written in assembly language, such as architecture-specific code and code that has a large impact on performance. GCC provides embedded (inline) assembly, which lets you embed assembly language statements directly in C code and makes such programs much easier to write.

Simple embedded assembly is easy to understand.
For example:

__asm__ __volatile__("hlt");

"__asm__" indicates that the code that follows is embedded assembly; "asm" is an alias for "__asm__".
"__volatile__" tells the compiler not to optimize the code, so the instructions that follow are kept as written;
"volatile" is its alias. The parentheses contain the assembly instructions.
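Since "hlt" is a privileged instruction, it faults when executed from an ordinary user-space program. To try the same basic form outside the kernel, a harmless instruction can be substituted. The following is only a sketch: the function name run_nop is hypothetical, and it assumes a target whose assembler accepts the "nop" mnemonic (x86 does).

```c
/* Hypothetical helper, not part of the original article. */
int run_nop(void)
{
    /* Basic form: one instruction, no operands, no constraint strings. */
    __asm__ __volatile__("nop");
    return 1;   /* reached only after the instruction has executed */
}
```

Calling run_nop() simply returns 1; the point is that the statement compiles and runs without any operand description.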

 

To use embedded assembly, you first write the assembly instruction template, then associate C expressions with the instruction operands, and tell GCC what constraints apply to those operands. Consider the following assembly statement (be sure to read and understand the explanation below it):

__asm__ __volatile__("movl %1, %0" : "=r" (result) : "r" (input));

"movl %1, %0" is the instruction template. "%0" and "%1" stand for the instruction operands and are called placeholders; embedded assembly relies on them to map C expressions to instruction operands. The C expressions appear in parentheses after the instruction template. This example has only two, "result" and "input", and they correspond to the operands "%0" and "%1" in order of appearance: the first C expression corresponds to "%0", the second to "%1", and so on. According to the manual there can be at most 10 operands, written "%0", "%1", ..., "%9" (current GCC releases allow up to 30).

Each operand is preceded by a string in quotation marks that states a constraint or requirement on the operand. The constraint string before "result" is "=r": "=" marks "result" as an output operand, and "r" says that "result" must be associated with a general-purpose register. The instruction uses that register instead of "result" itself, and after the instruction has executed, the value in the register is stored into the variable "result". On the surface the instruction seems to operate on "result" directly; in fact GCC does this extra work implicitly, so we can write fewer instructions. The "r" before "input" says that the expression must first be loaded into a register, and that register is then used in the instruction.

 

Let's look at a small example of embedded assembly:

extern int input, result;

void test(void)
{
    input = 1;
    __asm__ __volatile__("movl %1, %0" :
                         "=r" (result) : "r" (input));
    return;
}

The generated assembly code is as follows:

Line  Code                 Explanation

1
7
8     movl $1, input       corresponds to the C statement input = 1;
9     movl input, %eax
10    #APP                 comment inserted by GCC: the embedded assembly starts
11    movl %eax, %eax      our embedded assembly statement
12    #NO_APP              comment inserted by GCC: the embedded assembly ends
13    movl %eax, result    saves the result to the variable result

From the assembly code we can see that lines 9 and 13 were added automatically by GCC. GCC decides how to handle each C expression according to its constraint string. In this example both expressions are given the "r" constraint, so GCC first emits

movl input, %eax

which reads input into the register %eax. GCC also associates a register with the output variable result, in this example %eax as well, and once the result is available it emits

movl %eax, result

to write the register value back to the C variable result. The registers associated with result and input are both %eax, so GCC substitutes %eax for both %0 and %1 in the instruction template, which gives

movl %eax, %eax

Obviously this instruction is redundant. It remains because the statement was not optimized.

As we can see, GCC automatically handles the association between C expressions or variables and registers; we only need to guide it with constraint strings. The constraint must match what the instruction requires of its operands, otherwise the generated assembly code may be wrong. For instance, change both "r" constraints in the example above to "m" ("m" means that the operand is kept in memory rather than in a register). The compiled result is then:

movl input, result

This is obviously an invalid instruction, so the constraint strings must match the instruction's requirements on its operands. For example, movl allows register to register and immediate to register, but not memory to memory, so the two operands cannot both use "m" as their constraint.
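One valid combination keeps the source in memory and the destination in a register. The sketch below illustrates it; the function name copy_via_register is hypothetical, the inline assembly is assumed to be built for x86 (32-bit or 64-bit) with GCC, and other targets fall back to plain C.

```c
/* Hypothetical helper: copies src to a result through one movl. */
int copy_via_register(int src)
{
    int dst;
#if defined(__i386__) || defined(__x86_64__)
    /* %1 is a memory operand ("m"), %0 a register ("=r"):
       memory-to-register is a form that movl accepts. */
    __asm__ __volatile__("movl %1, %0" : "=r" (dst) : "m" (src));
#else
    dst = src;  /* fallback for non-x86 targets */
#endif
    return dst;
}
```

For example, copy_via_register(42) returns 42. Changing "=r" to "=m" here would reproduce the invalid memory-to-memory form described above.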

From this we can summarize the format of embedded assembly:

__asm__(
    assembly statement template :
    output section :
    input section :
    clobber description)

There are four parts in total: the assembly statement template, the output section, the input section, and the clobber (damage) description. The parts are separated by ":". The assembly statement template is required; the other three parts are optional. If a later part is used while an earlier part is empty, the ":" must still be written and the corresponding part left empty. For example:

__asm__ __volatile__(
    "cli" :
    :
    : "memory")
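"cli" is privileged, so the statement above only works in kernel code. The same layout, with empty output and input sections and only a clobber description, can be tried in user space with an empty template. This is a sketch: shared_flag and publish are hypothetical names. The "memory" clobber tells GCC that the statement may read or write memory, so the compiler must not keep memory values cached in registers across it; it does not emit any CPU fence instruction.

```c
/* Hypothetical global and helper, not part of the original article. */
int shared_flag;

int publish(int value)
{
    shared_flag = value;
    /* Empty template, empty output section, empty input section,
       and "memory" in the clobber description. */
    __asm__ __volatile__("" : : : "memory");
    /* The compiler has to reload shared_flag from memory here. */
    return shared_flag;
}
```

publish(7) stores 7 and returns 7; the interesting part is only the shape of the statement with its two empty sections.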

Let's explain each part separately:

Output section

The output section describes the output operands. Operands are separated by commas, and each operand descriptor consists of a constraint string and a C variable. The constraint string of every output operand must contain "=", which marks it as an output operand.

Example:

__asm__ __volatile__("pushfl ; popl %0 ; cli" : "=g" (x))

The constraint string expresses the conditions placed on the variable, so that GCC can decide how to allocate registers and how to generate the code needed to connect the instruction operands with the C expressions or C variables.
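As a user-space illustration of an output section with two comma-separated operands, here is a minimal sketch. The function name load_constants is hypothetical; the assembly assumes x86 with GCC, and other targets use a plain C fallback.

```c
/* Hypothetical helper: fills two ints from inside one asm statement. */
void load_constants(int *first, int *second)
{
#if defined(__i386__) || defined(__x86_64__)
    int x, y;
    /* Two output operands: %0 must be a register ("=r"),
       %1 may be a register or memory ("=g"). */
    __asm__("movl $10, %0\n\t"
            "movl $20, %1"
            : "=r" (x), "=g" (y));
    *first = x;
    *second = y;
#else
    *first = 10;   /* fallback for non-x86 targets */
    *second = 20;
#endif
}
```

After load_constants(&a, &b), a holds 10 and b holds 20.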

Input section

The input section describes the input operands. Operands are separated by commas, and each operand descriptor consists of a constraint string and a C expression or C variable.

For example:

static __inline__ void __set_bit(int nr, volatile void *addr)
{
    __asm__(
        "btsl %1, %0" :
        "=m" (ADDR) :
        "Ir" (nr)
    );
}

This example sets bit nr of (*addr) to 1. The first placeholder %0 corresponds to the C variable ADDR (in the Linux kernel source this is a macro for (*(volatile long *) addr)), and the second placeholder %1 corresponds to the C variable nr. The assembly statement above is therefore equivalent to the following pseudocode:

btsl nr, ADDR

The two operands of this instruction cannot both be memory operands, so the constraint string of nr is "Ir" (explained below), which associates nr with an immediate or a register. That way only ADDR is a memory operand.
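A self-contained variant that can be compiled outside the kernel might look like the sketch below. The name set_bit32 is hypothetical. It deliberately uses "+m" rather than "=m", on the assumption that the word is both read and written by btsl, and it assumes x86 with GCC (other targets use a C fallback).

```c
#include <assert.h>

/* Hypothetical helper: sets bit nr (0..31) in *word. */
void set_bit32(unsigned int nr, unsigned int *word)
{
    assert(nr < 32);  /* keep the bit index inside one 32-bit word */
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is the word in memory (read and written, "+m");
       %1 is an immediate 0..31 ("I") or a register ("r"). */
    __asm__ __volatile__("btsl %1, %0" : "+m" (*word) : "Ir" (nr));
#else
    *word |= 1u << nr;  /* fallback for non-x86 targets */
#endif
}
```

Starting from a word equal to 0, set_bit32(3, &w) leaves w equal to 8, and a further set_bit32(0, &w) leaves it equal to 9.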

Constraint characters
There are many constraint characters, and some of them are specific to a particular architecture. Only the commonly used ones, plus some that are likely to be needed on i386, are listed here. They tell the compiler how to relate C variables to instruction operands, for example whether to place a variable in a register or in memory. The following table lists the commonly used constraint letters.

"a" puts the input variable into eax
"b" puts the input variable into ebx
"c" puts the input variable into ecx
"d" puts the input variable into edx
"S" puts the input variable into esi
"D" puts the input variable into edi
"q" puts the input variable into one of eax, ebx, ecx, and edx
"r" puts the input variable into a general-purpose register, that is, one of eax, ebx, ecx, edx, esi, and edi
"A" combines eax and edx into a 64-bit register (use long longs)
"m" memory variable
"o" the operand is a memory variable whose addressing mode is offsettable, that is, base addressing or base plus index addressing
"V" the operand is a memory variable whose addressing mode is not offsettable
">" the operand is a memory variable whose addressing mode is auto-increment
"p" the operand is a valid memory address (pointer)
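The register-specific letters are useful when an instruction has fixed operands. As a sketch, divl always divides the 64-bit value in edx:eax by its operand and leaves the quotient in eax and the remainder in edx, so "a" and "d" can pin the C variables to those registers. The name div_u32 is hypothetical; the assembly assumes x86 with GCC, with a C fallback elsewhere.

```c
#include <assert.h>

/* Hypothetical helper: unsigned 32-bit division using divl. */
unsigned int div_u32(unsigned int dividend, unsigned int divisor,
                     unsigned int *remainder)
{
    unsigned int quotient, rem;
    assert(divisor != 0);  /* divl traps on division by zero */
#if defined(__i386__) || defined(__x86_64__)
    /* Inputs: eax = dividend ("a"), edx = 0 ("d"), divisor in any
       other general register ("r"). Outputs: eax and edx. */
    __asm__("divl %4"
            : "=a" (quotient), "=d" (rem)
            : "a" (dividend), "d" (0u), "r" (divisor));
#else
    quotient = dividend / divisor;  /* fallback for non-x86 targets */
    rem = dividend % divisor;
#endif
    *remainder = rem;
    return quotient;
}
```

For example, div_u32(17, 5, &r) returns 3 and stores 2 in r.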


Register or memory

"g" the operand may be placed in a general-purpose register such as eax, ebx, ecx, or edx, or kept as a memory variable (GCC also accepts an immediate integer for "g")
"X" the operand can be of any type


Immediates

"I" an immediate between 0 and 31 (for 32-bit shift instructions)
"J" an immediate between 0 and 63 (for 64-bit shift instructions)
"N" an immediate between 0 and 255 (for the out instruction)
"i" an immediate
"n" an immediate; some systems do not support immediates narrower than a word, and on these systems "n" should be used instead of "i"


Matching

"0", "1", ..., "9"
Indicates that the operand it constrains matches a specified operand, that is, it is the same operand as the specified one. For example, if "0" is used to describe the "%1" operand, then "%1" refers to the same operand as "%0". Note the difference between "0"-"9" used as constraints and "%0"-"%9" used in the instruction: the former describe operands, the latter stand for operands.

Operand type

"=" the operand is write-only in the instruction (an output operand)
"+" the operand is read and written in the instruction (an input/output operand)
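The "+" modifier and the "I" immediate constraint can be combined in a very small sketch. The name times_eight is hypothetical; the assembly assumes x86 with GCC, with a C fallback elsewhere.

```c
/* Hypothetical helper: multiplies by 8 with a left shift. */
unsigned int times_eight(unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is read and written ("+r"); %1 is the constant 3, which
       satisfies "I" (0..31) and is emitted as the immediate $3. */
    __asm__("shll %1, %0" : "+r" (v) : "I" (3));
#else
    v <<= 3;  /* fallback for non-x86 targets */
#endif
    return v;
}
```

times_eight(5) returns 40. With "=r" instead of "+r", GCC would be free to assume the old value of v is not needed, which is not what the shift requires.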

Floating point

"f" a floating-point register
"t" the first floating-point register
"u" the second floating-point register
"G" a standard 80387 floating-point constant

Returning to the __set_bit example above:
"=m" (ADDR) means that ADDR is a memory operand ("m") and an output operand ("="); "Ir" (nr) means that nr is either an immediate between 0 and 31 ("I") or a register operand ("r").

 

A matching constraint is one of the digits "0", "1", ..., "9". It says that the C expression it constrains matches the C variable corresponding to the placeholder %0, %1, ..., %9. For example, if "0" is used as the constraint of %1, then %0 and %1 refer to the same C variable.
Let's look at an example:

extern int input, result;

void test_at_t()
{
    result = 0;
    input = 1;
    __asm__
    __volatile__("addl %2, %0" : "=r" (result) : "0" (result), "m" (input));
}

In the input section, result is constrained by the matching constraint "0", which says that %1 and %0 stand for the same variable. The input section describes this variable's role as an input and the output section its role as an output; together they say that result is a read/write operand. Because %0 and %1 stand for the same C variable, they are placed in the same location, whether that is a register or memory.
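A runnable variant of this pattern, with local values instead of the undefined external variables, might look like the sketch below. The name add_matching is hypothetical; the assembly assumes x86 with GCC, with a C fallback elsewhere.

```c
/* Hypothetical helper: returns acc + addend using one addl. */
int add_matching(int acc, int addend)
{
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is the output register for acc; %1 is constrained by "0",
       so the old value of acc is loaded into that same register;
       %2 is addend in memory. */
    __asm__ __volatile__("addl %2, %0"
                         : "=r" (acc)
                         : "0" (acc), "m" (addend));
#else
    acc += addend;  /* fallback for non-x86 targets */
#endif
    return acc;
}
```

add_matching(2, 3) returns 5. The pair "=r" plus "0" describes the same read/write behaviour that a single "+r" operand would.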

Register clobber description

Usually a program is written in a single language, either a high-level language or assembly language. A high-level language is compiled in the following steps:

Preprocessing
↓
Compilation
↓
Assembly
↓
Linking

Here we only care about the second step, compilation (converting C code into assembly code). When all the code is written in a high-level language, the compiler knows what every statement does, and during the conversion it alone decides how registers are allocated and used. It can therefore make sure that register uses do not conflict, and it can also use registers as a cache for variables, because register access is much faster than memory access. When only assembly language is used, the programmer controls the registers, and the programmer alone is responsible for using them correctly.

Mixing the two languages is more complicated, because embedded assembly code can use registers directly, and during conversion the compiler does not check which registers the embedded assembly code uses. Such a check is difficult: some instructions modify registers implicitly, and sometimes the embedded assembly code calls other subroutines that modify registers as well. A mechanism is therefore needed to tell the compiler which registers we use (the programmer knows which registers the embedded assembly code uses); otherwise the compiler's own use of these registers may lead to errors. The clobber description serves this purpose. Of course, registers that the compiler itself allocates for operands constrained as "r" or "g" in the input and output sections need not be listed in the clobber description, because the compiler already knows about them.
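An instruction that modifies a register implicitly is easy to find: mull multiplies eax by its operand and writes the 64-bit product to edx:eax, so edx changes even though it never appears in the template. The sketch below lists it in the clobber description. The name mul_low is hypothetical; the assembly assumes x86 with GCC, with a C fallback elsewhere.

```c
/* Hypothetical helper: returns the low 32 bits of a * b. */
unsigned int mul_low(unsigned int a, unsigned int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* eax holds a on entry and the low half of the product on exit
       ("+a"). mull also overwrites edx with the high half, which GCC
       cannot see from the template, so "edx" is declared clobbered. */
    __asm__("mull %1" : "+a" (a) : "r" (b) : "edx");
#else
    a *= b;  /* fallback for non-x86 targets */
#endif
    return a;
}
```

mul_low(6, 7) returns 42. Without the "edx" entry, GCC would be free to keep b or some unrelated live value in edx across the statement.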

Let's look at the example below to see why we must tell GCC about registers that the embedded assembly code uses implicitly (implicit in the sense that GCC does not know about them).

Embedded assembly instructions may refer to some registers directly. In AT&T-syntax assembly, register names are prefixed with "%". To keep this "%" in the generated assembly, a register reference in an asm statement must use "%%" as the prefix of the register name. The reason is that "%" plays the same role in asm statements as "\" does in C, so "%%" becomes "%" after conversion.
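As a brief aside on the "%%" rule alone, the sketch below names eax literally in the template while also binding the C variable to eax with the "a" constraint, so no clobber entry is needed. The name swap_halves is hypothetical; the assembly assumes x86 with GCC, with a C fallback elsewhere.

```c
/* Hypothetical helper: swaps the two 16-bit halves of v. */
unsigned int swap_halves(unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "%%eax" is emitted as %eax. The "+a" constraint places v in
       eax before the instruction and reads it back afterwards. */
    __asm__("roll $16, %%eax" : "+a" (v));
#else
    v = (v << 16) | (v >> 16);  /* fallback for non-x86 targets */
#endif
    return v;
}
```

swap_halves(0x12345678) returns 0x56781234. Note that "$16" needs no doubling; only "%" is special in the template.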

 

int main(void)
{
    int input, output, temp;
    input = 1;
    __asm__ __volatile__("movl $0, %%eax;\n\t"
                         "movl %%eax, %1;\n\t"
                         "movl %2, %%eax;\n\t"
                         "movl %%eax, %0;"
                         : "=m" (output), "=m" (temp)  /* output */
                         : "r" (input)                 /* input */
                         );
    return 0;
}
 

This code uses %eax as a temporary register. It is equivalent to the C code "temp = 0; output = input;".
The generated assembly code is as follows:

movl $1, -4(%ebp)
movl -4(%ebp), %eax
#APP
movl $0, %eax;
movl %eax, -12(%ebp);
movl %eax, %eax;
movl %eax, -8(%ebp);
#NO_APP

Evidently GCC also allocated %eax to input. The two uses conflict, and output always ends up as 0 instead of the value of input.

The code after adding the clobber description:

int main(void)
{
    int input, output, temp;
    input = 1;
    __asm__ __volatile__
        ("movl $0, %%eax;\n\t"
         "movl %%eax, %1;\n\t"
         "movl %2, %%eax;\n\t"
         "movl %%eax, %0;"
         : "=m" (output), "=m" (temp)  /* output */
         : "r" (input)                 /* input */
         : "eax");                     /* clobber description */

    return 0;
}

The corresponding assembly code:

movl $1, -4(%ebp)
movl -4(%ebp), %edx
#APP
movl $0, %eax;
movl %eax, -12(%ebp);
movl %edx, %eax;
movl %eax, -8(%ebp);
#NO_APP

Through the clobber description, GCC learns that %eax is in use, so it allocates %edx to input. When using embedded assembly, remember: tell GCC as much as you can, to prevent errors.
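The corrected statement can also be wrapped in a function whose result is checkable at run time. This is a sketch: the name copy_and_clear and its parameters are hypothetical, the assembly assumes x86 with GCC, and other targets use a C fallback.

```c
/* Hypothetical helper: performs *temp = 0; *output = input;
   using eax as a scratch register inside the asm statement. */
void copy_and_clear(int input, int *output, int *temp)
{
    int out, tmp;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("movl $0, %%eax\n\t"
                         "movl %%eax, %1\n\t"
                         "movl %2, %%eax\n\t"
                         "movl %%eax, %0"
                         : "=m" (out), "=m" (tmp)  /* output */
                         : "r" (input)             /* input */
                         : "eax");                 /* eax is overwritten */
#else
    tmp = 0;      /* fallback for non-x86 targets */
    out = input;
#endif
    *output = out;
    *temp = tmp;
}
```

After copy_and_clear(5, &o, &t), o is 5 and t is 0. Because "eax" is listed as clobbered, GCC will not choose eax as the register for input.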

 

This article comes from a CSDN blog. If you repost it, please credit the source: http://blog.csdn.net/littlehedgehog/archive/2008/04/08/2259665.aspx
