ATPCS and inline assembly: register usage rules for function calls on ARM processors

Source: Internet
Author: User

I started learning ARM assembly to optimize deinterlacing on an ARM CPU, and I did not understand the rules for passing parameters in ARM assembly, so I am recording them here.

Original link:

Http://lli_njupt.0fees.net/ar01s05.html

5. ATPCS and inline assembly

ATPCS (ARM-Thumb Procedure Call Standard) is the basic set of rules for subroutine calls in ARM and Thumb programs. It covers register usage, data stack usage and parameter passing.

5.1. ARM registers

Before looking at the register usage rules, let us first look at the ARM registers themselves.

Figure: ARM registers and their corresponding assembly names


The abbreviations are as follows:
-R: register
-PC: program counter
-CPSR: current program status register
-SPSR: saved program status register
-SP: stack pointer
-LR: link register
-SB: static base register
-SL: stack limit register
-FP: frame pointer
-IP: intra-procedure-call scratch register

ARM has 37 registers in total and can operate in 7 different modes. Counting them according to the figure above: R0-R7 are shared by all modes, giving 8. Among the banked registers, R8-R12 have a separate set in fast interrupt (FIQ) mode and are shared by all other modes, giving 10. R13 and R14 are shared by User and System modes, while each of the other modes has its own pair, giving 2 × 7 - 2 = 12. R15 and the CPSR add 2 more. Every mode except User and System has one SPSR, giving 5.

So the total is 8 + 10 + 12 + 2 + 5 = 37. The assembly names reflect the conventional use of each register. Note that the assembler is case-insensitive both for register names (such as R0) and for ATPCS alias names (such as a1).
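The count above can be checked with a small model. The C sketch below is only an illustration: the mode enum and the bank_of() helper are hypothetical names that encode the banking rules just described for the classic 7-mode ARM (later extensions such as a Monitor mode are not modeled), and count_physical_registers() counts the distinct physical registers those rules imply.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of classic ARM register banking (7 modes). */
enum { USR, SYS, FIQ, IRQ, SVC, ABT, UND, NUM_MODES };

/* Which physical copy of general register `reg` a mode sees:
 * 0 (USR) = the copy shared with User mode, otherwise the owning mode. */
static int bank_of(int mode, int reg)
{
    assert(reg >= 0 && reg <= 15);
    if (reg <= 7 || reg == 15)
        return USR;                      /* R0-R7 and PC: one copy for all modes */
    if (reg <= 12)
        return mode == FIQ ? FIQ : USR;  /* R8-R12: only FIQ has its own copies */
    return mode == SYS ? USR : mode;     /* R13-R14: User/System share, others banked */
}

/* Distinct physical registers: R0-R15 across all modes, plus CPSR and SPSRs. */
static int count_physical_registers(void)
{
    int total = 0;
    for (int reg = 0; reg <= 15; reg++) {
        int seen[NUM_MODES];
        memset(seen, 0, sizeof seen);
        for (int mode = 0; mode < NUM_MODES; mode++) {
            int b = bank_of(mode, reg);
            if (!seen[b]) {
                seen[b] = 1;
                total++;
            }
        }
    }
    total += 1;             /* CPSR, shared by all modes */
    total += NUM_MODES - 2; /* one SPSR per exception mode (not User/System) */
    return total;
}
```

Under these assumptions count_physical_registers() gives 8 (R0-R7) + 10 + 12 + 1 (PC) + 1 (CPSR) + 5 = 37, matching the tally in the text.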

Figure 29. Program status register (CPSR)


Status bits:
N: sign of the result; 1 = negative, 0 = positive or zero.
Z: zero flag; 1 = the result is zero, 0 = the result is non-zero.
C: carry flag, set in one of four ways:
-For additions (including the compare instruction CMN): C=1 if the operation produces a carry (unsigned overflow), otherwise C=0.
-For subtractions (including the compare instruction CMP): C=0 if the operation produces a borrow, otherwise C=1.
-For other instructions that include a shift: C is the last bit shifted out of the value.
-For other instructions: C usually does not change.
V: overflow flag; for additions and subtractions, when the operands and result are treated as two's-complement signed numbers, V=1 means the result overflowed into the sign bit. For other instructions V usually does not change.
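To make the carry and overflow rules concrete, here is a C sketch that computes NZCV the way an ADD/CMN or SUB/CMP would set them. The struct and function names are hypothetical, and it models only the 32-bit flag arithmetic described above, not instruction execution.

```c
#include <assert.h>
#include <stdint.h>

/* NZCV condition flags (a sketch, not an emulator). */
struct nzcv { int n, z, c, v; };

/* N and Z depend only on the 32-bit result. */
static struct nzcv flags_from_result(uint32_t r)
{
    struct nzcv f = { (int)(r >> 31), r == 0, 0, 0 };
    return f;
}

/* ADD/CMN: C = unsigned carry out, V = signed overflow. */
static struct nzcv add_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    struct nzcv f = flags_from_result(r);
    f.c = r < a;                               /* wrapped around => carry */
    f.v = (int)(((~(a ^ b) & (a ^ r)) >> 31) & 1u); /* same-sign inputs, sign flipped */
    return f;
}

/* SUB/CMP: C = 1 when there is NO borrow (a >= b unsigned), 0 on borrow. */
static struct nzcv sub_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct nzcv f = flags_from_result(r);
    f.c = a >= b;
    f.v = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);  /* different-sign inputs, sign of a lost */
    return f;
}
```

For example, sub_flags(1, 1) yields Z=1 and C=1 (no borrow), sub_flags(0, 1) yields N=1 and C=0 (a borrow), and add_flags(0x7FFFFFFF, 1) sets V because two positive operands produced a negative result.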

Control bits:
I: IRQ disable bit; when I=1, IRQ interrupts are disabled, otherwise they are enabled.
F: FIQ disable bit; when F=1, FIQ interrupts are disabled, otherwise they are enabled.
T: state bit; indicates whether the processor is executing ARM (T=0) or Thumb (T=1) instructions.
M[4:0]: mode bits; select which of the 7 operating modes the processor is in.

Table 9. The 7 operating modes of the ARM CPU

Mode                 Abbreviation  M[4:0]   Description
User                 usr           0b10000  Normal program execution mode
Fast interrupt       fiq           0b10001  Fast interrupt handling
Interrupt            irq           0b10010  General-purpose interrupt handling
Supervisor           svc           0b10011  Protected mode used by the operating system
Abort                abt           0b10111  Entered when a data access or instruction prefetch aborts
Undefined            und           0b11011  Entered when an undefined instruction is executed
System               sys           0b11111  Runs privileged operating system tasks
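Since the mode is just a 5-bit field, decoding a CPSR value is a matter of masking. This C sketch uses hypothetical helper names: it maps M[4:0] to the abbreviations in Table 9 and extracts individual status and control bits. For instance, the cpsr value 0x60000010 that GDB reports in section 5.3 decodes to User mode with Z=1 and C=1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map the mode field M[4:0] to its abbreviation from Table 9. */
static const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x17: return "abt";
    case 0x1B: return "und";
    case 0x1F: return "sys";
    default:   return NULL;   /* not one of the 7 modes in Table 9 */
    }
}

/* 1 if cpsr selects the mode with abbreviation `abbr`. */
static int cpsr_in_mode(uint32_t cpsr, const char *abbr)
{
    const char *m = cpsr_mode_name(cpsr);
    return m != NULL && strcmp(m, abbr) == 0;
}

/* Extract one bit: N=31, Z=30, C=29, V=28, I=7, F=6, T=5. */
static int cpsr_bit(uint32_t cpsr, unsigned bit)
{
    assert(bit < 32);
    return (int)((cpsr >> bit) & 1u);
}

/* Every valid mode except User is privileged. */
static int cpsr_is_privileged(uint32_t cpsr)
{
    return cpsr_mode_name(cpsr) != NULL && !cpsr_in_mode(cpsr, "usr");
}
```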


The mode can be switched with assembly instructions, and when an interrupt or exception occurs the CPU automatically enters the corresponding mode. All modes except User mode are privileged, and the five privileged modes other than System mode are called exception modes.

5.2. Register usage rules

Parameters are passed to a subroutine in registers R0-R3[2]; in this role they are named a1-a4. The called subroutine does not need to restore R0-R3 before returning. If there are more than four parameter words, the remaining words are passed on the data stack. The earlier Hello world! example, for instance, passes the standard output file descriptor, the string pointer and the string length to the system call in these registers.

Inside a subroutine, registers R4-R11 hold local variables; in this role they are named v1-v8. If a subroutine uses any of v1-v8, it must save their values on entry and restore them before returning. In Thumb[3] programs, usually only R4-R7 are used for local variables. In addition, R9, R10 and R11 have special roles and are named SB (static base register), SL (stack limit register) and FP (frame pointer). R12 is a scratch register for subroutine calls, named IP: for example, it can hold a temporary copy of the stack pointer that the function uses to unwind its stack frame on return. Linkage code between subroutines often uses it this way, and a called function does not have to restore R12 before returning.

R13 is the stack pointer, named SP. It must not be used for any other purpose in a subroutine, and its value on exit from a subroutine must equal its value on entry.

R14 is the link register, named LR, and holds the subroutine's return address. If the subroutine saves the return address elsewhere (for example on the stack), R14 can be used for other purposes, but the return address must be restored before the subroutine returns.

R15 is the program counter, named PC, and cannot be used for any other purpose. In an interrupt handler all registers must be preserved; the compiler automatically saves r4-r11.

The ATPCS register roles are built into the ARM compiler and assembler; that is, they are fixed by the toolchain and cannot be changed.
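The rules in this section can be summarized as a lookup table. The C sketch below is one hypothetical way to encode it: reg_by_name() maps an ATPCS name to a register number, and regs_to_save() applies the v1-v8 rule to a bitmask of the registers a subroutine uses, returning the ones it must save on entry and restore before returning. LR and PC are marked as special because their handling is described separately above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding of the ATPCS register roles described in 5.2. */
enum role { SCRATCH, PRESERVED, SPECIAL };

struct reg_role {
    const char *alias;    /* ATPCS name */
    const char *special;  /* extra name for r9-r11, else NULL */
    enum role role;
};

/* Indexed by register number r0-r15. */
static const struct reg_role atpcs_regs[16] = {
    {"a1", NULL, SCRATCH},   {"a2", NULL, SCRATCH},
    {"a3", NULL, SCRATCH},   {"a4", NULL, SCRATCH},
    {"v1", NULL, PRESERVED}, {"v2", NULL, PRESERVED},
    {"v3", NULL, PRESERVED}, {"v4", NULL, PRESERVED},
    {"v5", NULL, PRESERVED}, {"v6", "sb", PRESERVED},
    {"v7", "sl", PRESERVED}, {"v8", "fp", PRESERVED},
    {"ip", NULL, SCRATCH},   /* r12: need not be restored */
    {"sp", NULL, PRESERVED}, /* r13: same value on entry and exit */
    {"lr", NULL, SPECIAL},   /* r14: return address */
    {"pc", NULL, SPECIAL},   /* r15: program counter */
};

/* Register number for an ATPCS name ("a1", "fp", ...), or -1 if unknown. */
static int reg_by_name(const char *name)
{
    assert(name != NULL);
    for (int r = 0; r < 16; r++) {
        const struct reg_role *e = &atpcs_regs[r];
        if (strcmp(e->alias, name) == 0 ||
            (e->special != NULL && strcmp(e->special, name) == 0))
            return r;
    }
    return -1;
}

/* used_mask: bit n set if the subroutine writes rn. Returns the v1-v8
 * (r4-r11) registers that must be saved on entry and restored on exit. */
static unsigned regs_to_save(unsigned used_mask)
{
    return used_mask & 0x0FF0u;   /* bits 4..11 */
}
```

For the strcopy routine in this section, which uses r0, r1, r2 and r4, regs_to_save(0x17) returns 0x10, that is, only r4, which matches its push {r4}.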

Here, for the first time, we write a user-space example in both assembly and C that uses some of the registers described above. hello.c contains a standard GNU C main function, which calls strcopy, a subroutine written in assembly that copies the source string to the destination string and returns the number of characters copied. The contents of hello.c are as follows:

#include <stdio.h>

/* Implemented in strcopy.S; returns the number of bytes copied */
extern int strcopy(char *dst, const char *src);

int main(void)
{
        int ret = 0;
        const char *src = "Hello world!";
        char dst[] = "World hello!";   /* same length as src, so the copy fits */

        printf("dst string is \"%s\" and src string is \"%s\"\n", dst, src);

        ret = strcopy(dst, src);

        printf("After copying %d chars and now dst string is \"%s\"\n", ret, dst);

        return 0;
}

A C program that calls an assembly subroutine first declares it with extern. The declared name is the symbol that the assembly file exports with the .global directive, which tells the linker the subroutine's address; it plays the same role as a function name in C. The number of formal parameters in the declaration should match the number of arguments the assembly routine expects, and the arguments are passed according to the ATPCS rules. Here two pointer arguments are passed, and strcopy.S accesses them through registers r0 and r1.

The contents of strcopy.S are as follows:

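        .section .text
        .align 2

        .global strcopy
strcopy:
        /* r4 counts the copied bytes and becomes the return value */
        push    {r4}            @ r4 (v1) is callee-saved, so preserve it
        mov     r4, #0
1:
        ldrb    r2, [r1], #1    @ load a byte from src, then advance r1
        strb    r2, [r0], #1    @ store it to dst, then advance r0
        cmp     r2, #0          @ was it the terminating NUL?
        add     r4, r4, #1      @ count it (ADD without S leaves the flags intact)
        bne     1b              @ loop until the NUL has been copied

        mov     r0, r4          @ the return value goes in r0
        pop     {r4}            @ restore the caller's r4

        mov     pc, lr          @ return to the instruction after the call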

strcopy.S uses register R2 only inside the subroutine, so it does not need to be saved and restored. R4 serves as a counter of the copied characters: because it is callee-saved, it is pushed onto the stack at the start of the routine and popped at the end, and its count is returned in R0. The other important register is LR. When the C program calls the function, LR is loaded with the address of the instruction to execute after the call, so when the subroutine finishes it copies LR into the program counter PC.
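For comparison, here is a portable C model of what strcopy.S does, written as a hypothetical strcopy_c() for illustration only; each statement is annotated with the instruction it corresponds to. Because the counter is incremented before the loop test, the terminating NUL is counted too, so copying "Hello world!" (12 visible characters) returns 13.

```c
#include <assert.h>
#include <stddef.h>

/* C model of strcopy.S: copy bytes from src to dst up to and including the
 * terminating NUL, and return how many bytes were copied (the final r4). */
static int strcopy_c(char *dst, const char *src)
{
    int count = 0;              /* plays the role of r4 */
    char c;

    assert(dst != NULL && src != NULL);
    do {
        c = *src++;             /* ldrb r2, [r1], #1 */
        *dst++ = c;             /* strb r2, [r0], #1 */
        count++;                /* add  r4, r4, #1 */
    } while (c != '\0');        /* cmp r2, #0 ; bne 1b */
    return count;               /* mov r0, r4 */
}
```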

Try building it in separate steps, which gives a better understanding of what GCC does during compilation:

# arm-linux-gcc -c hello.c -o hello.o
# arm-linux-as strcopy.S -o strcopy.o
# arm-linux-ld hello.o strcopy.o -o hello
arm-linux-ld: warning: cannot find entry symbol _start; defaulting to 00008080
hello.o: In function `main':
hello.c:(.text+0x38): undefined reference to `memcpy'
hello.c:(.text+0x4c): undefined reference to `printf'
hello.c:(.text+0x78): undefined reference to `printf'

arm-linux-ld cannot find the required startup files, libraries and their paths on its own; GCC takes care of integrating them for us. The working command is shown below; to print every step of the build process, add the -v option:

# arm-linux-gcc hello.o strcopy.o -o hello -v
Using built-in specs.
Target: arm-unknown-linux-gnueabi
Configured with: ...
Thread model: posix
gcc version 4.2.2
 /usr/local/arm/4.2.2-eabi/usr/bin-ccache/../libexec/gcc/arm-unknown-linux-gnueabi/4.2.2/collect2 --sysroot=/usr/local/arm/4.2.2-eabi/ --eh-frame-hdr -dynamic-linker /lib/ld-linux.so.3 -X -m armelf_linux_eabi -o hello /usr/local/arm/4.2.2-eabi//usr/lib/crt1.o /usr/local/arm/4.2.2-eabi//usr/lib/crti.o /usr/local/arm/4.2.2-eabi/usr/bin-ccache/../lib/gcc/arm-unknown-linux-gnueabi/4.2.2/crtbegin.o -L/usr/local/arm/4.2.2-eabi/usr/bin-ccache/../lib/gcc/arm-unknown-linux-gnueabi/4.2.2 -L/usr/local/arm/4.2.2-eabi/usr/bin-ccache/../lib/gcc -L/usr/local/arm/4.2.2-eabi//lib -L/usr/local/arm/4.2.2-eabi//usr/lib hello.o strcopy.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/local/arm/4.2.2-eabi/usr/bin-ccache/../lib/gcc/arm-unknown-linux-gnueabi/4.2.2/crtend.o /usr/local/arm/4.2.2-eabi//usr/lib/crtn.o
 

Note the collect2 command: it is similar to ld, but it first gathers the required .o files and libraries from the relevant directories and then links them together. Running the resulting program gives:

# ./hello
dst string is "World hello!" and src string is "Hello world!"
After copying 13 chars and now dst string is "Hello world!"

5.3. Using the data stack

Figure 30. Classification of data stacks


Depending on where the stack pointer points, stacks are of two kinds: full and empty. If the stack pointer points to the top element of the stack (the last item pushed), the stack is called full; if it points to the free location next to the top element, the stack is called empty.

Depending on the direction of growth, stacks are also of two kinds: ascending and descending. A stack that grows toward lower memory addresses is called descending; one that grows toward higher memory addresses is called ascending.

Combining the two classifications gives four types of data stack: FD (full descending), ED (empty descending), FA (full ascending) and EA (empty ascending). ATPCS specifies an FD stack, with stack operations 8-byte aligned. The PUSH and POP instructions use a full descending stack: PUSH decrements the address before each transfer, and POP increments it after each transfer. The corresponding multiple-register transfer instructions are STMFD and LDMFD.
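The full descending behavior can be modeled with an array and an index. In this C sketch (hypothetical names, word-sized slots, an array index standing in for a byte address), fd_push() decrements before storing, like PUSH/STMFD, and fd_pop() loads before incrementing, like POP/LDMFD. fd_push_list() mimics storing a register list, where the lowest-numbered register ends up at the lowest address, that is, at the top of the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Toy full descending (FD) stack: sp indexes the last pushed word and the
 * stack grows toward lower indices (lower addresses). */
#define STACK_WORDS 16

struct fd_stack {
    uint32_t mem[STACK_WORDS];
    int sp;                        /* STACK_WORDS when the stack is empty */
};

static void fd_init(struct fd_stack *s) { s->sp = STACK_WORDS; }

/* PUSH / STMFD: decrement first, then store. */
static void fd_push(struct fd_stack *s, uint32_t v)
{
    assert(s->sp > 0);             /* would grow past the stack limit */
    s->mem[--s->sp] = v;
}

/* POP / LDMFD: load first, then increment. */
static uint32_t fd_pop(struct fd_stack *s)
{
    assert(s->sp < STACK_WORDS);   /* nothing left to pop */
    return s->mem[s->sp++];
}

/* Like push {rA, rB, ...}: vals[0] (lowest register) ends up on top. */
static void fd_push_list(struct fd_stack *s, const uint32_t *vals, int n)
{
    for (int i = n - 1; i >= 0; i--)
        fd_push(s, vals[i]);
}
```

After fd_push_list() with {5, 6}, the top word is 5 and the next one is 6, which is the same layout the GDB session later in this section shows for the fifth and sixth arguments of sum.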

Figure 31. ATPCS data stack


The data stack is described by the stack pointer SP, which points either to the top of the stack or to the next writable address beyond the top. The stack base is the highest (or lowest) address of the data stack; because the ATPCS stack is FD, the first memory location actually occupied by stack data is the one just below the base address. The stack limit SL is the lowest (or highest) memory address the stack may use. The used part of the stack is the region between the stack base and the stack pointer, including the location SP points to but excluding the location at the base address. A stack frame (SF) is the area of the data stack allocated to a subroutine for saving registers and local variables.

When a function is called with more than four arguments, the data stack is used to pass the extra arguments; its other important use is storing local variables. ATPCS defines the following rules for passing parameters. Depending on whether the number of parameters is fixed, subroutines are divided into those with a fixed number of parameters and those with a variable number of parameters, and the passing rules for these two kinds differ.

Parameter passing for subroutines with a variable number of parameters: when there are no more than four parameter words, registers R0-R3 are used; when there are more, the data stack is used as well. All parameters are treated as word data stored in consecutive memory locations. These words are transferred to R0, R1, R2 and R3 in order; if there are more than four words, the remaining ones are passed on the data stack, pushed in reverse order so that the last word is pushed first. Under these rules a floating-point parameter may be passed entirely in registers, entirely on the data stack, or half in registers and half on the data stack.
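The word-by-word rule above can be sketched in C as follows. The names are hypothetical and argument sizes are given in 32-bit words; the sketch follows the ATPCS description literally, including the case where a 64-bit argument straddles r3 and the stack. Note that the later AAPCS, used by EABI toolchains such as the arm-unknown-linux-gnueabi one in section 5.2, starts 64-bit arguments at an even register, so a double is not split this way there.

```c
#include <assert.h>

/* Sketch of the variable-argument rule: arguments form consecutive words;
 * words 0-3 go to r0-r3 and the rest go to the data stack. */
enum where { IN_REGS, ON_STACK, SPLIT };

struct placement {
    enum where where;
    int first_word;            /* index of the argument's first word */
};

/* size_words[i]: size of argument i in words (1 for int, 2 for double). */
static void place_args(const int *size_words, int nargs, struct placement *out)
{
    int word = 0;
    for (int i = 0; i < nargs; i++) {
        int first = word;
        int last = word + size_words[i] - 1;
        assert(size_words[i] >= 1);
        out[i].first_word = first;
        if (last <= 3)
            out[i].where = IN_REGS;   /* entirely within r0-r3 */
        else if (first >= 4)
            out[i].where = ON_STACK;  /* entirely on the stack */
        else
            out[i].where = SPLIT;     /* starts in r3, continues on the stack */
        word = last + 1;
    }
}
```

For the argument list int, int, int, double, int, the double occupies words 3 and 4 and is reported as SPLIT (r3 plus the first stack word), and the final int lands on the stack as word 5. For six ints, as in the sum example, words 4 and 5 (e and f) are on the stack.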

Parameter passing for subroutines with a fixed number of parameters differs from the rule for a variable number of parameters. If the system includes floating-point hardware, floating-point parameters are passed as follows: the floating-point parameters are processed in order, and each one is allocated FP registers, namely the lowest-numbered contiguous set of free FP registers that can hold it. The integer parameters are passed in R0-R3 first, and any remaining parameters are passed on the data stack.
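The "lowest-numbered contiguous set of free FP registers" rule can be sketched as a small allocator. This is an illustration under an explicit assumption that the text does not state: a VFP-style bank of 16 single-precision registers s0-s15, where a double needs two consecutive registers starting at an even number (because VFP d-registers overlay pairs of s-registers). The function name alloc_fp() is hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: VFP-style bank s0-s15; a double uses an even-aligned pair. */
#define NUM_SREGS 16

/* size: 1 = float, 2 = double. Returns the first s-register allocated,
 * or -1 when no suitable block is free (the argument then goes on the stack). */
static int alloc_fp(int used[NUM_SREGS], int size)
{
    assert(size == 1 || size == 2);
    /* Stepping by `size` tries every register for floats, even ones for doubles. */
    for (int r = 0; r + size <= NUM_SREGS; r += size) {
        int free_block = 1;
        for (int k = 0; k < size; k++)
            if (used[r + k])
                free_block = 0;
        if (free_block) {
            for (int k = 0; k < size; k++)
                used[r + k] = 1;
            return r;
        }
    }
    return -1;
}
```

With parameters float, double, float, this gives s0, then s2-s3, then s1: the second float fills the gap left by the aligned double, because the allocator always picks the lowest-numbered free block.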

An example of passing parameters on the stack is an assembly routine that computes and returns the sum of six integers, with the result printed by printf in C. It consists of two files: prt.c and sum.S.

#include <stdio.h>

/* Implemented in sum.S: a-d arrive in r0-r3, e and f on the stack */
extern int sum(int a, int b, int c, int d, int e, int f);

int main(void)
{
        printf("Sum is:%d\n", sum(1, 2, 3, 4, 5, 6));
        return 0;
}

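        .section .text
        .align 2
        .global sum
sum:
        add     r0, r0, r1      @ a + b
        add     r0, r0, r2      @ + c
        add     r0, r0, r3      @ + d
        ldmia   sp, {r1, r2}    @ e = [sp], f = [sp, #4]; no writeback, so SP
                                @ is unchanged as the ATPCS requires
        add     r0, r0, r1      @ + e
        add     r0, r0, r2      @ + f
        mov     pc, lr          @ return with the sum in r0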

To verify that the parameters really are passed on the stack, step through the program in GDB:

(gdb) b main                      // set breakpoints at main and sum
Note: breakpoint 1 also set at pc 0x8394.
Breakpoint 2 at 0x8394
(gdb) b sum
Breakpoint 3 at 0x8368
(gdb) r
Starting program: /tmp/asm
                                  // register state before main runs
Breakpoint 1, 0x00008394 in main ()
(gdb) show reg
Undefined show command: "reg".  Try "help show".
(gdb) info reg
r0             0x1              1
r1             0xbedb3e94       3202039444
r2             0xbedb3e9c       3202039452
r3             0x0              0
...
sp             0xbedb3d30       0xbedb3d30
lr             0x4003b004       1073983492
pc             0x8394           0x8394 <main+16>
fps            0x1001000        16781312
cpsr           0x60000010       1610612752
(gdb) n
Single stepping until exit from function main,
which has no line number information.
                                  // register state on entry to sum: r0-r3 hold 1-4
Breakpoint 3, 0x00008368 in sum ()
(gdb) info reg
r0             0x1              1
r1             0x2              2
r2             0x3              3
r3             0x4              4
...
sp             0xbedb3d30       0xbedb3d30
lr             0x83b8           33720
pc             0x8368           0x8368 <sum>
fps            0x1001000        16781312
cpsr           0x60000010       1610612752
(gdb) x/2 $sp                     // the top of the stack holds 5; 6 was pushed first
0xbedb3d30:     5       6
(gdb) c
Sum is:21

5.4. Return values and registers

Rules for returning subroutine results: a 32-bit integer result is returned in R0, and a 64-bit integer result in R0 and R1. A floating-point result can be returned in the floating-point unit's register F0, D0 or S0, and a compound floating-point result (such as a complex number) in F0-Fn or D0-Dn. Results that need more bits must be returned in memory.
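How a 64-bit integer is spread over R0 and R1 can be modeled in C. This sketch assumes a little-endian ARM target, where R0 receives the low 32 bits and R1 the high 32 bits (on big-endian targets the order of the two words is swapped); struct reg_pair and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: little-endian ARM, so r0 = low word, r1 = high word. */
struct reg_pair { uint32_t r0, r1; };

/* What the callee leaves in r0/r1 when returning v. */
static struct reg_pair split_u64(uint64_t v)
{
    struct reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* How the caller reassembles the 64-bit result from r0/r1. */
static uint64_t join_u64(struct reg_pair p)
{
    return ((uint64_t)p.r1 << 32) | p.r0;
}
```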

5.5. Inline assembly

ATPCS lets C and assembly code call each other, but when only a small amount of assembly is needed, or when C code must read or set register values, inline assembly is used instead. Inline assembly embedded in C supports most ARM and Thumb instructions, but it is used somewhat differently from assembly in a separate source file and has some restrictions: the PC register cannot be written directly, so program jumps must use the B or BL instructions; when physical registers are used, overly complex C expressions should be avoided to prevent physical register conflicts; and C variables cannot be referenced directly by name. The ARM GCC inline assembler syntax is as follows:

__asm__(code : output operand list : input operand list : clobber list);
/* that is: __asm__(assembly template : output part : input part : clobber part) */
The statement has four parts, the assembly template, the output part, the input part and the clobber (modified) part, separated by ":". The template is required; the other three parts are optional. If a later part is used while an earlier one is empty, the ":" must still be written and the empty part left blank. For example:
__asm__ __volatile__("cpsid i" ::: "memory");   /* ARMv6 and later: disable IRQs */
5.5.1. Assembly statement template

The template is a sequence of assembly statements separated by ";", "\n" or "\n\t". For readability, the statements can also be split across several source lines:
__asm__("mov r0, r0; mov r0, r0");   /* separated by ";" */
__asm__("mov r0, r0\nmov r0, r0");   /* separated by "\n" */
...
__asm__("mov r0, r0\n\t"             /* split over multiple lines */
        "mov r0, r0");
Instruction operands refer to C expressions through placeholders numbered in the order the operands are listed, up to 10 operands named %0 through %9 (newer GCC releases allow up to 30):
"mov %0, %1"
The operand a placeholder refers to is always treated as a 32-bit (4-byte) value, although the instruction may operate on it as a word or as a byte; by default the low word or low byte is used. On x86, a letter inserted between % and the operand number selects a byte ("b" for the low byte, "h" for the high byte, e.g. %h1); on ARM the modifiers differ, and %H1, for example, names the higher-numbered register of the register pair that holds a 64-bit operand 1.

5.5.2. Output part

The output part describes the output operands. Operand descriptors are separated by commas, and each consists of a constraint string and a C variable. The constraint string of every output operand must contain "=" to mark it as an output.
"=r" (result)    /* "r" is the constraint and result is a C variable */
5.5.3. Input part

The input part describes the input operands. Operand descriptors are separated by commas, and each consists of a constraint string and a C expression or variable.
"r" (value)      /* "r" is the constraint and value is a C variable */
5.5.4. Clobber part

The clobber (modified) part often contains "memory", which tells the compiler that memory contents have changed when the statement completes: if a register held a value previously loaded from memory, that memory location may since have changed, so the value must be reloaded. The Linux kernel implements its compiler memory barrier with this clobber, which prevents the compiler from reordering memory accesses across the barrier. From include/linux/compiler-gcc.h:
#define barrier() __asm__ __volatile__("": : :"memory")
A complete example follows. It moves the value of the input variable into the output variable and also acts as a memory barrier; __volatile__ prevents the compiler from optimizing the assembly code away or moving it.
/* %0 refers to "=r" (output) and %1 refers to "r" (input) */
__asm__ __volatile__("mov %0, %1" : "=r" (output) : "r" (input) : "memory");
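The statement above can be wrapped in a function so it is easy to try. In this sketch, move_with_barrier() is a hypothetical name; on ARM (__arm__ defined) it executes the inline mov, and on other hosts it falls back to a plain assignment followed by the same empty "memory"-clobber statement used by the kernel's barrier(), so the file still compiles with GCC or Clang anywhere.

```c
#include <assert.h>

/* Copy input to output through inline assembly on ARM; plain C elsewhere. */
static int move_with_barrier(int input)
{
    int output;
#if defined(__arm__)
    /* %0 = output ("=r"), %1 = input ("r"), "memory" = compiler barrier */
    __asm__ __volatile__("mov %0, %1" : "=r"(output) : "r"(input) : "memory");
#else
    output = input;
    __asm__ __volatile__("" : : : "memory");   /* barrier only, like barrier() */
#endif
    return output;
}
```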
5.5.5. Constraint characters

Constraint characters tell the compiler how the C expressions that follow relate to the instruction operands. Each assembly instruction accepts only certain kinds of operands. A branch instruction, for example, expects a target address, but not every memory address is valid, because the final opcode only holds a 24-bit offset (in ARM state a word offset, so the target must lie within about ±32 MB of the current PC). The branch-and-exchange instruction BX, by contrast, expects a register holding the full 32-bit target address. In both cases the operand passed from C may be the same function pointer. So whenever constants, pointers or variables are passed to inline assembly, the compiler must know how they should be represented in the assembly code.

For ARM processors, GCC 4 provides the constraints below. Some are common to all architectures and others are specific to ARM; see the constraints chapter of the official GCC manual for details.

Table 10. Inline assembler constraints

Constraint  Meaning
m           Memory operand: any valid memory address
o           Memory operand whose address is offsettable, i.e. adding a small constant offset still gives a valid address
V           Memory operand whose address is not offsettable: it matches "m" but not "o"
r           General-purpose register: the C value is placed in a register
I           Immediate valid in a data-processing instruction: an 8-bit value (0-255) rotated right by an even number of bits
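The "I" rule can be checked in C. In this sketch, is_arm_immediate() (a hypothetical name) tests whether a constant is an 8-bit value rotated right by an even amount, and add_255() shows an "I" operand in an ARM-state add. The asm path is assumed to run only when compiling for ARM state (__arm__ defined, __thumb__ not defined, since Thumb-2 uses a different immediate encoding); otherwise a plain C addition is used.

```c
#include <stdint.h>

/* 1 if v is encodable as an ARM data-processing immediate:
 * an 8-bit value rotated right by 0, 2, ..., 30 bits. */
static int is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* rotating left by rot undoes a rotate right by rot */
        uint32_t undone = rot == 0 ? v : (v << rot) | (v >> (32 - rot));
        if (undone <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Adds the constant 255 through an "I" operand in ARM state; plain C elsewhere. */
static unsigned add_255(unsigned x)
{
    unsigned r;
#if defined(__arm__) && !defined(__thumb__)
    __asm__("add %0, %1, %2" : "=r"(r) : "r"(x), "I"(255));
#else
    r = x + 255u;
#endif
    return r;
}
```

By this rule 255, 256 (1 rotated) and 0xFF000000 are valid immediates, while 257 and 0x1FE (0xFF shifted by an odd amount) are not, so GCC would not accept them for an "I" operand.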


5.6. Sandbox

Table 11. Memory hierarchy

Memory type      Location[a]
CPU registers    In the CPU execution unit

[a] Where the memory is located.



[2] In most cases this shows up as assembly code that uses SVC to make system calls and B/BL instructions to call C functions, and as C code that calls assembly subroutines.

[3] Thumb is not the focus of this discussion, so it is deliberately left out.
