ARM architecture and programming notes

Last Update:2018-12-03 Source: Internet

Author: User


Basic concept of memory mapping

The address generated by the ARM processor is called a virtual address. Converting a virtual address into a physical address according to a fixed set of rules is called address mapping. The physical address identifies the storage location that is actually accessed; it lies in an address range where program code can be placed.

Necessity of memory mapping control

Memory mapping control allows programs running in different storage spaces to handle exceptions. It remaps the exception vector table, which may reside in different memory regions, to the fixed addresses 0x00 ~ 0x3F, which controls where the exception vector table comes from.
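The mapping and remapping ideas above can be sketched in C. This is a minimal model, not the real MMU or remap hardware: the 1 MB section size, the table layout, and the names `map_identity`, `map_section`, and `translate` are all assumptions made for illustration.

```c
#include <stdint.h>

#define SECTION_SHIFT 20u                      /* assumed 1 MB mapping granularity */
#define SECTION_COUNT 4096u                    /* 4 GB address space / 1 MB        */
#define OFFSET_MASK   ((1u << SECTION_SHIFT) - 1u)

/* Hypothetical translation table: physical base of each 1 MB virtual section. */
static uint32_t section_base[SECTION_COUNT];

/* Start with virtual address == physical address for every section. */
void map_identity(void)
{
    for (uint32_t i = 0; i < SECTION_COUNT; i++)
        section_base[i] = i << SECTION_SHIFT;
}

/* Point the section containing va at the section containing pa. */
void map_section(uint32_t va, uint32_t pa)
{
    section_base[va >> SECTION_SHIFT] = pa & ~OFFSET_MASK;
}

/* Apply the mapping rule: look up the section base, keep the offset. */
uint32_t translate(uint32_t va)
{
    return section_base[va >> SECTION_SHIFT] | (va & OFFSET_MASK);
}
```

For example, after `map_section(0x00000000, 0x40000000)` an access to virtual address 0x18 is served from physical address 0x40000018, so a vector table stored there appears at the fixed low addresses.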

APCS (the ARM Procedure Call Standard) specifies the basic rules for subroutine calls: the rules for using registers, the rules for using the data stack, and the rules for passing parameters during a subroutine call.

Exception vector table

Each exception event has a corresponding handler. The association between them is stored at a fixed location in memory as a one-dimensional table. This table, which specifies the mapping between each exception and its handler, is called the exception vector table.
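As a rough illustration, the table can be modelled in C as an array that pairs each exception with its vector address. The vector addresses are the standard ARM ones (with the table based at 0x00); the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* One entry of the (modelled) exception vector table. */
typedef struct {
    const char *name;     /* exception type                       */
    uint32_t    vector;   /* address the PC is forced to on entry */
} exception_entry_t;

static const exception_entry_t exception_table[] = {
    { "reset",                 0x00u },
    { "undefined instruction", 0x04u },
    { "software interrupt",    0x08u },
    { "prefetch abort",        0x0Cu },
    { "data abort",            0x10u },
    /* 0x14 is reserved */
    { "IRQ",                   0x18u },
    { "FIQ",                   0x1Cu },
};

/* Return the entry for a vector address, or NULL if none is defined there. */
const exception_entry_t *find_exception(uint32_t vector)
{
    size_t n = sizeof exception_table / sizeof exception_table[0];
    for (size_t i = 0; i < n; i++)
        if (exception_table[i].vector == vector)
            return &exception_table[i];
    return NULL;
}
```

Each vector slot is only one word wide, which is why it normally holds a single jump instruction rather than the handler itself.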

How the ARM processor responds to an exception

(1) Save the current processor state, the interrupt mask bits, and the condition flags.

(2) Set the corresponding bits in the current program status register (CPSR).

(3) Set the register LR_<mode> to the return address.

(4) Set the program counter (PC) to the vector address of the exception, so that execution jumps to the corresponding exception handler.

Exception response process

Except for the reset exception, when an exception occurs the ARM core finishes the current instruction as far as possible and then automatically performs the following actions:

(1) Save the return address in LR_<mode>.

(2) Copy CPSR to SPSR_<mode>.

(3) Set the CPSR mode bits to enter the corresponding processor mode.

(4) Set bit 7 of CPSR to disable IRQ. If the exception is a fast interrupt (FIQ) or a reset, also set bit 6 of CPSR to disable FIQ.

(5) Load the vector address into the PC.

Entering an exception

1. Save the address of the next instruction in the appropriate LR.
2. Copy the CPSR to the appropriate SPSR.
3. Set the CPSR mode bits to the value corresponding to the exception type.
4. Force the PC to fetch the next instruction from the relevant exception vector.

Exiting an exception

1. Subtract an offset from the value in LR (R14) and write the result to the PC. The offset varies with the exception type.
2. Copy the SPSR back to the CPSR.
3. Clear the interrupt disable flags that were set on entry.
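The entry and exit steps can be traced with a small C model. This is only a sketch: `cpu_t` holds a single banked LR/SPSR pair standing in for LR_<mode> and SPSR_<mode>, and the function names are invented for this example. It also clears the T bit on entry, on the assumption that handlers start in ARM state.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_T_BIT     (1u << 5)   /* Thumb state  */
#define CPSR_F_BIT     (1u << 6)   /* FIQ disabled */
#define CPSR_I_BIT     (1u << 7)   /* IRQ disabled */

/* Hypothetical model of the registers involved in exception handling. */
typedef struct {
    uint32_t pc;
    uint32_t cpsr;
    uint32_t lr_exc;     /* stands in for LR_<mode>   */
    uint32_t spsr_exc;   /* stands in for SPSR_<mode> */
} cpu_t;

/* Exception entry: save state, switch mode, mask interrupts, jump to vector. */
void exception_enter(cpu_t *cpu, uint32_t mode, uint32_t vector,
                     uint32_t lr_value, int disable_fiq)
{
    cpu->lr_exc   = lr_value;                         /* save return address  */
    cpu->spsr_exc = cpu->cpsr;                        /* save old status      */
    cpu->cpsr = (cpu->cpsr & ~CPSR_MODE_MASK) | mode; /* new processor mode   */
    cpu->cpsr &= ~CPSR_T_BIT;                         /* assumed: ARM state   */
    cpu->cpsr |= CPSR_I_BIT;                          /* always disable IRQ   */
    if (disable_fiq)
        cpu->cpsr |= CPSR_F_BIT;                      /* FIQ and reset only   */
    cpu->pc = vector;                                 /* jump to the vector   */
}

/* Exception exit: PC = LR - offset, then restore CPSR from SPSR. */
void exception_return(cpu_t *cpu, uint32_t offset)
{
    cpu->pc   = cpu->lr_exc - offset;
    cpu->cpsr = cpu->spsr_exc;   /* also restores the old I and F bits */
}
```

For an IRQ, the mode value is 0x12, the vector is 0x18, and the return offset is 4. Restoring the CPSR from the SPSR is what re-enables the interrupts that were masked on entry.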

ARM processor exceptions are divided into seven types: ① data abort; ② fast interrupt request (FIQ); ③ normal interrupt request (IRQ); ④ prefetch abort; ⑤ software interrupt (SWI); ⑥ reset; ⑦ undefined instruction.

In the exception vector table, the jump to a handler uses the LDR instruction rather than the B instruction. The reasons:

1. LDR can jump anywhere in the full address range, whereas B can only reach targets within about 32 MB before or after the branch.

2. The chip has a remap function. When the vector table is in internal RAM or external memory, a B instruction cannot jump to the correct location, because its target is encoded as an offset relative to the PC and that offset no longer points at the handler once the table has been remapped.
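The 32 MB limit follows from the B instruction encoding, which holds a signed 24-bit word offset relative to the branch address plus 8. A small C check, with the hypothetical name `b_can_reach`, makes the range concrete:

```c
#include <stdint.h>
#include <stdbool.h>

/* Can an ARM-state B instruction at address `from` branch to `to`? */
bool b_can_reach(uint32_t from, uint32_t to)
{
    /* The offset is measured from the branch address + 8 (pipeline prefetch). */
    int64_t offset = (int64_t)to - ((int64_t)from + 8);

    /* Signed 24-bit word offset: -2^25 .. 2^25 - 4 bytes, word aligned. */
    return (offset % 4 == 0) &&
           offset >= -(INT64_C(1) << 25) &&
           offset <=  (INT64_C(1) << 25) - 4;
}
```

A vector at 0x18 can reach a handler at 0x1000 with B, but not one at 0x40000000; loading the full 32-bit handler address into the PC with LDR has no such limit.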

Little-endian storage system:

In little-endian format, the most significant byte of a word is stored at the highest byte address, so byte 0 of the memory system is connected to data lines 7 ~ 0 (aligned to the low end).

Big-endian storage system:

In big-endian format, the most significant byte of a word is stored at the lowest byte address, so byte 0 of the memory system is connected to data lines 31 ~ 24 (aligned to the high end).
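The two byte orders can be compared by storing the same 32-bit word both ways. The helper names below are made up for this sketch:

```c
#include <stdint.h>

/* Little-endian: byte 0 holds bits 7..0, byte 3 holds bits 31..24. */
void store_le32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

/* Big-endian: byte 0 holds bits 31..24, byte 3 holds bits 7..0. */
void store_be32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)(w);
}
```

Storing 0x12345678 gives the byte sequence 78 56 34 12 in little-endian order and 12 34 56 78 in big-endian order.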

    STMFD SP!, {LR}    ; save the return address on the data stack
    LDMFD SP!, {PC}    ; restore it into the PC to return

Reset: when the nRESET signal goes high again, the ARM processor performs the following operations:

1. Force M[4:0] in the CPSR to b10011 (Supervisor mode).
2. Set the I and F bits in the CPSR.
3. Clear the T bit in the CPSR.
4. Force the PC to fetch the next instruction from address 0x00.
5. Return to ARM state and resume execution.
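The effect of steps 1 to 3 on the CPSR can be written as a few bit operations. The function name is hypothetical, and bits outside the low eight are simply passed through because the steps above do not specify them:

```c
#include <stdint.h>

/* Apply the reset changes to the control bits of a CPSR value. */
uint32_t cpsr_after_reset(uint32_t cpsr)
{
    cpsr = (cpsr & ~0x1Fu) | 0x13u;   /* M[4:0] = 10011 (Supervisor mode) */
    cpsr |= (1u << 7) | (1u << 6);    /* set I and F: IRQ and FIQ masked  */
    cpsr &= ~(1u << 5);               /* clear T: ARM state               */
    return cpsr;
}
```

Whatever the previous value, the low byte of the result is 0xD3.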

R14 (LR) register and subroutine calls

1. Program A calls program B during execution, using the instruction "BL label".

2. The program jumps to the label and executes program B. At the same time, the hardware saves the address of the instruction following "BL label" in R14 (LR).

3. When program B finishes, the content of R14 is copied into the PC, and execution returns to program A.

The ARM processor has seven operating modes:

① User mode: the unprivileged mode, in which normal programs run. Most tasks execute in this mode. In User mode, an application cannot change the processor mode on its own; the processor switches mode automatically only when an exception occurs.

② FIQ mode: also known as fast interrupt mode. It supports high-speed data transfer and channel processing. The processor enters this mode when a high-priority (fast) interrupt occurs.

③ IRQ mode: also called normal interrupt mode. The processor enters this mode when a low-priority (normal) interrupt occurs. Interrupt handling in this mode can be vectored or non-vectored. Interrupts are generally handled in IRQ mode.

④ SVC mode: Supervisor mode, a protected mode for the operating system. The processor enters this mode on reset or when a software interrupt instruction is executed.

⑤ Abort mode: entered when a memory access fails. It is used to handle memory faults and to implement virtual memory or memory protection.

⑥ Undefined mode: entered when an undefined instruction is executed. It is mainly used to handle undefined-instruction traps and supports software emulation of hardware coprocessors, because undefined instructions mostly arise in coprocessor operations.

⑦ System mode: a privileged mode that uses the same register set as User mode. It is used to run privileged operating system tasks.

Difference: all modes other than User mode are privileged. In a privileged mode, a program can access all system resources and can switch the processor mode at will.
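Each mode corresponds to a value of the CPSR mode field M[4:0]. The following C sketch lists the standard encodings and derives the privileged/unprivileged distinction from them; the enum and function names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPSR M[4:0] encodings of the seven modes. */
typedef enum {
    MODE_USR = 0x10,   /* User                 */
    MODE_FIQ = 0x11,   /* Fast interrupt       */
    MODE_IRQ = 0x12,   /* Normal interrupt     */
    MODE_SVC = 0x13,   /* Supervisor           */
    MODE_ABT = 0x17,   /* Abort                */
    MODE_UND = 0x1B,   /* Undefined            */
    MODE_SYS = 0x1F    /* System               */
} arm_mode_t;

/* Name of the mode selected by a CPSR value. */
const char *mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case MODE_USR: return "User";
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SVC: return "Supervisor";
    case MODE_ABT: return "Abort";
    case MODE_UND: return "Undefined";
    case MODE_SYS: return "System";
    default:       return "invalid";
    }
}

/* Every valid mode except User is privileged (assumes a valid mode field). */
bool is_privileged(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) != (uint32_t)MODE_USR;
}
```

For instance, a CPSR value of 0xD3 selects Supervisor mode and is privileged, while 0x10 selects User mode and is not.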

ARM9 uses a five-stage pipeline: instruction fetch, decode, execute, memory access, and register write-back.

· Fetch: read the instruction from the instruction cache.

· Decode: decode the instruction to identify which registers it operates on, and read the operands from the general-purpose registers.

· Execute: perform ALU and shift operations. For a memory access instruction, the ALU calculates the memory address to be accessed.

· Memory access: if the instruction accesses memory, the data is transferred through the data cache. Otherwise this stage is an idle clock cycle for the instruction.

· Register write-back: write the result of the operation back to the destination register.
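To see how the stages overlap, the pipeline can be modelled as a simple schedule in which instruction i enters the first stage at cycle i. This is an idealized sketch with no stalls or branches, and the function names are hypothetical.

```c
#include <stddef.h>

/* Stage names of the five-stage pipeline described above. */
static const char *const stage_names[] = {
    "fetch", "decode", "execute", "memory access", "write-back"
};

/* Name of stage s (0-based), or NULL if s is out of range. */
const char *stage_name(int s)
{
    int n = (int)(sizeof stage_names / sizeof stage_names[0]);
    return (s >= 0 && s < n) ? stage_names[s] : NULL;
}

/* Stage occupied by instruction `instr` at clock `cycle` (both 0-based)
 * in an ideal pipeline of `depth` stages, or -1 if it is not in flight. */
int stage_at(int instr, int cycle, int depth)
{
    int s = cycle - instr;
    return (s >= 0 && s < depth) ? s : -1;
}

/* Cycles needed to complete n instructions when there are no stalls. */
int total_cycles(int n, int depth)
{
    return n > 0 ? depth + n - 1 : 0;
}
```

With a depth of 5, ten instructions finish in 14 cycles instead of 50, because up to five instructions are in different stages at once. Passing a different depth models a shorter or longer pipeline in the same way.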

CISC: Complex Instruction Set Computer

A large number of instructions and addressing modes, with variable instruction length. The 80/20 rule applies: 80% of programs use only 20% of the instructions, so most programs can run with only a small subset of the instruction set.

RISC: Reduced Instruction Set Computer

Contains only the most useful instructions, with fixed instruction length; ensures that each instruction passes through the data path quickly; and makes the CPU hardware design simpler.

1. What is the difference between the MOV and LDR instructions, both of which put data into a destination register?

The MOV instruction copies data from one register to another, or loads a constant into a register, but it cannot access memory. The LDR instruction reads data from memory into a register.
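One practical consequence, not spelled out above, is that an ARM-state MOV can only encode constants that are an 8-bit value rotated right by an even number of bits; other constants are usually loaded from memory with LDR. A C check for this rule, with the hypothetical name `is_arm_immediate`, could look like this:

```c
#include <stdint.h>
#include <stdbool.h>

/* Rotate a 32-bit value left by r bits (0 <= r < 32). */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return r == 0 ? v : (v << r) | (v >> (32u - r));
}

/* True if `value` is an 8-bit constant rotated right by an even amount,
 * i.e. it fits the immediate field of an ARM-state MOV. */
bool is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(value, rot) <= 0xFFu)   /* undo a rotate-right by rot */
            return true;
    return false;
}
```

So `MOV R0, #100` and `MOV R0, #0xFF000000` are encodable, while a value such as 0x40002000 is not and would typically be loaded with `LDR R0, =0x40002000`.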

2. What does each letter in ARM7TDMI-S mean? What are the core's features? How many pipeline stages does it use?

ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S}

The parts in braces are optional. Their meanings are as follows:

x -- series number, for example the "7" in ARM7 and the "9" in ARM9;

y -- internal memory management/protection unit, for example the "2" in ARM72 and the "4" in ARM94;

z -- includes a cache;

T -- supports the 16-bit Thumb instruction set;

D -- supports on-chip debugging through JTAG;

M -- supports the ARM long multiply instructions (64-bit result) and includes a fast multiplier;

I -- includes the EmbeddedICE macrocell, debugging hardware used to set breakpoints and watchpoints;

E -- enhanced DSP instructions (on top of TDMI);

J -- includes the Jazelle Java accelerator. Compared with a Java virtual machine, Jazelle improves Java code speed by a factor of eight and reduces power consumption by 80%;

F -- vector floating-point unit;

-S -- synthesizable version, meaning the processor core is supplied as source code that can be compiled into a form usable by EDA tools.

Core features:

32/16-bit RISC architecture (ARM v4T).

32-bit ARM instruction set for maximum performance and flexibility.

Compact 16-bit Thumb instruction set.

Unified bus interface: instructions and data are both transferred on the 32-bit bus.

Three-stage pipeline.

32-bit arithmetic logic unit (ALU).

Very small core size and low power consumption.

Coprocessor interface.

Extended debugging facilities: · EmbeddedICE-RT real-time debug unit. · JTAG interface unit. · Interface for direct connection to the Embedded Trace Macrocell (ETM).

The ARM7 series core uses a three-stage pipeline structure. The three stages are fetch, decode, and execute.

· Fetch: fetch the instruction from memory and place it in the instruction pipeline.

· Decode: the decode logic interprets the instruction fetched in the previous step to determine how the CPU should operate.

· Execute: this stage includes shift operations, reading the general-purpose registers, producing the result, and writing the general-purpose registers. In other words, the instruction decoded in the previous step is carried out by the logic circuits.

3. Summation and sorting

Summation: add the integers from 100 down to 1 and store the result in memory.

            AREA  text, CODE, READWRITE
            ENTRY
            MOV   R0, #100        ; number of loop iterations
            MOV   R1, #0          ; initialize the sum
    loop    ADD   R1, R1, R0      ; add the current value to the sum
            SUBS  R0, R0, #1      ; decrement the loop counter R0
            CMP   R0, #0          ; compare R0 with 0 to see whether the loop has ended
            BNE   loop            ; if not, continue looping
            LDR   R2, =Result
            STR   R1, [R2]        ; store the final sum
    stop    B     stop            ; halt here so the data word is never executed
    Result  DCD   0
            END

Sorting algorithm: first compare the first element with every other element, exchanging the two whenever the later one is smaller, so that the smallest value ends up in the first memory unit. Then repeat from the second memory unit to place the second smallest value, and so on. In this way the ten data items are put in order.

            AREA  text, CODE, READWRITE
            ENTRY
            LDR   R0, =data       ; start address of the data
            MOV   R1, R0
            MOV   R5, #9          ; ten items need nine comparisons in the first pass
            MOV   R6, R5
    compare ADD   R0, R0, #4      ; R0 = address of the next number to compare
            SUB   R6, R6, #1      ; one comparison fewer remaining in this pass
            LDR   R2, [R1]        ; load the two values to compare
            LDR   R3, [R0]
            CMP   R3, R2
            MOVCC R7, R2          ; if the later value is smaller, exchange the two
            MOVCC R2, R3
            MOVCC R3, R7
            STR   R2, [R1]        ; store both values back to memory
            STR   R3, [R0]
            CMP   R6, #0          ; has this pass ended?
            BNE   compare
            ADD   R1, R1, #4      ; after each pass, move the base address forward one unit
            MOV   R0, R1          ; restart the moving address from the new base
            SUB   R5, R5, #1      ; each pass needs one comparison fewer than the last
            MOV   R6, R5
            CMP   R5, #0          ; have all passes ended?
            BNE   compare
    stop    B     stop            ; halt here so the data is never executed
    data    DCD   9, 4, 6, 7, 8, 1, 3, 2, 0, 5
            END
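For comparison, the two assembly programs above correspond roughly to the following C functions. This is a sketch of the same logic, not a translation produced by a compiler, and the function names are invented here.

```c
#include <stdint.h>
#include <stddef.h>

/* Add n + (n-1) + ... + 1, counting down like the R0 loop counter. */
uint32_t sum_down_from(uint32_t n)
{
    uint32_t sum = 0;
    while (n != 0) {
        sum += n;
        n--;
    }
    return sum;
}

/* Exchange sort: after pass i, data[i] holds the i-th smallest value. */
void exchange_sort(uint32_t *data, size_t count)
{
    for (size_t i = 0; i + 1 < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (data[j] < data[i]) {      /* unsigned compare, like MOVCC */
                uint32_t tmp = data[i];
                data[i] = data[j];
                data[j] = tmp;
            }
        }
    }
}
```

Calling `sum_down_from(100)` yields 5050, and sorting the ten values 9, 4, 6, 7, 8, 1, 3, 2, 0, 5 leaves them in ascending order from 0 to 9.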

C language code:

    #include <stdio.h>

    extern void strcopy(char *d, const char *s);

    int main()
    {
        const char *srcstr = "abcdefghi";
        char dststr[] = "ighfedcba";

        strcopy(dststr, srcstr);
        return 0;
    }

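ARM assembly code:

    STACK_TOP EQU 0x40002000
            PRESERVE8
            AREA SCopy, CODE, READONLY
            EXPORT START
            EXPORT strcopy
            IMPORT main
            ENTRY
    START
            LDR  R13, =STACK_TOP   ; set up the stack pointer
            B    main              ; jump to the C entry point
    strcopy
            LDRB r2, [r1], #1      ; load a byte from the source, advance r1
            STRB r2, [r0], #1      ; store it to the destination, advance r0
            CMP  r2, #0            ; was it the terminating NUL?
            BNE  strcopy           ; no: copy the next byte
            MOV  pc, lr            ; yes: return to the caller
            END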

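    stack_top EQU 0x40002000
            PRESERVE8
            EXPORT copy
            AREA copy, CODE, READONLY
            IMPORT copystr
            EXPORT start
            ENTRY
    start
            LDR  r13, =stack_top   ; set up the stack pointer
            LDR  r0, =dst          ; first argument: destination
            LDR  r1, =src          ; second argument: source
            BL   copystr           ; call the C routine
    stop    B    stop              ; do not fall through into the data
            AREA strings, DATA, READWRITE
    src     DCB  "abcdefghij", 0   ; NUL terminator so the copy loop stops
    dst     DCB  "helloworld", 0
            END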

    // C program
    #include <stdio.h>

    void copystr(char *d, char *s)
    {
        /* copy bytes from s to d, up to and including the terminating '\0' */
        while ((*d++ = *s++) != '\0')
            ;
    }

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
