ARM Assembly Basics (iOS Reverse Engineering)

Source: Internet
Author: User

1. ARM Assembly Basics

Reverse engineering a function often means analyzing a large amount of assembly code, and for iOS reverse engineering, ARM assembly is a language you must master. This article summarizes the basics of ARM assembly. To learn more, refer to Dog God's "little yellow book", "reverse engineer iOS", or the official ARM manual.

1.1 Registers, memory and stacks

In ARM assembly, operands live in registers, memory, and the stack.
ARM's stack is last in, first out. It is a full descending stack: it grows downward, so each new variable is stored at the growing end of the stack, and the further the stack grows, the lower the memory address.
A register called the stack pointer (SP) holds the address of that end of the stack, which is referred to as the stack address.
You can push a variable onto the stack to save its value, and pop it off the stack to restore the original value. In practice the stack address changes constantly, but it should be the same before and after a piece of code executes; otherwise the program will misbehave.
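
The push and pop behavior described above can be sketched in a few lines of Python. This is only a toy model: the class name `FullDescendingStack`, the starting address 0x1000, and the 4-byte word size are assumptions made for illustration, not something taken from the original text.

```python
class FullDescendingStack:
    """Toy model of a full descending stack (hypothetical helper)."""

    def __init__(self, top_address):
        self.sp = top_address   # SP starts just above the usable stack area
        self.memory = {}        # address -> 32-bit word

    def push(self, value):
        self.sp -= 4            # descending: SP moves to a lower address first
        self.memory[self.sp] = value & 0xFFFFFFFF  # full: SP points at the stored word

    def pop(self):
        value = self.memory[self.sp]
        self.sp += 4            # SP moves back up after the read
        return value


stack = FullDescendingStack(top_address=0x1000)
sp_before = stack.sp
stack.push(11)
stack.push(22)
lowest_sp = stack.sp                 # two words below the starting address
popped = [stack.pop(), stack.pop()]  # last in, first out
sp_after = stack.sp                  # balanced pushes and pops restore SP
```

After the two pushes and two pops, `sp_after` equals `sp_before`, which is the "stack address should be the same before and after" rule from the text.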

1.2 Special-Purpose registers

Some of the registers in an ARM processor have a special purpose, as shown below:

Register   Use
R0-R3      Passing parameters and return values
R7         Frame pointer; points to the boundary in the stack between the parent function and the called child function
R9         Reserved by the system before iOS 3.0
R12        Intra-procedure-call scratch register; the dynamic linker uses it
R13        SP register
R14        LR register; holds the function return address
R15        PC register

1.3 Branches and conditions

A register called the program counter (PC) holds the address of the next instruction. Normally the processor executes instructions sequentially: after executing one instruction it advances the PC by one instruction so that it points to the next one (figure 1-1).
The processor executes instructions 1 through 5 in order (figure 2-2), but if you change the value of the PC, the order in which the instructions execute becomes completely different.

The execution order can be shuffled, for example into instruction 1, instruction 5, instruction 4, instruction 2, instruction 3, instruction 6. The technical name for this kind of reordering is a "branch" or "jump"; it is what makes loops and subroutines possible. For example:

```
// endless() function
endless:
    op      operand1, operand2
    branch  endless
    return  // infinite loop: execution never gets here
```

In practice, the most useful branches are those taken only when a certain condition is met; these are called conditional branches. if/else and while are both implemented with conditional branches. ARM assembly generally has 4 kinds of branch conditions:

    • The result of the operation is 0 (or not 0);
    • The result of the operation is negative;
    • The operation produced a carry;
    • Arithmetic overflow (for example, the sum of two positive numbers exceeds the register width).

The flags that record these conditions are stored in the program status register (PSR). Data processing instructions change these flags, and branch instructions decide whether to jump based on them. The following pseudo-code shows a for loop:

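for:
    add      A, #1
    compare  A, #16
    branch to for if the result is not 0
    /* This loop compares A with #16. If the two are not equal, A is incremented by 1 and compared again.
       If they are equal, the loop ends and execution continues below. */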
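
To make the interplay of the PC, the flag, and the branch concrete, here is a minimal Python sketch of that pseudo-code. The three-instruction "machine", the `run` function, and the tuple encoding of instructions are all invented for this illustration; A is assumed to start at 0.

```python
def run(program, a=0, max_steps=1000):
    """Run a tiny pseudo machine and return (final A, instructions executed)."""
    pc = 0          # index of the next instruction
    zero = False    # stand-in for the Z flag
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        pc += 1                     # default: fall through to the next instruction
        if op == "add":
            a += arg
        elif op == "cmp":
            zero = (a - arg) == 0   # compare only updates the flag
        elif op == "bne":
            if not zero:
                pc = arg            # branch: overwrite the PC
        steps += 1
    return a, steps


FOR_LOOP = [
    ("add", 1),    # for: add A, #1
    ("cmp", 16),   #      compare A, #16
    ("bne", 0),    #      branch to "for" if the result is not 0
]

final_a, executed = run(FOR_LOOP)
```

The only thing the branch does is replace the value of `pc`; everything else about the loop follows from that.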

2. Interpretation of ARM/Thumb instructions

The instruction set used by ARM processors comes in two types, ARM and Thumb: ARM instructions are 32 bits long and Thumb instructions are 16 bits long. All instructions fall into roughly three categories: data manipulation instructions, memory operation instructions, and branch instructions.

2.1 Data Manipulation Instructions

Data manipulation instructions follow 2 rules:
* All operands are 32 bits;
* All results are 32 bits and can only be stored in registers.
In general, the basic format of a data manipulation instruction is: op{cond}{s} Rd, Rn, Op2

Here "cond" and "s" are two optional suffixes. "cond" specifies the condition under which the instruction "op" is executed. There are 17 conditions in total:

Suffix   Condition
EQ       Result is 0 (EQual to 0)
NE       Result is not 0 (Not Equal to 0)
CS       Carry produced, or no borrow (Carry Set)
HS       Same as CS (unsigned Higher or Same)
CC       No carry, or a borrow occurred (Carry Clear)
LO       Same as CC (unsigned LOwer)
MI       Result is less than 0 (MInus)
PL       Result is greater than or equal to 0 (PLus)
VS       Overflow (oVerflow Set)
VC       No overflow (oVerflow Clear)
HI       Unsigned comparison: greater than (unsigned HIgher)
LS       Unsigned comparison: less than or equal (unsigned Lower or Same)
GE       Signed comparison: greater than or equal (signed Greater than or Equal)
LT       Signed comparison: less than (signed Less Than)
GT       Signed comparison: greater than (signed Greater Than)
LE       Signed comparison: less than or equal (signed Less than or Equal)
AL       Unconditional (ALways; the default)

Using "cond" is simple, for example:

CMP   R0, R1
MOVGE R2, R0
MOVLT R2, R1

This compares the values of R0 and R1: if R0 is greater than or equal to R1, then R2 = R0; otherwise R2 = R1.
"s" specifies whether the instruction "op" sets the flags. There are 4 flags in total:
N (Negative)
Set to 1 if the result is less than 0, otherwise 0;

Z (Zero)
Set to 1 if the result is 0, otherwise 0;

C (Carry)
For additions (including CMN), C is set to 1 if a carry is produced, otherwise 0. For subtractions (including CMP), C acts as "not borrow": it is set to 0 if a borrow is produced, otherwise 1. For operations other than addition and subtraction that involve a shift, C takes the value of the last bit shifted out. For other operations that are not additions or subtractions, C is generally left unchanged;

V (oVerflow)
Set to 1 if the operation overflows, otherwise 0.

Note that the C flag indicates whether an unsigned operation overflowed, while the V flag indicates whether a signed operation overflowed.
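
The flag rules above can be checked with a small Python model of 32-bit addition and subtraction. This is a sketch under the usual assumption that subtraction is carried out as a + ~b + 1; the function names `adds` and `subs` are chosen for this example and return the result together with a dictionary of the four flags.

```python
MASK = 0xFFFFFFFF


def adds(a, b, carry_in=0):
    """Return (result, flags) for the 32-bit addition a + b + carry_in."""
    a &= MASK
    b &= MASK
    full = a + b + carry_in
    result = full & MASK
    flags = {
        "N": result >> 31,                    # top bit of the result
        "Z": int(result == 0),
        "C": int(full > MASK),                # unsigned overflow (carry out)
        # signed overflow: both operands differ in sign from the result
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags


def subs(a, b):
    """Return (result, flags) for a - b, computed as a + ~b + 1."""
    return adds(a, ~b & MASK, carry_in=1)


signed_overflow = adds(0x7FFFFFFF, 1)    # largest positive value plus 1
unsigned_overflow = adds(0xFFFFFFFF, 1)  # wraps around to 0
no_borrow = subs(5, 5)                   # C = 1 because nothing was borrowed
borrow = subs(3, 5)                      # C = 0 because a borrow occurred
```

The first two cases show the difference between C and V: adding 1 to 0x7FFFFFFF sets V but not C, while adding 1 to 0xFFFFFFFF sets C but not V.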

Data manipulation instructions can be broadly divided into 4 categories:

    • 1. Arithmetic operations

ADD R0, R1, R2      ; R0 = R1 + R2
ADC R0, R1, R2      ; R0 = R1 + R2 + C (carry)
SUB R0, R1, R2      ; R0 = R1 - R2
SBC R0, R1, R2      ; R0 = R1 - R2 - !C
RSB R0, R1, R2      ; R0 = R2 - R1
RSC R0, R1, R2      ; R0 = R2 - R1 - !C

In arithmetic operations, ADD and SUB are the basic operations and the others are variants of the two. RSB is short for "Reverse SUB": it simply swaps the two operands of SUB. The variants ending in "C" are the additions and subtractions that take the carry or borrow into account. When an addition produces a carry, or a subtraction does not need to borrow, the carry flag is set to 1.
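
A typical use of the carry variants is arithmetic on values wider than one register. The Python sketch below models a 64-bit addition as ADDS followed by ADC, and a 64-bit subtraction as SUBS followed by SBC. The function names and the (low, high) argument layout are assumptions made for this example.

```python
MASK = 0xFFFFFFFF


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as 32-bit (low, high) halves."""
    low_sum = a_lo + b_lo
    carry = low_sum >> 32                # what ADDS would leave in C
    lo = low_sum & MASK                  # ADDS lo, a_lo, b_lo
    hi = (a_hi + b_hi + carry) & MASK    # ADC  hi, a_hi, b_hi
    return lo, hi


def sub64(a_lo, a_hi, b_lo, b_hi):
    """Subtract two 64-bit values given as 32-bit (low, high) halves."""
    carry = int(a_lo >= b_lo)            # SUBS sets C = 1 when there is no borrow
    lo = (a_lo - b_lo) & MASK            # SUBS lo, a_lo, b_lo
    hi = (a_hi - b_hi - (1 - carry)) & MASK  # SBC hi, a_hi, b_hi (R1 - R2 - !C)
    return lo, hi


carry_into_high = add64(0xFFFFFFFF, 0, 1, 0)   # 0x00000000FFFFFFFF + 1
borrow_from_high = sub64(0, 1, 1, 0)           # 0x0000000100000000 - 1
```

The term `1 - carry` is the `!C` that appears in the SBC and RSC formulas above.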

    • 2. Logical operations

AND R0, R1, R2      ; R0 = R1 & R2
ORR R0, R1, R2      ; R0 = R1 | R2
EOR R0, R1, R2      ; R0 = R1 ^ R2
BIC R0, R1, R2      ; R0 = R1 & ~R2
MOV R0, R2          ; R0 = R2
MVN R0, R2          ; R0 = ~R2

The logical instructions above are described with C operators. C's shift operators, however, have no matching logical instruction in this list; ARM handles shifts with a barrel shifter, which offers 4 shift operations:

LSL    Logical Shift Left
LSR    Logical Shift Right
ASR    Arithmetic Shift Right
ROR    ROtate Right
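
The difference between the four shift operations is easiest to see on concrete values. The following Python sketch models them on 32-bit words; the function names mirror the mnemonics, and the shift amount is assumed to be between 0 and 31.

```python
MASK = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32        # reinterpret as a negative Python int
    return (value >> n) & MASK


def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK


shifted = {
    "lsl": lsl(0x00000001, 4),
    "lsr": lsr(0x80000000, 31),
    "asr": asr(0x80000000, 31),
    "ror": ror(0x12345678, 8),
}
```

LSR and ASR only differ for values whose top bit is set, which is why 0x80000000 is used for both.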

    • 3. Comparison operations

CMP R1, R2          ; compute R1 - R2 and set the flags according to the result
CMN R1, R2          ; compute R1 + R2 and set the flags according to the result
TST R1, R2          ; compute R1 & R2 and set the flags according to the result
TEQ R1, R2          ; compute R1 ^ R2 and set the flags according to the result

A comparison operation is really an arithmetic or logical operation that changes the flags, except that the result is not kept in a register.

    • 4. Multiplication operations

MUL R4, R3, R2      ; R4 = R3 * R2
MLA R4, R3, R2, R1  ; R4 = R3 * R2 + R1

The operands of a multiplication must come from registers.

2.2 Memory Operation Instructions

The basic format of a memory operation instruction is:

op{cond}{type} Rd, [Rn, Op2]

Here Rn is the base register, which holds the base address. "cond" works the same way as in data manipulation instructions. "type" specifies the data type that the instruction "op" operates on; there are 4 types:

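B (unsigned Byte): unsigned byte (extended to 32 bits on execution, filled with 0);
SB (Signed Byte): signed byte (LDR only; extended to 32 bits on execution, filled with the sign bit);
H (unsigned Halfword): unsigned halfword (extended to 32 bits on execution, filled with 0);
SH (Signed Halfword): signed halfword (LDR only; extended to 32 bits on execution, filled with the sign bit).

If "type" is not specified, the default is word.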
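
The effect of the four types on a loaded value can be sketched in Python with the standard `struct` module. The memory contents below are made up for the example, little-endian byte order is assumed, and the helper names simply follow the LDR mnemonics with the type suffix appended.

```python
import struct

MASK = 0xFFFFFFFF
MEMORY = bytes([0xF0, 0xFF, 0x34, 0x12])   # example bytes, lowest address first


def ldrb(mem, addr):
    """Load a byte and fill the upper 24 bits with 0."""
    return mem[addr]


def ldrsb(mem, addr):
    """Load a byte and fill the upper 24 bits with the sign bit."""
    return struct.unpack_from("<b", mem, addr)[0] & MASK


def ldrh(mem, addr):
    """Load a halfword and fill the upper 16 bits with 0."""
    return struct.unpack_from("<H", mem, addr)[0]


def ldrsh(mem, addr):
    """Load a halfword and fill the upper 16 bits with the sign bit."""
    return struct.unpack_from("<h", mem, addr)[0] & MASK


def ldr(mem, addr):
    """Load a full 32-bit word (the default when no type is given)."""
    return struct.unpack_from("<I", mem, addr)[0]


loaded = {
    "LDRB": ldrb(MEMORY, 0),
    "LDRSB": ldrsb(MEMORY, 0),
    "LDRH": ldrh(MEMORY, 0),
    "LDRSH": ldrsh(MEMORY, 0),
    "LDR": ldr(MEMORY, 0),
}
```

The same byte 0xF0 becomes 0x000000F0 with B and 0xFFFFFFF0 with SB, which is the whole difference between zero fill and sign fill.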
ARM has only 2 basic memory operation instructions: LDR (LoaD Register) reads data from memory and stores it in a register; STR (STore Register) reads data from a register and stores it in memory. The two instructions are used as follows:

    • LDR
LDR Rt, [Rn {, #offset}]    ; Rt = *(Rn {+ offset}); {} marks an optional part
LDR Rt, [Rn, #offset]!      ; Rt = *(Rn + offset); Rn = Rn + offset
LDR Rt, [Rn], #offset       ; Rt = *Rn; Rn = Rn + offset
    • STR
STR Rt, [Rn {, #offset}]    ; *(Rn {+ offset}) = Rt
STR Rt, [Rn, #offset]!      ; *(Rn + offset) = Rt; Rn = Rn + offset
STR Rt, [Rn], #offset       ; *Rn = Rt; Rn = Rn + offset
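
The three LDR forms differ only in when, and whether, the base register is updated. A minimal Python sketch, with registers and memory modeled as dictionaries and with all names (`ldr_offset`, `ldr_pre`, `ldr_post`, `fresh_registers`) invented for this illustration:

```python
MEMORY = {0x100: 10, 0x104: 20, 0x108: 30}   # example word-sized memory cells


def fresh_registers():
    """Return a new register file with R1 pointing at 0x100."""
    return {"R0": 0, "R1": 0x100}


def ldr_offset(regs, mem, rt, rn, offset=0):
    """LDR Rt, [Rn, #offset]: Rn is left unchanged."""
    regs[rt] = mem[regs[rn] + offset]


def ldr_pre(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn, #offset]!: update Rn first, then load from it."""
    regs[rn] += offset
    regs[rt] = mem[regs[rn]]


def ldr_post(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn], #offset: load from Rn, then update Rn."""
    regs[rt] = mem[regs[rn]]
    regs[rn] += offset


plain = fresh_registers()
ldr_offset(plain, MEMORY, "R0", "R1", 4)

pre_indexed = fresh_registers()
ldr_pre(pre_indexed, MEMORY, "R0", "R1", 4)

post_indexed = fresh_registers()
ldr_post(post_indexed, MEMORY, "R0", "R1", 4)
```

The STR forms follow the same pattern with the direction of the copy reversed.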

In addition, the LDR and STR variants LDRD and STRD operate on a doubleword, that is, on two registers at once. Their basic format is: op{cond} Rt, Rt2, [Rn {, #offset}]
Usage is similar to the base instructions:

    • STRD
STRD R4, R5, [R9, #offset]  ; *(R9 + offset) = R4; *(R9 + offset + 4) = R5
    • LDRD
LDRD R4, R5, [R9, #offset]  ; R4 = *(R9 + offset); R5 = *(R9 + offset + 4)

Besides LDR and STR, block transfers with LDM (LoaD Multiple) and STM (STore Multiple) operate on several registers at once. The basic format of a block transfer instruction is:

op{cond}{mode} Rd{!}, reglist

Here Rd is the base register. The optional "!" means that if the value of Rd changes during the operation, the final value is written back to Rd. reglist is a list of registers enclosed in curly braces; entries are separated by "," and a range can be written with "-". For example, {R4-R6,R8} stands for the registers R4, R5, R6, and R8. The registers are always ordered by register number from lowest to highest, regardless of the order in which they appear inside the braces.
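
The reglist notation can be expanded mechanically. The Python helper below, `parse_reglist`, is a hypothetical function written for this article; it assumes only plain R-numbered register names (no SP, LR, or PC aliases) and returns the register numbers sorted from lowest to highest.

```python
def parse_reglist(text):
    """Expand a register list such as "{R4-R6,R8}" into sorted register numbers."""
    registers = set()
    for part in text.strip("{} ").split(","):
        part = part.strip().upper()
        if "-" in part:
            # a range such as R4-R6 includes both ends
            first, last = (int(r.strip().lstrip("R")) for r in part.split("-"))
            registers.update(range(first, last + 1))
        else:
            registers.add(int(part.lstrip("R")))
    return sorted(registers)


expanded = parse_reglist("{R4-R6,R8}")
reordered = parse_reglist("{R8, R4-R6}")   # the written order does not matter
```

Both calls produce the same list, matching the rule that the order inside the braces is irrelevant.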

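Note that the operand direction of LDM and STM is the opposite of LDR and STR: LDM loads memory data starting at Rd into the registers in reglist, and STM stores the values of the registers in reglist into contiguous memory starting at Rd. This is especially easy to mix up.

"cond" works the same way as in data manipulation instructions. "mode" specifies how the value of Rd changes, in one of 4 ways:

IA (Increment After): increase the value of Rd after each transfer;
IB (Increment Before): increase the value of Rd before each transfer;
DA (Decrement After): decrease the value of Rd after each transfer;
DB (Decrement Before): decrease the value of Rd before each transfer.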

What does that mean? A simple example using LDM should make it clear. In the block transfer instruction simulation environment, R0 points to the value 5; the neighboring words are assumed to hold consecutive values, 2 through 8 at increasing addresses.

After executing each of the following instructions, the values of R4, R5, and R6 become:

LDMIA R0, {R4-R6}   ; R4 = 5, R5 = 6, R6 = 7
LDMIB R0, {R4-R6}   ; R4 = 6, R5 = 7, R6 = 8
LDMDA R0, {R4-R6}   ; R4 = 3, R5 = 4, R6 = 5
LDMDB R0, {R4-R6}   ; R4 = 2, R5 = 3, R6 = 4
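
The four modes can be modeled in Python as four different starting addresses for one ascending run of words, because the lowest-numbered register always pairs with the lowest address. This is a sketch, not a full emulator: the `ldm` function, the base address 0x1000, and the memory layout (values 2 through 8 at increasing addresses, with the base pointing at 5) are assumptions for the example.

```python
def ldm(mode, base, reglist, memory, word=4):
    """Return ({register: value}, base after writeback) for LDM<mode>."""
    regs = sorted(reglist)          # lowest register number first
    count = len(regs)
    start = {
        "IA": base,                         # first word at the base itself
        "IB": base + word,                  # first word just above the base
        "DA": base - word * (count - 1),    # last word at the base itself
        "DB": base - word * count,          # last word just below the base
    }[mode]
    loaded = {reg: memory[start + i * word] for i, reg in enumerate(regs)}
    if mode in ("IA", "IB"):
        new_base = base + word * count
    else:
        new_base = base - word * count
    return loaded, new_base


BASE = 0x1000
MEMORY = {BASE + 4 * (value - 5): value for value in range(2, 9)}

results = {mode: ldm(mode, BASE, [4, 5, 6], MEMORY)[0]
           for mode in ("IA", "IB", "DA", "DB")}
```

The second return value is what would be written back to the base register if "!" were present; without "!" the base register keeps its old value.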

STM works in a similar way, so it is not covered in detail here. Again, the operand direction of LDM and STM is the opposite of LDR and STR.

2.3 Branch Instructions

Branch instructions can be divided into unconditional branches and conditional branches.

    • Unconditional branches
B  Label            ; PC = Label
BL Label            ; LR = PC - 4; PC = Label
BX Rd               ; PC = Rd, and switch the instruction set
For example:
foo():
    B Label         ; jump to Label and continue executing from there
    ......          ; never executed
Label:
    ......
    • Conditional branches

The "cond" of a conditional branch is evaluated against the 4 flags described above. The correspondence is as follows:

cond   flag
EQ     Z = 1
NE     Z = 0
CS     C = 1
HS     C = 1
CC     C = 0
LO     C = 0
MI     N = 1
PL     N = 0
VS     V = 1
VC     V = 0
HI     C = 1 & Z = 0
LS     C = 0 | Z = 1
GE     N = V
LT     N != V
GT     Z = 0 & N = V
LE     Z = 1 | N != V
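
The table translates directly into predicates over the flags. The Python sketch below computes the flags a CMP would produce and then evaluates a condition against them; the names `CONDITIONS`, `flags_after_cmp`, and `holds` are made up for this example, and values are treated as 32-bit words.

```python
MASK = 0xFFFFFFFF

CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "HS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "LO": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}


def flags_after_cmp(a, b):
    """Flags left behind by CMP a, b on 32-bit values (a - b as a + ~b + 1)."""
    a &= MASK
    b &= MASK
    full = a + (~b & MASK) + 1
    result = full & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(full > MASK),                      # 1 means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,        # signed overflow of a - b
    }


def holds(cond, a, b):
    """Would a branch with this cond be taken right after CMP a, b?"""
    return CONDITIONS[cond](flags_after_cmp(a, b))


minus_one = -1 & MASK   # 0xFFFFFFFF: -1 when signed, a huge value when unsigned
```

Comparing `minus_one` with 1 satisfies both LT (signed view) and HI (unsigned view), which shows why the signed and unsigned conditions read different flags.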

A conditional branch instruction is preceded by a data manipulation instruction that sets the flags; the branch instruction then decides where the code goes based on the flag values. For example:

Label:
    LDR R0, [R1], #4
    CMP R0, #0      ; if R0 == 0, Z = 1; otherwise Z = 0
    BNE Label       ; branch if Z == 0
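
Read as a whole, this loop walks through memory one word at a time until it loads a 0. A rough Python equivalent, with a hypothetical function name and a made-up memory layout, might look like this:

```python
def scan_until_zero(memory, r1, word=4):
    """Return R1 after the loop exits, i.e. just past the first zero word."""
    while True:
        r0 = memory[r1]     # LDR R0, [R1], #4  (load, then advance R1)
        r1 += word
        if r0 == 0:         # CMP R0, #0 sets Z = 1 here, so BNE falls through
            break
    return r1


WORDS = {0x0: 7, 0x4: 9, 0x8: 0, 0xC: 5}   # example memory; the zero is at 0x8
end_address = scan_until_zero(WORDS, 0x0)
```

Because of the post-indexed load, R1 ends up one word past the zero that stopped the loop.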

2.4 Thumb instructions

The Thumb instruction set is a subset of the ARM instruction set. Every Thumb instruction is 16 bits, so Thumb code takes less space than ARM code and transfers more efficiently over a 16-bit data bus. There are trade-offs, though: apart from "B", no Thumb instruction can be conditionally executed; the barrel shifter cannot be combined with other instructions; most Thumb instructions can only use the 8 registers R0-R7; and so on. Compared with ARM instructions, Thumb instructions have the following characteristics:

    • Fewer instructions
    • No conditional execution
    • All instructions carry the "s" suffix by default
    • The barrel shifter cannot be combined with other instructions
    • Restricted register use
    • Restricted use of immediates and the second operand
    • Data writeback is not supported
