iOS Reverse Engineering: ARM Instructions in Hopper

Last Update: 2017-05-26 Source: Internet

Author: User



I. ARM instructions in Hopper

There is not much to say about ARM processors here: thanks to their low power consumption and similar advantages, almost all mobile devices use ARM-architecture processors. Like Android phones, the iPhone also uses ARM-architecture processors. If you want to understand iOS and your own apps more deeply, understanding the ARM instruction set is essential; it is the foundation of iOS reverse engineering.

When you disassemble a binary with Hopper, what you see is ARM instructions throughout. Below is the Hopper interface with Mobilenote.app open. The main window shows nothing but ARM instructions, so if you cannot read them, you cannot analyze the app. Understanding ARM instructions is therefore the basis of iOS reverse engineering, and this post summarizes the basic instructions of the ARM instruction set.

[Figure: Hopper main window with Mobilenote.app open, http://images2015.cnblogs.com/blog/545446/201608/545446-20160805112456325-1796809589.png]

Hopper is very powerful: you can modify ARM instructions in Hopper and generate a new executable file. Its features also help you understand the logic behind ARM assembly. For example, Hopper generates a control-flow graph from the assembly, as shown below, which makes the flow of the instructions easy to follow. A red line marks the jump taken when the condition is not met, and a blue line marks the jump taken when the condition is met.

[Figure: control-flow graph generated by Hopper, http://images2015.cnblogs.com/blog/545446/201608/545446-20160805113143590-1526444553.png]

Hopper can also generate pseudo-code for the ARM assembly; if raw ARM instructions are not intuitive to you, the pseudo-code is easier to read. Below is the pseudo-code Hopper generated from the ARM instructions.

[Figure: pseudo-code generated by Hopper, http://images2015.cnblogs.com/blog/545446/201608/545446-20160805113543731-1967981697.png]

That is a bit of a digression. Today's topic is the ARM instruction set, so I won't say more about Hopper.

II. Overview of the ARM instruction set

ARM instructions mainly operate on registers, the stack, and memory. Registers live inside the CPU; there are only a few of them, but they are fast. Most instructions in the ARM instruction set operate on registers, while some operate on the stack and memory. The instructions for the stack, registers, and memory are described below.

1. Stack operations: PUSH and POP

First, a quick word on the stack itself. A stack is a data structure with the LIFO (last in, first out) property. In ARM, "the stack" is a region of memory used as such a data structure. Its main use is to save register values. For example, suppose R0 is in use when a higher-priority function also needs R0: the value of R0 is pushed onto the stack, and after that function is done with R0, the earlier value is popped back from the stack. The stack is typically manipulated when functions are called.

The stack instructions are PUSH and POP, and they usually come in pairs. At the start of a function, the registers the function will use are pushed onto the stack; at the end of the function, the previously pushed values are popped back into the corresponding registers.

Below is an example of PUSH and POP. Before the function body runs, the registers it will use (R4, R5, and R7) and LR are pushed onto the stack; LR holds the address to return to after the function finishes. When the function is done, POP restores the saved values into the corresponding registers. Note that the value saved from LR is popped into PC (the program counter), which holds the address of the next instruction to execute. As a result, when the function finishes, execution returns to the caller and continues from where it left off.

[Figure: PUSH/POP example, http://images2015.cnblogs.com/blog/545446/201608/545446-20160808161956652-272050574.png]
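To make the prologue/epilogue pattern concrete, here is a small Python model. It is a sketch, not how any real tool works, and it assumes a full-descending stack, which is what ARM's PUSH and POP use (they are equivalent to STMDB SP!, {...} and LDMIA SP!, {...}): SP points at the last pushed word and the stack grows toward lower addresses. The register values, addresses, and the helper names push and pop are made up for illustration.

```python
# Toy model of ARM PUSH/POP on a full-descending stack.
# Memory is a dict mapping word addresses to values; registers are a dict.
# All addresses and values below are hypothetical.

MASK32 = 0xFFFFFFFF


def push(regs, memory, reg_list):
    """PUSH {reg_list}: reg_list must be in ascending register-number order;
    the lowest-numbered register is stored at the lowest address."""
    sp = (regs["SP"] - 4 * len(reg_list)) & MASK32
    for i, name in enumerate(reg_list):
        memory[sp + 4 * i] = regs[name]
    regs["SP"] = sp


def pop(regs, memory, reg_list):
    """POP {reg_list}: reloads the same slots in the same order, then moves SP up."""
    sp = regs["SP"]
    for i, name in enumerate(reg_list):
        regs[name] = memory[sp + 4 * i]
    regs["SP"] = (sp + 4 * len(reg_list)) & MASK32


regs = {"R4": 0x44, "R5": 0x55, "R7": 0x77, "LR": 0x1000, "PC": 0x2000, "SP": 0x8000}
memory = {}

push(regs, memory, ["R4", "R5", "R7", "LR"])  # prologue: PUSH {R4, R5, R7, LR}
regs.update(R4=0, R5=0, R7=0, LR=0)           # the function body is free to clobber them
pop(regs, memory, ["R4", "R5", "R7", "PC"])   # epilogue: POP {R4, R5, R7, PC}

print(hex(regs["PC"]), hex(regs["SP"]))       # 0x1000 0x8000
```

Because PC is reloaded from the slot where LR was saved, the epilogue returns to the caller without a separate branch instruction.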

2. Flag bits in the CPSR

Taking 32-bit ARM as an example, the condition flags live in the top four bits of the CPSR (Current Program Status Register), not in the PC: bits 28-31 hold V (oVerflow), C (Carry), Z (Zero), and N (Negative) respectively. The states these four flags represent are described below.

N (Negative): set if the result is negative (bit 31 of the result is 1).
Z (Zero): set if the result is zero.
C (Carry): set if the operation produces a carry out (for subtraction, C = 1 means no borrow occurred).
V (Overflow): set if signed overflow occurs.
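The following Python sketch shows how the four flags could be computed for a 32-bit addition and subtraction (ADDS/SUBS style), and where they sit in the CPSR. It is an illustrative model written for this post, and the function names are hypothetical. Subtraction is modeled the way ARM performs it, as a + ~b + 1, which is why C = 1 means "no borrow".

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def adds(a, b, carry_in=0):
    """32-bit a + b + carry_in; returns (result, (N, Z, C, V))."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    n = result >> 31                        # bit 31 of the result
    z = int(result == 0)
    c = int(full > MASK32)                  # unsigned carry out of bit 31
    # V: the true signed sum does not fit in 32 bits
    v = int(to_signed(a) + to_signed(b) + carry_in != to_signed(result))
    return result, (n, z, c, v)


def subs(a, b):
    """32-bit a - b, computed as a + ~b + 1 (so C = 1 means no borrow)."""
    return adds(a, ~b & MASK32, 1)


def cpsr_flags(n, z, c, v):
    """Place the flags in CPSR bits 31 (N), 30 (Z), 29 (C), 28 (V)."""
    return (n << 31) | (z << 30) | (c << 29) | (v << 28)


print(adds(0x7FFFFFFF, 1))   # signed overflow: (2147483648, (1, 0, 0, 1))
print(subs(5, 7))            # borrow, so C = 0: (4294967294, (1, 0, 0, 0))
```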

3. Arithmetic and logical instructions

Below are the arithmetic and logical instructions commonly used in the ARM instruction set:

(1) Addition operation

ADD R0, R1, R2; R0 = R1 + R2
- Plain addition of two values.
ADC R0, R1, R2; R0 = R1 + R2 + C (Carry)
- Add with carry: ADC adds the two operands plus the C (carry) flag and puts the result in the destination register. Because it uses the carry flag, ADC can be chained to add numbers wider than 32 bits. Below is assembly code that adds two 128-bit numbers.
- Because the registers are 32 bits wide, each 128-bit number needs 4 (128/32 = 4) registers. Assume the first number is stored in R0, R1, R2, R3 from the least significant word to the most significant, and the second in R4, R5, R6, R7; the result goes into R8, R9, R10, R11. First add the lowest words of the two numbers and set the C flag (ADDS R8, R0, R4). Then add R1 and R5 plus the carry from the previous step, setting the flags again, and so on. The final sum ends up in R8-R11.

[Figure: 128-bit addition using ADDS and ADC, http://images2015.cnblogs.com/blog/545446/201608/545446-20160808174352527-407769784.png]
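Since the original listing survives only as an image, here is a Python sketch of the same idea. The four-step chain in the docstring is reconstructed from the description above (ADDS for the lowest words, then the carry propagated upward) and may differ in detail from the lost figure, for example in whether the last step sets the flags. Numbers are represented as lists of four 32-bit words, least significant first; split128 and join128 are helpers introduced here for the demo.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(a, b, carry_in):
    """One ADDS/ADCS step on 32-bit words: returns (result word, carry out)."""
    full = (a & MASK32) + (b & MASK32) + carry_in
    return full & MASK32, full >> 32


def add128(x_words, y_words):
    """Add two 128-bit numbers held as [w0, w1, w2, w3] (least significant first).

    Mirrors the register chain described in the text:
        ADDS R8,  R0, R4    ; low words, C = carry out
        ADCS R9,  R1, R5    ; + C, C updated
        ADCS R10, R2, R6
        ADC  R11, R3, R7    ; top words; a carry out of bit 127 is lost
    """
    result, carry = [], 0           # ADDS: no carry-in for the first word
    for a, b in zip(x_words, y_words):
        word, carry = add_with_carry(a, b, carry)
        result.append(word)
    return result


def split128(n):
    return [(n >> (32 * i)) & MASK32 for i in range(4)]


def join128(words):
    return sum(w << (32 * i) for i, w in enumerate(words))


x, y = 2**64 - 1, 1
print([hex(w) for w in add128(split128(x), split128(y))])  # ['0x0', '0x0', '0x1', '0x0']
```

The carry produced by the lowest word ripples through the second word and lands in the third, exactly as the ADCS steps pass C from one register pair to the next.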

(2) Subtraction

SUB R0, R1, R2; R0 = R1 - R2
- Plain subtraction: subtracts the value in R2 from the value in R1 and stores the result in R0.
SBC R0, R1, R2; R0 = R1 - R2 - !C
- Subtract with carry (borrow). With 32-bit registers, SBC is needed when subtracting two 64-bit (or wider) values. On ARM, a subtraction that needs a borrow clears the C flag (C = 0 means a borrow occurred), so SBC subtracts the inverse of C (!C). Below is an example of 128-bit subtraction; it mirrors the ADC example above, so I won't repeat the details.

[Figure: 128-bit subtraction using SUBS and SBC, http://images2015.cnblogs.com/blog/545446/201608/545446-20160808183212590-700664350.png]
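The subtraction chain can be modeled the same way. This sketch (again an illustration, not the figure's exact code) computes each SUBS/SBCS step as a + ~b + C, so the carry flag means "no borrow": the first step behaves as if C = 1, and each later step subtracts one more when the previous step borrowed (C = 0), which is exactly the R1 - R2 - !C rule. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sub_with_carry(a, b, carry_in):
    """One SUBS/SBCS step: a - b - (1 - carry_in), via a + ~b + carry_in.

    Returns (result word, new C); C = 1 means no borrow occurred.
    """
    full = (a & MASK32) + (~b & MASK32) + carry_in
    return full & MASK32, full >> 32


def sub128(x_words, y_words):
    """128-bit x - y, words least significant first (SUBS, SBCS, SBCS, SBC)."""
    result, carry = [], 1           # SUBS acts like SBC with C = 1 (no pending borrow)
    for a, b in zip(x_words, y_words):
        word, carry = sub_with_carry(a, b, carry)
        result.append(word)
    return result


def split128(n):
    return [(n >> (32 * i)) & MASK32 for i in range(4)]


def join128(words):
    return sum(w << (32 * i) for i, w in enumerate(words))


print([hex(w) for w in sub128(split128(2**64), split128(1))])
# ['0xffffffff', '0xffffffff', '0x0', '0x0']
```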

RSB R0, R1, R2; R0 = R2 - R1
- Reverse subtract.
RSC R0, R1, R2; R0 = R2 - R1 - !C
- Reverse subtract with carry. These two instructions are subtractions just like SUB and SBC, but with the operands in the opposite order.

(3) Multiplication instructions

The ARM instruction set has two basic multiplication instructions: MUL, and MLA, which multiplies and then accumulates. Neither is complicated to use.

MUL: multiply. MUL{cond}{S} R0, R1, R2; R0 = R1 * R2
MLA: multiply-accumulate. MLA{cond}{S} R0, R1, R2, R3; R0 = R1 * R2 + R3
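One detail worth keeping in mind when reading disassembly: MUL and MLA keep only the low 32 bits of the product. The Python sketch below (hypothetical helper names) models that truncation. Because the low 32 bits of a product are the same whether the inputs are read as signed or unsigned, one MUL serves both cases; the long-multiply instructions (such as UMULL and SMULL) exist for when the full 64-bit result is needed.

```python
MASK32 = 0xFFFFFFFF


def mul(rm, rs):
    """MUL Rd, Rm, Rs: low 32 bits of Rm * Rs."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn: low 32 bits of Rm * Rs + Rn."""
    return (rm * rs + rn) & MASK32


print(mul(6, 7), mla(6, 7, 8))        # 42 50
print(hex(mul(0x10000, 0x10000)))     # 0x0: the product 2**32 does not fit
print(hex(mul(0xFFFFFFFF, 2)))        # 0xfffffffe, i.e. -1 * 2 = -2 as signed
```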

(4) Logical operations

Logical operations are easy to understand; they are essentially the same as the ones we use in ordinary programming: AND, OR, NOT, XOR, and so on.

AND R0, R1, R2; R0 = R1 & R2
- AND: 1 & 1 = 1, 1 & 0 = 0, 0 & 1 = 0, 0 & 0 = 0
ORR R0, R1, R2; R0 = R1 | R2
- OR: 1 | 1 = 1, 1 | 0 = 1, 0 | 1 = 1, 0 | 0 = 0
EOR R0, R1, R2; R0 = R1 ^ R2
- XOR: 1 ^ 1 = 0, 1 ^ 0 = 1, 0 ^ 1 = 1, 0 ^ 0 = 0
BIC R0, R1, R2; R0 = R1 & ~R2
- Bit clear: R2 is inverted and then ANDed with R1, i.e. R1 & (~R2).
- To clear the low four bits of R0: BIC R0, R0, #0x0F
MOV R0, R1; R0 = R1
- Move: copies the value of R1 into R0.
MVN R0, R1; R0 = ~R1
- Move NOT: inverts every bit of R1 and writes the result to R0.
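These operations map directly onto Python's bitwise operators; the only catch is that Python integers are unbounded, so NOT-style results must be masked back to 32 bits. The sketch below is an illustration with made-up helper names, including the BIC R0, R0, #0x0F example from the list above.

```python
MASK32 = 0xFFFFFFFF


def and_(a, b):            # AND
    return a & b & MASK32


def orr(a, b):             # ORR
    return (a | b) & MASK32


def eor(a, b):             # EOR (XOR)
    return (a ^ b) & MASK32


def bic(a, b):             # BIC: clear in a every bit that is set in b
    return a & ~b & MASK32


def mvn(a):                # MVN: bitwise NOT, kept to 32 bits
    return ~a & MASK32


r0 = 0x12345678
print(hex(bic(r0, 0x0F)))  # 0x12345670: low four bits cleared
print(hex(mvn(0)))         # 0xffffffff
```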

4. Loading and storing registers

Sometimes we need to load data from memory into a register, or store data from a register into memory. For this we use the load and store instructions, summarized one by one below.

(1) Single data transfer

LDR{cond} Rd, <address>; load the 32-bit word at <address> into Rd

STR{cond} Rd, <address>; store the value in Rd to memory at <address>

LDR{cond}B Rd, <address>; load the byte at <address> into the low 8 bits of Rd (the upper 24 bits are cleared)

STR{cond}B Rd, <address>; store the low 8 bits of Rd to the byte at <address>

LDR (Load Register): loads data from memory into a register.
- LDR Rt, [Rn], #offset; Rt = *Rn; Rn = Rn + offset (post-indexed)
- LDR Rt, [Rn, #offset]!; Rt = *(Rn + offset); Rn = Rn + offset (pre-indexed with write-back)

STR (Store Register): stores the data in a register to memory.
- STR Rt, [Rn], #offset; *Rn = Rt; Rn = Rn + offset (post-indexed)
- STR Rt, [Rn, #offset]!; *(Rn + offset) = Rt; Rn = Rn + offset (pre-indexed with address write-back)
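The difference between post-indexed ([Rn], #offset) and pre-indexed with write-back ([Rn, #offset]!) addressing is easy to get wrong, so here is a small Python model of both. It is a sketch under simplifying assumptions: memory is a dict keyed by word-aligned addresses, and the function names are invented for this post. The plain offset form [Rn, #offset] without ! is included for contrast; it uses Rn + offset but leaves Rn unchanged.

```python
def ldr_post(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn], #offset: load from Rn, then Rn = Rn + offset."""
    regs[rt] = mem[regs[rn]]
    regs[rn] += offset


def ldr_pre_wb(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn, #offset]!: load from Rn + offset, then Rn = Rn + offset."""
    addr = regs[rn] + offset
    regs[rt] = mem[addr]
    regs[rn] = addr


def ldr_offset(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn, #offset]: load from Rn + offset; Rn is not modified."""
    regs[rt] = mem[regs[rn] + offset]


def str_post(regs, mem, rt, rn, offset):
    """STR Rt, [Rn], #offset: store to Rn, then Rn = Rn + offset."""
    mem[regs[rn]] = regs[rt]
    regs[rn] += offset


def str_pre_wb(regs, mem, rt, rn, offset):
    """STR Rt, [Rn, #offset]!: store to Rn + offset, then Rn = Rn + offset."""
    addr = regs[rn] + offset
    mem[addr] = regs[rt]
    regs[rn] = addr


mem = {0x100: 11, 0x104: 22, 0x108: 33}
regs = {"R0": 0, "R1": 0x100}
ldr_post(regs, mem, "R0", "R1", 4)    # R0 = 11, R1 = 0x104
ldr_pre_wb(regs, mem, "R0", "R1", 4)  # R0 = 33, R1 = 0x108
print(regs["R0"], hex(regs["R1"]))    # 33 0x108
```

Post-indexing is the natural fit for walking an array (use the current element, then advance), while pre-indexing with write-back advances first and then accesses.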

(2) Transferring two words at once

LDRD (Load Register Double): fills two registers at once
- LDRD R4, R5, [R6, #offset]; R4 = *(R6 + offset); R5 = *(R6 + offset + 4)

STRD (Store Register Double): stores two registers to memory at once
- STRD R4, R5, [R6, #offset]; *(R6 + offset) = R4; *(R6 + offset + 4) = R5

(3) Block data transfer

LDM (Load Multiple): loads a block of data from memory into a list of registers.
STM (Store Multiple): stores a list of registers to a block of memory.
LDM and STM take an addressing-mode suffix, and there are four of them. In the examples below, assume R0 points to a word in memory that holds 6, and that consecutive words around it hold consecutive values (..., 3, 4, 5, 6, 7, 8, 9, ...). Registers are always transferred with the lowest-numbered register at the lowest address.
- For example: LDMDB R0, {R1-R3}; R1 = 3, R2 = 4, R3 = 5
- For example: LDMDA R0, {R1-R3}; R1 = 4, R2 = 5, R3 = 6
- For example: LDMIB R0, {R1-R3}; R1 = 7, R2 = 8, R3 = 9
- For example: LDMIA R0, {R1-R3}; R1 = 6, R2 = 7, R3 = 8
- IA (Increment After): the address is incremented after each transfer
- IB (Increment Before): the address is incremented before each transfer
- DA (Decrement After): the address is decremented after each transfer
- DB (Decrement Before): the address is decremented before each transfer
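The four suffixes differ only in which addresses are touched; the order of registers in memory never changes. The Python sketch below (hypothetical function names) computes those addresses and reproduces the LDM examples under one concrete assumption: the word at address 4*k holds the value k, and R0 = 24, so R0 points at the word holding 6. Write-back (the ! suffix, as in LDMIA SP!, {...}) is left out for brevity.

```python
def block_addresses(base, count, mode):
    """Word addresses used by LDM/STM<mode> with `count` registers.

    Whatever the mode, the lowest-numbered register maps to the lowest address.
    """
    if mode == "IA":            # Increment After: base, base+4, ...
        start = base
    elif mode == "IB":          # Increment Before: base+4, base+8, ...
        start = base + 4
    elif mode == "DA":          # Decrement After: ..., base-4, base
        start = base - 4 * (count - 1)
    elif mode == "DB":          # Decrement Before: ..., base-8, base-4
        start = base - 4 * count
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return [start + 4 * i for i in range(count)]


def ldm(mem, base, count, mode):
    """Values loaded into R1..R(count), in register order."""
    return [mem[addr] for addr in block_addresses(base, count, mode)]


mem = {4 * k: k for k in range(16)}   # assumed layout: word at 4*k holds k
r0 = 24                               # R0 points at the word holding 6
for mode in ("DB", "DA", "IB", "IA"):
    print(f"LDM{mode} R0, {{R1-R3}} ->", ldm(mem, r0, 3, mode))
```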

(4) Single data swap: SWP

The SWP instruction exchanges a value directly between a register and memory. Its format is:

SWP{cond}{B} Rd, Rm, [Rn]

This instruction loads the data at the memory address held in Rn into Rd, and stores the value of register Rm to that same address. If Rd = Rm, the value in memory at Rn is swapped with Rd. With a condition suffix, the operation executes only when the condition is met; with the B suffix, a single byte (the low 8 bits) is swapped instead of a word.
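Here is a Python model of SWP and SWPB. It is a sketch that assumes little-endian, byte-addressed memory (a bytearray) and uses hypothetical helper names. The key point it shows is ordering: the old memory value is read and Rm is written before Rd is updated, which is why Rd = Rm performs a clean swap. On real hardware the read and write happen atomically; note also that SWP is deprecated from ARMv6 onward in favor of LDREX/STREX.

```python
MASK32 = 0xFFFFFFFF


def load_word(mem, addr):
    return int.from_bytes(mem[addr:addr + 4], "little")


def store_word(mem, addr, value):
    mem[addr:addr + 4] = (value & MASK32).to_bytes(4, "little")


def swp(regs, mem, rd, rm, rn, byte=False):
    """SWP{B} Rd, Rm, [Rn]: Rd = old memory value, memory = Rm."""
    addr = regs[rn]
    if byte:                          # SWPB: one byte, zero-extended into Rd
        old = mem[addr]
        mem[addr] = regs[rm] & 0xFF
    else:
        old = load_word(mem, addr)
        store_word(mem, addr, regs[rm])
    regs[rd] = old                    # written last, so Rd == Rm still works


mem = bytearray(16)
store_word(mem, 4, 0x11223344)
regs = {"R0": 0xAABBCCDD, "R1": 4}
swp(regs, mem, "R0", "R0", "R1")      # SWP R0, R0, [R1]
print(hex(regs["R0"]), hex(load_word(mem, 4)))   # 0x11223344 0xaabbccdd
```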

5. Comparison, branch, and condition instructions

Branches and conditions are an integral part of programming, and any non-trivial business logic uses them. A branch is a jump; combined with a condition, it jumps only when that condition is met. Below we summarize the branch and comparison instructions commonly used in the ARM instruction set, and in particular the condition suffixes.

(1) Comparison instructions

The comparison instructions in the ARM instruction set are CMN, CMP, TEQ, and TST. Note that CMN and CMP are arithmetic instructions, while TEQ and TST are logical instructions. A comparison instruction always updates the flags (N, Z, C, V) after it executes, because condition suffixes decide whether a condition holds by looking at those flags. The condition suffixes are listed in detail below. A condition suffix can also be attached to a comparison instruction itself.

CMN (Compare Negative): like CMP, but compares the first operand against the negated second operand. It does this by adding the two operands, so CMN R0, #5 tests R0 against -5.
- CMN R0, R1; Status = R0 + R1
CMP (Compare): CMP and CMN are called arithmetic instructions because they subtract (CMP) or add (CMN) the operands and set the flags accordingly, but discard the result. Because they are arithmetic operations, they also affect the C (carry) flag.
- CMP R0, R1; Status = R0 - R1
TEQ (Test Equivalence): TEQ performs a logical XOR (EOR) of the operands to check whether they are equal (Z = 1 if they are). Because TEQ is an XOR rather than an addition or subtraction, it produces no arithmetic carry: it updates N and Z, leaves V alone, and can change C only through the shifter when the second operand is shifted.
- TEQ R0, R1; Status = R0 EOR R1
TST (Test Bits): TST checks whether particular bits are set. It performs a bitwise AND of the two operands and sets the flags from the result, discarding the result itself, so you can use it to test specific bits in a register.
- TST R0, R1; Status = R0 AND R1
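The following Python sketch (hypothetical names, a model rather than a real emulator) shows how each comparison updates the flags without keeping a result. CMP computes R0 - R1 as R0 + ~R1 + 1 and CMN computes R0 + R1; both set all four flags. TST and TEQ set only N and Z here, because the model assumes an unshifted second operand, so C and V keep their previous values.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x


def _add_flags(a, b, carry_in):
    """N, Z, C, V for the 32-bit sum a + b + carry_in."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    r = full & MASK32
    return {"N": r >> 31, "Z": int(r == 0), "C": full >> 32,
            "V": int(to_signed(a) + to_signed(b) + carry_in != to_signed(r))}


def cmp(a, b, flags):
    flags.update(_add_flags(a, ~b & MASK32, 1))   # a - b, result discarded


def cmn(a, b, flags):
    flags.update(_add_flags(a, b, 0))             # a + b, result discarded


def tst(a, b, flags):
    r = a & b & MASK32                            # a AND b, result discarded
    flags.update(N=r >> 31, Z=int(r == 0))


def teq(a, b, flags):
    r = (a ^ b) & MASK32                          # a EOR b, result discarded
    flags.update(N=r >> 31, Z=int(r == 0))


flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
cmp(3, 5, flags)
print(flags)                    # 3 - 5: N = 1, C = 0 (borrow)
cmn(0xFFFFFFFB, 5, flags)       # R0 = -5 compared with -5
print(flags["Z"])               # 1
tst(0b1010, 0b0100, flags)      # is bit 2 set? Z = 1 means no
print(flags["Z"], flags["C"])   # 1 1 (C left over from CMN)
```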

(2) Branch instructions

The commonly used branch instructions are B, BL, and BX.

B label; sets PC to label. Since PC holds the address of the next instruction to execute, after B label executes, execution continues at label.
BL label; sets LR to PC - 4 and then sets PC to label. In ARM state, reading PC while BL executes gives the address of the BL instruction plus 8 (because of the pipeline), so PC - 4 is the address of the instruction right after BL. That is the return address, and it is saved in LR. With a condition suffix, BL{cond} makes the call conditional.
BX Rd; copies Rd into PC and can switch instruction sets (for example, from ARM to Thumb): if bit 0 of Rd is 1, execution continues in Thumb state; if it is 0, in ARM state.
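The return-address arithmetic behind BL, and the bit-0 rule behind BX, can be sketched in a few lines of Python. This is an illustrative model with made-up addresses and helper names; it assumes ARM state for BL, where reading PC yields the address of the current instruction plus 8.

```python
def bl(regs, bl_address, label):
    """BL label (ARM state): LR = PC - 4, then PC = label."""
    pc_read = bl_address + 8          # PC as seen by the executing BL (pipeline)
    regs["LR"] = pc_read - 4          # = bl_address + 4, the instruction after BL
    regs["PC"] = label


def bx(regs, rd):
    """BX Rd: branch to Rd; bit 0 selects Thumb (1) or ARM (0) state."""
    target = regs[rd]
    regs["T"] = target & 1
    regs["PC"] = target & ~1          # bit 0 is not part of the address


regs = {}
bl(regs, 0x1000, 0x2000)              # BL at 0x1000 calls code at 0x2000
print(hex(regs["LR"]), hex(regs["PC"]))   # 0x1004 0x2000
bx(regs, "LR")                        # BX LR: return to the caller in ARM state
print(hex(regs["PC"]), regs["T"])     # 0x1004 0
```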

(3) Condition suffixes

Branch instructions become truly powerful when combined with condition suffixes, which this part explains. A condition suffix cannot be used alone; it is attached to another instruction, which then executes only if the condition holds. Whether a condition holds is decided from the four flags N, Z, C, and V, which comparisons set. These are the flags introduced earlier: Z (result is zero), C (carry), N (result is negative), and V (overflow).

EQ: Equal (Z = 1)
NE: Not equal (Z = 0)
CS: Carry set (C = 1)
HS: Unsigned higher or same, same as CS (C = 1)
CC: Carry clear (C = 0)
LO: Unsigned lower, same as CC (C = 0)
MI: Minus, the result is negative (N = 1)
PL: Plus, the result is positive or zero (N = 0)
VS: Overflow set (V = 1)
VC: Overflow clear (V = 0)
HI: Unsigned higher, i.e. unsigned greater than (C = 1 and Z = 0)
LS: Unsigned lower or same, i.e. unsigned less than or equal (C = 0 or Z = 1)
GE: Signed greater than or equal (N = V)
LT: Signed less than (N != V)
GT: Signed greater than (Z = 0 and N = V)
LE: Signed less than or equal (Z = 1 or N != V)
AL: Always (unconditional); the default
NV: Never executed (obsolete; since ARMv5 this condition encoding is used for other instructions, so NV should not be used)
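To see how the table is used, here is a Python sketch that evaluates each suffix from the N, Z, C, V flags produced by a CMP. It is a model written for this post (the names are hypothetical), and it omits NV. The demo compares 0xFFFFFFFF with 1: read as unsigned that is 4294967295 > 1 (HI holds), while read as signed it is -1 < 1 (LT holds), which is why ARM has separate unsigned and signed conditions.

```python
MASK32 = 0xFFFFFFFF

CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}


def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x


def cmp_flags(a, b):
    """(N, Z, C, V) after CMP a, b, i.e. a + ~b + 1 on 32 bits."""
    a &= MASK32
    nb = ~b & MASK32
    full = a + nb + 1
    r = full & MASK32
    v = int(to_signed(a) + to_signed(nb) + 1 != to_signed(r))
    return r >> 31, int(r == 0), full >> 32, v


def condition_passed(cond, flags):
    return CONDITIONS[cond](*flags)


flags = cmp_flags(0xFFFFFFFF, 1)
print([c for c in CONDITIONS if condition_passed(c, flags)])
# ['NE', 'CS', 'HS', 'MI', 'VC', 'HI', 'LT', 'LE', 'AL']
```

The signed conditions look at N together with V rather than N alone: CMP 0x80000000, 1 overflows (V = 1, N = 0), and LT still correctly holds because N != V.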

6. Shift operations (LSL, ASL, LSR, ASR, ROR, RRX)

In the ARM instruction set, a shift is not a standalone instruction but a field in the instruction format, applied to an operand. (Assemblers that use the unified syntax also accept LSL, LSR, and so on as mnemonics, as shorthand for MOV with a shifted operand.) The various shift operations are introduced below. If you have taken a digital circuits course, they will look familiar.

(1) LSL (Logical Shift Left) and ASL (Arithmetic Shift Left)

A logical left shift is the same as an arithmetic left shift: the operand is shifted left, zeros fill the low bits, and bits shifted out at the high end are discarded. Let's look at an example to see how LSL or ASL works.

MOV R0, #5

MOV R1, R0, LSL #2

The instructions above put 5 into R0 (R0 = 5), then shift R0 left by 2 bits and put the result into R1. Decimal 5 is binary 0101; shifted left by 2 bits it becomes 0001_0100, which is 20 in decimal. In fact, each 1-bit logical left shift multiplies the value by 2 (as long as no 1 bits are shifted out), so shifting 5 left by 2 bits gives 5 × 2^2 = 20. The diagram below illustrates the operation.

[Figure: logical left shift of 5 by 2 bits, http://images2015.cnblogs.com/blog/545446/201608/545446-20160810171641324-1306951179.png]

(2) LSR (Logical Shift Right)

The logical right shift is the counterpart of the logical left shift: the operand is shifted right and zeros fill the high bits on the left. Usage is the same as LSL, so I won't repeat it here.

(3) ASR (Arithmetic Shift Right)

ASR is like LSR, except that LSR fills the high bits with 0 while ASR fills them with the sign bit: if the sign bit is 1, ones are shifted in; if it is 0, zeros are shifted in.

(4) ROR (Rotate Right)

Rotate right, as the name suggests, shifts the bits right in a circle: the bits shifted out at the right end are fed back in at the high end.
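The shift types can be modeled on 32-bit values in a few lines of Python. This is a sketch with hypothetical helper names; it assumes shift amounts from 0 to 31 (the real encodings also allow some amounts of 32 and have special cases that are ignored here). RRX, listed in the heading of this section, is included as well: it rotates right by exactly one bit through the carry flag, so the old C enters at bit 31 and the old bit 0 becomes the new C.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """LSL/ASL: shift left, zeros in at the bottom, overflowing bits dropped."""
    return (x << n) & MASK32


def lsr(x, n):
    """LSR: shift right, zeros in at the top."""
    return (x & MASK32) >> n


def asr(x, n):
    """ASR: shift right, copies of the sign bit in at the top."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32     # Python's >> on a negative int is arithmetic


def ror(x, n):
    """ROR: bits shifted out at the right re-enter at the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry):
    """RRX: rotate right by one through C; returns (result, new C)."""
    x &= MASK32
    return (x >> 1) | (carry << 31), x & 1


print(lsl(5, 2))                  # 20, as in MOV R1, R0, LSL #2 with R0 = 5
print(hex(lsr(0x80000000, 4)))    # 0x8000000
print(hex(asr(0x80000000, 4)))    # 0xf8000000
print(hex(ror(0x12345678, 8)))    # 0x78123456
```

Comparing the LSR and ASR lines shows the only difference between the two: what gets shifted in at the top when bit 31 is 1.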


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
