Recently I have become very interested in iOS reverse engineering. The books currently available on the subject include Hacking and Securing iOS Applications and iOS Hacker's Handbook. In Chinese there is the book iOS reverse engineering: analysis and practice, and the iOS security attack and defense series on the programmer Nian Qian's blog. In English there is Prateek Gianchandani's iOS Security Series blog. All of them involve ARM assembly, but they only use it in passing; none of them describes ARM assembly on iOS in detail. After a period of study I now have some understanding of ARM on iOS, so I plan to write it down here in a few blog posts, partly as notes in case I forget and partly to share with everyone. My knowledge is limited, so if you find mistakes, please point them out.

First, let's go over the basics of ARM assembly. (We take ARMv7 as the example; 64-bit on the latest iPhone 5s is not discussed for now.)

Basic knowledge part:

First, the registers:

R0-R3: used to pass function arguments and return values.
R4-R6, R8, R10-R11: no special role; these are general-purpose registers.
R7: the frame pointer. It points to the address on the stack where the previous frame pointer and the link register (lr) were saved.
R9: reserved by the operating system in iOS 2.x. From iOS 3.0 on it can be used as a volatile scratch register, which is why it appears in the compiler output later in this post.
R12: also called IP (intra-procedure scratch). Explaining it properly would take some space; see http://blog.csdn.net/gooogleman/article/details/3529413
R13: also called SP (stack pointer); it points to the top of the stack.
R14: also called LR (link register); it holds the return address of the function.
R15: also called PC (program counter); it points to the current instruction address.
CPSR: the Current Program Status Register. In user mode it holds flags such as the condition flags and the interrupt disable bits. In other modes, such as the modes used for handling interrupts, there is an SPSR (Saved Program Status Register) corresponding to the CPSR, which is not covered in detail here.

The registers related to VFP (vector floating point) are skipped as well. For more information, see the reference link below.
Basic instructions:

add: addition.
sub: subtraction.
str: stores the content of a register to memory (in this post, to the stack).
ldr: loads content from memory (in this post, from the stack) into a register.
.w: an optional instruction width specifier. It does not change what the instruction does; it only ensures that a 32-bit encoding is generated. See infocenter.arm.com for details.
bl: performs a function call and sets lr to the caller's next instruction, that is, the return address of the function.
blx: the same as bl, but it also switches between the ARM and Thumb instruction sets.
bx lr: returns to the calling function (the caller).

The following are some rules for function calls.

1. On iOS you need to use the BLX and BX instructions to call functions and return from them. You cannot use the MOV instruction (this will be explained below).

2. ARM uses a stack to maintain function calls and returns. On ARM the stack grows downward, from high addresses to low addresses.

The stack layout for a function call is shown in Figure (1) (taken from Apple's iOS ABI Reference):

Figure (1)

SP (stack pointer) points to the top of the stack. Because the stack grows from high addresses to low addresses, the top of the stack is its lowest address. A stack frame is simply a block of storage on the stack, delimited by R7 and the old R7 value saved on the stack. A stack frame includes the following areas:

The parameter area, which stores the arguments passed by the calling function. On 32-bit ARM the first four arguments are passed in r0-r3, and any further arguments are passed on the stack, that is, stored in this area.
The linkage area, which stores the address of the caller's next instruction.
The saved frame pointer area, which stores the bottom of the caller's stack frame. It marks the end of the caller's stack frame and the beginning of the callee's stack frame.
The local storage area, which stores the callee's local variables as well as the registers that must be restored before the callee returns to the caller.
The saved registers area. This is how Apple describes it in its documentation.
However, I think that this region is adjacent to the local storage area, and what it does is also to store register contents that have to be restored. So conceptually I see no need to distinguish it from the local storage area; otherwise the job of storing the registers to be restored should be split out of the local storage area altogether. Of course, this is only a matter of concepts. In practice there is no difference.

Next, let's look at what has to be done at the beginning and at the end of a called function. (The official names are prolog and epilog.)

At the start of the call (prolog):

1. Push LR onto the stack.
2. Push R7 onto the stack.
3. R7 = SP. After the two pushes the address in SP has moved down. Assigning the SP value to R7 marks the end of the caller's stack frame and the beginning of the callee's stack frame.
4. Push the registers that the callee will modify and that must be restored when returning to the caller.
5. Allocate stack space for the function. Since the stack grows from high addresses to low addresses, sub sp, #size is usually used for the allocation.

At the end of the call (epilog):

1. Release the stack space with add sp, #size.
2. Restore the saved registers.
3. Restore R7.
4. Pop the previously saved LR from the stack into PC, so that the function returns.

-------------------------------------------------------------
Hands-on part (1):

Create a test project in Xcode, add a new .c file, and add the following function to it:

    #include <stdio.h>

    int func(int a, int b, int c, int d, int e, int f)
    {
        int g = a + b + c + d + e + f;
        return g;
    }

To view the assembly, select a real device as the build target in the upper left corner of Xcode. ARM assembly is generated in that case; with the simulator selected, x86 assembly is generated instead. Then click Xcode => Product => Perform Action => Assemble file.c to generate the assembly code.

The output is long, and many lines start with ".", such as ".section" and ".loc". These are directives needed by the assembler and we don't have to worry about them. After removing the lines that start with "." and the comments, the code is as follows:

    _func:
        .cfi_startproc
    Lfunc_begin0:
        add r0, r1
    Ltmp0:
        ldr.w r12, [sp]
        add r0, r2
        ldr.w r9, [sp, #4]
        add r0, r3
        add r0, r12
        add r0, r9
        bx lr
    Ltmp2:
    Lfunc_end0:

_func: indicates that what follows is the body of the function func. Lfunc_begin0 and Lfunc_end0 mark the start and end of the function definition; functions generally start and end with labels of the form "xxx_beginx:" and "xxx_endx:".

The code is explained instruction by instruction:

add r0, r1: adds parameters a and b and puts the result in r0.
ldr.w r12, [sp]: loads the fifth parameter, e, from the stack into r12. The first argument passed on the stack sits at the lowest address, [sp].
add r0, r2: adds parameter c to r0.
ldr.w r9, [sp, #4]: loads the last parameter, f, from the stack into r9.
add r0, r3: adds parameter d to r0.
add r0, r12: adds parameter e to r0.
add r0, r9: adds parameter f to r0. At this point all six values from a to f have been accumulated in r0. As mentioned above, r0 holds the return value.
bx lr: returns to the calling function.

-----------------------------------------------------------
Hands-on part (2):

To show how the stack changes during function calls, the following example walks through the assembly of a piece of C code with three functions and two calls. Here is the code:

    #include <stdio.h>

    __attribute__((noinline))
    int addFunction(int a, int b, int c, int d, int e, int f)
    {
        int r = a + b + c + d + e + f;
        return r;
    }

    __attribute__((noinline))
    int fooFunction(int a, int b, int c, int d, int f)
    {
        int r = addFunction(a, b, c, d, f, 66);
        return r;
    }

    int initFunction()
    {
        int r = fooFunction(11, 22, 33, 44, 55);
        return r;
    }

Because we want to observe the function calls and the changes on the stack, we add __attribute__((noinline)) to stop the compiler from inlining the functions (if you are not familiar with inlining, look it up on Google).
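Before moving on to the assembly of these three functions, the instruction-by-instruction walk-through of _func from part (1) can be mirrored in plain C. This is only a rough model and not compiler output: the Regs struct, the two-element stack array and the names model_func and check_model_func are all hypothetical, and the model assumes the standard ARM convention in which the first stack argument (e) sits at [sp] and the next one (f) at [sp, #4].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the registers that _func touches. */
typedef struct {
    int32_t r0, r1, r2, r3, r9, r12;
} Regs;

/* Mirrors the instruction order of _func; this is a model, not real output. */
int32_t model_func(int32_t a, int32_t b, int32_t c, int32_t d,
                   int32_t e, int32_t f)
{
    /* Assumed caller-side layout: stack[0] is [sp], stack[1] is [sp, #4]. */
    int32_t stack[2] = { e, f };
    Regs r = { .r0 = a, .r1 = b, .r2 = c, .r3 = d, .r9 = 0, .r12 = 0 };

    r.r0 += r.r1;       /* add   r0, r1       */
    r.r12 = stack[0];   /* ldr.w r12, [sp]    */
    r.r0 += r.r2;       /* add   r0, r2       */
    r.r9  = stack[1];   /* ldr.w r9, [sp, #4] */
    r.r0 += r.r3;       /* add   r0, r3       */
    r.r0 += r.r12;      /* add   r0, r12      */
    r.r0 += r.r9;       /* add   r0, r9       */
    return r.r0;        /* bx lr: the result is returned in r0 */
}

/* Self-check: the model must agree with the C expression it came from. */
void check_model_func(void)
{
    assert(model_func(1, 2, 3, 4, 5, 6) == 1 + 2 + 3 + 4 + 5 + 6);
    assert(model_func(11, 22, 33, 44, 55, 66) == 231);
}
```

Calling check_model_func() from a test program is enough to confirm that the register sequence computes the same sum as the C source. The order in which the values are added differs from the source expression, which is fine because integer addition is commutative.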
As before, select a real device as the build target in the upper left corner of Xcode so that ARM assembly is generated (with the simulator, x86 assembly is generated instead), then click Xcode => Product => Perform Action => Assemble file.c to generate the assembly code.

To follow the way we naturally think about the program, let's start with the function that makes the first call, initFunction:

    _initFunction:
        .cfi_startproc
    Lfunc_begin2:
    @ BB#0:
        push {r7, lr}
        mov r7, sp
        sub sp, #4
        movs r0, #55
        movs r1, #22
    Ltmp6:
        str r0, [sp]
        movs r0, #11
        movs r2, #33
        movs r3, #44
        bl _fooFunction
        add sp, #4
        pop {r7, pc}
    Ltmp7:
    Lfunc_end2:

Again, an explanation step by step:

1. push {r7, lr}: the first two steps of the prolog described in the basic knowledge part; lr and r7 are saved on the stack.
2. mov r7, sp: step 3 of the prolog.
3. sub sp, #4: allocates 4 bytes on the stack for local storage, which here holds an outgoing parameter. As said before, r0-r3 can pass four parameters, and anything beyond that has to go through the stack.
4. movs r0, #55: puts the immediate value 55 into r0.
5. movs r1, #22: puts 22 into r1.
6. str r0, [sp]: stores the value of r0 to the memory that the stack pointer sp points to. That is, the parameter 55 is now on the stack.
7. movs r0, #11, movs r2, #33, movs r3, #44: these three instructions put the immediate values into the given registers. At this point r0-r3 hold the four immediate parameters 11, 22, 33 and 44, and 55 is on the stack.
8. bl _fooFunction: calls fooFunction. What happens after the jump into fooFunction is analyzed later.
9. add sp, #4: moves the stack pointer up by 4 bytes, reclaiming the space allocated by sub sp, #4 in step 3.
10. pop {r7, pc}: restores the values that the first instruction, push {r7, lr}, put on the stack: r7 gets its old value back and the saved lr value goes into pc. Note that on entry to initFunction, lr holds the address of the instruction that follows the call to initFunction. So that address is now assigned to the program counter pc.
In this way execution continues at the instruction that lr pointed to, and the function has returned. The first three instructions (push {r7, lr}, mov r7, sp and sub sp, #4) are the prolog of the function, and the last two (add sp, #4 and pop {r7, pc}) are the epilog. This is basically a fixed routine. Once you have seen it a few times you will recognize it right away and won't need to stop and analyze it instruction by instruction.
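To make the prolog and epilog routine concrete, here is a minimal C sketch that replays the stack effects of initFunction on a simulated machine. Everything in it is an assumption made for illustration: the Cpu struct, the word-indexed stack array (one slot stands for 4 bytes) and the helper names cpu_push, cpu_pop and run_init_function are hypothetical, and the call to fooFunction itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical machine state: a small word-addressed stack plus a few registers. */
typedef struct {
    uint32_t stack[STACK_WORDS];
    uint32_t sp;                /* index of the top-of-stack word */
    uint32_t r0, r7, lr, pc;
} Cpu;

/* The stack grows downward: pushing decrements sp first. */
static void cpu_push(Cpu *c, uint32_t value)
{
    assert(c->sp > 0);
    c->stack[--c->sp] = value;
}

static uint32_t cpu_pop(Cpu *c)
{
    assert(c->sp < STACK_WORDS);
    return c->stack[c->sp++];
}

/* Replays initFunction and returns what the callee would find at [sp]. */
uint32_t run_init_function(Cpu *c)
{
    /* push {r7, lr}: lr ends up at the higher address, r7 at the lower one */
    cpu_push(c, c->lr);
    cpu_push(c, c->r7);
    c->r7 = c->sp;                  /* mov r7, sp                   */
    assert(c->sp > 0);
    c->sp -= 1;                     /* sub sp, #4 (one 32-bit word) */
    c->r0 = 55;                     /* movs r0, #55                 */
    c->stack[c->sp] = c->r0;        /* str r0, [sp]                 */

    /* bl _fooFunction would run here; only the stack slot is observed. */
    uint32_t fifth_arg = c->stack[c->sp];

    c->sp += 1;                     /* add sp, #4                   */
    c->r7 = cpu_pop(c);             /* pop {r7, pc}: old r7 first,  */
    c->pc = cpu_pop(c);             /* then the saved lr into pc    */
    return fifth_arg;
}
```

If the machine starts with sp at STACK_WORDS and arbitrary values in r7 and lr, then after run_init_function the stack pointer and r7 are back at their starting values and pc holds the original lr, which is exactly the property that lets the routine be read as boilerplate.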