[Repost] Getting Started with arm64 Assembly for iOS Developers

Source: Internet
Author: User

When tracking down a crash, you sometimes run into strange cases, for example a crash inside a system library, and locating the cause can be a real headache. Learning some assembly and using assembly-level debugging techniques can be unexpectedly effective in these situations.

Learning assembly is not only about locating crashes. It also helps you really understand the computer, because what the CPU ultimately runs is the corresponding instruction set.

0x1 Tools

What we face is either source code or a binary, so we need disassembly tools to help us read the assembly code. Recommended tools:

    • Hopper Disassembler: a paid application that makes reading assembly code very convenient.
    • MachOView: an open source tool that makes inspecting the Mach-O file structure very convenient.

0x2 Basic Concepts

When moving from a high-level language to assembly, the important thing is to shift your basic concepts. I think the three most important concepts in assembly are registers, the stack, and instructions. The ARM64 architecture has two execution states, AArch64 Application Level and AArch32 Application Level; this article covers only AArch64.

0x21 Register

If you don't know what a register is, it is best to search for it first; it is not covered in detail here. In short, registers are high-speed storage units inside the CPU, and accessing them is much faster than accessing memory.

Here is a description of the registers arm64 has:

    • r0–r30

r0 - r30 are 31 general-purpose integer registers. Each register holds a 64-bit value. When accessed as x0 - x30, the full 64 bits are used. When accessed as w0 - w30, only the lower 32 bits of the register are accessed.

In the instruction encoding there is also a register number 31. In most instructions it denotes the zero register ZR, written XZR/WZR for the 64/32-bit forms. The zero register always reads as 0, and writing to it discards the result.

r29 is also called fp (frame pointer), and r30 is also called lr (link register). Their purpose is described in the next section, "Stack".
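The relationship between the x and w views and the zero register can be sketched with a toy model in Python. This is only an illustration, not how hardware is implemented; the class name `RegisterFile` and its methods are made up for this example. It relies on the AArch64 rule that a write through a w register clears the upper 32 bits of the corresponding x register.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class RegisterFile:
    """Toy model of x0-x30 plus the zero register (hypothetical helper)."""

    def __init__(self):
        self.regs = [0] * 31  # x0..x30

    def write_x(self, n, value):
        if n == 31:           # register number 31 as xzr: the result is discarded
            return
        self.regs[n] = value & MASK64

    def read_x(self, n):
        return 0 if n == 31 else self.regs[n]  # xzr always reads as 0

    def write_w(self, n, value):
        # Writing the 32-bit view zeroes the upper 32 bits of the x register.
        self.write_x(n, value & MASK32)

    def read_w(self, n):
        return self.read_x(n) & MASK32          # low 32 bits only


rf = RegisterFile()
rf.write_x(0, 0x1122334455667788)
print(hex(rf.read_w(0)))    # 0x55667788: w0 is the low half of x0
rf.write_w(0, 0xAABBCCDD)
print(hex(rf.read_x(0)))    # 0xaabbccdd: the upper half was cleared
rf.write_x(31, 123)         # write to the zero register
print(rf.read_x(31))        # 0
```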

    • SP

In the instruction encoding, the SP register shares register number 31 with the zero register; which one is meant depends on the instruction. It is accessed as SP/WSP.

    • PC

The PC register holds the address of the currently executing instruction. In arm64, software cannot write the PC register directly.

    • V0–V31

V0 - V31 are vector registers, which also serve as the floating-point registers. Each register is 128 bits wide. The names Bn, Hn, Sn, Dn and Qn access different numbers of bits of the same register.

An easy way to remember Bn, Hn, Sn, Dn and Qn is to take a word as 32 bits, that is, 4 bytes:

Bn: one byte, 8 bits
Hn: half word, 16 bits
Sn: single word, 32 bits
Dn: double word, 64 bits
Qn: quad word, 128 bits
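As a rough illustration of the naming scheme, the sketch below treats a vector register as a 128-bit Python integer and returns the low bits that each name refers to. The `view` helper and the sample value are made up for this example.

```python
# Width in bits of each view of a 128-bit vector register.
WIDTHS = {"B": 8, "H": 16, "S": 32, "D": 64, "Q": 128}


def view(v_value, kind):
    """Return the low WIDTHS[kind] bits of a 128-bit register value."""
    bits = WIDTHS[kind]
    return v_value & ((1 << bits) - 1)


v0 = 0x00112233445566778899AABBCCDDEEFF  # arbitrary sample contents
for kind in "BHSDQ":
    print(f"{kind}0 = {view(v0, kind):#x}")
```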

    • SPRs

SPRs are status registers that store status flags while the program runs. Unlike if/else in a programming language, in assembly a branch is controlled by the state held in a status register. The status registers are divided into the Current Program Status Register (CPSR) and the Saved Program Status Registers (SPSRs). CPSR is the one normally used; when an exception occurs, CPSR is saved into an SPSR, and when the exception returns it is copied back into CPSR. (CPSR is the AArch32 name; in AArch64 the same state is held in PSTATE, which is why the flags appear as PSTATE.NZCV later in this article.)

There are also the system registers, and the FPSR and FPCR status registers used for floating-point operations. A basic understanding of the registers above is enough.

0x22 Stack

The stack is the memory area that holds temporary variables while a method executes. When learning assembly, it is important to understand the structure of the stack.

First, some properties of the stack:

    • The stack grows from high addresses to low addresses: the bottom of the stack is at a high address and the top of the stack is at a low address.
    • fp points to the bottom of the current frame, which is the high address.
    • sp points to the top of the stack, which is the low address.

The following diagram gives a simple picture of how the stack is divided when method A calls method B:

The 3 lines of assembly are the first three instructions of method B. What they do is what the figure depicts (x29 is fp, x30 is lr):

    • Save fp and lr at sp - 0x10, the position marked --> fp_B in the diagram, then set sp to sp - 0x10.
    • Set fp to the current sp, which is the position marked --> fp_B. This step sets up the fp of _funcB.
    • Set sp to sp - 0x30, so that sp points to the position marked --> sp_B in the diagram.
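The three steps can be walked through with a small Python simulation. The text does not show the instructions themselves, so this sketch assumes they are `stp x29, x30, [sp, #-0x10]!`, `mov x29, sp` and `sub sp, sp, #0x30`, which is what the description suggests. The addresses and the `prologue` helper are hypothetical.

```python
def prologue(sp, fp, lr, memory):
    """Simulate the assumed entry sequence of _funcB.

    stp x29, x30, [sp, #-0x10]!  ;  mov x29, sp  ;  sub sp, sp, #0x30
    """
    sp -= 0x10            # pre-index: sp = sp - 0x10
    memory[sp] = fp       # caller's fp is stored at the lower address
    memory[sp + 8] = lr   # return address is stored just above it
    fp = sp               # fp_B: bottom of _funcB's frame
    sp -= 0x30            # sp_B: room for _funcB's local data
    return sp, fp


memory = {}
sp_A, fp_A, lr = 0x7000, 0x7040, 0x100004000   # made-up values
sp_B, fp_B = prologue(sp_A, fp_A, lr, memory)
print(hex(fp_B), hex(sp_B))                     # 0x6ff0 0x6fc0
print({hex(k): hex(v) for k, v in memory.items()})
```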

Note: lr is the link register. It holds the address of the instruction in _funcA that follows the call to _funcB. It is easy to understand that execution should return to _funcA when _funcB finishes, but how does the computer know where to go back to? It is because lr records the return address that the method can return normally.

So when _funcB finishes executing, how is the stack restored to the state of _funcA? Let's analyze the last 3 instructions of _funcB directly:

    mov sp, fp               // sp is set to fp, the position marked -->fp_B in the diagram
    ldp fp, lr, [sp], #0x10  // read two 64-bit values from the address sp points to into fp and lr, then sp += 0x10
                             // after this step fp points to -->fp_A, lr is restored to the return address in _funcA, and sp points to -->sp_A
                             // the state is now fully restored to the _funcA environment
    ret                      // return instruction: jumps to the address held in lr
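The exit sequence can be simulated the same way. The sketch below starts from a hypothetical frame (made-up addresses, with the caller's fp and the return address already saved at fp_B) and applies the three instructions in order; the `epilogue` helper is an illustration only.

```python
def epilogue(fp, memory):
    """Simulate: mov sp, fp ; ldp fp, lr, [sp], #0x10 ; ret."""
    sp = fp                   # mov sp, fp: discard _funcB's local data
    caller_fp = memory[sp]    # ldp: first 64-bit value goes to fp
    lr = memory[sp + 8]       #      second 64-bit value goes to lr
    sp += 0x10                # post-index: sp = sp + 0x10
    pc = lr                   # ret: execution continues at the address in lr
    return sp, caller_fp, pc


# Hypothetical frame contents as _funcB's entry sequence would have left them.
fp_B = 0x6FF0
memory = {0x6FF0: 0x7040, 0x6FF8: 0x100004000}
sp, fp, pc = epilogue(fp_B, memory)
print(hex(sp), hex(fp), hex(pc))   # 0x7000 0x7040 0x100004000
```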

The above describes how a method is called. In programming languages, methods have parameters and return values. How do these show up in assembly?

    • In general, on arm64 the first 8 parameters of a method are stored in x0–x7.
    • If there are more than 8 parameters, the extra arguments are placed on the stack, and the called method reads them from the stack.
    • The return value of a method is generally in x0.
    • If the return value is a large data structure, the result is stored at the address that x8 points to.
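The first two rules can be sketched as a small Python function. It is a simplification that only models integer or pointer arguments (floating-point arguments, alignment and other details of the real calling convention are ignored), and `assign_arguments` is a name invented for this example.

```python
def assign_arguments(args):
    """Place the first 8 integer arguments in x0-x7, the rest on the stack."""
    registers = {f"x{i}": value for i, value in enumerate(args[:8])}
    stack = list(args[8:])   # arguments 9, 10, ... are read from the stack
    return registers, stack


regs, stack = assign_arguments(list(range(10, 20)))   # ten made-up arguments
print(regs)    # x0..x7 hold 10..17
print(stack)   # [18, 19]
```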

0x23 Instructions

We have already seen a number of instructions in the previous section. Although there are many assembly instructions, the basic principles are relatively simple, and any single instruction performs a very simple operation. For example, mov assigns a value and ldr loads a value.

What kinds of assembly instructions are there? I think that knowing the following basic instructions is enough to read assembly code normally.

0x231 Arithmetic
    • Arithmetic operations

Arithmetic operations are instructions such as ADD, SUB, MUL ... Addition, subtraction and multiplication are easy instructions to understand.
For example:

    add x0, x1, x2      // x0 = x1 + x2
    sub sp, sp, #0x30   // sp = sp - 0x30
    cmp x11, #4         // equivalent to subs xzr, x11, #4
                        // if x11 - 4 == 0, the status flag NZCV.Z = 1
                        // if x11 - 4 < 0, NZCV.N = 1

NZCV are the flag values stored in the status register, describing the state produced by an operation:
* N, negative condition flag: generally set when the result of the operation is negative.
* Z, zero condition flag: set when the result of the operation is 0.
* C, carry condition flag: C = 1 when an unsigned operation overflows (carries out). For a subtraction such as cmp, C = 1 means no borrow occurred.
* V, overflow condition flag: V = 1 when a signed operation overflows.
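To make the four flags concrete, here is a minimal Python sketch of how `cmp a, b` (that is, `subs xzr, a, b`) would set them for 64-bit operands. The function names are made up for this example, and the model covers only this one instruction.

```python
MASK64 = (1 << 64) - 1


def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def cmp_flags(a, b):
    """Return the N, Z, C, V flags that `cmp a, b` would produce."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    n = result >> 63                      # top bit of the 64-bit result
    z = int(result == 0)
    c = int(a >= b)                       # subtraction: C = 1 means no borrow
    signed_diff = to_signed(a) - to_signed(b)
    v = int(not -(1 << 63) <= signed_diff < (1 << 63))   # signed overflow
    return {"N": n, "Z": z, "C": c, "V": v}


print(cmp_flags(4, 4))   # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
print(cmp_flags(3, 4))   # {'N': 1, 'Z': 0, 'C': 0, 'V': 0}
```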

    • Logical operation instructions

The shifts are LSL (logical shift left), LSR (logical shift right), ASR (arithmetic shift right) and ROR (rotate right).
The bitwise operations are AND (and), ORR (or) and EOR (xor).

Shift operations can also be combined with arithmetic operations, for example:

    add x14, x4, x27, lsl #1   // x14 = x4 + (x27 << 1)
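The four shift types and the shifted-operand form of add can be modelled on 64-bit values in Python as follows. This is a sketch for intuition only; the helper names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def lsl(value, amount):
    """Logical shift left: bits shifted past bit 63 are lost."""
    return (value << amount) & MASK64


def lsr(value, amount):
    """Logical shift right: high bits are filled with 0."""
    return (value & MASK64) >> amount


def asr(value, amount):
    """Arithmetic shift right: high bits are filled with the sign bit."""
    value &= MASK64
    shifted = value >> amount
    if value >> 63:
        shifted |= MASK64 & ~(MASK64 >> amount)
    return shifted


def ror(value, amount):
    """Rotate right: bits shifted out at the bottom re-enter at the top."""
    value &= MASK64
    amount %= 64
    return ((value >> amount) | (value << (64 - amount))) & MASK64


def add_shifted(xn, xm, amount):
    """add xd, xn, xm, lsl #amount  ->  xd = xn + (xm << amount)"""
    return (xn + lsl(xm, amount)) & MASK64


print(add_shifted(10, 3, 1))          # 16, i.e. 10 + (3 << 1)
print(hex(asr(0x8000000000000000, 4)))
print(hex(ror(1, 1)))
```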

    • Bit-extension operations

There are zero extend (fill the high bits with 0) and sign extend (fill the high bits with copies of the sign bit, normally used for signed numbers). They are used to widen a value to more bits, and are often combined with arithmetic operations.
For example:

    add w20, w30, w20, uxth   // take the low 16 bits of w20, zero-extend them to 32 bits, then compute w30 + w20
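The difference between zero extension and sign extension is easy to see in a short Python sketch. `uxth` and `sxth` here are plain functions named after the extension operators; the register values are made up.

```python
MASK32 = (1 << 32) - 1


def uxth(value):
    """Zero-extend: keep the low 16 bits, fill the high bits with 0."""
    return value & 0xFFFF


def sxth(value):
    """Sign-extend the low 16 bits to 32 bits: copy bit 15 upward."""
    low = value & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def add_uxth(wn, wm):
    """add wd, wn, wm, uxth  ->  wd = wn + zero_extend(low 16 bits of wm)"""
    return (wn + uxth(wm)) & MASK32


w20, w30 = 0x1234FFFF, 1                     # hypothetical register contents
print(hex(add_uxth(w30, w20)))               # 0x10000
print(hex(uxth(0x8001)), hex(sxth(0x8001)))  # 0x8001 0xffff8001
```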

    • MOV

mov assigns a value to a register, either an immediate or the contents of another register.

0x232 Addressing

Since these instructions deal with memory, there are two kinds: storing and loading. Generally speaking:
Instructions beginning with L load a value, such as LDR and LDP;
Instructions beginning with S store a value, such as STR and STP.

Examples:

    ldr x0, [x1]                // load a 64-bit value from the address x1 points to into x0
    ldp x1, x2, [x10, #0x10]    // load two 64-bit values from the address x10 + 0x10 into x1 and x2
    str x5, [sp, #24]           // store the value of x5 (64 bits) at the address sp + 24
    stp x29, x30, [sp, #-16]!   // store x29 and x30 at the address sp - 16, and sp -= 16
    ldp x29, x30, [sp], #16     // load x29 and x30 from the address in sp, then sp += 16

Addressing comes in the following 3 forms:

    [x10, #0x10]   // signed offset: access the address x10 + 0x10
    [sp, #-16]!    // pre-index: access the address sp - 16, then write sp - 16 back to sp
    [sp], #16      // post-index: access the address in sp, then write sp + 16 back to sp
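The three forms differ only in which address is accessed and whether the base register changes. A minimal Python sketch, with helper names invented for this example, makes that explicit by returning both values:

```python
def signed_offset(base, offset):
    """[base, #offset]: access base + offset, base register unchanged."""
    return base + offset, base


def pre_index(base, offset):
    """[base, #offset]!: update the base first, then access the new base."""
    base += offset
    return base, base


def post_index(base, offset):
    """[base], #offset: access the old base, then update it."""
    return base, base + offset


# Each helper returns (address accessed, base register afterwards).
sp = 0x1000   # hypothetical starting value
print([hex(v) for v in signed_offset(sp, 0x10)])   # ['0x1010', '0x1000']
print([hex(v) for v in pre_index(sp, -16)])        # ['0xff0', '0xff0']
print([hex(v) for v in post_index(sp, 16)])        # ['0x1000', '0x1010']
```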

0x233 Jump

Jumps are divided into BL (jump with return) and B (jump without return). "With return" means that lr is saved, so the L in BL can be read as LR.

1. BL saves lr, which means execution can return to this method and continue. It is typically used to call another method directly.
2. B-type jumps do not update lr. They are generally used for jumps within a method, such as while loops and if/else.

Jump instructions can also carry a condition, the condition code. Combined with the flags in the status register, condition codes are the key to implementing if/else branches in code.
The condition codes are as follows; the table also lists the NZCV values each one corresponds to:

For example:

    cmp x2, #0                               // x2 - 0 = 0 (x2 is 0 here), so the zero flag is set: PSTATE.NZCV.Z = 1
    b.ne 0x1000d48f0                         // ne is a condition code: the jump is taken only when NZCV.Z != 1, so this instruction does not jump
    0x1000d4ab0 bl testfunca                 // call the method; lr is set to 0x1000d4ab4
    0x1000d4ab4 orr x8, xzr, #0x1f00000000   // when testfunca finishes, execution returns here, the address held in lr
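How a condition code is checked against NZCV can be sketched as a lookup table in Python. The predicates below follow the standard AArch64 definitions of the common condition codes; the `branch_taken` helper and the dictionary layout are made up for this example.

```python
# NZCV predicates for the common AArch64 condition codes.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,                       # equal
    "NE": lambda f: f["Z"] == 0,                       # not equal
    "HS": lambda f: f["C"] == 1,                       # unsigned >= (also CS)
    "LO": lambda f: f["C"] == 0,                       # unsigned <  (also CC)
    "MI": lambda f: f["N"] == 1,                       # negative
    "PL": lambda f: f["N"] == 0,                       # positive or zero
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,       # unsigned >
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,        # unsigned <=
    "GE": lambda f: f["N"] == f["V"],                  # signed >=
    "LT": lambda f: f["N"] != f["V"],                  # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],  # signed >
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],   # signed <=
}


def branch_taken(cond, flags):
    """Return True if `b.<cond>` would jump with the given NZCV flags."""
    return CONDITIONS[cond.upper()](flags)


# Flags after `cmp x2, #0` when x2 is 0: Z = 1 and C = 1 (no borrow).
flags_after_cmp = {"N": 0, "Z": 1, "C": 1, "V": 0}
print(branch_taken("ne", flags_after_cmp))   # False: b.ne does not jump
print(branch_taken("eq", flags_after_cmp))   # True
```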

0x4 Summary

This article briefly introduces some arm64 knowledge. Learning arm64 assembly is a great help in understanding how iOS code executes and how the computer works. In daily work, assembly knowledge helps locate some of the hardest crash problems. Working from assembly principles can also spark ideas for more advanced techniques, such as app package analysis, static scanning and so on.

Assembly instructions execute in a simple way. Unlike debugging higher-level code, where some weird problems are hard to see at a glance, the result of each assembly instruction is deterministic, so approaching a problem from this angle often leads straight to the root cause.

In the world of assembly instructions you gain a deeper understanding of how code executes, and you see that a single line of source code is broken down into so many instructions! If this article has made you interested in learning assembly but many details are still unclear, try disassembling some code yourself with Hopper and work out what each instruction means. After understanding a few methods this way, the pieces start to fit together.

