Advanced iOS Debugging & Reverse Engineering: Assembly Register Calling Convention

Source: Internet
Author: User

Preface

In this tutorial, you'll look at the registers the CPU uses, and explore and modify the parameters passed to function calls. You'll also learn about the common Apple computer architectures and how their registers are used within a function. This is known as an architecture's calling convention.

Understanding how assembly works, and how a particular architecture's calling convention works, is an extremely important skill. It lets you observe and modify the parameters passed to a function without having the source code. In addition, because source code may use different or unknown variable names, it is sometimes better to work at the assembly level.

For example, suppose you always want to know the second parameter of a function call, regardless of what the parameter is named. Knowledge of assembly gives you a solid base layer for observing and manipulating the parameters of a function.

Assembly

Wait, so what's assembly again?

Have you ever stopped in a function you had no source code for, and seen a series of memory addresses followed by scary, short commands? Did you curl up into a ball and whisper to yourself that you would never look at that stuff again? Well... that stuff is known as assembly!

This is a backtrace in Xcode that shows the assembly of a function running in the simulator.

Looking at the picture above, the assembly can be divided into several parts. Each line of assembly contains an opcode, which can be thought of as a very simple instruction for the computer.

So what does an opcode look like? An opcode is an instruction that performs one simple task on the computer. For example, consider the following snippet of assembly:

pushq   %rbx
subq    $0x228, %rsp
movq    %rdi, %rbx

In this block of assembly, you see three opcodes: pushq, subq, and movq. Think of an opcode as the action to perform. The items that follow the opcode are the source and destination labels. These are the items the opcode acts upon.

In the example above, there are several registers: rbx, rsp, and rdi. Each one is preceded by the % character.

In addition, you can find a hexadecimal constant, 0x228. The $ before the constant means that it is an absolute number.

There is no need to know what this code is doing yet, because you first need to understand registers and the calling conventions of functions.

Note: In the example above, the registers and constants are prefixed with % and $. That is just one way of writing assembly. There are two main ways of presenting assembly: the first is Intel syntax, the second is AT&T syntax.

By default, Apple's disassembler tools display assembly in the AT&T format, as in the example above. Although this is a good format to work with, it can admittedly be a little hard on the eyes.
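To make the prefix rule concrete, here is a minimal sketch in C that classifies a single AT&T-style operand by its first character. The names OperandKind and classify_operand are hypothetical and exist only for this illustration; they are not part of any Apple tool, and a real disassembler handles many more operand forms (memory references, labels, and so on).

```c
#include <stddef.h>

/* Hypothetical classification used only for this illustration. */
typedef enum {
    OPERAND_REGISTER,   /* starts with '%', for example %rbx   */
    OPERAND_IMMEDIATE,  /* starts with '$', for example $0x228 */
    OPERAND_OTHER       /* anything else, such as a memory reference */
} OperandKind;

/* Classifies one AT&T-syntax operand by looking at its prefix character. */
OperandKind classify_operand(const char *operand) {
    if (operand == NULL || operand[0] == '\0') {
        return OPERAND_OTHER;
    }
    if (operand[0] == '%') {
        return OPERAND_REGISTER;
    }
    if (operand[0] == '$') {
        return OPERAND_IMMEDIATE;
    }
    return OPERAND_OTHER;
}
```

Applied to the snippet above, classify_operand("%rsp") reports a register and classify_operand("$0x228") reports an absolute number.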

x86_64 vs ARM64

As a developer on Apple platforms, you will deal with two main architectures when learning assembly: x86_64 and ARM64. x86_64 is most likely the architecture of your macOS computer, unless you are running on an older machine. x86_64 is a 64-bit architecture, which means that each address can hold up to 64 ones or zeros. Older Apple computers used a 32-bit architecture, but Apple stopped producing 32-bit computers in 2010. Programs running under macOS are likely to be 64-bit compatible, including programs running in the simulator. That said, even if your macOS is x86_64, it can still run 32-bit programs.

If you have any doubts about the architecture of the hardware you are working on, you can run the following command at the terminal:

uname -m

The ARM64 architecture is used on mobile devices such as the iPhone, where limiting power consumption is of paramount importance.
ARM emphasizes power conservation, so it has a reduced set of opcodes, which helps lower energy consumption compared with more complex instruction sets. This is good news for you, because there are fewer instructions to learn on the ARM architecture.

Here's the same method shown earlier, this time as ARM64 assembly on an iPhone 7:

Apple originally shipped 32-bit ARM processors in many of its devices, but has since moved to 64-bit ARM processors. 32-bit devices are almost obsolete, because Apple has phased them out over successive iOS versions. For example, the iPhone 4s is a 32-bit device that iOS 10 does not support. The iPhone 5 is all that remains of the 32-bit iPhone line that iOS 10 still supports.

Interestingly, all Apple Watch models are currently 32-bit. This is most likely because 32-bit ARM CPUs typically draw less power than their 64-bit siblings. That matters for the watch because its battery is very small.

x86_64 register calling convention

Your CPU uses a set of registers to work on the data of the running program. Registers are storage, just like the memory in your computer. However, they are located on the CPU itself, very close to the parts of the CPU that need them, so the CPU can access them extremely quickly.

Most instructions involve one or more registers and perform operations such as writing the contents of a register to memory, reading the contents of memory into a register, or performing arithmetic (addition, subtraction, and so on) on two registers.

In x64 (from here on, x64 is short for x86_64), the machine has 16 general-purpose registers for manipulating data.

These registers are RAX, RBX, RCX, RDX, RDI, RSI, RSP, RBP, and R8 through R15. The names may not mean much to you now, but you will explore the important ones soon.

When you call a function in x64, the way registers are used follows a very specific convention. It determines where the function's arguments should go, and where the return value will be when the function finishes. This matters because it lets code compiled by one compiler work with code compiled by another.
For example, take a look at the following Objective-C code:

NSString *name = @"Zoltan";
NSLog(@"Hello world, I am %@. I'm %d, and I live in %@.", name, 30, @"my father's basement");

Four parameters are passed to the NSLog function call. Some values are passed directly, while one is stored in a local variable that is then referenced as a parameter of the function. However, once the code is compiled, the computer does not care about variable names; it only cares about locations in memory.
The following registers are used as parameters when a function is called in x64 assembly. Try to commit these to memory, because you will use them frequently from now on.

    • First parameter: RDI
    • Second parameter: RSI
    • Third parameter: RDX
    • Fourth parameter: RCX
    • Fifth parameter: R8
    • Sixth parameter: R9

If there are more than six parameters, the additional parameters are passed to the function on the stack.
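The mapping in the list above can be sketched as a small lookup function in C. This is only an illustration of the rule as stated here; the name arg_location is hypothetical, and the sketch assumes every argument is an integer or a pointer (floating-point arguments travel in a separate set of registers that this tutorial does not cover).

```c
#include <stddef.h>

/* Argument registers in the order given in the list above. */
static const char *const kArgumentRegisters[] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9"
};

/*
 * Hypothetical helper: reports where the argument at the given 1-based
 * position is found when an x64 function starts. Assumes integer or
 * pointer arguments only.
 * Returns a register name for positions 1 to 6, "stack" for position 7
 * and beyond, and NULL for an invalid position.
 */
const char *arg_location(int position) {
    if (position < 1) {
        return NULL;
    }
    if (position <= 6) {
        return kArgumentRegisters[position - 1];
    }
    return "stack";
}
```

For the NSLog call above, arg_location(1) is the register holding the format string and arg_location(4) is the register holding the last string argument.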

Returning to the Objective-C example above, you can picture the registers being passed to the function like the following pseudo-code:

RDI = @"Hello world, I am %@. I'm %d, and I live in %@.";
RSI = @"Zoltan";
RDX = 30;
RCX = @"my father's basement";
NSLog(RDI, RSI, RDX, RCX);

When the NSLog function starts, these registers contain the appropriate values, as shown above.

However, once the function prologue (the beginning of a function, which prepares the stack and registers) has executed, the values in these registers are likely to change. Typically, the generated assembly overwrites these values, or simply discards the references, when the code no longer needs them.

This means that once you move past the start of a function (by stepping over, stepping in, or stepping out), you can no longer assume that these registers hold the values you want to observe, unless you actually read the assembly to see what it is doing.

This calling convention heavily influences your debugging (and breakpoint) strategy. If you want to automate any kind of breaking and exploring, you should stop at the very start of a function call, so that you can inspect or modify the parameters without having to dig into the assembly.

Objective-C and registers

Registers follow a specific calling convention. You can take that same knowledge and apply it to other languages.

When Objective-C executes a method, a special C function named objc_msgSend is called. There are actually several variants of this function, which will be covered later. It is the heart of message dispatch. The first argument of objc_msgSend is a reference to the object the message is being sent to. It is followed by the selector, which is simply a char * that specifies the name of the method to call on the object. Finally, objc_msgSend takes a variable number of arguments, if the selector specifies that the method has parameters.
Let's look at a real-world example in an iOS context:

[UIApplication sharedApplication];

The compiler will turn the code into the following pseudo-code:

id UIApplicationClass = [UIApplication class];
objc_msgSend(UIApplicationClass, "sharedApplication");

The first parameter is a reference to the UIApplication class, followed by the sharedApplication selector.

An easy way to tell whether there are any parameters is to check the selector for colons. Each colon represents one parameter that follows it.
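The colon rule is simple enough to express in a few lines of C. The following is a minimal sketch; selector_argument_count is a hypothetical helper written for this illustration and is not a function of the Objective-C runtime.

```c
#include <stddef.h>

/*
 * Hypothetical helper: counts the parameters a selector takes by
 * counting its colons. Returns 0 for a NULL selector.
 */
size_t selector_argument_count(const char *selector) {
    size_t count = 0;
    if (selector == NULL) {
        return 0;
    }
    for (const char *p = selector; *p != '\0'; ++p) {
        if (*p == ':') {
            ++count;
        }
    }
    return count;
}
```

With this helper, "sharedApplication" yields zero parameters, while a selector such as "stringByAppendingString:" yields one.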

Here is another Objective-C example:

NSString *helloWorldString = [@"Can't Sleep; " stringByAppendingString:@"Clowns will eat me"];

The compiler will turn this into the following pseudo-code:

NSString *helloWorldString;
helloWorldString = objc_msgSend(@"Can't Sleep; ", "stringByAppendingString:", @"Clowns will eat me");

The first argument is an NSString instance (@"Can't Sleep; "), followed by the selector, and finally a parameter that is also an NSString instance.
With this knowledge of objc_msgSend, you can use the x64 registers to help explore content, which you will do shortly.
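To get a feel for the shape of such a call (receiver first, selector second, then any extra parameters), here is a toy dispatcher in plain C. It is only a sketch under heavy assumptions: ToyString, toy_msg_send, and the two toy methods are hypothetical names invented for this illustration, the selector lookup is a simple string comparison, and none of it reflects how the real objc_msgSend is implemented.

```c
#include <stdarg.h>
#include <string.h>

/* Toy "object": it only wraps a C string. */
typedef struct {
    const char *text;
} ToyString;

/* Every toy method gets the receiver, the selector, and the extra arguments. */
typedef long (*ToyMethod)(ToyString *self, const char *selector, va_list args);

/* Toy method for the selector "length" (no colon, so no extra arguments). */
static long toy_length(ToyString *self, const char *selector, va_list args) {
    (void)selector;
    (void)args;
    return (long)strlen(self->text);
}

/* Toy method for the selector "hasPrefix:" (one colon, so one extra argument). */
static long toy_has_prefix(ToyString *self, const char *selector, va_list args) {
    (void)selector;
    const char *prefix = va_arg(args, const char *);
    return strncmp(self->text, prefix, strlen(prefix)) == 0;
}

/* Selector name to implementation table for the toy object. */
static const struct {
    const char *selector;
    ToyMethod method;
} kToyMethods[] = {
    { "length", toy_length },
    { "hasPrefix:", toy_has_prefix },
};

/*
 * Toy dispatcher: first argument is the receiver, second is the selector,
 * the rest are the method's parameters. Returns -1 for an unknown selector.
 */
long toy_msg_send(ToyString *receiver, const char *selector, ...) {
    long result = -1;
    va_list args;
    va_start(args, selector);
    for (size_t i = 0; i < sizeof kToyMethods / sizeof kToyMethods[0]; ++i) {
        if (strcmp(kToyMethods[i].selector, selector) == 0) {
            result = kToyMethods[i].method(receiver, selector, args);
            break;
        }
    }
    va_end(args);
    return result;
}
```

Under the x64 convention described earlier, a call such as toy_msg_send(&name, "hasPrefix:", "Zol") would place the receiver in RDI, the selector in RSI, and the prefix in RDX, which is the same layout you will inspect for real Objective-C calls.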

Putting theory to practice

You can download the tutorial project here

In this chapter, you will use a project from the tutorial resource bundle called Registers. Open the project in Xcode and run it.

This is a fairly simple application that merely displays the contents of some x64 registers. It is important to note that this application cannot display the values of the registers at any given moment; it can only display their values during a specific function call. This means that you will not see many changes in the displayed values, because the registers are likely to hold the same (or similar) values each time the function that captures them is called.

Now that you understand what the Registers macOS application does, create a symbolic breakpoint for NSViewController's viewDidLoad method. Remember to use "NS" instead of "UI", because you are working on a Cocoa application.

Build and rerun the application. Once the debugger stops at the breakpoint, enter the following in the LLDB console:

register read

This lists all of the main registers at the paused state of execution. However, this is too much information. You should selectively print out registers and treat them as Objective-C objects instead.

If you recall, -[NSViewController viewDidLoad] will be translated into the following assembly pseudo-code:

RDI = NSViewControllerInstance
RSI = "viewDidLoad"
objc_msgSend(RDI, RSI)

With the x64 calling convention in mind, and knowing how objc_msgSend works, you can find the specific NSViewController instance that is being loaded.

In the LLDB console, enter:

po $rdi

You will get output similar to the following:

<Registers.ViewController: 0x6080000c13b0>

This prints the NSViewController reference held in the RDI register which, as you now know, is where the first parameter of the function is stored.

In LLDB, it is important to prefix registers with the $ character, so that LLDB knows you want the value of a register and not a variable in the current source scope. Yes, this is different from the assembly you see in the disassembly view! Annoying, isn't it?

Note: When you stop in an Objective-C method, you will never see objc_msgSend in the LLDB backtrace. This is because the objc_msgSend family of functions performs a jmp, the jump opcode, in assembly. This means that objc_msgSend acts as a trampoline function, and once the Objective-C code starts executing, all stack history related to objc_msgSend is gone. This optimization is known as tail call optimization.

Try printing the RSI register, which should contain the selector that was called. Enter the following in the LLDB console:

po $rsi

Unfortunately, you get meaningless output that looks something like this:

140735181830794

Why is that so?

An Objective-C selector is essentially just a char *. This means that, as with all C types, LLDB does not know how to format the data. As a result, you must explicitly cast the reference to the data type you want.
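The same effect can be reproduced in plain C: the bits in a register carry no type, so the same value reads as a large number or as text depending on how you interpret it. The sketch below mirrors the idea only, not LLDB's implementation; RegisterValue, register_from_pointer, describe_untyped, and describe_as_c_string are hypothetical names made up for this illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a register: 64 bits with no type information attached. */
typedef uint64_t RegisterValue;

/* Stores a pointer in the stand-in register, as a function call would. */
RegisterValue register_from_pointer(const void *pointer) {
    return (RegisterValue)(uintptr_t)pointer;
}

/*
 * Without type information, all you can do is print the raw number.
 * Writes the decimal value into out and returns snprintf's result.
 */
int describe_untyped(RegisterValue value, char *out, size_t out_size) {
    return snprintf(out, out_size, "%" PRIu64, value);
}

/* With an explicit cast to char *, the same bits read as a C string. */
const char *describe_as_c_string(RegisterValue value) {
    return (const char *)(uintptr_t)value;
}
```

If the stand-in register holds the address of the string "viewDidLoad", describe_untyped produces a long decimal number much like the one above, while describe_as_c_string gives back the readable selector name.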

Try to convert to the correct type:
