When we build an application in Xcode, the source files (.m and .h files) are converted into an executable file. This executable contains the byte code that the CPU will run: the ARM processor on an iOS device, or the Intel processor on a Mac.
This article describes what the compiler does during that process and looks at what is inside the executable file. There is more to it than first meets the eye.
Here we put Xcode aside and use the command-line tools. When we build in Xcode, Xcode simply invokes a series of tools. Florian discusses how these tool invocations work in more detail. In this article we will call the tools directly and see what they do.
Hopefully this article will help you better understand how an executable file (a so-called Mach-O executable) on iOS or OS X is put together and how it is executed.
xcrun
Let's start with some basics: we will make heavy use of a command-line tool called xcrun. It may seem a bit strange, but it is very handy. This small tool is used to invoke other tools. Previously, we would run the following command in the terminal:
% clang -v
Now we use the following command instead:
% xcrun clang -v
What xcrun does here is locate clang and execute it, passing along the arguments that follow clang.
Why should we do that? It may seem pointless. But xcrun allows us to: (1) have multiple versions of Xcode installed and use the tools from a particular Xcode version, and (2) use the tools for a particular SDK (software development kit). If you have both Xcode 4.5 and Xcode 5, xcode-select and xcrun let you choose to use the iOS SDK tools from Xcode 5 or the OS X tools from Xcode 4.5. On many other platforms, this is not possible. See the man pages for xcrun and xcode-select for more information. You can also use the developer tools from the command line without installing the Command Line Tools.
Hello World Without an IDE
Back in Terminal, create a folder that contains a C file:
% mkdir ~/Desktop/objcio-command-line
% cd !$
% touch helloworld.c
Then edit the file in your favorite text editor, for example TextEdit.app:
% open -e helloworld.c
Enter the following code:
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
Save and return to the terminal, and then run the following command:
% xcrun clang helloworld.c
% ./a.out
Now you can see the familiar Hello world! in the terminal. We compiled and ran a C program without using an IDE at any point. Take a deep breath and rejoice.
What did we just do? We compiled helloworld.c into a Mach-O binary called a.out. Note that if we do not specify a name, the compiler names the output a.out by default.
How is this binary generated? There is a lot to look at and understand. Let's look at the compiler first.
Hello World and the Compiler
The compiler of choice in Xcode is currently clang (pronounced /klæŋ/). Chris has written a more detailed article about the compiler.
Simply put, the compiler takes helloworld.c as its input file and produces the executable a.out. This process consists of multiple steps or stages, and the command above ran all of them in sequence:
Preprocessing
    - Tokenization
    - Macro expansion
    - #include expansion
Parsing and semantic analysis
    - Translates the tokens into a parse tree
    - Applies semantic analysis to the parse tree
    - Outputs an abstract syntax tree (AST)
Code generation and optimization
    - Translates the AST into low-level intermediate code (LLVM IR)
    - Optimizes the generated intermediate code
    - Generates target-specific code
    - Outputs assembly code
Assembler
    - Translates the assembly code into a target object file
Linker
    - Merges multiple object files into one executable file (or a dynamic library)
Let's look at a simple example of each of these steps.
Preprocessing
The first thing the compiler does when compiling is preprocess the file. If we stop the compilation after preprocessing, we can have the compiler show us the preprocessed result:
% xcrun clang -E helloworld.c
Whoa. That command outputs 413 lines. Let's open the output in an editor to see what is going on:
% xcrun clang -E helloworld.c | open -f
At the top you will see many lines that start with a # (pronounced "hash"). These are called linemarkers, and they tell us which file the lines that follow come from. If you look back at helloworld.c, the first line is:
#include <stdio.h>
We've all used #include and #import. What they do is tell the preprocessor to insert the contents of the file stdio.h at the location of the #include statement. This is a recursive process: stdio.h may itself include other files.
Because of all this recursive insertion, the compiler needs to keep track of the original line numbers. To make this work, the preprocessor inserts a linemarker wherever the origin of the text changes. The number following the # is the line number, followed by the name of the file. The numbers at the very end of the line are flags indicating the start of a new file (1), a return to a file (2), that the following text comes from a system header (3), or that the file is to be treated as wrapped in an extern "C" block (4).
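The standard #line directive serves the same purpose as these linemarkers: it tells the compiler which file and line the following text should be attributed to. The sketch below is only an illustration; the function names and the file name "pretend.h" are made up for this example and do not appear in helloworld.c.

```c
/* Hypothetical helpers, not part of the article's helloworld.c. */

/* Returns the line number the compiler has been told this code is on. */
int reported_line(void) {
#line 100 "pretend.h"
    return __LINE__;   /* the line right after the directive counts as line 100 */
}

/* Returns the file name the compiler now attributes this code to. */
const char *reported_file(void) {
    return __FILE__;   /* "pretend.h", because of the #line directive above */
}
```

Calling reported_line() yields 100 and reported_file() yields "pretend.h", regardless of where the code really sits in the source file. Diagnostics and debug information are remapped in the same way, which is how an error in an included header can still be reported with the header's own file name and line number.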
If you scroll to the end of the output, you will find our helloworld.c code:
# 2 "helloworld.c" 2
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
In Xcode, you can view the preprocessed output of any file via Product -> Perform Action -> Preprocess. Note that it can take a while for the editor to load the preprocessed file, which may be close to 100,000 lines long.
Compilation
Next up: parsing and code generation. We can have clang output the resulting assembly code with the following command:
% xcrun clang -S -o - helloworld.c | open -f
Let's take a look at the output. First, you will notice that some lines start with a dot (.). These are assembler directives. The other lines are actual x86_64 assembly code. Finally, there are labels, which are similar to labels in C.
Let's look at the first three lines first:
    .section    __TEXT,__text,regular,pure_instructions
    .globl  _main
    .align  4, 0x90
These three lines are assembler directives, not assembly code. The .section directive specifies which section the output that follows goes into.
The .globl directive on the second line specifies that _main is an external symbol. This is our main() function. It has to be visible outside the binary because the system calls it to run the executable.
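To make the idea of an external symbol more concrete, here is a small sketch in C. The function names are invented for this illustration. A function with external linkage is the kind of symbol that gets a .globl directive, while a static function has internal linkage and typically gets none (with optimization enabled it may even be inlined away entirely).

```c
/* Hypothetical functions used only to contrast linkage. */

/* External linkage: other object files can refer to this symbol,
   so the assembly output would mark it with .globl. */
int visible_outside(void) {
    return 1;
}

/* Internal linkage: only code in this file can call it. */
static int file_private(void) {
    return 2;
}

/* External wrapper, so the private function is still reachable indirectly. */
int call_private(void) {
    return file_private();
}
```

Running xcrun clang -S on a file like this and searching for .globl is one way to check which symbols end up visible to the linker.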
The .align directive specifies the alignment of the code that follows. In our case, the following code is aligned to 16 (2^4) bytes and padded with 0x90 if needed.
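The arithmetic behind .align 4 can be sketched in a few lines of C. This is an illustration only: align_up and padding_needed are hypothetical helper names, and the addresses are made up. On x86, the byte 0x90 encodes a NOP instruction, which is why it is a safe filler inside code.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^power.
   Hypothetical helper; .align 4 corresponds to power == 4, i.e. 16 bytes. */
uint64_t align_up(uint64_t addr, unsigned power) {
    assert(power < 64);
    uint64_t alignment = (uint64_t)1 << power;
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Number of filler bytes (0x90 in the listing) needed to reach that boundary. */
uint64_t padding_needed(uint64_t addr, unsigned power) {
    return align_up(addr, power) - addr;
}
```

For example, code that would otherwise start at address 0x1003 is moved to 0x1010, with 13 filler bytes in between, while code already at 0x1010 stays where it is.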
Next comes the preamble of the main function:
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    subq    $32, %rsp
This part contains some labels, which work much like labels in C. They are symbolic references to particular parts of the assembly code. The first one, _main, is the address where our function actually starts. This is also the symbol that is exported, so the binary will contain a reference to this position.
The .cfi_startproc directive is used at the beginning of most functions. CFI is short for Call Frame Information. A call frame corresponds loosely to a function. When you use a debugger and step in or step out, you are actually stepping in and out of call frames. In C code, functions have their own call frames, but other things can have them too. The .cfi_startproc directive gives the function an entry in .eh_frame, which contains unwind information (this is also how the call frame stack is unwound when an exception is thrown). The directive also emits platform-specific instructions for CFI. It is matched by a corresponding .cfi_endproc further down in the output, which marks the end of our main() function.
Next is another label, ## BB#0:. Then, finally, we reach the first assembly instruction: pushq %rbp. This is where things get interesting. On OS X we have x86_64 code, and for this architecture there is a so-called application binary interface (ABI) that specifies how function calls work at the assembly level. Part of this ABI specifies that the rbp register (the base pointer register) must be preserved across function calls. It is the main function's responsibility to make sure that rbp holds the same value when the function returns as it did before. pushq %rbp pushes the value of rbp onto the stack so that we can pop it later.
Next come two more CFI directives: .cfi_def_cfa_offset 16 and .cfi_offset %rbp, -16. These output information used for call frame unwinding and debugging. We are changing the stack and the base pointer, and these two directives record where things are, or more precisely, they make sure that the information a debugger needs later is emitted so that it can find its way.
Next, movq %rsp, %rbp lets us put local variables on the stack. subq $32, %rsp moves the stack pointer by 32 bytes, which the function can then use. We first store the old stack pointer in rbp and use it as the base address for our local variables, and then we update the stack pointer past the part we will be using.
After that comes the call to printf():
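The effect of these three instructions on the two registers can be traced with a small model in C. This is a sketch under simplifying assumptions: Regs and run_prologue are hypothetical names, the starting addresses are made up, and the model only tracks the register values, not the memory that pushq actually writes to.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved in the prologue. */
typedef struct {
    uint64_t rsp;   /* stack pointer */
    uint64_t rbp;   /* base pointer */
} Regs;

/* Applies the register effects of the prologue shown in the listing. */
Regs run_prologue(Regs r) {
    r.rsp -= 8;      /* pushq %rbp: the stack grows down by one 8-byte slot */
    r.rbp = r.rsp;   /* movq %rsp, %rbp: rbp becomes the base of this frame */
    r.rsp -= 32;     /* subq $32, %rsp: reserve 32 bytes below the base */
    return r;
}
```

Starting from rsp = 0x7000, the model ends with rbp = 0x6ff8 and rsp = 0x6fd8. The 32 bytes between those two addresses are the space the function can use, addressed as negative offsets from rbp, such as -4(%rbp) or -16(%rbp).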
    leaq    L_.str(%rip), %rax
    movl    $, -4(%rbp)
    movl    %edi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    movq    %rax, %rdi
    movb    $, %al
    callq   _printf
First, LEAQ will be