OS X Application Format explained
How OS X Executes Applications
Translator: 51test2003 translated from http://0xfe.blogspot.com/2006/03 ... s-applications.html
As a long-time UNIX user, I keep a set of tools on hand for troubleshooting system problems. Recently, I was adding support for Apple's OS X to software I was developing; however, unlike other traditional UNIX variants, OS X does not support many of the tools related to loading, linking, and executing programs.
For example, when a shared library relocation error occurs, the first thing I do is run ldd on the executable. The ldd tool lists the shared libraries (including their paths) that the executable depends on. But on OS X, trying to run ldd results in an error.
evil:~ mohit$ ldd /bin/ls
-bash: ldd: command not found
Not found? But it exists on basically every UNIX. I wonder whether objdump is available.
$ objdump -x /bin/ls
-bash: objdump: command not found
Command not found. What's going on?
The problem is that unlike Linux, Solaris, HP-UX, and many other UNIX variants, OS X does not use ELF binaries. In addition, OS X is not part of the GNU project, which is home to tools like ldd and objdump.
To get the list of shared libraries that an executable depends on in OS X, you need to use the otool tool.
evil:~ mohit$ otool /bin/ls
otool: one of -fahlLtdoOrTMRIHScis must be specified
Usage: otool [-fahlLDtdorSTMRIHvVcXm] object_file ...
        -f print the fat headers
        -a print the archive header
        -h print the mach header
        -l print the load commands
        -L print shared libraries used
        -D print shared library id name
        -t print the text section (disassemble with -v)
        -p <routine name>  start dissassemble from routine name
        -s <segname> <sectname> print contents of section
        -d print the data section
        -o print the Objective-C segment
        -r print the relocation entries
        -S print the table of contents of a library
        -T print the table of contents of a dynamic shared library
        -M print the module table of a dynamic shared library
        -R print the reference table of a dynamic shared library
        -I print the indirect symbol table
        -H print the two-level hints table
        -v print verbosely (symbolicly) when possible
        -V print disassembled operands symbolicly
        -c print argument strings of a core file
        -X print no leading addresses or headers
        -m don't use archive(member) syntax
evil:~ mohit$ otool -L /bin/ls
/bin/ls:
        /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
Much better. We can see that /bin/ls references two dynamic libraries, although their file extensions look quite unfamiliar.
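If you want something closer to ldd in a script, the output of otool -L is easy to pick apart. The following is only a minimal sketch: the function name parse_otool_L is my own, the sample text is copied from the listing above, and paths containing spaces are not handled.

```python
import re

# Matches dependency lines such as:
#   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
_DEP_LINE = re.compile(
    r"^\s*(?P<path>\S+) \(compatibility version (?P<compat>[\d.]+), "
    r"current version (?P<current>[\d.]+)\)\s*$"
)

def parse_otool_L(output):
    """Return a list of (path, compatibility_version, current_version) tuples."""
    deps = []
    for line in output.splitlines():
        match = _DEP_LINE.match(line)
        if match:  # the first line ("/bin/ls:") does not match and is skipped
            deps.append((match["path"], match["compat"], match["current"]))
    return deps

# Sample text taken from the otool -L listing for /bin/ls.
SAMPLE = """/bin/ls:
\t/usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
"""

for path, compat, current in parse_otool_L(SAMPLE):
    print(path, compat, current)
```

On a real system you would feed it the captured output of the otool command instead of the sample string.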
I'm sure many UNIX and Linux users have had similar experiences with OS X, so I decided to write down the little I currently know about OS X executables.
The OS X Runtime Architecture
A runtime environment is a framework for code execution on OS X. It consists of a set of conventions that define how code is loaded, managed, and executed. When an application is launched, the appropriate runtime environment loads the program into memory, resolves its references to external libraries, and prepares the code for execution.
OS X supports three runtime environments:
dyld runtime environment: The recommended environment, based on the dyld library manager.
CFM runtime environment: The OS 9 legacy environment. It is intended for applications that want to use the new features of OS X but have not yet been fully ported to dyld.
The Classic environment: OS 9 (9.1 or 9.2) programs run directly on OS X without modification.
This article focuses on the dyld runtime environment.
The Mach-O Executable File Format
In OS X, almost all files that contain executable code, such as applications, frameworks, libraries, and kernel extensions, are implemented as Mach-O files. Mach-O is a file format and an ABI (Application Binary Interface) that describes how an executable is to be loaded and run by the kernel. More specifically, it tells the system:
Which dynamic library loader to use.
Which shared libraries to load.
How to organize the process address space.
The address of the function entry point, and so on.
Mach-O is not new. It was originally designed by the Open Software Foundation (OSF) for its Mach-microkernel-based OSF/1 operating system, and was later ported to x86 systems with OpenStep.
To support the dyld runtime environment, all files must be compiled into the Mach-O executable file format.
Organization of the Mach-O file
A Mach-O file is divided into three regions: the header, the load commands, and the raw segment data. The header and load commands describe the file's features, layout, and other characteristics; the raw segment data contains the sequences of bytes referenced by the load commands. To investigate and examine the parts of a Mach-O file, OS X comes with a very useful program, otool, which is located in /usr/bin.
Next, we'll use otool to see how a Mach-O file is organized.
The Header
To view the Mach-O header of a file, use the -h option of the otool command.
evil:~ mohit$ otool -h /bin/ls
/bin/ls:
Mach header
      magic cputype cpusubtype filetype ncmds sizeofcmds      flags
 0xfeedface      18          0        2    11       1608 0x00000085
The header first specifies the magic number. The magic number identifies the file as either a 32-bit or a 64-bit Mach-O file, and also indicates the byte order of the CPU it was built for. For the meaning of each magic value, see /usr/include/mach-o/loader.h.
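To make the role of the magic number concrete, here is a small sketch that classifies the first four bytes of a file. The two constants are the MH_MAGIC and MH_MAGIC_64 values from loader.h; the function name classify_magic is just something I made up for illustration.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O

def classify_magic(first4):
    """Return (bits, byte_order) for the first four bytes of a file, or None."""
    # Try reading the word in both byte orders; whichever order yields a
    # known magic value is the byte order the file was written in.
    for order, prefix in (("big", ">"), ("little", "<")):
        (value,) = struct.unpack(prefix + "I", first4)
        if value == MH_MAGIC:
            return 32, order
        if value == MH_MAGIC_64:
            return 64, order
    return None  # not a (thin) Mach-O file

print(classify_magic(bytes.fromhex("feedface")))  # the /bin/ls example above
print(classify_magic(b"\x7fELF"))                 # an ELF file is rejected
```

For the PowerPC /bin/ls shown above, the bytes on disk read fe ed fa ce, so the sketch reports a 32-bit, big-endian file.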
The header also specifies the target architecture of the file. This allows the kernel to ensure that the code is never run on a processor it was not written for. For example, in the output above, cputype is set to 18, which stands for CPU_TYPE_POWERPC, as defined in /usr/include/mach/machine.h.
From these two pieces of information, we can infer that this binary is intended for 32-bit PowerPC-based systems.
Sometimes a binary contains code for more than one architecture. Such files, commonly known as Universal Binaries, usually begin with an extra header called the fat_header. To examine the contents of the fat_header, use the -f option of the otool command.
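As a rough illustration of what otool -f reads, the sketch below parses a fat_header followed by its fat_arch entries. The layout (a big-endian magic of 0xcafebabe, a count, then five 32-bit fields per architecture) follows /usr/include/mach-o/fat.h as far as I know; the function name and the offsets and sizes in the sample are invented for the example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {7: "i386", 18: "ppc"}  # CPU_TYPE_I386, CPU_TYPE_POWERPC

def parse_fat(data):
    """List the architecture slices described by a fat_header."""
    # The fat header is always stored big-endian, whatever the host CPU.
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    slices = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i
        )
        slices.append({
            "arch": CPU_NAMES.get(cputype, str(cputype)),
            "offset": offset,
            "size": size,
            "align": 2 ** align,  # stored as a power of two
        })
    return slices

# Hypothetical two-architecture file: the offsets and sizes are made up.
sample = (
    struct.pack(">II", FAT_MAGIC, 2)
    + struct.pack(">iiIII", 18, 0, 4096, 32768, 12)
    + struct.pack(">iiIII", 7, 3, 36864, 30000, 12)
)
for entry in parse_fat(sample):
    print(entry)
```

Each slice found this way is itself a complete Mach-O file starting at the given offset.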
The cpusubtype field specifies the exact model of the CPU, and is usually set to CPU_SUBTYPE_POWERPC_ALL or CPU_SUBTYPE_I386_ALL.
The filetype field indicates how the file is laid out and how it is meant to be used. In practice, it tells you whether the file is a library, a static executable, a core file, and so on. The filetype above equals MH_EXECUTE, which denotes a demand paged executable file. The following fragment from /usr/include/mach-o/loader.h lists the different file types.
#define MH_OBJECT      0x1  /* relocatable object file */
#define MH_EXECUTE     0x2  /* demand paged executable file */
#define MH_FVMLIB      0x3  /* fixed VM shared library file */
#define MH_CORE        0x4  /* core file */
#define MH_PRELOAD     0x5  /* preloaded executable file */
#define MH_DYLIB       0x6  /* dynamically bound shared library */
#define MH_DYLINKER    0x7  /* dynamic link editor */
#define MH_BUNDLE      0x8  /* dynamically bound bundle file */
#define MH_DYLIB_STUB  0x9  /* shared library stub for static */
                            /* linking only, no section contents */
The next two fields describe the load command section, specifying the number of commands and their total size in bytes.
Finally come the flags, which the kernel may use when loading and executing the file.
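Putting the header fields together, here is a sketch that unpacks the seven 32-bit fields of a 32-bit mach_header and names the file type and flags. The sample bytes are built by hand from the values discussed in this article (cputype 18, 11 load commands, and so on) rather than read from a real file, and parse_mach_header is a name I chose; the flag names are the MH_* bits from loader.h.

```python
import struct

FILETYPES = {
    0x1: "MH_OBJECT", 0x2: "MH_EXECUTE", 0x3: "MH_FVMLIB", 0x4: "MH_CORE",
    0x5: "MH_PRELOAD", 0x6: "MH_DYLIB", 0x7: "MH_DYLINKER", 0x8: "MH_BUNDLE",
    0x9: "MH_DYLIB_STUB",
}
FLAG_NAMES = {
    0x1: "MH_NOUNDEFS", 0x2: "MH_INCRLINK", 0x4: "MH_DYLDLINK",
    0x8: "MH_BINDATLOAD", 0x10: "MH_PREBOUND", 0x20: "MH_SPLIT_SEGS",
    0x40: "MH_LAZY_INIT", 0x80: "MH_TWOLEVEL",
}
FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
          "ncmds", "sizeofcmds", "flags")

def parse_mach_header(data, byte_order=">"):
    """Unpack a 32-bit mach_header (7 fields, 28 bytes) into a dict."""
    header = dict(zip(FIELDS, struct.unpack_from(byte_order + "IiiIIII", data, 0)))
    header["filetype_name"] = FILETYPES.get(header["filetype"], "unknown")
    header["flag_names"] = [
        name for bit, name in sorted(FLAG_NAMES.items()) if header["flags"] & bit
    ]
    return header

# Hand-built header using the values from the /bin/ls example.
sample = struct.pack(">IiiIIII", 0xFEEDFACE, 18, 0, 2, 11, 1608, 0x85)
header = parse_mach_header(sample)
print(header["filetype_name"], header["ncmds"], header["flag_names"])
```

With flags equal to 0x85, the sketch reports MH_NOUNDEFS, MH_DYLDLINK, and MH_TWOLEVEL: the file has no undefined references of its own, is input for the dynamic linker, and uses two-level namespace bindings.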
Load Commands
The load command section contains a list of commands that tell the kernel how to load the various raw segments in the file. Typically, they describe how each segment, and each section within it, is aligned, protected, and laid out in memory.
To view the list of load commands in a file, use the -l option of the otool command.
evil:~/temp mohit$ otool -l /bin/ls
/bin/ls:
Load command 0
      cmd LC_SEGMENT
  cmdsize 56
  segname __PAGEZERO
   vmaddr 0x00000000
   vmsize 0x00001000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x4
Load command 1
      cmd LC_SEGMENT
  cmdsize 600
  segname __TEXT
   vmaddr 0x00001000
   vmsize 0x00006000
  fileoff 0
 filesize 24576
  maxprot 0x00000007
 initprot 0x00000005
   nsects 8
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x00001ac4
      size 0x000046e8
    offset 2756
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
[___snipped for brevity___]
Load command 4
          cmd LC_LOAD_DYLINKER
      cmdsize 28
         name /usr/lib/dyld (offset 12)
Load command 5
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libncurses.5.4.dylib (offset 24)
   time stamp 1111407638 Mon Mar 21 07:20:38 2005
      current version 5.4.0
compatibility version 5.4.0
Load command 6
          cmd LC_LOAD_DYLIB
      cmdsize 52
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 1111407267 Mon Mar 21 07:14:27 2005
      current version 88.0.0
compatibility version 1.0.0
Load command 7
     cmd LC_SYMTAB
 cmdsize 24
  symoff 28672
   nsyms 101
  stroff 31020
 strsize 1440
Load command 8
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 0
     iextdefsym 0
     nextdefsym 18
      iundefsym 18
      nundefsym 83
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 30216
  nindirectsyms 201
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0
Load command 9
     cmd LC_TWOLEVEL_HINTS
 cmdsize 16
  offset 29884
  nhints 83
Load command 10
        cmd LC_UNIXTHREAD
    cmdsize 176
     flavor PPC_THREAD_STATE
      count PPC_THREAD_STATE_COUNT
    r0  0x00000000 r1  0x00000000 r2  0x00000000 r3  0x00000000 r4  0x00000000
    r5  0x00000000 r6  0x00000000 r7  0x00000000 r8  0x00000000 r9  0x00000000
    r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x00000000 r14 0x00000000
    r15 0x00000000 r16 0x00000000 r17 0x00000000 r18 0x00000000 r19 0x00000000
    r20 0x00000000 r21 0x00000000 r22 0x00000000 r23 0x00000000 r24 0x00000000
    r25 0x00000000 r26 0x00000000 r27 0x00000000 r28 0x00000000 r29 0x00000000
    r30 0x00000000 r31 0x00000000 cr  0x00000000 xer 0x00000000 lr  0x00000000
    ctr 0x00000000 mq  0x00000000 vrsave 0x00000000 srr0 0x00001ac4 srr1 0x00000000
The file above has 11 load commands located directly after the header, numbered 0 to 10.
The first four commands (LC_SEGMENT), numbered 0 to 3, define how segments in the file are mapped into memory. A segment defines a range of bytes in a Mach-O binary and can contain zero or more sections. We'll talk more about segments later.
Load command 4 (LC_LOAD_DYLINKER) specifies which dynamic linker to use. This is almost always set to /usr/lib/dyld, the default OS X dynamic linker.
Commands 5 and 6 (LC_LOAD_DYLIB) specify the shared libraries that the file needs to link against. They are loaded by the dynamic linker specified in command 4.
Commands 7 and 8 (LC_SYMTAB, LC_DYSYMTAB) specify the symbol tables used by the file and by the dynamic linker, respectively. Command 9 (LC_TWOLEVEL_HINTS) contains the hint table for the two-level namespace. Finally, command 10 (LC_UNIXTHREAD) defines the initial state of the main thread of the process. This command is only included in executable files.
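Every load command starts with the same two 32-bit fields, cmd and cmdsize, which is what makes the list easy to walk. The sketch below steps through a hand-built buffer containing copies of commands 4 and 6 from the listing. The LC_* numbers are the ones I know from loader.h, the library versions are assumed to be packed as 16.8.8 bits, and all function names are my own.

```python
import struct

LC_NAMES = {
    0x1: "LC_SEGMENT", 0x2: "LC_SYMTAB", 0x5: "LC_UNIXTHREAD",
    0xB: "LC_DYSYMTAB", 0xC: "LC_LOAD_DYLIB", 0xE: "LC_LOAD_DYLINKER",
    0x16: "LC_TWOLEVEL_HINTS",
}

def version_string(packed):
    """Decode an X.Y.Z version packed as 16.8.8 bits."""
    return "%d.%d.%d" % (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)

def c_string(buf, start):
    """Read a NUL-terminated ASCII string starting at `start`."""
    return buf[start:buf.index(b"\x00", start)].decode("ascii")

def walk_load_commands(data, ncmds, byte_order=">"):
    """Yield one dict per load command found in `data`."""
    offset = 0
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(byte_order + "II", data, offset)
        body = data[offset:offset + cmdsize]
        info = {"cmd": LC_NAMES.get(cmd, hex(cmd)), "cmdsize": cmdsize}
        if cmd == 0xE:    # dylinker_command: name offset follows cmdsize
            (name_off,) = struct.unpack_from(byte_order + "I", body, 8)
            info["name"] = c_string(body, name_off)
        elif cmd == 0xC:  # dylib_command: name offset, timestamp, versions
            name_off, _stamp, current, compat = struct.unpack_from(
                byte_order + "4I", body, 8
            )
            info["name"] = c_string(body, name_off)
            info["current"] = version_string(current)
            info["compatibility"] = version_string(compat)
        yield info
        offset += cmdsize  # cmdsize includes padding, so this lands on the next command

# Hand-built copies of load commands 4 and 6, padded with NULs to cmdsize.
dylinker = (struct.pack(">III", 0xE, 28, 12) + b"/usr/lib/dyld").ljust(28, b"\x00")
dylib = (
    struct.pack(">6I", 0xC, 52, 24, 1111407267, 88 << 16, 1 << 16)
    + b"/usr/lib/libSystem.B.dylib"
).ljust(52, b"\x00")

for info in walk_load_commands(dylinker + dylib, 2):
    print(info)
```

Notice that the "(offset 12)" and "(offset 24)" annotations in the otool output are exactly the name offsets that the sketch reads from each command.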
Segments and Sections
Most of the load commands described above refer to segments in the file. A segment is a range of bytes in a Mach-O file that the kernel and the dynamic linker map directly into virtual memory. The header and load commands are considered part of the first segment of the file. A typical OS X executable usually has the following five segments:
__PAGEZERO: Located at virtual address 0 and has no protection rights. This segment takes up no space in the file, and causes any access to NULL to crash immediately.
__TEXT: Contains read-only data and executable code.
__DATA: Contains writable data. Its sections are typically marked copy-on-write by the kernel.
__OBJC: Contains data used by the Objective-C language runtime.
__LINKEDIT: Contains raw data used by the dynamic linker.
The __TEXT and __DATA segments may contain zero or more sections. Each section consists of data of a particular type, such as executable code, constants, C strings, and so on.
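The protection values in the LC_SEGMENT commands are easier to read once decoded. Here is a sketch that unpacks a 32-bit segment_command and renders maxprot and initprot as rwx strings, assuming the usual VM_PROT_READ, VM_PROT_WRITE, and VM_PROT_EXECUTE bits (1, 2, and 4). The sample commands are rebuilt by hand from the __PAGEZERO and __TEXT entries in the otool listing; the function names are mine.

```python
import struct

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT_FORMAT = ">II16sIIIIiiII"

def prot_string(prot):
    """Render VM protection bits (read=1, write=2, execute=4) as 'rwx'."""
    return "".join(
        letter if prot & bit else "-"
        for bit, letter in ((1, "r"), (2, "w"), (4, "x"))
    )

def parse_segment(data):
    """Unpack one 32-bit segment_command (without its section headers)."""
    (_cmd, cmdsize, segname, vmaddr, vmsize, _fileoff, filesize,
     maxprot, initprot, nsects, _flags) = struct.unpack_from(SEGMENT_FORMAT, data, 0)
    return {
        "segname": segname.rstrip(b"\x00").decode("ascii"),
        "vmaddr": vmaddr,
        "vmsize": vmsize,
        "filesize": filesize,
        "maxprot": prot_string(maxprot),
        "initprot": prot_string(initprot),
        "nsects": nsects,
        "cmdsize": cmdsize,
    }

# Rebuilt from load commands 0 and 1 in the listing.
pagezero = struct.pack(SEGMENT_FORMAT, 0x1, 56, b"__PAGEZERO",
                       0x0, 0x1000, 0, 0, 0, 0, 0, 0x4)
text = struct.pack(SEGMENT_FORMAT, 0x1, 600, b"__TEXT",
                   0x1000, 0x6000, 0, 24576, 7, 5, 8, 0x0)

for raw in (pagezero, text):
    seg = parse_segment(raw)
    print(seg["segname"], seg["initprot"], seg["maxprot"], seg["nsects"])
```

__PAGEZERO comes out as "---", which is why touching address 0 faults, while __TEXT starts out as "r-x". The cmdsize of 600 for __TEXT is also consistent with this layout: 56 bytes for the segment command plus eight section headers of 68 bytes each.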
To view the contents of a section, use the -s option of the otool command.
evil:~/temp mohit$ otool -sv __TEXT __cstring /bin/ls
/bin/ls:
Contents of (__TEXT,__cstring) section
00006320 00000000 5f5f6479 6c645f6d 6f645f74
00006330 65726d5f 66756e63 73000000 5f5f6479
00006340 6c645f6d 616b655f 64656c61 7965645f
00006350 6d6f6475 6c655f69 6e697469 616c697a
__snip__
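The hex dump above is more readable once converted back into text. This short sketch (the helper name decode_dump is mine) strips the leading address from each line, joins the hex words, and splits the result on NUL bytes. It assumes the words are printed in file byte order, as they are for this big-endian PowerPC binary.

```python
def decode_dump(lines):
    """Turn otool -s style hex lines into a list of C strings."""
    raw = bytearray()
    for line in lines:
        _address, *words = line.split()  # first column is the address
        for word in words:
            raw += bytes.fromhex(word)
    # C strings are NUL-terminated; drop the empty pieces between terminators.
    return [piece.decode("ascii") for piece in raw.split(b"\x00") if piece]

DUMP = [
    "00006320 00000000 5f5f6479 6c645f6d 6f645f74",
    "00006330 65726d5f 66756e63 73000000 5f5f6479",
    "00006340 6c645f6d 616b655f 64656c61 7965645f",
    "00006350 6d6f6475 6c655f69 6e697469 616c697a",
]
print(decode_dump(DUMP))
```

The last string is cut short only because the dump itself was snipped.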
To disassemble the __text section, use the -tv options.
evil:~/temp mohit$ otool -tv /bin/ls
/bin/ls:
(__TEXT,__text) section
00001ac4        or      r26,r1,r1
00001ac8        addi    r1,r1,0xfffc
00001acc        rlwinm  r1,r1,0,0,26
00001ad0        li      r0,0x0
00001ad4        stw     r0,0x0(r1)
00001ad8        stwu    r1,0xffc0(r1)
00001adc        lwz     r3,0x0(r26)
00001ae0        addi    r4,r26,0x4
__snip__
In the __TEXT segment, there are four main sections:
__text: Compiled machine code.
__const: General-purpose constant data.
__cstring: Literal string constants.
__picsymbol_stub: Position-independent code stub routines used by the dynamic linker.
This keeps executable and non-executable code clearly separated within the segment.
Running an Application
Now that you know what a Mach-O file looks like, let's see how OS X loads and runs an application. When you run an application, the shell first calls the fork() system call. fork() creates a logical copy of the calling process (the shell) and schedules it for execution. The child process then calls the execve() system call, passing it the path of the program to execute.
The kernel loads the specified file and examines its header to verify that it is a valid Mach-O file. It then begins interpreting the load commands, replacing the child process's address space with the segments from the file. At the same time, the kernel executes the dynamic linker specified by the binary, which proceeds to load and link all the dependent libraries. Once it has bound just enough symbols for the program to run, it calls the entry-point function.
At build time, the entry-point function is usually statically linked into the application from /usr/lib/crt1.o. This function initializes the kernel environment and calls the main() function of the executable.
The application is now running.
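The fork-then-exec sequence described above is easy to try from a script. This is only a sketch of what a shell does, with a helper name (spawn) of my own choosing; everything the kernel and dyld do during execve() happens behind that single call. It should work on any UNIX-like system.

```python
import os
import sys

def spawn(path, argv):
    """Fork, exec `path` in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        try:
            os.execve(path, argv, os.environ)
        finally:
            # Only reached if execve() failed; leave without running
            # the parent's cleanup handlers.
            os._exit(127)
    # Parent: wait for the child, much as a shell waits for a foreground job.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

# Run a second Python interpreter that exits with status 7.
code = spawn(sys.executable, [sys.executable, "-c", "import sys; sys.exit(7)"])
print("child exited with status", code)
```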
Dynamic linker
The OS X dynamic linker, /usr/lib/dyld, is responsible for loading dependent shared libraries, importing the various symbols and functions, and binding them into the current process. When the process first runs, all the linker does is import the shared libraries into the address space of the process. Depending on how the program was built, the actual binding is performed at different times:
Load-time binding: immediately after loading.
Just-in-time binding: when a symbol is referenced.
Pre-binding.
If no binding type is specified, just-in-time binding is used.
An application can continue to run only when all the required symbols and segments from the various object files have been resolved. To find libraries and frameworks, the standard dynamic linker, /usr/lib/dyld, searches a predefined set of directories. To override these directories, or to provide fallback paths, you can set the DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH environment variable.
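To give a feel for how these two variables interact, here is a deliberately simplified model of the search, not dyld's actual algorithm. It assumes that directories in DYLD_LIBRARY_PATH are tried first using only the library's file name, then the path recorded in the binary, then the fallback directories, and that the default fallback list is $HOME/lib, /usr/local/lib, /lib, and /usr/lib as I understand it from the dyld man page. The function names and the libfoo.dylib library are hypothetical.

```python
import os

def candidate_paths(install_name, env):
    """Return the paths a simplified dyld-like search would try, in order."""
    leaf = os.path.basename(install_name)
    candidates = []
    # 1. Directories from DYLD_LIBRARY_PATH take precedence.
    for directory in filter(None, env.get("DYLD_LIBRARY_PATH", "").split(":")):
        candidates.append(os.path.join(directory, leaf))
    # 2. The path recorded in the LC_LOAD_DYLIB command.
    candidates.append(install_name)
    # 3. Fallback directories (assumed default list if the variable is unset).
    fallback = env.get("DYLD_FALLBACK_LIBRARY_PATH")
    if fallback:
        directories = [d for d in fallback.split(":") if d]
    else:
        directories = [os.path.join(env.get("HOME", ""), "lib"),
                       "/usr/local/lib", "/lib", "/usr/lib"]
    candidates.extend(os.path.join(d, leaf) for d in directories)
    return candidates

def resolve(install_name, env, exists=os.path.exists):
    """Return the first candidate that exists, or None."""
    for path in candidate_paths(install_name, env):
        if exists(path):
            return path
    return None

# Hypothetical environment and library name, for illustration only.
env = {"DYLD_LIBRARY_PATH": "/opt/a:/opt/b", "HOME": "/Users/mohit"}
for path in candidate_paths("/usr/lib/libfoo.dylib", env):
    print(path)
```

Passing a custom exists function to resolve makes it possible to experiment with the search order without touching the file system.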