As a long-time UNIX user, I have a set of tools I usually reach for when troubleshooting system problems. Recently I have been developing software and adding support for Apple's OSX. However, unlike other traditional UNIX variants, OSX lacks many of the familiar tools for inspecting how programs are loaded, linked, and executed.
For example, when a shared library relocation error occurs, the first thing I do is run ldd on the executable. The ldd tool lists the shared libraries (with their paths) that an executable depends on. However, running ldd on OSX produces an error:
evil:~ mohit$ ldd /bin/ls
-bash: ldd: command not found
Not found? But ldd exists on practically every UNIX-like system! Let's see whether objdump is available.
evil:~ mohit$ objdump -x /bin/ls
-bash: objdump: command not found
Command not found! What's going on?
The problem is that, unlike Linux, Solaris, HP-UX, and many other UNIX variants, OSX does not use the ELF file format. In addition, OSX does not ship the GNU tools that provide ldd and objdump on Linux.
To list the shared libraries an executable depends on, use otool instead:
evil:~ mohit$ otool /bin/ls
otool: one of -fahlLtdoOrTMRIHScis must be specified
Usage: otool [-fahlLDtdorSTMRIHvVcXm] <object file> ...
  -f print the fat headers
  -a print the archive header
  -h print the mach header
  -l print the load commands
  -L print shared libraries used
  -D print shared library id name
  -t print the text section (disassemble with -v)
  -p <routine name> start dissassemble from routine name
  -s <segname> <sectname> print contents of section
  -d print the data section
  -o print the Objective-C segment
  -r print the relocation entries
  -S print the table of contents of a library
  -T print the table of contents of a dynamic shared library
  -M print the module table of a dynamic shared library
  -R print the reference table of a dynamic shared library
  -I print the indirect symbol table
  -H print the two-level hints table
  -v print verbosely (symbolicly) when possible
  -V print disassembled operands symbolicly
  -c print argument strings of a core file
  -X print no leading addresses or headers
  -m don't use archive(member) syntax
evil:~ mohit$ otool -L /bin/ls
/bin/ls:
  /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.0.0)
Much better! We can see that /bin/ls references two dynamic libraries, although the .dylib file extension may look unfamiliar.
I suspect many UNIX and Linux users have had similar experiences on OSX, so I decided to write up what I have learned so far about OSX executable files.
The OSX runtime environment is the framework that governs how code runs on OSX. It consists of a set of conventions that define how code is loaded, managed, and executed. When an application is launched, the appropriate runtime environment loads the program into memory, resolves references to external libraries, and prepares the code for execution.
OSX supports three runtime environments:
- dyld runtime environment: the recommended environment, based on the dyld library manager.
- CFM runtime environment: the legacy environment inherited from OS 9, based on the Code Fragment Manager. It is intended for applications that want to use new OSX features but have not yet been fully ported to dyld.
- Classic environment: lets OS 9 (9.1 or 9.2) programs run on OSX without modification.
This article focuses on the dyld runtime environment.
Almost all executable files on OSX use the Mach-O file format: applications, frameworks, libraries, kernel extensions, and so on are all Mach-O files. Mach-O is both a file format and an ABI (application binary interface) that describes how the kernel loads and runs executables. Specifically, it tells the system which dynamic loader to use, which shared libraries to load, how to organize the process address space, and where the entry point is.
Mach-O is not new. It was originally designed by the Open Software Foundation (OSF) for the OSF/1 operating system, which was based on the Mach microkernel, and was later brought to x86 systems with OpenStep.
To run under the dyld runtime environment, code must be compiled into the Mach-O executable format.
Organization of a Mach-O file
A Mach-O file is divided into three regions: the header, the load commands, and the raw segment data. The header and the load commands describe the file's features, layout, and other characteristics. The raw segment data contains the byte sequences referenced by the load commands. To examine the various parts of a Mach-O file, OSX ships with a very useful program, otool, located in /usr/bin.
Next, we will use otool to learn how Mach-O files are organized.
To view the Mach-O header of a file, use otool's -h option.
evil:~ mohit$ otool -h /bin/ls
/bin/ls:
Mach header
     magic cputype cpusubtype filetype ncmds sizeofcmds      flags
0xfeedface      18          0        2    11       1608 0x00000085
The first field in the header is the magic number. It indicates whether the file is a 32-bit or 64-bit Mach-O file, and also reveals the CPU byte order. For details on the magic numbers, see /usr/include/mach-o/loader.h.
The header also specifies the file's target architecture, which allows the kernel to make sure the code is not run on a processor it was not compiled for. In the output above, cputype is 18, which is CPU_TYPE_POWERPC as defined in /usr/include/mach/machine.h.
From these two pieces of information, we can infer that this binary is meant for 32-bit PowerPC systems.
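To make the magic-number check concrete, here is a small C sketch that classifies the first four bytes of a file. It interprets the bytes in both byte orders and compares them with MH_MAGIC (0xfeedface) and MH_MAGIC_64 (0xfeedfacf) from loader.h. The function name macho_kind is my own invention, not an Apple API, and fat (universal) files are deliberately not handled here.

```c
#include <stdint.h>

/* Values from <mach-o/loader.h>, redeclared so the sketch builds anywhere. */
#define MH_MAGIC    0xfeedfaceu  /* 32-bit Mach-O */
#define MH_MAGIC_64 0xfeedfacfu  /* 64-bit Mach-O */

/* Hypothetical helper: given the first 4 bytes of a file, return 32 or 64
   for a thin Mach-O file (0 otherwise) and report its byte order. */
int macho_kind(const unsigned char b[4], int *big_endian) {
    uint32_t be = (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 | (uint32_t)b[2] << 8 | b[3];
    uint32_t le = (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 | (uint32_t)b[1] << 8 | b[0];

    if (be == MH_MAGIC || be == MH_MAGIC_64) {   /* e.g. PowerPC binaries */
        *big_endian = 1;
        return be == MH_MAGIC ? 32 : 64;
    }
    if (le == MH_MAGIC || le == MH_MAGIC_64) {   /* e.g. x86 binaries */
        *big_endian = 0;
        return le == MH_MAGIC ? 32 : 64;
    }
    return 0;                                    /* not a thin Mach-O file */
}
```

The /bin/ls above starts with the bytes fe ed fa ce, so macho_kind reports a 32-bit, big-endian file.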
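The cputype lookup can be sketched the same way. This assumes the constants from mach/machine.h (CPU_TYPE_X86 = 7, CPU_TYPE_POWERPC = 18, plus the CPU_ARCH_ABI64 bit that marks 64-bit variants), redeclared so the code builds on any system; cpu_type_name is a hypothetical helper that covers only these two architectures.

```c
#include <stdint.h>
#include <string.h>

/* Values from <mach/machine.h>, redeclared so the sketch builds anywhere. */
#define CPU_ARCH_ABI64   0x01000000
#define CPU_TYPE_X86     7
#define CPU_TYPE_POWERPC 18

/* Hypothetical helper: map a header's cputype to a short architecture name. */
const char *cpu_type_name(int32_t cputype) {
    int is64 = (cputype & CPU_ARCH_ABI64) != 0;    /* 64-bit variant flag */
    switch (cputype & ~CPU_ARCH_ABI64) {
    case CPU_TYPE_X86:     return is64 ? "x86_64" : "i386";
    case CPU_TYPE_POWERPC: return is64 ? "ppc64" : "ppc";
    default:               return "unknown";
    }
}
```

For the header above, cpu_type_name(18) returns "ppc".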
Sometimes a binary contains code for more than one architecture. Such files are usually called Universal Binaries, and they begin with an extra header, fat_header. To inspect the fat_header, use otool's -f option.
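Unlike the Mach-O header, the fat header is always stored big-endian. Below is a minimal C sketch of reading it, assuming the layout from mach-o/fat.h: a 4-byte FAT_MAGIC (0xcafebabe) and a 4-byte architecture count, followed by one 20-byte fat_arch record per architecture (cputype, cpusubtype, offset, size, align). The names fat_arch_info and parse_fat_header are illustrative, not Apple's.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC 0xcafebabeu    /* from <mach-o/fat.h> */
#define FAT_ARCH_SIZE 20         /* cputype, cpusubtype, offset, size, align */

/* Hypothetical, simplified view of one fat_arch record. */
struct fat_arch_info {
    int32_t  cputype;
    uint32_t offset;             /* where this architecture's Mach-O image starts */
    uint32_t size;               /* length of that image in bytes */
};

static uint32_t read_be32(const unsigned char *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Parse the fat header at the start of buf. Returns the number of records
   stored in out (at most max), or -1 if buf is not a complete fat header. */
int parse_fat_header(const unsigned char *buf, size_t len,
                     struct fat_arch_info *out, int max) {
    if (len < 8 || read_be32(buf) != FAT_MAGIC)
        return -1;
    uint32_t n = read_be32(buf + 4);
    if (len < 8 + (size_t)n * FAT_ARCH_SIZE)
        return -1;               /* truncated arch table */
    int count = 0;
    for (uint32_t i = 0; i < n && count < max; i++) {
        const unsigned char *a = buf + 8 + (size_t)i * FAT_ARCH_SIZE;
        out[count].cputype = (int32_t)read_be32(a);
        out[count].offset  = read_be32(a + 8);   /* skip cpusubtype */
        out[count].size    = read_be32(a + 12);
        count++;
    }
    return count;
}
```

Each offset points at an ordinary thin Mach-O image, which can then be examined exactly like /bin/ls above.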
The cpusubtype field specifies the exact CPU model, and is usually set to CPU_SUBTYPE_POWERPC_ALL or CPU_SUBTYPE_I386_ALL.
The filetype field specifies what kind of file this is, which determines how it is laid out and used: a library, a standalone executable, a core file, and so on. The filetype above is 2, MH_EXECUTE, which indicates a demand-paged executable. The following excerpt from /usr/include/mach-o/loader.h lists the different file types.
#define MH_OBJECT     0x1  /* relocatable object file */
#define MH_EXECUTE    0x2  /* demand paged executable file */
#define MH_FVMLIB     0x3  /* fixed VM shared library file */
#define MH_CORE       0x4  /* core file */
#define MH_PRELOAD    0x5  /* preloaded executable file */
#define MH_DYLIB      0x6  /* dynamically bound shared library */
#define MH_DYLINKER   0x7  /* dynamic link editor */
#define MH_BUNDLE     0x8  /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9  /* shared library stub for static */
                           /*  linking only, no section contents */
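Turning the numeric filetype into one of these names is a simple table lookup. This sketch hard-codes the values from the loader.h excerpt above; filetype_name is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: name of a Mach-O filetype, per the loader.h excerpt. */
const char *filetype_name(uint32_t filetype) {
    static const char *const names[] = {
        NULL,            /* 0 is not a valid filetype */
        "MH_OBJECT", "MH_EXECUTE", "MH_FVMLIB", "MH_CORE", "MH_PRELOAD",
        "MH_DYLIB", "MH_DYLINKER", "MH_BUNDLE", "MH_DYLIB_STUB",
    };
    if (filetype == 0 || filetype >= sizeof names / sizeof names[0])
        return "unknown";
    return names[filetype];
}
```

For /bin/ls, filetype_name(2) returns "MH_EXECUTE".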
The next two fields, ncmds and sizeofcmds, describe the load commands: their number (11 here) and their total size in bytes (1608 here).
Finally, the flags field holds status bits that the kernel and the dynamic linker may use during loading and execution.
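The flags field is a bit mask. As a sketch, the following decodes the lowest eight flag bits using the values defined in loader.h (MH_NOUNDEFS = 0x1 through MH_TWOLEVEL = 0x80); describe_flags is a hypothetical helper that simply ignores higher bits.

```c
#include <stdint.h>
#include <string.h>

/* The first eight header flag bits, as defined in <mach-o/loader.h>. */
static const struct { uint32_t bit; const char *name; } mh_flags[] = {
    {0x1,  "MH_NOUNDEFS"},   {0x2,  "MH_INCRLINK"},
    {0x4,  "MH_DYLDLINK"},   {0x8,  "MH_BINDATLOAD"},
    {0x10, "MH_PREBOUND"},   {0x20, "MH_SPLIT_SEGS"},
    {0x40, "MH_LAZY_INIT"},  {0x80, "MH_TWOLEVEL"},
};

/* Hypothetical helper: write the names of the set flags, joined by '|',
   into out (outlen >= 1). Bits not in the table above are ignored. */
void describe_flags(uint32_t flags, char *out, size_t outlen) {
    out[0] = '\0';
    for (size_t i = 0; i < sizeof mh_flags / sizeof mh_flags[0]; i++) {
        if (flags & mh_flags[i].bit) {
            if (out[0] != '\0')
                strncat(out, "|", outlen - strlen(out) - 1);
            strncat(out, mh_flags[i].name, outlen - strlen(out) - 1);
        }
    }
}
```

Applied to the 0x00000085 in the header above, it yields MH_NOUNDEFS|MH_DYLDLINK|MH_TWOLEVEL: the file has no undefined references, it is input for the dynamic linker, and it uses two-level namespace bindings.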
The load commands region contains a list of commands that tell the kernel how to load the various raw segments in the file. Typically they describe where each segment is placed in memory and how it is aligned and protected.
To view the list of load commands in a file, use otool's -l option.
evil:~/Temp mohit$ otool -l /bin/ls
/bin/ls:
Load command 0
      cmd LC_SEGMENT
  cmdsize 56
  segname __PAGEZERO
   vmaddr 0x00000000
   vmsize 0x00001000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x4
Load command 1
      cmd LC_SEGMENT
  cmdsize 600
  segname __TEXT
   vmaddr 0x00001000
   vmsize 0x00006000
  fileoff 0
 filesize 24576
  maxprot 0x00000007
 initprot 0x00000005
   nsects 8
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x00001ac4
      size 0x000046e8
    offset 2756
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
[ ___SNIPPED FOR BREVITY___ ]
Load command 4
          cmd LC_LOAD_DYLINKER
      cmdsize 28
         name /usr/lib/dyld (offset 12)
Load command 5
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libncurses.5.4.dylib (offset 24)
   time stamp 1111407638 Mon Mar 21 07:20:38 2005
      current version 5.4.0
compatibility version 5.4.0
Load command 6
          cmd LC_LOAD_DYLIB
      cmdsize 52
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 1111407267 Mon Mar 21 07:14:27 2005
      current version 88.0.0
compatibility version 1.0.0
Load command 7
     cmd LC_SYMTAB
 cmdsize 24
  symoff 28672
   nsyms 101
  stroff 31020
 strsize 1440
Load command 8
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 0
     iextdefsym 0
     nextdefsym 18
      iundefsym 18
      nundefsym 83
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 30216
  nindirectsyms 201
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0
Load command 9
     cmd LC_TWOLEVEL_HINTS
 cmdsize 16
  offset 29884
  nhints 83
Load command 10
        cmd LC_UNIXTHREAD
    cmdsize 176
     flavor PPC_THREAD_STATE
      count PPC_THREAD_STATE_COUNT
    r0 0x00000000   r1 0x00000000   r2 0x00000000   r3 0x00000000
    r4 0x00000000   r5 0x00000000   r6 0x00000000   r7 0x00000000
    r8 0x00000000   r9 0x00000000  r10 0x00000000  r11 0x00000000
   r12 0x00000000  r13 0x00000000  r14 0x00000000  r15 0x00000000
   r16 0x00000000  r17 0x00000000  r18 0x00000000  r19 0x00000000
   r20 0x00000000  r21 0x00000000  r22 0x00000000  r23 0x00000000
   r24 0x00000000  r25 0x00000000  r26 0x00000000  r27 0x00000000
   r28 0x00000000  r29 0x00000000  r30 0x00000000  r31 0x00000000
    cr 0x00000000  xer 0x00000000   lr 0x00000000  ctr 0x00000000
    mq 0x00000000  vrsave 0x00000000  srr0 0x00001ac4  srr1 0x00000000
The file above contains 11 load commands, numbered 0 to 10, located directly after the header.
- Commands 0 to 3 (LC_SEGMENT) define how the segments in the file are mapped into memory. A segment is a range of bytes in the Mach-O binary and can contain zero or more sections; more on this later.
- Command 4 (LC_LOAD_DYLINKER) specifies the dynamic linker to use. This is almost always the OSX default dynamic linker, /usr/lib/dyld.
- Commands 5 and 6 (LC_LOAD_DYLIB) specify the shared libraries the file links against. They are loaded by the dynamic linker specified in command 4.
- Commands 7 and 8 (LC_SYMTAB, LC_DYSYMTAB) specify the symbol tables used by the file and by the dynamic linker, respectively.
- Command 9 (LC_TWOLEVEL_HINTS) contains the hints table for the two-level namespace.
- Command 10 (LC_UNIXTHREAD) defines the initial state of the process's main thread; only executables contain this command. Note that srr0, which holds the PowerPC program counter, is 0x00001ac4, the start address of the __text section.
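Because every load command begins with the same two fields, cmd and cmdsize, a program can walk the list without understanding each command: read both fields, then skip ahead cmdsize bytes. The sketch below does this for a 32-bit image held in memory, assuming the 28-byte 32-bit mach_header layout (ncmds at offset 16, sizeofcmds at offset 20) and taking the byte order as a parameter. The names walk_load_commands and rd32 are my own.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value in the file's byte order. */
static uint32_t rd32(const unsigned char *p, int big_endian) {
    return big_endian
        ? (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3]
        : (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 | (uint32_t)p[1] << 8 | p[0];
}

/* Hypothetical helper: walk the load commands of a 32-bit Mach-O image.
   Stores up to max cmd values in cmds and returns how many commands were
   walked, or -1 if the header or any command is malformed. */
int walk_load_commands(const unsigned char *buf, size_t len, int big_endian,
                       uint32_t *cmds, int max) {
    const size_t header_size = 28;              /* 32-bit mach_header */
    if (len < header_size)
        return -1;
    uint32_t ncmds      = rd32(buf + 16, big_endian);
    uint32_t sizeofcmds = rd32(buf + 20, big_endian);
    if (sizeofcmds > len - header_size)
        return -1;

    size_t off = header_size, end = header_size + (size_t)sizeofcmds;
    int walked = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (end - off < 8)
            return -1;                          /* no room for cmd + cmdsize */
        uint32_t cmd     = rd32(buf + off, big_endian);
        uint32_t cmdsize = rd32(buf + off + 4, big_endian);
        if (cmdsize < 8 || cmdsize > end - off)
            return -1;                          /* command overruns the table */
        if (walked < max)
            cmds[walked] = cmd;
        walked++;
        off += cmdsize;                         /* jump to the next command */
    }
    return walked;
}
```

Pointed at /bin/ls, the walk should visit the 11 commands listed above, whose cmd values include LC_SEGMENT (0x1), LC_LOAD_DYLIB (0xc), and LC_UNIXTHREAD (0x5).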
Segments and sections
Most of the load commands above refer to segments in the file. A segment is a sequence of bytes that the kernel and the dynamic linker map into virtual memory. The header and load commands are considered part of the first segment of the file (note that __TEXT above starts at file offset 0). A typical OSX executable consists of the following five segments:
- __PAGEZERO is located at virtual address 0 and has no access permissions (maxprot and initprot are 0 above). It takes up no space in the file (filesize 0), and any access to it, such as dereferencing a NULL pointer, causes an immediate crash.
- __TEXT contains read-only data and executable code.
- __DATA contains writable data. The kernel usually marks these pages copy-on-write.
- __OBJC contains data used by the Objective-C language runtime.
- __LINKEDIT contains raw data used by the dynamic linker.
The __TEXT and __DATA segments may contain zero or more sections. Each section holds data of a particular type, such as executable code, constants, or C strings.
To view the contents of a section, use otool's -s option.
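A segment's section headers are stored right after its LC_SEGMENT command, which is why cmdsize grows with nsects. The sketch below redeclares the 32-bit structures with the field layout from loader.h (under my own names, segment_command_32 and section_32, so it builds without Apple headers) and checks the arithmetic against the otool -l output: 56 + 8 × 68 = 600 for __TEXT, and 56 for __PAGEZERO with no sections. It also renders protection masks using the VM_PROT_READ (1), VM_PROT_WRITE (2), and VM_PROT_EXECUTE (4) bits; prot_string is a hypothetical helper.

```c
#include <stdint.h>

/* Field layout of the 32-bit segment_command and section structures from
   <mach-o/loader.h>, redeclared under illustrative names. */
struct segment_command_32 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

struct section_32 {
    char     sectname[16], segname[16];
    uint32_t addr, size, offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2;
};

/* Expected cmdsize of an LC_SEGMENT command followed by nsects sections. */
uint32_t segment_cmdsize(uint32_t nsects) {
    return (uint32_t)(sizeof(struct segment_command_32)
                      + nsects * sizeof(struct section_32));
}

/* Render a VM protection mask (read=1, write=2, execute=4) as "rwx". */
void prot_string(int32_t prot, char out[4]) {
    out[0] = (prot & 1) ? 'r' : '-';
    out[1] = (prot & 2) ? 'w' : '-';
    out[2] = (prot & 4) ? 'x' : '-';
    out[3] = '\0';
}
```

With these helpers, __TEXT's initprot of 0x5 renders as r-x and its maxprot of 0x7 as rwx, while __PAGEZERO's 0 renders as ---.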
evil:~/Temp mohit$ otool -sv __TEXT __cstring /bin/ls
/bin/ls:
Contents of (__TEXT,__cstring) section
00006320 00000000 5f5f6479 6c645f6d 6f645f74
00006330 65726d5f 66756e63 73000000 5f5f6479
00006340 6c645f6d 616b655f 64656c61 7965645f
00006350 6d6f6475 6c655f69 6e697469 616c697a
__SNIP__
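Since __cstring holds NUL-terminated C strings packed back to back, the hex dump can be decoded by splitting on zero bytes. This is a minimal sketch that assumes the words in the dump appear in file byte order, which holds for this big-endian PowerPC binary; split_cstrings is a hypothetical helper that skips empty padding and ignores an unterminated tail such as the one cut off by __SNIP__.

```c
#include <stddef.h>

/* Hypothetical helper: store pointers to each non-empty, NUL-terminated
   string in a section's raw bytes. Returns how many were found; only the
   first max pointers are stored in out. */
size_t split_cstrings(const unsigned char *sect, size_t len,
                      const char **out, size_t max) {
    size_t found = 0, i = 0;
    while (i < len) {
        size_t start = i;
        while (i < len && sect[i] != '\0')
            i++;
        if (i == len)
            break;                       /* unterminated tail: stop here */
        if (i > start) {                 /* skip runs of padding NULs */
            if (found < max)
                out[found] = (const char *)sect + start;
            found++;
        }
        i++;                             /* step over the terminating NUL */
    }
    return found;
}
```

Fed the first 28 bytes of the dump above (00000000 5f5f6479 6c645f6d 6f645f74 65726d5f 66756e63 73000000), it finds a single string, __dyld_mod_term_funcs.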
To disassemble the __text section, use the -tv options.
evil:~/Temp mohit$ otool -tv /bin/ls
/bin/ls:
(__TEXT,__text) section
00001ac4  or      r26,r1,r1
00001ac8  addi    r1,r1,0xfffc
00001acc  rlwinm  r1,r1,0,0,26
00001ad0  li      r0,0x0
00001ad4  stw     r0,0x0(r1)
00001ad8  stwu    r1,0xffc0(r1)
00001adc  lwz     r3,0x0(r26)
00001ae0  addi    r4,r26,0x4
__SNIP__
The __TEXT segment contains four main sections:
- __text: the compiled machine code.
- __const: general constant data.
- __cstring: literal string constants.
- __picsymbol_stub: position-independent code stub routines used by the dynamic linker.
This layout keeps executable and non-executable segments clearly separated.
Running an application
Now that we know the format of Mach-O files, let's look at how OSX loads and runs an application. When you run an application, the shell first calls the fork(2) system call. fork creates a logical copy of the calling process (the shell) and schedules it for execution. The child process then calls the execve(2) system call, passing the path of the program to execute.
The kernel loads the specified file and checks that it begins with a valid Mach-O header. It then interprets the load commands, replacing the child process's address space with the segments from the file. The kernel also runs the dynamic linker specified by the binary, which loads and binds all dependent libraries. Once every symbol needed to start running has been bound, the entry-point function is called.
When an application is built, its entry-point function is usually a standard routine statically linked in from /usr/lib/crt1.o. This routine sets up the process environment and calls the executable's main() function.
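The shell's half of this sequence fits in a few lines of POSIX C. The sketch below forks, has the child replace itself with execve(2), and waits for it with waitpid(2). run_program is a hypothetical name, and returning 127 when execve fails is only the usual shell convention.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current environment, passed on to the child */

/* Hypothetical helper: launch path with argv the way a shell would.
   Returns the child's exit status, or -1 if it could not be run or waited on. */
int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();                 /* logical copy of this process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the kernel replaces this address space with the new program. */
        execve(path, argv, environ);
        _exit(127);                     /* only reached if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent: wait for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/ls", argv) with argv set to {"ls", "-l", NULL} behaves roughly like typing ls -l at the prompt, minus the PATH search a real shell performs.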
The application is running now.
Dynamic linker
The OSX dynamic linker, /usr/lib/dyld, is responsible for loading dependent shared libraries, importing variable and function symbols, and binding them into the current process. When a process starts for the first time, the linker maps the shared libraries into the process address space. How symbols are actually bound depends on how the program was built:
- Load-time binding: symbols are bound immediately when the library is loaded.
- Just-in-time binding: symbols are bound when they are first referenced.
- Pre-binding: the addresses of imported symbols are computed in advance at build time, assuming each library is loaded at its preferred address.
If no binding type is specified, just-in-time binding is used.
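The difference between load-time and just-in-time binding can be illustrated with a toy model in C. This is not how dyld is implemented; it only mimics the idea behind stubs such as __picsymbol_stub: the call site always goes through a pointer slot that initially points at a binder, and the binder looks up the real symbol on first use and patches the slot. All names here (twice, lookup, slot_twice, and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*int_fn)(int);

/* Stands in for a function exported by some shared library. */
static int twice(int x) { return 2 * x; }

/* Hypothetical export table that the "linker" searches by name. */
static const struct { const char *name; int_fn fn; } exports[] = {
    { "twice", twice },
};

static int_fn lookup(const char *name) {
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].fn;
    return NULL;
}

static int binder_calls = 0;
static int bind_twice_lazily(int x);

/* The program always calls through this slot. It starts out pointing at
   the binder rather than at the real function. */
static int_fn slot_twice = bind_twice_lazily;

/* First call lands here: resolve the symbol, patch the slot, forward. */
static int bind_twice_lazily(int x) {
    binder_calls++;
    int_fn target = lookup("twice");
    if (target == NULL)
        abort();                 /* a real linker reports an undefined symbol */
    slot_twice = target;
    return target(x);
}

/* Load-time binding would instead resolve every slot up front. */
void bind_all_at_load(void) { slot_twice = lookup("twice"); }

int call_twice(int x) { return slot_twice(x); }
int binder_call_count(void) { return binder_calls; }
```

After the first call_twice(5), the slot points directly at twice, so later calls skip the lookup. Calling bind_all_at_load() before any call models load-time binding instead.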
The application can continue running only after all required symbols and segments from the various object files have been resolved. To find libraries and frameworks, the standard dynamic linker, /usr/lib/dyld, searches a predefined set of directories. To override these directories or provide a fallback path, set the environment variable DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH.
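The search order for a library referenced by path can be sketched as a list of candidate paths. This follows my reading of the dyld(1) man page: the directories in DYLD_LIBRARY_PATH are tried first (using the library's leaf name), then the install path recorded in the LC_LOAD_DYLIB command, then the directories in DYLD_FALLBACK_LIBRARY_PATH. It ignores frameworks, DYLD_FRAMEWORK_PATH, and many other cases, and dyld_search_order is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Append dir/leaf for each directory in a colon-separated list. */
static size_t append_dirs(const char *dirs, const char *leaf,
                          char out[][MAX_PATH_LEN], size_t n, size_t max) {
    while (dirs && *dirs && n < max) {
        const char *colon = strchr(dirs, ':');
        size_t len = colon ? (size_t)(colon - dirs) : strlen(dirs);
        if (len > 0)                      /* skip empty entries like "a::b" */
            snprintf(out[n++], MAX_PATH_LEN, "%.*s/%s", (int)len, dirs, leaf);
        dirs = colon ? colon + 1 : NULL;
    }
    return n;
}

/* Hypothetical helper: fill out with candidate paths in search order.
   lib_path and fallback are the values of DYLD_LIBRARY_PATH and
   DYLD_FALLBACK_LIBRARY_PATH (either may be NULL). Returns the count. */
size_t dyld_search_order(const char *install_name, const char *lib_path,
                         const char *fallback, char out[][MAX_PATH_LEN],
                         size_t max) {
    const char *slash = strrchr(install_name, '/');
    const char *leaf = slash ? slash + 1 : install_name;

    size_t n = append_dirs(lib_path, leaf, out, 0, max);     /* 1. overrides */
    if (n < max)
        snprintf(out[n++], MAX_PATH_LEN, "%s", install_name); /* 2. install path */
    return append_dirs(fallback, leaf, out, n, max);          /* 3. fallbacks */
}
```

A caller would check each candidate in turn (for example with access(2)) and load the first one that exists. For /usr/lib/libncurses.5.4.dylib with DYLD_LIBRARY_PATH=/opt/a:/opt/b, the first candidate is /opt/a/libncurses.5.4.dylib.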