iOS system analysis (2): Mach-O binary file parsing

Source: Internet
Author: User

0x01 Mach-O Format

Mach-O is the executable file format on OS X and iOS, analogous to PE files on Windows and ELF files on Linux (and other Unix-like systems). Without a solid understanding of the Mach-O format and related material, it is hard to study the XNU kernel in depth.

A Mach-O file consists of the following parts:

1. Header: stores basic information about the Mach-O file, including the target platform (CPU type), the file type, and the number of load commands.

2. LoadCommands: this region immediately follows the Header. When the Mach-O file is loaded, the data here determines the memory layout.

3. Data: the actual contents of each segment are stored here, including both code and data.

0x02 FAT binary data (data structures defined in <mach-o/fat.h>)

1. The first field is the magic number. Pay attention to endianness. After reading it, check whether it is 0xCAFEBABE or 0xBEBAFECA (otherwise the file is thin); the result tells us whether the byte order of subsequent reads must be swapped. In this example the first 4 bytes read as 0xBEBAFECA, so the file is fat.

2. The second field is the arch count (nfat_arch), the number of CPU architectures contained in the App or dSYM, such as armv7 and arm64. In this example it is 2 (the next 4 bytes, 0x00 00 00 02), which means two CPU architectures are included.

  `sizeof(struct fat_header) = 8 bytes`

3. The fields that follow are cputype (0x0C 00 00 01), cpusubtype (0x00 00 00 00), offset (0x00 10 00 00), size (0x00 F0 27 00), and so on; they are read in sequence according to the structure definitions in fat.h. Note that if the file contains only one CPU architecture, there is no fat header at all; skip this step and read the arch data directly.

   `sizeof(struct fat_arch) = 20 bytes`

4. Using the offset read from the fat header, we can jump to the position in the file where the data for the corresponding architecture starts. If there is only one architecture, no offset calculation is needed.
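The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the author's tool: the function name `parse_fat` and the returned dictionary keys are made up for this example, and it assumes the classic 32-bit `fat_arch` layout (20 bytes per entry) stored big-endian.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # fat headers are stored big-endian on disk


def parse_fat(data):
    """Return a list of arch dicts, or None when the data is not a fat binary.

    parse_fat and the dict keys are hypothetical names for this sketch.
    """
    if len(data) < 8:
        return None
    # struct fat_header (8 bytes): magic, nfat_arch
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return None  # thin file: no fat header, the arch data starts at offset 0
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch (20 bytes): cputype, cpusubtype, offset, size, align
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i)
        archs.append({"cputype": cputype, "cpusubtype": cpusubtype,
                      "offset": offset, "size": size, "align": align})
    return archs
```

Each returned `offset` is where the Mach-O header of that architecture begins, so a caller would continue parsing from `data[offset:offset + size]`.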

0x03 Mach Header binary data

The magic value tells us whether the file is 32-bit or 64-bit. The 64-bit header has one additional 4-byte reserved field. Byte order matters here as well: check the magic value to decide whether the remaining fields need to be byte-swapped.

`sizeof(struct mach_header_64) = 32 bytes`; `sizeof(struct mach_header) = 28 bytes`

From the definitions of mach_header and mach_header_64, it is clear that the main purpose of the header is to let the system quickly determine the running environment (CPU type) and the file type of the Mach-O file.
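As a rough sketch of the header read described above, the following Python reads the magic, picks the byte order, and unpacks the remaining fields. The function name `parse_mach_header` and the dictionary keys are assumptions made for this example; the magic constants and field order follow `<mach-o/loader.h>`.

```python
import struct

MH_MAGIC, MH_CIGAM = 0xFEEDFACE, 0xCEFAEDFE        # 32-bit, native / swapped
MH_MAGIC_64, MH_CIGAM_64 = 0xFEEDFACF, 0xCFFAEDFE  # 64-bit, native / swapped


def parse_mach_header(data, offset=0):
    """Parse a mach_header or mach_header_64 (hypothetical helper name)."""
    # Read the magic as little-endian first, then decide the real byte order.
    (magic,) = struct.unpack_from("<I", data, offset)
    if magic in (MH_MAGIC, MH_MAGIC_64):
        endian = "<"
    elif magic in (MH_CIGAM, MH_CIGAM_64):
        endian = ">"  # the file uses the opposite byte order
    else:
        raise ValueError("not a Mach-O header")
    is_64 = magic in (MH_MAGIC_64, MH_CIGAM_64)
    cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = struct.unpack_from(
        endian + "iiIIII", data, offset + 4)
    return {"endian": endian, "is_64": is_64, "cputype": cputype,
            "cpusubtype": cpusubtype, "filetype": filetype, "ncmds": ncmds,
            "sizeofcmds": sizeofcmds, "flags": flags,
            # the 64-bit header carries one extra 4-byte reserved field
            "header_size": 32 if is_64 else 28}
```

The load commands start at `offset + header_size`, which is why the sketch returns the header size along with the parsed fields.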

FileType

Mach-O is used not only for executable files but also for other kinds of files:

1. Kernel extensions

2. Library files

3. Core dumps

4. Others

The following are some of the most useful file types:

1. MH_OBJECT: object files generated during compilation (gcc -c xxx.c generates xxx.o)

2. MH_EXECUTE: executable binaries (for example /bin/ls)

3. MH_CORE: core dumps (the dump file written at a crash)

4. MH_DYLIB: dynamic libraries (the shared library files under /usr/lib/)

5. MH_DYLINKER: the dynamic linker (/usr/lib/dyld)

6. MH_KEXT_BUNDLE: kernel extension files (for example a simple self-developed kernel module)
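For reference, the file types listed above map to the following numeric values of the header's filetype field in `<mach-o/loader.h>`. The helper `filetype_name` is a hypothetical name for this sketch, and the table covers only the six types discussed here.

```python
# Numeric values of the filetype field for the types listed above.
FILE_TYPES = {
    0x1: "MH_OBJECT",
    0x2: "MH_EXECUTE",
    0x4: "MH_CORE",
    0x6: "MH_DYLIB",
    0x7: "MH_DYLINKER",
    0xB: "MH_KEXT_BUNDLE",
}


def filetype_name(filetype):
    """Return the symbolic name for a filetype value (hypothetical helper)."""
    return FILE_TYPES.get(filetype, "unknown (0x%x)" % filetype)
```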

Flags

The Mach-O header also contains some important flags that affect how dyld loads the file.

1. MH_NOUNDEFS: the target has no undefined symbols, so there are no link dependencies

2. MH_DYLDLINK: the target file is input for dyld and cannot be statically linked again

3. MH_PIE: allows a randomized address space (enables ASLR, Address Space Layout Randomization)

4. MH_ALLOW_STACK_EXECUTION: code in stack memory may be executed; this is usually disabled by default

5. MH_NO_HEAP_EXECUTION: code in heap memory cannot be executed
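The flags field is a bit mask, so several of the flags above can be set at once. A small sketch of decoding it follows; `decode_flags` is a name invented for this example, and the table holds only the five flags listed above (with their values from `<mach-o/loader.h>`), so any other bits are silently ignored.

```python
# Bit values of the header flags discussed above.
HEADER_FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x20000: "MH_ALLOW_STACK_EXECUTION",
    0x200000: "MH_PIE",
    0x1000000: "MH_NO_HEAP_EXECUTION",
}


def decode_flags(flags):
    """Return the names of the known flags set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(HEADER_FLAGS.items()) if flags & bit]
```

For example, `decode_flags(0x200085)` yields `["MH_NOUNDEFS", "MH_DYLDLINK", "MH_PIE"]`; the remaining bit 0x80 is not in this reduced table.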

0x04 LoadCommands

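The load commands follow the Header directly. The total size occupied by all commands is given in the Mach-O Header (sizeofcmds). After the Header is read, the rest of the file is loaded by parsing each load command. Every command begins with a struct load_command (defined in <mach-o/loader.h>) that holds two 32-bit fields, cmd and cmdsize.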

Cmd Field

Depending on the value of the cmd field, a different function performs the loading. The list below shows which kernel function handles each command type.

1. LC_SEGMENT and LC_SEGMENT_64 are processed by the load_segment function in the kernel (loads the data in the segment and maps it into the memory space of the process)

2. LC_LOAD_DYLINKER is processed by the load_dylinker function in the kernel (invokes the /usr/lib/dyld program)

3. LC_UUID is processed by the load_uuid function in the kernel (loads a 128-bit unique ID)

4. LC_THREAD is processed by the load_thread function in the kernel (starts a Mach thread without allocating stack space)

5. LC_UNIXTHREAD is processed by the load_unixthread function in the kernel (starts a UNIX POSIX thread)

6. LC_CODE_SIGNATURE is processed by the load_code_signature function in the kernel (handles the digital signature)

7. LC_ENCRYPTION_INFO is processed by the set_code_unprotect function in the kernel (handles encrypted binaries)
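A parser in user space does not call these kernel functions; it only walks the command list. The following sketch shows that walk, relying on the fact that every command starts with two 32-bit fields, cmd and cmdsize. The name `iter_load_commands` is hypothetical, and `LC_NAMES` covers only the commands mentioned in this article (values from `<mach-o/loader.h>`).

```python
import struct

# cmd values for the commands discussed in this article.
LC_NAMES = {
    0x1: "LC_SEGMENT", 0x2: "LC_SYMTAB", 0x4: "LC_THREAD",
    0x5: "LC_UNIXTHREAD", 0xE: "LC_LOAD_DYLINKER", 0x19: "LC_SEGMENT_64",
    0x1B: "LC_UUID", 0x1D: "LC_CODE_SIGNATURE", 0x21: "LC_ENCRYPTION_INFO",
}


def iter_load_commands(data, start, ncmds, endian="<"):
    """Yield (offset, cmd, cmdsize) for each load command (hypothetical helper).

    `start` is the offset just past the Mach-O header; `ncmds` comes from it.
    """
    offset = start
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command at offset %d" % offset)
        yield offset, cmd, cmdsize
        offset += cmdsize  # cmdsize includes the 8-byte cmd/cmdsize prefix
```

A caller would dispatch on `cmd` and re-read the full structure at `offset` for the commands it cares about.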

UUID binary data (128 bits)

The UUID is a 16-byte (128-bit) piece of data that uniquely identifies a file. For the symbolication mentioned earlier to work correctly, the UUID in the dSYM must match the UUID in the App binary. The UUID shown by dwarfdump is exactly this data. When reading it, go through the Command structure first: the first field (0x0000001B) indicates the type of the data that follows, and the second field (0x00000018) is the data size (including the Command fields themselves).
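A minimal sketch of reading this command, assuming the layout just described (cmd 0x1B, cmdsize 0x18, then 16 raw UUID bytes). The function name `read_uuid_command` is made up for this example; the result is upper-cased to resemble the form that dwarfdump prints.

```python
import struct
import uuid

LC_UUID = 0x1B


def read_uuid_command(data, offset, endian="<"):
    """Return the UUID stored in an LC_UUID command as an upper-case string."""
    cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
    if cmd != LC_UUID or cmdsize != 24:  # 8-byte command prefix + 16-byte UUID
        raise ValueError("not an LC_UUID command")
    raw = bytes(data[offset + 8:offset + 24])
    return str(uuid.UUID(bytes=raw)).upper()
```

Comparing the value returned for the App binary with the one returned for the dSYM is enough to check that the two belong together.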

SymTab binary data

1. The structure of the symbol table block: the first two fields are again the Command data. The next four fields are the offset of the symbol table (0x001DF5E0), the number of symbols (0x001DF5E0), the offset of the string table in the file (0x0020C3A0), and the size of the string table (0x000729A8).

2. The next step is to read the Segment and Section data blocks. As with the blocks above, both are read according to the Command structure. They are contiguous in the binary file: each Segment is followed immediately by its own Section entries, and the number of Section entries is given by the nsects field of the Segment structure.

3. I wrote a simple Mach-O parsing tool: https://github.com/liutianshx2012/Tmacho
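The symbol table command described in item 1 can be read as six consecutive 32-bit fields. The sketch below is an illustration only and is not taken from the tool linked above; `read_symtab_command` and the dictionary keys mirror the field names of `struct symtab_command` in `<mach-o/loader.h>`.

```python
import struct

LC_SYMTAB = 0x2


def read_symtab_command(data, offset, endian="<"):
    """Parse a symtab_command (24 bytes) at `offset` (hypothetical helper)."""
    cmd, cmdsize, symoff, nsyms, stroff, strsize = struct.unpack_from(
        endian + "IIIIII", data, offset)
    if cmd != LC_SYMTAB:
        raise ValueError("not an LC_SYMTAB command")
    return {"symoff": symoff,    # file offset of the symbol entries
            "nsyms": nsyms,      # number of symbol entries
            "stroff": stroff,    # file offset of the string table
            "strsize": strsize}  # size of the string table in bytes
```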

Segment data

When loading data, the commands that matter most are LC_SEGMENT or LC_SEGMENT_64. The other commands are not discussed here.

LC_SEGMENT and LC_SEGMENT_64 are described by struct segment_command and struct segment_command_64 in <mach-o/loader.h>.

Most of the fields in these structures help the kernel map the Segment into virtual memory.

The nsects field gives the number of sections in the Segment. Sections are where the useful data is stored.

The vmaddr of __TEXT is the load address of the program. __DWARF is the segment that holds the DWARF data blocks, which shows that a dSYM stores its data in DWARF format.

`sizeof(struct segment_command) = 56 bytes`; `sizeof(struct segment_command_64) = 72 bytes`
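To illustrate how a Segment and its Section entries sit next to each other, here is a sketch for the 64-bit case only. It assumes the 72-byte `segment_command_64` is followed by `nsects` entries of the 80-byte `section_64` structure; `read_segment_64` and the dictionary keys are names chosen for this example.

```python
import struct


def _name(raw):
    # Segment and section names are fixed 16-byte fields padded with NUL bytes.
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def read_segment_64(data, offset, endian="<"):
    """Parse a segment_command_64 and the section_64 entries that follow it."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(
        endian + "II16sQQQQiiII", data, offset)  # 72 bytes
    sections = []
    for i in range(nsects):
        # section_64 is 80 bytes; only its first five fields are kept here.
        sectname, seg, addr, size, sect_off = struct.unpack_from(
            endian + "16s16sQQI", data, offset + 72 + 80 * i)
        sections.append({"sectname": _name(sectname), "segname": _name(seg),
                         "addr": addr, "size": size, "offset": sect_off})
    return {"segname": _name(segname), "vmaddr": vmaddr, "vmsize": vmsize,
            "fileoff": fileoff, "filesize": filesize, "nsects": nsects,
            "sections": sections}
```

The 32-bit variant would use 4-byte address and size fields instead (56-byte segment, 68-byte section), which this sketch does not handle.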

Section Data

From the Section data we can find debugging sections such as __debug_info, __debug_pubnames, and __debug_line. This debugging information gives the start address and the variable types of the symbols in the program. To symbolicate, we parse this data to obtain the information we need.

Symbol data

The data in the SymTab gives the position and the number of Symbol entries in the file. Each Symbol entry contains the start address of the symbol, the offset of its name in the string table, and other fields; for details of the data structures, see <mach-o/nlist.h> and <mach-o/stab.h>. After reading all the Symbol entries, you can go on to read the symbol strings, which are covered next.

Symbol String data

1. The offset and size of each symbol string in the file can be obtained from the data in SymTab and Symbol; each symbol string ends with a 0 byte.

2. Combining the two kinds of data above gives the load address of each symbol in the program. This is very helpful for the subsequent symbolication work.

3. At this point, the header data of the dSYM file has been read completely. Each part has a corresponding data structure definition, which makes it fairly easy to read. When parsing, pay attention to byte order, the differences between the 32-bit and 64-bit data structures, differences in field sizes, and differences between DWARF versions. The data blocks are tightly linked: misreading a single byte makes every subsequent read wrong, so a tiny error leads to a large deviation.
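Combining the Symbol entries with the string table, as described in items 1 and 2, can be sketched as follows. This assumes the 64-bit `nlist_64` entry layout (16 bytes per entry) and takes the four values from the SymTab command as parameters; `read_symbols_64` is a hypothetical name, and the 32-bit `nlist` layout (12 bytes, 4-byte address) is not handled.

```python
import struct


def read_symbols_64(data, symoff, nsyms, stroff, strsize, endian="<"):
    """Return a list of (name, address) pairs from a 64-bit symbol table."""
    strtab = data[stroff:stroff + strsize]
    symbols = []
    for i in range(nsyms):
        # struct nlist_64 (16 bytes): n_strx, n_type, n_sect, n_desc, n_value
        n_strx, n_type, n_sect, n_desc, n_value = struct.unpack_from(
            endian + "IBBHQ", data, symoff + 16 * i)
        # The name is a 0-terminated string starting at n_strx in the string table.
        end = strtab.find(b"\0", n_strx)
        if end == -1:
            end = len(strtab)
        name = strtab[n_strx:end].decode("utf-8", "replace")
        symbols.append((name, n_value))
    return symbols
```

Sorting the returned pairs by address gives a simple lookup table from an address to the nearest preceding symbol, which is the basic operation symbolication needs.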

 

Link: http://blog.tingyun.com/web/article/detail/1341
