iOS System Analysis (II): Mach-O Binary File Parsing

Source: Internet
Author: User


0x01 A Brief Introduction to the Mach-O Format

Mach-O is the executable file format on OS X and iOS, analogous to PE files on Windows and ELF files on Linux (and other Unix-like systems). Without a thorough understanding of the Mach-O format and the material around it, an in-depth study of the XNU kernel is out of the question.

A Mach-O file is made up of the following components:

1. Header: stores basic information about the Mach-O file, including the platform, the file type, the number of load commands, and so on.

2. Load commands: this part immediately follows the header. When the Mach-O file is loaded, this data determines how the file is laid out in memory.

3. Data: the actual contents of each segment (code, data, and so on) are stored here.

0x02 Fat binary data (data structures defined in <mach-o/fat.h>)

1. The first field is the magic number. Pay attention to endianness here: after reading it, check whether it is 0xCAFEBABE (FAT_MAGIC) or its byte-swapped form 0xBEBAFECA (FAT_CIGAM); any other value means the file is thin. The magic also tells you which byte order to use for all subsequent reads. In this example the first 4 bytes read as 0xBEBAFECA, so the file is fat.

2. The second field is the architecture count (nfat_arch), that is, how many CPU architectures the app or dSYM contains, such as armv7, arm64, and so on. In this example it is 2 (4 bytes: 0x 00 00 00 02), meaning that two CPU architectures are included.

`sizeof(struct fat_header) = 8 bytes`

3. The following fields hold, for each architecture, cputype (0x 0C 00 00 01), cpusubtype (0x 00 00 00 00), offset (0x 00 10 00 00), size (0x F0 27 00), and align. Read them in order according to the struct fat_arch definition in <mach-o/fat.h>. Note that if the file contains only one CPU architecture there is no fat header at all; skip this part and read the arch data directly.

`sizeof(struct fat_arch) = 20 bytes`

4. Using the offset read from the fat header, we can jump to the location of the corresponding arch data in the file. If there is only one architecture, no offset calculation is needed.
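The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the author's original parsing function: the name `parse_fat_header` and the returned dictionary keys are assumptions, and the sketch reads the header explicitly as big-endian (the on-disk order of a fat header), so it compares against 0xCAFEBABE only.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # value seen when the first 4 bytes are read big-endian


def parse_fat_header(data: bytes):
    """Return a list of fat_arch entries, or None if the file is thin.

    Hypothetical helper for illustration only.
    """
    if len(data) < 8:
        return None
    # struct fat_header: magic, nfat_arch (8 bytes)
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return None  # thin file: no fat header, arch data starts at offset 0
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">iiIII", data, 8 + i * 20
        )
        archs.append(
            {
                "cputype": cputype,
                "cpusubtype": cpusubtype,
                "offset": offset,  # where this architecture's Mach-O starts
                "size": size,
                "align": align,
            }
        )
    return archs
```

For a fat file, each returned `offset` is the position at which that architecture's Mach-O header begins; for a thin file the function returns `None` and parsing continues at offset 0.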

0x03 Mach Header binary data

The magic number distinguishes 32-bit files from 64-bit files; the 64-bit header has an extra 4-byte reserved field. Byte order needs attention here as well: the magic determines whether the remaining fields have to be byte-swapped.

`sizeof(struct mach_header_64) = 32 bytes`; `sizeof(struct mach_header) = 28 bytes`

From the definitions of mach_header and mach_header_64, the main purpose of the header is to let the system quickly determine the runtime environment (CPU type) and the file type of a Mach-O file.
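As a rough sketch of the header read described above, the following Python function inspects the magic, picks the byte order, and unpacks the seven fields that the 32-bit and 64-bit headers share. The function name `parse_mach_header` and the dictionary keys are assumptions made for illustration; the magic values are the ones defined in <mach-o/loader.h>.

```python
import struct

# Magic values as they appear when the first 4 bytes are read little-endian.
MH_MAGIC, MH_CIGAM = 0xFEEDFACE, 0xCEFAEDFE        # 32-bit
MH_MAGIC_64, MH_CIGAM_64 = 0xFEEDFACF, 0xCFFAEDFE  # 64-bit


def parse_mach_header(data: bytes, base: int = 0):
    """Parse a mach_header or mach_header_64 starting at `base`.

    Hypothetical helper for illustration only.
    """
    (magic,) = struct.unpack_from("<I", data, base)
    if magic in (MH_MAGIC, MH_MAGIC_64):
        endian = "<"  # file is little-endian
    elif magic in (MH_CIGAM, MH_CIGAM_64):
        endian = ">"  # magic came out swapped, so the file is big-endian
    else:
        raise ValueError("not a Mach-O header")
    is64 = magic in (MH_MAGIC_64, MH_CIGAM_64)
    # Fields shared by both headers: magic, cputype, cpusubtype,
    # filetype, ncmds, sizeofcmds, flags (7 * 4 = 28 bytes).
    _, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = (
        struct.unpack_from(endian + "IiiIIII", data, base)
    )
    return {
        "endian": endian,
        "is64": is64,
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": filetype,
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": flags,
        # The 64-bit header carries one extra 4-byte reserved field.
        "header_size": 32 if is64 else 28,
    }
```

The load commands start at `base + header_size`, which is why the sketch returns the header size alongside the parsed fields.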

FileType

Mach-O files are not only used for executables; they are also used for other kinds of content:

1. Kernel extensions

2. Library files

3. Core dumps

4. Other

Some of the most commonly used file types are:

1. MH_OBJECT: object file generated during compilation (gcc -c xxx.c generates xxx.o)

2. MH_EXECUTE: executable binary file (for example /bin/ls)

3. MH_CORE: core dump (the dump file written at crash time)

4. MH_DYLIB: dynamic library (the shared library files in /usr/lib/)

5. MH_DYLINKER: dynamic linker (/usr/lib/dyld)

6. MH_KEXT_BUNDLE: kernel extension file (for example a simple self-developed kernel module)

Flags

The Mach-O header also contains some important dyld loading parameters.

1. MH_NOUNDEFS: the target has no undefined symbols, so there are no unresolved link dependencies

2. MH_DYLDLINK: the target file is input for dyld and cannot be statically linked again

3. MH_PIE: allows a randomized address space (enables ASLR, address space layout randomization)

4. MH_ALLOW_STACK_EXECUTION: code in stack memory may be executed; usually off by default

5. MH_NO_HEAP_EXECUTION: code in heap memory cannot be executed
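To make the two lists above concrete, here is a small Python sketch that turns the filetype and flags fields of the header into names. The numeric values are taken from <mach-o/loader.h>; the table only covers the entries discussed in this article, and the helper names `describe_filetype` and `describe_flags` are assumptions made for illustration.

```python
# Values from <mach-o/loader.h>; only the entries listed above are included.
FILETYPES = {
    0x1: "MH_OBJECT",
    0x2: "MH_EXECUTE",
    0x4: "MH_CORE",
    0x6: "MH_DYLIB",
    0x7: "MH_DYLINKER",
    0xB: "MH_KEXT_BUNDLE",
}

FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x20000: "MH_ALLOW_STACK_EXECUTION",
    0x200000: "MH_PIE",
    0x1000000: "MH_NO_HEAP_EXECUTION",
}


def describe_filetype(filetype: int) -> str:
    """Return the symbolic name, or the hex value if it is not in the table."""
    return FILETYPES.get(filetype, hex(filetype))


def describe_flags(flags: int) -> list:
    """Return the names of the known bits set in the flags bitmask."""
    return [name for bit, name in FLAGS.items() if flags & bit]
```

Because flags is a bitmask, several names can apply at once; bits that are not in the table (there are many more in the real header file) are simply ignored by this sketch.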

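0x04 Load commands

The load commands come directly after the header; the number of commands and the total size they occupy are given in the Mach-O header (ncmds and sizeofcmds). After the header has been loaded, the rest of the file is loaded by parsing the load commands. Every load command begins with the same two 4-byte fields, cmd and cmdsize (struct load_command in <mach-o/loader.h>).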
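Walking the load commands only requires the two fields that every command begins with. The sketch below is an illustration under assumptions: the generator name `iter_load_commands` is hypothetical, and the caller is expected to pass the offset just past the Mach-O header together with ncmds and the byte order taken from that header.

```python
import struct


def iter_load_commands(data: bytes, offset: int, ncmds: int, endian: str = "<"):
    """Yield (cmd, cmdsize, offset) for each load command.

    Hypothetical helper for illustration only.
    """
    for _ in range(ncmds):
        # struct load_command: cmd, cmdsize (8 bytes)
        cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
        if cmdsize < 8:
            # cmdsize includes the 8-byte command header itself, so a
            # smaller value would make the walk loop forever or go backwards.
            raise ValueError("corrupt load command at offset %d" % offset)
        yield cmd, cmdsize, offset
        offset += cmdsize  # the next command starts right after this one
```

Each yielded offset points at the start of a command, so a caller can dispatch on `cmd` and re-read the full structure from that position.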

cmd field

Depending on the value of the cmd field, different functions are used for loading. The list below shows what the kernel code does for the different command types.

1. LC_SEGMENT, LC_SEGMENT_64: handled by the load_segment function in the kernel (loads and maps the segment's data into the memory space of the process)

2. LC_LOAD_DYLINKER: handled by the load_dylinker function in the kernel (invokes the /usr/lib/dyld program)

3. LC_UUID: handled by the load_uuid function in the kernel (loads the 128-bit unique ID)

4. LC_THREAD: handled by the load_thread function in the kernel (starts a Mach thread, but does not allocate stack space)

5. LC_UNIXTHREAD: handled by the load_unixthread function in the kernel (starts a UNIX POSIX thread)

6. LC_CODE_SIGNATURE: handled by the load_code_signature function in the kernel (digital signature)

7. LC_ENCRYPTION_INFO: handled by the set_code_unprotect function in the kernel (encrypted binaries)

UUID binary data (128 bits)

The UUID is 16 bytes (128 bits) of data that uniquely identifies the file. It must match the UUID of the app binary before symbolication can work correctly; this is the value that dwarfdump reports. It is read through the load command structure: the first field (0x0000001b) gives the type of the data that follows, and the second field (0x00000018) gives the data size, which includes the command header itself.
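A short Python sketch of the read described above: check that the command type is 0x1b and the size is 0x18 (8 bytes of command header plus 16 bytes of UUID), then format the 16 bytes. The function name `parse_uuid_command` is an assumption, and the uppercase hyphenated output is chosen here only so that it is easy to compare against a UUID printed by other tools.

```python
import struct
import uuid

LC_UUID = 0x1B


def parse_uuid_command(data: bytes, offset: int, endian: str = "<") -> str:
    """Return the UUID stored in an LC_UUID command as an uppercase string.

    Hypothetical helper for illustration only.
    """
    cmd, cmdsize = struct.unpack_from(endian + "II", data, offset)
    if cmd != LC_UUID or cmdsize != 24:
        raise ValueError("not an LC_UUID command")
    raw = data[offset + 8 : offset + 24]  # 16 bytes, no byte swapping needed
    return str(uuid.UUID(bytes=raw)).upper()
```

The UUID bytes are an opaque byte array rather than an integer, so they are read in file order regardless of the byte order used for cmd and cmdsize.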

Symtab binary data

1. In the symbol table data block, the first two fields are again the command header (cmd and cmdsize). The next four fields are the symbol table's offset in the file (0x001DF5E0), the number of symbols (0x001DF5E0), the string table's offset in the file (0x0020c3a0), and the string table size (0x000729A8).

2. The next step is to read the segment and section data blocks, which are also read through the load command structure. Segment data and section data are contiguous in the binary file: each segment structure is immediately followed by its section structures, and the number of sections is given by the nsects field of the segment structure.

3. I wrote a simple Mach-O parsing tool: [https://github.com/liutianshx2012/Tmacho](https://github.com/liutianshx2012/Tmacho)
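The six fields of the symbol table command described in item 1 can be unpacked in one call. The following is a minimal sketch, not code from the tool linked above; the function name `parse_symtab_command` is hypothetical, while the field names follow struct symtab_command in <mach-o/loader.h>.

```python
import struct

LC_SYMTAB = 0x2


def parse_symtab_command(data: bytes, offset: int, endian: str = "<"):
    """Unpack a symtab_command (24 bytes) found at `offset`.

    Hypothetical helper for illustration only.
    """
    cmd, cmdsize, symoff, nsyms, stroff, strsize = struct.unpack_from(
        endian + "6I", data, offset
    )
    if cmd != LC_SYMTAB:
        raise ValueError("not an LC_SYMTAB command")
    return {
        "symoff": symoff,    # file offset of the symbol table entries
        "nsyms": nsyms,      # number of symbol table entries
        "stroff": stroff,    # file offset of the string table
        "strsize": strsize,  # size of the string table in bytes
    }
```

The offsets are relative to the start of the Mach-O image, so inside a fat file they have to be added to the architecture's offset from the fat header.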

Segment Data

When loading data, the commands that matter most are LC_SEGMENT and LC_SEGMENT_64. The other load commands are not explored further here.

LC_SEGMENT and LC_SEGMENT_64 are described by struct segment_command and struct segment_command_64 in <mach-o/loader.h>.

Most of the fields in these structures exist to help the kernel map the segment into virtual memory.

The nsects field indicates how many sections the segment contains; the sections are where the actual useful data is stored.

The vmaddr of the __TEXT segment is the load address of the program. The __DWARF segment holds the DWARF data blocks, which shows that a dSYM stores its debug data in DWARF format.

`sizeof(struct segment_command) = 56 bytes`; `sizeof(struct segment_command_64) = 72 bytes`
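As an illustration of how a segment and the sections that follow it might be read, here is a Python sketch for the 64-bit case only. The function name `parse_segment_64` and the selection of returned fields are assumptions; the layouts follow struct segment_command_64 (72 bytes) and struct section_64 (80 bytes) in <mach-o/loader.h>.

```python
import struct

LC_SEGMENT_64 = 0x19
SEGMENT_64_FMT = "II16sQQQQiiII"  # struct segment_command_64, 72 bytes
SECTION_64_FMT = "16s16sQQ8I"     # struct section_64, 80 bytes


def _name(raw: bytes) -> str:
    """Names are fixed 16-byte fields padded with NUL bytes."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_segment_64(data: bytes, offset: int, endian: str = "<"):
    """Parse one LC_SEGMENT_64 command and the sections that follow it.

    Hypothetical helper for illustration only.
    """
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(
        endian + SEGMENT_64_FMT, data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    sections = []
    pos = offset + struct.calcsize(endian + SEGMENT_64_FMT)
    for _ in range(nsects):
        # The section structures sit directly after the segment command.
        fields = struct.unpack_from(endian + SECTION_64_FMT, data, pos)
        sections.append({
            "sectname": _name(fields[0]),
            "segname": _name(fields[1]),
            "addr": fields[2],
            "size": fields[3],
            "offset": fields[4],
        })
        pos += struct.calcsize(endian + SECTION_64_FMT)
    return {
        "segname": _name(segname),
        "vmaddr": vmaddr,
        "vmsize": vmsize,
        "fileoff": fileoff,
        "filesize": filesize,
        "nsects": nsects,
        "sections": sections,
    }
```

A 32-bit variant would use the 56-byte segment_command and the 68-byte section structure, with 4-byte address and size fields in place of the 8-byte ones.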

Section data

From the section data we can find debugging information such as __debug_info, __debug_pubnames, __debug_line, and so on, from which we can obtain each symbol's starting address, variable types, and other information about the program. For symbolication, parsing this data yields the information we need.

Symbol data

The position and number of symbols in the file are given by the symtab data. Each symbol entry contains the symbol's starting address, its offset into the string table, and other data; see <mach-o/nlist.h> and <mach-o/stab.h> for the definitions. Once this data has been read, the symbol strings themselves can be read; they are the next block of data.
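The symbol entries can be read as an array of fixed-size records. The sketch below assumes the layouts of struct nlist (12 bytes) and struct nlist_64 (16 bytes) from <mach-o/nlist.h>; the function name `read_symbols` is hypothetical, and `symoff` and `nsyms` are the values taken from the symtab command.

```python
import struct

NLIST_FIELDS = ("n_strx", "n_type", "n_sect", "n_desc", "n_value")


def read_symbols(data: bytes, symoff: int, nsyms: int,
                 is64: bool = True, endian: str = "<"):
    """Read `nsyms` nlist or nlist_64 entries starting at `symoff`.

    Hypothetical helper for illustration only.
    """
    # nlist_64: uint32 n_strx, uint8 n_type, uint8 n_sect,
    #           uint16 n_desc, uint64 n_value  (16 bytes)
    # nlist:    same, but int16 n_desc and uint32 n_value (12 bytes)
    fmt = endian + ("IBBHQ" if is64 else "IBBhI")
    size = struct.calcsize(fmt)
    symbols = []
    for i in range(nsyms):
        values = struct.unpack_from(fmt, data, symoff + i * size)
        symbols.append(dict(zip(NLIST_FIELDS, values)))
    return symbols
```

Here n_value is the symbol's address and n_strx is the offset of its name inside the string table.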

Symbol String Data

1. The data in the symtab and in the symbol entries gives the offset of each symbol string in the file; each symbol string is terminated by a 0 byte.

2. By combining the two parts of data above, we can obtain the load address of each symbol in the program. This data is very helpful for later symbolication work.

3. At this point, reading the header data of the dSYM file is complete. Because the header data all have corresponding structure definitions, reading them is relatively easy. When parsing, pay attention to byte order, the differences between the 32-bit and 64-bit structures, differences in field lengths, and differences between DWARF versions. Each data block depends on the ones before it, so being off by a single byte corrupts every subsequent read; as the saying goes, a miss by an inch is a miss by a mile.
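Combining the symbol entries with the string table, as described in items 1 and 2, can be sketched as follows. Both function names are assumptions made for illustration; `symbols` is assumed to be a sequence of (n_strx, n_value) pairs taken from the symbol entries, and `stroff` and `strsize` come from the symtab command.

```python
def symbol_name(data: bytes, stroff: int, strsize: int, n_strx: int) -> str:
    """Return the NUL-terminated string at index n_strx of the string table.

    Hypothetical helper for illustration only.
    """
    if n_strx >= strsize:
        raise ValueError("string index outside the string table")
    start = stroff + n_strx
    # bytes.index raises ValueError if no terminator is found in range.
    end = data.index(b"\0", start, stroff + strsize)
    return data[start:end].decode("utf-8", "replace")


def build_address_map(data: bytes, stroff: int, strsize: int, symbols):
    """Map each symbol address (n_value) to its name."""
    return {
        n_value: symbol_name(data, stroff, strsize, n_strx)
        for n_strx, n_value in symbols
    }
```

If several symbols share an address, this simple dictionary keeps only the last one; a real symbolication tool would need a policy for such cases.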

Original link: http://blog.tingyun.com/web/article/detail/1341
