Anatomy of the iOS dSYM file (Part 1)

Source: Internet
Author: User

During iOS app development we use Xcode to archive the app, which generates an .xcarchive package. The Xcode Organizer can then manage the archives and export release builds. iOS developers are presumably familiar with this process, so it is not repeated here. What this article focuses on is the dSYM file produced by archiving.

To get the dSYM file, first open the Archives window in the Organizer.

Each archive generates one record, and from that record we can locate the directory where the archive is stored.

Opening the .xcarchive package reveals its directory structure. The dSYM package inside the dSYMs directory is the file we will dissect. The dSYM is itself a package; opening it, we find a binary file, in this example one named demo.

From the directory name (DWARF), you can see that iOS uses the DWARF debugging format. DWARF (commonly expanded as "Debugging With Attributed Record Formats") is a standard for debug information, and its structure is quite complex. For DWARF's history and how it developed, refer to the DWARF website or search online.
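The package layout described above can be walked programmatically. The following is a minimal Python sketch, assuming the binary sits under Contents/Resources/DWARF inside the dSYM package; the function name find_dwarf_binaries is made up for illustration.

```python
from pathlib import Path


def find_dwarf_binaries(dsym_path):
    """Return the files under <dSYM>/Contents/Resources/DWARF, sorted by name.

    Assumes the usual dSYM package layout; returns an empty list otherwise.
    """
    dwarf_dir = Path(dsym_path) / "Contents" / "Resources" / "DWARF"
    if not dwarf_dir.is_dir():
        return []
    return sorted(p for p in dwarf_dir.iterdir() if p.is_file())
```

For the example in this article, calling it on the dSYM package would be expected to return a single path ending in demo.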

An important role of the dSYM file shows up when the program crashes. From a crash log or by other means we get the call stack, but the raw addresses in the log do not tell us which file or line is at fault. This is where the binary file is useful: with it, a tool such as atos (shipped with Xcode) can symbolicate the addresses, so we can navigate directly to the specific location in a source file.

Many tools can parse DWARF files; macOS, for example, ships dwarfdump and otool. Existing tools do not meet every need, however, so understanding the internal structure now gives us something to draw on when future development requires it.

Now let's open this binary file. Note: in what follows, a "field" (or "paragraph") is a 4-byte unit.

After opening the file, first look at the file header; for the structure definitions see <mach-o/fat.h>.

Fat binary Data

The first 4-byte field is the magic. Pay attention to byte order here: the fat header is stored big-endian, so after reading the magic, check whether it reads as 0xCAFEBABE or as 0xBEBAFECA (byte-swapped), and read the following fields in the corresponding byte order.

The second field is the arch count, that is, how many CPU architectures (armv7, arm64, and so on) the app or dSYM contains. A value of 2, for example, means two CPU architectures are included.

The following fields hold, for each architecture, data such as cputype (0x0000000c), cpusubtype (0x00000009), offset (0x00000040), and size (0x000f6825), read in order according to the structure definitions in <mach-o/fat.h>. Note that if the file contains only one CPU architecture, there is no fat header: skip this step and read the arch data directly.
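The fat header layout described above can be sketched in Python as follows. This is a minimal illustration, not a complete parser: it assumes the classic 32-bit fat header (fat_header followed by fat_arch entries of five 4-byte fields, the fifth being align), and the function name parse_fat_header is made up for this example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic as it reads in the header's own (big-endian) order


def parse_fat_header(data):
    """Return a list of per-architecture dicts, or None if there is no fat header."""
    if len(data) < 8:
        return None
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != FAT_MAGIC:
        return None  # thin file: the Mach header starts at offset 0
    (nfat_arch,) = struct.unpack_from(">I", data, 4)
    archs = []
    for i in range(nfat_arch):
        # Each fat_arch entry is 20 bytes and follows the 8-byte fat_header.
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + i * 20
        )
        archs.append(
            {
                "cputype": cputype,
                "cpusubtype": cpusubtype,
                "offset": offset,
                "size": size,
                "align": align,
            }
        )
    return archs
```

Returning None for a thin file mirrors the note above: with a single architecture there is nothing to skip, and parsing continues at offset 0.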

Using the offsets read from the fat header, we can jump to the location of each architecture's data in the file (no offset calculation is needed when there is only one architecture). In the example, the 32-bit arch is at offset 0x00000040 and the 64-bit arch is at offset 0x000f6880. For the following data structures see <mach-o/loader.h>.

Mach binary data

(32-bit)

 

(64-bit)

The magic distinguishes 32-bit from 64-bit; the 64-bit header has an extra 4-byte reserved field at the end. Byte order matters here as well: the magic also tells us whether the byte order of the following fields needs to be swapped.

The rest of this walkthrough uses the 32-bit architecture as the example.
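A minimal Python sketch of this step is shown below. It assumes the standard magic values 0xFEEDFACE (32-bit) and 0xFEEDFACF (64-bit) and tries both byte orders until one of them matches; the function name parse_mach_header and the returned dictionary keys are made up for illustration.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach header
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach header


def parse_mach_header(data, offset=0):
    """Decode the Mach header at `offset` and report its width and byte order."""
    # Try little-endian first, then big-endian; the magic tells us which is right.
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", data, offset)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            break
    else:
        raise ValueError("not a Mach-O header")
    is_64 = magic == MH_MAGIC_64
    cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = struct.unpack_from(
        endian + "6I", data, offset + 4
    )
    return {
        "is_64": is_64,
        "endian": endian,
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": filetype,
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": flags,
        # The 64-bit header carries one extra 4-byte reserved field.
        "header_size": 32 if is_64 else 28,
    }
```

The load commands start at offset + header_size, and ncmds says how many of them follow.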

UUID binary Data

The UUID is 16 bytes (128 bits) of data that uniquely identifies the file. It must match the UUID in the app binary for symbolication to be correct. The UUID shown by dwarfdump is this data. This part is read through the load command structure: the first 4-byte field (0x0000001b) is the command type, which tells us what data follows, and the second field (0x00000018) is the size of the data, including the command header itself.
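The command structure described above (a type field followed by a size field) makes it possible to step through the commands until the UUID command is found. The following Python sketch assumes the caller already knows where the commands start, how many there are, and the byte order; the function name find_uuid is made up for illustration.

```python
import struct
import uuid

LC_UUID = 0x1B  # command type quoted in the text


def find_uuid(data, cmds_offset, ncmds, endian="<"):
    """Walk `ncmds` load commands and return the UUID, or None if absent."""
    offset = cmds_offset
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from(endian + "2I", data, offset)
        if cmd == LC_UUID:
            # The 16 UUID bytes follow the 8-byte command header directly.
            return uuid.UUID(bytes=bytes(data[offset + 8:offset + 24]))
        offset += cmdsize  # cmdsize includes the 8-byte header
    return None
```

Comparing the result for the dSYM with the result for the app binary is one way to check that the two belong together.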

Symtab binary Data

This is the symbol table data block. The first two fields are again the command header. The next four fields are the file offset of the symbols (0x00001000), the number of symbols (0x00000015), the file offset of the string table (0x000010FC), and the size of the string table (0x00000297).
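The six fields can be decoded in one step. The sketch below is a minimal Python illustration; the function name parse_symtab_command is made up, and the example bytes are packed little-endian as an assumption. It also checks a relationship that holds for the values quoted above: 21 symbol entries of 12 bytes each (the 32-bit entry size) end exactly where the string table begins.

```python
import struct

LC_SYMTAB = 0x2
NLIST_32_SIZE = 12  # size of one 32-bit symbol table entry


def parse_symtab_command(data, offset, endian="<"):
    """Decode the symbol table command located at `offset`."""
    cmd, cmdsize, symoff, nsyms, stroff, strsize = struct.unpack_from(
        endian + "6I", data, offset
    )
    if cmd != LC_SYMTAB:
        raise ValueError("not an LC_SYMTAB command")
    return {"symoff": symoff, "nsyms": nsyms, "stroff": stroff, "strsize": strsize}


# The values quoted in the text, packed little-endian (an assumption).
example = struct.pack("<6I", LC_SYMTAB, 24, 0x1000, 0x15, 0x10FC, 0x297)
symtab = parse_symtab_command(example, 0)
# 0x1000 + 21 * 12 == 0x10FC: the symbols run right up to the string table.
assert symtab["symoff"] + symtab["nsyms"] * NLIST_32_SIZE == symtab["stroff"]
```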

Next come the segment and section data blocks, which are also read through the command structure. The segment data and the section data are presented separately here, but in the binary file they are contiguous: each segment is immediately followed by its section data, and the number of sections is given by the nsects field of the segment structure.
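The contiguous layout can be sketched in Python as follows. This is a minimal illustration for the 32-bit structures only (a 56-byte segment command followed by nsects section records of 68 bytes each); the function name parse_segment_32 and the returned keys are made up for this example, and the 64-bit structures use wider fields that are not handled here.

```python
import struct

LC_SEGMENT = 0x1
SEGMENT_FMT = "2I16s4I2i2I"  # 32-bit segment command, 56 bytes
SECTION_FMT = "16s16s9I"     # 32-bit section record, 68 bytes


def _name(raw):
    """Names are fixed 16-byte fields padded with NUL bytes."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_segment_32(data, offset, endian="<"):
    """Decode one 32-bit segment command and the sections that follow it."""
    seg = struct.unpack_from(endian + SEGMENT_FMT, data, offset)
    cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize = seg[:7]
    nsects = seg[9]
    if cmd != LC_SEGMENT:
        raise ValueError("not a 32-bit LC_SEGMENT command")
    sections = []
    pos = offset + struct.calcsize(endian + SEGMENT_FMT)
    for _ in range(nsects):
        sec = struct.unpack_from(endian + SECTION_FMT, data, pos)
        sections.append(
            {"sectname": _name(sec[0]), "addr": sec[2], "size": sec[3], "offset": sec[4]}
        )
        pos += struct.calcsize(endian + SECTION_FMT)
    return {
        "segname": _name(segname),
        "vmaddr": vmaddr,
        "fileoff": fileoff,
        "cmdsize": cmdsize,
        "sections": sections,
    }
```

Advancing by cmdsize from the start of the command lands on the next load command, regardless of how many sections the segment carries.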

Segment Data

From the segment data we can see that the vmaddr of __TEXT is 0x00004000, which is the load address of the program. This applies to the 32-bit program; the 64-bit value is different. The __DWARF segment holds the DWARF data blocks, which shows that the dSYM stores its data in DWARF format.
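The vmaddr matters for symbolication because the addresses recorded in the dSYM are relative to it, while the addresses in a crash log reflect where the image was actually loaded. The following Python sketch shows the arithmetic, assuming the crash log also provides the image's runtime load address; the function name to_dsym_address and the sample addresses are made up for illustration.

```python
def to_dsym_address(runtime_address, runtime_load_address, text_vmaddr):
    """Map an address from a crash log back into the dSYM's address space."""
    # The slide is how far the image was moved from its recorded vmaddr.
    slide = runtime_load_address - text_vmaddr
    return runtime_address - slide


# Hypothetical values: image loaded at 0xA1000, recorded vmaddr 0x4000.
assert to_dsym_address(0xA5123, 0xA1000, 0x4000) == 0x8123
```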

Section data

The above is part of the section data. It contains __debug_info, __debug_pubnames, __debug_line, and other debug sections, from which we can find information such as the start address of a symbol in the program and variable types. To symbolicate, we obtain the information we need by parsing this data.

For definitions of the segment and section types, please refer to the DWARF website.

How to parse this data to find the position of a symbol in the source file will be shared in the next part.

At this point we have read most of the data in the header of the symbol file. One more part of the file is also very important: the symbol block data, which holds the information for all the methods in our program.

Symbol binary Data

The position and number of symbols in the file are obtained from the Symtab data. Each entry in the symbol block contains the start address, the offset of the name string, and so on; for this data structure see <mach-o/nlist.h> and <mach-o/stab.h>. Once all of this data has been read, the symbol strings, which are the next block of data, can be read.

Symbol string Binary data

The data in Symtab and the symbol block give the offset and size of each symbol string in the file; each symbol string ends with a 0 byte.
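Reading the symbol entries and their strings together can be sketched in Python as follows. This is a minimal illustration for the 32-bit entry layout (a 4-byte string offset, two 1-byte fields, a 2-byte field, and a 4-byte address, 12 bytes in total); the function name read_symbols_32 is made up, and `data` is assumed to be the bytes of one architecture, so that the offsets from Symtab apply directly.

```python
import struct

NLIST_32 = "IBBhI"  # n_strx, n_type, n_sect, n_desc, n_value (12 bytes)


def read_symbols_32(data, symoff, nsyms, stroff, strsize, endian="<"):
    """Return a list of (address, name) pairs from the symbol and string tables."""
    strtab = data[stroff:stroff + strsize]
    entry_size = struct.calcsize(endian + NLIST_32)
    symbols = []
    for i in range(nsyms):
        n_strx, n_type, n_sect, n_desc, n_value = struct.unpack_from(
            endian + NLIST_32, data, symoff + i * entry_size
        )
        # The name starts at n_strx in the string table and runs to the next 0 byte.
        end = strtab.find(b"\0", n_strx)
        if end == -1:
            end = len(strtab)
        name = strtab[n_strx:end].decode("utf-8", "replace")
        symbols.append((n_value, name))
    return symbols
```

For 64-bit data the address field is 8 bytes wide, so the format string and entry size would need to change accordingly.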

Combining the two parts of data above gives the load address of each symbol in the program, which is very helpful for later symbolication work. 64-bit data is parsed in the same way, but some of the 64-bit structures differ slightly, which must be taken into account when parsing.

With this, reading the header data of the dSYM file is complete. The header data all has corresponding structure definitions, so reading it is relatively easy. When parsing, pay attention to byte order, the differences between the 32-bit and 64-bit structures, differences in field lengths, and differences between DWARF versions. The data blocks are closely linked: a read that is off by a single byte makes all subsequent data wrong. As the saying goes, a miss by an inch ends up a thousand miles off.
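As one illustration of how the address and name pairs might be used later, the Python sketch below finds the symbol whose start address is the closest one at or below a given address. This is a simplification and not necessarily how atos works: it ignores symbol sizes and types, and the function name symbol_for_address is made up for this example.

```python
import bisect


def symbol_for_address(symbols, address):
    """Given (start_address, name) pairs, return (name, offset) or None."""
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    # Index of the last symbol that starts at or before `address`.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = ordered[index]
    return name, address - start
```

With the hypothetical pairs (0x4000, "_main") and (0x4010, "_foo"), the address 0x4008 resolves to "_main" at offset 8.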

