Capture and parse the thread Call Stack in iOS (2)

Source: Internet
Author: User


1. Some references

I read many links and books during this process, including but not limited to:

OS X ABI Mach-O File Format Reference

Mach-O Programming Topics

"Programmer's Self-Cultivation": I read this book a few years ago and took it off the shelf again for further study, mainly for comparison and confirmation;

The Mac Hacker's Handbook

Mac OS X and iOS Internals

And a lot of Google searches.

2. Related APIs and data structures

Backtracing a thread's call stack, as described in the previous part, gives us a list of addresses. The input and output of symbolication should therefore be an address and a symbol, respectively. The interface could look something like this:

- (NSString *)symbolicateAddress:(uintptr_t)addr;

However, in practice, we need to rely on dyld-related methods and data structures:

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

extern int dladdr(const void *, Dl_info *);

DESCRIPTION
     These routines provide additional introspection of dyld beyond that provided by dlopen() and dladdr()

     _dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count
     to iterate all images is not thread safe, because another thread may be adding or removing images
     during the iteration.

     _dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index.  If
     image_index is out of range, NULL is returned.

     _dyld_get_image_vmaddr_slide() returns the virtural memory address slide amount of the image indexed by
     image_index. If image_index is out of range zero is returned.

     _dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to
     be owned by dyld and should not deleted.  If image_index is out of range NULL is returned.

So that the caller can tell whether the resolution succeeded, the interface design evolved into:

bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info);

The return value reports success or failure, and the Dl_info structure is filled in with the resolution result.

3. Algorithm ideas

Resolving an address to a symbol is conceptually straightforward: find the memory image that the address belongs to, locate the symbol table in that image, and finally match the target address against the symbols in that table.

// ASLR-based offset: https://en.wikipedia.org/wiki/Address_space_layout_randomization
/**
 * When the dynamic linker loads an image,
 * the image must be mapped into the virtual address space of the process at an unoccupied address.
 * The dynamic linker accomplishes this by adding a value "the virtual memory slide amount" to the base address of the image.
 */
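The slide arithmetic can be sketched as follows. This is only a minimal illustration: the helper names `demo_unslide` and `demo_apply_slide` are hypothetical and not from the original code, and on a device the slide value would come from `_dyld_get_image_vmaddr_slide()`.

```c
#include <stdint.h>

/* Hypothetical helper: convert a runtime address (for example, one captured
 * from a call stack) back to the link-time address recorded in the Mach-O
 * file, by removing the ASLR slide of the image it belongs to. */
uintptr_t demo_unslide(uintptr_t runtime_addr, intptr_t slide) {
    return runtime_addr - (uintptr_t)slide;
}

/* Hypothetical helper: the opposite direction, from an address recorded in
 * the file to the address where it actually lives in this process. */
uintptr_t demo_apply_slide(uintptr_t file_addr, intptr_t slide) {
    return file_addr + (uintptr_t)slide;
}
```

Addresses stored in segment commands and symbol table entries are unslid, so a stack address has to go through something like `demo_unslide` before it is compared against them.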

3.1 Find the target image containing the address

At first I was pleasantly surprised to find an API that does exactly this, but it is not available on the iPhone:

extern bool _dyld_image_containing_address(const void* address)
__OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_3,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);

So we have to do the check ourselves.

How?

A segment defines a range of bytes in a Mach-O file and the addresses and memory protection attributes at which those bytes are mapped into virtual memory when the dynamic linker loads the application. As such, segments are always virtual memory page aligned. A segment contains zero or more sections.

Traverse each segment to determine whether the target address falls within the range of the segment:

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, may be equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;            /* LC_SEGMENT */
    uint32_t    cmdsize;        /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;         /* memory address of this segment */
    uint32_t    vmsize;         /* memory size of this segment */
    uint32_t    fileoff;        /* file offset of this segment */
    uint32_t    filesize;       /* amount to map from the file */
    vm_prot_t   maxprot;        /* maximum VM protection */
    vm_prot_t   initprot;       /* initial VM protection */
    uint32_t    nsects;         /* number of sections in segment */
    uint32_t    flags;          /* flags */
};

/**
 * @brief Determines whether a segment_command contains the address addr,
 *        based on the segment's virtual address and size.
 *        addr must already have the image's slide subtracted.
 *        On 64-bit, the equivalents are LC_SEGMENT_64 and struct segment_command_64.
 */
bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
    if (cmdPtr->cmd == LC_SEGMENT) {
        struct segment_command *segPtr = (struct segment_command *)cmdPtr;
        if (addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize)) {
            return true;
        }
    }
    return false;
}

In this way, we can find the image file containing the target address.
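As a rough sketch of how the per-segment check might be applied across all load commands of a 64-bit image: the `demo_` structures below are local mirrors that I am assuming follow the layout of `load_command` and `segment_command_64` in `<mach-o/loader.h>`, so that the example compiles without the Mach-O headers. The function name is hypothetical. In real code you would include the system header and start from the first load command after the mach header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LC_SEGMENT_64 0x19u /* value of LC_SEGMENT_64 in <mach-o/loader.h> */

/* Local mirror of struct load_command (assumed layout). */
struct demo_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Local mirror of struct segment_command_64 (assumed layout). */
struct demo_segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    int32_t  maxprot;
    int32_t  initprot;
    uint32_t nsects;
    uint32_t flags;
};

/* Walks `ncmds` load commands starting at `first` and returns true if any
 * 64-bit segment covers `unslid_addr` (the address with the slide removed). */
bool demo_image_contains_address(const void *first, uint32_t ncmds,
                                 uint64_t unslid_addr) {
    const uint8_t *cursor = (const uint8_t *)first;
    for (uint32_t i = 0; i < ncmds; i++) {
        const struct demo_load_command *lc =
            (const struct demo_load_command *)cursor;
        if (lc->cmd == DEMO_LC_SEGMENT_64) {
            const struct demo_segment_command_64 *seg =
                (const struct demo_segment_command_64 *)cursor;
            /* Written as a subtraction so vmaddr + vmsize cannot overflow. */
            if (unslid_addr >= seg->vmaddr &&
                unslid_addr - seg->vmaddr < seg->vmsize) {
                return true;
            }
        }
        if (lc->cmdsize == 0) {
            break; /* malformed command; stop rather than loop forever */
        }
        cursor += lc->cmdsize; /* each command records its own size */
    }
    return false;
}
```

Running this over every image index from `_dyld_image_count()` gives the image that owns the address. Keep in mind the man page's warning that iterating by count is not thread safe.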

3.2 Locate the symbol table of the target image

Collecting symbols and building the symbol table spans the compile and link stages, so I will not go into it here. What matters is that, apart from the code segment __TEXT and the data segment __DATA, there is also a __LINKEDIT segment that contains the symbol table:

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

So now we first need to locate the __LINKEDIT segment. Again from the official Apple documentation:

Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.

We traverse each segment and check whether its segment name equals __LINKEDIT:

/* usr/include/mach-o/loader.h */
#define SEG_LINKEDIT    "__LINKEDIT"
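The name comparison might be sketched like this. The function and macro names are hypothetical. Because segname is a fixed 16-byte field that is not guaranteed to be NUL-terminated, the comparison is bounded to 16 bytes.

```c
#include <stdbool.h>
#include <string.h>

/* Same string as SEG_LINKEDIT in <mach-o/loader.h>; redefined here only so
 * the sketch compiles without the Mach-O headers. */
#define DEMO_SEG_LINKEDIT "__LINKEDIT"

/* Returns true if the 16-byte segname field names the __LINKEDIT segment. */
bool demo_is_linkedit(const char segname[16]) {
    return strncmp(segname, DEMO_SEG_LINKEDIT, 16) == 0;
}
```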

Next we will look at the symbol table:

/**
 * From The Mac Hacker's Handbook:
 * The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment. The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables. Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.
 */

In other words, we need to combine the __LINKEDIT segment_command (see the structure description above) with the LC_SYMTAB load command (see the structure description below) to locate the symbol table:

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
    uint32_t    cmd;        /* LC_SYMTAB */
    uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
    uint32_t    symoff;     /* symbol table offset */
    uint32_t    nsyms;      /* number of symbol table entries */
    uint32_t    stroff;     /* string table offset */
    uint32_t    strsize;    /* string table size in bytes */
};

As described above, symoff and stroff in LC_SYMTAB and the fileoff of __LINKEDIT are all file offsets. To obtain the in-memory addresses of the symbol table and the string table, first subtract the fileoff of __LINKEDIT from the symoff and stroff of LC_SYMTAB to get each table's offset within the segment, then add the vmaddr of __LINKEDIT to get the virtual address. Finally, to get the actual runtime address, add the ASLR slide.
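The address calculation can be written out as a small helper. This is a sketch under my own assumptions: the `demo_linkedit_info` structure and the function name are hypothetical, and simply bundle the four fields taken from the __LINKEDIT segment command and the LC_SYMTAB command.

```c
#include <stdint.h>

/* Hypothetical bundle of the fields needed from the two load commands. */
struct demo_linkedit_info {
    uint64_t vmaddr;  /* vmaddr of the __LINKEDIT segment command  */
    uint64_t fileoff; /* fileoff of the __LINKEDIT segment command */
    uint32_t symoff;  /* symoff of the LC_SYMTAB command */
    uint32_t stroff;  /* stroff of the LC_SYMTAB command */
};

/* Converts a table's file offset (symoff or stroff) to its runtime address:
 *   (table file offset - __LINKEDIT fileoff)  offset inside the segment
 *   + __LINKEDIT vmaddr                       unslid virtual address
 *   + slide                                   actual address in the process */
uint64_t demo_table_address(const struct demo_linkedit_info *info,
                            uint32_t table_fileoff, int64_t slide) {
    return (uint64_t)table_fileoff - info->fileoff + info->vmaddr
           + (uint64_t)slide;
}
```

For example, with a __LINKEDIT segment at vmaddr 0x100008000 and fileoff 0x8000, a symoff of 0x8100 and a slide of 0x4000, the symbol table sits at 0x10000C100 in memory.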

3.3 Find the symbol that best matches the target address in the symbol table

We have finally found the symbol table. I am a little tired of writing by this point, so I will just paste the code:

/**
 * @brief Finds the best-matching symbol for an address in the given symbol table.
 *        The address passed in must already have vmaddr_slide subtracted.
 */
const JDY_SymbolTableEntry *Minus(uintptr_t addr, JDY_SymbolTableEntry *symbolTable, uint32_t nsyms) {
    // 1. addr >= symbol.value: addr is an instruction address inside a function, so it must be
    //    greater than or equal to the function's entry address, that is, the value of the symbol.
    // 2. symbol.value is nearest to addr: the closer the function entry address is to the
    //    instruction address addr, the more accurate the match.
    const JDY_SymbolTableEntry *nearestSymbol = NULL;
    uintptr_t currentDistance = UINTPTR_MAX;   // UINT32_MAX would be too small on 64-bit

    for (uint32_t symIndex = 0; symIndex < nsyms; symIndex++) {
        uintptr_t symbolValue = symbolTable[symIndex].n_value;
        if (symbolValue > 0) {
            uintptr_t symbolDistance = addr - symbolValue;
            if (symbolValue <= addr && symbolDistance <= currentDistance) {
                currentDistance = symbolDistance;
                nearestSymbol = symbolTable + symIndex;
            }
        }
    }
    return nearestSymbol;
}

/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx;   /* index into the string table */
    } n_un;
    uint8_t  n_type;        /* type flag, see below */
    uint8_t  n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;        /* see <mach-o/stab.h> */
    uint64_t n_value;       /* value of this symbol (or stab offset) */
};

After finding the matching nlist structure, we can use n_un.n_strx as an offset into the string table to locate the corresponding symbol name.
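The final lookup might look like the sketch below. The function name is hypothetical. The bounds checks are my own addition and use the strsize field of symtab_command. An n_strx of zero is treated as an empty name, which is how `<mach-o/nlist.h>` defines it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the NUL-terminated symbol name stored at offset n_strx in the
 * string table, "" for n_strx == 0, or NULL if the offset is out of range
 * or the name is not terminated inside the table. */
const char *demo_symbol_name(const char *strtab, uint32_t strsize,
                             uint32_t n_strx) {
    if (n_strx == 0) {
        return ""; /* index zero is defined to mean a null name */
    }
    if (strtab == NULL || n_strx >= strsize) {
        return NULL;
    }
    /* Make sure a terminator exists before the end of the table. */
    if (memchr(strtab + n_strx, '\0', strsize - n_strx) == NULL) {
        return NULL;
    }
    return strtab + n_strx;
}
```

The returned pointer refers to memory inside the image's string table, so there is nothing for the caller to free.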

 
