Capturing and Symbolicating Thread Call Stacks on iOS (Part 2)


 

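1. Selected references

While working on this I consulted many links and books, including but not limited to:

OS X ABI Mach-O File Format Reference

Mach-O Programming Topics

《程式員的自我修養》 (a book on linking, loading and libraries): I read it a few years ago and took it down from the shelf again, mainly to cross-check and confirm details;

The Mac Hacker’s Handbook

Mac OS X and iOS Internals

And a lot of Google searches.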

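2. Relevant APIs and data structures

Walking the thread's call stack in the previous part gave us a list of addresses, so symbolication takes an address as input and produces a symbol as output. The interface would look something like this: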

- (NSString *)symbolicateAddress:(uintptr_t)addr;

In practice, however, we have to rely on dyld-related functions and data structures:

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

extern int dladdr(const void *, Dl_info *);

DESCRIPTION
     These routines provide additional introspection of dyld beyond that provided by dlopen() and dladdr()

     _dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count
     to iterate all images is not thread safe, because another thread may be adding or removing images
     during the iteration.

     _dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index.  If
     image_index is out of range, NULL is returned.

     _dyld_get_image_vmaddr_slide() returns the virtual memory address slide amount of the image indexed by
     image_index. If image_index is out of range zero is returned.

     _dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to
     be owned by dyld and should not be deleted.  If image_index is out of range NULL is returned.

Because we also need to know whether a given symbolication succeeded, the interface evolved into:

bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info)

Dl_info is filled in with the result of the symbolication.

3. Algorithm outline

Symbolicating an address is fairly straightforward in principle: find the memory image the address belongs to, locate that image's symbol table, and then match the target address against the symbols in that table.

The outline below describes only the general direction and does not cover specific details, such as the ASLR-based slide:

// ASLR-based slide: https://en.wikipedia.org/wiki/Address_space_layout_randomization
/**
 * When the dynamic linker loads an image,
 * the image must be mapped into the virtual address space of the process at an unoccupied address.
 * The dynamic linker accomplishes this by adding a value (the virtual memory slide amount) to the base address of the image.
 */
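The effect of the slide can be sketched in a few lines. This is only an illustration and not code from the original article: `demo_slide` and `demo_unslide` are hypothetical helper names, and on a device the slide value would be the one returned by `_dyld_get_image_vmaddr_slide()`.

```c
#include <stdint.h>

/* Hypothetical helper: map a link-time address (as recorded in the Mach-O
 * file) to the address it has at run time after dyld applied the slide. */
static uintptr_t demo_slide(uintptr_t linkTimeAddr, intptr_t slide) {
    return linkTimeAddr + (uintptr_t)slide;
}

/* Hypothetical helper: the inverse mapping. An address taken from a call
 * stack must be "unslid" like this before it is compared against the
 * vmaddr and n_value fields stored in the image. */
static uintptr_t demo_unslide(uintptr_t runtimeAddr, intptr_t slide) {
    return runtimeAddr - (uintptr_t)slide;
}
```

For example, with a slide of 0x10000, a function linked at 0x4f00 would show up in a backtrace at 0x14f00, and unsliding recovers 0x4f00.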

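3.1 Finding the image that contains the address

At first I was pleasantly surprised to come across an API for this, but unfortunately it is not available on iPhone: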

extern bool _dyld_image_containing_address(const void* address)
__OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_3,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);

So we have to do the check ourselves.

How?

A segment defines a range of bytes in a Mach-O file and the addresses and memory protection attributes at which those bytes are mapped into virtual memory when the dynamic linker loads the application. As such, segments are always virtual memory page aligned. A segment contains zero or more sections.

By iterating over every segment and checking whether the target address falls within the range that segment covers:

/* Excerpt from <mach-o/loader.h>: */
/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};

/* Needs <mach-o/loader.h>, <stdbool.h> and <stdint.h>. */
/**
 * @brief Check whether a segment_command contains the address addr, based on
 *        the segment's virtual address and size. vmaddr is a link-time
 *        address, so addr must already have the image's slide subtracted.
 */
bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
    if (cmdPtr->cmd == LC_SEGMENT) {
        struct segment_command *segPtr = (struct segment_command *)cmdPtr;
        if (addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize)) {
            return true;
        }
    }
    // The listing is cut off at this point in the source. A 64-bit image would
    // presumably need the same check for LC_SEGMENT_64 / struct segment_command_64.
    return false;
}
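The per-segment check above has to be applied to every load command of an image. The following is a minimal sketch of that walk, under some assumptions: so that it compiles off-device it uses simplified stand-in types (`demo_load_command`, `demo_segment`) instead of `<mach-o/loader.h>`, and `demo_imageContainsAddress` is a hypothetical name. In a real image the first load command sits directly after the mach header, and the command count comes from the header's `ncmds` field.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LC_SEGMENT 0x1u    /* same value as LC_SEGMENT */

typedef struct {                /* stand-in for struct load_command */
    uint32_t cmd;
    uint32_t cmdsize;
} demo_load_command;

typedef struct {                /* stand-in for the leading fields of segment_command */
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint32_t vmaddr;
    uint32_t vmsize;
} demo_segment;

/* Walk ncmds load commands stored back to back starting at firstCmd and
 * report whether any segment covers addr. addr is expected to have the
 * image's slide already subtracted. */
static bool demo_imageContainsAddress(const void *firstCmd, uint32_t ncmds,
                                      uintptr_t addr) {
    const uint8_t *cursor = (const uint8_t *)firstCmd;
    for (uint32_t i = 0; i < ncmds; i++) {
        const demo_load_command *lc = (const demo_load_command *)cursor;
        if (lc->cmd == DEMO_LC_SEGMENT) {
            const demo_segment *seg = (const demo_segment *)cursor;
            /* Written as a subtraction so vmaddr + vmsize cannot overflow. */
            if (addr >= seg->vmaddr && addr - seg->vmaddr < seg->vmsize) {
                return true;
            }
        }
        cursor += lc->cmdsize;  /* cmdsize is the stride to the next command */
    }
    return false;
}
```

Running this for each index below `_dyld_image_count()` would yield the index of the image that owns the address, subject to the thread-safety caveat quoted from the dyld documentation.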

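With this, we can find the image that contains the target address.

3.2 Locating the symbol table of the target image

Symbols are collected and the symbol table is built throughout the compile and link stages, so I will not go into that here. All we need to establish is that, besides the code segment __TEXT and the data segment __DATA, there is also a __LINKEDIT segment that contains the symbol table: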

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

So we first need to locate the __LINKEDIT segment. Again quoting Apple's official documentation:

Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.

We iterate over every segment and compare its name with __LINKEDIT:

/usr/include/mach-o/loader.h
#define SEG_LINKEDIT    "__LINKEDIT"
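The name comparison might look roughly like the sketch below. The helper name `demo_isLinkeditSegment` and the macro `DEMO_SEG_LINKEDIT` are hypothetical; the macro simply mirrors `SEG_LINKEDIT` so that the snippet compiles without the Mach-O headers.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_SEG_LINKEDIT "__LINKEDIT"  /* mirrors SEG_LINKEDIT */

/* segname is a fixed 16-byte field. A name that uses all 16 bytes has no
 * terminating NUL, so the comparison is bounded with strncmp instead of
 * relying on strcmp. */
static bool demo_isLinkeditSegment(const char segname[16]) {
    return strncmp(segname, DEMO_SEG_LINKEDIT, 16) == 0;
}
```

Combined with a walk over the load commands, this picks out the one segment_command whose vmaddr and fileoff are needed in the next step.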

Next, we look for the symbol table:

/**
 * Excerpted from "The Mac Hacker's Handbook":
 * The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment. The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables. Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.
 */

In other words, we need to combine the __LINKEDIT segment_command (see the structure above) with the LC_SYMTAB load_command (see the structure below) to locate the symbol table:

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * stab style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
    uint32_t    cmd;        /* LC_SYMTAB */
    uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
    uint32_t    symoff;     /* symbol table offset */
    uint32_t    nsyms;      /* number of symbol table entries */
    uint32_t    stroff;     /* string table offset */
    uint32_t    strsize;    /* string table size in bytes */
};

As the quotation above says, the offsets in LC_SYMTAB and in __LINKEDIT are file offsets. To get the in-memory addresses of the symbol table and the string table, we first subtract the fileoff of __LINKEDIT from the symoff and stroff of LC_SYMTAB, which gives their offsets within the segment, and then add the vmaddr of __LINKEDIT to get virtual addresses. To get the final, actual memory addresses we also have to add the ASLR slide.
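The address arithmetic can be written out as a small sketch. The struct `demo_symtab_inputs` and the three function names are hypothetical; they only bundle the fields named in the text (vmaddr and fileoff of __LINKEDIT, symoff and stroff of LC_SYMTAB, and the image's slide).

```c
#include <stdint.h>

typedef struct {               /* hypothetical bundle of the inputs */
    uint32_t linkeditVmaddr;   /* vmaddr of the __LINKEDIT segment_command */
    uint32_t linkeditFileoff;  /* fileoff of the __LINKEDIT segment_command */
    uint32_t symoff;           /* symoff of the symtab_command */
    uint32_t stroff;           /* stroff of the symtab_command */
    intptr_t slide;            /* ASLR slide of the image */
} demo_symtab_inputs;

/* Runtime address that corresponds to file offset 0, as seen through the
 * __LINKEDIT mapping: slide + vmaddr - fileoff. */
static uintptr_t demo_linkeditBase(const demo_symtab_inputs *in) {
    return (uintptr_t)in->slide + in->linkeditVmaddr - in->linkeditFileoff;
}

/* In-memory address of the symbol table (array of nlist entries). */
static uintptr_t demo_symbolTableAddr(const demo_symtab_inputs *in) {
    return demo_linkeditBase(in) + in->symoff;
}

/* In-memory address of the string table. */
static uintptr_t demo_stringTableAddr(const demo_symtab_inputs *in) {
    return demo_linkeditBase(in) + in->stroff;
}
```

As a worked example with made-up numbers: for vmaddr 0x8000, fileoff 0x6000, symoff 0x6100 and slide 0x10000, the symbol table is at 0x10000 + 0x8000 - 0x6000 + 0x6100 = 0x18100.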

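3.3 Finding the symbol that best matches the target address

We have finally reached the symbol table. I am getting a little tired of writing at this point, so here is the code directly:

/**
 * @brief Find the best-matching symbol for an address in the given symbol table.
 *        The address passed in must already have vmaddr_slide subtracted.
 *        (JDY_SymbolTableEntry is not defined in this excerpt; it is presumably
 *        a typedef for the nlist structure of the current architecture.)
 */
const JDY_SymbolTableEntry *jdy_findBestMatchSymbolForAddress(uintptr_t addr,
                                                              JDY_SymbolTableEntry *symbolTable,
                                                              uint32_t nsyms) {
    // 1. addr >= symbol.value: addr is the address of an instruction inside some function,
    //    so it must be greater than or equal to that function's entry address, i.e. the symbol's value;
    // 2. symbol.value is nearest to addr: the function entry closest to the instruction
    //    address addr is the more accurate match.
    const JDY_SymbolTableEntry *nearestSymbol = NULL;
    uintptr_t currentDistance = UINT32_MAX;
    for (uint32_t symIndex = 0; symIndex < nsyms; symIndex++) {
        uintptr_t symbolValue = symbolTable[symIndex].n_value;
        if (symbolValue > 0) {
            uintptr_t symbolDistance = addr - symbolValue;
            if (symbolValue <= addr && symbolDistance <= currentDistance) {
                currentDistance = symbolDistance;
                nearestSymbol = symbolTable + symIndex;
            }
        }
    }
    return nearestSymbol;
}

/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

Once the matching nlist structure is found, we can use its .n_un.n_strx field to locate the corresponding symbol name in the string table.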
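This last lookup can be sketched as follows. `demo_symbolName` is a hypothetical helper, and the bounds check against strsize (from the symtab_command) is an added precaution rather than something the original article shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Resolve a symbol name: n_strx is a byte offset into the string table,
 * where each name is stored as a NUL-terminated C string. Returns NULL
 * when the offset lies outside the table. */
static const char *demo_symbolName(const char *stringTable, uint32_t strsize,
                                   uint32_t n_strx) {
    if (stringTable == NULL || n_strx >= strsize) {
        return NULL;
    }
    return stringTable + n_strx;
}
```

The returned pointer refers into the image's own memory, so it can be stored in `dli_sname` without copying. The nearest symbol's `n_value` plus the slide would give the matching `dli_saddr`.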

 
