Android-based ELF PLT/GOT symbol redirection process and ELF hook implementation
by Low-end code farm, 2014.10.27
Introduction
There are two main reasons for writing this article:
- First, most articles on the Web that describe the PLT/GOT symbol redirection process target x86; "Redirecting functions in shared ELF libraries", for example, is very well written. The process on ARM is very similar, but because the CPU architecture is different, the actual instructions differ considerably;
- Second, most online introductions to the ELF file format are based on the linking view, which parses the ELF by sections. When loading a dynamic library, however, the linker only looks at the segment information in the ELF. The section information can therefore be completely tampered with or even deleted without affecting the loading process, and this defeats static analysis tools (such as IDA and readelf); packed ELF files generally do this. To hook functions in such an ELF file, symbols must be resolved based on the execution view;
Preparation
Before reading on, make sure you have a general understanding of the ELF file format and ARM assembly. For reference:
- ELF file format analysis;
- ARM documentation;
Tools:
- readelf (included in the NDK)
- objdump (included in the NDK)
- IDA Pro 6.4 or above
- A real Android device or an emulator
Symbol redirection
On ARM there are three main relocation types involved, namely R_ARM_JUMP_SLOT, R_ARM_ABS32 and R_ARM_GLOB_DAT. To hook a function in an ELF file, all three types must be handled.
Example
Look at the sample code first
typedef int (*strlen_fun)(const char *);
strlen_fun global_strlen1 = (strlen_fun)strlen;
strlen_fun global_strlen2 = (strlen_fun)strlen;

#define SHOW(x) LOGI("%s is %d", #x, x)

extern "C" jint Java_com_example_allhookinone_HookUtils_elfhook(JNIEnv *env, jobject thiz){
    const char *str = "helloworld";

    strlen_fun local_strlen1 = (strlen_fun)strlen;
    strlen_fun local_strlen2 = (strlen_fun)strlen;

    int len0 = global_strlen1(str);
    int len1 = global_strlen2(str);
    int len2 = local_strlen1(str);
    int len3 = local_strlen2(str);
    int len4 = strlen(str);
    int len5 = strlen(str);

    SHOW(len0);
    SHOW(len1);
    SHOW(len2);
    SHOW(len3);
    SHOW(len4);
    SHOW(len5);

    return 0;
}
This code calls strlen in three different ways: through global function pointers, through local function pointers, and directly. We analyze each of the three call styles in turn.
First, let's look at the relocation tables with readelf:
Relocation section '.rel.dyn' at offset 0x2a48 contains entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000ade0  00000017 R_ARM_RELATIVE
0000af00  00000017 R_ARM_RELATIVE
0000af0c  00000017 R_ARM_RELATIVE
0000af10  00000017 R_ARM_RELATIVE
0000af18  00000017 R_ARM_RELATIVE
0000af1c  00000017 R_ARM_RELATIVE
0000af20  00000017 R_ARM_RELATIVE
0000af24  00000017 R_ARM_RELATIVE
0000af28  00000017 R_ARM_RELATIVE
0000af30  00000017 R_ARM_RELATIVE
0000aefc  00003215 R_ARM_GLOB_DAT    00000000   __stack_chk_guard
0000af04  00003715 R_ARM_GLOB_DAT    00000000   __page_size
0000af08  00004e15 R_ARM_GLOB_DAT    00000000   strlen
0000b004  00004e02 R_ARM_ABS32       00000000   strlen
0000b008  00004e02 R_ARM_ABS32       00000000   strlen
0000af14  00006615 R_ARM_GLOB_DAT    00000000   __gnu_Unwind_Find_exid
0000af2c  00007415 R_ARM_GLOB_DAT    00000000   __cxa_call_unexpected
...

Relocation section '.rel.plt' at offset 0x2ad0 contains entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000af40  00000216 R_ARM_JUMP_SLOT   00000000   __cxa_atexit
0000af44  00000116 R_ARM_JUMP_SLOT   00000000   __cxa_finalize
0000af48  00001716 R_ARM_JUMP_SLOT   00000000   memcpy
...
0000afd4  00004c16 R_ARM_JUMP_SLOT   00000000   fgets
0000afd8  00004d16 R_ARM_JUMP_SLOT   00000000   fclose
0000afdc  00004e16 R_ARM_JUMP_SLOT   00000000   strlen
0000afe0  00004f16 R_ARM_JUMP_SLOT   00000000   strncmp
...
In the two sections .rel.plt and .rel.dyn, strlen appears four times in total. Let's record the key information first, since it is very useful later on:
- .rel.dyn 0000af08 R_ARM_GLOB_DAT
- .rel.dyn 0000b004 R_ARM_ABS32
- .rel.dyn 0000b008 R_ARM_ABS32
- .rel.plt 0000afdc R_ARM_JUMP_SLOT
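The Info column in the readelf output packs the symbol index and the relocation type into one 32-bit word. As a minimal sketch (the helper names relocSymIndex and relocType are made up for this illustration; they do the same arithmetic as the ELF32_R_SYM and ELF32_R_TYPE macros), the four strlen entries can be decoded like this:

```cpp
#include <cassert>
#include <cstdint>

// Relocation type numbers, matching the #define values used later in this article.
constexpr uint32_t kRArmAbs32    = 0x02;
constexpr uint32_t kRArmGlobDat  = 0x15;
constexpr uint32_t kRArmJumpSlot = 0x16;

// Illustrative helpers: upper 24 bits are the symbol index, low 8 bits the type.
constexpr uint32_t relocSymIndex(uint32_t r_info) { return r_info >> 8; }
constexpr uint32_t relocType(uint32_t r_info)     { return r_info & 0xff; }

// Info values of the strlen entries listed above.
static_assert(relocType(0x00004e15) == kRArmGlobDat,  "0000af08 is R_ARM_GLOB_DAT");
static_assert(relocType(0x00004e02) == kRArmAbs32,    "0000b004/0000b008 are R_ARM_ABS32");
static_assert(relocType(0x00004e16) == kRArmJumpSlot, "0000afdc is R_ARM_JUMP_SLOT");

// All of them refer to the same dynamic symbol, index 0x4e (strlen).
static_assert(relocSymIndex(0x00004e15) == 0x4e, "symbol index");
static_assert(relocSymIndex(0x00004e02) == 0x4e, "symbol index");
static_assert(relocSymIndex(0x00004e16) == 0x4e, "symbol index");
```

This is why a hook can match entries by symbol index and type alone, without looking at section names.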
The code refers to strlen six times, so why does it appear only four times here? And how do the entries correspond to the calls? With these questions in mind, let's analyze the assembly code. Dragging the compiled .so into IDA shows the instructions for the sample code:
.text:000050BC                 EXPORT Java_com_example_allhookinone_HookUtils_elfhook
.text:000050BC Java_com_example_allhookinone_HookUtils_elfhook
.text:000050BC
.text:000050BC var_40          = -0x40
.text:000050BC var_38          = -0x38
.text:000050BC var_34          = -0x34
.text:000050BC s               = -0x2C
.text:000050BC var_28          = -0x28
.text:000050BC var_24          = -0x24
.text:000050BC var_20          = -0x20
.text:000050BC var_1C          = -0x1C
.text:000050BC var_18          = -0x18
.text:000050BC var_14          = -0x14
.text:000050BC var_10          = -0x10
.text:000050BC var_C           = -0xC
.text:000050BC
.text:000050BC                 PUSH    {R4,LR}
.text:000050BE                 SUB     SP, SP, #0x38
.text:000050C0                 STR     R0, [SP,#0x40+var_34]
.text:000050C2                 STR     R1, [SP,#0x40+var_38]
.text:000050C4                 LDR     R4, =(_GLOBAL_OFFSET_TABLE_ - 0x50CA)
.text:000050C6                 ADD     R4, PC          ; _GLOBAL_OFFSET_TABLE_
.text:000050C8                 LDR     R3, =(aHelloworld - 0x50CE)
.text:000050CA                 ADD     R3, PC          ; "helloworld"
.text:000050CC                 STR     R3, [SP,#0x40+s]
.text:000050CE                 LDR     R3, =(strlen_ptr - 0xAF34)
.text:000050D0                 LDR     R3, [R4,R3]     ; __imp_strlen
.text:000050D2                 STR     R3, [SP,#0x40+var_28]
.text:000050D4                 LDR     R3, =(strlen_ptr - 0xAF34)
.text:000050D6                 LDR     R3, [R4,R3]     ; __imp_strlen
.text:000050D8                 STR     R3, [SP,#0x40+var_24]
.text:000050DA                 LDR     R3, =(global_strlen1_ptr - 0xAF34)
.text:000050DC                 LDR     R3, [R4,R3]     ; global_strlen1
.text:000050DE                 LDR     R3, [R3]
.text:000050E0                 LDR     R2, [SP,#0x40+s]
.text:000050E2                 MOVS    R0, R2
.text:000050E4                 BLX     R3
.text:000050E6                 MOVS    R3, R0
.text:000050E8                 STR     R3, [SP,#0x40+var_20]
.text:000050EA                 LDR     R3, =(global_strlen2_ptr - 0xAF34)
.text:000050EC                 LDR     R3, [R4,R3]     ; global_strlen2
.text:000050EE                 LDR     R3, [R3]
.text:000050F0                 LDR     R2, [SP,#0x40+s]
.text:000050F2                 MOVS    R0, R2
.text:000050F4                 BLX     R3
.text:000050F6                 MOVS    R3, R0
.text:000050F8                 STR     R3, [SP,#0x40+var_1C]
.text:000050FA                 LDR     R2, [SP,#0x40+s]
.text:000050FC                 LDR     R3, [SP,#0x40+var_28]
.text:000050FE                 MOVS    R0, R2
.text:00005100                 BLX     R3
.text:00005102                 MOVS    R3, R0
.text:00005104                 STR     R3, [SP,#0x40+var_18]
.text:00005106                 LDR     R2, [SP,#0x40+s]
.text:00005108                 LDR     R3, [SP,#0x40+var_24]
.text:0000510A                 MOVS    R0, R2
.text:0000510C                 BLX     R3
.text:0000510E                 MOVS    R3, R0
.text:00005110                 STR     R3, [SP,#0x40+var_14]
.text:00005112                 LDR     R3, [SP,#0x40+s]
.text:00005114                 MOVS    R0, R3          ; s
.text:00005116                 BLX     strlen
.text:0000511A                 MOVS    R3, R0
.text:0000511C                 STR     R3, [SP,#0x40+var_10]
.text:0000511E                 LDR     R3, [SP,#0x40+s]
.text:00005120                 MOVS    R0, R3          ; s
.text:00005122                 BLX     strlen
.text:00005126                 MOVS    R3, R0
...
.text:000051CA                 ADD     SP, SP, #0x38
.text:000051CC                 POP     {R4,PC}
.text:000051CC ; End of function Java_com_example_allhookinone_HookUtils_elfhook
First, note some important addresses:
- _GLOBAL_OFFSET_TABLE_: 0x0000af34
- strlen_ptr: 0x0000af08
- __imp_strlen: 0x0000b0c8
- global_strlen1_ptr: 0x0000af0c
- global_strlen1: 0x0000b004
- global_strlen2_ptr: 0x0000af10
- global_strlen2: 0x0000b008
Calling external functions through global function pointers
The calls through global_strlen1 and global_strlen2 correspond to the two BLX instructions at 0x000050e4 and 0x000050f4. Working through the instructions, the final values of R3 are *global_strlen1 and *global_strlen2 respectively. The addresses of global_strlen1 and global_strlen2 (0x0000b004 and 0x0000b008) correspond exactly to the two R_ARM_ABS32 relocation entries in .rel.dyn. We conclude that when an external function is called through a global function pointer, the relocation type is R_ARM_ABS32 and the entry is located in the .rel.dyn section.
We only analyze the call through global_strlen1. The code first goes to global_strlen1_ptr (0x0000af0c), which is located in the .got section, before _GLOBAL_OFFSET_TABLE_. Through global_strlen1_ptr it reaches 0x0000b004 (located in the .data section), and finally reads the function address stored at 0x0000b004. The offset of an R_ARM_ABS32 relocation entry therefore points to the location that holds the address of the function to call (that is, it is a pointer to a function pointer), and the lookup goes first to .got and then from .got to .data. Here is a hexadecimal dump of the relevant parts of .got and .data:
...
0000AF0C 04 B0 00 00 08 B0 00 00 DC B0 00 00 B4 87 00 00
0000AF1C F4 84 00 00 60 5B 00 00 58 5B 00 00 50 5B 00 00
0000AF2C EC B0 00 00 FC 8C 00 00 00 00 00 00 00 00 00 00
...
0000B004 C8 B0 00 00 C8 B0 00 00 ?? ?? ?? ?? ?? ?? ?? ??
0000B014 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
0000B024 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
0000B0C8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000B0D8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
The bytes at 0x0000b0c8 are all 0. At dynamic link time, the linker overwrites the value stored at 0x0000b004 so that it points to the real address of strlen (instead of the current 0x0000b0c8, which is a bit of a detour).
Calling external functions through local function pointers
The calls through local_strlen1 and local_strlen2 correspond to the two BLX instructions at 0x00005100 and 0x0000510c. Working through the instructions, the final value of R3 is *strlen_ptr, where strlen_ptr is at 0x0000af08, which corresponds exactly to the R_ARM_GLOB_DAT relocation entry in .rel.dyn. We conclude that when an external function is called through a local function pointer, the relocation type is R_ARM_GLOB_DAT and the entry is located in the .rel.dyn section.
We only analyze the call through local_strlen1. The code first goes to strlen_ptr (0x0000af08), which is located in the .got section, before _GLOBAL_OFFSET_TABLE_, and through strlen_ptr it reaches 0x0000b0c8, the same result as in the analysis above. The offset of an R_ARM_GLOB_DAT relocation entry therefore also points to the location that holds the address of the function to call (a pointer to a function pointer). Here is a hexadecimal dump of the relevant part of .got:
0000AF08 C8 B0 00 00 04 B0 00 00 08 B0 00 00 DC B0 00 00
0000AF18 B4 87 00 00 F4 84 00 00 60 5B 00 00 58 5B 00 00
0000AF28 50 5B 00 00 EC B0 00 00 FC 8C 00 00 00 00 00 00
...
0000B0C8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000B0D8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
Note the instruction at 0x000050d8, "STR R3, [SP,#0x40+var_24]": at this point the real address of the function has already been saved on the stack. Modifying the GOT afterwards does not change the copy on the stack, so a call through a pointer that was loaded before the hook was installed cannot be redirected by modifying the GOT entry.
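The effect of the saved copy can be reproduced without any ELF machinery. The following is only a sketch under simplifying assumptions: got_slot is a hypothetical stand-in for the GOT entry at 0x0000af08, and realLen and doubledLen are made-up functions playing the roles of strlen and the hook.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef size_t (*strlen_fun)(const char *);

// Made-up stand-ins for the real strlen and for the replacement function.
static size_t realLen(const char *s)    { return std::strlen(s); }
static size_t doubledLen(const char *s) { return std::strlen(s) * 2; }

// Hypothetical stand-in for a GOT entry: one writable slot holding a function address.
static strlen_fun got_slot = realLen;

struct CallResult {
    size_t via_cached_copy;  // call through the pointer copied before the patch
    size_t via_slot;         // call that re-reads the slot after the patch
};

CallResult callAfterPatch(const char *s) {
    strlen_fun cached = got_slot;   // like "LDR R3, [R4,R3]" followed by "STR R3, [SP,...]"
    got_slot = doubledLen;          // the hook rewrites the slot afterwards
    CallResult r = { cached(s), got_slot(s) };
    got_slot = realLen;             // restore the slot so the sketch can be re-run
    return r;
}
```

With the string "helloworld", the call through the cached copy still returns 10, while the call that reads the slot again returns 20.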
Calling external functions directly
Finally, look at the direct calls to strlen, which correspond to the two BLX instructions at 0x00005116 and 0x00005122. Both end up at the following stub in the .plt section:
.plt:00002E38                 ADR     R12, 0x2E40
.plt:00002E3C                 ADD     R12, R12, #0x8000
.plt:00002E40                 LDR     PC, [R12,#(strlen_ptr_0 - 0xAE40)]! ; __imp_strlen
...
0000AFDC C8 B0 00 00 CC B0 00 00 D0 B0 00 00 D4 B0 00 00
0000AFEC D8 B0 00 00 DC B0 00 00 E0 B0 00 00 E4 B0 00 00
0000AFFC E8 B0 00 00 00 00 00 00 C8 B0 00 00 C8 B0 00 00
...
In the end, PC is loaded with *strlen_ptr_0. The address of strlen_ptr_0 is 0x0000afdc, which is located in the .got section, and the value stored at 0x0000afdc is again 0x0000b0c8, a familiar sight. We conclude that when an external function is called directly, the relocation type is R_ARM_JUMP_SLOT, the entry is located in the .rel.plt section, and its offset points to the location that holds the address of the function to call (a pointer to a function pointer). The whole path goes first to .plt, then to .got, and finally to the real function address.
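The address arithmetic of the three-instruction stub can be checked by hand. The sketch below assumes the stub runs in ARM state, where reading PC yields the address of the current instruction plus 8; the function names armPc and pltStubSlot are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// In ARM state, PC reads as the address of the current instruction plus 8.
inline uint32_t armPc(uint32_t insn_addr) { return insn_addr + 8; }

// Returns the address of the GOT slot that a PLT stub starting at stub_addr
// loads PC from.
inline uint32_t pltStubSlot(uint32_t stub_addr, uint32_t page_delta, uint32_t slot_offset) {
    uint32_t r12 = armPc(stub_addr);  // ADR R12, 0x2E40
    r12 += page_delta;                // ADD R12, R12, #0x8000
    return r12 + slot_offset;         // LDR PC, [R12,#(strlen_ptr_0 - 0xAE40)]!
}
```

For the strlen stub, pltStubSlot(0x2E38, 0x8000, 0xAFDC - 0xAE40) gives 0xAFDC, which is exactly the offset of the R_ARM_JUMP_SLOT entry for strlen in .rel.plt.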
For this part of the analysis, IDA and objdump show some differences. The following is the disassembly produced by objdump:
00002e38 <strlen@plt>:
    2e38: e28fc600 add ip, pc, #0, 12
    2e3c: e28cca08 add ip, ip, #8, 20 ; 0x8000
    2e40: e5bcf19c ldr pc, [ip, #412]! ; 0x19c
...
    afd8: 00002c50 andeq r2, r0, r0, asr ip
    afdc: 00002c50 andeq r2, r0, r0, asr ip
    afe0: 00002c50 andeq r2, r0, r0, asr ip
    afe4: 00002c50 andeq r2, r0, r0, asr ip
The word at address afdc is 0x00002c50, and 0x00002c50 is exactly PLT[0], whose instructions are as follows:
00002c50 <[email protected]>:
    2c50: e52de004 push {lr} ; (str lr, [sp, #-4]!)
    2c54: e59fe004 ldr lr, [pc, #4] ; 2c60 <[email protected]>
    2c58: e08fe00e add lr, pc, lr
    2c5c: e5bef008 ldr pc, [lr, #8]!
    2c60: 000082d4 ldrdeq r8, [r0], -r4
The instruction at 2c5c loads PC from address 0x0000af3c, which is exactly _GLOBAL_OFFSET_TABLE_ + 8, i.e. got[2]. Here is what is stored at 0x0000af3c:
0000AF3C 00 00 00 00 28 B0 00 00 24 B0 00 00 2C B0 00 00
0000AF4C 30 B0 00 00 34 B0 00 00 38 B0 00 00 3C B0 00 00
It turns out that the function address stored in got[2] is actually 0. Symbol binding on Android does not support lazy binding, so when the .so is loaded, the linker resolves the functions for all got[n] (n>=2) entries in advance. The code reached through got[2] is therefore never executed, which means the complete PLT/GOT lazy-resolution process does not exist on Android today. My guess is that this is mainly for stability reasons.
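The PLT[0] address arithmetic from the objdump listing can be verified the same way. This is a small sketch assuming ARM state (PC reads as the current instruction address plus 8); the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// In ARM state, reading PC yields the address of the current instruction plus 8.
inline uint32_t pcAt(uint32_t insn_addr) { return insn_addr + 8; }

// Address of the literal read by "ldr lr, [pc, #4]" at 0x2c54.
inline uint32_t plt0LiteralAddr() { return pcAt(0x2c54) + 4; }

// Value of lr after "add lr, pc, lr" at 0x2c58, given the literal 0x000082d4.
inline uint32_t plt0GotBase() { return pcAt(0x2c58) + 0x000082d4; }

// Address from which "ldr pc, [lr, #8]!" at 0x2c5c loads the new PC.
inline uint32_t plt0ResolverSlot() { return plt0GotBase() + 8; }
```

The literal is read from 0x2c60, lr becomes 0xaf34 (the address of _GLOBAL_OFFSET_TABLE_), and the final load reads from 0xaf3c, the got[2] slot.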
Summary
Although the instructions that IDA and objdump show for the PLT/GOT process differ somewhat, the difference does not matter on Android, because lazy binding is not supported there. We also reach a very important conclusion: R_ARM_ABS32, R_ARM_GLOB_DAT and R_ARM_JUMP_SLOT arise from different code, but in every case the relocation offset is a pointer to a function pointer. This is very useful for the ELF hook below.
Parsing ELF based on the execution view
The article "Redirecting functions in shared ELF libraries" provides an example of parsing ELF based on the linking view, and parsing based on the execution view works in basically the same way. The key is to find .dynsym, .dynstr, .rel.plt and .rel.dyn, together with their entry counts, through the segments.
First, find the segment of type PT_DYNAMIC through the program header table. It corresponds to the .dynamic section, which is an array of Elf32_Dyn with the following structure:
/* Dynamic structure */
typedef struct {
    Elf32_Sword d_tag;      /* controls meaning of d_val */
    union {
        Elf32_Word d_val;   /* Multiple meanings - see d_tag */
        Elf32_Addr d_ptr;   /* program virtual address */
    } d_un;
} Elf32_Dyn;
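Locating the PT_DYNAMIC segment itself only requires the ELF header and the program header table. The following is a minimal sketch, not the code from the repository: Ehdr32 and Phdr32 are local mirrors of Elf32_Ehdr and Elf32_Phdr (declared here only so the example compiles without <elf.h>), and findDynamicSegment is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirror of Elf32_Ehdr (52 bytes).
struct Ehdr32 {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};

// Local mirror of Elf32_Phdr (32 bytes).
struct Phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align;
};

const uint32_t kPtDynamic = 2;  // value of PT_DYNAMIC

// Returns the PT_DYNAMIC program header of the image that starts at base,
// or nullptr if there is none. Assumes base is suitably aligned and that the
// program header table is readable at base + e_phoff.
const Phdr32 *findDynamicSegment(const uint8_t *base) {
    const Ehdr32 *ehdr = reinterpret_cast<const Ehdr32 *>(base);
    const Phdr32 *phdr = reinterpret_cast<const Phdr32 *>(base + ehdr->e_phoff);
    for (uint16_t i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == kPtDynamic) {
            return &phdr[i];
        }
    }
    return nullptr;
}
```

For a library that is already mapped, the Elf32_Dyn array would then be read at the load base plus p_vaddr, with p_memsz / sizeof(Elf32_Dyn) entries; whether the load base equals the mapping address depends on the library, so treat that part as an assumption.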
By iterating through this array we can find all the information we need. The correspondence is as follows:
- DT_HASH -> .hash
- DT_SYMTAB & DT_SYMENT -> .dynsym
- DT_STRTAB & DT_STRSZ -> .dynstr
- DT_PLTREL (decides REL or RELA) & (DT_REL | DT_RELA) & (DT_RELSZ | DT_RELASZ) & (DT_RELENT | DT_RELAENT) -> .rel.dyn
- DT_JMPREL & DT_PLTRELSZ & (DT_RELENT | DT_RELAENT) -> .rel.plt
- DT_FINI_ARRAY & DT_FINI_ARRAYSZ -> .fini_array
- DT_INIT_ARRAY & DT_INIT_ARRAYSZ -> .init_array
Here is the relevant lookup code:
void getElfInfoBySegmentView(ElfInfo &info, const ElfHandle *handle){
    info.handle = handle;
    info.elf_base = (uint8_t *) handle->base;
    info.ehdr = reinterpret_cast<Elf32_Ehdr *>(info.elf_base);

    // may be wrong
    info.shdr = reinterpret_cast<Elf32_Shdr *>(info.elf_base + info.ehdr->e_shoff);
    info.phdr = reinterpret_cast<Elf32_Phdr *>(info.elf_base + info.ehdr->e_phoff);

    info.shstr = NULL;

    Elf32_Phdr *dynamic = NULL;
    Elf32_Word size = 0;

    getSegmentInfo(info, PT_DYNAMIC, &dynamic, &size, &info.dyn);
    if(!dynamic){
        LOGE("[-] couldn't find PT_DYNAMIC segment");
        exit(-1);
    }

    info.dynsz = size / sizeof(Elf32_Dyn);

    // walk the dynamic array and pick out the tables we need
    Elf32_Dyn *dyn = info.dyn;
    for(int i = 0; i < info.dynsz; i++, dyn++){
        switch(dyn->d_tag){
        case DT_SYMTAB:
            info.sym = reinterpret_cast<Elf32_Sym *>(info.elf_base + dyn->d_un.d_ptr);
            break;

        case DT_STRTAB:
            info.symstr = reinterpret_cast<const char *>(info.elf_base + dyn->d_un.d_ptr);
            break;

        case DT_REL:
            info.reldyn = reinterpret_cast<Elf32_Rel *>(info.elf_base + dyn->d_un.d_ptr);
            break;

        case DT_RELSZ:
            info.reldynsz = dyn->d_un.d_val / sizeof(Elf32_Rel);
            break;

        case DT_JMPREL:
            info.relplt = reinterpret_cast<Elf32_Rel *>(info.elf_base + dyn->d_un.d_ptr);
            break;

        case DT_PLTRELSZ:
            info.relpltsz = dyn->d_un.d_val / sizeof(Elf32_Rel);
            break;

        case DT_HASH: {
            // layout: nbucket, nchain, bucket[nbucket], chain[nchain]
            uint32_t *rawdata = reinterpret_cast<uint32_t *>(info.elf_base + dyn->d_un.d_ptr);
            info.nbucket = rawdata[0];
            info.nchain = rawdata[1];
            info.bucket = rawdata + 2;
            info.chain = info.bucket + info.nbucket;
            break;
        }
        }
    }

    // because .dynsym is next to .dynstr, we can calculate symsz simply
    info.symsz = ((uint32_t)info.symstr - (uint32_t)info.sym) / sizeof(Elf32_Sym);
}
However, there is one value I could not get from the PT_DYNAMIC segment: the number of entries in .dynsym. In the end I obtained it through a workaround. Because the .dynsym and .dynstr sections are adjacent, subtracting their two addresses gives the total length of .dynsym, and dividing that by sizeof(Elf32_Sym) gives the number of .dynsym entries. If you know a better way, please tell me.
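One possible alternative, assuming the library carries a DT_HASH table (libraries built with only a GNU-style hash table would need different handling): in the SysV hash table format, the second word, nchain, equals the number of symbol table entries, and the same table supports lookup by name. The sketch below is not the repository's code; Sym32 is a local mirror of Elf32_Sym, and elfHash, lookupSymbol and symbolCount are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local mirror of Elf32_Sym (16 bytes).
struct Sym32 {
    uint32_t st_name, st_value, st_size;
    uint8_t  st_info, st_other;
    uint16_t st_shndx;
};

// The standard SysV ELF hash function.
uint32_t elfHash(const char *name) {
    uint32_t h = 0;
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(name); *p; p++) {
        h = (h << 4) + *p;
        uint32_t g = h & 0xf0000000u;
        if (g) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// hash points at the DT_HASH data: nbucket, nchain, bucket[nbucket], chain[nchain].
// nchain is the number of entries in the dynamic symbol table.
uint32_t symbolCount(const uint32_t *hash) { return hash[1]; }

// Returns the symbol index of name, or 0 (the undefined symbol) if it is not found.
uint32_t lookupSymbol(const uint32_t *hash, const Sym32 *symtab,
                      const char *strtab, const char *name) {
    const uint32_t nbucket = hash[0];
    const uint32_t *bucket = hash + 2;
    const uint32_t *chain = bucket + nbucket;
    // Walk the chain that starts at the bucket selected by the hash value.
    for (uint32_t i = bucket[elfHash(name) % nbucket]; i != 0; i = chain[i]) {
        if (std::strcmp(strtab + symtab[i].st_name, name) == 0) {
            return i;
        }
    }
    return 0;
}
```

The returned index is the value that is compared against ELF32_R_SYM(r_info) when scanning the relocation tables.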
ELF Hook
With the background above, writing an ELF hook is straightforward. Here is the key code:
#define R_ARM_ABS32 0x02
#define R_ARM_GLOB_DAT 0x15
#define R_ARM_JUMP_SLOT 0x16

int elfHook(const char *soname, const char *symbol, void *replace_func, void **old_func){
    assert(old_func);
    assert(replace_func);
    assert(symbol);

    ElfHandle* handle = openElfBySoname(soname);
    ElfInfo info;

    getElfInfoBySegmentView(info, handle);

    Elf32_Sym *sym = NULL;
    int symidx = 0;

    findSymByName(info, symbol, &sym, &symidx);

    if(!sym){
        LOGE("[-] Could not find symbol %s", symbol);
        goto fails;
    }else{
        LOGI("[+] sym %p, symidx %d.", sym, symidx);
    }

    // direct calls: the R_ARM_JUMP_SLOT entry in .rel.plt
    for (int i = 0; i < info.relpltsz; i++) {
        Elf32_Rel& rel = info.relplt[i];
        if (ELF32_R_SYM(rel.r_info) == symidx && ELF32_R_TYPE(rel.r_info) == R_ARM_JUMP_SLOT) {
            void *addr = (void *) (info.elf_base + rel.r_offset);
            if (replaceFunc(addr, replace_func, old_func))
                goto fails;

            // only once
            break;
        }
    }

    // function pointers: the R_ARM_ABS32 and R_ARM_GLOB_DAT entries in .rel.dyn
    for (int i = 0; i < info.reldynsz; i++) {
        Elf32_Rel& rel = info.reldyn[i];
        if (ELF32_R_SYM(rel.r_info) == symidx &&
                (ELF32_R_TYPE(rel.r_info) == R_ARM_ABS32
                        || ELF32_R_TYPE(rel.r_info) == R_ARM_GLOB_DAT)) {
            void *addr = (void *) (info.elf_base + rel.r_offset);
            if (replaceFunc(addr, replace_func, old_func))
                goto fails;
        }
    }

fails:
    closeElfBySoname(handle);
    return 0;
}
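The helper that actually rewrites a slot is not shown in this article. A minimal sketch of what such a helper might do on a POSIX system follows; patchSlot is a hypothetical name, and the real implementation in the repository may differ. It makes the page that contains the slot writable with mprotect, saves the old value, and stores the replacement. Like the call sites above, it returns 0 on success and non-zero on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: overwrite the pointer stored at addr with replace_func
// and report the previous value through old_func.
int patchSlot(void *addr, void *replace_func, void **old_func) {
    void **slot = static_cast<void **>(addr);

    // Already hooked: keep old_func untouched so it never points at the hook itself.
    if (*slot == replace_func) {
        return 0;
    }

    // Make every page touched by the slot readable and writable.
    const uintptr_t page_size = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
    const uintptr_t first = reinterpret_cast<uintptr_t>(addr) & ~(page_size - 1);
    const uintptr_t last = (reinterpret_cast<uintptr_t>(addr) + sizeof(void *) + page_size - 1)
                           & ~(page_size - 1);
    if (mprotect(reinterpret_cast<void *>(first), last - first, PROT_READ | PROT_WRITE) != 0) {
        return -1;
    }

    *old_func = *slot;
    *slot = replace_func;
    return 0;
}
```

The sketch leaves the page writable afterwards; restoring the original protection is omitted here for brevity.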
Finally, here is the test code:
typedef int (*strlen_fun)(const char *);
strlen_fun old_strlen = NULL;

size_t my_strlen(const char *str){
    LOGI("strlen was called.");
    int len = old_strlen(str);
    return len * 2;
}

strlen_fun global_strlen1 = (strlen_fun)strlen;
strlen_fun global_strlen2 = (strlen_fun)strlen;

#define SHOW(x) LOGI("%s is %d", #x, x)

extern "C" jint Java_com_example_allhookinone_HookUtils_elfhook(JNIEnv *env, jobject thiz){
    const char *str = "helloworld";

    strlen_fun local_strlen1 = (strlen_fun)strlen;
    strlen_fun local_strlen2 = (strlen_fun)strlen;

    int len0 = global_strlen1(str);
    int len1 = global_strlen2(str);
    int len2 = local_strlen1(str);
    int len3 = local_strlen2(str);
    int len4 = strlen(str);
    int len5 = strlen(str);

    LOGI("hook before:");
    SHOW(len0);
    SHOW(len1);
    SHOW(len2);
    SHOW(len3);
    SHOW(len4);
    SHOW(len5);

    elfHook("libonehook.so", "strlen", (void *)my_strlen, (void **)&old_strlen);

    len0 = global_strlen1(str);
    len1 = global_strlen2(str);
    len2 = local_strlen1(str);
    len3 = local_strlen2(str);
    len4 = strlen(str);
    len5 = strlen(str);

    LOGI("hook after:");
    SHOW(len0);
    SHOW(len1);
    SHOW(len2);
    SHOW(len3);
    SHOW(len4);
    SHOW(len5);

    return 0;
}
The printed results show that local_strlen1 and local_strlen2 are not affected, as explained above. If the function is called again, however, the hook does take effect; I will not go into the reason here. I am not posting the test results, so try it yourself.
GitHub address
For the complete code, see https://github.com/boyliang/AllHookInOne.git