Android Reverse Engineering Journey: The SO (ELF) File Format Explained
1. Preface
Starting today, we officially begin our Android reverse engineering journey. Reverse engineering is a challenging and somewhat mysterious field, and most Android developers are curious about it, because cracking someone else's work brings a real sense of accomplishment. We should not study only how to crack, though; we also need to study how to protect code, because protection and cracking go hand in hand. Cracking the native layer, that is, so files, is probably the biggest headache in the whole process, so let's take a close look at what an so file contains. Today we start with the ELF file format, because an so file on Android is an ELF file. The best way to understand the format in depth is to write a small tool class by hand that parses an ELF file.
2. Reference materials
We need to know the ELF file format. There is already a lot of introductory material about it on the Internet, so I will not explain it at length here. Two resources are worth singling out, though, because much of what is online is scattered and confusing. These two are the most comprehensive and the best, and they are what I worked from:
The first is a classic diagram of the ELF file layout by Feichong (非虫):
The diagram is very detailed. When we parse the ELF file with Java code, we will follow it. Some of the data structures in it are not entirely clear, however, which is where the second resource comes in.
The second is the standard ELF format document from a Peking University laboratory:
http://download.csdn.net/detail/jiangwei0910410003/9204051
This document is not described in detail here; it will be explained later as we parse.
Read both resources carefully. They are classics, and they are the basis for the rest of this work.
3. Tools
We also need one tool. It is very useful when parsing the ELF file below, because its output serves as the reference against which we check our own parsing.
It is the well-known readelf command.
This command is not available on Windows because it is a Linux tool, so we need to install Cygwin first. For installation, you can refer to this article:
http://blog.csdn.net/jiangwei0910410003/article/details/17710243
In case the download gives you trouble, I have also put a copy on a cloud drive:
http://pan.baidu.com/s/1C1Zci
After downloading, one setting needs to be changed before use, in this file:
Change this path to the bin directory of your local cygwin64 installation, otherwise it will not run correctly. After the change, run Cygwin.bat.
We will not introduce readelf in detail here, only the commands we will use:
1. readelf -h xxx.so
View the header information of the so file
2. readelf -S xxx.so
View the section headers of the so file
3. readelf -l xxx.so
View the program header (segment) information of the so file
4. readelf -a xxx.so
View all the contents of the so file
readelf has many more options, which I will not go through here; there are plenty of introductions online.
4. Parsing an ELF file in practice (Java code & C++ code)
We have covered the ELF format references and the readelf tool, so now for the hands-on part: parsing a libhello-jni.so file by hand in Java. The libhello-jni.so file can be downloaded here:
http://download.csdn.net/detail/jiangwei0910410003/9204087
1. First, define the ELF data structures
We refer to the structure definitions in the elf.h header file. This file is also available online; here is a download link:
http://download.csdn.net/detail/jiangwei0910410003/9204081
Here is the Java class that defines the ELF data structures:
package com.demo.parseso;

import java.util.ArrayList;

public class ElfType32 {

    public elf32_rel rel;
    public elf32_rela rela;
    public ArrayList<Elf32_Sym> symList = new ArrayList<Elf32_Sym>();
    public elf32_hdr hdr; // ELF header information
    public ArrayList<elf32_phdr> phdrList = new ArrayList<elf32_phdr>(); // there may be multiple program headers
    // The element classes of the next two lists are not included in this listing.
    public ArrayList<elf32_shdr> shdrList = new ArrayList<elf32_shdr>(); // there may be multiple section headers
    public ArrayList<elf32_strtb> strtbList = new ArrayList<elf32_strtb>(); // there may be multiple string values

    public ElfType32() {
        rel = new elf32_rel();
        rela = new elf32_rela();
        hdr = new elf32_hdr();
    }

    /**
     * typedef struct elf32_rel {
     *   Elf32_Addr r_offset;
     *   Elf32_Word r_info;
     * } Elf32_Rel;
     */
    public class elf32_rel {
        public byte[] r_offset = new byte[4];
        public byte[] r_info = new byte[4];

        @Override
        public String toString() {
            return "r_offset:" + Utils.bytes2HexString(r_offset)
                    + ";r_info:" + Utils.bytes2HexString(r_info);
        }
    }

    /**
     * typedef struct elf32_rela {
     *   Elf32_Addr r_offset;
     *   Elf32_Word r_info;
     *   Elf32_Sword r_addend;
     * } Elf32_Rela;
     */
    public class elf32_rela {
        public byte[] r_offset = new byte[4];
        public byte[] r_info = new byte[4];
        public byte[] r_addend = new byte[4];

        @Override
        public String toString() {
            // r_addend is printed from r_addend (the flattened listing printed r_info twice)
            return "r_offset:" + Utils.bytes2HexString(r_offset)
                    + ";r_info:" + Utils.bytes2HexString(r_info)
                    + ";r_addend:" + Utils.bytes2HexString(r_addend);
        }
    }

    /**
     * typedef struct elf32_sym {
     *   Elf32_Word st_name;
     *   Elf32_Addr st_value;
     *   Elf32_Word st_size;
     *   unsigned char st_info;
     *   unsigned char st_other;
     *   Elf32_Half st_shndx;
     * } Elf32_Sym;
     */
    public static class Elf32_Sym {
        public byte[] st_name = new byte[4];
        public byte[] st_value = new byte[4];
        public byte[] st_size = new byte[4];
        public byte st_info;
        public byte st_other;
        public byte[] st_shndx = new byte[2];

        @Override
        public String toString() {
            return "st_name:" + Utils.bytes2HexString(st_name)
                    + "\nst_value:" + Utils.bytes2HexString(st_value)
                    + "\nst_size:" + Utils.bytes2HexString(st_size)
                    + "\nst_info:" + (st_info / 16)
                    + "\nst_other:" + (((short) st_other) & 0xF)
                    + "\nst_shndx:" + Utils.bytes2HexString(st_shndx);
        }
    }

    public void printSymList() {
        for (int i = 0; i ...
        // ... (listing truncated here)

    /**
     * #define ELF_ST_BIND(x) ((x) >> 4)
     * #define ELF_ST_TYPE(x) (((unsigned int) x) & 0xf)
     */

    /**
     * typedef struct elf32_hdr {
     *   unsigned char e_ident[EI_NIDENT];
     *   Elf32_Half e_type;
     *   Elf32_Half e_machine;
     *   Elf32_Word e_version;
     *   Elf32_Addr e_entry; // Entry point
     *   Elf32_Off e_phoff;
     *   Elf32_Off e_shoff;
     *   Elf32_Word e_flags;
     *   Elf32_Half e_ehsize;
     *   Elf32_Half e_phentsize;
     *   Elf32_Half e_phnum;
     *   Elf32_Half e_shentsize;
     *   Elf32_Half e_shnum;
     *   Elf32_Half e_shstrndx;
     * } Elf32_Ehdr;
     */
    public class elf32_hdr {
        public byte[] e_ident = new byte[16];
        public byte[] e_type = new byte[2];
        public byte[] e_machine = new byte[2];
        public byte[] e_version = new byte[4];
        public byte[] e_entry = new byte[4];
        public byte[] e_phoff = new byte[4];
        public byte[] e_shoff = new byte[4];
        public byte[] e_flags = new byte[4];
        public byte[] e_ehsize = new byte[2];
        public byte[] e_phentsize = new byte[2];
        public byte[] e_phnum = new byte[2];
        public byte[] e_shentsize = new byte[2];
        public byte[] e_shnum = new byte[2];
        public byte[] e_shstrndx = new byte[2];

        @Override
        public String toString() {
            return "magic:" + Utils.bytes2HexString(e_ident)
                    + "\ne_type:" + Utils.bytes2HexString(e_type)
                    + "\ne_machine:" + Utils.bytes2HexString(e_machine)
                    + "\ne_version:" + Utils.bytes2HexString(e_version)
                    + "\ne_entry:" + Utils.bytes2HexString(e_entry)
                    + "\ne_phoff:" + Utils.bytes2HexString(e_phoff)
                    + "\ne_shoff:" + Utils.bytes2HexString(e_shoff)
                    + "\ne_flags:" + Utils.bytes2HexString(e_flags)
                    + "\ne_ehsize:" + Utils.bytes2HexString(e_ehsize)
                    + "\ne_phentsize:" + Utils.bytes2HexString(e_phentsize)
                    + "\ne_phnum:" + Utils.bytes2HexString(e_phnum)
                    + "\ne_shentsize:" + Utils.bytes2HexString(e_shentsize)
                    + "\ne_shnum:" + Utils.bytes2HexString(e_shnum)
                    + "\ne_shstrndx:" + Utils.bytes2HexString(e_shstrndx);
        }
    }

    /**
     * typedef struct elf32_phdr {
     *   Elf32_Word p_type;
     *   Elf32_Off p_offset;
     *   Elf32_Addr p_vaddr;
     *   Elf32_Addr p_paddr;
     *   Elf32_Word p_filesz;
     *   Elf32_Word p_memsz;
     *   Elf32_Word p_flags;
     *   Elf32_Word p_align;
     * } Elf32_Phdr;
     */
    public static class elf32_phdr {
        public byte[] p_type = new byte[4];
        public byte[] p_offset = new byte[4];
        public byte[] p_vaddr = new byte[4];
        public byte[] p_paddr = new byte[4];
        public byte[] p_filesz = new byte[4];
        public byte[] p_memsz = new byte[4];
        public byte[] p_flags = new byte[4];
        public byte[] p_align = new byte[4];

        @Override
        public String toString() {
            return "p_type:" + Utils.bytes2HexString(p_type)
                    + "\np_offset:" + Utils.bytes2HexString(p_offset)
                    + "\np_vaddr:" + Utils.bytes2HexString(p_vaddr)
                    + "\np_paddr:" + Utils.bytes2HexString(p_paddr)
                    + "\np_filesz:" + Utils.bytes2HexString(p_filesz)
                    + "\np_memsz:" + Utils.bytes2HexString(p_memsz)
                    + "\np_flags:" + Utils.bytes2HexString(p_flags)
                    + "\np_align:" + Utils.bytes2HexString(p_align);
        }
    }

    public void printPhdrList() {
        for (int i = 0; i ...
        // ... (listing truncated here)
None of this is difficult. When reading the data structures defined in elf.h, just keep track of how many bytes each field occupies.
With the structures defined, let's see how to parse them.
Before parsing, we need to read the so file into a byte[] and create an instance of the data structure class.
public static ElfType32 type_32 = new ElfType32();

byte[] fileByteArys = Utils.readFile("so/libhello-jni.so");
if (fileByteArys == null) {
    System.out.println("read file byte failed...");
    return;
}
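The listings in this article call a Utils helper class whose source is not shown. As a rough idea of what such helpers might look like, here is a minimal sketch. The method names readFile, copyBytes, byte2Short and bytes2HexString are taken from the calls in the listings; byte2Int, the class name UtilsSketch and all the method bodies are assumptions for illustration, and the byte order is assumed to be little-endian.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for the article's Utils class; the real one is not shown.
final class UtilsSketch {
    /** Reads a whole file into memory; returns null on failure, which the caller checks for. */
    static byte[] readFile(String path) {
        try {
            return Files.readAllBytes(Paths.get(path));
        } catch (IOException e) {
            return null;
        }
    }

    /** Copies len bytes starting at index start into a new array. */
    static byte[] copyBytes(byte[] src, int start, int len) {
        byte[] out = new byte[len];
        System.arraycopy(src, start, out, 0, len);
        return out;
    }

    /** Interprets 2 bytes as a little-endian unsigned 16-bit value. */
    static int byte2Short(byte[] b) {
        return (b[0] & 0xFF) | ((b[1] & 0xFF) << 8);
    }

    /** Interprets 4 bytes as a little-endian 32-bit value (assumed helper, not in the article). */
    static int byte2Int(byte[] b) {
        return (b[0] & 0xFF) | ((b[1] & 0xFF) << 8)
                | ((b[2] & 0xFF) << 16) | ((b[3] & 0xFF) << 24);
    }

    /** Formats bytes in file order as space-separated two-digit hex values. */
    static String bytes2HexString(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", x & 0xFF));
        }
        return sb.toString();
    }
}
```

Note that a hex dump in file order shows multi-byte fields with the low byte first, so a field holding 0x1234 appears as "34 12".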
2. Parse the ELF header
For a description of the header fields, see the PDF document mentioned above.
Here are several important fields, which we will also use when we modify the so file later:
1), e_phoff
This field is the offset of the program header table within the file. We use it to locate the start of the program headers and parse them.
2), e_shoff
This field is the offset of the section header table within the file. We use it to locate the start of the section headers and parse them.
3), e_phnum
This field is the number of program headers, used when parsing the program header information.
4), e_shnum
This field is the number of section headers, used when parsing the section header information.
5), e_shstrndx
This field is the index of the string section (the section name string table) within the section header table. It is used to locate the string section.
Following the structure above, parsing is straightforward:
/**
 * Parse the ELF header information.
 * @param header
 */
private static void parseHeader(byte[] header, int offset) {
    if (header == null) {
        System.out.println("header is null");
        return;
    }
    /**
     * public byte[] e_ident = new byte[16];
     * public short e_type;
     * public short e_machine;
     * public int e_version;
     * public int e_entry;
     * public int e_phoff;
     * public int e_shoff;
     * public int e_flags;
     * public short e_ehsize;
     * public short e_phentsize;
     * public short e_phnum;
     * public short e_shentsize;
     * public short e_shnum;
     * public short e_shstrndx;
     */
    type_32.hdr.e_ident = Utils.copyBytes(header, 0, 16); // magic number
    type_32.hdr.e_type = Utils.copyBytes(header, 16, 2);
    type_32.hdr.e_machine = Utils.copyBytes(header, 18, 2);
    type_32.hdr.e_version = Utils.copyBytes(header, 20, 4);
    type_32.hdr.e_entry = Utils.copyBytes(header, 24, 4);
    type_32.hdr.e_phoff = Utils.copyBytes(header, 28, 4);
    type_32.hdr.e_shoff = Utils.copyBytes(header, 32, 4);
    type_32.hdr.e_flags = Utils.copyBytes(header, 36, 4);
    type_32.hdr.e_ehsize = Utils.copyBytes(header, 40, 2);
    type_32.hdr.e_phentsize = Utils.copyBytes(header, 42, 2);
    type_32.hdr.e_phnum = Utils.copyBytes(header, 44, 2);
    type_32.hdr.e_shentsize = Utils.copyBytes(header, 46, 2);
    type_32.hdr.e_shnum = Utils.copyBytes(header, 48, 2);
    type_32.hdr.e_shstrndx = Utils.copyBytes(header, 50, 2);
}
Each field is read according to the number of bytes it occupies.
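The header listing keeps every field as a raw byte array. If you want the five fields described above as numbers instead, one option is java.nio.ByteBuffer. The sketch below is only an illustration and not the article's code: the class name Elf32HeaderSketch and its field names are made up here, while the offsets (28, 32, 44, 48, 50) are the same ones used in the copyBytes calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: decodes the ELF32 header fields discussed above into numbers.
final class Elf32HeaderSketch {
    final int ePhoff;     // file offset of the program header table
    final int eShoff;     // file offset of the section header table
    final int ePhentsize; // size of one program header entry
    final int ePhnum;     // number of program headers
    final int eShentsize; // size of one section header entry
    final int eShnum;     // number of section headers
    final int eShstrndx;  // index of the section name string table

    Elf32HeaderSketch(byte[] file) {
        // An ELF32 header is 52 bytes and starts with 0x7f 'E' 'L' 'F'.
        if (file.length < 52 || file[0] != 0x7f || file[1] != 'E'
                || file[2] != 'L' || file[3] != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        // e_ident[EI_CLASS]: 1 = 32-bit, 2 = 64-bit
        if (file[4] != 1) {
            throw new IllegalArgumentException("not a 32-bit ELF file");
        }
        // e_ident[EI_DATA]: 1 = little-endian, 2 = big-endian
        ByteOrder order = (file[5] == 2) ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(file).order(order);
        ePhoff = buf.getInt(28);
        eShoff = buf.getInt(32);
        ePhentsize = buf.getShort(42) & 0xFFFF; // mask to treat the short as unsigned
        ePhnum = buf.getShort(44) & 0xFFFF;
        eShentsize = buf.getShort(46) & 0xFFFF;
        eShnum = buf.getShort(48) & 0xFFFF;
        eShstrndx = buf.getShort(50) & 0xFFFF;
    }
}
```

Usage would be something like new Elf32HeaderSketch(fileByteArys), after which ePhoff and ePhnum tell you where the program header table starts and how many entries to read.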
3. Parse the section header information
For the fields in this structure, see the description in the PDF; I will not explain them here. Later we will construct such a structure by hand and describe each field in detail then.
Following this structure, the parsing is also simple:
/**
 * Parse the section header information.
 */
public static void parseSectionHeaderList(byte[] header, int offset) {
    int header_size = 40; // 40 bytes
    int header_count = Utils.byte2Short(type_32.hdr.e_shnum); // number of headers
    byte[] des = new byte[header_size];
    for (int i = 0; i ...
Note that a file generally has multiple section headers, so we store them in a List.
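As a rough illustration of walking the section header table, here is a minimal sketch that reads each 40-byte entry and resolves its name through the string section selected by e_shstrndx. It is not the article's implementation: the class name SectionHeaderSketch, the method names and the choice of ByteBuffer are assumptions, and it assumes a little-endian ELF32 file. The field offsets inside each entry (sh_name at 0, sh_type at 4, sh_offset at 16, sh_size at 20) follow the Elf32_Shdr layout in elf.h.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: reads an ELF32 section header table and resolves section names.
final class SectionHeaderSketch {
    static final int SHDR_SIZE = 40; // sizeof(Elf32_Shdr)

    final String name;
    final int type;   // sh_type
    final int offset; // sh_offset
    final int size;   // sh_size

    SectionHeaderSketch(String name, int type, int offset, int size) {
        this.name = name;
        this.type = type;
        this.offset = offset;
        this.size = size;
    }

    /** Reads shnum entries starting at file offset shoff (little-endian assumed). */
    static List<SectionHeaderSketch> parseAll(byte[] file, int shoff, int shnum, int shstrndx) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        // sh_offset of entry number shstrndx is where the section names are stored.
        int strtabOff = buf.getInt(shoff + shstrndx * SHDR_SIZE + 16);
        List<SectionHeaderSketch> out = new ArrayList<>();
        for (int i = 0; i < shnum; i++) {
            int base = shoff + i * SHDR_SIZE;
            int nameIdx = buf.getInt(base); // sh_name: index into the string section
            out.add(new SectionHeaderSketch(
                    readCString(file, strtabOff + nameIdx),
                    buf.getInt(base + 4),
                    buf.getInt(base + 16),
                    buf.getInt(base + 20)));
        }
        return out;
    }

    /** Reads a NUL-terminated ASCII string starting at the given file offset. */
    static String readCString(byte[] file, int start) {
        int end = start;
        while (end < file.length && file[end] != 0) {
            end++;
        }
        return new String(file, start, end - start, StandardCharsets.US_ASCII);
    }
}
```

The arguments shoff, shnum and shstrndx correspond to the header fields e_shoff, e_shnum and e_shstrndx described earlier.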
4. Parse the program header information
The fields are not explained here; see the PDF document for details.
We parse according to this structure:
/**
 * Parse the program header information.
 * @param header
 */
public static void parseProgramHeaderList(byte[] header, int offset) {
    int header_size = 32; // 32 bytes
    int header_count = Utils.byte2Short(type_32.hdr.e_phnum); // number of headers
    byte[] des = new byte[header_size];
    for (int i = 0; i ...
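For reference, walking the program header table could look roughly like the sketch below. This is an illustration under assumptions rather than the article's code: the class name ProgramHeaderSketch and its methods are invented here, the file is assumed to be a little-endian ELF32 file, and the field offsets follow the Elf32_Phdr layout in elf.h (p_type at 0, p_offset at 4, p_vaddr at 8, p_filesz at 16, p_memsz at 20, p_flags at 24).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: walks an ELF32 program header table.
final class ProgramHeaderSketch {
    static final int PHDR_SIZE = 32; // sizeof(Elf32_Phdr)
    static final int PT_LOAD = 1;
    static final int PT_DYNAMIC = 2;

    final int type, offset, vaddr, filesz, memsz, flags;

    ProgramHeaderSketch(ByteBuffer buf, int base) {
        type = buf.getInt(base);        // p_type
        offset = buf.getInt(base + 4);  // p_offset
        vaddr = buf.getInt(base + 8);   // p_vaddr
        filesz = buf.getInt(base + 16); // p_filesz
        memsz = buf.getInt(base + 20);  // p_memsz
        flags = buf.getInt(base + 24);  // p_flags
    }

    /** Reads phnum entries starting at file offset phoff (little-endian assumed). */
    static List<ProgramHeaderSketch> parseAll(byte[] file, int phoff, int phnum) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        List<ProgramHeaderSketch> out = new ArrayList<>();
        for (int i = 0; i < phnum; i++) {
            out.add(new ProgramHeaderSketch(buf, phoff + i * PHDR_SIZE));
        }
        return out;
    }

    /** Renders p_flags as three letters: PF_R = 4, PF_W = 2, PF_X = 1. */
    String flagString() {
        return ((flags & 4) != 0 ? "R" : "-")
                + ((flags & 2) != 0 ? "W" : "-")
                + ((flags & 1) != 0 ? "X" : "-");
    }
}
```

The arguments phoff and phnum correspond to the header fields e_phoff and e_phnum. The PT_LOAD and PT_DYNAMIC entries are the ones the C++ listing later in this article looks for.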
There are other structures to parse as well. They are not covered here because they are not used in the later articles, but they are still worth understanding; see the PDF document for details.
5. Verify the parsing results
With parsing complete, we need a print function for each structure, that is, a toString method, so that we can verify the results.
Then we use the readelf tool to view the structures of the so file and compare them against our output to check that the parsing is correct.
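One thing to watch for when comparing: the toString methods print the raw bytes of each field, while readelf prints decoded numbers. A small helper that turns the raw bytes into a number can make the comparison easier. The sketch below is an assumption for illustration (the class and method names are invented) and assumes the fields are stored little-endian.

```java
// Illustrative helper for comparing raw field bytes with the numbers readelf prints.
final class ReadelfCompareSketch {
    /** Decodes 1 to 4 little-endian bytes into an unsigned value. */
    static long leValue(byte[] field) {
        long v = 0;
        // Walk from the last (most significant) byte down to the first.
        for (int i = field.length - 1; i >= 0; i--) {
            v = (v << 8) | (field[i] & 0xFF);
        }
        return v;
    }

    /** Renders a field as "name: decimal (0xhex)". */
    static String describe(String name, byte[] field) {
        long v = leValue(field);
        return name + ": " + v + " (0x" + Long.toHexString(v) + ")";
    }
}
```

For example, describe("e_phoff", type_32.hdr.e_phoff) gives a line that can be checked by eye against the program header offset reported by readelf -h.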
Parsing code: http://download.csdn.net/detail/jiangwei0910410003/9204119
The code above is in Java. For programmers who prefer other languages, here is a C++ listing that works with the same ELF structures (it reads the headers and appends a new section to the file):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "elf.h"

/**
 * A very important macro with a simple job:
 * P: the segment address
 * ALIGNBYTES: the alignment in bytes
 * It rounds P up to an integer multiple of ALIGNBYTES (page alignment).
 * e.g. ALIGN(0x3e45, 0x1000) ==> 0x4000
 */
#define ALIGN(P, ALIGNBYTES) (((unsigned long)(P) + (ALIGNBYTES) - 1) & ~((ALIGNBYTES) - 1))

int addSectionFun(const char *, const char *, unsigned int);

int main()
{
    addSectionFun("D:\\libhello-jni.so", ".jiangwei", 0x1000);
    return 0;
}

int addSectionFun(const char *lpPath, const char *szSecname, unsigned int nNewSecSize)
{
    char name[50];
    FILE *fdr, *fdw;
    char *base = NULL;
    Elf32_Ehdr *ehdr;
    Elf32_Phdr *t_phdr, *load1 = NULL, *load2 = NULL, *dynamic = NULL;
    Elf32_Shdr *s_hdr;
    int flag = 0;
    int i = 0;
    unsigned mapSZ = 0;
    unsigned nLoop = 0;
    unsigned int nAddInitFun = 0;
    unsigned int nNewSecAddr = 0;
    unsigned int nModuleBase = 0;
    memset(name, 0, sizeof(name));
    if (nNewSecSize == 0) {
        return 0;
    }
    fdr = fopen(lpPath, "rb");
    strcpy(name, lpPath);
    if (strchr(name, '.')) {
        strcpy(strchr(name, '.'), "_new.so");
    } else {
        strcat(name, "_new");
    }
    fdw = fopen(name, "wb");
    if (fdr == NULL || fdw == NULL) {
        printf("Open file failed\n");
        return 1;
    }
    fseek(fdr, 0, SEEK_END);
    mapSZ = ftell(fdr); // size of the source file
    printf("mapSZ: 0x%x\n", mapSZ);

    base = (char *)malloc(mapSZ * 2 + nNewSecSize); // 2 * source file size + new section size
    printf("base %p\n", (void *)base);
    memset(base, 0, mapSZ * 2 + nNewSecSize);
    fseek(fdr, 0, SEEK_SET);
    // copy the source file content into base
    if (fread(base, 1, mapSZ, fdr) != mapSZ) {
        printf("fread fd failed\n");
        return 2;
    }

    // locate the program headers
    ehdr = (Elf32_Ehdr *)base;
    t_phdr = (Elf32_Phdr *)(base + sizeof(Elf32_Ehdr));
    for (i = 0; i < ehdr->e_phnum; i++) {
        if (t_phdr->p_type == PT_LOAD) {
            // flag only records whether the first LOAD segment has been seen
            if (flag == 0) {
                load1 = t_phdr;
                flag = 1;
                nModuleBase = load1->p_vaddr;
                printf("load1 = %p, offset = 0x%x\n", (void *)load1, load1->p_offset);
            } else {
                load2 = t_phdr;
                printf("load2 = %p, offset = 0x%x\n", (void *)load2, load2->p_offset);
            }
        }
        if (t_phdr->p_type == PT_DYNAMIC) {
            dynamic = t_phdr;
            printf("dynamic = %p, offset = 0x%x\n", (void *)dynamic, dynamic->p_offset);
        }
        t_phdr++;
    }

    // section headers
    s_hdr = (Elf32_Shdr *)(base + ehdr->e_shoff);
    // work out where the new section goes; this is the important part
    printf("addr: 0x%x\n", load2->p_paddr);
    nNewSecAddr = ALIGN(load2->p_paddr + load2->p_memsz - nModuleBase, load2->p_align);
    printf("new section add: %x\n", nNewSecAddr);

    if (load1->p_filesz < ALIGN(load2->p_paddr + load2->p_memsz, load2->p_align)) {
        printf("offset: %x\n", (unsigned)(ehdr->e_shoff + sizeof(Elf32_Shdr) * ehdr->e_shnum));
        // note the condition: it checks whether the section header table is at the end of the file
        if ((ehdr->e_shoff + sizeof(Elf32_Shdr) * ehdr->e_shnum) != mapSZ) {
            if (mapSZ + sizeof(Elf32_Shdr) * (ehdr->e_shnum + 1) > nNewSecAddr) {
                printf("unable to add a section\n");
                return 3;
            } else {
                // copy the section header table to the end of the original file
                memcpy(base + mapSZ, base + ehdr->e_shoff, sizeof(Elf32_Shdr) * ehdr->e_shnum);
                ehdr->e_shoff = mapSZ;
                mapSZ += sizeof(Elf32_Shdr) * ehdr->e_shnum; // add the section header table length
                s_hdr = (Elf32_Shdr *)(base + ehdr->e_shoff);
                printf("ehdr_offset: %x\n", ehdr->e_shoff);
            }
        }
    } else {
        nNewSecAddr = load1->p_filesz;
    }
    printf("you can also add %d sections\n",
           (int)((nNewSecAddr - ehdr->e_shoff) / sizeof(Elf32_Shdr) - ehdr->e_shnum - 1));

    // total file length after adding the section: original length + section name + section size
    int nWriteLen = nNewSecAddr + ALIGN(strlen(szSecname) + 1, 0x10) + nNewSecSize;
    printf("write len %x\n", nWriteLen);

    char *lpWriteBuf = (char *)malloc(nWriteLen); // nWriteLen: total size of the final file
    memset(lpWriteBuf, 0, nWriteLen);
    // ehdr->e_shstrndx is the index of the section name string table in the
    // section header table; update the size of that string section
    s_hdr[ehdr->e_shstrndx].sh_size =
        nNewSecAddr - s_hdr[ehdr->e_shstrndx].sh_offset + strlen(szSecname) + 1;
    strcpy(lpWriteBuf + nNewSecAddr, szSecname); // add the section name

    // build the new section header
    Elf32_Shdr newSecShdr = {0};
    newSecShdr.sh_name = nNewSecAddr - s_hdr[ehdr->e_shstrndx].sh_offset;
    newSecShdr.sh_type = SHT_PROGBITS;
    newSecShdr.sh_flags = SHF_WRITE | SHF_ALLOC | SHF_EXECINSTR;
    nNewSecAddr += ALIGN(strlen(szSecname) + 1, 0x10);
    newSecShdr.sh_size = nNewSecSize;
    newSecShdr.sh_offset = nNewSecAddr;
    newSecShdr.sh_addr = nNewSecAddr + nModuleBase;
    newSecShdr.sh_addralign = 4;

    // modify the program header information
    load1->p_filesz = nWriteLen;
    load1->p_memsz = nNewSecAddr + nNewSecSize;
    load1->p_flags = 7; // readable, writable and executable

    // increase the section count in the ELF header
    ehdr->e_shnum++;
    memcpy(lpWriteBuf, base, mapSZ); // copy mapSZ bytes from base to lpWriteBuf
    // append the new section header after the copied bytes
    memcpy(lpWriteBuf + mapSZ, &newSecShdr, sizeof(Elf32_Shdr));

    // write the file
    fseek(fdw, 0, SEEK_SET);
    fwrite(lpWriteBuf, 1, nWriteLen, fdw);
    fclose(fdw);
    fclose(fdr);
    free(base);
    free(lpWriteBuf);
    return 0;
}
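The ALIGN macro in the C++ listing rounds a value up to the next multiple of a power-of-two alignment. For readers following along in Java, the same arithmetic might be written as below. The class name AlignSketch and the method name alignUp are made up for this illustration, and the power-of-two check is an addition that the macro does not have.

```java
// Illustrative Java equivalent of the ALIGN macro in the C++ listing.
final class AlignSketch {
    /** Rounds value up to the next multiple of alignment, which must be a power of two. */
    static long alignUp(long value, long alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // Adding (alignment - 1) pushes value past the next boundary unless it is
        // already on one; the mask then clears the low bits.
        return (value + alignment - 1) & ~(alignment - 1);
    }
}
```

With the example from the macro's comment, alignUp(0x3e45, 0x1000) gives 0x4000, and a value already on a boundary, such as 0x4000, is returned unchanged.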
After reading the C++ code, I have to add a few words: notice how concise it is. The reason is simple: for byte-level file operations, C++ pointers are really powerful, and Java cannot match that convenience.
5. Summary
That concludes our introduction to the ELF file format. Writing a parser yourself gives you a deep understanding of the format; the best way to learn any file format is to write a tool class for it by hand. This article is the first of the reverse engineering journey and the foundation for the later ones. The next article describes how to manually add a data structure to an ELF file. Stay tuned.