Android reverse journey --- explanation of the SO (ELF) File Format

Last Update:2015-10-26 Source: Internet

Author: User


1. Preface

Today we officially begin our Android reverse-engineering journey. Reverse engineering is a challenging and somewhat mysterious field, and most Android developers are curious about it: successfully cracking someone else's protection brings a real sense of accomplishment. We should not study cracking alone, though. We also need to study protection and encryption, because the two sides constrain and drive each other. In practice, the native layer, that is, cracking so files, tends to be the biggest headache. So this article takes a closer look at what is inside a so file. A so file on Android is an ELF file, so we start with the ELF file format, and the best way to understand it is to write a small tool class that parses an ELF file by hand.

2. Prepare materials

There is already plenty of introductory material on the ELF file format on the Internet, so I will not repeat it all here. Much of that material is scattered and confusing, however, so I recommend two references that are the most complete and the clearest. They are the only two I relied on while writing the parser:

The first reference is a classic diagram of the ELF format by the author known as Feichong (非虫, literally "non-worm"):

The diagram is very detailed, and the Java parsing code below follows it. Some of the data structures in the diagram are not fully explained, however, which is where the second reference comes in.

The second reference: the standard ELF format document from a Peking University laboratory

http://download.csdn.net/detail/jiangwei0910410003/9204051

This document is not described in detail here. Its contents are explained later, as each structure is parsed.

Both references are classics and deserve a careful read. They are also the basis for all the work that follows.

3. Tools

We also need a tool that is very useful when parsing the ELF file below, because its output is the reference against which we check our own parsing results.

It is the well-known readelf command.

readelf is a Linux tool and is not available on Windows out of the box, so on Windows we first need to install Cygwin. For installation instructions, see this article:

http://blog.csdn.net/jiangwei0910410003/article/details/17710243

In case the download causes trouble, I have also uploaded a copy to a cloud drive:

http://pan.baidu.com/s/1C1Zci

After downloading, one file needs to be edited before the package can be used: the path inside it must be changed to the bin directory of your local cygwin64 installation, otherwise it will not run correctly. After the change, run Cygwin.bat.

We will not cover readelf in detail here, only the commands used in this article:

1. readelf -h xxx.so

View the ELF header of the so file

2. readelf -S xxx.so

View the section headers of the so file

3. readelf -l xxx.so

View the program headers (segments) of the so file

4. readelf -a xxx.so

View all of the information in the so file

readelf has many more options, which are well documented elsewhere, so I will not go into them here.

4. Parsing the ELF file in practice (Java code & C++ code)

Having covered the ELF format references and the readelf tool, we now turn to practice and parse a libhello-jni.so file by hand in Java. The libhello-jni.so file can be downloaded here:

http://download.csdn.net/detail/jiangwei0910410003/9204087

1. First, define the ELF data structures

We need the structure definitions from the elf.h header file. This file is also available on the Internet; here is a download link:

http://download.csdn.net/detail/jiangwei0910410003/9204081

Here are the ELF data structure classes defined in Java:

    package com.demo.parseso;

    import java.util.ArrayList;

    public class ElfType32 {

        public elf32_rel rel;
        public elf32_rela rela;
        public ArrayList<Elf32_Sym> symList = new ArrayList<Elf32_Sym>();
        public elf32_hdr hdr; // ELF header
        public ArrayList<elf32_phdr> phdrList = new ArrayList<elf32_phdr>();    // there may be multiple program headers
        public ArrayList<elf32_shdr> shdrList = new ArrayList<elf32_shdr>();    // there may be multiple section headers
        public ArrayList<elf32_strtb> strtbList = new ArrayList<elf32_strtb>(); // there may be multiple string tables

        public ElfType32() {
            rel = new elf32_rel();
            rela = new elf32_rela();
            hdr = new elf32_hdr();
        }

        /**
         * typedef struct elf32_rel {
         *   Elf32_Addr r_offset;
         *   Elf32_Word r_info;
         * } Elf32_Rel;
         */
        public class elf32_rel {
            public byte[] r_offset = new byte[4];
            public byte[] r_info = new byte[4];

            @Override
            public String toString() {
                return "r_offset:" + Utils.bytes2HexString(r_offset)
                        + ";r_info:" + Utils.bytes2HexString(r_info);
            }
        }

        /**
         * typedef struct elf32_rela {
         *   Elf32_Addr  r_offset;
         *   Elf32_Word  r_info;
         *   Elf32_Sword r_addend;
         * } Elf32_Rela;
         */
        public class elf32_rela {
            public byte[] r_offset = new byte[4];
            public byte[] r_info = new byte[4];
            public byte[] r_addend = new byte[4];

            @Override
            public String toString() {
                // The source printed r_info twice here; r_addend is what is meant.
                return "r_offset:" + Utils.bytes2HexString(r_offset)
                        + ";r_info:" + Utils.bytes2HexString(r_info)
                        + ";r_addend:" + Utils.bytes2HexString(r_addend);
            }
        }

        /**
         * typedef struct elf32_sym {
         *   Elf32_Word    st_name;
         *   Elf32_Addr    st_value;
         *   Elf32_Word    st_size;
         *   unsigned char st_info;
         *   unsigned char st_other;
         *   Elf32_Half    st_shndx;
         * } Elf32_Sym;
         */
        public static class Elf32_Sym {
            public byte[] st_name = new byte[4];
            public byte[] st_value = new byte[4];
            public byte[] st_size = new byte[4];
            public byte st_info;
            public byte st_other;
            public byte[] st_shndx = new byte[2];

            @Override
            public String toString() {
                return "st_name:" + Utils.bytes2HexString(st_name)
                        + "\nst_value:" + Utils.bytes2HexString(st_value)
                        + "\nst_size:" + Utils.bytes2HexString(st_size)
                        + "\nst_info:" + (st_info / 16)
                        + "\nst_other:" + (((short) st_other) & 0xF)
                        + "\nst_shndx:" + Utils.bytes2HexString(st_shndx);
            }
        }

        public void printSymList() {
            for (int i = 0; i < symList.size(); i++) {
                // ... (loop body is cut off in the source)
            }
        }

        /**
         * #define ELF_ST_BIND(x) ((x) >> 4)
         * #define ELF_ST_TYPE(x) (((unsigned int) x) & 0xf)
         */

        /**
         * typedef struct elf32_hdr {
         *   unsigned char e_ident[EI_NIDENT];
         *   Elf32_Half e_type;
         *   Elf32_Half e_machine;
         *   Elf32_Word e_version;
         *   Elf32_Addr e_entry;  // Entry point
         *   Elf32_Off  e_phoff;
         *   Elf32_Off  e_shoff;
         *   Elf32_Word e_flags;
         *   Elf32_Half e_ehsize;
         *   Elf32_Half e_phentsize;
         *   Elf32_Half e_phnum;
         *   Elf32_Half e_shentsize;
         *   Elf32_Half e_shnum;
         *   Elf32_Half e_shstrndx;
         * } Elf32_Ehdr;
         */
        public class elf32_hdr {
            public byte[] e_ident = new byte[16];
            public byte[] e_type = new byte[2];
            public byte[] e_machine = new byte[2];
            public byte[] e_version = new byte[4];
            public byte[] e_entry = new byte[4];
            public byte[] e_phoff = new byte[4];
            public byte[] e_shoff = new byte[4];
            public byte[] e_flags = new byte[4];
            public byte[] e_ehsize = new byte[2];
            public byte[] e_phentsize = new byte[2];
            public byte[] e_phnum = new byte[2];
            public byte[] e_shentsize = new byte[2];
            public byte[] e_shnum = new byte[2];
            public byte[] e_shstrndx = new byte[2];

            @Override
            public String toString() {
                return "magic:" + Utils.bytes2HexString(e_ident)
                        + "\ne_type:" + Utils.bytes2HexString(e_type)
                        + "\ne_machine:" + Utils.bytes2HexString(e_machine)
                        + "\ne_version:" + Utils.bytes2HexString(e_version)
                        + "\ne_entry:" + Utils.bytes2HexString(e_entry)
                        + "\ne_phoff:" + Utils.bytes2HexString(e_phoff)
                        + "\ne_shoff:" + Utils.bytes2HexString(e_shoff)
                        + "\ne_flags:" + Utils.bytes2HexString(e_flags)
                        + "\ne_ehsize:" + Utils.bytes2HexString(e_ehsize)
                        + "\ne_phentsize:" + Utils.bytes2HexString(e_phentsize)
                        + "\ne_phnum:" + Utils.bytes2HexString(e_phnum)
                        + "\ne_shentsize:" + Utils.bytes2HexString(e_shentsize)
                        + "\ne_shnum:" + Utils.bytes2HexString(e_shnum)
                        + "\ne_shstrndx:" + Utils.bytes2HexString(e_shstrndx);
            }
        }

        /**
         * typedef struct elf32_phdr {
         *   Elf32_Word p_type;
         *   Elf32_Off  p_offset;
         *   Elf32_Addr p_vaddr;
         *   Elf32_Addr p_paddr;
         *   Elf32_Word p_filesz;
         *   Elf32_Word p_memsz;
         *   Elf32_Word p_flags;
         *   Elf32_Word p_align;
         * } Elf32_Phdr;
         */
        public static class elf32_phdr {
            public byte[] p_type = new byte[4];
            public byte[] p_offset = new byte[4];
            public byte[] p_vaddr = new byte[4];
            public byte[] p_paddr = new byte[4];
            public byte[] p_filesz = new byte[4];
            public byte[] p_memsz = new byte[4];
            public byte[] p_flags = new byte[4];
            public byte[] p_align = new byte[4];

            @Override
            public String toString() {
                return "p_type:" + Utils.bytes2HexString(p_type)
                        + "\np_offset:" + Utils.bytes2HexString(p_offset)
                        + "\np_vaddr:" + Utils.bytes2HexString(p_vaddr)
                        + "\np_paddr:" + Utils.bytes2HexString(p_paddr)
                        + "\np_filesz:" + Utils.bytes2HexString(p_filesz)
                        + "\np_memsz:" + Utils.bytes2HexString(p_memsz)
                        + "\np_flags:" + Utils.bytes2HexString(p_flags)
                        + "\np_align:" + Utils.bytes2HexString(p_align);
            }
        }

        public void printPhdrList() {
            for (int i = 0; i < phdrList.size(); i++) {
                // ... (loop body is cut off in the source)
            }
        }

        // ... (the remainder of the class is cut off in the source)
    }

The listing is cut off in the source after printPhdrList. The section header and string table classes, referred to above as elf32_shdr and elf32_strtb by analogy with the other class names, follow the same pattern of one byte array per field.

None of this is difficult. When reading the structures defined in elf.h, just keep track of how many bytes each field occupies.

With the structures defined, let's look at how to parse the file.
Before parsing, we need to read the so file into a byte[] and create an instance of the data structure class:

    public static ElfType32 type_32 = new ElfType32();

    byte[] fileByteArys = Utils.readFile("so/libhello-jni.so");
    if (fileByteArys == null) {
        System.out.println("read file byte failed...");
        return;
    }
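
Before decoding any fields, it can help to confirm that the bytes really are a 32-bit ELF file. The following is a minimal sketch rather than the article's Utils class: the class name ElfMagicCheck and its methods are hypothetical, and the check assumes the standard e_ident layout (magic bytes 0x7F 'E' 'L' 'F' at offsets 0 to 3, and EI_CLASS at offset 4, where 1 means 32-bit).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the article's code.
final class ElfMagicCheck {
    private ElfMagicCheck() {}

    /** Reads the whole file into memory, returning null if it cannot be read. */
    static byte[] readFile(String path) {
        try {
            return Files.readAllBytes(Paths.get(path));
        } catch (IOException e) {
            return null;
        }
    }

    /** True if the buffer starts with the ELF magic and EI_CLASS marks it as 32-bit. */
    static boolean isElf32(byte[] data) {
        return data != null
                && data.length >= 16   // e_ident occupies the first 16 bytes
                && data[0] == 0x7F
                && data[1] == 'E'
                && data[2] == 'L'
                && data[3] == 'F'
                && data[4] == 1;       // EI_CLASS: 1 = ELFCLASS32
    }
}
```

Calling ElfMagicCheck.isElf32 on the byte array before parsing gives an early, readable failure instead of nonsense field values further on.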
           

            
2. Parsing the ELF header

The meaning of each field is described in the PDF document mentioned above.
Here are the most important fields, which will also be used later when we modify the so file:
1) e_phoff
The offset of the program header table within the file. We use it to locate the start of the program headers and parse them.
2) e_shoff
The offset of the section header table within the file. We use it to locate the start of the section headers and parse them.
3) e_phnum
The number of program headers, needed when parsing the program header table.
4) e_shnum
The number of section headers, needed when parsing the section header table.
5) e_shstrndx
The index, within the section header table, of the section that holds the section-name string table. It is used to locate that string table.

Following the structure layout, the header is easy to parse:

    /**
     * Parse the ELF header.
     * @param header the file contents
     * @param offset start of the header (unused: the ELF header is at offset 0)
     */
    private static void parseHeader(byte[] header, int offset) {
        if (header == null) {
            System.out.println("header is null");
            return;
        }
        // Layout: e_ident[16], e_type(2), e_machine(2), e_version(4), e_entry(4),
        // e_phoff(4), e_shoff(4), e_flags(4), e_ehsize(2), e_phentsize(2),
        // e_phnum(2), e_shentsize(2), e_shnum(2), e_shstrndx(2)
        type_32.hdr.e_ident = Utils.copyBytes(header, 0, 16); // magic number
        type_32.hdr.e_type = Utils.copyBytes(header, 16, 2);
        type_32.hdr.e_machine = Utils.copyBytes(header, 18, 2);
        type_32.hdr.e_version = Utils.copyBytes(header, 20, 4);
        type_32.hdr.e_entry = Utils.copyBytes(header, 24, 4);
        type_32.hdr.e_phoff = Utils.copyBytes(header, 28, 4);
        type_32.hdr.e_shoff = Utils.copyBytes(header, 32, 4);
        type_32.hdr.e_flags = Utils.copyBytes(header, 36, 4);
        type_32.hdr.e_ehsize = Utils.copyBytes(header, 40, 2);
        type_32.hdr.e_phentsize = Utils.copyBytes(header, 42, 2);
        type_32.hdr.e_phnum = Utils.copyBytes(header, 44, 2);
        type_32.hdr.e_shentsize = Utils.copyBytes(header, 46, 2);
        type_32.hdr.e_shnum = Utils.copyBytes(header, 48, 2);
        type_32.hdr.e_shstrndx = Utils.copyBytes(header, 50, 2);
    }

Each field is read by copying exactly the number of bytes it occupies.
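
The byte arrays copied above still have to be turned into numbers before e_phoff or e_shnum can be used as offsets and counts. Here is a minimal sketch of that step, assuming a little-endian file (the usual case for Android ARM and x86 libraries) and the 32-bit header offsets used in parseHeader. The class ElfHeaderSketch is hypothetical and uses java.nio.ByteBuffer instead of the article's Utils helpers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: decodes the five header fields discussed above.
final class ElfHeaderSketch {
    final long phoff;    // Elf32_Off is unsigned 32-bit, so widen to long
    final long shoff;
    final int phnum;     // Elf32_Half is unsigned 16-bit, so widen to int
    final int shnum;
    final int shstrndx;

    ElfHeaderSketch(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        phoff = buf.getInt(28) & 0xFFFFFFFFL;
        shoff = buf.getInt(32) & 0xFFFFFFFFL;
        phnum = buf.getShort(44) & 0xFFFF;
        shnum = buf.getShort(48) & 0xFFFF;
        shstrndx = buf.getShort(50) & 0xFFFF;
    }
}
```

The masks matter because Java has no unsigned integer types: without them, an offset or count with its top bit set would come out negative.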
            
            
3. Parsing the section headers

For the fields of this structure, see the description in the PDF; I will not explain them here. In a later article we will construct such a structure by hand, and the meaning of each field will be described in detail then.
Following this structure, parsing is again simple:

    /**
     * Parse the section header table.
     */
    public static void parseSectionHeaderList(byte[] header, int offset) {
        int header_size = 40; // each section header is 40 bytes
        int header_count = Utils.byte2Short(type_32.hdr.e_shnum); // number of section headers
        byte[] des = new byte[header_size];
        for (int i = 0; i < header_count; i++) {
            // ... (loop body is cut off in the source)
        }
    }

Note that a file generally contains multiple section headers, so they are stored in a List.
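
To make the role of e_shoff, e_shnum and e_shstrndx concrete, here is a minimal sketch that walks the section header table and resolves each section's name. It is not the article's code: the class SectionNameSketch is hypothetical, and it assumes a little-endian 32-bit file with 40-byte section headers in which sh_name sits at offset 0 and sh_offset at offset 16 of each entry.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: lists section names from a 32-bit little-endian ELF image.
final class SectionNameSketch {
    private static final int SHDR_SIZE = 40;

    static List<String> sectionNames(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int shoff = buf.getInt(32);                 // e_shoff
        int shnum = buf.getShort(48) & 0xFFFF;      // e_shnum
        int shstrndx = buf.getShort(50) & 0xFFFF;   // e_shstrndx

        // sh_offset of the section that holds the section-name string table
        int strtabOff = buf.getInt(shoff + shstrndx * SHDR_SIZE + 16);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < shnum; i++) {
            // sh_name is an offset into the string table, not the name itself
            int shName = buf.getInt(shoff + i * SHDR_SIZE);
            int start = strtabOff + shName;
            int end = start;
            while (end < file.length && file[end] != 0) {
                end++; // names are NUL-terminated
            }
            names.add(new String(file, start, end - start, StandardCharsets.US_ASCII));
        }
        return names;
    }
}
```

The output can be compared with the Name column printed by readelf -S for the same file.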
              
              
4. Parsing the program headers
The fields are not explained here. For details, see the PDF document.
We parse according to this structure:

    /**
     * Parse the program header table.
     * @param header the file contents
     */
    public static void parseProgramHeaderList(byte[] header, int offset) {
        int header_size = 32; // each program header is 32 bytes
        int header_count = Utils.byte2Short(type_32.hdr.e_phnum); // number of program headers
        byte[] des = new byte[header_size];
        for (int i = 0; i < header_count; i++) {
            // ... (loop body is cut off in the source)
        }
    }

There are other structures to parse as well. They are not covered here because they are not used in the following articles, but they are still worth understanding. For details, see the PDF document.
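
As an illustration of how e_phoff and e_phnum drive this step, the sketch below collects the LOAD segments of a file. It is a hypothetical example, not the article's implementation: the class ProgramHeaderSketch is made up, and it assumes a little-endian 32-bit file with 32-byte program headers laid out as in Elf32_Phdr (p_type at 0, p_offset at 4, p_vaddr at 8, p_filesz at 16, p_memsz at 20).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: extracts PT_LOAD entries from the program header table.
final class ProgramHeaderSketch {
    static final int PT_LOAD = 1;
    private static final int PHDR_SIZE = 32;

    /** Returns {p_offset, p_vaddr, p_filesz, p_memsz} for every PT_LOAD entry. */
    static List<long[]> loadSegments(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int phoff = buf.getInt(28);              // e_phoff
        int phnum = buf.getShort(44) & 0xFFFF;   // e_phnum

        List<long[]> result = new ArrayList<>();
        for (int i = 0; i < phnum; i++) {
            int base = phoff + i * PHDR_SIZE;
            if (buf.getInt(base) != PT_LOAD) {
                continue; // skip PT_DYNAMIC and every other segment type
            }
            result.add(new long[] {
                u32(buf, base + 4),   // p_offset
                u32(buf, base + 8),   // p_vaddr
                u32(buf, base + 16),  // p_filesz
                u32(buf, base + 20)   // p_memsz
            });
        }
        return result;
    }

    private static long u32(ByteBuffer buf, int index) {
        return buf.getInt(index) & 0xFFFFFFFFL;
    }
}
```

The values returned can be checked against the LOAD rows printed by readelf -l.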
                
5. Verifying the parsing results
Parsing is now complete. To check whether it is correct, we need a print function for each structure, which is why each class overrides toString.

Then we can use the readelf tool to view the same structures of the so file and compare them with our output.
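
One thing to watch for when comparing: readelf prints decoded numeric values, while a hex dump of a raw field shows its bytes in file order, which is reversed for a little-endian file. Assuming Utils.bytes2HexString prints bytes in array order (its source is not shown here), a small converter such as the hypothetical LittleEndianHex below makes the two directly comparable.

```java
// Hypothetical helper: renders a little-endian field the way readelf shows numbers.
final class LittleEndianHex {
    private LittleEndianHex() {}

    static String toHex(byte[] field) {
        long value = 0;
        // Walk from the last byte (most significant) down to the first (least significant).
        for (int i = field.length - 1; i >= 0; i--) {
            value = (value << 8) | (field[i] & 0xFF);
        }
        return "0x" + Long.toHexString(value);
    }
}
```

For example, the four bytes f4 30 00 00 stored in e_shoff correspond to the value 0x30f4.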
                
Parsing code: http://download.csdn.net/detail/jiangwei0910410003/9204119

The parsing above is written in Java. For programmers who prefer C++, here is C++ code that works with the same structures. It walks the headers and appends a new section to the file:
                
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "elf.h"

    /**
     * A very important macro with a simple job.
     * P: the value to align (a segment address)
     * ALIGNBYTES: the alignment in bytes
     * It rounds P up to the next integer multiple of ALIGNBYTES (page alignment),
     * e.g. ALIGN(0x3e45, 0x1000) ==> 0x4000
     */
    #define ALIGN(P, ALIGNBYTES) (((unsigned long)(P) + (ALIGNBYTES) - 1) & ~((unsigned long)(ALIGNBYTES) - 1))

    int addSectionFun(const char *, const char *, unsigned int);

    int main()
    {
        addSectionFun("D:\\libhello-jni.so", ".jiangwei", 0x1000);
        return 0;
    }

    int addSectionFun(const char *lpPath, const char *szSecname, unsigned int nNewSecSize)
    {
        char name[50];
        FILE *fdr, *fdw;
        char *base = NULL;
        Elf32_Ehdr *ehdr;
        Elf32_Phdr *t_phdr, *load1 = NULL, *load2 = NULL, *dynamic = NULL;
        Elf32_Shdr *s_hdr;
        int flag = 0;
        int i = 0;
        unsigned mapSZ = 0;
        unsigned int nNewSecAddr = 0;
        unsigned int nModuleBase = 0;

        memset(name, 0, sizeof(name));
        if (nNewSecSize == 0) {
            return 0;
        }
        fdr = fopen(lpPath, "rb");
        strcpy(name, lpPath);
        if (strchr(name, '.')) {
            strcpy(strchr(name, '.'), "_new.so");
        } else {
            strcat(name, "_new");
        }
        fdw = fopen(name, "wb");
        if (fdr == NULL || fdw == NULL) {
            printf("Open file failed\n");
            return 1;
        }
        fseek(fdr, 0, SEEK_END);
        mapSZ = ftell(fdr); // size of the source file
        base = (char *)malloc(mapSZ * 2 + nNewSecSize); // 2 * source file size + size of the new section
        if (base == NULL) {
            printf("malloc failed\n");
            return 2;
        }
        printf("base %p\n", (void *)base);
        memset(base, 0, mapSZ * 2 + nNewSecSize);
        fseek(fdr, 0, SEEK_SET);
        if (fread(base, 1, mapSZ, fdr) != mapSZ) { // copy the source file content into base
            printf("fread fd failed\n");
            return 2;
        }

        // locate the program headers
        ehdr = (Elf32_Ehdr *)base;
        t_phdr = (Elf32_Phdr *)(base + sizeof(Elf32_Ehdr));
        for (i = 0; i < ehdr->e_phnum; i++) {
            if (t_phdr->p_type == PT_LOAD) {
                // flag only records whether the first LOAD segment has been seen
                if (flag == 0) {
                    load1 = t_phdr;
                    flag = 1;
                    nModuleBase = load1->p_vaddr;
                    printf("load1 = %p, offset = 0x%x\n", (void *)load1, load1->p_offset);
                } else {
                    load2 = t_phdr;
                    printf("load2 = %p, offset = 0x%x\n", (void *)load2, load2->p_offset);
                }
            }
            if (t_phdr->p_type == PT_DYNAMIC) {
                dynamic = t_phdr;
                printf("dynamic = %p, offset = 0x%x\n", (void *)dynamic, dynamic->p_offset);
            }
            t_phdr++;
        }
        if (load1 == NULL || load2 == NULL) {
            printf("expected two LOAD segments\n");
            return 4;
        }

        // section headers
        s_hdr = (Elf32_Shdr *)(base + ehdr->e_shoff);
        // compute the position of the new section; this is the important part
        printf("addr: 0x%x\n", load2->p_paddr);
        nNewSecAddr = ALIGN(load2->p_paddr + load2->p_memsz - nModuleBase, load2->p_align);
        printf("new section add: %x\n", nNewSecAddr);

        if (load1->p_filesz < ALIGN(load2->p_paddr + load2->p_memsz, load2->p_align)) {
            printf("offset: %x\n", (unsigned int)(ehdr->e_shoff + sizeof(Elf32_Shdr) * ehdr->e_shnum));
            // Note this condition: it checks whether the section header table is at the very end of the file
            if ((ehdr->e_shoff + sizeof(Elf32_Shdr) * ehdr->e_shnum) != mapSZ) {
                if (mapSZ + sizeof(Elf32_Shdr) * (ehdr->e_shnum + 1) > nNewSecAddr) {
                    printf("unable to add a section\n");
                    return 3;
                } else {
                    // copy the section header table to the end of the original file
                    memcpy(base + mapSZ, base + ehdr->e_shoff, sizeof(Elf32_Shdr) * ehdr->e_shnum);
                    ehdr->e_shoff = mapSZ;
                    mapSZ += sizeof(Elf32_Shdr) * ehdr->e_shnum; // add the length of the section header table
                    s_hdr = (Elf32_Shdr *)(base + ehdr->e_shoff);
                    printf("ehdr_offset: %x\n", ehdr->e_shoff);
                }
            }
        } else {
            nNewSecAddr = load1->p_filesz;
        }
        printf("you can also add %d sections\n",
               (int)((nNewSecAddr - ehdr->e_shoff) / sizeof(Elf32_Shdr) - ehdr->e_shnum - 1));

        // total file length after the section is added: original length + section name + section size
        int nWriteLen = nNewSecAddr + ALIGN(strlen(szSecname) + 1, 0x10) + nNewSecSize;
        printf("write len %x\n", nWriteLen);

        char *lpWriteBuf = (char *)malloc(nWriteLen); // nWriteLen: total size of the final file
        memset(lpWriteBuf, 0, nWriteLen);
        // ehdr->e_shstrndx is the index of the section-name string table in the section header table;
        // grow that string table so that it covers the new name
        s_hdr[ehdr->e_shstrndx].sh_size =
            nNewSecAddr - s_hdr[ehdr->e_shstrndx].sh_offset + strlen(szSecname) + 1;
        strcpy(lpWriteBuf + nNewSecAddr, szSecname); // add the section name

        // build the new section header
        Elf32_Shdr newSecShdr = {0};
        newSecShdr.sh_name = nNewSecAddr - s_hdr[ehdr->e_shstrndx].sh_offset;
        newSecShdr.sh_type = SHT_PROGBITS;
        newSecShdr.sh_flags = SHF_WRITE | SHF_ALLOC | SHF_EXECINSTR;
        nNewSecAddr += ALIGN(strlen(szSecname) + 1, 0x10);
        newSecShdr.sh_size = nNewSecSize;
        newSecShdr.sh_offset = nNewSecAddr;
        newSecShdr.sh_addr = nNewSecAddr + nModuleBase;
        newSecShdr.sh_addralign = 4;

        // modify the program header
        load1->p_filesz = nWriteLen;
        load1->p_memsz = nNewSecAddr + nNewSecSize;
        load1->p_flags = 7; // readable, writable and executable

        // bump the section count in the ELF header
        ehdr->e_shnum++;
        memcpy(lpWriteBuf, base, mapSZ); // copy mapSZ bytes from base to lpWriteBuf
        memcpy(lpWriteBuf + mapSZ, &newSecShdr, sizeof(Elf32_Shdr)); // append the new section header

        // write the file
        fseek(fdw, 0, SEEK_SET);
        fwrite(lpWriteBuf, 1, nWriteLen, fdw);
        fclose(fdw);
        fclose(fdr);
        free(base);
        free(lpWriteBuf);
        return 0;
    }

                
After reading the C++ code, a few more words are in order: notice how concise the C++ version is. The reason is simple. When manipulating the bytes of a file, C++ pointers are extremely convenient, because a struct pointer can be laid directly over the buffer. Java cannot match this and has to copy each field out by hand.
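
The ALIGN macro in the C++ listing is worth isolating, since page alignment comes up whenever segments are moved or extended. Below is a minimal Java sketch of the same rounding. The class name AlignSketch is hypothetical, and the bit trick assumes the alignment is a power of two, which holds for the page size 0x1000 used above.

```java
// Hypothetical illustration of the ALIGN macro's rounding rule.
final class AlignSketch {
    private AlignSketch() {}

    /** Rounds p up to the next multiple of alignBytes (alignBytes must be a power of two). */
    static long alignUp(long p, long alignBytes) {
        return (p + alignBytes - 1) & ~(alignBytes - 1);
    }
}
```

With the example from the macro's comment, AlignSketch.alignUp(0x3e45, 0x1000) gives 0x4000, and a value that is already aligned is returned unchanged.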
5. Summary
This article introduced the ELF file format. Writing a parsing class yourself gives you a much deeper understanding of the format, and in general the best way to learn any file format is to write a tool class that parses it by hand. This is the first article of the reverse-engineering journey and the foundation for the articles that follow. The next article describes how to manually add a data structure to an ELF file, so stay tuned.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
