0x00 Preface
The best way to understand the Dex file format is to find an introductory document, write a simple demo yourself, and then compare your analysis against 010 Editor. For documentation, see the official document Http://source.android.com/devices/tech/dalvik/dex-format.html; if English is hard for you, you can also find a Chinese version, for example, I ...
010 Editor is quite a useful tool; I also used it earlier to analyze ELF files. As long as the right template is installed, it can analyze many kinds of files. It is paid software, but there is a 30-day free trial.
But what if you are on a Mac? The relevant file is ~/.config/SweetScape/010 Editor.ini.
0x01 File Layout
A Dex file can be divided into 3 parts: the header, the index area (xxxx_ids), and the data area. The header describes the layout of the whole Dex file, including the size and offset of each index area. "ids" in the index area is short for identifiers, which identify each piece of data. The index area mainly holds offsets into the data area.
010 Editor displays every section except the data area. In its template, link_data is designated as map_list.
0x02 Header
The header describes the Dex file itself and holds the index for each of the other areas. 010 Editor describes the header with the structure struct header_item. Since "010 Editor" is a little tedious to type, I simply write "010" below.
Two data types are used: char and uint. The char here is the 8-bit char of C/C++, which differs from Java's 16-bit char. In the tool I wrote recently (described later), it can be shown as short/ushort. The official documentation defines it as ubyte, and we follow the official definition.
Structure Description:
ubyte 8-bit unsigned int
uint 32-bit unsigned int, little-endian
struct header_item
{
ubyte[8] magic;
uint checksum;
ubyte[20] signature;
uint file_size;
uint header_size;
uint endian_tag;
uint link_size;
uint link_off;
uint map_off;
uint string_ids_size;
uint string_ids_off;
uint type_ids_size;
uint type_ids_off;
uint proto_ids_size;
uint proto_ids_off;
uint field_ids_size;
uint field_ids_off;
uint method_ids_size;
uint method_ids_off;
uint class_defs_size;
uint class_defs_off;
uint data_size;
uint data_off;
}
Apart from magic, checksum, signature, file_size, header_size, endian_tag, and map_off, the fields come in pairs: _off is the offset of an area and _size is the number of elements in it. The seven unpaired fields mainly describe the Dex file itself:
magic: {0x64, 0x65, 0x78, 0x0A, 0x30, 0x33, 0x35, 0x00} = "dex\n035\0"
The byte in the middle is a newline (\n), and the 035 after it is the version number.
-
checksum: Adler-32 checksum of the entire file except magic and checksum, used to detect file corruption.
-
signature: SHA-1 hash of the entire file except magic, checksum, and signature, used to uniquely identify the file.
-
file_size: size of the whole Dex file in bytes.
-
header_size: size of the header area, currently fixed at 0x70 (112 bytes).
-
endian_tag: the byte-order tag. Dex files are little-endian, and this field holds the constant 0x12345678.
-
map_off: offset of the map_list (a list of map_item entries). It belongs to the data area, so its value is greater than or equal to data_off, and it usually sits at the end of the Dex file.
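To make the header layout concrete, here is a minimal Python sketch. It assumes the whole file has already been read into a bytes object, and parse_header and verify are hypothetical names, not a library API. The sketch unpacks the 0x70-byte header_item with the standard struct module. It then recomputes the checksum over everything after the checksum field (offset 12) and the signature over everything after the signature field (offset 32). It follows the official layout, which includes field_ids_size and field_ids_off between proto_ids_off and method_ids_size.

```python
import hashlib
import struct
import zlib

# magic (8 bytes), checksum (uint), signature (20 bytes), then 20 little-endian uints
HEADER_FMT = "<8sI20s20I"
HEADER_FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)

def parse_header(data: bytes) -> dict:
    """Decode the header_item at the start of a Dex file (hypothetical helper)."""
    magic, checksum, signature, *rest = struct.unpack_from(HEADER_FMT, data, 0)
    if magic[:4] != b"dex\n" or magic[7] != 0:
        raise ValueError("not a Dex file")
    header = dict(zip(HEADER_FIELDS, rest))
    header.update(magic=magic, version=magic[4:7].decode("ascii"),
                  checksum=checksum, signature=signature)
    return header

def verify(data: bytes, header: dict) -> bool:
    """Adler-32 covers data[12:], SHA-1 covers data[32:]."""
    ok_checksum = zlib.adler32(data[12:]) == header["checksum"]
    ok_signature = hashlib.sha1(data[32:]).digest() == header["signature"]
    return ok_checksum and ok_signature
```

The signature must be computed before the checksum when building a file, because the checksum range includes the signature bytes.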
0x03 String_ids
The string_ids section describes all the strings in the Dex file. Its format is simple: each entry holds just one offset, which points to a string in the string_data section:
The above description mentions the LEB128 (Little-Endian Base 128) format, a variable-length encoding built from 1-byte units. If the highest bit of a byte is 1, the next byte is also part of the value. This continues until a byte whose highest bit is 0. The remaining 7 bits of each byte carry the data, as shown in the following table. In Dalvik, a LEB128 value is at most 32 bits (at most 5 bytes); see Leb128.h in the Dalvik source.
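The decoding rule above can be sketched in a few lines of Python. This is a minimal illustration, and read_uleb128 is a hypothetical helper name. Each byte contributes its low 7 bits, shifted left by 7 times its position, and decoding stops at the first byte whose high bit is clear. Following the 32-bit limit mentioned above, the sketch rejects values longer than 5 bytes.

```python
def read_uleb128(data: bytes, offset: int) -> tuple:
    """Decode an unsigned LEB128 value; return (value, offset_after_value)."""
    result = 0
    for i in range(5):  # a 32-bit value needs at most 5 bytes
        byte = data[offset + i]
        result |= (byte & 0x7F) << (7 * i)  # low 7 bits are payload
        if byte & 0x80 == 0:                # high bit clear: last byte
            return result, offset + i + 1
    raise ValueError("uleb128 longer than 5 bytes")
```

For example, the bytes 80 7f decode to 0x00 + (0x7f << 7) = 16256.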
The data structure is:
ubyte 8-bit unsigned int
uint 32-bit unsigned int, little-endian
uleb128 unsigned LEB128, variable length
struct string_ids_item
{
uint string_data_off;
}
struct string_data_item
{
uleb128 utf16_size;
ubyte[] data;
}
data holds the string's value, encoded as MUTF-8 and terminated by a 0 byte. string_ids is especially important because many later sections refer directly to indices into string_ids. When writing a tool, you also need to extract string_ids for comparison.
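A minimal Python sketch of looking up a string by its string_ids index follows. The names get_string and read_uleb128 are hypothetical. The sketch reads the 4-byte string_data_off, skips the uleb128 utf16_size, and takes the bytes up to the terminating 0. As a simplification, it decodes the bytes as standard UTF-8. That matches MUTF-8 for ordinary text but not for its special encodings of NUL and supplementary characters.

```python
import struct

def read_uleb128(data: bytes, offset: int) -> tuple:
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

def get_string(data: bytes, string_ids_off: int, idx: int) -> str:
    """Resolve string_ids[idx] to its text (hypothetical helper)."""
    # string_id_item is a single uint: string_data_off
    (data_off,) = struct.unpack_from("<I", data, string_ids_off + 4 * idx)
    # string_data_item: uleb128 utf16_size, then bytes ending in 0x00
    _utf16_size, pos = read_uleb128(data, data_off)
    end = data.index(b"\x00", pos)
    # Simplification: plain UTF-8 decoding, MUTF-8 edge cases are not handled
    return data[pos:end].decode("utf-8", errors="replace")
```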
0x04 Type_ids
The type_ids section indexes all data types in the Dex file, including class types, array types, and primitive types. Each element in the section is a type_ids_item, described as follows:
uint 32-bit unsigned int, little-endian
struct type_ids_item
{
uint descriptor_idx; //-->string_ids
}
descriptor_idx in type_ids_item is an index into string_ids. The string it points to is the descriptor of this type.
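The sketch below shows, in Python, how a type index turns into something readable. Both function names are hypothetical. descriptor_idx reads the uint in type_ids, which you would then resolve through string_ids. descriptor_to_java converts the resulting descriptor string into Java syntax. It covers the three kinds of types listed above: class types ("L...;"), array types (a leading "[" per dimension), and primitive types (single letters).

```python
import struct

PRIMITIVES = {"V": "void", "Z": "boolean", "B": "byte", "S": "short",
              "C": "char", "I": "int", "J": "long", "F": "float", "D": "double"}

def descriptor_idx(data: bytes, type_ids_off: int, type_idx: int) -> int:
    """Read type_ids[type_idx].descriptor_idx, an index into string_ids."""
    return struct.unpack_from("<I", data, type_ids_off + 4 * type_idx)[0]

def descriptor_to_java(desc: str) -> str:
    """Turn a descriptor such as "[Ljava/lang/String;" into Java syntax."""
    dims = len(desc) - len(desc.lstrip("["))  # one '[' per array dimension
    base = desc[dims:]
    if base.startswith("L") and base.endswith(";"):
        name = base[1:-1].replace("/", ".")   # class type
    else:
        name = PRIMITIVES[base]               # primitive type
    return name + "[]" * dims
```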
0x05 Proto_ids
Proto stands for method prototype: the prototype of a method in the Java language, that is, its return type and parameter types. The elements of proto_ids are proto_id_item, structured as follows:
uint 32-bit unsigned int, little-endian
struct proto_id_item
{
uint shorty_idx; //-->string_ids
uint return_type_idx; //-->type_ids
uint parameters_off;
}
-
shorty_idx: like descriptor_idx in type_ids, its value is an index into string_ids. The string it points to is a short-form description (the "shorty") of the method prototype.
-
return_type_idx: an index into type_ids giving the return type of the method prototype.
-
parameters_off: offset of the prototype's parameter list, whose format is type_list (described below). If the method has no parameters, the value is 0.
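Here is a small Python sketch of reading a proto_id_item and of how a shorty relates to the full descriptors. The function names are hypothetical. Each proto_id_item is three uints (12 bytes). In a shorty, the return type comes first and is followed by one character per parameter, and every reference type (class or array) collapses to "L".

```python
import struct

PROTO_ID = struct.Struct("<3I")  # shorty_idx, return_type_idx, parameters_off

def read_proto_id(data: bytes, proto_ids_off: int, idx: int) -> tuple:
    """Return (shorty_idx, return_type_idx, parameters_off) for proto_ids[idx]."""
    return PROTO_ID.unpack_from(data, proto_ids_off + PROTO_ID.size * idx)

def shorty_char(desc: str) -> str:
    # Class types ("L...;") and arrays ("[...") both become 'L'
    return "L" if desc[0] in "L[" else desc[0]

def make_shorty(return_desc: str, param_descs: list) -> str:
    # Return type first, then one character per parameter
    return shorty_char(return_desc) + "".join(shorty_char(d) for d in param_descs)
```

For example, a method `void f(int, String, int[])` has the shorty "VILL".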
0x06 Field_ids
The field_ids section holds all fields referenced by the Dex file. Each element in the section is a field_id_item, structured as follows:
ushort 16-bit unsigned int, little-endian
uint 32-bit unsigned int, little-endian
struct field_id_item
{
ushort class_idx; //-->type_ids
ushort type_idx; //-->type_ids
uint name_idx; //-->string_ids
}
-
class_idx: the class type the field belongs to. Its value is an index into type_ids and must refer to a class type.
-
type_idx: the type of the field. Its value is also an index into type_ids.
-
name_idx: the name of the field. Its value is an index into string_ids.
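Because every field_id_item has a fixed 8-byte layout (ushort, ushort, uint), the whole section can be read as an array. The following is a minimal Python sketch, and FieldId and read_field_ids are hypothetical names. Resolving class_idx, type_idx, and name_idx to text would then go through type_ids and string_ids as described above.

```python
import struct
from typing import NamedTuple

class FieldId(NamedTuple):
    class_idx: int  # -> type_ids, the class that defines the field
    type_idx: int   # -> type_ids, the field's type
    name_idx: int   # -> string_ids, the field's name

FIELD_ID = struct.Struct("<HHI")  # 8 bytes per field_id_item

def read_field_ids(data: bytes, field_ids_off: int, field_ids_size: int) -> list:
    """Read field_ids_size consecutive field_id_item entries."""
    return [FieldId(*FIELD_ID.unpack_from(data, field_ids_off + FIELD_ID.size * i))
            for i in range(field_ids_size)]
```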
0x07 Method_ids
method_ids is the last section of the index area and describes all methods in the Dex file. Its elements are method_id_item, whose structure is similar to field_id_item:
ushort 16-bit unsigned int, little-endian
uint 32-bit unsigned int, little-endian
struct method_id_item
{
ushort class_idx; //-->type_ids
ushort proto_idx; //-->proto_ids
uint name_idx; //-->string_ids
}
-
class_idx: the class type the method belongs to. Its value is an index into type_ids and must refer to a class type. 16-bit (ushort) indices are also why a single Dex file can reference at most 65,536 methods: the invoke instructions encode the method index in 16 bits, so anything larger must be split across multiple Dex files (multidex).
-
proto_idx: the prototype of the method. Its value is an index into proto_ids.
-
name_idx: the name of the method. Its value is an index into string_ids.
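To show how the three indices of a method_id_item fit together, here is a Python sketch. The function names are hypothetical. The class descriptor comes from type_ids, the parameter and return descriptors come from proto_ids (through type_list and type_ids), and the name comes from string_ids. Once those pieces are resolved, they can be printed in a notation similar to the one smali uses.

```python
import struct

METHOD_ID = struct.Struct("<HHI")  # class_idx, proto_idx, name_idx: 8 bytes

def read_method_id(data: bytes, method_ids_off: int, idx: int) -> tuple:
    """Return (class_idx, proto_idx, name_idx) for method_ids[idx]."""
    return METHOD_ID.unpack_from(data, method_ids_off + METHOD_ID.size * idx)

def format_method_ref(class_desc: str, name: str,
                      param_descs: list, return_desc: str) -> str:
    """Join already-resolved pieces into "Lpkg/Cls;->name(params)ret"."""
    return "%s->%s(%s)%s" % (class_desc, name, "".join(param_descs), return_desc)
```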
0x08 Class_defs
The class_defs section holds the class definitions. Its structure is very complex, nested layer upon layer, and it left me a little dizzy. Let's look at the structure diagram in 010:
Just looking at it is dizzying, let alone parsing it.
Class_def_item
The CLASS_DEF_ITEM structure is described as follows:
uint 32-bit unsigned int, little-endian
struct class_def_item
{
uint class_idx; //-->type_ids
uint access_flags;
uint superclass_idx; //-->type_ids
uint interfaces_off; //-->type_list
uint source_file_idx; //-->string_ids
uint annotations_off; //-->annotation_directory_item
uint class_data_off; //-->class_data_item
uint static_value_off; //-->encoded_array_item
}
- class_idx: the class type being defined; its value is an index into type_ids. It must be a class type, not an array type or primitive type.
- access_flags: the access flags of the class, such as public, final, static, and so on. See "access_flags definitions" in dex-format.html for details.
- superclass_idx: the type of the superclass, an index into type_ids like class_idx (NO_INDEX if the class has no superclass).
- interfaces_off: an offset pointing to the interfaces the class implements, stored as a type_list. If the class has no interfaces, the value is 0.
- source_file_idx: the source file name, an index into string_ids. If this information is missing, the value is NO_INDEX = 0xFFFFFFFF.
- annotations_off: an offset into the data area pointing to the class's annotations, in annotations_directory_item format. If there are none, the value is 0.
- class_data_off: an offset into the data area pointing to the data used by this class, in class_data_item format. If there is none, the value is 0. This structure carries a lot of content, describing the class's fields, its methods, the methods' code, and other information; class_data_item is described below.
- static_value_off: an offset into the data area pointing to a list in encoded_array_item format that holds the initial values of the static fields. If there is none, the value is 0.
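A class_def_item is eight uints (32 bytes), so reading one is a single struct call. Below is a minimal Python sketch with hypothetical helper names. Field names follow the struct above, with interfaces_off taken from the official spelling. It also shows the two "absent" conventions: NO_INDEX (0xFFFFFFFF) for indices and 0 for offsets.

```python
import struct

NO_INDEX = 0xFFFFFFFF
CLASS_DEF_FIELDS = ("class_idx", "access_flags", "superclass_idx", "interfaces_off",
                    "source_file_idx", "annotations_off", "class_data_off",
                    "static_value_off")

def read_class_def(data: bytes, class_defs_off: int, idx: int) -> dict:
    """Decode class_defs[idx]; each class_def_item is 8 uints = 32 bytes."""
    values = struct.unpack_from("<8I", data, class_defs_off + 32 * idx)
    item = dict(zip(CLASS_DEF_FIELDS, values))
    # Missing indices use NO_INDEX, missing offsets use 0
    item["has_source_file"] = item["source_file_idx"] != NO_INDEX
    item["has_interfaces"] = item["interfaces_off"] != 0
    item["has_class_data"] = item["class_data_off"] != 0
    return item
```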
Type_list
type_list lives in the data section; class_def_item->interfaces_off (and also proto_id_item->parameters_off) points here. The data structure is as follows:
ushort 16-bit unsigned int, little-endian
uint 32-bit unsigned int, little-endian
struct type_list
{
uint size;
type_item list[size];
}
struct type_item
{
ushort type_idx; //-->type_ids
}
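Reading a type_list is a uint count followed by that many ushort type indices. The following is a minimal Python sketch with the hypothetical name read_type_list. It treats offset 0 as "no list", matching the convention for interfaces_off and parameters_off.

```python
import struct

def read_type_list(data: bytes, offset: int) -> list:
    """Return the type_idx values of the type_list at offset ([] if offset is 0)."""
    if offset == 0:
        return []
    (size,) = struct.unpack_from("<I", data, offset)
    # size ushort entries follow the 4-byte size field
    return list(struct.unpack_from("<%dH" % size, data, offset + 4))
```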
Annotations_directory_item
class_def_item->annotations_off points into the data section, to the description of the class's annotation-related data. The data structure is as follows:
uint 32-bit unsigned int, little-endian
struct annotations_directory_item
{
uint class_annotations_off; //-->annotation_set_item
uint fields_size;
uint annotated_methods_size;
uint annotated_parameters_size;
field_annotation field_annotations[fields_size];
method_annotation method_annotations[annotated_methods_size];
parameter_annotation parameter_annotations[annotated_parameters_size];
}
struct field_annotation
{
uint field_idx;
uint annotations_off; //-->annotation_set_item
}
struct method_annotation
{
uint method_idx;
uint annotations_off; //-->annotation_set_item
}
struct parameter_annotation
{
uint method_idx;
uint annotations_off; //-->annotation_set_ref_list
}
-
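class_annotations_off: offset of an annotation_set_item holding the annotations on the class itself (0 if there are none). See dex-format.html for the specific format.
-
fields_size: number of fields with annotations (entries in field_annotations).
-
annotated_methods_size: number of methods with annotations (entries in method_annotations).
-
annotated_parameters_size: number of methods with parameter annotations (entries in parameter_annotations).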
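Since the three arrays follow the 16-byte fixed part back to back, each sized by its own count, they can be read sequentially. This Python sketch uses the hypothetical name read_annotations_directory and returns (index, annotations_off) pairs without following the offsets any further.

```python
import struct

def read_annotations_directory(data: bytes, offset: int) -> dict:
    """Decode an annotations_directory_item into its offset and three pair lists."""
    class_off, n_fields, n_methods, n_params = struct.unpack_from("<4I", data, offset)
    pos = offset + 16  # the arrays start right after the four uints

    def pairs(count):
        # Each entry is (field_idx or method_idx, annotations_off): 2 uints
        nonlocal pos
        out = [struct.unpack_from("<2I", data, pos + 8 * i) for i in range(count)]
        pos += 8 * count
        return out

    return {
        "class_annotations_off": class_off,
        "field_annotations": pairs(n_fields),
        "method_annotations": pairs(n_methods),
        "parameter_annotations": pairs(n_params),
    }
```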
Class_data_item
class_data_off points to a class_data_item in the data area. class_data_item holds the various data used by this class, and its structure is as follows:
uleb128 unsigned little-endian base 128
struct class_data_item
{
uleb128 static_fields_size;
uleb128 instance_fields_size;
uleb128 direct_methods_size;
uleb128 virtual_methods_size;
encoded_field static_fields[static_fields_size];
encoded_field instance_fields[instance_fields_size];
encoded_method direct_methods[direct_methods_size];
encoded_method virtual_methods[virtual_methods_size];
}
struct encoded_field
{
uleb128 field_idx_diff;
uleb128 access_flags;
}
struct encoded_method
{
uleb128 method_idx_diff;
uleb128 access_flags;
uleb128 code_off;
}
Class_data_item
-
Static_fields_size: Number of static member variables
-
Instance_fields_size: Number of instance member variables
-
Direct_methods_size: Number of direct functions
-
Virtual_methods_size: Number of virtual functions
Here are descriptions of the encoded_method members:
-
method_idx_diff: the method_idx prefix indicates that the value refers to an index into method_ids. The _diff suffix indicates that it is stored as a difference from the method_idx of the previous element in the encoded_method[] array; the first element holds its index directly. encoded_field->field_idx_diff works the same way. The compiled Hello.dex file uses no class fields, so I did not examine that closely; for details, refer to the official dex-format.html.
-
Access_flags: Access rights, such as public, private, static, final, and so on.
-
code_off: an offset into the data area pointing to this method's implementation (0 for abstract or native methods). The structure it points to is code_item, which has nearly 10 elements.
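Putting the uleb128 sizes and the _diff encoding together, a class_data_item can be decoded as shown in this minimal Python sketch. Reader and parse_class_data are hypothetical names. The running index restarts at 0 for each of the four lists, and each *_idx_diff is added to the previous element's index.

```python
class Reader:
    """Tiny cursor over a bytes object (hypothetical helper)."""
    def __init__(self, data: bytes, pos: int = 0):
        self.data, self.pos = data, pos

    def uleb128(self) -> int:
        result, shift = 0, 0
        while True:
            byte = self.data[self.pos]
            self.pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7

def parse_class_data(data: bytes, offset: int) -> dict:
    r = Reader(data, offset)
    n_sf, n_if, n_dm, n_vm = (r.uleb128() for _ in range(4))

    def read_fields(count):
        idx, out = 0, []
        for _ in range(count):
            idx += r.uleb128()               # field_idx_diff
            out.append((idx, r.uleb128()))   # (field_idx, access_flags)
        return out

    def read_methods(count):
        idx, out = 0, []
        for _ in range(count):
            idx += r.uleb128()               # method_idx_diff
            flags, code_off = r.uleb128(), r.uleb128()
            out.append((idx, flags, code_off))
        return out

    return {"static_fields": read_fields(n_sf),
            "instance_fields": read_fields(n_if),
            "direct_methods": read_methods(n_dm),
            "virtual_methods": read_methods(n_vm)}
```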
Code_item
The CODE_ITEM structure describes the specific implementation of a method, and its structure is described as follows:
struct code_item
{
ushort registers_size;
ushort ins_size;
ushort outs_size;
ushort tries_size;
uint debug_info_off;
uint insns_size;
ushort insns [insns_size];
ushort padding; // optional
try_item tries[tries_size]; // optional
encoded_catch_handler_list handlers; // optional
}
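The last 3 fields are optional and may or may not be present, depending on the code. padding appears only when tries_size is non-zero and insns_size is odd; tries and handlers appear only when tries_size is non-zero.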
-
registers_size: the number of registers used by this code.
-
ins_size: the number of words of incoming arguments to the method.
-
outs_size: the number of words of outgoing argument space this code needs when calling other methods.
-
tries_size: the number of try_item structures.
-
debug_info_off: an offset pointing to this code's debug information, a debug_info_item structure (0 if there is none).
-
insns_size: the size of the instruction list, in 16-bit units. "insns" is short for instructions.
-
padding: two bytes of value 0 that keep tries 4-byte aligned.
-
tries and handlers: used for exception handling in Java, i.e. the familiar try/catch syntax.
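The fixed part of code_item is 16 bytes (four ushorts and two uints), followed by insns. The optional padding only matters for locating tries. Here is a minimal Python sketch, with the hypothetical name parse_code_item. It decodes the header and insns and computes where tries would start, without parsing try_item or the handler list.

```python
import struct

# registers_size, ins_size, outs_size, tries_size, debug_info_off, insns_size
CODE_ITEM_HEADER = struct.Struct("<4HII")  # 16 bytes

def parse_code_item(data: bytes, offset: int) -> dict:
    """Decode the fixed part of a code_item plus its insns array."""
    (registers_size, ins_size, outs_size, tries_size,
     debug_info_off, insns_size) = CODE_ITEM_HEADER.unpack_from(data, offset)
    insns_off = offset + CODE_ITEM_HEADER.size
    insns = list(struct.unpack_from("<%dH" % insns_size, data, insns_off))
    tries_off = None
    if tries_size:
        tries_off = insns_off + 2 * insns_size
        if insns_size % 2:
            tries_off += 2  # skip padding so tries is 4-byte aligned
    return {"registers_size": registers_size, "ins_size": ins_size,
            "outs_size": outs_size, "tries_size": tries_size,
            "debug_info_off": debug_info_off, "insns": insns,
            "tries_off": tries_off}
```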
Encoded_array_item
class_def_item->static_value_off points to this data.
uleb128 unsigned LEB128, variable length
struct encoded_array_item
{
encoded_array value;
}
struct encoded_array
{
uleb128 size;
encoded_value values[size];
}
0x09 map_list
Most items in map_list duplicate the corresponding descriptions in the header, namely the offset and size of each area. map_list is more complete, however, and also includes header_item, type_list, string_data_item, debug_info_item, and other information.
010 shows map_list as:
The data structure is:
ushort 16-bit unsigned int, little-endian
uint 32-bit unsigned int, little-endian
struct map_list
{
uint size;
map_item list [size];
}
struct map_item
{
ushort type;
ushort unuse;
uint size;
uint offset;
}
map_list starts with a uint size, followed by size map_item entries. map_item has 4 elements:
- type: the kind of item; the values are defined under "Type Codes" in the Dalvik Executable Format document.
- size: the number of items of that type found at this location.
- offset: the offset of the first such item from the start of the file.
- unuse: padding for alignment, with no practical use.
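A map_list is easy to walk because each map_item is a fixed 12 bytes. The following minimal Python sketch uses the hypothetical name read_map_list. It maps the commonly seen type codes from the official "Type Codes" table to names and falls back to the raw hex value for any other code.

```python
import struct

TYPE_NAMES = {
    0x0000: "header_item", 0x0001: "string_id_item", 0x0002: "type_id_item",
    0x0003: "proto_id_item", 0x0004: "field_id_item", 0x0005: "method_id_item",
    0x0006: "class_def_item", 0x1000: "map_list", 0x1001: "type_list",
    0x1002: "annotation_set_ref_list", 0x1003: "annotation_set_item",
    0x2000: "class_data_item", 0x2001: "code_item", 0x2002: "string_data_item",
    0x2003: "debug_info_item", 0x2004: "annotation_item",
    0x2005: "encoded_array_item", 0x2006: "annotations_directory_item",
}

def read_map_list(data: bytes, map_off: int) -> list:
    """Return (type_name, size, offset) for every map_item at map_off."""
    (size,) = struct.unpack_from("<I", data, map_off)
    items = []
    for i in range(size):
        # map_item: ushort type, ushort unuse, uint size, uint offset (12 bytes)
        type_code, _unuse, count, offset = struct.unpack_from(
            "<HHII", data, map_off + 4 + 12 * i)
        name = TYPE_NAMES.get(type_code, "0x%04x" % type_code)
        items.append((name, count, offset))
    return items
```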
[Android Security] Dex File Format Analysis