Android Mobile Reverse Engineering (III): The Android Dalvik Virtual Machine

Source: Internet
Author: User

Everyone knows that Java programs run on the Java virtual machine, but what about Android programs?
Although the Android platform uses the Java language to develop applications, Android programs do not run on a standard Java virtual machine. Google designed a dedicated virtual machine for the Android platform to run them: the Dalvik virtual machine (Dalvik VM).

The goals of this article:

    • A primer on the Dalvik virtual machine
    • Understand smali syntax well enough to read smali files

Dalvik overview

Dalvik features (relative to the JVM)
    • Small size and low memory footprint;
    • A proprietary DEX executable file format that is smaller and executes faster;
    • The constant pool uses 32-bit index values, so class names, method names, field names, and constants are addressed faster;
    • A register-based architecture with a complete instruction set;
    • Provides important functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection;
    • All Android programs run in Android system processes, and each process corresponds to one Dalvik virtual machine instance.
Differences between the Dalvik virtual machine and the Java virtual machine
    1. The Java virtual machine runs Java bytecode; the Dalvik virtual machine runs Dalvik bytecode.
    2. Dalvik executable files are smaller.
      A brief analysis:
      A tool in the SDK called dx converts Java bytecode to Dalvik bytecode. The dx tool rearranges the Java class files, eliminates the redundant information they contain, and avoids repeated file loading and parsing while the virtual machine initializes.

      An example:
      In Java, many string constants are repeated across multiple class files, which directly increases file size and also seriously slows down the virtual machine's parsing of those files. The dx tool handles this specifically: it breaks up the constant pools of all the Java class files, eliminates the redundant entries, and regroups them into a single constant pool that all classes share. Because dx compresses the constant pool in this way, identical strings and constants appear only once in the DEX file, which reduces its size.

    3. The Java virtual machine and the Dalvik virtual machine use different architectures.
      Simply put, the Java virtual machine is based on a stack architecture, while Dalvik is based on a register architecture.
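
The constant pool sharing described for the dx tool can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function name merge_constant_pools and the per-class string lists are hypothetical, and the real DEX format lays out its tables differently.

```python
def merge_constant_pools(class_pools):
    """Merge per-class string pools into one shared pool (toy model).

    Returns the shared pool plus, for every class, the list of
    shared-pool indexes that replace its original entries.
    """
    shared = []      # the single pool all classes will reference
    index_of = {}    # string -> index in the shared pool
    remapped = []
    for pool in class_pools:
        mapping = []
        for constant in pool:
            if constant not in index_of:
                # First occurrence: store the string exactly once.
                index_of[constant] = len(shared)
                shared.append(constant)
            mapping.append(index_of[constant])
        remapped.append(mapping)
    return shared, remapped


# Two hypothetical classes that repeat the same strings.
class_pools = [
    ["Ljava/lang/String;", "hello", "toString"],
    ["Ljava/lang/String;", "world", "toString"],
]
shared, remapped = merge_constant_pools(class_pools)
print(shared)
print(remapped)
```

Six entries across two classes shrink to four shared entries; each class keeps only small integer indexes into the shared pool.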

Dalvik instruction format

Dalvik assembly code consists of a series of Dalvik instructions, whose syntax is determined by the instruction's bit description and its instruction format identifier. The bit description conventions are as follows:

    • Each 16-bit word is separated by a space;
    • Each letter represents four bits, ordered from the high bits to the low bits. A vertical bar "|" may be placed between groups of four bits to separate different fields;
    • Uppercase letters from A to Z, in order, each denote a 4-bit field; "op" denotes the 8-bit opcode;
    • "Ø" indicates that all bits of the field are 0.

Example
"A|B|op BBBB F|E|D|C"
There are two spaces in the instruction, so it has three parts of 16 bits each; three 16-bit words make up this instruction.
In the first 16-bit word, "A|B|op", the high 8 bits consist of A and B, and the low 8 bits hold the opcode op.
The second 16-bit word is BBBB, which represents a 16-bit offset value.
The third 16-bit word consists of the four 4-bit fields F, E, D, and C, which here represent register parameters.
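
As a rough illustration of how such a bit description maps onto actual words, the Python sketch below splits three 16-bit words according to the layout in the example. The function name decode_units and the sample word values are hypothetical; the field names simply follow the letters used in the example.

```python
def decode_units(units):
    """Split three 16-bit words laid out as A|B|op BBBB F|E|D|C."""
    w0, w1, w2 = units
    return {
        "A": (w0 >> 12) & 0xF,   # highest 4 bits of the first word
        "B": (w0 >> 8) & 0xF,    # next 4 bits
        "op": w0 & 0xFF,         # low 8 bits hold the opcode
        "BBBB": w1,              # the whole second word
        "F": (w2 >> 12) & 0xF,   # third word: four 4-bit fields
        "E": (w2 >> 8) & 0xF,
        "D": (w2 >> 4) & 0xF,
        "C": w2 & 0xF,
    }


# Hypothetical sample words, chosen only to exercise every field.
fields = decode_units([0x206E, 0x0003, 0x0021])
print(fields)
```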

The bit description alone cannot determine the meaning of an instruction; the instruction format identifier must also be given to specify the format of the instruction. The conventions are as follows:

    • Most format identifiers consist of three characters: the first two are digits and the last is a letter;
    • The first digit is the number of 16-bit words the instruction consists of;
    • The second digit is the maximum number of registers the instruction uses; the special mark "r" indicates that a range of registers is used;
    • The third character is a type code letter that identifies the type of additional data the instruction uses (see the figure);
    • As a special case, there may be one more letter at the end: "s" means the instruction uses static linking, and "i" means the instruction should be handled inline.

Example

"22x" tells us three things:

    1. The instruction consists of two 16-bit words
    2. The instruction uses two registers
    3. No additional data is used
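
The reading of a format identifier can be sketched in Python as follows. The helper name parse_format_id and the dictionary keys are hypothetical, and the sketch assumes a well-formed identifier of at least three characters.

```python
def parse_format_id(fmt):
    """Break a format identifier such as "22x" into its parts."""
    uses_range = fmt[1] == "r"   # "r" marks a register range
    return {
        "units": int(fmt[0]),    # number of 16-bit words
        "max_registers": None if uses_range else int(fmt[1]),
        "register_range": uses_range,
        "type_code": fmt[2],     # kind of additional data
        "extra": fmt[3:],        # optional trailing "s" or "i"
    }


print(parse_format_id("22x"))
print(parse_format_id("3rc"))
```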

In addition, the Dalvik instruction syntax follows these conventions:

    • Each instruction starts with the opcode, followed by its parameters; the number of parameters varies, and parameters are separated by commas;
    • The parameters of each instruction start from the first part of the instruction. The op is in the low 8 bits; the high 8 bits can hold one 8-bit parameter or two 4-bit parameters, or can be empty. If the instruction is longer than 16 bits, the subsequent parts are parameters;
    • A parameter written as "vX" is a register, such as v0, v1, and so on;
    • A parameter written as "#+X" is a literal constant;
    • A parameter written as "+X" is an address offset relative to the instruction;
    • A parameter written as "kind@X" is a constant pool index, where kind is the constant pool type. For example, "string@BBBB" is the string constant pool index BBBB.

Example
"op vAA, string@BBBB"
The instruction uses one register parameter, vAA, which occupies the high 8 bits of the first 16-bit word, and an additional string constant pool index BBBB.
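
The operand notations above can be told apart mechanically. The following Python sketch is a minimal illustration; the function name classify_operand and the returned labels are hypothetical.

```python
import re


def classify_operand(text):
    """Classify one operand written in the notation described above."""
    text = text.strip()
    if re.fullmatch(r"v[0-9A-Z]+", text):
        return "register"
    if text.startswith("#+"):
        return "literal constant"
    if text.startswith("+"):
        return "relative address offset"
    if "@" in text:
        kind, _, _index = text.partition("@")
        return f"{kind} constant pool index"
    raise ValueError(f"unrecognized operand: {text!r}")


operands = "vAA, string@BBBB".split(",")
print([classify_operand(op) for op in operands])
```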

Dalvik registers

That concludes the primer; now for the main points.

Representation of types, methods, and fields in Dalvik bytecode
  1. Types
    Dalvik bytecode has only two kinds of types, primitive types and reference types; see the figure.

    Each Dalvik register is 32 bits wide. A value whose type is 32 bits or smaller fits in one register, while 64-bit types such as J (long) and D (double) are stored in two adjacent registers, for example v0 and v1, or vN and vN+1.
    The L type can represent any Java class. In Dalvik assembly code a class is written as "Lpackage/name/ObjectName;" (note the trailing semicolon); for example, "Ljava/lang/String;" is equivalent to String.
    The [ type represents arrays: "[" is followed by a type descriptor. For example, "[I" is a one-dimensional int array (int[]), "[[I" is int[][], and "[Ljava/lang/String;" is an array of String objects (String[]).
  2. Methods
    Dalvik describes a method by its method name, parameter types, and return type.
    The format is as follows:
    Lpackage/name/ObjectName;->methodName(III)Z
    Description:
    Lpackage/name/ObjectName; is the type that declares the method
    methodName is the method name
    (III) is the parameter list, here three int parameters
    Z is the return type, boolean
    Example
    method(I[[IILjava/lang/String;[Ljava/lang/String;)Ljava/lang/String;
    Based on the rules above, this converts to the following Java form:
    String method(int, int[][], int, String, String[])
  3. Fields
    Fields are similar to methods, except that instead of parameters and a return type there is the type of the field. The format is as follows:
    Lpackage/name/ObjectName;->fieldName:type
    Description:
    Lpackage/name/ObjectName; is the type that declares the field
    fieldName is the field name
    type is the field type
    fieldName and type are separated by a colon
    Example
    name:Ljava/lang/String;
    This converts to:
    String name;
    In Dalvik assembly code, a field declaration begins with the .field directive and, depending on the kind of field, may be preceded by a comment that starts with the pound sign "#".
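
The descriptor rules for types and methods can be checked with a short Python sketch. The function names parse_type and parse_method are hypothetical; the single-letter primitive codes are the standard descriptors (V void, Z boolean, B byte, S short, C char, I int, J long, F float, D double). The sketch assumes well-formed input and prints fully qualified class names.

```python
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def parse_type(desc, pos=0):
    """Parse one type descriptor starting at pos; return (java_name, next_pos)."""
    dims = 0
    while desc[pos] == "[":          # each "[" adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":             # reference type: L...; up to the semicolon
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                            # primitive type: a single letter
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def parse_method(desc):
    """Convert "name(params)ret" into a Java-style signature string."""
    name, rest = desc.split("(", 1)
    params_desc, ret_desc = rest.split(")", 1)
    params, pos = [], 0
    while pos < len(params_desc):
        java_type, pos = parse_type(params_desc, pos)
        params.append(java_type)
    ret, _ = parse_type(ret_desc)
    return f"{ret} {name}({', '.join(params)})"


print(parse_method("methodName(III)Z"))
print(parse_method("method(I[[IILjava/lang/String;[Ljava/lang/String;)Ljava/lang/String;"))
```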
Dalvik instruction set

Instruction features

The Dalvik instructions mimic the calling convention of the C language in their invocation format. The instruction syntax and mnemonics have the following characteristics:

    • Parameters are ordered destination first, then source;
    • Depending on the size and layout of the bytecode, some bytecodes carry a suffix to disambiguate them; the suffix follows the main name of the bytecode and is separated from it by a slash "/";
    • In the instruction set description, each letter in a width value represents 4 bits;
    • Depending on the size and type of the data operated on, some bytecodes add a name suffix to disambiguate them:
      • 32-bit regular-type bytecodes add no suffix;
      • 64-bit regular-type bytecodes add the -wide suffix;
      • Special-type bytecodes add a suffix based on the specific type, one of -boolean, -byte, -char, -short, -int, -long, -float, -double, -object, -string, -class, -void.

Example
"move-wide/from16 vAA, vBBBB"

move is the base bytecode; it identifies the basic operation.
wide is the name suffix; it indicates that the instruction operates on 64-bit data.
from16 is the bytecode suffix; it indicates that the source is a 16-bit register reference.
vAA is the destination register, which always comes before the source; its range is v0 to v255 (2^8 - 1).
vBBBB is the source register; its range is v0 to v65535 (2^16 - 1).
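
The naming scheme in this example can be taken apart with a few lines of Python. This is a simplified sketch: split_mnemonic and max_register are hypothetical helper names, and cutting at the first hyphen fits mnemonics like the one above but not every mnemonic in the instruction set (in "if-eq", for instance, "eq" is not a type suffix).

```python
def split_mnemonic(mnemonic):
    """Split e.g. "move-wide/from16" into (base, name suffix, bytecode suffix)."""
    name, _, bytecode_suffix = mnemonic.partition("/")
    base, _, name_suffix = name.partition("-")
    return base, name_suffix, bytecode_suffix


def max_register(letters):
    """Highest register index for an operand written with this many letters.

    Each letter stands for 4 bits, so vAA has 8 bits and vBBBB has 16 bits.
    """
    return 2 ** (4 * letters) - 1


print(split_mnemonic("move-wide/from16"))
print(max_register(2), max_register(4))
```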

No-operation instruction

The mnemonic of the no-operation instruction is nop and its opcode value is 00. The nop instruction is usually used for code alignment and has no other practical use.

Data manipulation instructions

The data manipulation instruction is move. Its prototype is "move destination, source" or "move destination", and the instruction takes different suffixes depending on the size and type of the bytecode.
Examples (there are too many to list individually; they all follow the same pattern)
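
To make the difference between a plain move and a -wide move concrete, here is a toy register file in Python. All names are hypothetical, and the sketch assumes that the lower-numbered register of a pair holds the low 32 bits of the 64-bit value; it also handles unsigned values only.

```python
def move(regs, dst, src):
    """move vDst, vSrc: copy one 32-bit register."""
    regs[dst] = regs[src]


def move_wide(regs, dst, src):
    """move-wide vDst, vSrc: copy a register pair.

    Both source halves are read before anything is written, so
    overlapping pairs such as v1 <- v0 still copy correctly.
    """
    regs[dst], regs[dst + 1] = regs[src], regs[src + 1]


def store_long(regs, n, value):
    """Store a 64-bit value in vN (low half, assumed) and vN+1 (high half)."""
    regs[n] = value & 0xFFFFFFFF
    regs[n + 1] = (value >> 32) & 0xFFFFFFFF


def load_long(regs, n):
    """Read the 64-bit value back from the pair vN, vN+1."""
    return regs[n] | (regs[n + 1] << 32)


regs = [0] * 8
store_long(regs, 2, 0x1122334455667788)
move_wide(regs, 0, 2)   # copies v2, v3 into v0, v1
move(regs, 4, 2)        # copies only v2 into v4
print(hex(load_long(regs, 0)), hex(regs[4]))
```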

Return instructions

A return instruction is the last instruction executed when a function finishes. Its base bytecode is return, and there are four return instructions in total:

    1. "return-void" returns from a void method without a value;
    2. "return vAA" returns a 32-bit non-object value; the return value is in the 8-bit register vAA;
    3. "return-wide vAA" returns a 64-bit non-object value; the return value is in the 8-bit register pair starting at vAA;
    4. "return-object vAA" returns an object reference; the return value is in the 8-bit register vAA.

Data definition instructions

Data definition instructions define the constants, strings, classes, and other data used in a program; their base bytecode is const.
Examples (too many to list individually)

Lock instructions

Lock instructions are used when multiple threads operate on the same object. The Dalvik instruction set has two lock instructions:

    1. "monitor-enter vAA" acquires the lock of the specified object
    2. "monitor-exit vAA" releases the lock of the specified object

Instance operation instructions

Instance-related operations include type casting, type checking, and creating new instances.

    • "check-cast vAA, type@BBBB" casts the object reference in vAA to the type BBBB:
(BBBB) vAA;
    • "instance-of vA, vB, type@CCCC" tests whether the object reference in vB can be cast to the type CCCC; vA = 1 if it can, vA = 0 if not:
if (vB instanceof CCCC) { vA = 1; } else { vA = 0; }
    • "new-instance vAA, type@BBBB" creates a new object of type BBBB and stores the reference in vAA; BBBB cannot be an array class:
BBBB vAA = new BBBB();
    • "check-cast/jumbo vAAAA, type@BBBBBBBB" has the same effect as "check-cast vAA, type@BBBB", only with a larger range for the register and index values (added in Android 4.0)
    • "instance-of/jumbo vAAAA, vBBBB, type@CCCCCCCC" has the same effect as "instance-of vA, vB, type@CCCC", only with a larger range for the register and index values (added in Android 4.0)
    • "new-instance/jumbo vAAAA, type@BBBBBBBB" has the same effect as "new-instance vAA, type@BBBB", only with a larger range for the register and index values (added in Android 4.0)

Array operation instructions

Array operations include getting the array length (the number of entries in the array), creating new arrays, filling arrays, and reading and writing array elements.

    • "array-length vA, vB"
vA = vB.length; // store the length of the array in vB into vA
    • "new-array vA, vB, type@CCCC"
vA = new CCCC[vB]; // create an array of type CCCC with vB elements and store it in vA
    • The rest are shown in the figure

Exception instructions

The Dalvik instruction set has one instruction for throwing exceptions:

    • "throw vAA" throws the exception of the specified type held in the vAA register

Jump instructions

The Dalvik instruction set has three kinds of jump instructions: unconditional jumps (goto), branch jumps (switch), and conditional jumps (if).

  • "goto +AA" jumps unconditionally to the specified offset; the offset AA cannot be 0;
  • "goto/16 +AAAA" jumps unconditionally to the specified offset; the offset AAAA cannot be 0;
  • "goto/32 +AAAAAAAA" jumps unconditionally to the specified offset;
  • "packed-switch vAA, +BBBBBBBB" is a branch jump instruction. The vAA register holds the value tested by the switch (switch (vAA)), and BBBBBBBB points to an offset table in packed-switch-payload format whose case values increase regularly. (This is enough to remember for now; look it up if you are interested.)
  • "sparse-switch vAA, +BBBBBBBB" is a branch jump instruction. The vAA register holds the value tested by the switch (switch (vAA)), and BBBBBBBB points to an offset table in sparse-switch-payload format whose case values follow no regular pattern.
  • "if-test vA, vB, +CCCC" is a conditional jump instruction. It compares the values of vA and vB and, if the condition holds, jumps to the offset given by CCCC; the offset CCCC cannot be 0. The if-test instructions are:
    • "if-eq vA, vB, :cond_xx" jumps to :cond_xx if vA equals vB
    • "if-ne vA, vB, :cond_xx" jumps to :cond_xx if vA is not equal to vB
    • "if-lt vA, vB, :cond_xx" jumps to :cond_xx if vA is less than vB
    • "if-ge vA, vB, :cond_xx" jumps to :cond_xx if vA is greater than or equal to vB
    • "if-gt vA, vB, :cond_xx" jumps to :cond_xx if vA is greater than vB
    • "if-le vA, vB, :cond_xx" jumps to :cond_xx if vA is less than or equal to vB
  • "if-testz vAA, +BBBB" is a conditional jump instruction. It compares vAA with 0 and, if the condition holds, jumps to the offset given by BBBB; the offset BBBB cannot be 0. The if-testz instructions are:
    • "if-eqz vAA, :cond_xx" jumps to :cond_xx if vAA equals 0
    • "if-nez vAA, :cond_xx" jumps to :cond_xx if vAA is not equal to 0
    • "if-ltz vAA, :cond_xx" jumps to :cond_xx if vAA is less than 0
    • "if-gez vAA, :cond_xx" jumps to :cond_xx if vAA is greater than or equal to 0
    • "if-gtz vAA, :cond_xx" jumps to :cond_xx if vAA is greater than 0
    • "if-lez vAA, :cond_xx" jumps to :cond_xx if vAA is less than or equal to 0
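
The conditional jumps listed above all follow one pattern, which the Python sketch below models. The names IF_TESTS, branch_taken, and next_pc are hypothetical. The sketch assumes that offsets and the program counter are measured in 16-bit words and that an if instruction itself is two words long.

```python
import operator

# Comparison performed by each if-test mnemonic; the "z" variants
# perform the same comparison against 0.
IF_TESTS = {
    "if-eq": operator.eq, "if-ne": operator.ne,
    "if-lt": operator.lt, "if-ge": operator.ge,
    "if-gt": operator.gt, "if-le": operator.le,
}


def branch_taken(mnemonic, va, vb=0):
    """Return True if the conditional jump would be taken."""
    base = mnemonic[:-1] if mnemonic.endswith("z") else mnemonic
    return IF_TESTS[base](va, vb)


def next_pc(pc, mnemonic, offset, va, vb=0):
    """Address of the next instruction, in 16-bit words."""
    if branch_taken(mnemonic, va, vb):
        return pc + offset   # jump relative to this instruction
    return pc + 2            # fall through past the two-word instruction


print(next_pc(10, "if-eqz", 6, 0))      # condition holds, jump taken
print(next_pc(10, "if-lt", -4, 5, 3))   # 5 < 3 is false, fall through
```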

Comparison instructions

Comparison instructions compare the values of two registers (floating-point or long). Their format is:
"cmpkind vAA, vBB, vCC"

There are 5 comparison instructions in the Dalvik instruction set:

    • "cmpl-float" compares two float values:
if (vBB == vCC) { vAA = 0; } else if (vBB > vCC) { vAA = 1; } else { vAA = -1; }
    • "cmpg-float" compares two float values:
if (vBB == vCC) { vAA = 0; } else if (vBB < vCC) { vAA = -1; } else { vAA = 1; }

For cmp, cmpl, and cmpg alike, the result is 1 when vBB > vCC and -1 when vBB < vCC. cmpl and cmpg differ only when an operand is NaN: cmpl yields -1 and cmpg yields 1.

    • "cmpg-double" compares two double values
    • "cmpl-double" compares two double values
    • "cmp-long" compares two long values
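
The behaviour of the five comparison instructions can be summarized in one Python function. The function name cmp_kind is hypothetical; the result convention follows the Dalvik bytecode reference, which sets the result to 0 if the operands are equal, 1 if the first is greater, and -1 if it is smaller, with cmpl returning -1 and cmpg returning 1 when an operand is NaN.

```python
import math


def cmp_kind(kind, b, c):
    """Model of "cmpkind vAA, vBB, vCC": return the value stored in vAA."""
    is_float_compare = kind != "cmp-long"
    if is_float_compare and (math.isnan(b) or math.isnan(c)):
        # The only difference between cmpl-* and cmpg-*: the NaN result.
        return 1 if kind.startswith("cmpg") else -1
    if b == c:
        return 0
    return 1 if b > c else -1


nan = float("nan")
print(cmp_kind("cmpl-float", 2.0, 1.0), cmp_kind("cmpg-float", 2.0, 1.0))
print(cmp_kind("cmpl-float", nan, 1.0), cmp_kind("cmpg-float", nan, 1.0))
print(cmp_kind("cmp-long", 5, 5))
```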

Field operation instructions

Field operation instructions read and write the fields of object instances.
The field type can be any valid Java data type. There are two sets of instructions, for instance fields and static fields: "iinstanceop vA, vB, field@CCCC" and "sstaticop vAA, field@BBBB".
Android 4.0 added "iinstanceop/jumbo vAAAA, vBBBB, field@CCCCCCCC" and "sstaticop/jumbo vAAAA, field@BBBBBBBB". They work the same as the two forms above; the /jumbo suffix only enlarges the range of register and index values. (From here on, /jumbo variants are only mentioned, not explained again.)
The instruction prefix for instance fields is i: reads use the iget instruction and writes use the iput instruction. The prefix for static fields is s: reads use sget and writes use sput.
Depending on the type of the field accessed, the field instruction is followed by a type suffix; for example, iget-byte reads an instance field whose value type is byte.
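
The naming rule for field instructions (prefix, operation, optional type suffix) can be written down as a small Python helper. The name field_mnemonic and its parameters are hypothetical; the suffix set assumed here is wide, object, boolean, byte, char, and short, with no suffix for ordinary 32-bit values.

```python
FIELD_SUFFIXES = {"", "wide", "object", "boolean", "byte", "char", "short"}


def field_mnemonic(static, write, suffix=""):
    """Build a field instruction name such as "iget-byte" or "sput"."""
    if suffix not in FIELD_SUFFIXES:
        raise ValueError(f"unknown field type suffix: {suffix!r}")
    prefix = "s" if static else "i"        # static or instance field
    operation = "put" if write else "get"  # write or read
    name = prefix + operation
    return f"{name}-{suffix}" if suffix else name


print(field_mnemonic(static=False, write=False, suffix="byte"))
print(field_mnemonic(static=True, write=True))
```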

Method invocation instructions
Method invocation instructions call the methods of class instances; their base instruction is invoke. They come in two forms, "invoke-kind {vC, vD, vE, vF, vG}, meth@BBBB" and "invoke-kind/range {vCCCC .. vNNNN}, meth@BBBB". The two forms behave the same; the latter specifies its argument registers as a range. Depending on the method type, there are 5 method invocation instructions:

    1. "invoke-virtual" calls a virtual method of an instance
    2. "invoke-super" calls a method of the instance's parent class
    3. "invoke-direct" calls a direct method of an instance
    4. "invoke-static" calls a static method
    5. "invoke-interface" calls an interface method of an instance
      Android 4.0 added /jumbo variants of these instructions.

The return value of a method invocation must be retrieved with a move-result* instruction:
invoke-static {}, Landroid/os/Parcel;->obtain()Landroid/os/Parcel;
move-result-object v0
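
When reading smali it helps to see the parts of an invoke line separately. The Python sketch below pulls them apart with a regular expression; INVOKE_RE and parse_invoke are hypothetical names, and the pattern only covers the simple register-list form shown in the example, not the /range form.

```python
import re

# kind {registers}, Lclass;->name(params)return
INVOKE_RE = re.compile(
    r"(invoke-[a-z]+)\s*\{([^}]*)\}\s*,\s*(L[^;]+;)->([^(]+)\(([^)]*)\)(.+)"
)


def parse_invoke(line):
    """Split a smali invoke line into its components."""
    match = INVOKE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a simple invoke line: {line!r}")
    kind, regs, cls, name, params, ret = match.groups()
    return {
        "kind": kind,
        "registers": [r.strip() for r in regs.split(",") if r.strip()],
        "class": cls,
        "method": name,
        "params": params,
        "return": ret,
    }


call = parse_invoke("invoke-static {}, Landroid/os/Parcel;->obtain()Landroid/os/Parcel;")
print(call)
```

Because the return descriptor here is a class type, the result is fetched with move-result-object rather than move-result or move-result-wide.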

Data conversion instructions
Data conversion instructions convert a value of one numeric type to another. Their format is "unop vA, vB": the value in vB is converted and the result is stored in vA. (These are relatively simple; see the figure.)

Data arithmetic instructions

Data arithmetic instructions include arithmetic operation instructions and logical operation instructions:

    • Arithmetic operations: add, subtract, multiply, divide, remainder, shift, and so on;
    • Logical operations: AND, OR, NOT, XOR, and so on.
      See the figure.

      The -type after the base bytecode can be -int, -long, -float, or -double. The other 3 classes of instructions are similar and are not listed here; they can be understood by analogy.
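
As a small illustration of the -int arithmetic and logical instructions, the Python sketch below models a few of them on 32-bit registers. The names to_int32, INT_BINOPS, and binop are hypothetical. Results are wrapped to signed 32-bit two's complement, which is the assumption that makes Python's unbounded integers behave like 32-bit registers; division is left out because Python's integer division rounds differently from Java's.

```python
import operator


def to_int32(value):
    """Wrap an integer to the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


INT_BINOPS = {
    "add-int": operator.add, "sub-int": operator.sub, "mul-int": operator.mul,
    "and-int": operator.and_, "or-int": operator.or_, "xor-int": operator.xor,
}


def binop(mnemonic, vbb, vcc):
    """Model of a three-register binary operation: returns the value for vAA."""
    return to_int32(INT_BINOPS[mnemonic](vbb, vcc))


print(binop("add-int", 2147483647, 1))   # overflow wraps around
print(binop("xor-int", 0b1100, 0b1010))
```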
