This article is reprinted and pasted here to facilitate searching for some commands.
From: http://blog.csdn.net/canfengxiliu/article/details/20144119
---------------------
Disclaimer: This article is based on notes taken from the book <Android Software Security and Reverse Analysis>.
Dalvik Instruction Format
A piece of Dalvik assembly code consists of a series of Dalvik instructions. The syntax of an instruction is determined by its bit description and its format ID. The bit description follows these conventions:
● Each 16-bit word is separated from the next by a space.
● Each letter represents four bits, ordered from the high-order bits down to the low-order bits. A vertical bar "|" may be used between groups of bits to separate different fields.
● Single uppercase letters, taken in order from A to Z, each denote a 4-bit field; "op" denotes the 8-bit opcode.
● "Ø" indicates that all bits in the field are zero.
Take the format "A|G|op BBBB F|E|D|C" as an example. It contains two spaces, so the instruction is made up of three 16-bit words. The first word is "A|G|op": its high 8 bits consist of the 4-bit fields A and G, and its low 8 bits hold the opcode. The second word is BBBB, a single 16-bit value such as an offset or an index. The third word consists of four 4-bit fields, F, E, D, and C, which here represent register arguments.
The bit description alone is not enough to determine the format of an instruction; the format must be specified by a format ID, whose conventions are as follows:
● Most format IDs consist of three characters: two digits followed by a letter.
● The first digit is the number of 16-bit words in the instruction.
● The second digit is the maximum number of registers the instruction uses; the special mark "r" indicates a range of registers.
● The third character is a type code indicating the kind of extra data the instruction uses. The values are shown in the following table.
Mnemonic | Bit size | Description
b | 8 | Immediate signed byte
c | 16, 32 | Constant pool index
f | 16 | Interface constant (only valid in statically linked formats)
h | 16 | Immediate signed value holding the high-order bits of a 32-bit or 64-bit value; the low-order bits are all 0
i | 32 | Immediate signed int, or 32-bit float
l | 64 | Immediate signed long, or 64-bit double
m | 16 | Method constant (only valid in statically linked formats)
n | 4 | Immediate signed 4-bit value (nibble)
s | 16 | Immediate signed short
t | 8, 16, 32 | Branch target
x | 0 | No additional data
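The format ID convention can be sketched in a few lines of Python. This is only an illustration: the function name parse_format_id and the dictionary TYPE_CODES are made up for this example, and the sketch handles only the common three-character IDs, not the longer ones.

```python
import re

# Type-code letters from the table, mapped to short descriptions.
TYPE_CODES = {
    "b": "immediate signed byte",
    "c": "constant pool index",
    "f": "interface constant",
    "h": "immediate signed high-order bits",
    "i": "immediate signed int or float",
    "l": "immediate signed long or double",
    "m": "method constant",
    "n": "immediate signed nibble",
    "s": "immediate signed short",
    "t": "branch target",
    "x": "no additional data",
}


def parse_format_id(fmt):
    """Split a three-character format ID such as '22x' or '3rc' into its parts."""
    match = re.fullmatch(r"(\d)(\d|r)([a-z])", fmt)
    if match is None:
        raise ValueError(f"not a three-character format ID: {fmt!r}")
    units, regs, kind = match.groups()
    return {
        "code_units": int(units),                      # number of 16-bit words
        "registers": "range" if regs == "r" else int(regs),
        "type_code": kind,
        "extra_data": TYPE_CODES.get(kind, "unknown"),
    }


info = parse_format_id("22x")
```

Here info describes an instruction of two 16-bit words that uses two registers and carries no additional data.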
Take the format ID 22x as an example: the first digit 2 means the instruction consists of two 16-bit words, the second digit 2 means the instruction uses two registers, and the letter x means no additional data is used.
The syntax of Dalvik instructions is described as follows:
● Each instruction starts with the opcode, followed by its arguments. The number of arguments is not fixed, and arguments are separated by commas.
● The arguments are taken in order starting from the first word of the instruction. The opcode occupies the low 8 bits of the first word; the high 8 bits can hold one 8-bit argument, two 4-bit arguments, or nothing. If the instruction is longer than 16 bits, the following words hold further arguments.
● An argument written as "vX" is a register, such as v0 or v1. "v" is used instead of "r" to avoid confusion with the register names of the architecture on which the virtual machine runs; for example, ARM register names start with r.
● An argument written as "#+X" is a constant literal.
● An argument written as "+X" is an address offset relative to the instruction.
● An argument written as "kind@X" is a constant pool index. kind is the constant pool type, which can be "string" (string constant pool index), "type" (type constant pool index), "field" (field constant pool index), or "meth" (method constant pool index).
The Android 4.0 source code provides the document instruction-formats.html in the dalvik/docs directory, which lists all Dalvik instruction formats in detail.
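To make the bit layout concrete, the following Python sketch unpacks three 16-bit words laid out as "A|G|op BBBB F|E|D|C". The function name decode_35c and the sample words are assumptions chosen for illustration; the sample encodes opcode 0x6e with two register arguments, v0 and v1, and a constant pool index of 3.

```python
def decode_35c(units):
    """Decode three 16-bit words laid out as A|G|op BBBB F|E|D|C."""
    w0, w1, w2 = units
    op = w0 & 0xFF           # opcode: low 8 bits of the first word
    g = (w0 >> 8) & 0xF      # 4-bit field G
    a = (w0 >> 12) & 0xF     # 4-bit field A (argument count), highest bits
    c = w2 & 0xF             # third word, lowest 4 bits
    d = (w2 >> 4) & 0xF
    e = (w2 >> 8) & 0xF
    f = (w2 >> 12) & 0xF     # third word, highest 4 bits
    # Registers are used in the order C, D, E, F, G; A says how many.
    registers = [c, d, e, f, g][:a]
    return {"op": op, "index": w1, "registers": registers}


decoded = decode_35c([0x206E, 0x0003, 0x0010])
```

Note how the letters are read from the high bits to the low bits inside each word, so F sits in the top four bits of the third word and C in the bottom four.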
Type, Method, and Field Representation in Dalvik Bytecode
1. Types
Syntax | Description
V | void, used only as a return type
Z | boolean
B | byte
S | short
C | char
I | int
J | long
F | float
D | double
L | Java class type
[ | Array type
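The descriptors in the table, together with the class form Lpackage/name/ObjectName; and the array prefix [, can be converted back into Java type names. The following is a minimal sketch; the names PRIMITIVES and descriptor_to_java are invented for this example.

```python
# Single-letter descriptors for void and the primitive types.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def descriptor_to_java(desc):
    """Convert a Dalvik type descriptor to a Java-style type name."""
    dims = len(desc) - len(desc.lstrip("["))   # one '[' per array dimension
    base = desc[dims:]
    if base in PRIMITIVES:
        name = PRIMITIVES[base]
    elif base.startswith("L") and base.endswith(";"):
        # Drop the leading 'L' and trailing ';', then use dots for packages.
        name = base[1:-1].replace("/", ".")
    else:
        raise ValueError(f"unrecognized descriptor: {desc!r}")
    return name + "[]" * dims


examples = [descriptor_to_java(d) for d in ("I", "[[I", "Ljava/lang/String;")]
```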
The L type can represent any class in Java. A class written as package.name.ObjectName in Java code is written as Lpackage/name/ObjectName; in Dalvik assembly code. Note the semicolon at the end.
The [ type represents arrays. [ is followed by the descriptor of the element type; for example, [I represents int[], [[I represents int[][], and so on. A multi-dimensional array can have at most 255 dimensions.
2. Methods
The method format is as follows:
    Lpackage/name/ObjectName;->MethodName(III)Z
Lpackage/name/ObjectName; is the class that the method belongs to, and MethodName is the method name. (III)Z is the method signature: III gives the parameters (here, three int parameters) and Z is the return type (boolean).
3. Fields
The field format is as follows:
    Lpackage/name/ObjectName;->FieldName:Lpackage/name/ObjectName;
A field reference consists of the owning class (Lpackage/name/ObjectName;), the field name (FieldName), and the field type (Lpackage/name/ObjectName;). The field name and the field type are separated by a colon ":".
In code generated by baksmali, field declarations begin with the .field directive. Depending on the kind of field, the declarations may be preceded by a comment: "# instance fields" marks instance fields and "# static fields" marks static fields.
The syntax and operand conventions of Dalvik instructions have the following characteristics:
1. Operands are ordered destination first, then source.
2. Depending on the size and type of the data, some bytecodes carry a name suffix to eliminate ambiguity:
● 32-bit general-type bytecodes have no suffix.
● 64-bit general-type bytecodes have the -wide suffix.
● Bytecodes for a specific type carry a suffix naming that type. It can be -boolean, -byte, -char, -short, -int, -long, -float, -double, -object, -string, or -void.
3. Depending on the layout and options of the bytecode, some bytecodes carry an additional suffix to eliminate ambiguity. This suffix is separated from the main name of the bytecode by a slash "/".
4. In the instruction set description, each letter in a width value stands for four bits.
Take the instruction "move-wide/from16 vAA, vBBBB" as an example:
● move is the base bytecode. It names the basic operation.
● wide is the name suffix. It gives the width of the data the instruction operates on (64 bits).
● from16 is the bytecode suffix. It indicates that the source is a 16-bit register reference.
● vAA is the destination register. It always comes before the source. Its range is v0 to v255.
● vBBBB is the source register. Its range is v0 to v65535.
Most instructions in the Dalvik instruction set use registers as their destination or source operands. A/B/C/D/E/F/G/H represents a 4-bit value and can be used to refer to registers v0 to v15. AA/BB/.../HH represents an 8-bit value. AAAA/BBBB/.../HHHH represents a 16-bit value.
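The naming rules for suffixes and register widths can be illustrated with a short Python sketch. Both helper names are hypothetical, and split_mnemonic is deliberately naive: it splits at the first "-" and the first "/", which is enough for names such as move-wide/from16 but does not try to understand multi-part names such as move-result-object.

```python
def split_mnemonic(mnemonic):
    """Split a mnemonic into (base bytecode, name suffix, bytecode suffix)."""
    name, _, layout = mnemonic.partition("/")      # suffix after the slash
    base, _, type_suffix = name.partition("-")     # suffix after the first dash
    return base, type_suffix or None, layout or None


def max_register(operand):
    """Highest register number an operand such as 'vAA' can name."""
    bits = 4 * (len(operand) - 1)   # each letter after 'v' stands for 4 bits
    return (1 << bits) - 1


parts = split_mnemonic("move-wide/from16")
limits = [max_register(name) for name in ("vA", "vAA", "vBBBB")]
```

With these inputs, parts separates move, wide, and from16, and limits gives the upper register bound for 4-bit, 8-bit, and 16-bit register fields.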
No-Operation Instruction
The mnemonic of the no-operation instruction is nop. Its opcode value is 00. It is usually used to align code and performs no actual operation.
Data Operation Instructions
The data operation instruction is move. Its prototype is "move destination, source". Depending on the size and type of the data, move is followed by different suffixes.
● move-object/from16 vAA, vBBBB assigns an object reference. The destination register is 8 bits wide and the source register is 16 bits wide.
● move-object/16 vAAAA, vBBBB assigns an object reference. Both the source and destination registers are 16 bits wide.
● move-result vAA assigns the single-word non-object result of the previous invoke-kind instruction to the vAA register. move-result-wide vAA does the same for a double-word non-object result, which is assigned to the vAA register pair.
● move-result-object vAA assigns the object result of the previous invoke-kind instruction to the vAA register.
● move-exception vAA saves an exception caught at run time into the vAA register. It must be the first instruction of the exception handler that catches the exception; otherwise the instruction is invalid.
Return Instructions
A return instruction is the last instruction executed when a function finishes. There are four return instructions:
● return-void
● return vAA
● return-wide vAA
● return-object vAA
Data Definition Instructions
Data definition instructions are used to define the constants, strings, classes, and other data used in a program. Their base bytecode is const.
● const/4 vA, #+B sign-extends the value to 32 bits and assigns it to the register vA.
● const/16 vAA, #+BBBB sign-extends the value to 32 bits and assigns it to the register vAA.
● const vAA, #+BBBBBBBB assigns the value to the register vAA.
● const/high16 vAA, #+BBBB0000 extends the value with zeros on the right to 32 bits and assigns it to the register vAA.
● const-wide/16 vAA, #+BBBB sign-extends the value to 64 bits and assigns it to the register pair vAA.
● const-wide vAA, #+BBBBBBBBBBBBBBBB assigns the value to the register pair vAA.
● const-wide/high16 vAA, #+BBBB000000000000 extends the value with zeros on the right to 64 bits and assigns it to the register pair vAA.
● const-string vAA, string@BBBB constructs a string from the string index and assigns it to the register vAA.
● const-string/jumbo vAA, string@BBBBBBBB constructs a string from a (larger) string index and assigns it to the register vAA.
● const-class vAA, type@BBBB obtains a class reference from the type index and assigns it to the register vAA.
● const-class/jumbo vAAAA, type@BBBBBBBB obtains a class reference from the given type index and assigns it to the register vAAAA. The opcode of this instruction occupies two bytes with the value 0x00ff; it is a new instruction in Android 4.0.
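The two ways of widening a literal, sign extension and zero fill on the right, can be sketched in Python as follows. The function names are chosen for this example only, and the literals 0x3F80 and 0x3FF0 are sample inputs that happen to produce the bit patterns of 1.0 as a float and as a double.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def const_4(literal):
    return sign_extend(literal, 4)           # 4-bit literal, sign-extended


def const_16(literal):
    return sign_extend(literal, 16)          # 16-bit literal, sign-extended


def const_high16(literal):
    return sign_extend(literal << 16, 32)    # literal fills the high 16 bits


def const_wide_high16(literal):
    return sign_extend(literal << 48, 64)    # literal fills the high 16 of 64 bits


# Reinterpret the resulting bit patterns as IEEE 754 values.
as_float = struct.unpack(">f", struct.pack(">i", const_high16(0x3F80)))[0]
as_double = struct.unpack(">d", struct.pack(">q", const_wide_high16(0x3FF0)))[0]
```

The high16 forms are a compact way to load common floating-point constants, because their low-order bits are usually zero.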
Lock Instructions
Lock instructions are used when multiple threads operate on the same object. The Dalvik instruction set contains two lock instructions:
● monitor-enter vAA acquires the lock for the specified object.
● monitor-exit vAA releases the lock for the specified object.
Instance Operation Instructions
Instance-related operations include casting an instance to a type, checking its type, and creating a new instance.
● check-cast vAA, type@BBBB casts the object reference in the vAA register to the specified type. If the cast fails, a ClassCastException is thrown. Because vAA always holds an object reference, the instruction always fails at run time if type@BBBB specifies a primitive type.
● instance-of vA, vB, type@CCCC determines whether the object reference in the vB register can be cast to the specified type. If it can, the vA register is set to 1; otherwise it is set to 0.
● new-instance vAA, type@BBBB constructs a new instance of the specified type and assigns the object reference to the vAA register. The specified type cannot be an array class.
● check-cast/jumbo vAAAA, type@BBBBBBBB, instance-of/jumbo vAAAA, vBBBB, type@CCCCCCCC, and new-instance/jumbo vAAAA, type@BBBBBBBB correspond to the three instructions above, but the ranges of their register numbers and index values are larger. They were added in Android 4.0.
Array Operation Instructions
Array operations include reading the length of an array, creating an array, filling an array, and reading and writing array elements.
● array-length vA, vB obtains the length of the array in the vB register and assigns it to the vA register. The array length is the number of entries in the array.
● new-array vA, vB, type@CCCC constructs an array of the specified type (type@CCCC) and size (vB) and assigns it to the vA register.
● new-array/jumbo vAAAA, vBBBB, type@CCCCCCCC works like the previous instruction, but the ranges of the register numbers and the index value are larger (a new instruction in Android 4.0).
● filled-new-array {vC, vD, vE, vF, vG}, type@BBBB constructs an array of the specified type (type@BBBB) and size (A) and fills in its contents. The A field is used implicitly: besides giving the size of the array, it also gives the number of arguments. vC to vG are the argument registers.
● filled-new-array/range {vCCCC .. vNNNN}, type@BBBB works like the previous instruction, but the argument registers are given as a range using the range bytecode suffix. vCCCC is the first argument register, and N = A + C - 1.
● filled-new-array/jumbo {vCCCC .. vNNNN}, type@BBBBBBBB works like the previous instruction, but the ranges of the register numbers and the index value are larger (an instruction added in Android 4.0).
● fill-array-data vAA, +BBBBBBBB fills the array with the specified data. The vAA register holds the array reference, which must refer to an array of a primitive type. A data table follows the instruction.
● arrayop vAA, vBB, vCC reads or writes an element of the array held in the vBB register. The vCC register gives the index of the element, and the vAA register holds the value that was read or the value to be stored.
Elements are read with the aget family of instructions and written with the aput family. Depending on the type of the data stored in the array, the instruction takes a different suffix. The instructions are aget, aget-wide, aget-object, aget-boolean, aget-byte, aget-char, aget-short, aput, aput-wide, aput-object, aput-boolean, aput-byte, aput-char, and aput-short.
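For the range form of filled-new-array, the registers covered by {vCCCC .. vNNNN} follow from the relation N = A + C - 1. A small Python sketch, with a function name made up for this example:

```python
def range_registers(a, c):
    """List the registers named by {vCCCC .. vNNNN}, where N = A + C - 1.

    a is the argument count (field A) and c is the first register (field C).
    """
    n = a + c - 1
    return [f"v{i}" for i in range(c, n + 1)]


regs = range_registers(3, 4)
```

With three arguments starting at v4, the last register is v6, so the range covers v4, v5, and v6.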
Exception Instruction
The Dalvik instruction set has one instruction for throwing exceptions. throw vAA throws the exception object held in the vAA register.
Jump Instructions
Jump instructions transfer control from the current address to a specified offset. The Dalvik instruction set has three kinds of jump instructions: unconditional jumps (goto), branch jumps (switch), and conditional jumps (if).
● goto +AA jumps unconditionally to the specified offset. The offset AA cannot be 0.
● goto/16 +AAAA jumps unconditionally to the specified offset. The offset AAAA cannot be 0.
● goto/32 +AAAAAAAA jumps unconditionally to the specified offset.
● packed-switch vAA, +BBBBBBBB is a branch jump. The vAA register holds the value tested by the switch. BBBBBBBB points to an offset table in packed-switch-payload format, in which the case values increase regularly.
● sparse-switch vAA, +BBBBBBBB is a branch jump. The vAA register holds the value tested by the switch. BBBBBBBB points to an offset table in sparse-switch-payload format, in which the case values are irregular.
● if-test vA, vB, +CCCC is a conditional jump. It compares the values of the vA and vB registers. If the comparison is satisfied, control jumps to the offset given by CCCC. The offset CCCC cannot be 0.
The if-test instructions are:
● if-eq jumps if vA is equal to vB. Java syntax: if (vA == vB)
● if-ne jumps if vA is not equal to vB. Java syntax: if (vA != vB)
● if-lt jumps if vA is less than vB. Java syntax: if (vA < vB)
● if-le jumps if vA is less than or equal to vB. Java syntax: if (vA <= vB)
● if-gt jumps if vA is greater than vB. Java syntax: if (vA > vB)
● if-ge jumps if vA is greater than or equal to vB. Java syntax: if (vA >= vB)
if-testz vAA, +BBBB is a conditional jump that compares the vAA register with 0. If the comparison is satisfied, control jumps to the offset given by BBBB. The offset BBBB cannot be 0.
The if-testz instructions are:
● if-eqz jumps if vAA is 0. Java syntax: if (vAA == 0)
● if-nez jumps if vAA is not 0. Java syntax: if (vAA != 0)
● if-ltz jumps if vAA is less than 0. Java syntax: if (vAA < 0)
● if-lez jumps if vAA is less than or equal to 0. Java syntax: if (vAA <= 0)
● if-gtz jumps if vAA is greater than 0. Java syntax: if (vAA > 0)
● if-gez jumps if vAA is greater than or equal to 0. Java syntax: if (vAA >= 0)
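The conditions tested by the if-test and if-testz families can be summarized in a small Python sketch. Here if-eq and if-eqz jump on equality, and if-ne and if-nez jump on inequality. The table IF_TEST and the function branch_taken are names invented for this example; the function only decides whether the jump would be taken and does not model offsets.

```python
import operator

# Comparison performed by each if-test mnemonic.
IF_TEST = {
    "if-eq": operator.eq,
    "if-ne": operator.ne,
    "if-lt": operator.lt,
    "if-le": operator.le,
    "if-gt": operator.gt,
    "if-ge": operator.ge,
}


def branch_taken(mnemonic, va, vb=0):
    """Return True if the conditional jump would be taken."""
    if mnemonic.endswith("z"):
        # if-testz forms compare the single register against zero.
        mnemonic, vb = mnemonic[:-1], 0
    return IF_TEST[mnemonic](va, vb)


results = [
    branch_taken("if-eq", 3, 3),
    branch_taken("if-eqz", 0),
    branch_taken("if-nez", 0),
    branch_taken("if-ltz", -5),
]
```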
Comparison Instructions
Comparison instructions compare the values of two registers (floating point or long). The format is cmpkind vAA, vBB, vCC. The vBB and vCC registers are the two registers or register pairs to be compared, and the result is placed in the vAA register. There are five comparison instructions in the Dalvik instruction set. In every case the result is 1 if vBB is greater than vCC, 0 if they are equal, and -1 if vBB is less than vCC; the floating-point variants differ only in the result they give when either operand is NaN.
● cmpl-float compares two single-precision floating-point numbers. If either operand is NaN, the result is -1.
● cmpg-float compares two single-precision floating-point numbers. If either operand is NaN, the result is 1.
● cmpl-double compares two double-precision floating-point numbers. If either operand is NaN, the result is -1.
● cmpg-double compares two double-precision floating-point numbers. If either operand is NaN, the result is 1.
● cmp-long compares two long integers.
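The behaviour of the comparison instructions, as defined in the official Dalvik bytecode reference, is 1 when the first operand is greater, 0 when the operands are equal, and -1 when it is less, with the l and g variants differing only in the NaN case. A Python sketch with illustrative function names:

```python
import math


def cmp_kind(b, c, nan_result):
    """Three-way comparison; nan_result is returned if either operand is NaN."""
    if math.isnan(b) or math.isnan(c):
        return nan_result
    return (b > c) - (b < c)     # 1 if greater, 0 if equal, -1 if less


def cmpl_float(b, c):
    return cmp_kind(b, c, -1)    # "less than" bias: NaN gives -1


def cmpg_float(b, c):
    return cmp_kind(b, c, 1)     # "greater than" bias: NaN gives 1


def cmp_long(b, c):
    return (b > c) - (b < c)     # integers have no NaN case


outcomes = [cmpl_float(1.0, 2.0), cmpg_float(2.0, 1.0), cmp_long(7, 7)]
nan_outcomes = [cmpl_float(float("nan"), 0.0), cmpg_float(float("nan"), 0.0)]
```

The two biases let a compiler pick the variant that makes a following if-testz fall the right way when a NaN is involved.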
Field Operation Instructions
Field operation instructions read and write the fields of object instances. The field type can be any valid data type in Java. There are two sets of instructions, one for instance fields and one for static fields: iinstanceop vA, vB, field@CCCC and sstaticop vAA, field@BBBB.
Instance field instructions have the prefix i: iget reads an instance field and iput writes one. Static field instructions have the prefix s: sget reads a static field and sput writes one.
Depending on the type of the field being accessed, the instruction is followed by a type suffix. For example, iget-byte reads an instance field of type byte, and iput-short writes an instance field of type short. The two sets of instructions work in the same way; they differ only in the prefix and in the kind of field they operate on.
Instance field instructions: iget, iget-wide, iget-object, iget-boolean, iget-byte, iget-char, iget-short, iput, iput-wide, iput-object, iput-boolean, iput-byte, iput-char, and iput-short.
Static field instructions: sget, sget-wide, sget-object, sget-boolean, sget-byte, sget-char, sget-short, sput, sput-wide, sput-object, sput-boolean, sput-byte, sput-char, and sput-short.
Android 4.0 added iinstanceop/jumbo vAAAA, vBBBB, field@CCCCCCCC and sstaticop/jumbo vAAAA, field@BBBBBBBB to the Dalvik instruction set. They serve the same purpose as the two sets of instructions described above, except that the jumbo bytecode suffix is added and the ranges of the register numbers and the index value are larger.
Method Invocation Instructions
Method invocation instructions call the methods of class instances. Their base bytecode is invoke. There are two forms: invoke-kind {vC, vD, vE, vF, vG}, meth@BBBB and invoke-kind/range {vCCCC .. vNNNN}, meth@BBBB. The two forms do the same thing; the second uses range to specify the argument registers as a range. Depending on the kind of method, there are five invocation instructions:
● invoke-virtual or invoke-virtual/range calls a virtual method of an instance.
● invoke-super or invoke-super/range calls a method of the instance's parent class.
● invoke-direct or invoke-direct/range calls a direct method of an instance.
● invoke-static or invoke-static/range calls a static method.
● invoke-interface or invoke-interface/range calls an interface method of an instance.
Android 4.0 added invoke-kind/jumbo {vCCCC .. vNNNN}, meth@BBBBBBBB, which serves the same purpose as the two forms above, except that the jumbo bytecode suffix is added and the ranges of the register numbers and the index value are larger.
The return value of a method invocation must be retrieved with a move-result* instruction, as in the following two instructions:
    invoke-static {}, Landroid/os/Parcel;->obtain()Landroid/os/Parcel;
    move-result-object v0
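A method reference such as the one passed to invoke-static has the form class->name(parameters)return type. The following Python sketch splits it into its parts; parse_method_ref is a name invented for this example and the regular expressions are a simplification, not a full descriptor grammar.

```python
import re


def parse_method_ref(ref):
    """Split 'Lpkg/Cls;->name(params)ret' into its components."""
    match = re.fullmatch(r"(L[^;]+;)->([^(]+)\((.*)\)(.+)", ref)
    if match is None:
        raise ValueError(f"not a method reference: {ref!r}")
    owner, name, params, ret = match.groups()
    # Each parameter is optional '[' prefixes followed by a class
    # descriptor (L...;) or a single primitive letter.
    param_list = re.findall(r"\[*(?:L[^;]+;|[ZBSCIJFD])", params)
    return {"class": owner, "name": name, "params": param_list, "return": ret}


obtain = parse_method_ref("Landroid/os/Parcel;->obtain()Landroid/os/Parcel;")
sample = parse_method_ref("Lpackage/name/ObjectName;->MethodName(III)Z")
```

Because obtain() returns an object, its result would be fetched with move-result-object; a method returning Z, like the second sample, would use move-result.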
Data Conversion Instructions
Data conversion instructions convert a value of one type to another type. The format is unop vA, vB. The vB register or register pair holds the value to be converted, and the result is stored in the vA register or register pair.
● neg-int negates an int.
● not-int takes the bitwise complement of an int.
● neg-long negates a long.
● not-long takes the bitwise complement of a long.
● neg-float negates a single-precision floating-point number.
● neg-double negates a double-precision floating-point number.
● int-to-long converts an int to a long.
● int-to-float converts an int to a single-precision floating-point number.
● int-to-double converts an int to a double-precision floating-point number.
● long-to-int converts a long to an int.
● long-to-float converts a long to a single-precision floating-point number.
● long-to-double converts a long to a double-precision floating-point number.
● float-to-int converts a single-precision floating-point number to an int.
● float-to-long converts a single-precision floating-point number to a long.
● float-to-double converts a single-precision floating-point number to a double-precision one.
● double-to-int converts a double-precision floating-point number to an int.
● double-to-long converts a double-precision floating-point number to a long.
● double-to-float converts a double-precision floating-point number to a single-precision one.
● int-to-byte converts an int to a byte.
● int-to-char converts an int to a char.
● int-to-short converts an int to a short.
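The narrowing conversions keep only the low-order bits of the value, as the corresponding Java casts do. A Python sketch, assuming the usual Java semantics of signed byte and short and unsigned char; the function names mirror the mnemonics but are otherwise made up for this example:

```python
def int_to_byte(value):
    """Keep the low 8 bits and sign-extend, like a Java (byte) cast."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def int_to_short(value):
    """Keep the low 16 bits and sign-extend, like a Java (short) cast."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def int_to_char(value):
    """Keep the low 16 bits without sign extension, like a Java (char) cast."""
    return value & 0xFFFF


def long_to_int(value):
    """Keep the low 32 bits and sign-extend, like a Java (int) cast."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


converted = [int_to_byte(200), int_to_short(40000), int_to_char(-1)]
```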
Data Arithmetic Instructions
Data arithmetic instructions include arithmetic instructions and logical instructions. Arithmetic instructions perform operations such as addition, subtraction, multiplication, division, modulo, and shifts between values. Logical instructions perform operations such as AND, OR, NOT, and XOR between values. There are four forms of data arithmetic instructions. The operation may be performed on registers or on register pairs; the descriptions below use registers.
● binop vAA, vBB, vCC operates on the vBB and vCC registers and saves the result to the vAA register.
● binop/2addr vA, vB operates on the vA and vB registers and saves the result to the vA register.
● binop/lit16 vA, vB, #+CCCC operates on the vB register and the constant CCCC and saves the result to the vA register.
● binop/lit8 vAA, vBB, #+CC operates on the vBB register and the constant CC and saves the result to the vAA register.
The last three forms differ from the first only in the instruction suffixes 2addr, lit16, and lit8. In all four forms, the base bytecode is followed by a data type suffix; for example, -int or -long indicates that the operands are ints or longs.
The instructions of the first form are:
● add-type adds the values of the vBB and vCC registers (vBB + vCC).
● sub-type subtracts the value of the vCC register from the vBB register (vBB - vCC).
● mul-type multiplies the values of the vBB and vCC registers (vBB * vCC).
● div-type divides the value of the vBB register by the vCC register (vBB / vCC).
● rem-type takes the vBB register modulo the vCC register (vBB % vCC).
● and-type ANDs the values of the vBB and vCC registers (vBB & vCC).
● or-type ORs the values of the vBB and vCC registers (vBB | vCC).
● xor-type XORs the values of the vBB and vCC registers (vBB ^ vCC).
● shl-type shifts the signed value in the vBB register left by vCC bits (vBB << vCC).
● shr-type shifts the signed value in the vBB register right by vCC bits (vBB >> vCC).
● ushr-type shifts the unsigned value in the vBB register right by vCC bits (vBB >>> vCC).
The -type following the base bytecode can be -int, -long, -float, or -double. The other three forms are similar.
This completes the introduction to all the instructions supported by the Dalvik virtual machine. Before Android 4.0, the opcode of each instruction occupied only one byte, with a value in the range 0x00 to 0xff. In Android 4.0 some instructions were extended; these are called extended instructions. Adding the jumbo suffix after the mnemonic increases the range of registers and constants that the instruction can address.
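The difference between the signed and unsigned right shifts, and the wrap-around of 32-bit arithmetic, can be sketched in Python, whose integers are unbounded and therefore need explicit masking. The function names follow the -int mnemonics but are only illustrative; the shift distance is masked to five bits, as for Java int shifts.

```python
def to_int32(value):
    """Wrap an arbitrary integer to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def add_int(b, c):
    return to_int32(b + c)                            # vBB + vCC


def shl_int(b, c):
    return to_int32(b << (c & 0x1F))                  # vBB << vCC


def shr_int(b, c):
    return to_int32(b) >> (c & 0x1F)                  # vBB >> vCC, keeps the sign


def ushr_int(b, c):
    return to_int32((b & 0xFFFFFFFF) >> (c & 0x1F))   # vBB >>> vCC, fills with 0


shifted = [shr_int(-8, 1), ushr_int(-8, 1), shl_int(1, 31)]
overflow = add_int(2147483647, 1)
```

shr-int preserves the sign bit, while ushr-int shifts in zeros, so the same negative input gives a small negative number in one case and a large positive number in the other.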
For a simple exercise, let's take a look at writing Hello world for Dalvik.