Before analyzing how the Dalvik virtual machine runs, you need to understand the Dalvik instructions, and before studying the instructions themselves, you need to understand the instruction format: how instructions are structured, written, and read. Once the format is familiar, you can recognize an instruction when you meet it in the code. The Dalvik virtual machine source tree includes a document on exactly this topic. Let's go through it:
This document describes the instruction formats used by Dalvik bytecode. It is meant to be read together with another document, the bytecode instruction reference.
Instruction bit description
The first column of the instruction format table shows the bit layout of each format. The layout is written as one or more words separated by spaces, and each word describes one 16-bit code unit. Each character stands for four bits, read from the high-order bits to the low-order bits. Vertical bars (|) may be placed between groups of bits to separate fields and make the layout easier to read. Uppercase letters, starting from A, name the fields of the format. "op" marks the position of the eight-bit opcode, and a zero (written as a slashed zero, Ø, in the Dalvik document) means that all bits in that position must be 0.
For example, the format "B|A|op CCCC" consists of two 16-bit code units, because the space in the middle separates two words. The first code unit is "B|A|op": its high byte holds the two four-bit fields B and A, and its low byte holds the opcode. The second code unit is CCCC, a single 16-bit value.
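The layout in this example can be unpacked with a few shifts and masks. The following is a minimal sketch, not code from the Dalvik sources: the function name `decode_b_a_op_cccc` and the sample values are hypothetical and chosen only for illustration.

```python
def decode_b_a_op_cccc(unit0: int, unit1: int) -> dict:
    """Split two 16-bit code units laid out as 'B|A|op CCCC' into fields."""
    for unit in (unit0, unit1):
        if not 0 <= unit <= 0xFFFF:
            raise ValueError("each code unit must fit in 16 bits")
    return {
        "op": unit0 & 0xFF,        # low byte of the first code unit
        "A": (unit0 >> 8) & 0xF,   # low four bits of the high byte
        "B": (unit0 >> 12) & 0xF,  # high four bits of the high byte
        "CCCC": unit1,             # the whole second code unit
    }


# Arbitrary sample values, used only to show where each field lands.
fields = decode_b_a_op_cccc(0x2152, 0x0003)
print(fields)
```

Reading the first code unit 0x2152 from high bits to low gives B = 2, A = 1, and op = 0x52, which matches the left-to-right order of the letters in "B|A|op".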
Instruction format identifier
The second column of the instruction format table is the format identifier, which other documents and the code use to refer to the format. Most identifiers consist of three characters: two digits followed by a letter. The first digit is the number of 16-bit code units that make up the instruction. The second digit is the maximum number of registers the instruction uses. The final letter indicates the type of any extra data the format encodes. For example, format "21t" is two code units long, references one register, and carries a branch target (t). An extra "s" may be appended to the end to mark a statically linked format.
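To make the naming scheme concrete, here is a small sketch that splits a format identifier into its parts. It handles only the form described above (two digits, one letter, optional trailing "s"); the names `FormatId` and `parse_format_id` are hypothetical and do not come from the Dalvik sources.

```python
import re
from typing import NamedTuple


class FormatId(NamedTuple):
    code_units: int        # number of 16-bit code units
    max_registers: int     # maximum number of registers used
    kind: str              # letter describing the extra data
    statically_linked: bool


# Two digits, one lowercase letter, then an optional "s" suffix.
_FORMAT_ID = re.compile(r"^(\d)(\d)([a-z])(s?)$")


def parse_format_id(text: str) -> FormatId:
    match = _FORMAT_ID.match(text)
    if match is None:
        raise ValueError(f"not a two-digit format identifier: {text!r}")
    units, registers, kind, suffix = match.groups()
    return FormatId(int(units), int(registers), kind, suffix == "s")


print(parse_format_id("21t"))
```

Note that a three-character identifier ending in "s", such as "21s", is read as the type letter s with no suffix, because the type letter is mandatory and the suffix is optional.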
The following table lists the type letters and their meanings:
Letter | Number of bits | Meaning
b | 8 | Immediate signed byte
c | 16, 32 | Constant pool index
f | 16 | Interface constant
h | 16 | Immediate signed "hat": the high-order bits of a 32- or 64-bit value, whose low-order bits are all 0
i | 32 | Immediate signed int, or 32-bit float
l | 64 | Immediate signed long, or 64-bit double
m | 16 | Method constant
n | 4 | Immediate signed nibble (half byte)
s | 16 | Immediate signed short
t | 8, 16, 32 | Branch target
x | 0 | No additional data
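The 16-bit "high-order" type in the table is the least obvious entry, so a short sketch may help. It assumes the 16 encoded bits become the top 16 bits of a 32-bit or 64-bit value and the remaining low-order bits are zero; the function name `expand_high16` is hypothetical.

```python
import struct


def expand_high16(value16: int, total_bits: int = 32) -> int:
    """Place a 16-bit value in the top 16 bits of a wider value, low bits zero."""
    if not 0 <= value16 <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    if total_bits not in (32, 64):
        raise ValueError("total_bits must be 32 or 64")
    return value16 << (total_bits - 16)


# 0x3F80 expands to 0x3F800000, the bit pattern of the 32-bit float 1.0.
bits32 = expand_high16(0x3F80)
as_float = struct.unpack(">f", bits32.to_bytes(4, "big"))[0]
print(hex(bits32), as_float)
```

This shows why the type is useful: many common floating-point constants have all-zero low-order bits, so 16 bits are enough to encode them.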
Syntax description
The third column of the instruction format table gives a human-readable syntax for instructions that use the format. Each instruction starts with its opcode name, followed by zero or more arguments separated by commas. Wherever an argument refers to a field from the first column, the syntax uses the same letter, repeated once for every four bits of the field. For example, an eight-bit field written "BB" in the first column is also written "BB" in the syntax. An argument that names a register has the form "vX". The prefix "v" is used instead of the more common "r" to avoid confusion with the registers of the real hardware the virtual machine runs on, which are often named with "r".
An argument that is a literal value has the form "#+X". An argument that is a relative instruction address offset has the form "+X". An argument that is a constant pool index has the form "kind@X", where "kind" indicates which constant pool is being referred to.
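The argument notations can be illustrated with a few formatting helpers. This is a sketch under stated assumptions: all function names are hypothetical, values are printed in decimal for simplicity, and the opcode names in the usage lines are used only as sample strings.

```python
def reg(number: int) -> str:
    """Register argument, written vX."""
    return f"v{number}"


def literal(value: int) -> str:
    """Literal value, written #+X (the sign is always shown)."""
    return f"#{value:+d}"


def offset(delta: int) -> str:
    """Relative instruction address offset, written +X."""
    return f"{delta:+d}"


def pool_index(kind: str, index: int) -> str:
    """Constant pool index, written kind@X."""
    return f"{kind}@{index}"


def render(opcode: str, *args: str) -> str:
    """Opcode name followed by comma-separated arguments."""
    return opcode if not args else f"{opcode} {', '.join(args)}"


print(render("const/4", reg(2), literal(-1)))
print(render("if-eqz", reg(0), offset(8)))
print(render("const-string", reg(1), pool_index("string", 42)))
```

Keeping one helper per notation mirrors the text: the prefix alone ("v", "#", a bare sign, or "kind@") tells the reader what sort of argument follows.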