Dalvik is a core component of Android, and understanding how it works is very useful. This article introduces single-step debugging of Dalvik with GDB and the use of the dexdump tool, as a foundation for exploring Dalvik further.
1. Dalvik Compilation
To make Dalvik easier to debug, we need to build a Dalvik that runs on x86, together with the related tools. The build procedure is as follows:
- First, go to the Android source root directory.
- Run "source build/envsetup.sh".
- Run "lunch 2". The output shows that TARGET_PRODUCT is sim and TARGET_ARCH is x86.
- Run "make", or "make dalvikvm" and "make dexdump". A full "make" builds every program, which takes a long time, and some modules may fail to compile. To save time, use "make dalvikvm" to build only Dalvik and "make dexdump" to build only dexdump.
2. Preparing GDB for debugging Dalvik
Starting Dalvik under GDB requires several environment variables, which is tedious to set up by hand, so we create a script to simplify the process. The script is named grund.sh and is stored in the Android source root directory. Its content is as follows.
#!/bin/sh
# Base directory: the top of the Android source tree
base=`pwd`
# Root of the simulator build output
root=$base/out/debug/host/linux-x86/pr/sim/system
export ANDROID_ROOT=$root
# Basic class libraries that Dalvik needs at startup
bootpath=$root/framework
export BOOTCLASSPATH=$bootpath/core.jar:$bootpath/ext.jar:$bootpath/framework.jar:$bootpath/android.policy.jar
# Location of the dalvik-cache directory; make sure it exists
export ANDROID_DATA=/tmp/dalvik_test
mkdir -p $ANDROID_DATA/dalvik-cache
exec gdb $root/bin/dalvikvm
3. Debugging Dalvik using GDB
- Prepare a simple Java program such as Hello.java. After compiling it, copy hello.jar to the Android source root directory. (For Hello.java and the Makefile, see the appendix.)
- Go to the Android source root directory.
- Run "./grund.sh" (this executes the script above and leaves you at the gdb prompt).
- Enter "set args -cp hello.jar Hello" at the gdb prompt.
- You can now set breakpoints and single-step. If you are not familiar with GDB, look up its basic commands first. The main() function is the entry point. Set a breakpoint at line 212 of Main.c (enter "b 212" at the gdb prompt).
- Enter "r" at the gdb prompt. Dalvik starts under GDB and stops at the breakpoint on line 212, just before the JNI_CreateJavaVM call. Check the content of gDvm (enter "p gDvm"), step over JNI_CreateJavaVM (enter "n"), and check gDvm again. Comparing the contents before and after gives a rough idea of what JNI_CreateJavaVM does.
- Line 249 of Main.c loads the Hello class. Set a breakpoint at line 249. When the program stops, check the content of slashClass (enter "p slashClass"); it is the string "Hello". Then step in (enter "s") and view the call stack (enter "bt"). We can see that the FindClass function in Jni.c is executing. This technique shows which function a function pointer actually points to.
- Line 255 of Main.c looks up the compiled main function of Hello.java. Proceeding as in the previous step, we can see that the function executing at this point is GetStaticMethodID in Jni.c.
- Line 273 of Main.c executes the compiled bytecode of the main function. Proceeding as in the step for line 249, we can see that the code executing at this point is at line 2681 of Jni.c. That function is generated by a macro, so it is hard to find by reading the source, but GDB locates it precisely. If the program continues to run from here, "Hello World" is printed.
This is only a brief analysis; you can explore whatever interests you. The follow-up article analyzes class loading and bytecode execution in detail.
4. Using dexdump to view the JAR file
The dexdump executable is stored under the out directory. You can locate it with the command "find out/ -name dexdump".
The command "dexdump -f hello.jar" prints the header information of the JAR file.
The command "dexdump -d hello.jar" prints the compiled bytecode.
The header information is as follows:
Opened 'hello.jar', DEX version '035'
DEX file header:
magic: 'dex\n035'
checksum: f2f85a9c
signature: 0404...7831
file_size: 740
header_size: 112
link_size: 0
link_off: 0 (0x000000)
string_ids_size: 14
string_ids_off: 112 (0x000070)
type_ids_size: 7
type_ids_off: 168 (0x0000a8)
field_ids_size: 1
field_ids_off: 232 (0x0000e8)
method_ids_size: 4
method_ids_off: 240 (0x0000f0)
class_defs_size: 1
class_defs_off: 272 (0x000110)
data_size: 436
data_off: 304 (0x000130)
string_ids, type_ids, field_ids, method_ids, and class_defs can all be regarded as indexes that lead to where the actual data is stored. data_off is the start of that data area.
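To make the header fields concrete, here is a minimal Python sketch that unpacks a DEX header and checks that the index tables in this particular dump sit back to back. The field order and entry sizes follow the published DEX format. The function names are made up for illustration. The sample header is rebuilt from the dump, with placeholders for values the dump does not show: the signature, map_off, and the proto_ids entries, which are inferred from the gap between type_ids and field_ids.

```python
import struct

# Layout of the 112-byte DEX header: 8-byte magic, 4-byte checksum,
# 20-byte signature, then twenty little-endian 32-bit fields.
HEADER_FORMAT = "<8sI20s20I"
FIELD_NAMES = (
    "magic", "checksum", "signature", "file_size", "header_size", "endian_tag",
    "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)

# Size in bytes of one entry in each index table.
ITEM_SIZES = (
    ("string_ids", 4), ("type_ids", 4), ("proto_ids", 12),
    ("field_ids", 8), ("method_ids", 8), ("class_defs", 32),
)


def parse_dex_header(data):
    """Unpack the fixed-size header into a name -> value dictionary."""
    return dict(zip(FIELD_NAMES, struct.unpack_from(HEADER_FORMAT, data, 0)))


def layout_is_contiguous(header):
    """Check that each index table starts where the previous one ends."""
    offset = header["header_size"]
    for name, item_size in ITEM_SIZES:
        if header[name + "_off"] != offset:
            return False
        offset += header[name + "_size"] * item_size
    return (offset == header["data_off"]
            and header["data_off"] + header["data_size"] == header["file_size"])


def build_sample_header():
    """Rebuild a header from the dump; placeholder values are marked."""
    return struct.pack(
        HEADER_FORMAT,
        b"dex\n035\x00",   # magic
        0xF2F85A9C,        # checksum, as printed in the dump
        bytes(20),         # signature: placeholder, the dump elides it
        740, 112,          # file_size, header_size
        0x12345678,        # endian_tag: standard little-endian constant
        0, 0,              # link_size, link_off
        0,                 # map_off: placeholder, not shown in the dump
        14, 112,           # string_ids
        7, 168,            # type_ids
        3, 196,            # proto_ids: inferred, not shown in the dump
        1, 232,            # field_ids
        4, 240,            # method_ids
        1, 272,            # class_defs
        436, 304,          # data_size, data_off
    )


header = parse_dex_header(build_sample_header())
print(header["file_size"], header["data_off"], layout_is_contiguous(header))
# prints: 740 304 True
```

The check works for this dump because 112 + 14*4 = 168, 168 + 7*4 = 196, and so on up to data_off = 304, with 304 + 436 = 740. Other DEX files may lay out their sections differently, so treat the contiguity test as a property of this example rather than a rule.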
The bytecode is as follows:
#1: (in LHello;)
name: 'main'
type: '([Ljava/lang/String;)V'
access: 0x0009 (PUBLIC STATIC)
code -
registers: 3
ins: 1
outs: 2
insns size: 10 16-bit code units
000148: |[000148] Hello.main:([Ljava/lang/String;)V
000158: 6200 0000 |0000: sget-object v0, Ljava/lang/System;.out:Ljava/io/PrintStream; // field@0000
00015c: 1a01 0900 |0002: const-string v1, "Hello World" // string@0009
000160: 6e20 0200 1000 |0004: invoke-virtual {v0, v1}, Ljava/io/PrintStream;.println:(Ljava/lang/String;)V // method@0002
000166: 2a00 0000 0000 |0007: goto/32 #00000000
catches: (none)
positions:
0x0000 line=4
0x0007 line=5
locals:
Virtual methods -
source_file_idx: 10 (Hello.java)
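To see how the hex column maps to the instructions, here is a small Python sketch that splits the 20 bytes of main into instructions. It assumes only what the dump shows: code is a sequence of little-endian 16-bit units, and the low byte of an instruction's first unit is the opcode. The opcode table contains just the four opcodes used here and the function name is made up; a real decoder needs the full table.

```python
import struct

# Opcode -> (mnemonic, length in 16-bit code units).
# Only the four opcodes that appear in the dump are listed.
OPCODES = {
    0x62: ("sget-object", 2),
    0x1A: ("const-string", 2),
    0x6E: ("invoke-virtual", 3),
    0x2A: ("goto/32", 3),
}


def split_instructions(code_bytes):
    """Return (offset in code units, mnemonic) for each instruction."""
    count = len(code_bytes) // 2
    units = struct.unpack("<%dH" % count, code_bytes)
    result = []
    pos = 0
    while pos < count:
        opcode = units[pos] & 0xFF  # low byte of the first unit
        mnemonic, length = OPCODES[opcode]
        result.append((pos, mnemonic))
        pos += length
    return result


# The hex column of the dump, in file order.
MAIN_CODE = bytes.fromhex("62000000" "1a010900" "6e2002001000" "2a0000000000")

for offset, mnemonic in split_instructions(MAIN_CODE):
    print("%04x: %s" % (offset, mnemonic))
```

The offsets come out as 0000, 0002, 0004, and 0007, matching the dump, and the lengths add up to the 10 code units reported as "insns size". The final goto/32 with offset 0 jumps to itself, which is the while (true) {} loop from Hello.java.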
Appendix:
Hello.java:
public class Hello {
    public static void main(String args[]) {
        System.out.println("Hello World");
        while (true) {}
    }
}
Makefile:
ANDROID_SRC_DIR := /Android/platform_sim
ANDROID_DIR_DX = $(ANDROID_SRC_DIR)/out/host/linux-x86/bin/dx
all:
	javac Hello.java
	$(ANDROID_DIR_DX) --dex --output=hello.jar Hello.class
clean:
	@rm -f *.jar *.class
Compiling Java source code produces files with the .class suffix, which contain Java bytecode. The dx tool in Android then converts them into a DEX file with the .jar suffix. The Dalvik virtual machine interprets and executes the resulting bytecode. Before it can do so, it must read the file, parse its contents, and obtain the bytecode. The most important part of the whole loading process is loading classes: a class contains methods, and methods contain code, so loading a class gives us the bytecode to execute.
This article starts with the parsing of the DexFile and the data structures used in class loading, then analyzes the whole loading process along the main flow. I hope it helps.
1. Mapping the DexFile into memory
In Android, Java source files are compiled into a DEX file with the .jar suffix, which the code refers to as a DexFile. Before a class can be loaded, the corresponding JAR file must be read. We usually read file contents with the read function, but Dalvik uses the mmap function instead. Unlike read, mmap maps the DEX file into memory, so the contents of the DEX file can be accessed through ordinary memory reads.
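The difference between read and mmap can be illustrated with a short Python sketch. It uses Python's standard mmap module rather than the C mmap call that Dalvik uses, and the file it maps is a hypothetical stand-in in which only the magic and file_size fields are filled in. The function names are made up for this example.

```python
import mmap
import os
import struct
import tempfile


def map_read_only(path):
    """Map a whole file read-only; the mapping outlives the file object."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_magic_and_size(path):
    """Read two header fields through the mapping, without calling read()."""
    mapped = map_read_only(path)
    try:
        magic = mapped[0:8]  # plain slicing, like indexing memory
        # file_size sits after magic (8), checksum (4), and signature (20).
        file_size = struct.unpack_from("<I", mapped, 32)[0]
        return magic, file_size
    finally:
        mapped.close()


def demo():
    # Hypothetical stand-in for a DEX file: a zero-filled 112-byte header
    # with only magic and file_size set.
    fake = bytearray(112)
    fake[0:8] = b"dex\n035\x00"
    struct.pack_into("<I", fake, 32, len(fake))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(fake)
        path = tmp.name
    try:
        return read_magic_and_size(path)
    finally:
        os.remove(path)


print(demo())
```

Once the file is mapped, every field is reached by offset from the start of the mapping, which is the same idea as following offsets from a base address in the mapped DEX file.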
The figure shows the DexFile file format, which consists of three parts: header, indexes, and data. The header gives the position and number of entries of each index, as well as the starting position of the data area. classDefsOff specifies where the classDef entries start in the file, and dataOff specifies where the data area starts. A classDef can be understood as the index entry of a class: reading it yields the basic information about the class, and its classDataOff specifies the position of the class data within the data area.
After the DexFile is mapped into memory, the dexFileParse function is called to parse it, and the results are stored in a data structure named DexFile. In DexFile, baseAddr points to the start of the mapped region and pClassDefs points to the start of the classDefs (the class index). Because classes are looked up by name, a hash table is built to speed up the search: each class name is hashed to produce an index into the table. All of this is done while the file is parsed, so although loading takes longer, it saves a lot of lookup time at run time.
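The idea of hashing class names once at parse time can be sketched in Python as follows. This is an illustration under assumptions, not Dalvik's actual code: the hash function (multiply by 31, starting from 1), the power-of-two table size, and linear probing are choices made for this sketch, and all function names are hypothetical.

```python
def descriptor_hash(descriptor):
    """Simple 32-bit string hash (assumed for this sketch)."""
    h = 1
    for byte in descriptor.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def round_up_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p


def build_lookup(descriptors):
    """Build an open-addressing table: slot -> (hash, descriptor, index)."""
    size = round_up_power_of_two(max(1, len(descriptors) * 2))
    mask = size - 1
    table = [None] * size
    for class_idx, descriptor in enumerate(descriptors):
        h = descriptor_hash(descriptor)
        slot = h & mask
        while table[slot] is not None:   # linear probing on collision
            slot = (slot + 1) & mask
        table[slot] = (h, descriptor, class_idx)
    return table


def find_class_index(table, descriptor):
    """Return the class index for a descriptor, or None if absent."""
    mask = len(table) - 1
    h = descriptor_hash(descriptor)
    slot = h & mask
    while table[slot] is not None:
        entry_hash, entry_descriptor, class_idx = table[slot]
        if entry_hash == h and entry_descriptor == descriptor:
            return class_idx
        slot = (slot + 1) & mask
    return None


names = ["Ljava/lang/Object;", "Ljava/lang/Class;", "LHello;"]
table = build_lookup(names)
print(find_class_index(table, "LHello;"), find_class_index(table, "LMissing;"))
```

Because the table has at least twice as many slots as classes, a probe always reaches an empty slot, so a lookup for a missing class terminates. The cost of building the table is paid once during parsing, and each later lookup touches only a few slots instead of scanning every classDef.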
2. ClassObject: the representation of a class after loading
After the file is parsed, the specific content of a class has to be loaded. In Dalvik, the ClassObject data structure stores the loaded information. As shown in the figure, loading places directMethods, virtualMethods, sfields, and ifields in allocated areas of memory. This information is read from the data area of the DEX file: the system first reads the detailed class data and from it obtains the directMethod, virtualMethod, sfield, and ifield entries. The figure shows the state after loading. This section does not describe the loading details; if you are interested, you can work them out from the two figures.
Note that the ClassObject structure has a member named super, which points to the class's superclass.
3. findClassNoInit: the function that loads a class and generates the corresponding ClassObject
Section 2 introduced the data structure produced by loading. This section analyzes findClassNoInit, the function that performs the loading. Note that class files fall into two groups: basic class library files and user class files. grund.sh contains the statement "export BOOTCLASSPATH=$bootpath/core.jar:$bootpath/ext.jar:$bootpath/framework.jar:$bootpath/android.policy.jar". This statement specifies the basic library files that Dalvik requires. Without it, Dalvik reports an error and exits during startup.
The loadClassFromDex function first reads the specific data of the class (starting at classDataOff), and then loads the directMethod, virtualMethod, ifield, and sfield entries as shown in the following figure.
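As a rough illustration of the first read, the published DEX format says the class data begins with four ULEB128-encoded counts: static fields, instance fields, direct methods, and virtual methods. The Python sketch below decodes just those counts. The function names are hypothetical, and the sample bytes are an assumed encoding for a class like Hello with no fields and two direct methods, not bytes taken from the real hello.jar.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if (byte & 0x80) == 0:             # high bit clear: last byte
            return result, pos
        shift += 7


def read_class_data_counts(data, class_data_off):
    """Read the four counts at the start of the class data."""
    names = (
        "static_fields_size",
        "instance_fields_size",
        "direct_methods_size",
        "virtual_methods_size",
    )
    counts = {}
    pos = class_data_off
    for name in names:
        counts[name], pos = read_uleb128(data, pos)
    return counts, pos


# Assumed sample: 0 static fields, 0 instance fields,
# 2 direct methods (constructor and main), 0 virtual methods.
sample = bytes([0x00, 0x00, 0x02, 0x00])
counts, next_pos = read_class_data_counts(sample, 0)
print(counts["direct_methods_size"], next_pos)
```

After these counts, the loader knows how many field and method entries follow, which is what allows it to allocate and fill the directMethods, virtualMethods, sfields, and ifields arrays.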
As the product of a top company in the industry, Dalvik has to care about execution efficiency. First, a class is cached once it has been loaded, so that later lookups are cheap. Second, searching sequentially would be far too slow, which senior engineers would never accept, so the hash table gDvm.loadedClasses was introduced. Some readers may object that there cannot be that many classes, so is a hash table really necessary? Let's take a look. Using the method from the GDB section, set a breakpoint at line 249 of Main.c. At this point the basic library has been loaded. When the program stops, inspect gDvm: the value of numLoadedClasses is 212! In other words, before we have done anything and before the user class is loaded, Dalvik has already loaded 212 classes.
dvmLinkClass is long, but it eventually calls findClassNoInit again. This is understandable: a subclass may call functions of its superclass, so the superclass must be loaded first, and possibly the superclass of the superclass as well ^_^.
Let's verify this with GDB.
Set a breakpoint in the findClassNoInit function (enter "b findClassNoInit" at the gdb prompt), then enter "c" several times in a row followed by "bt". The following information is displayed, and findClassNoInit appears more than once on the call stack.
(gdb) bt
#0  findClassNoInit (descriptor=0xfef4c7f4 "?????? % ", loader=0x0, pDvmDex=0x0)
    at dalvik/vm/oo/Class.c:1373
#1  0xf6fc4d53 in dvmFindClassNoInit (descriptor=0xf5046a63 "Ljava/lang/Object;", loader=0x0)
    at dalvik/vm/oo/Class.c:1194
#2  0xf6fc6c0a in dvmResolveClass (referrer=0xf5837400, classIdx=290,
    fromUnverifiedConstant=false) at dalvik/vm/oo/Resolve.c:94
#3  0xf6fc3476 in dvmLinkClass (clazz=0xf5837400, classesResolved=false)
    at dalvik/vm/oo/Class.c:2537
#4  0xf6fc1b67 in findClassNoInit (descriptor=0xf6ff0df6 "Ljava/lang/Class;", loader=0x0,
    pDvmDex=0xa04c720) at dalvik/vm/oo/Class.c:1489
Now let's look at it from another angle. Set a breakpoint at line 2575 of Class.c, wait for the program to stop, and check the content of clazz.
(gdb) p clazz->super->descriptor
$6 = 0xf5046a63 "Ljava/lang/Object;"
(gdb) p clazz->descriptor
$7 = 0xf5046121 "Ljava/lang/Class;"
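The pattern seen in these GDB sessions, a cache of loaded classes plus a recursive call to load the superclass during linking, can be sketched in Python. This is a simplified model, not Dalvik's implementation: the function and variable names only echo the C names, CLASS_DEFS is a hypothetical stand-in for the parsed class index, and the real code handles many cases (locking, interfaces, errors) that are omitted here.

```python
# Hypothetical stand-in for parsed class definitions:
# descriptor -> superclass descriptor (None for the root class).
CLASS_DEFS = {
    "Ljava/lang/Object;": None,
    "Ljava/lang/Class;": "Ljava/lang/Object;",
    "LHello;": "Ljava/lang/Object;",
}


class ClassObject:
    def __init__(self, descriptor):
        self.descriptor = descriptor
        self.super = None  # filled in during linking


loaded_classes = {}  # plays the role of gDvm.loadedClasses
load_order = []      # records when each load starts, for illustration


def find_class_no_init(descriptor):
    """Return the cached class, or load and link it on first use."""
    clazz = loaded_classes.get(descriptor)
    if clazz is not None:
        return clazz
    if descriptor not in CLASS_DEFS:
        raise LookupError("class not found: " + descriptor)
    load_order.append(descriptor)
    clazz = ClassObject(descriptor)
    link_class(clazz)
    loaded_classes[descriptor] = clazz
    return clazz


def link_class(clazz):
    """Resolve the superclass, which may trigger another load."""
    super_descriptor = CLASS_DEFS[clazz.descriptor]
    if super_descriptor is not None:
        clazz.super = find_class_no_init(super_descriptor)


clazz = find_class_no_init("Ljava/lang/Class;")
print(clazz.descriptor, "->", clazz.super.descriptor)
print(len(loaded_classes))
```

Loading Ljava/lang/Class; pulls in Ljava/lang/Object; through the nested call, mirroring the two findClassNoInit frames in the backtrace, and a later request for either class is answered from the cache without loading again.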
4. Loading of basic class library files
Set a breakpoint in the findClassNoInit function, run the program, and wait for it to stop.
(gdb) b findClassNoInit
Breakpoint 2 at 0xf6fc13e0: file dalvik/vm/oo/Class.c, line 1373.
(gdb) c
Continuing.
Let's see which class is loaded first and how the call reaches it.
(gdb) bt
#0  findClassNoInit (descriptor=0x0, loader=0x0, pDvmDex=0x0) at dalvik/vm/oo/Class.c:1373
#1  0xf6fc32a1 in dvmLinkClass (clazz=0xf5837350, classesResolved=false)
    at dalvik/vm/oo/Class.c:2491
#2  0xf6fc1b67 in findClassNoInit (descriptor=0xf6ff1ded "Ljava/lang/Thread;", loader=0x0,
    pDvmDex=0xa04c720) at dalvik/vm/oo/Class.c:1489
#3  0xf6f92692 in dvmThreadObjStartup () at dalvik/vm/Thread.c:328
#4  0xf6f800e6 in dvmStartup (argc=2, argv=0xa041190, ignoreUnrecognized=false, pEnv=0xa0411a0)
    at dalvik/vm/Init.c:1155
#5  0xf6f8b8e3 in JNI_CreateJavaVM (p_vm=0xf6ff0df6, p_env=0xf6ff0df6, vm_args=0xfef4d0b0)
    at dalvik/vm/Jni.c:4198
#6  0x08048893 in main (argc=3, argv=0xfef4d168) at dalvik/dalvikvm/Main.c:212
The function call sequence is clearly visible: main -> JNI_CreateJavaVM -> dvmStartup -> dvmThreadObjStartup -> dvmFindSystemClassNoInit -> findClassNoInit. If you look carefully, you may ask why dvmFindSystemClassNoInit is listed here when it does not appear in the call stack. My guess is that the compiler inlined it, so GDB cannot show a stack frame for dvmFindSystemClassNoInit. The backtrace also shows clearly that the class being loaded at this point is java/lang/Thread.
5. User-class file loading
Loading user class files is rather roundabout. Dalvik first loads a class, and that class is then responsible for loading the user class files. Naturally, that class ends up calling findClassNoInit through JNI. This is left for you to analyze. To be honest, I do not fully understand why it takes such a roundabout path.