Maybe you have written countless lines of code and can use high-level languages very well, but you do not necessarily know how those languages are actually executed. Take Java, which is hugely popular.
Java promises "compile once, run anywhere", but how well do we really understand that phrase? Turning the .java files we write into Java bytecode files (that is, .class files) with a compiler is the Java compilation process, and the Java Virtual Machine then executes those bytecode files. The JVM does not care where a bytecode file comes from or which compiler produced it. Even a hand-written bytecode file can be executed, as long as it conforms to the Java Virtual Machine Specification. This article therefore focuses on the structure of Java bytecode files. Let's work through a detailed demo:
1 First, let's write a Java source file.
[Figure: the Java source file Demo.java]
Above is the Java program we wrote. It is very simple: just one member variable, a, and one method, testMethod().
2 Next, we compile the source file into a Java bytecode file with the javac command or an IDE.
[Figure: the compiled bytecode file in hexadecimal]
Above is the compiled bytecode file, shown as a mass of hexadecimal bytes. If you open it in an IDE you will probably see familiar Java code, because the IDE decompiles it for you; the raw bytecode shown here is what we will talk about today.
A wall of bytecode like this may give you a headache, but don't worry: we will work through it slowly, and you may find it rewarding. Before we start, take a look at the following diagram.
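[Figure: overview of the class file structure]
This diagram is an overview of the class file structure, and we will interpret the bytecode in the order it shows. There are 10 parts: magic number, version number, constant pool, access flags, class index, superclass index, interface indexes, field table collection, method table collection, and attributes. We will go through them step by step.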
3.1 Magic number
From the overview diagram we know that the first 4 bytes are the magic number; in our demo they are 0xCAFEBABE. A magic number is a marker that identifies a file's type and is usually stored in the first few bytes of the file; 0xCAFEBABE identifies a class file. You might ask: can't the file type be determined from the file name extension? It can, but a file name (extension included) is easy to change, so storing the type inside the file itself makes identification more reliable.
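The check itself takes only a few lines. Below is a minimal Java sketch (the class and method names are illustrative, not from any library) that reads the first four bytes of a byte array as a big-endian u4 and compares them with 0xCAFEBABE:

```java
import java.nio.ByteBuffer;

/** Checks the u4 magic number at the start of a class file (a minimal sketch). */
final class MagicCheck {
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    /** Returns true if the first four bytes, read big-endian, are 0xCAFEBABE. */
    static boolean isClassFile(byte[] bytes) {
        if (bytes.length < 4) {
            return false;
        }
        // ByteBuffer is big-endian by default, matching the class file's u4 encoding.
        return ByteBuffer.wrap(bytes).getInt() == CLASS_FILE_MAGIC;
    }

    public static void main(String[] args) {
        byte[] demoHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x33};
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04}; // "PK..": a ZIP/JAR file, not a class file
        System.out.println(isClassFile(demoHeader)); // true
        System.out.println(isClassFile(zipHeader));  // false
    }
}
```

To try it on the demo, pass in the bytes of your compiled file, for example Files.readAllBytes(Paths.get("Demo.class")) from java.nio.file; the path is an assumption about where the file lives.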
Read as English, "cafe babe" means "coffee baby". Now look at the Java logo:
[Figure: the Java logo]
The logo is a cup of coffee, hence CAFE BABE.
3.2 Version number
Once the file type is identified, the next thing to read is the version number. It consists of a minor version and a major version, 2 bytes each. In this demo the 4 bytes are 0x0000 0033: the first two bytes, 0x0000, are the minor version, and the next two, 0x0033, are the major version. Converting from hexadecimal gives minor version 0 and major version 51.
According to Oracle's documentation, major version 51 corresponds to JDK 1.7, and the minor version is 0, so this is a version 51.0 class file targeting Java 1.7. To verify, check which JDK compiled it (java -version or javac -version), or recompile with a different target version (javac -target) and confirm that the version bytes in the class file change accordingly.
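The same lookup can be done in code. In this sketch (names are illustrative), the two u2 values at byte offsets 4 and 6 are read big-endian, and the major version is mapped to a release name using the pattern 45 to 48 for JDK 1.1 to 1.4, then major minus 44 from Java 5 onward:

```java
import java.nio.ByteBuffer;

/** Reads minor_version/major_version and names the Java release (a minimal sketch). */
final class ClassVersion {
    /** Maps a class file major version to a Java release name. */
    static String javaRelease(int major) {
        if (major >= 49) {
            return String.valueOf(major - 44); // 49 -> 5, 51 -> 7, 52 -> 8, 61 -> 17
        }
        if (major >= 45) {
            return "1." + (major - 44);        // 45 -> 1.1 (JDK 1.0.2 also used 45), 48 -> 1.4
        }
        throw new IllegalArgumentException("unknown major version " + major);
    }

    /** Returns {minor, major}, read from bytes 4..7 of a class file. */
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes, 4, 4);
        int minor = Short.toUnsignedInt(buf.getShort()); // u2 minor_version
        int major = Short.toUnsignedInt(buf.getShort()); // u2 major_version
        return new int[] {minor, major};
    }

    public static void main(String[] args) {
        byte[] demoHeader = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x33};
        int[] v = readVersion(demoHeader);
        System.out.println("minor=" + v[0] + ", major=" + v[1] + ", Java " + javaRelease(v[1]));
        // prints: minor=0, major=51, Java 7
    }
}
```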
We now understand the first 8 bytes. Next comes the constant pool.
3.3 Constant Pool
Immediately after the major version comes the constant pool. The constant pool is the class file's resource repository, and many of the structures that follow refer to it, such as the class name and the interfaces. It holds two main kinds of constants: literals and symbolic references. Literals include text strings and the values of constants declared final in Java; symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.
Why are fully qualified names needed for classes and interfaces, instead of memory addresses? Think about it: before a class is loaded, the Java virtual machine has no memory address for it and nothing in memory to operate on. The JVM must first locate the class and load it, and locating it requires a unique name: when it needs to load class B from one package, it must not load a class B from some other package by mistake. The fully qualified name (package name plus class name) provides exactly that uniqueness, which is why it is called "fully qualified".
Before analyzing the constant pool in detail, let's look at the table of constant pool item types:
[Figure: constant pool item types]
The table above describes the structure of 11 constant types. JDK 1.7 added 3 more (CONSTANT_MethodHandle_info, CONSTANT_MethodType_info and CONSTANT_InvokeDynamic_info), for 14 in total. Next we translate each constant in the demo's bytecode.
0x0015: Because the number of constants varies from class to class, the constant pool starts with a u2 value giving the count (so this part occupies 2+n bytes). 0x0015 is 21 in decimal, yet there are only 20 constants, with indexes 1 to 20. Why? The class file format reserves index 0: a reference to index 0 means "no constant pool item" (for example, the super_class of java.lang.Object is 0). So we have 20 constants to translate.
Constant #1 (the first of the 20 constants; the others are numbered the same way)
0x0a: from the constant type table, every constant starts with a u1 tag. Hex 0x0a is decimal 10, which is CONSTANT_Methodref_info.
0x0004: index of the CONSTANT_Class_info entry, #4
0x0011: index of the CONSTANT_NameAndType_info entry, #17
Constant #2
0x09: CONSTANT_Fieldref_info
0x0003: index of the CONSTANT_Class_info entry, #3
0x0012: index of the CONSTANT_NameAndType_info entry, #18
Constant #3
0x07: CONSTANT_Class_info
0x0013: index of the fully qualified name constant, #19
Constant #4
0x07: CONSTANT_Class_info
0x0014: index of the fully qualified name constant, #20
Constant #5
0x01: CONSTANT_Utf8_info
0x0001: string length is 1 (so the next 1 byte is the string)
0x61: "a" (hexadecimal converted to ASCII)
Constant #6
0x01: CONSTANT_Utf8_info
0x0001: string length is 1
0x49: "I"
Constant #7
0x01: CONSTANT_Utf8_info
0x0006: string length is 6
0x3c 696e 6974 3e: "<init>"
Constant #8
0x01: CONSTANT_Utf8_info
0x0003: string length is 3
0x28 2956: "()V"
Constant #9
0x01: CONSTANT_Utf8_info
0x0004: string length is 4
0x436f 6465: "Code"
Constant #10
0x01: CONSTANT_Utf8_info
0x000f: string length is 15
0x4c 696e 654e 756d 6265 7254 6162 6c65: "LineNumberTable"
Constant #11
0x01: CONSTANT_Utf8_info
0x0012: string length is 18
0x4c 6f63 616c 5661 7269 6162 6c65 5461 626c 65: "LocalVariableTable"
Constant #12
0x01: CONSTANT_Utf8_info
0x0004: string length is 4
0x7468 6973: "this"
Constant #13
0x01: CONSTANT_Utf8_info
0x000f: string length is 15
0x4c 636f 6d2f 6465 6d6f 2f44 656d 6f3b: "Lcom/demo/Demo;"
Constant #14
0x01: CONSTANT_Utf8_info
0x000a: string length is 10
0x74 6573 744d 6574 686f 64: "testMethod"
Constant #15
0x01: CONSTANT_Utf8_info
0x000a: string length is 10
0x536f 7572 6365 4669 6c65: "SourceFile"
Constant #16
0x01: CONSTANT_Utf8_info
0x0009: string length is 9
0x44 656d 6f2e 6a61 7661: "Demo.java"
Constant #17
0x0c: CONSTANT_NameAndType_info
0x0007: name index, #7 ("<init>")
0x0008: descriptor index, #8 ("()V")
Constant #18
0x0c: CONSTANT_NameAndType_info
0x0005: name index, #5 ("a")
0x0006: descriptor index, #6 ("I")
Constant #19
0x01: CONSTANT_Utf8_info
0x000d: string length is 13
0x63 6f6d 2f64 656d 6f2f 4465 6d6f: "com/demo/Demo"
Constant #20
0x01: CONSTANT_Utf8_info
0x0010: string length is 16
0x6a 6176 612f 6c61 6e67 2f4f 626a 6563 74: "java/lang/Object"
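Parsing the pool by hand shows the pattern: read a u1 tag, then a tag-specific number of bytes. The following sketch automates that loop. It is a simplified, hypothetical reader, not the JDK's own parser: it covers the 14 tags discussed above, renders reference entries as short strings, and relies on the fact that DataInputStream.readUTF reads exactly the u2-length-plus-modified-UTF-8 layout of CONSTANT_Utf8_info. Note also that CONSTANT_Long and CONSTANT_Double each occupy two indexes, so the slot after them is unusable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Reads constant_pool_count and the entries that follow (a simplified, hypothetical reader). */
final class ConstantPoolReader {

    /** Returns an array indexed by constant pool index; index 0 and the slot after a Long/Double stay null. */
    static Object[] read(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort(); // u2 constant_pool_count: valid indexes are 1 .. count-1
        Object[] pool = new Object[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte(); // u1 tag selects the entry layout
            switch (tag) {
                case 1:  pool[i] = in.readUTF(); break;        // Utf8: u2 length + modified UTF-8 bytes
                case 3:  pool[i] = in.readInt(); break;        // Integer
                case 4:  pool[i] = in.readFloat(); break;      // Float
                case 5:  pool[i] = in.readLong(); i++; break;  // Long: occupies two indexes
                case 6:  pool[i] = in.readDouble(); i++; break; // Double: occupies two indexes
                case 7:  pool[i] = "Class #" + in.readUnsignedShort(); break;
                case 8:  pool[i] = "String #" + in.readUnsignedShort(); break;
                case 9:  pool[i] = "Fieldref #" + in.readUnsignedShort() + ".#" + in.readUnsignedShort(); break;
                case 10: pool[i] = "Methodref #" + in.readUnsignedShort() + ".#" + in.readUnsignedShort(); break;
                case 11: pool[i] = "InterfaceMethodref #" + in.readUnsignedShort() + ".#" + in.readUnsignedShort(); break;
                case 12: pool[i] = "NameAndType #" + in.readUnsignedShort() + ":#" + in.readUnsignedShort(); break;
                case 15: pool[i] = "MethodHandle kind=" + in.readUnsignedByte() + " #" + in.readUnsignedShort(); break;
                case 16: pool[i] = "MethodType #" + in.readUnsignedShort(); break;
                case 18: pool[i] = "InvokeDynamic bsm=" + in.readUnsignedShort() + " #" + in.readUnsignedShort(); break;
                default: throw new IOException("unsupported tag " + tag + " at #" + i);
            }
        }
        return pool;
    }

    /** Convenience wrapper for a byte array that starts at constant_pool_count. */
    static Object[] parse(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A hand-built pool: #1 Class -> #2, #2 Utf8 "com/demo/Demo", #3 Long 42 (also uses #4).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(5);                               // constant_pool_count
        out.writeByte(7); out.writeShort(2);             // #1 CONSTANT_Class_info
        out.writeByte(1); out.writeUTF("com/demo/Demo"); // #2 CONSTANT_Utf8_info
        out.writeByte(5); out.writeLong(42L);            // #3 CONSTANT_Long_info
        out.flush();
        Object[] pool = parse(bytes.toByteArray());
        // prints: #1 = Class #2, #2 = com/demo/Demo, #3 = 42, #4 = null (one per line)
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + pool[i]);
        }
    }
}
```

To run it on Demo.class, wrap the file's bytes in a DataInputStream, call skipBytes(8) to step over the magic number and version, and then call read; for the demo it should list the 20 constants above.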
That completes the constant pool. Next we parse the access flags.
3.4 Access flags (access_flags)
The access flags record whether the class file describes a class or an interface, whether it is public, whether it is abstract, and, for a class, whether it is declared final. From the source code above, we know that this file is a public class.
[Figure: class access flag table]
0x0021: this is 0x0020 | 0x0001, that is, ACC_SUPER plus ACC_PUBLIC. ACC_SUPER (0x0020) means "treat superclass methods specially when invoked by the invokespecial instruction", and modern compilers set it on every class. Bytecode instructions will get their own article later.
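Because access_flags is a bit mask, decoding it is a matter of testing each bit. Here is a minimal sketch (names are illustrative) using the class-level flag values from the JVM specification:

```java
import java.util.ArrayList;
import java.util.List;

/** Decodes class-level access_flags into flag names (a minimal sketch). */
final class ClassAccessFlags {
    // Class access flag values from the JVM specification; ACC_MODULE was added in Java 9.
    private static final int[] MASKS = {0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000};
    private static final String[] NAMES = {"ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
            "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"};

    /** Returns the names of all flags set in the mask, in ascending bit order. */
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```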
3.5 Class index
The class index (this_class) determines the fully qualified name of this class.
0x0003 refers to constant #3, a CONSTANT_Class_info whose name index points to constant #19, "com/demo/Demo" (#3 -> #19).
3.6 Superclass index
0x0004: likewise, #4 -> #20 ("java/lang/Object").
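The two-step lookup (#3 -> #19, #4 -> #20) can be modelled with two maps. This is a deliberately simplified, hypothetical model that contains only the constant pool entries used here; it is not a real API. It also shows that internal names use '/' where Java source and Class.getName() use '.':

```java
import java.util.HashMap;
import java.util.Map;

/** A hypothetical two-map model of just the constant pool entries needed to resolve class indexes. */
final class ClassIndexResolver {
    private final Map<Integer, Integer> classNameIndex = new HashMap<>(); // CONSTANT_Class_info: index -> name_index
    private final Map<Integer, String> utf8 = new HashMap<>();            // CONSTANT_Utf8_info: index -> text

    /** Follows a class index (such as this_class #3) to its internal name (such as #19). */
    String internalName(int classIndex) {
        if (classIndex == 0) {
            return null; // index 0 means "no class", e.g. super_class in java.lang.Object
        }
        Integer nameIndex = classNameIndex.get(classIndex);
        if (nameIndex == null) {
            throw new IllegalArgumentException("#" + classIndex + " is not a CONSTANT_Class entry");
        }
        return utf8.get(nameIndex);
    }

    /** Internal names use '/'; Class.getName() uses '.' for ordinary (non-array) classes. */
    static String toBinaryName(String internalName) {
        return internalName.replace('/', '.');
    }

    /** The four entries of the demo's pool that this_class and super_class depend on. */
    static ClassIndexResolver demoPool() {
        ClassIndexResolver pool = new ClassIndexResolver();
        pool.classNameIndex.put(3, 19);
        pool.classNameIndex.put(4, 20);
        pool.utf8.put(19, "com/demo/Demo");
        pool.utf8.put(20, "java/lang/Object");
        return pool;
    }

    public static void main(String[] args) {
        ClassIndexResolver pool = demoPool();
        System.out.println("this_class  #3 -> " + toBinaryName(pool.internalName(3))); // com.demo.Demo
        System.out.println("super_class #4 -> " + toBinaryName(pool.internalName(4))); // java.lang.Object
    }
}
```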
3.7 Interface Index
From the overview diagram we know that this part takes 2+n bytes: the first two bytes give the number of interfaces, followed by the table of interface indexes. Our class implements no interfaces, so the count should be 0x0000, and sure enough, the bytecode file contains 0x0000 here.
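Reading this part means reading a count followed by that many u2 indexes, each pointing at a CONSTANT_Class_info entry. A minimal sketch (names are illustrative) using java.nio.ByteBuffer, which reads big-endian by default:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Reads interfaces_count followed by that many u2 constant pool indexes (a minimal sketch). */
final class InterfacesReader {
    static int[] readInterfaces(ByteBuffer buf) {
        int count = Short.toUnsignedInt(buf.getShort());      // u2 interfaces_count
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = Short.toUnsignedInt(buf.getShort()); // u2 index of a CONSTANT_Class entry
        }
        return indexes;
    }

    public static void main(String[] args) {
        // The demo class implements nothing, so this part is just 0x0000 (2 bytes).
        System.out.println(Arrays.toString(readInterfaces(ByteBuffer.wrap(new byte[] {0x00, 0x00})))); // []
        // A hypothetical class implementing two interfaces whose CONSTANT_Class entries are #5 and #7.
        byte[] twoInterfaces = {0x00, 0x02, 0x00, 0x05, 0x00, 0x07};
        System.out.println(Arrays.toString(readInterfaces(ByteBuffer.wrap(twoInterfaces)))); // [5, 7]
    }
}
```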
3.8 Field table Collection
The field table is used to describe variables declared in classes and interfaces. The fields here contain class-level variables and instance variables, but do not include local variables declared inside the method.
Again, this part takes 2+n bytes and starts with the field count. We have only one field, a, so the count should be 0x0001, and the file indeed contains 0x0001.
Now let's parse this field. Here is the field table structure:
[Figure: field table structure]
0x0002: access flags ACC_PRIVATE (look up the field access flag table yourself)
0x0005: field name index #5, "a"
0x0006: descriptor index #6, "I" (int)
0x0000: attributes count is 0, so there are no attribute tables.
Tip: some less important tables (such as the field and method access flag tables) are easy to look up yourself, so they are not reproduced here to save space.
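A field_info is four u2 values followed by its attribute tables, and method_info (section 3.9) has exactly the same layout. The sketch below (names are illustrative) reads one such structure, skips any attributes using their u4 lengths, and decodes simple field descriptors such as "I":

```java
import java.nio.ByteBuffer;

/** One field_info (or method_info, which has the same layout); a minimal sketch. */
final class MemberInfo {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int attributesCount;

    private MemberInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributesCount) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributesCount = attributesCount;
    }

    /** Reads one member and skips its attribute tables using their u4 lengths. */
    static MemberInfo read(ByteBuffer buf) {
        int access = Short.toUnsignedInt(buf.getShort());     // u2 access_flags
        int name = Short.toUnsignedInt(buf.getShort());       // u2 name_index
        int descriptor = Short.toUnsignedInt(buf.getShort()); // u2 descriptor_index
        int count = Short.toUnsignedInt(buf.getShort());      // u2 attributes_count
        for (int i = 0; i < count; i++) {
            buf.getShort();                                   // u2 attribute_name_index (not needed here)
            int length = buf.getInt();                        // u4 attribute_length (assumed < 2 GB)
            buf.position(buf.position() + length);            // skip info[attribute_length]
        }
        return new MemberInfo(access, name, descriptor, count);
    }

    /** Turns a field descriptor such as "I" or "[Ljava/lang/String;" into Java source syntax. */
    static String describe(String descriptor) {
        switch (descriptor.charAt(0)) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'L': return descriptor.substring(1, descriptor.length() - 1).replace('/', '.');
            case '[': return describe(descriptor.substring(1)) + "[]";
            default: throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
    }

    public static void main(String[] args) {
        // The demo's only field: 00 02 00 05 00 06 00 00
        MemberInfo a = read(ByteBuffer.wrap(new byte[] {0x00, 0x02, 0x00, 0x05, 0x00, 0x06, 0x00, 0x00}));
        System.out.printf("access=0x%04x name=#%d descriptor=#%d attributes=%d%n",
                a.accessFlags, a.nameIndex, a.descriptorIndex, a.attributesCount);
        // prints: access=0x0002 name=#5 descriptor=#6 attributes=0
        System.out.println(describe("I")); // int (constant #6 is "I")
    }
}
```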
3.9 Methods
We wrote only one method, testMethod, so you would expect the first 2 bytes (the method count) to be 0x0001. The file actually contains 0x0002. Are there really two methods? Yes: because Demo declares no constructor, the compiler generated a default constructor, <init>, and it is the first method parsed below.
[Figure: method table structure]
Following the method table structure shown above, we analyze the bytecode that comes next:
1st method:
0x0001: access flags ACC_PUBLIC, so the method is public (look up the method access flag table yourself)
0x0007: method name index #7, "<init>"
0x0008: method descriptor index #8, "()V"
0x0001: attributes count is 1 (one attribute table)
This brings us to attribute tables. An attribute table carries information specific to the structure it belongs to; the method above has one. Every attribute table has the same layout:
a u2 attribute name index, a u4 attribute length, and then attribute_length bytes of info.
The JVM specification predefines a number of attributes, such as Code, LineNumberTable, LocalVariableTable and SourceFile; the full list can be found in the specification.
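Because every attribute starts with the same 6-byte header, a parser can read or skip any attribute, even one it does not recognize, using attribute_length alone. A minimal sketch (names are illustrative), tried on the class-level SourceFile attribute that appears in section 3.10:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Reads one attribute_info: u2 attribute_name_index, u4 attribute_length, u1 info[attribute_length]. */
final class AttributeInfo {
    final int nameIndex;
    final byte[] info;

    AttributeInfo(int nameIndex, byte[] info) {
        this.nameIndex = nameIndex;
        this.info = info;
    }

    static AttributeInfo read(ByteBuffer buf) {
        int nameIndex = Short.toUnsignedInt(buf.getShort()); // u2: points at a Utf8 such as "Code"
        long length = Integer.toUnsignedLong(buf.getInt());  // u4: size of info, excluding this 6-byte header
        byte[] info = new byte[Math.toIntExact(length)];
        buf.get(info);                                       // the attribute-specific payload
        return new AttributeInfo(nameIndex, info);
    }

    public static void main(String[] args) {
        // The demo's SourceFile attribute: name #15, length 2, info = sourcefile_index #16.
        byte[] bytes = {0x00, 0x0f, 0x00, 0x00, 0x00, 0x02, 0x00, 0x10};
        AttributeInfo attr = read(ByteBuffer.wrap(bytes));
        System.out.println("name=#" + attr.nameIndex + ", length=" + attr.info.length + ", info=" + Arrays.toString(attr.info));
        // prints: name=#15, length=2, info=[0, 16]
    }
}
```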
[Figure: attribute table structure]
Follow the table structure above to get the following information:
0x0009: the name index is #9 ("Code").
0x0000 0038: the attribute length is 56 bytes.
Next, we parse the Code attribute according to its structure:
[Figure: Code attribute structure]
The first 6 bytes (2-byte name index + 4-byte attribute length) have already been parsed. attribute_length does not include them, so the next 56 bytes all belong to this Code attribute.
0x0002: max_stack = 2
0x0001: max_locals = 1
0x0000 000a: code_length = 10
0x2a b700 012a 04b5 0002 b1: these 10 bytes are the code array. Look each opcode up in the JVM instruction set, remembering that some instructions are followed by operand bytes:
2a = aload_0 (push the reference in local variable 0, which is this, onto the operand stack)
b7 = invokespecial (invoke an instance initialization method; here, the superclass constructor)
00 01 = not instructions, but the 2-byte operand of invokespecial: constant pool index #1 (the Methodref java/lang/Object."<init>":()V)
2a = aload_0 again
04 = iconst_1 (push the int constant 1 onto the operand stack)
b5 = putfield (assign a value to an instance field of an object)
00 02 = the 2-byte operand of putfield: constant pool index #2 (the Fieldref for a:I)
b1 = return (return void from the current method)
Grouping each opcode with its operands, and labelling each instruction with its byte offset in the code array, gives:
0: aload_0
1: invokespecial #1
4: aload_0
5: iconst_1
6: putfield #2
9: return
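The key point is that a decoder must know how many operand bytes each opcode takes; otherwise operand bytes are misread as opcodes (00 as nop, 01 as aconst_null, and so on). The sketch below decodes the demo's code array with a small, hand-picked opcode table. It is hypothetical and covers only a handful of instructions; a real disassembler such as javap knows the full instruction set, including variable-length instructions like tableswitch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Decodes a code array for a small, hand-picked subset of JVM opcodes (a minimal sketch). */
final class BytecodeDecoder {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    private static final Map<Integer, Integer> OPERAND_BYTES = new HashMap<>();

    private static void op(int opcode, String name, int operandBytes) {
        NAMES.put(opcode, name);
        OPERAND_BYTES.put(opcode, operandBytes);
    }

    static {
        op(0x00, "nop", 0);
        op(0x01, "aconst_null", 0);
        op(0x02, "iconst_m1", 0);
        op(0x03, "iconst_0", 0);
        op(0x04, "iconst_1", 0);
        op(0x2a, "aload_0", 0);
        op(0xb1, "return", 0);
        op(0xb4, "getfield", 2);      // u2 constant pool index of a Fieldref
        op(0xb5, "putfield", 2);      // u2 constant pool index of a Fieldref
        op(0xb6, "invokevirtual", 2); // u2 constant pool index of a Methodref
        op(0xb7, "invokespecial", 2); // u2 constant pool index of a Methodref
    }

    /** Returns lines like "1: invokespecial #1"; the number before ':' is the byte offset. */
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = NAMES.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException(String.format("opcode 0x%02x at %d is not in this table", opcode, pc));
            }
            int width = OPERAND_BYTES.get(opcode);
            StringBuilder line = new StringBuilder(pc + ": " + name);
            if (width == 2) {
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF); // big-endian u2 operand
                line.append(" #").append(index);
            }
            out.add(line.toString());
            pc += 1 + width; // skip the opcode and its operand bytes
        }
        return out;
    }

    public static void main(String[] args) {
        // The 10-byte code array of the demo's <init>: 2a b7 00 01 2a 04 b5 00 02 b1
        byte[] init = {0x2a, (byte) 0xb7, 0x00, 0x01, 0x2a, 0x04, (byte) 0xb5, 0x00, 0x02, (byte) 0xb1};
        decode(init).forEach(System.out::println);
        // prints: 0: aload_0 / 1: invokespecial #1 / 4: aload_0 / 5: iconst_1 / 6: putfield #2 / 9: return
    }
}
```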
We will go deeper into bytecode instructions in a later article; for now this is enough. Next, we continue parsing the Code attribute:
0x0000: exception_table_length = 0
0x0002: attributes_count = 2 (the Code attribute itself contains 2 attribute tables)
0x000a: #10, so the first attribute table is "LineNumberTable"
[Figure: LineNumberTable attribute structure]
0x0000 000a: attribute length is 10
0x0002: line_number_table_length = 2
line_number_table is an array of line_number_table_length entries of type line_number_info. Each line_number_info has two u2 items: start_pc, an offset into the code array, and line_number, the corresponding line number in the Java source file.
0x0000: start_pc = 0
0x0003: line_number = 3
0x0004: start_pc = 4
0x0004: line_number = 4
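Mapping a bytecode offset back to a source line (which is how stack traces get their line numbers) uses these pairs: an instruction belongs to the entry with the largest start_pc that does not exceed its offset. A minimal sketch of that rule (names are illustrative) that does not assume the entries are sorted:

```java
/** Maps a bytecode offset to a source line using LineNumberTable entries (a minimal sketch). */
final class LineNumberLookup {
    /** entries[i] = {start_pc, line_number}; returns -1 if no entry covers pc. */
    static int lineFor(int[][] entries, int pc) {
        int bestStart = -1;
        int line = -1;
        // Scan every entry and keep the one with the largest start_pc that is still <= pc.
        for (int[] e : entries) {
            if (e[0] <= pc && e[0] > bestStart) {
                bestStart = e[0];
                line = e[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] initTable = {{0, 3}, {4, 4}}; // the demo's <init>: pc 0 -> line 3, pc 4 -> line 4
        System.out.println(lineFor(initTable, 1)); // 3: invokespecial at pc 1
        System.out.println(lineFor(initTable, 6)); // 4: putfield at pc 6
    }
}
```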
0x000b: #11, so the second attribute table is "LocalVariableTable"
[Figure: LocalVariableTable attribute structure]
[Figure: local_variable_info structure]
0x0000 000c: attribute length is 12
0x0001: local_variable_table_length = 1
Then parse the entry according to the local_variable_info structure:
0x0000: start_pc = 0
0x000a: length = 10
0x000c: name_index = #12 ("this")
0x000d: descriptor_index = #13 ("Lcom/demo/Demo;")
0x0000: index = 0 (the local variable slot)
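The entry says that slot 0 holds this for code offsets in the half-open range [start_pc, start_pc + length), here [0, 10), which covers the whole 10-byte code array. A minimal sketch of that scope rule (names are illustrative):

```java
/** One LocalVariableTable entry and the scope rule it encodes (a minimal sketch). */
final class LocalVariableEntry {
    final int startPc;
    final int length;
    final String name;
    final String descriptor;
    final int slot;

    LocalVariableEntry(int startPc, int length, String name, String descriptor, int slot) {
        this.startPc = startPc;
        this.length = length;
        this.name = name;
        this.descriptor = descriptor;
        this.slot = slot;
    }

    /** The variable holds a value for code offsets in [start_pc, start_pc + length). */
    boolean inScope(int pc) {
        return pc >= startPc && pc < startPc + length;
    }

    public static void main(String[] args) {
        // The demo's <init>: "this" (Lcom/demo/Demo;) in slot 0, covering the 10-byte code array.
        LocalVariableEntry self = new LocalVariableEntry(0, 10, "this", "Lcom/demo/Demo;", 0);
        System.out.println(self.inScope(9));  // true: the final return is at offset 9
        System.out.println(self.inScope(10)); // false: past the end of the code array
    }
}
```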
------- The first method is now fully parsed -------
Structure: method <init> -> 1 attribute (Code) -> 2 nested attribute tables (LineNumberTable, LocalVariableTable). Next, we parse the second method.
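As a check, the sizes of the parts add up exactly to the attribute_length values read from the file, which confirms that attribute_length excludes the 6-byte header. The first group in each sum is max_stack, max_locals, code_length, the code array, exception_table_length and attributes_count; the next two are the nested LineNumberTable and LocalVariableTable, each with its own 6-byte header. For <init> (length 56) and testMethod (length 43):

$$
\underbrace{(2+2+4+10+2+2)}_{\text{Code fields and code}} + \underbrace{(6+10)}_{\text{LineNumberTable}} + \underbrace{(6+12)}_{\text{LocalVariableTable}} = 22 + 16 + 18 = 56
$$

$$
(2+2+4+1+2+2) + (6+6) + (6+12) = 13 + 12 + 18 = 43
$$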
2nd method:
0x0004: access flags ACC_PROTECTED
0x000e: name index #14 ("testMethod")
0x0008: descriptor index #8 ("()V")
0x0001: attributes count = 1
0x0009: #9 ("Code")
0x0000 002b: attribute length is 43
Parsing the Code attribute:
0x0000: max_stack = 0
0x0001: max_locals = 1 (slot 0 holds this)
0x0000 0001: code_length = 1
0xb1: return (the method returns void)
0x0000: exception_table_length = 0
0x0002: attributes_count = 2
First attribute table:
0x000a: #10, LineNumberTable
0x0000 0006: attribute length is 6
0x0001: line_number_table_length = 1
0x0000: start_pc = 0
0x0008: line_number = 8
Second attribute table:
0x000b: #11, LocalVariableTable
0x0000 000c: attribute length is 12
0x0001: local_variable_table_length = 1
0x0000: start_pc = 0
0x0001: length = 1
0x000c: name_index = #12 ("this")
0x000d: descriptor_index = #13 ("Lcom/demo/Demo;")
0x0000: index = 0
That completes the methods. Looking back at the overview diagram, the last part to parse is the attributes.
3.10 Attributes
0x0001: as before, a count comes first; there is 1 attribute.
0x000f: name index #15 ("SourceFile")
0x0000 0002: attribute_length = 2
0x0010: sourcefile_index = #16 ("Demo.java")
The SourceFile attribute records the name of the source file from which the class file was generated.
[Figure: SourceFile attribute structure]
4 Other remarks
Doing all of this by hand is tedious, but working through the process yourself gives you a different kind of understanding than just reading the result. Now let's use javap, the disassembler that ships with the JDK, to parse the same bytecode file:
javap -verbose Demo    (pass the class name without the .class suffix)
[Figure: output of javap -verbose]
5 Summary
We have now interpreted the entire class file, and you should be able to read bytecode files on your own. Understanding the structure of a class file is essential for going further into the virtual machine's execution engine, so this is a fundamental and important step.
Understand Java bytecode in one article