Java: symbolic references and direct references in the JVM
https://www.zhihu.com/question/50258991
During class loading, in the resolution phase, the Java virtual machine replaces the symbolic references in the class's binary data with direct references.
1. Symbolic references:
A symbolic reference describes its target with a set of symbols. The symbols can be any form of literal, as long as they identify the target unambiguously. In a class file, for example, they appear as constants of type CONSTANT_Class_info, CONSTANT_Fieldref_info, CONSTANT_Methodref_info, and so on. A symbolic reference is independent of the virtual machine's memory layout, and the referenced target does not have to be loaded into memory yet. A Java class is compiled into a class file. At compile time the class does not know the actual address of any class it refers to, so it can only use symbolic references in their place. For example, if the class org.simple.People refers to the class org.simple.Language, then at compile time People does not know the actual memory address of Language, so it uses the symbol org.simple.Language (in practice represented by something like a CONSTANT_Class_info entry) to stand for the address of the Language class. Different virtual machines may implement different memory layouts, but they all accept the same symbolic references, because the literal form of symbolic references is clearly defined in the class file format of the Java Virtual Machine Specification.
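To make the idea of "names only, no addresses" concrete, here is a minimal Java sketch. It is not how a JVM stores its constant pool: the class `FieldSymbol` and its `resolve` method are hypothetical names invented for this illustration, and reflection merely stands in for the VM's resolution step. The symbol holds only strings, much like a CONSTANT_Fieldref_info entry, and the target class is looked up only when `resolve` is called.

```java
import java.lang.reflect.Field;

class SymbolicReferenceDemo {
    // Reads a static int field identified purely by name.
    static int readStaticInt(FieldSymbol symbol) {
        try {
            return symbol.resolve().getInt(null); // null receiver: the field is static
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FieldSymbol symbol = new FieldSymbol("java.lang.Integer", "MAX_VALUE", "I");
        System.out.println(symbol.className + "." + symbol.fieldName + ":" + symbol.descriptor
                + " = " + readStaticInt(symbol));
    }
}

// Hypothetical illustration: a "symbolic reference" is just names, no addresses.
final class FieldSymbol {
    final String className;  // e.g. "java.lang.Integer"
    final String fieldName;  // e.g. "MAX_VALUE"
    final String descriptor; // e.g. "I" for int (kept only for display here)

    FieldSymbol(String className, String fieldName, String descriptor) {
        this.className = className;
        this.fieldName = fieldName;
        this.descriptor = descriptor;
    }

    // Turns the names into something usable at run time. The class is looked up
    // (and loaded if necessary) only here, not when the symbol is created.
    Field resolve() {
        try {
            Class<?> owner = Class.forName(className);
            return owner.getField(fieldName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve " + className + "." + fieldName, e);
        }
    }
}
```

The same `FieldSymbol` value is valid on any JVM; what `resolve` hands back is specific to the running VM.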
2. Direct references: A direct reference can be
(1) a pointer directly to the target (for example, a direct reference to a class's Class object, to a class variable, or to a class method may be a pointer into the method area);
(2) a relative offset (for example, a direct reference to an instance variable or an instance method is an offset);
(3) a handle that can indirectly locate the target.
Direct references depend on the virtual machine's memory layout: the same symbolic reference generally resolves to different direct references on different virtual machine instances. If a direct reference exists, its target must already be loaded into memory. RednaxelaFX's explanation:
RednaxelaFX
Link: https://www.zhihu.com/question/50258991/answer/120450561
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.
I understand how symbolic references are converted to direct references when a function is called, but how class variables and instance variables are resolved is unclear to me.
Symbolic references carry only semantic information and involve no concrete implementation, whereas the direct references produced by resolution are tightly bound to a specific implementation. So any discussion of how a symbolic reference is resolved into a direct reference has to be tied to a specific implementation.
After looking through some material, many people say it is an offset. An offset from what?
What the offset is relative to is exactly one of the "implementation details" mentioned above.
Take HotSpot VM as an example; the object model it uses changed between JDK 6 and JDK 7.
For object instances, HotSpot VM uses a fairly intuitive object model: Java references are implemented as direct pointers or as compressed pointers, which are semantically still direct pointers. The pointer points to the true start of the object (no data is placed at negative offsets).
An object is laid out as follows. First comes the object header, which has two VM-internal fields: _mark and _klass. It is followed by all instance fields of the object, packed compactly and ordered by inheritance depth: fields declared by classes shallower in the hierarchy come first, and fields declared by classes deeper in the hierarchy come later. Fields declared in the same class are reordered by field width. For ordinary Java classes the default order is: long/double (8 bytes), int/float (4 bytes), short/char (2 bytes), byte/boolean (1 byte), and finally reference-type fields (4 or 8 bytes). Each field is aligned to its own width, and the object as a whole is aligned to 8 bytes by default. If there is a gap at a class inheritance boundary, fields of the derived class can be pulled into the gap. This arrangement packs primitive fields as tightly as possible, reducing alignment gaps between fields, and also keeps reference fields as close together as possible, reducing the overhead of OopMaps.
I explained the memory layout of object instances in an earlier talk; see http://www.valleytalk.org/wp-content/uploads/2011/05/java_program_in_action_20110727.pdf, starting on page 112.
For example, take the following class C:
class A {
    boolean b;
    Object o1;
}
class B extends A {
    int i;
    long l;
    Object o2;
    float f;
}
class C extends B {
    boolean b;
}
An instance of it is laid out as follows (assuming a 64-bit HotSpot VM with compressed pointers enabled):
--> +0  [ _mark     ] (64-bit header word)
    +8  [ _klass    ] (32-bit header word, compressed klass pointer)
    +12 [ A.b       ] (boolean, 1 byte)
    +13 [ (padding) ] (padding for alignment, 3 bytes)
    +16 [ A.o1      ] (reference, compressed pointer, 4 bytes)
    +20 [ B.i       ] (int, 4 bytes)
    +24 [ B.l       ] (long, 8 bytes)
    +32 [ B.f       ] (float, 4 bytes)
    +36 [ B.o2      ] (reference, compressed pointer, 4 bytes)
    +40 [ C.b       ] (boolean, 1 byte)
    +41 [ (padding) ] (padding for object alignment, 7 bytes)
So an instance of class C is 48 bytes in this configuration: 10 bytes are padding wasted on alignment, 12 bytes are the object header, and the remaining 26 bytes are instance fields declared by user code.
Note that the fields in class C are laid out in this order: object header, fields declared by Object (none), fields declared by A, fields declared by B, fields declared by C, that is, from shallow to deep by inheritance depth. Within each class, the fields are reordered by width according to the rule above. In addition, if there is a gap at a class inheritance boundary (here there would have been a 4-byte gap between A and B, but B happens to declare fields no wider than 4 bytes), the first field no wider than 4 bytes can be pulled into that gap, which is where B.i ends up.
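The rules can be tried out with a small Java sketch. This is a simplified model of the rules as described in the text, not HotSpot's actual layout code; `FieldLayoutSketch`, `FieldSpec`, `placeClass` and `layoutOfC` are hypothetical names, and the 12-byte header and 4-byte references are assumptions matching the 64-bit, compressed-pointer configuration above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldLayoutSketch {
    static final int HEADER_BYTES = 12;    // assumed: 8-byte _mark + 4-byte compressed _klass
    static final int OBJECT_ALIGNMENT = 8; // assumed default object alignment

    static final class FieldSpec {
        final String name;
        final int size;
        final boolean isReference;
        FieldSpec(String name, int size, boolean isReference) {
            this.name = name;
            this.size = size;
            this.isReference = isReference;
        }
    }

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    // Places the fields declared by one class starting at 'offset'; returns the end offset.
    static int placeClass(List<FieldSpec> declared, int offset, Map<String, Integer> out) {
        List<FieldSpec> pending = new ArrayList<>(declared);
        // Primitives first, widest first; references last. List.sort is stable,
        // so fields of equal width keep their declaration order.
        pending.sort((a, b) -> {
            if (a.isReference != b.isReference) return a.isReference ? 1 : -1;
            return a.isReference ? 0 : Integer.compare(b.size, a.size);
        });
        while (!pending.isEmpty()) {
            FieldSpec next = pending.get(0);
            int aligned = alignUp(offset, next.size);
            // If aligning 'next' would leave a gap, pull in the first field that fits.
            FieldSpec filler = null;
            for (FieldSpec candidate : pending) {
                if (candidate.size <= aligned - offset && offset % candidate.size == 0) {
                    filler = candidate;
                    break;
                }
            }
            if (filler != null) next = filler; else offset = aligned;
            out.put(next.name, offset);
            offset += next.size;
            pending.remove(next);
        }
        return offset;
    }

    // Lays out class C from the example: A's fields, then B's, then C's.
    static Map<String, Integer> layoutOfC() {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int end = HEADER_BYTES;
        end = placeClass(Arrays.asList(
                new FieldSpec("A.b", 1, false), new FieldSpec("A.o1", 4, true)), end, offsets);
        end = placeClass(Arrays.asList(
                new FieldSpec("B.i", 4, false), new FieldSpec("B.l", 8, false),
                new FieldSpec("B.o2", 4, true), new FieldSpec("B.f", 4, false)), end, offsets);
        end = placeClass(Arrays.asList(new FieldSpec("C.b", 1, false)), end, offsets);
        offsets.put("(instance size)", alignUp(end, OBJECT_ALIGNMENT));
        return offsets;
    }

    public static void main(String[] args) {
        layoutOfC().forEach((name, offset) -> System.out.println("+" + offset + " " + name));
    }
}
```

Under these assumptions the sketch arrives at the same offsets as the layout listed above, including B.i landing at +20 in the gap before the 8-byte-aligned B.l, and an instance size of 48 bytes.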
Also note that class A and class C each declare a field named b. What is the relationship between them? There is none.
In Java, fields do not take part in polymorphism. If a derived class declares a field with the same name as a field in a base class, both fields exist in the final instance; the derived class's version only hides (shadows) the name of the base class's field, and does not merge with it or make it disappear. The example above shows exactly this situation: A.b and C.b exist at the same time.
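The hiding behavior is easy to observe from plain Java. The sketch below reuses the A/B/C shape from the example, with initial values added so the two fields can be told apart; the wrapper class `FieldHidingDemo` and its helper methods are names made up for the illustration.

```java
class FieldHidingDemo {
    static class A { boolean b = false; }
    static class B extends A { int i; }
    static class C extends B { boolean b = true; } // hides A.b, does not replace it

    // Field access is chosen by the static type of the expression, not the run-time class.
    static boolean readThroughA(C c) { return ((A) c).b; } // reads A.b
    static boolean readThroughC(C c) { return c.b; }       // reads C.b

    public static void main(String[] args) {
        C c = new C();
        System.out.println("A.b = " + readThroughA(c));
        System.out.println("C.b = " + readThroughC(c));
    }
}
```

The same object answers with two different values, which is only possible because both fields occupy their own storage in the instance.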
You can easily see the same information with the JOL tool:
$ sudo ~/sdk/jdk1.8.0/Contents/Home/bin/java -Xbootclasspath/a:. -jar ~/Downloads/jol-cli-0.5-full.jar internals C
objc[78030]: Class JavaLaunchHelper is implemented in both /Users/krismo/sdk/jdk1.8.0/Contents/Home/bin/java and /Users/krismo/sdk/jdk1.8.0/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 09 00 00 00 (00001001 00000000 00000000 00000000) (9)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) be 3b 01 f8 (10111110 00111011 00000001 11111000) (-134136898)
12 1 boolean A.b false
13 3 (alignment/padding gap) N/A
16 4 Object A.o1 null
20 4 int B.i 0
24 8 long B.l 0
32 4 float B.f 0.0
36 4 Object B.o2 null
40 1 boolean C.b false
41 7 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 3 bytes internal + 7 bytes external = 10 bytes total
So in an object model like this, the "offset" of an instance field is measured from the start of the object. For a bytecode like this:
getfield cp#12 // C.b:Z
(Here cp#12 denotes entry 12 of the constant pool.)
the symbolic reference C.b:Z is eventually resolved to an offset such as +40, plus some metadata of the VM's own.
This offset plus the extra metadata is wider than the original constant pool index and does not fit in the original constant pool, so HotSpot VM has a separate structure called the constant pool cache to store it.
In HotSpot VM, the bytecode above becomes the following after resolution:
fast_bgetfield cpc#5 // (offset: +40, type: boolean, ...)
(Here cpc#5 denotes entry 5 of the constant pool cache.)
That is, the resolved offset information is recorded in the constant pool cache, and getfield is rewritten, according to the type information recorded in the resolved constant pool cache entry, into the corresponding bytecode variant fast_bgetfield so that resolution is not repeated every time. fast_bgetfield can then access the field with the correct type using the offset information.
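The resolve-once-then-rewrite idea can be sketched as a toy interpreter in Java. Everything here is a hypothetical model: `QuickeningSketch`, its opcode constants, and the `layout` map (which stands in for the VM's knowledge of field offsets) are invented for illustration and bear no relation to HotSpot's real data structures. The cache index also comes out as 0 rather than 5, since this toy cache starts empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QuickeningSketch {
    static final int GETFIELD = 0;       // operand: constant pool index (symbolic)
    static final int FAST_BGETFIELD = 1; // operand: constant pool cache index (resolved)

    final String[] constantPool;         // symbolic references such as "C.b:Z"
    final Map<String, Integer> layout;   // stand-in for the VM's knowledge of field offsets
    final List<Integer> constantPoolCache = new ArrayList<>(); // resolved offsets
    int resolutions = 0;                 // how many times resolution actually ran

    QuickeningSketch(String[] constantPool, Map<String, Integer> layout) {
        this.constantPool = constantPool;
        this.layout = layout;
    }

    // Executes one {opcode, operand} instruction against raw object memory.
    int execute(int[] insn, byte[] objectMemory) {
        if (insn[0] == GETFIELD) {
            int offset = layout.get(constantPool[insn[1]]); // resolve by name, once
            resolutions++;
            constantPoolCache.add(offset);
            insn[0] = FAST_BGETFIELD;                       // rewrite the instruction in place
            insn[1] = constantPoolCache.size() - 1;
        }
        // Fast path: the operand indexes the cache, which holds the byte offset.
        return objectMemory[constantPoolCache.get(insn[1])];
    }

    // Runs "getfield cp#12" twice on a 48-byte object whose byte at +40 is 1.
    static int[] demo() {
        String[] cp = new String[13];
        cp[12] = "C.b:Z";
        Map<String, Integer> layout = new HashMap<>();
        layout.put("C.b:Z", 40);
        QuickeningSketch vm = new QuickeningSketch(cp, layout);
        byte[] object = new byte[48];
        object[40] = 1;
        int[] insn = { GETFIELD, 12 };
        int first = vm.execute(insn, object);
        int second = vm.execute(insn, object);
        return new int[] { first, second, vm.resolutions, insn[0], insn[1] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Both executions read the same byte, but only the first pays for the name lookup; the second sees an instruction that already carries a cache index.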
Next, static variables (which some people prefer to call "class variables").
In HotSpot VM from JDK 1.3 through JDK 6, static variables are stored at the end of the class's metadata (InstanceKlass). In HotSpot VM from JDK 7 onward, static variables are stored at the end of the class's Java mirror (its java.lang.Class instance).
In HotSpot VM, the relationship between an object, the class's metadata (InstanceKlass), and the class's Java mirror is as follows:
Java object      InstanceKlass       Java mirror
 [ _mark  ]                          (java.lang.Class instance)
 [ _klass ] --> [ ...          ] <-\
 [ fields ]     [ _java_mirror ] --+> [ _mark  ]
                [ ...          ]   |  [ _klass ]
                                   |  [ fields ]
                                    \ [ klass  ]
In the object header of every Java object, the _klass field points to a VM-internal InstanceKlass object that records the class's metadata, and InstanceKlass has a _java_mirror field that points to the class's Java mirror, the java.lang.Class instance. HotSpot VM injects a hidden field "klass" into the Class object, which points back to the corresponding InstanceKlass object. This gives a two-way reference between the klass and the mirror, so either can be reached from the other.
In this model, the java.lang.Class instance does not record the real class metadata; it is just a wrapper around the VM-internal InstanceKlass object, provided for Java reflection.
In HotSpot VM of JDK 6 and earlier, static fields are attached to the end of the InstanceKlass object, whereas in HotSpot VM from JDK 7 onward, static fields are attached to the end of the java.lang.Class object.
Suppose there is a class like this:
class A { static int value = 1; }
Then in HotSpot VM of JDK 6 or earlier:
Java object      InstanceKlass       Java mirror
 [ _mark  ]                          (java.lang.Class instance)
 [ _klass ] --> [ ...          ] <-\
 [ fields ]     [ _java_mirror ] --+> [ _mark  ]
                [ ...          ]   |  [ _klass ]
                [ A.value      ]   |  [ fields ]
                                    \ [ klass  ]
You can see that the static field A.value is stored at the end of the InstanceKlass object.
For this case there is a better diagram on page 121 of the talk mentioned earlier.
In HotSpot VM of JDK 7 or later:
Java object      InstanceKlass       Java mirror
 [ _mark  ]                          (java.lang.Class instance)
 [ _klass ] --> [ ...          ] <-\
 [ fields ]     [ _java_mirror ] --+> [ _mark   ]
                [ ...          ]   |  [ _klass  ]
                                   |  [ fields  ]
                                    \ [ klass   ]
                                      [ A.value ]
You can see that the static field A.value is stored at the end of the java.lang.Class object.
So in HotSpot VM's object model, the "offset" of a static field is:
- JDK 6 or earlier: the offset from the start of the InstanceKlass object corresponding to the class (in fact, of the klassOopDesc that wraps the InstanceKlass)
- JDK 7 or later: the offset from the start of the java.lang.Class object corresponding to the class.
The other details are similar to those of instance fields and are not repeated here.
===========================================
Curious readers may wonder: where are the InstanceKlass and java.lang.Class instances mentioned above stored in HotSpot VM?
In HotSpot VM of JDK 7 and earlier, an InstanceKlass is wrapped in a GC-managed klassOopDesc object and stored in the so-called Permanent Generation (PermGen for short) of the GC heap.
HotSpot VM from JDK 8 onward removes PermGen entirely and stores metadata in native memory instead. The new memory space for metadata is called Metaspace, and InstanceKlass objects live there.
As for java.lang.Class objects, they have always been "ordinary" Java objects, living in the ordinary Java heap (part of the GC heap) just like other Java objects.
===========================================
What if it is not HotSpot VM but some other JVM?
Then anything is possible. In short, what the "offset" is relative to depends entirely on the internal details of the specific JVM implementation.
For example, a JVM could put the static fields of all classes into one big array and, each time a class is loaded, allocate a chunk of that array to hold the class's static fields. In that case the "offset" of a static field could simply be the field's address (assuming the array holding the fields never moves), or it could be an offset from the start address of the array.
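A minimal Java sketch of such a design follows. It models no particular JVM: `StaticAreaSketch`, `loadClass` and `resolve` are hypothetical names, and giving every static field one 8-byte slot is a simplifying assumption. The resolved "direct reference" here is simply an index into the shared array.

```java
import java.util.HashMap;
import java.util.Map;

class StaticAreaSketch {
    private final long[] slots;   // one slot per static field (simplifying assumption)
    private int used = 0;         // number of slots handed out so far
    private final Map<String, Integer> baseOfClass = new HashMap<>();

    StaticAreaSketch(int capacity) {
        slots = new long[capacity];
    }

    // Called when a class is "loaded": reserves a contiguous run of slots for its statics.
    int loadClass(String className, int staticFieldCount) {
        if (used + staticFieldCount > slots.length) {
            throw new IllegalStateException("static area full");
        }
        int base = used;
        baseOfClass.put(className, base);
        used += staticFieldCount;
        return base;
    }

    // "Resolution": (class name, field index within the class) -> absolute slot index.
    int resolve(String className, int fieldIndex) {
        return baseOfClass.get(className) + fieldIndex;
    }

    long get(int slot) { return slots[slot]; }
    void put(int slot, long newValue) { slots[slot] = newValue; }

    public static void main(String[] args) {
        StaticAreaSketch area = new StaticAreaSketch(16);
        area.loadClass("A", 1);           // A.value gets slot 0
        area.loadClass("B", 2);           // B's two statics get slots 1 and 2
        int slot = area.resolve("A", 0);  // resolved once, reusable afterwards
        area.put(slot, 1);
        System.out.println("A.value lives in slot " + slot + " and holds " + area.get(slot));
    }
}
```

Because the array never moves in this model, the slot index could equally be turned into an absolute address, which corresponds to the two options described above.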
As another example, a JVM's object model may have the pointer point not at the true start of the object but at some location in the middle of it. With an object layout like HotSpot VM's, for instance, there are several reasonable places the pointer could point to (again assuming a 64-bit HotSpot VM with compressed pointers enabled):
- Point to the start of the object: _mark is at +0. This is what HotSpot VM chose;
- Point to the second field of the object header: _klass is at +0 and _mark at -8. On some architectures this may speed up vtable dispatch through _klass, so it is also reasonable;
- Point to the start of the actual fields: _mark is at -12, _klass at -4, and the first field at +0. The rationale is that field accesses may be more frequent, at the possible cost of slightly slower access to the object header.
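The arithmetic behind these three choices is just a change of origin, as the following small Java sketch shows. `OriginChoiceSketch` and `offsetFrom` are hypothetical names, and the positions 0, 8 and 12 are taken from the 64-bit, compressed-pointer layout assumed above.

```java
class OriginChoiceSketch {
    // Byte positions measured from the true start of the object
    // (assumed: 64-bit VM with a compressed klass pointer).
    static final int MARK = 0;
    static final int KLASS = 8;
    static final int FIRST_FIELD = 12;

    // Offset of 'position' as seen through a reference that points at 'origin'.
    static int offsetFrom(int origin, int position) {
        return position - origin;
    }

    public static void main(String[] args) {
        int[] origins = { MARK, KLASS, FIRST_FIELD };
        String[] names = { "start of object", "_klass", "first field" };
        for (int i = 0; i < origins.length; i++) {
            System.out.println("pointer at " + names[i]
                    + ": _mark at " + offsetFrom(origins[i], MARK)
                    + ", _klass at " + offsetFrom(origins[i], KLASS)
                    + ", first field at " + offsetFrom(origins[i], FIRST_FIELD));
        }
    }
}
```

The memory contents are identical in all three cases; only the number stored as the resolved offset changes.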
The object model of the Maxine VM can be switched between the OHM model and the HOM model. OHM stands for origin-header-mixed, where the pointer points to the first field of the object header; HOM stands for header-origin-mixed, where the pointer points just past the object header (that is, at the first field).
There are even more interesting ways to lay out objects: bidirectional layouts, such as the one used by SableVM. A typical arrangement puts all reference-type fields at negative offsets from the object header, which is what the pointer points to, and all primitive-type fields at positive offsets. The benefit is that when the GC scans an object's reference fields, it only needs to scan one contiguous piece of memory, which is very convenient.
For more examples of object layouts, follow this link: Why is the address of bs's virtual function table, (int*)(&bs), not the same as the virtual function address (int*)*(int*)(&bs)? - RednaxelaFX's answer
To give one more, extreme, example: the preceding discussion assumes that all the data of an object instance is allocated in one contiguous block of memory. But obviously that is not the only way to implement it.
One extreme approach is to implement objects as linked lists, where each node holds the value of one field and a link to the next field. Like this:
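#include <stdint.h>

struct object_slot_tag; /* forward declaration so the union can refer to it */

typedef union java_value_tag {
    int32_t int_val;
    int64_t long_val;
    /* ... */
    struct object_slot_tag* ref_val; /* reference to another object */
} java_value;

typedef struct object_slot_tag {
    java_value val;               /* value of one field */
    struct object_slot_tag* next; /* link to the slot of the next field */
} object_slot;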
Then if a class has 3 fields, an instance of the class consists of 4 such object_slot nodes: object header -> first field -> second field -> third field -> NULL.
Who would ever do this (flips table)!
Actually, some do. Some interesting implementations, in order to simplify the GC heap and reduce the build-up of external fragmentation, implement the GC heap as one big array of object_slot. Because every unit of data is necessarily the same size, external fragmentation is effectively eliminated. The cost is that the data of one object is artificially made non-contiguous, and internal fragmentation increases.
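A heap built from equal-sized slots can be modeled in a few lines of Java. The sketch below is a hypothetical illustration, not taken from any real VM: `SlotHeapSketch` and its methods are invented names, array indices play the role of pointers, and `NIL` (-1) plays the role of NULL.

```java
import java.util.Arrays;

class SlotHeapSketch {
    static final int NIL = -1; // stands in for NULL
    final long[] value;        // payload of each slot: the header word or one field
    final int[] next;          // next slot of the same object (or of the free list)
    int freeHead;              // first slot of the free list
    int freeCount;             // number of free slots

    SlotHeapSketch(int slotCount) {
        value = new long[slotCount];
        next = new int[slotCount];
        for (int i = 0; i < slotCount; i++) next[i] = (i + 1 < slotCount) ? i + 1 : NIL;
        freeHead = slotCount > 0 ? 0 : NIL;
        freeCount = slotCount;
    }

    private int popFree() {
        int slot = freeHead;
        freeHead = next[slot];
        freeCount--;
        return slot;
    }

    // Builds header -> field 0 -> ... -> NIL out of whatever slots are free.
    int allocate(long header, int fieldCount) {
        if (fieldCount + 1 > freeCount) throw new IllegalStateException("heap full");
        int head = popFree();
        value[head] = header;
        int tail = head;
        for (int i = 0; i < fieldCount; i++) {
            int slot = popFree();
            value[slot] = 0;
            next[tail] = slot;
            tail = slot;
        }
        next[tail] = NIL;
        return head;
    }

    // Returns every slot of the object to the free list.
    void release(int object) {
        int slot = object;
        while (slot != NIL) {
            int following = next[slot];
            next[slot] = freeHead;
            freeHead = slot;
            freeCount++;
            slot = following;
        }
    }

    // Field access walks the chain: the "offset" is a hop count, not a byte distance.
    int slotOfField(int object, int fieldIndex) {
        int slot = next[object];
        for (int i = 0; i < fieldIndex; i++) slot = next[slot];
        return slot;
    }

    int chainLength(int object) {
        int length = 0;
        for (int slot = object; slot != NIL; slot = next[slot]) length++;
        return length;
    }

    // Allocates two small objects, frees the first, then allocates a 3-field object.
    static int[] demo() {
        SlotHeapSketch heap = new SlotHeapSketch(8);
        int a = heap.allocate(0xA, 1);
        heap.allocate(0xB, 1);
        heap.release(a);
        int c = heap.allocate(0xC, 3);
        return new int[] { c, heap.slotOfField(c, 0), heap.slotOfField(c, 1),
                heap.slotOfField(c, 2), heap.chainLength(c) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In `demo`, the 3-field object reuses the two slots freed earlier and continues in slots further along the array, so its four nodes are scattered but no free slot is ever too small to use.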
Of course, very few implementations make this choice, so it is perfectly normal that most of us have never seen one... >_<