JVM: how to determine whether a piece of data is a plain value or an object reference

Source: Internet
Author: User

To determine whether a piece of data is a plain value or a reference, the JVM must first choose a strategy for doing so. This choice largely shapes how the GC is implemented.


I. Conservative

If the JVM chooses not to record any type information, it cannot tell whether the data at a given memory location should be interpreted as a reference, an integer, or something else. A GC implemented under these conditions is a "conservative GC". During a collection, the JVM starts scanning memory from known locations (such as the JVM stack) and, for every number it sees, checks whether it "looks like a pointer into the GC heap". This involves a bounds check (the upper and lower boundaries of the GC heap are known) and an alignment check (allocation usually has alignment requirements; with 4-byte alignment, a number not divisible by four is definitely not a pointer). Anything that passes is treated as a pointer, and the scan continues recursively from the object it points to.
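The bounds check and alignment check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name ConservativeFilter, the heap boundaries, and the 4-byte alignment are made up for the example and do not come from any particular JVM.

```java
// Minimal sketch of a conservative "does this word look like a heap pointer?" test.
// The heap bounds and the alignment passed in are assumed values for illustration.
final class ConservativeFilter {
    private final long heapStart;  // inclusive lower bound of the GC heap
    private final long heapEnd;    // exclusive upper bound of the GC heap
    private final long alignment;  // allocation alignment in bytes (a power of two)

    ConservativeFilter(long heapStart, long heapEnd, long alignment) {
        this.heapStart = heapStart;
        this.heapEnd = heapEnd;
        this.alignment = alignment;
    }

    // True if the word passes both the bounds check and the alignment check.
    // A true result only means "might be a pointer": an integer that happens
    // to hold such a value cannot be told apart from a real reference.
    boolean looksLikeHeapPointer(long word) {
        boolean inBounds = word >= heapStart && word < heapEnd;
        boolean aligned = (word & (alignment - 1)) == 0;
        return inBounds && aligned;
    }

    public static void main(String[] args) {
        ConservativeFilter filter = new ConservativeFilter(0x1000, 0x2000, 4);
        System.out.println(filter.looksLikeHeapPointer(0x1004)); // in bounds and aligned
        System.out.println(filter.looksLikeHeapPointer(0x1006)); // in bounds but misaligned
        System.out.println(filter.looksLikeHeapPointer(0x3000)); // outside the heap
    }
}
```

Every word that passes this filter has to be treated as a live reference, which is where the "suspected pointer" problems discussed in this section come from.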

The advantage of conservative GC is that it is relatively easy to implement and can conveniently provide automatic memory management for programming languages that have no special support for GC. The Boehm-Demers-Weiser GC is a typical example of a conservative GC and can be embedded into programs written in C or C++.

A little history:
Microsoft's JScript and early versions of VBScript also used conservative GC, and so did Microsoft's JVM. VBScript later switched to reference counting. The successor to Microsoft's JVM, the CLR in .NET, switched to fully precise GC.
To make an announcement at a conference in time, Microsoft's initial JVM prototype had only about a month to go from the start of work to conforming to the Java standard. A simple implementation approach was therefore the only option, and conservative GC was the natural choice.
Source: Patrick Dussud interview on Channel 9, at about the 23-minute mark

Conservative GC has the following disadvantages:
1. Some objects that should already be dead escape collection because a suspected pointer points at them. This is safe for program semantics, because every object that should be alive stays alive, but it is bad for memory usage: there will always be some useless data occupying GC heap space. An implementation can be tuned to reduce the proportion of such useless objects, which alleviates (but cannot cure) the problem of high memory usage.

2. Since the GC does not know whether a suspected pointer really is a pointer, it cannot rewrite its value. Moving an object requires updating the pointers to it, so objects cannot be moved. There is one way to support moving objects under conservative GC: add a layer of indirection. Instead of referring to objects directly by pointer, every reference goes through a "handle"; references point into a handle table, and the actual object is found through the table. Moving an object then only requires updating its entry in the handle table. The cost is slower access through references. Sun JDK's Classic VM used this all-handle design, and its performance was quite poor.
[That is, objects are accessed through handles; the alternative is direct access through pointers.]
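The handle indirection can be illustrated with a toy model. In this sketch, which is an assumption made for illustration and not how the Classic VM was actually written, the "heap" is a plain long array, an object's address is an array index, and HandleTable is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy handle table: the program holds handle numbers, never raw addresses.
final class HandleTable {
    // slots.get(handle) is the current address of the object behind that handle
    private final List<Integer> slots = new ArrayList<>();

    // Register an object located at the given address and return its handle.
    int newHandle(int address) {
        slots.add(address);
        return slots.size() - 1;
    }

    // Every access pays for this extra lookup.
    int resolve(int handle) {
        return slots.get(handle);
    }

    // Moving an object rewrites only the table entry; the handle values
    // scattered through stacks and other objects stay untouched.
    void relocate(int handle, int newAddress) {
        slots.set(handle, newAddress);
    }

    public static void main(String[] args) {
        long[] heap = new long[8];
        HandleTable table = new HandleTable();
        heap[2] = 42L;                  // "object" payload at address 2
        int handle = table.newHandle(2);

        heap[5] = heap[2];              // the collector copies the object to address 5
        heap[2] = 0L;
        table.relocate(handle, 5);      // and fixes up a single table entry

        System.out.println(heap[table.resolve(handle)]);
    }
}
```

Because a suspected pointer on the stack would hold a handle number rather than an address, the collector never has to rewrite it, which is what makes moving safe even without type information.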


To support rich reflection features, the JVM already needs objects to carry information about their own structure, and the GC can use this information too. For this reason, few JVMs use fully conservative GC. Unless the implementers are really lazy...

------------------------------------------
II. Semi-conservative
The JVM can choose not to record type information for the stack (the same as conservative GC) but record type information on objects. Scanning the stack then works as described above, but once the scan reaches objects in the GC heap, each object carries enough type information for the JVM to determine which parts of the object are references. This is "semi-conservative GC", also known as "conservative with respect to the roots".

To support semi-conservative GC, the runtime must keep enough metadata on objects. In a JVM, this data may be computed in the class loader or the object model module, but it requires no special support from the JIT compiler.

The Boehm GC mentioned above supports not only fully conservative operation but also a semi-conservative mode. GCJ and Mono both use the Boehm GC in semi-conservative mode.

Early versions of the Dalvik VM in Google's Android are another example of semi-conservative GC. By mid-2009, however, internal versions of the Dalvik VM had begun to support precise GC, at the cost of about a 9% increase in the size of optimized DEX files.
In fact, many older JVMs chose this implementation approach.

Because a semi-conservative GC has precise information about data in the heap, it can move some objects while still using direct pointers for references. The approach is to mark only the objects reached directly by the conservative scan as immovable (pinned); objects reached by scanning onward from those can be moved.
Fully conservative GC usually uses algorithms that do not move objects, such as mark-sweep. Semi-conservative GC can use either mark-sweep or algorithms that move some objects, such as Bartlett-style mostly-copying GC.
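The pinning rule can be sketched on a toy object graph. The following is a simplified illustration, not Bartlett's actual algorithm (which pins whole pages rather than single objects): objects are integer IDs, the heap is a map from each object to the objects it references, and MostlyCopyingSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MostlyCopyingSketch {
    // All objects reachable from the ambiguous roots. Inside the heap the
    // references are known precisely, so this traversal is an exact one.
    static Set<Integer> reachable(Map<Integer, List<Integer>> heap, Set<Integer> ambiguousRoots) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(ambiguousRoots);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (!seen.add(obj)) {
                continue; // already visited
            }
            for (int ref : heap.getOrDefault(obj, Collections.<Integer>emptyList())) {
                work.push(ref);
            }
        }
        return seen;
    }

    // Objects hit directly by the conservative stack scan are pinned, because
    // the stack words pointing at them cannot be rewritten. Everything else
    // that is reachable may be moved.
    static Set<Integer> movable(Map<Integer, List<Integer>> heap, Set<Integer> ambiguousRoots) {
        Set<Integer> result = reachable(heap, ambiguousRoots);
        result.removeAll(ambiguousRoots);
        return result;
    }
}
```

In this model the pinned set is simply the set of ambiguous roots, and the movable set is the rest of the live graph; unreachable objects appear in neither.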

Supporting JNI method calls is also easier for a semi-conservative GC: whether or not a frame belongs to a JNI call, the stack is simply scanned conservatively, and that is all. No extra handling of references is needed. The price, of course, is the same "suspected pointer" problem as in the fully conservative approach.

------------------------------------------
III. Precise


In contrast to conservative GC there is "precise GC", which in English goes by precise GC, exact GC, accurate GC, or type-accurate GC. English speakers are troublesome too; can they not settle on one word for "precise"?
What does "precise" mean? The key is "type": you must know the type of the data at a given location in order to interpret it correctly. What the GC cares about is "whether this piece of data is a pointer".
To implement such a GC, the JVM must be able to determine, for every location, whether the data there is a reference into the GC heap, including data in activation records (stacks and registers).

There are several ways to do this:

1. Tag the data itself. This approach is uncommon in JVMs but appears in some other language implementations, so I will not go into detail. Tagging is more common in semi-conservative GCs; CRuby, for example, uses a tagging semi-conservative GC. CLDC-HI is an interesting case: every slot on the stack is paired with a word-sized tag that describes its type, which reduces the overhead of stack maps. I have not seen a similar implementation anywhere else.
2. Have the compiler generate dedicated scanning code for each method. I have not seen this in a JVM implementation, although I have seen it in implementations of other languages.
3. Record the type information externally, in the form of a mapping table. HotSpot, JRockit, and J9, currently the three mainstream high-performance JVM implementations, all do this. HotSpot calls this data structure OopMap, JRockit calls it livemap, and J9 calls it GC map. Apache Harmony's DRLVM also calls it GCMap.
To implement this, the interpreter and the JIT compiler in the virtual machine must both cooperate to generate enough metadata for the GC.
There are two ways to use such a mapping table:
1. Traverse the original mapping table each time, scanning the offsets one by one in a loop. This is called the "interpreted" way.
2. Generate custom scanning code for each mapping table (imagine unrolling the loop that walks the table) and execute the generated code directly whenever the table is needed. This is called the "compiled" way.
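The two ways of using a map can be contrasted in a small sketch. Everything here is an assumption for illustration: a frame is modelled as a long array, a map as an array of slot offsets that hold references, and the "compiled" variant uses a chain of closures as a stand-in for generated machine code, which is what a real VM would emit.

```java
import java.util.ArrayList;
import java.util.List;

final class MapScanStyles {
    // A specialized scanner produced once for one particular map.
    interface FrameScanner {
        void scan(long[] frame, List<Long> out);
    }

    // "Interpreted" use: walk the offset table on every scan.
    static List<Long> scanInterpreted(long[] frame, int[] refOffsets) {
        List<Long> refs = new ArrayList<>();
        for (int offset : refOffsets) {
            refs.add(frame[offset]);
        }
        return refs;
    }

    // "Compiled" use: unroll the table into straight-line steps ahead of time.
    // Each closure reads one fixed offset; the table itself is no longer
    // consulted when the resulting scanner runs.
    static FrameScanner compile(int[] refOffsets) {
        FrameScanner scanner = (frame, out) -> { };
        for (int offset : refOffsets) {
            final FrameScanner previous = scanner;
            scanner = (frame, out) -> {
                previous.scan(frame, out);
                out.add(frame[offset]);
            };
        }
        return scanner;
    }
}
```

Both variants visit the same slots in the same order. The trade-off is between the one-time cost and memory of building a scanner per map and the per-scan cost of walking the table.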

In HotSpot, an object's type information records its own OopMap, which describes what type of data lives at which offset within an object of that type. Scanning outward from an object can therefore be precise. This data is computed during class loading.
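As a rough analogy only, the same kind of per-class information can be derived in plain Java with reflection. This is not HotSpot's mechanism: HotSpot records field offsets inside the VM, while this sketch records field names, ignores superclass fields, and uses a made-up Node class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class ClassRefMap {
    // Hypothetical sample class mixing primitive and reference fields.
    static class Node {
        int id;
        long stamp;
        Node next;
        String label;
    }

    // Names of the instance fields declared by the class whose declared type
    // is a reference type. A collector only needs to visit these fields.
    static List<String> referenceFields(Class<?> type) {
        List<String> refs = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // not part of the per-instance layout
            }
            if (!field.getType().isPrimitive()) {
                refs.add(field.getName());
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        // The order of the reported fields is not guaranteed by the reflection API.
        System.out.println(referenceFields(Node.class));
    }
}
```

The result depends only on the class, not on any particular instance, which is why it can be computed once at class loading time and shared by all objects of that type.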

Each method compiled by the JIT also records OopMaps at certain specific locations. An OopMap records which stack locations and registers hold references when execution reaches that instruction of the method. When the GC scans the stack, it consults these OopMaps to find the references. These specific locations are mainly:
1. The end of a loop
2. Just before a method returns, or just after a method's call instruction
3. Locations where an exception may be thrown
Such a location is called a "safepoint". OopMaps are recorded only at specific locations because recording one for every instruction would take a great deal of space, and the overhead would not be worthwhile. Recording only at key points greatly reduces the amount of data while still making it possible to identify references. As a consequence, in HotSpot a GC cannot start at an arbitrary location; it can only start at a safepoint.
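A per-method table of maps keyed by safepoint location might look like the sketch below. The layout is an assumption for illustration (a bitmask over at most 64 stack slots, keyed by an integer program counter) and is not HotSpot's actual OopMap format; SafepointMaps is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SafepointMaps {
    // safepoint pc -> bitmask; bit i set means stack slot i holds a reference
    private final Map<Integer, Long> mapsByPc = new HashMap<>();

    // Called by the (imaginary) compiler when it emits a safepoint.
    void record(int pc, long refSlotMask) {
        mapsByPc.put(pc, refSlotMask);
    }

    boolean isSafepoint(int pc) {
        return mapsByPc.containsKey(pc);
    }

    // Collect the reference slots of a frame stopped at the given pc.
    // Without a map for that pc the slots cannot be interpreted, which is
    // why a thread must first be brought to a safepoint.
    List<Long> roots(int pc, long[] slots) {
        Long mask = mapsByPc.get(pc);
        if (mask == null) {
            throw new IllegalStateException("pc " + pc + " is not a safepoint");
        }
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < slots.length; i++) {
            if ((mask & (1L << i)) != 0) {
                out.add(slots[i]);
            }
        }
        return out;
    }
}
```

For example, after record(12, 0b101L), a frame stopped at pc 12 reports slots 0 and 2 as roots and skips slot 1, whatever value slot 1 happens to contain.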

A running program is stopped for GC only when it has reached a safepoint.

Safepoints raise another question: when a GC is needed, how do we get all threads (excluding threads executing JNI calls) to run to a safepoint and pause there? The common approaches are described below:

1. Preemptive suspension

This requires no cooperation from the code the threads are executing. When a GC is needed, all threads are suspended first; any thread found not to be at a safepoint is resumed and allowed to run until it reaches one. Currently this preemptive approach is rarely used to respond to GC.

2. Voluntary suspension

When the GC needs to suspend threads, it does not operate on them directly; it only sets a flag. Each thread polls the flag during execution and suspends itself when it finds the flag set.

Methods still running in the interpreter can have OopMaps generated for the GC automatically by functionality in the interpreter.
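The flag-polling scheme for voluntary suspension can be modelled with ordinary Java threads. This is a minimal sketch under stated assumptions: PollingSafepoint and its methods are hypothetical names, the caller tells stopTheWorld how many mutator threads to wait for, and real VMs use much cheaper polls than a monitor.

```java
import java.util.concurrent.atomic.AtomicLong;

final class PollingSafepoint {
    private volatile boolean stopRequested = false; // the flag set by the GC
    private final Object lock = new Object();
    private int parked = 0;                         // mutators currently suspended

    // Called by mutator threads at their poll points.
    void poll() throws InterruptedException {
        if (!stopRequested) {
            return; // fast path: a single volatile read
        }
        synchronized (lock) {
            parked++;
            lock.notifyAll();            // tell the GC thread we have stopped
            while (stopRequested) {
                lock.wait();             // suspend ourselves until resumed
            }
            parked--;
        }
    }

    // Called by the GC thread: set the flag, then wait until the given
    // number of mutators have parked themselves.
    void stopTheWorld(int mutators) throws InterruptedException {
        synchronized (lock) {
            stopRequested = true;
            while (parked < mutators) {
                lock.wait();
            }
        }
    }

    // Called by the GC thread when the collection is finished.
    void resumeTheWorld() {
        synchronized (lock) {
            stopRequested = false;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PollingSafepoint safepoint = new PollingSafepoint();
        AtomicLong counter = new AtomicLong();
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    counter.incrementAndGet(); // "useful work"
                    safepoint.poll();          // poll point at the end of the loop
                }
            } catch (InterruptedException e) {
                // interrupted while parked: exit
            }
        });
        worker.start();
        safepoint.stopTheWorld(1);
        System.out.println("worker parked after " + counter.get() + " iterations");
        safepoint.resumeTheWorld();
        worker.interrupt();
        worker.join();
    }
}
```

The GC thread never touches the worker directly. It only sets the flag and waits, and the worker stops at the next place where it polls.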

These OopMaps are stored in memory in compressed form and decompressed on demand during GC.
HotSpot uses OopMaps in the "interpreted" way: each time, it loops over the entries in the map and scans the corresponding offsets.

JNI methods called from Java threads are neither executed by the JVM's interpreter nor generated by the JVM's JIT compiler, so they have no OopMap information. How, then, can the GC stay precise when it encounters such stack frames?
HotSpot's solution is that every reference crossing the JNI call boundary (arguments passed into a JNI method, and values returned from it) must be wrapped in a "handle". When JNI code needs to call Java APIs, it must also wrap pointers in handles. In this implementation, a "jobject" in JNI code is not a direct pointer to the object; it points to a handle, and the object is reached indirectly through the handle. The GC therefore does not need to scan the stack frames of JNI methods; scanning the handle table is enough to find all objects in the GC heap that JNI methods can access.
However, this means JNI calls carry the overhead of wrapping and unwrapping handles, which is one reason JNI method calls are slow.
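The wrapping at the JNI boundary can be modelled as follows. This is a toy model written in Java for consistency with the other sketches, not HotSpot's JNI handle implementation: JniHandleSketch is a hypothetical name, and a handle is just an index into a list.

```java
import java.util.ArrayList;
import java.util.List;

final class JniHandleSketch {
    // The table is owned by the VM, so the GC knows that every entry is a reference.
    private final List<Object> handleTable = new ArrayList<>();

    // Wrap a reference before it crosses into native code.
    // Native code only ever sees the returned handle.
    int wrap(Object ref) {
        handleTable.add(ref);
        return handleTable.size() - 1;
    }

    // Unwrap a handle when native code passes it back to the VM.
    Object unwrap(int handle) {
        return handleTable.get(handle);
    }

    // Roots contributed by native frames: the table is scanned instead of
    // the native stack, whose layout the VM does not know.
    List<Object> roots() {
        return new ArrayList<>(handleTable);
    }
}
```

Each wrap and unwrap is extra work on every reference that crosses the boundary, which is the overhead mentioned above.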
