Dynamic code evolution for Java: DCEVM principles


The earlier article on hotswap with DCEVM gave a basic introduction to the Dynamic Code Evolution VM. This article introduces its implementation principles.

Two concepts need to be distinguished:

    • Dynamic code evolution (hereinafter DCE): the general practice of modifying a Java program at run time, for example with AOP.
    • Dynamic Code Evolution VM (hereinafter DCE VM): a patch to the Java HotSpot VM. The modified VM supports unrestricted code modification at run time, and can be regarded as one way of implementing DCE.

(Translator's note: my English is limited, so this translation may mislead in places, and I know some passages are rendered poorly. If your English is good, read the original paper directly.)

Overview:

DCE is a technique for modifying a Java program (its classes) while it runs, which gives Java something of the flavor of a dynamic language. In an object-oriented language such as Java, this amounts to replacing a set of classes with new versions. By modifying the Java HotSpot virtual machine, the DCE VM allows arbitrary changes to loaded classes: adding or removing fields and methods, and changing the class and interface inheritance hierarchy. This substantially improves developer productivity. The evaluation section shows that the modifications to the virtual machine have no negative impact on the performance of normal execution. A fast in-place instance update algorithm keeps the cost of an update comparable to that of a full GC run. The DCE VM can be used in standard development environments without additional tools or dependencies.

1. Introduction

A virtual machine places a layer between the running program and the hardware (the machine code produced by compilation). By modifying the JVM, the DCE VM uses this layer to implement dynamic code modification at run time (DCE). Existing production VMs support DCE only in a limited form, yet it is very useful in several domains. We discuss four main applications of DCE:

    • Debugging

      Developers change their programs frequently. After a change has been made and compiled, the developer does not need to restart the program (which costs a lot of time on every restart) and can simply let it continue to run.
      A change can be made at any time and requires no additional work.

    • Server Applications

      Critical services can be upgraded through DCE without being stopped. Here the safety and correctness of the update matter most. We believe this is achievable, provided that the application is designed with code changes in mind and that updates are restricted to certain predefined points. Execution speed must not degrade before or after the update.

    • Dynamic languages

      DCE is a common feature of dynamic languages. Running a dynamic language on a statically typed virtual machine, however, takes a lot of work (see for example [33]). Offering DCE as a VM mechanism simplifies the implementation of dynamic languages. The requirement here is that small incremental changes, e.g., adding a field or method, can be carried out fast.

    • Dynamic AOP

      DCE is also relevant to aspect-oriented programming (AOP). Some dynamic AOP tools (e.g., GluonJ) run on the Java HotSpot VM. These tools make modified code take effect immediately. (This part will be covered in later classworking articles.)

The DCE VM focuses on developer productivity. Just as a developer can set a breakpoint anywhere, the DCE VM can apply a change at any point at which the Java program can be suspended. To suspend the program, the VM requires all threads to stop at their next safepoint. These are the same points that are used to pause all threads before a garbage collection. In other words, the DCE VM suspends the application in the same way as the GC does and then swaps in the new classes. The Java VM guarantees that every running thread reaches its next safepoint within a bounded time. The code replacement is carried out while the VM is paused.

There is strong demand for better code evolution support in the Java HotSpot VM, which currently allows only method bodies to be changed. It is one of the top five requested improvements to the JVM. In addition, the productivity gained from fast code changes is one of the advantages of dynamic languages over static languages such as Java. A Java virtual machine with DCE support brings this advantage to statically typed programming languages as well.

The main contributions of this paper are:

    • The DCE VM is a modification of a production-quality VM that adds support for DCE.
    • The DCE VM allows arbitrary changes to the code, including changes to the subclass relationship, without introducing any indirection and without performance loss.
    • The DCE VM allows different versions of the code to coexist, so code updates can happen at any time.
    • The DCE VM can be used from any IDE that supports the standard Java Debug Wire Protocol (JDWP), for example NetBeans or Eclipse.

2. Levels of code evolution

DCE can be divided into several levels. Based on implementation complexity and the impact on Java semantics, this article distinguishes four levels:

 

Swapping method bodies: Replacing the bytecode of a Java method body is probably the simplest change. No other bytecode or type information depends on the implementation of a method, so a method body can be changed in isolation. The Java HotSpot VM already supports this kind of change.

Adding or removing methods: When the set of methods of a class changes, the virtual method table (used for dynamic dispatch) has to be changed with it. A change to a class may also require changes to the virtual method tables of its subclasses (see Section 3.2). Method indices in the virtual method table may change, which invalidates machine code that contains fixed encoded indices (see Section 3.6). Machine code may also contain statically bound calls to existing methods, which may become invalid as well.

Adding or removing fields: The changes at the previous levels affect only VM metadata. At this level, object instances also have to be modified to reflect changes to their class or superclasses. The VM needs to convert each object from the old version to the new version, which has different fields and a different size. We use a modified mark-and-compact GC to change the object layout (see Section 3.5). Like virtual method table indices, field offsets are used in many places in the interpreter and in compiled machine code. These places need to be adjusted correctly or invalidated.

Adding or removing supertypes: Changing the set of supertypes of a class is the most complex kind of change in DCE. A supertype change can change both the methods and the fields of the class. In addition, the metadata of the class has to be modified to reflect the new supertype relationships.

The VM treats a change to a method signature, a field type, or a member name as two operations: removing one member and adding another. Changes to interfaces can be treated like changes to classes. Adding or removing an interface method affects subinterfaces and the interface tables of implementing classes, but not instances. Changes to the set of superinterfaces are handled similarly.

Another kind of change to a Java class is a change to its set of static fields or static methods. Such a change does not affect subclasses or instances, but it may invalidate existing code, for example code that contains static field offsets. The code evolution algorithm also has to decide how to initialize static fields: either run the static initializer of the new class, or copy the static field values from the old version of the class to the new version.

Changes to a Java program can also be classified by whether the versions are binary compatible: compatible (light gray in Figure 1) or incompatible (dark gray in Figure 1). A binary compatible change does not affect the validity of old code. We call the bytecode of methods that have been removed or replaced in the new classes old code. Because an update can happen at any point at which the VM can be suspended, a Java thread may be in the middle of executing such a method when the update occurs. In that case the old code continues to execute after the change.

A binary incompatible change may break old code. The semantics that were valid for the old code may no longer hold with the new classes, and neither the Java language specification nor the Java virtual machine specification clearly defines what should happen. With this in mind, we distinguish the following kinds of binary incompatible change:

Removing fields or methods: The bytecode of a removed or replaced method may still be referenced by the running program, even though the method no longer exists in the new class. Old code that is still referenced may still be executed, so the VM has to decide how to handle a call to a removed method or an access to a removed field.

Removing supertypes: Removing a supertype of a class (narrowing the type of the class) can violate an important invariant of a running Java program: the subtype relationship between the static type and the dynamic type of a variable is no longer guaranteed. Likewise, the caller and the callee of a method may no longer be compatible.

Section 4 describes how binary incompatible changes are handled.

3. Implementation

The DCE VM is implemented as a modification of the Java HotSpot VM. The HotSpot VM is a high-performance VM with an interpreter and two just-in-time compilers (the client compiler and the server compiler). Building on the existing mechanism that allows method bodies to be swapped, the DCE VM adds support for arbitrary changes to loaded types. We focus on implementing code evolution in the existing VM with as few changes as necessary. The changes touch the garbage collector, the system dictionary, and the class metadata. In particular, we make no changes to the interpreter or the just-in-time compilers, so normal execution in the VM is unaffected.

Figure 2 gives an overview of the VM modifications. Code evolution is triggered by a Java Debug Wire Protocol (JDWP) command. First, the algorithm collects all affected classes and sorts them according to their subclass relationships. Then the new classes are loaded and added to the VM, where they are kept separate from the old classes. A modified full garbage collection then swaps in the new versions, which is the actual code evolution step. After state that has become invalid has been processed, the VM continues executing the program.

3.1 Class redefinition command

We use the JDWP class redefinition command to trigger DCE. JDWP is the standard interface between a VM and a Java debugger, so the modified VM can be used over JDWP right away. In other words, common debugging tools that speak JDWP, such as the debuggers of NetBeans or Eclipse, can be used to debug code and to trigger class redefinition.

The class redefinition command requires that all classes to be redefined have already been loaded by the VM. If a class has not been loaded yet, no redefinition is needed: the new version can simply be loaded as the initial version. For each class, the command carries a number that uniquely identifies the class and an array with the class bytecode. The modified VM fully implements the class redefinition command as defined by the JDWP specification and needs no additional information to perform code evolution.

The first steps of class redefinition (described in the next three sections) can run in parallel with the program. All Java threads have to be paused only for the garbage collection that follows. We use the same safepoint mechanism as the GC to suspend all active threads.

(Translator's note: redefinition can be thought of as a GC-like process, except that it only processes the reference relationships of objects.)

3.2 Finding affected types

A change that goes beyond a method body indirectly affects related classes, such as subclasses. A field added to a class is added to all of its subclasses. An added method may change the virtual method tables of the subclasses.

The algorithm therefore has to expand the set of redefined classes by adding the affected subclasses. Figure 3 gives an example.

Classes A and C are redefined, and B is a subclass of A. Class B therefore has to be added to the set of redefined classes and is replaced by B'. B' is the same as B, but it may have different properties because of the changes to A. The metadata of B (including its virtual method table) has to be re-initialized, which is done by reloading B.

The same rule applies to interface redefinition.
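The expansion described above is a transitive closure over the subtype relation. The following is only a minimal sketch under stated assumptions, not the VM's implementation: the class `AffectedTypes`, the method `affected`, and the map of direct subtypes are hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the DCE VM: expands the set of redefined
// types with every direct and indirect subtype.
class AffectedTypes {

    // subtypes maps a type name to its direct subclasses or implementors.
    static Set<String> affected(Set<String> redefined, Map<String, List<String>> subtypes) {
        Set<String> result = new LinkedHashSet<>(redefined);
        Deque<String> work = new ArrayDeque<>(redefined);
        while (!work.isEmpty()) {
            String type = work.pop();
            for (String sub : subtypes.getOrDefault(type, List.of())) {
                if (result.add(sub)) { // first time this subtype is seen
                    work.push(sub);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Current hierarchy of the example: B extends A, C has no subclasses.
        Map<String, List<String>> subtypes = Map.of("A", List.of("B"));
        Set<String> redefined = new LinkedHashSet<>(List.of("A", "C"));
        System.out.println(affected(redefined, subtypes)); // prints [A, C, B]
    }
}
```

With A and C redefined and B a subclass of A, the result also contains B, which matches the example of Figure 3.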

3.3 Ordering the types

The class redefinition command does not specify an order for the classes to be redefined. From the user's point of view, the change to the classes should be atomic. Our algorithm sorts the classes topologically according to their subclass relationships: a class or interface has to be redefined before its subtypes. A new class version may be incompatible with the old version of its superclass, so a subclass can be loaded correctly only after its superclass has been replaced by the new version.

To respect the inheritance relationships, the ordering has to be based on the relationships in the new code, not the current code. Information about the relationships of a class is normally available only after the VM has loaded it. We therefore parse the class files before loading the classes to obtain the new class relationships. In the example of Figure 3, C has to be redefined to C' before A is redefined to A', because A is a subclass of C in the new version.
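The ordering step can be sketched as a depth-first topological sort over the supertype relation of the new class versions. This is an illustration only: `RedefinitionOrder`, `order`, and the input map (standing in for what would be parsed from the new class files) are assumed names, not part of the DCE VM.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: orders the classes of one redefinition so that every
// class comes after its supertypes in the NEW hierarchy.
class RedefinitionOrder {

    // newSupertypes: class name -> direct supertypes in the new version.
    static List<String> order(Map<String, List<String>> newSupertypes) {
        List<String> sorted = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String type : newSupertypes.keySet()) {
            visit(type, newSupertypes, done, inProgress, sorted);
        }
        return sorted;
    }

    private static void visit(String type, Map<String, List<String>> newSupertypes,
                              Set<String> done, Set<String> inProgress, List<String> sorted) {
        if (done.contains(type) || !newSupertypes.containsKey(type)) {
            return; // already placed, or not part of this redefinition
        }
        if (!inProgress.add(type)) {
            throw new IllegalArgumentException("cyclic inheritance at " + type);
        }
        for (String sup : newSupertypes.get(type)) {
            visit(sup, newSupertypes, done, inProgress, sorted); // supertypes first
        }
        inProgress.remove(type);
        done.add(type);
        sorted.add(type);
    }

    public static void main(String[] args) {
        // New hierarchy of the example: A extends C, B extends A.
        Map<String, List<String>> newSupertypes = new LinkedHashMap<>();
        newSupertypes.put("A", List.of("C"));
        newSupertypes.put("B", List.of("A"));
        newSupertypes.put("C", List.of());
        System.out.println(order(newSupertypes)); // prints [C, A, B]
    }
}
```

Supertypes that are not part of the redefinition (for example `java.lang.Object`) are skipped, because they are already loaded and do not change.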

3.4 Building a side universe

The new and the old versions of a class coexist in the system, which is necessary for code that is still executing and is based on the old class. Instances of different class versions may also coexist in memory. Keeping both versions is the only way to resolve circular dependencies during code evolution, for example when B depends on A while A in turn depends on B. When new classes are added, we build a separate space (a side universe) for them so that the type universe stays consistent. The old version of a class therefore does not interfere with the loading and verification of the new versions of other classes.

The Java HotSpot VM maintains a system dictionary of classes, which is queried by class name and class loader. As soon as a new class has been loaded, we replace the dictionary entry of the old class with the new class. Redefining the classes in the order computed beforehand ensures that the side universe can be built correctly. In the example of Figure 3, a lookup of class C while A' is being loaded returns C', because C is redefined before A. If the name and signature of a static field match, the VM copies the value of the static field from the old class to the new class instead of running the class initializer. Figure 4 shows the state of the type universe after the side universe has been built.

We keep the different versions of a class in memory and connect them through a doubly linked list. This helps navigating through the versions during garbage collection. The system dictionary always refers to the latest version of a class.
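The bookkeeping described in this section can be pictured with a small model. It is a sketch under simplifying assumptions: the dictionary is keyed by class name only (HotSpot also uses the class loader), static fields are identified by a "name:signature" string, and `SideUniverse`, `ClassVersion`, `load`, and `lookup` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the side universe bookkeeping.
class SideUniverse {

    // One loaded version of a class; versions form a doubly linked list.
    static final class ClassVersion {
        final String name;
        final int revision;
        final Map<String, Object> statics = new HashMap<>(); // key: "name:signature"
        ClassVersion older;
        ClassVersion newer;

        ClassVersion(String name, int revision) {
            this.name = name;
            this.revision = revision;
        }
    }

    // Stand-in for the system dictionary: class name -> latest version.
    private final Map<String, ClassVersion> dictionary = new HashMap<>();

    ClassVersion lookup(String name) {
        return dictionary.get(name);
    }

    // Loads a new version and immediately makes it the dictionary entry.
    ClassVersion load(String name, Set<String> staticFields) {
        ClassVersion old = dictionary.get(name);
        ClassVersion fresh = new ClassVersion(name, old == null ? 0 : old.revision + 1);
        for (String key : staticFields) {
            // Copy the value if the old version has a static field with the same
            // name and signature; otherwise the field keeps its default (null here).
            fresh.statics.put(key, old == null ? null : old.statics.get(key));
        }
        if (old != null) {
            old.newer = fresh;
            fresh.older = old;
        }
        dictionary.put(name, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        SideUniverse universe = new SideUniverse();
        ClassVersion c0 = universe.load("C", Set.of("counter:I"));
        c0.statics.put("counter:I", 42);
        ClassVersion c1 = universe.load("C", Set.of("counter:I", "label:Ljava/lang/String;"));
        System.out.println(universe.lookup("C").revision);  // prints 1
        System.out.println(c1.statics.get("counter:I"));    // prints 42
        System.out.println(c1.older == c0);                 // prints true
    }
}
```

Because the dictionary entry is replaced at load time, any later lookup during the same redefinition already sees the new version, which is why the load order matters.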

3.5 Garbage collector adjustments

The core of the class redefinition algorithm is a modification of the mark-and-compact GC algorithm, which works in the following phases:

    1. Forward pointers: the algorithm computes a forward pointer for each live object. The forward pointer points to the memory address of the object after compaction.
    2. Adjust pointers: the collector then traverses the heap and updates all references. In the modified collector, pointers to an old class are redirected to the new class.
    3. Compaction: in the final phase, the objects are moved to their new addresses.

In the adjust pointers phase, references to an old class are changed to the forward pointer address of the new class. Instances are updated in a similar way in the compaction phase. Implementing code evolution inside the modified GC reuses existing code and keeps instance updates fast. It also means that code evolution needs no indirection and no additional data structures. The following two sections describe the two major changes.

3.5.1 Swapping pointers

After class C has been updated to C', all instances of C have to become instances of C'. Each object instance stores a reference to its class, but the Java HotSpot VM keeps no record of the instances of a class. We therefore have to traverse the heap to find all instances. Other references to the old class elsewhere in the system, for example from native code, have to be updated at the same time.

Figure 5 shows the GC and how pointers are changed. Assume that the heap initially contains an instance x of class A, and that the new version of class A is called A'. In the first step, the collector computes the address of every live object after compaction and installs a pointer to the new address in each object: the forward pointer. In the adjust pointers phase, the reference pointers and the class pointer of each object are changed to the new addresses. We intercept the operations of this phase to make sure that, after compaction, all pointers that pointed to the old class A point to the new class (x points to A').
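The three phases and the class pointer swap can be traced in a toy model. The sketch below makes several assumptions that are not in the source: the heap is a list of objects sorted by address, the mark phase has already run (its result is the `live` flag), live objects reference only live objects, and the names `CompactAndSwap`, `Obj`, and `collect` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of mark-and-compact with a class pointer swap; not HotSpot code.
class CompactAndSwap {

    static final class Obj {
        final String id;
        final int size;
        final boolean live;   // result of the mark phase, given as input here
        int address;          // start address in the simulated heap
        String klass;         // stands in for the class pointer in the object header
        int[] refs;           // addresses of the objects this object references
        int forward = -1;     // forward pointer: address after compaction

        Obj(String id, int address, int size, String klass, boolean live, int... refs) {
            this.id = id;
            this.address = address;
            this.size = size;
            this.klass = klass;
            this.live = live;
            this.refs = refs;
        }
    }

    // newVersionOf maps an old class to its replacement.
    static List<Obj> collect(List<Obj> heapByAddress, Map<String, String> newVersionOf) {
        // Phase 1: compute forward pointers for all live objects.
        Map<Integer, Integer> forwardOf = new HashMap<>();
        int cursor = 0;
        for (Obj o : heapByAddress) {
            if (!o.live) continue;
            o.forward = cursor;
            forwardOf.put(o.address, cursor);
            cursor += o.size;
        }
        // Phase 2: adjust pointers. The extra step swaps the class pointer.
        for (Obj o : heapByAddress) {
            if (!o.live) continue;
            for (int i = 0; i < o.refs.length; i++) {
                o.refs[i] = forwardOf.get(o.refs[i]);
            }
            o.klass = newVersionOf.getOrDefault(o.klass, o.klass);
        }
        // Phase 3: compaction. Move every live object to its forward address.
        List<Obj> compacted = new ArrayList<>();
        for (Obj o : heapByAddress) {
            if (!o.live) continue;
            o.address = o.forward;
            compacted.add(o);
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Obj> heap = List.of(
            new Obj("garbage", 0, 2, "A", false),
            new Obj("x", 2, 3, "A", true, 5),   // x references y at address 5
            new Obj("y", 5, 1, "B", true));
        for (Obj o : collect(heap, Map.of("A", "A'"))) {
            System.out.println(o.id + " @" + o.address + " class " + o.klass);
        }
        // prints:
        // x @0 class A'
        // y @3 class B
    }
}
```

The only addition to a plain mark-and-compact pass is the single line in phase 2 that replaces the class, which is why the swap costs little more than a full collection.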

3.5.2 Updating instances

To update an instance we need a policy for initializing the fields of the new instance. We use a simple algorithm: for fields with the same name and type, the value is copied from the old instance to the new instance, and all other fields are initialized to 0, null, or false.

This lets us update instances with efficient memory operations (copying memory or filling it with default values). The update information has to be computed only once per class and is stored temporarily in the class metadata. The modified GC reads this information and copies or clears memory for each instance. The algorithm is faster than custom conversion methods. On balance, we believe that developers prefer an approach that is less flexible but needs no additional input, because ease of use matters most during debugging.
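The copy-or-clear policy can be sketched as a per-class plan that is computed once and then applied to every instance. The model below is an assumption for illustration: instances are `Object[]` arrays instead of raw memory, field types are plain strings, and `InstanceUpdate`, `Field`, `plan`, and `convert` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the instance update policy; not HotSpot code.
class InstanceUpdate {

    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Computed once per class: for each field of the new layout, the index of
    // the matching field (same name and type) in the old layout, or -1.
    static int[] plan(List<Field> oldLayout, List<Field> newLayout) {
        int[] source = new int[newLayout.size()];
        for (int i = 0; i < source.length; i++) {
            source[i] = -1;
            Field target = newLayout.get(i);
            for (int j = 0; j < oldLayout.size(); j++) {
                Field candidate = oldLayout.get(j);
                if (candidate.name.equals(target.name) && candidate.type.equals(target.type)) {
                    source[i] = j;
                }
            }
        }
        return source;
    }

    static Object defaultValue(String type) {
        switch (type) {
            case "int": return 0;
            case "boolean": return false;
            default: return null; // reference types
        }
    }

    // Applied to every instance: copy matching fields, fill the rest with defaults.
    static Object[] convert(Object[] oldInstance, int[] plan, List<Field> newLayout) {
        Object[] fresh = new Object[plan.length];
        for (int i = 0; i < plan.length; i++) {
            fresh[i] = plan[i] >= 0 ? oldInstance[plan[i]] : defaultValue(newLayout.get(i).type);
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<Field> oldLayout = List.of(
            new Field("x", "int"), new Field("name", "String"), new Field("flag", "int"));
        List<Field> newLayout = List.of(
            new Field("name", "String"), new Field("x", "int"),
            new Field("flag", "boolean"), new Field("count", "int"));
        int[] plan = plan(oldLayout, newLayout);
        Object[] updated = convert(new Object[] {7, "p", 1}, plan, newLayout);
        System.out.println(Arrays.toString(updated)); // prints [p, 7, false, 0]
    }
}
```

The field `flag` keeps its name but changes its type, so it is treated as a new field and reset to its default value, in line with the rule that only fields with the same name and type are copied.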

Instances are updated in the compaction phase of the GC, so no additional memory is needed to keep the old and the new instances side by side. Once the new instance has been created and the field values have been copied, the old instance is destroyed immediately. (Translator's note: does a GC run after the classes are updated? This seems odd: is the modified GC run separately from the normal GC?)

The size of a new instance may differ from the old one (for example when fields are added to the class), so we adjusted the forward pointer algorithm in the garbage collector. In the compaction phase of the mark-and-compact GC, an instance is therefore no longer always moved to a lower memory address. When necessary it is rescued, that is, copied to a region at high memory addresses called the side buffer. Otherwise the collector would overwrite objects that have not been copied yet and destroy their field values (see the example below). After all instances have been copied or rescued, the data in the side buffer is used to initialize the new versions of the rescued instances. To reduce the number of objects that have to be copied to the side buffer, the forward pointer algorithm automatically places the objects that go through the side buffer at the highest addresses, so that the other objects can be compacted normally. (Translator's note: this is a bit difficult to understand. See the example.)

In Figure 6, the size of x increases, so at its target address x would overwrite objects that have not been copied yet, such as y, and destroy the field values of y. (Translator's note: I am not sure this is the right way to put it.) The modified forward pointer algorithm detects that x is an instance that has to be rescued and places it at the highest addresses. This leaves the space for y and z untouched. Placing the rescued object there significantly reduces the number of objects that need to be rescued (y and z do not have to go through the side buffer), and therefore the space the side buffer needs. In the compaction phase, x is copied to the side buffer while y and z are processed normally. Finally, the new instance of x is constructed from the copy of x in the side buffer.
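The effect of the adjusted forward pointer computation on the example can be traced with a deliberately simplified rule. This rule is an assumption of this sketch, not necessarily the exact condition used in the VM: an object is rescued when writing its new version at the current compaction cursor would run into the next live object that has not been copied yet, and rescued objects receive their final addresses after all other objects. `SideBufferForwarding`, `Obj`, and `forward` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of forward pointer computation with a side buffer.
class SideBufferForwarding {

    static final class Obj {
        final String id;
        final int oldAddress;
        final int oldSize;
        final int newSize;

        Obj(String id, int oldAddress, int oldSize, int newSize) {
            this.id = id;
            this.oldAddress = oldAddress;
            this.oldSize = oldSize;
            this.newSize = newSize;
        }
    }

    // Returns id -> address after compaction. The ids of objects that have to
    // pass through the side buffer are appended to rescued.
    static Map<String, Integer> forward(List<Obj> liveByAddress, List<String> rescued) {
        Map<String, Integer> forward = new LinkedHashMap<>();
        List<Obj> deferred = new ArrayList<>();
        int cursor = 0;
        for (int i = 0; i < liveByAddress.size(); i++) {
            Obj o = liveByAddress.get(i);
            boolean hasNext = i + 1 < liveByAddress.size();
            int nextStart = hasNext ? liveByAddress.get(i + 1).oldAddress : Integer.MAX_VALUE;
            if (cursor + o.newSize > nextStart) {
                // The new version would overwrite an object that is not copied yet.
                deferred.add(o);
                rescued.add(o.id);
            } else {
                forward.put(o.id, cursor);
                cursor += o.newSize;
            }
        }
        // Rescued objects are placed above all normally compacted objects.
        for (Obj o : deferred) {
            forward.put(o.id, cursor);
            cursor += o.newSize;
        }
        return forward;
    }

    public static void main(String[] args) {
        // x grows from 2 to 3 words; y and z keep their size.
        List<Obj> heap = List.of(
            new Obj("x", 0, 2, 3), new Obj("y", 2, 2, 2), new Obj("z", 4, 2, 2));
        List<String> rescued = new ArrayList<>();
        System.out.println(forward(heap, rescued)); // prints {y=0, z=2, x=4}
        System.out.println(rescued);                // prints [x]
    }
}
```

Only x passes through the side buffer. If x kept its place at address 0 instead, y would have to be rescued to avoid being overwritten, which illustrates why moving the grown object to the top keeps the side buffer small.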

3.6 State invalidation

The changes made by code evolution violate several invariants of the VM. The Java HotSpot VM was not designed with code evolution in mind and makes many assumptions that things never change, for example that field offsets stay the same. Once these assumptions are broken, several subsystems of the VM have to be adjusted to avoid errors. This section gives an overview.

3.6.1 Compiled code

Machine code that the just-in-time compilers generated before the code evolution step has to be checked for validity. The most obvious pieces of potentially invalid information are virtual method table indices and field offsets. In addition, assumptions about the class hierarchy (such as whether a class is a leaf class) and about calls (such as whether a call can be bound statically) become invalid.

The Java HotSpot VM has a built-in mechanism called deoptimization that discards the optimized machine code of a method. If there is an activation of the method on the stack, its stack frame is converted to an interpreter frame and execution continues in the interpreter. The VM also redirects the entry point of the method so that the machine code is no longer executed and calls enter the interpreter instead. We deoptimize all compiled methods in this way to make sure that no machine code generated under wrong assumptions keeps running.

3.6.2 Constant pool cache

The Java HotSpot VM maintains a constant pool cache for each class. Compared with fetching constants from the constant pool of the Java class file every time, this significantly speeds up the interpreter. The original entries contain only symbolic references to fields, methods, and classes. The cached entries contain direct references to metadata objects. The entries relevant to code evolution are field entries (the offset of a field is cached) and method entries (for a statically bound call a pointer to the method meta object is cached, for a dynamically bound call the virtual method table index). We iterate over the constant pool cache entries and clear the entries affected by code evolution (for example, those that refer to members of redefined classes). When the interpreter reaches a cleared entry, the entry is resolved again. A class lookup in the system dictionary automatically returns the new class, so the entry is re-initialized with the correct field offset or method information.
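The clear-and-resolve-again behavior can be modeled with a small cache in front of a dictionary. Everything in this sketch is an assumption made for illustration: only field entries are modeled, an offset is simply the index of the field in the layout of the latest class version, and `ConstantPoolCacheSketch`, `fieldOffset`, and `redefine` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of constant pool cache invalidation; not HotSpot code.
class ConstantPoolCacheSketch {

    // Stand-in for the system dictionary: class name -> field names of the
    // latest class version, in layout order.
    final Map<String, List<String>> dictionary = new HashMap<>();
    // Cache: symbolic reference "Class.field" -> resolved field offset.
    final Map<String, Integer> cache = new HashMap<>();
    int resolutions = 0;

    int fieldOffset(String className, String field) {
        String symbol = className + "." + field;
        Integer cached = cache.get(symbol);
        if (cached != null) {
            return cached; // fast path: entry already resolved
        }
        resolutions++;
        int offset = dictionary.getOrDefault(className, List.of()).indexOf(field);
        if (offset < 0) {
            throw new NoSuchFieldError(symbol); // the field was removed
        }
        cache.put(symbol, offset);
        return offset;
    }

    // Installs a new class version and clears the cache entries that refer to it.
    void redefine(String className, List<String> newLayout) {
        dictionary.put(className, newLayout);
        cache.keySet().removeIf(symbol -> symbol.startsWith(className + "."));
    }

    public static void main(String[] args) {
        ConstantPoolCacheSketch vm = new ConstantPoolCacheSketch();
        vm.redefine("Point", List.of("x", "y"));
        System.out.println(vm.fieldOffset("Point", "y")); // prints 1
        System.out.println(vm.fieldOffset("Point", "y")); // prints 1, served from the cache
        vm.redefine("Point", List.of("z", "x", "y"));     // a field is added in front
        System.out.println(vm.fieldOffset("Point", "y")); // prints 2, resolved again
        System.out.println(vm.resolutions);               // prints 2
    }
}
```

Resolving a cleared entry against a class version that no longer has the member fails with an error in this model, which corresponds to the treatment of deleted fields and methods in Section 4.1.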

3.6.3 Class pointer values

Some data structures in the Java HotSpot VM depend on the actual addresses of the class metadata objects, for example a hash table that maps classes to JDWP objects. (Translator's note: I am not sure what this example means.) We make sure that these data structures are re-initialized after code evolution. Class objects may also be moved during a GC run, and pointer swapping may change the relative order of two class objects. The just-in-time compiler uses a binary search array for compiler interface objects that depends on the order of the class objects and therefore must be resorted after a code evolution step.

4. Binary incompatible changes

A change is binary incompatible if it breaks old code. This section describes how we deal with the two kinds of binary incompatible change introduced in Section 2. Section 7 discusses other solutions that we plan to use in the future.

4.1 Deleted fields and methods

When a method body is changed or a field or method is added, old code can keep executing; it simply never calls the new methods or accesses the new fields. When a method or field is deleted, however, old code that is still active on the stack may call the deleted method or access the deleted field, so the old code is no longer valid. Figure 7 shows an example of this situation:

 

The program is paused (for the modified GC) just before the call to bar. The method foo is redefined to foo', and the method bar is deleted. The next call to foo goes to the new code, but the activation of foo that is already running executes old code and calls the deleted method bar. (Translator's note: in a development environment this problem does not show up on every update, because it is hard to control whether the program is paused before the call to bar or before the call to foo when the class is redefined.)

The new method foo' is correct because it does not call bar. Could the state be transferred more intelligently, for example by converting the stack values and the bytecode position of the running method to the corresponding stack values and bytecode position in the new method? This is possible, but in general such conversions do not match what users intuitively expect.

Our current approach is to let the old method continue in the interpreter. When execution reaches the call to bar, the reference to bar has to be resolved, because the constant pool cache entry was cleared during redefinition (see Section 3.6). Resolution does not find the method and a NoSuchMethodError is thrown. (Translator's note: this is easy to understand. It is the same as what happens when a class you depend on is changed, except that here the method is deleted from that class at run time.)

4.2 Type narrowing

When interfaces or superclasses are added to a class, old code keeps running normally. It never uses instances of the class as instances of the added interfaces or supertypes and simply executes as before. Conversely, when the set of interfaces or superclasses of a class is reduced, old code may no longer be valid. Figure 8 shows an example of this situation.

 

After the redefinition, class B no longer inherits from A, so an instance of B can no longer be treated as an instance of A. As in the example of Figure 8, there may be a variable a of type A that refers to an instance of B. After code evolution such variables become invalid because B is no longer a subclass of A, and a call such as a.foo() has no meaning.

The current DCE VM performs the code evolution correctly, but the call to foo causes the VM to stop. We consider this acceptable in a debugging environment and discuss other possible solutions in Section 7.

5. Evaluation

This section evaluates our implementation in three respects:

    1. We discuss which levels of code evolution we support.
    2. We show that the modified VM has the same performance as the unmodified VM.
    3. We use micro benchmarks to discuss the performance characteristics of the modified GC.

5.1 Functional evaluation

Our solution supports arbitrary changes to classes. When a change is binary compatible, the code executes as expected. A binary incompatible change may lead to an exception (a method or field does not exist) or to a VM stop (a type mismatch), depending on the run-time state of the program. These problems do not occur often, however: adding methods or fields is usually more common than deleting them, and even when a method is deleted, it is unlikely to be active when the class is redefined. Except when supertypes (superclasses or interfaces) are removed, Java programs (mainly old code) stay semantically compatible as they continue to run. Table 1 lists the supported kinds of change discussed in Section 2:

 

During debugging, a code update may occasionally run into the problems above, but this is still much better than restarting the program for every change. In the worst case the developer has to restart the program, which is what would be required anyway without code evolution. Because the Java specifications were not designed with code evolution in mind, it is hard to define clear behavior when such a problem occurs. We therefore believe that throwing an exception or terminating the VM is acceptable and less confusing than hiding the problem.

Since Java 1.4, the JPDA has defined class redefinition commands. A VM uses three flags to tell the debugger about its code evolution capabilities: canRedefineClasses (classes can be redefined), canAddMethod (methods can be added to classes), and canUnrestrictedlyRedefineClasses (classes can be changed arbitrarily). To our knowledge, the DCE VM is the first VM that returns true for all three flags. There is a large gap between being able to add methods and being able to redefine classes without restriction. We plan to define a finer-grained set of code evolution levels based on the implementation complexity discussed in Section 2.
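A debugger can query these three flags through the Java Debug Interface (JDI), where they appear as methods of `com.sun.jdi.VirtualMachine`. The sketch below only interprets the flags. The class `EvolutionLevel`, the method `describe`, and the wording of the returned strings are assumptions for illustration, and obtaining the `VirtualMachine` object (by attaching to a target VM through a JDI connector) is left out.

```java
import com.sun.jdi.VirtualMachine;

// Hypothetical helper: summarizes the class redefinition capabilities that a
// target VM reports to the debugger.
class EvolutionLevel {

    static String describe(VirtualMachine vm) {
        if (!vm.canRedefineClasses()) {
            return "no class redefinition";
        }
        if (vm.canUnrestrictedlyRedefineClasses()) {
            return "arbitrary class changes";
        }
        if (vm.canAddMethod()) {
            return "method bodies and added methods";
        }
        return "method bodies only";
    }
}
```

According to the text above, a DCE VM would fall into the "arbitrary class changes" case, while a VM that only swaps method bodies would fall into the last case. Recent JDK releases may report deprecation warnings for two of these methods when the sketch is compiled.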

Application domains and development habits differ, so it is hard to measure how code evolution is used in practice (for example, the ratio between the different kinds of change). Gustavsson [20] presents a case study of the changes between versions of a web server project. Of these changes, 37% modified only method bodies, 16% added or removed methods, 33% were arbitrary code changes, and 14% fell into other categories (for example, changes to code that never becomes inactive, or changes that require modifications outside the VM). In this example, we raise the share of changes that can be applied without a restart from 37% to 86% (37% + 16% + 33%).

5.2 Effects on normal execution

5.3 Micro benchmarks

6. Related work

6.1 General discussions

6.2 Procedural languages

6.3 Object-oriented languages

6.4 Java

7. Future work

The current implementation of the DCE VM targets debugging. We plan to extend it to updating running services. Compared with debugging, service updates have stricter safety requirements, and more suitable update points are needed. We want to extend the class redefinition command into a safe update request .... TODO

8. Conclusions

 

References:

    • Dynamic Code Evolution for the Java HotSpot(TM) Virtual Machine (version 1)
    • Dynamic Code Evolution for Java (current version of the original paper)
