Do you really know the life history of a Java program?

Source: Internet
Author: User

As programmers, we write code every day, but do you really know its life cycle? Today let's walk through the life of a piece of Java code, from birth to "game over", in four steps: compile, class loading, run, and GC.

Compile

"Compile time" in Java is actually an ambiguous term. It could mean a front-end compiler turning a .java file into a .class file; it could mean the JVM's back-end runtime compiler (the JIT compiler) turning bytecode into machine code; or it could mean a static ahead-of-time compiler (AOT compiler) compiling a .java file directly into native machine code. Here we are talking about the first kind, which also matches the everyday understanding of compiling. So what happens during this phase?

Lexical and grammatical analysis

Lexical analysis converts the character stream of the source code into a set of tokens. Syntax analysis then builds an abstract syntax tree (AST) from the token sequence. The AST is a tree representation of the syntactic structure of the program code; each node represents one syntactic construct, such as a package, type, modifier, operator, interface, return value, or even a code comment.
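To make the lexical analysis step concrete, here is a minimal sketch of a tokenizer. It is not how javac is implemented; the class name TinyLexer and the three token kinds are assumptions made up for illustration, and keywords such as int are simply reported as identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyLexer {
    // Hypothetical token kinds for this sketch: numbers, identifiers, one-character operators.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<num>\\d+)|(?<id>[A-Za-z_]\\w*)|(?<op>[-+*/=;()]))");

    /** Turns a character stream into a flat list of tokens such as ID(x) or NUM(42). */
    static List<String> tokenize(String source) {
        String text = source.trim();
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        int position = 0;
        while (position < text.length()) {
            // Try to match exactly one token starting at the current position.
            matcher.region(position, text.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + position);
            }
            if (matcher.group("num") != null) {
                tokens.add("NUM(" + matcher.group("num") + ")");
            } else if (matcher.group("id") != null) {
                tokens.add("ID(" + matcher.group("id") + ")");
            } else {
                tokens.add("OP(" + matcher.group("op") + ")");
            }
            position = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID(int), ID(x), OP(=), NUM(42), OP(;)]
        System.out.println(tokenize("int x = 42;"));
    }
}
```

A real parser would then consume this token list to build the AST nodes described above.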

Fill symbol table

After lexical and syntax analysis, the next step is to fill the symbol table. The information registered in it is used at various later stages of compilation. Let's expand on the concept a little. What is a symbol table? It is a table of symbol addresses and symbol information; in the simplest terms, think of it as the key-value pairs of a hash table. Why use one? One of the earliest applications of symbol tables was organizing information about program code. At first, computer programs were just strings of numbers, but programmers soon found it much easier to use symbols to represent operations and memory addresses (variable names). Associating a name with a number requires a symbol table. As programs grew, the performance of symbol table operations became a bottleneck for development efficiency, so many data structures and algorithms were invented to make symbol tables faster.

Which data structures and algorithms? In broad terms: sequential search in an unordered linked list, binary search in an ordered array, binary search trees, balanced search trees (mainly red-black trees here), and hash tables (based on separate chaining or on linear probing). In Java, java.util.TreeMap and java.util.HashMap are implemented on a red-black tree and on a hash table with separate chaining, respectively. We won't go further into symbol tables here; interested readers can look up the relevant material.
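As a small illustration of the two library classes just mentioned, the sketch below uses each as a toy symbol table that maps a name to a declared type. The symbols and the class name SymbolTableDemo are made up for this example; a compiler's real symbol table stores far more than a type string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class SymbolTableDemo {
    /** Registers a few hypothetical symbols (name -> declared type) in the given table. */
    static Map<String, String> fill(Map<String, String> table) {
        table.put("total", "int");
        table.put("name", "java.lang.String");
        table.put("average", "double");
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> hashed = fill(new HashMap<>()); // hash table: no ordering guarantee
        Map<String, String> sorted = fill(new TreeMap<>()); // red-black tree: keys kept sorted

        System.out.println(hashed.get("total")); // prints int
        System.out.println(sorted.keySet());     // prints [average, name, total]
    }
}
```

The trade-off is the usual one: the hash table gives fast lookups on average, while the tree keeps the keys ordered at the cost of logarithmic operations.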

Semantic analysis

After the previous two steps, we have an abstract syntax tree representation of the program code. The syntax tree can represent a structurally correct source program, but it cannot guarantee that the program is logically sound. This is where semantic analysis comes in: its main task is to check the context-sensitive properties of a structurally correct source program. The semantic analysis phase consists of attribute checking, data-flow and control-flow analysis, and desugaring. Let's look at syntactic sugar in particular. Syntactic sugar is syntax added to a programming language that has no effect on what the language can do but makes it more convenient for programmers to use. The most common syntactic sugars in Java are generics, varargs, autoboxing/unboxing, and the for-each loop. The JVM does not support these constructs at run time; during compilation they are reduced to simpler basic constructs, a process known as desugaring. Take generic type erasure as an example: after compilation, List<Integer> and List<String> are both erased to the same raw type, List.
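The effect of desugaring can be observed from ordinary Java code. The following sketch (the class name SugarDemo is an assumption for illustration) shows that two differently parameterized lists share one runtime class, and marks in comments roughly what the compiler turns varargs, the for-each loop, and unboxing into.

```java
import java.util.ArrayList;
import java.util.List;

final class SugarDemo {
    /** After erasure both lists have the same runtime class, java.util.ArrayList. */
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    /** Varargs: the compiler treats this as sum(int[] values). */
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {   // for-each over an array becomes an indexed loop
            total += v;
        }
        return total;
    }

    /** For-each over a List uses its Iterator; "total += v" unboxes via v.intValue(). */
    static int boxedSum(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());          // prints true
        System.out.println(sum(1, 2, 3));                // prints 6
        System.out.println(boxedSum(List.of(1, 2, 3)));  // prints 6
    }
}
```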

Bytecode generation

Bytecode generation is the last phase of the javac compilation process. In this phase the information produced by the previous steps is translated into bytecode and written to disk, and a small amount of code is added and transformed along the way. Examples are the instance constructor <init>() method and the class constructor <clinit>() method. (The instance constructor here does not mean the default constructor: if the user code provides no constructor at all, the compiler adds a default constructor with the same accessibility as the current class, and that work is done in the symbol-table-filling phase. The class constructor <clinit>() is produced by the compiler automatically collecting all class-variable assignments and the statements in static blocks and merging them.) These methods are added to the syntax tree at this stage. This concludes the entire compilation process.
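The split between <clinit>() and <init>() can be seen in the order in which initializers run. In this sketch, InitOrderDemo, Widget, and the log helper are hypothetical names; the comments say which generated method each piece of source ends up in.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrderDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static int log(String event) {
        EVENTS.add(event);
        return 0;
    }

    static class Widget {
        static int count = log("static field");  // collected into <clinit>()
        static { log("static block"); }          // collected into <clinit>()

        int size = log("instance field");        // collected into <init>()
        { log("instance block"); }               // collected into <init>()

        Widget() {
            log("constructor body");             // the rest of <init>()
        }
    }

    /** Creates one Widget and returns a snapshot of everything logged so far. */
    static List<String> createOne() {
        new Widget();
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        // prints [static field, static block, instance field, instance block, constructor body]
        System.out.println(createOne());
    }
}
```

Creating a second Widget would log only the three instance-level events again, because <clinit>() runs once per class while <init>() runs once per object.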

Class loading

After compiling the program into bytecode, the next step is the process of loading the class into memory.

Class loading stores class data in the method area of the virtual machine's memory, so it touches on how that memory is organized. Let's first briefly introduce the runtime memory areas. The virtual machine's memory is divided into: the program counter, the virtual machine stack, the native method stack, the heap, the method area (part of which is the runtime constant pool), and direct memory.

Program counter

A program counter is a small amount of memory space, which can be seen as the line number indicator of the bytecode executed by the current thread. In the JVM conceptual model, the bytecode interpreter works by changing the value of this counter to select the next byte-code instruction to execute.

Stack

The stack stores information such as the local variable table, the operand stack, dynamic links, and method exits. The local variable table holds the primitive values and object references that are known at compile time. Like the program counter, the stack is thread-private.
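To show how a program counter, a local variable table, and an operand stack cooperate, here is a toy interpreter. It is only a sketch: the opcodes are hypothetical, loosely modeled on JVM instructions such as iload, istore, and iadd, and the real bytecode format is different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyInterpreter {
    // Hypothetical opcodes for this sketch, not real JVM opcode values.
    static final int PUSH = 0, LOAD = 1, STORE = 2, ADD = 3, MUL = 4, RETURN = 5;

    /** Interprets one "method": code is the instruction stream, localsCount the table size. */
    static int execute(int[] code, int localsCount) {
        int[] locals = new int[localsCount];           // local variable table
        Deque<Integer> operands = new ArrayDeque<>();  // operand stack
        int pc = 0;                                    // program counter
        while (true) {
            int opcode = code[pc++];                   // fetch, then advance the counter
            switch (opcode) {
                case PUSH:   operands.push(code[pc++]); break;
                case LOAD:   operands.push(locals[code[pc++]]); break;
                case STORE:  locals[code[pc++]] = operands.pop(); break;
                case ADD:    operands.push(operands.pop() + operands.pop()); break;
                case MUL:    operands.push(operands.pop() * operands.pop()); break;
                case RETURN: return operands.pop();
                default: throw new IllegalStateException("Unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Equivalent of: int a = 2; int b = 3; return (a + b) * 4;
        int[] program = {
            PUSH, 2, STORE, 0,
            PUSH, 3, STORE, 1,
            LOAD, 0, LOAD, 1, ADD,
            PUSH, 4, MUL,
            RETURN
        };
        System.out.println(execute(program, 2)); // prints 20
    }
}
```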

Native method stack

The native method stack is similar to the virtual machine stack described above. The difference is that the virtual machine stack serves the Java methods (bytecode) that the virtual machine executes, while the native method stack serves the native methods the virtual machine uses. Some virtual machines even merge the two into one.

Heap

The heap is the largest piece of memory managed by the JVM. It is shared by all threads, and its sole purpose is to hold object instances: almost all object instances are allocated here (with exceptions such as the special Class objects, which are allocated in the method area). The heap is also the main area managed by the garbage collector. From a memory-reclamation perspective, collectors today generally use generational collection (detailed later), so the Java heap can be subdivided into the young generation and the old generation, and the young generation further into Eden space, From Survivor space, and To Survivor space. For allocation efficiency, the heap may also be carved into multiple thread-private allocation buffers (TLABs). However it is divided, every region still stores object instances; the subdivisions exist only to reclaim and allocate memory more efficiently.

Method area

The method area, like the heap, is a memory area shared by all threads. It stores data for classes that the virtual machine has loaded, such as class information, constants, static variables, and code compiled by the just-in-time compiler. The runtime constant pool is part of the method area; it mainly holds the literals and symbolic references generated at compile time.

Direct Memory

Direct memory is not part of the virtual machine's runtime data areas, nor is it a memory area defined in the Java Virtual Machine specification. You can simply think of it as off-heap memory: it is not limited by the Java heap size, but it is still limited by the total memory of the machine.
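The article does not say how direct memory is obtained. One common way, assumed here for illustration, is an NIO direct buffer from java.nio.ByteBuffer.allocateDirect; the class name DirectMemoryDemo is made up.

```java
import java.nio.ByteBuffer;

final class DirectMemoryDemo {
    /** Allocates the buffer's storage outside the Java heap. */
    static ByteBuffer offHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    /** Allocates the buffer's storage as an ordinary byte[] on the Java heap. */
    static ByteBuffer onHeap(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = offHeap(1024);
        ByteBuffer heap = onHeap(1024);
        direct.putInt(0, 42);                    // read and write work the same either way
        System.out.println(direct.getInt(0));    // prints 42
        System.out.println(direct.isDirect());   // prints true
        System.out.println(heap.isDirect());     // prints false
    }
}
```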

With the memory areas covered, let's get back to the point. What does class loading involve? Five steps: loading, verification, preparation, resolution, and initialization. Loading, verification, preparation, and initialization begin in that order; resolution does not have to, and in some cases it may start after initialization.

Loading

During the loading phase, the JVM needs to do three things. First, obtain the binary byte stream that defines the class from its fully qualified name. Second, convert the static storage structure represented by that byte stream into the runtime data structures of the method area. Finally, generate a java.lang.Class object in memory that represents the class and serves as the access point to the class's data in the method area. The first step does not say that the byte stream must come from a *.class file. This flexibility lets us read it from a ZIP package (the basis of the JAR, EAR, and WAR formats), fetch it from the network (applets), generate it at run time (dynamic proxies), generate it from other files (classes generated from JSP files), or read it from a database.
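From application code, the most visible part of loading is going from a fully qualified name to a java.lang.Class object. The sketch below uses the standard Class.forName(name, initialize, loader) overload with initialize set to false, so the class is loaded without being initialized; the class name LoadingDemo and the wrapping into an unchecked exception are choices made for this example.

```java
final class LoadingDemo {
    /** Loads a class by its fully qualified name without running its static initializers. */
    static Class<?> load(String fullyQualifiedName) {
        try {
            return Class.forName(fullyQualifiedName, false, LoadingDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + fullyQualifiedName, e);
        }
    }

    public static void main(String[] args) {
        Class<?> listClass = load("java.util.ArrayList");
        // The same Class object is the single access point to the class's data.
        System.out.println(listClass == java.util.ArrayList.class); // prints true
        // Core classes are loaded by the bootstrap loader, which is reported as null.
        System.out.println(listClass.getClassLoader());             // prints null
        System.out.println(LoadingDemo.class.getClassLoader());     // an application-level loader
    }
}
```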

Verification

Verification, as the name implies, ensures that the information in the class file's byte stream meets the requirements of the JVM. It is needed because a class file does not necessarily come from a compiler; it is entirely possible to write one directly with a hex editor. The verification process consists of file format verification, metadata verification, and bytecode verification; we won't go into the specific checks here.

Preparation

The preparation phase formally allocates memory for class variables and sets their initial values (normally the zero value of the type). The memory used by these variables is allocated in the method area.
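The difference between the initial value set during preparation and the value assigned later by the program can be observed with a small trick: read a class variable before its own initializer has run. Config, early, and peek are hypothetical names for this sketch.

```java
final class PreparationDemo {
    static class Config {
        static int early = peek();  // runs first in <clinit>(), before value is assigned
        static int value = 123;     // assigned afterwards, also in <clinit>()

        static int peek() {
            return value;           // still sees the zero value set during preparation
        }
    }

    public static void main(String[] args) {
        System.out.println(Config.early); // prints 0
        System.out.println(Config.value); // prints 123
    }
}
```

The 0 seen by peek() is the value the variable received when its memory was prepared; 123 only arrives when the class is initialized.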

Resolution

The resolution phase is the process by which the JVM replaces symbolic references in the constant pool with direct references (a pointer to the target, a relative offset, or a handle). This is where the symbol table filled in during compilation, which we talked about earlier, pays off. Resolution applies to classes or interfaces, fields, class methods, and interface methods.

Initialization

Class initialization is the last step of the class loading process. In the preparation phase the variables were already given their initial values; in this step, class variables and other resources are initialized according to what the programmer wrote. Concretely, this stage executes the <clinit>() method mentioned earlier in the bytecode generation section. The virtual machine also guarantees that in a multi-threaded environment this method is properly locked and synchronized: if several threads try to initialize a class at the same time, only one thread executes <clinit>() and the others block and wait. The author previously wrote an article, "From a simple Java singleton to concurrency", in which the thread-safe singleton based on class initialization relies on exactly this; interested readers can read the two together. This brings up another point we care about: when does Java trigger the initialization of a class?

    • When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, its initialization is triggered. Don't be put off by the instruction names: put simply, this means creating an object with new, reading or setting a static field of a class, or calling a static method of a class.
    • When a method of the java.lang.reflect package is used to make a reflective call on a class that has not been initialized, its initialization is triggered.
    • When a class is initialized and its parent class has not yet been initialized, the parent class is initialized first.
    • When the virtual machine starts, the user specifies a main class to execute (the class containing the main() method), and the virtual machine initializes that main class first.
    • When using the dynamic language support of JDK 1.7 and later, if the final resolution result of a java.lang.invoke.MethodHandle instance is a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle and the class corresponding to that handle has not been initialized, its initialization is triggered.
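
Two of the triggers above are easy to observe: reading a static field (getstatic) initializes the class, and the parent class is initialized before the child. The sketch below also shows that merely mentioning a class literal does not count as a trigger. Parent, Child, and LOG are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class InitTriggerDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static int counter = 7;
        static { LOG.add("Child"); }
    }

    /** Returns what had been initialized after each step. */
    static List<String> run() {
        List<String> report = new ArrayList<>();
        Class<?> literal = Child.class;   // a class literal alone does not initialize Child
        report.add("after class literal: " + LOG);
        int value = Child.counter;        // getstatic: initializes Parent, then Child
        report.add("after getstatic: " + LOG);
        return report;
    }

    public static void main(String[] args) {
        // On the first call this prints:
        // after class literal: []
        // after getstatic: [Parent, Child]
        run().forEach(System.out::println);
    }
}
```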

Run

After the two stages above, the program starts to run. Program execution involves computing all kinds of instructions, so how does the program actually execute them? This is where the back-end compiler (the JIT, or just-in-time, compiler) mentioned at the beginning comes in, working together with the interpreter in mixed mode (by default, the HotSpot virtual machine runs the interpreter together with one compiler). The bytecode execution engine is responsible for carrying out the program's computations. When executing Java code, it may interpret it (through the interpreter), compile it (generating native code through the just-in-time compiler), or do both.

The stack frame is the data structure that supports method invocation and execution in the virtual machine. The idea of computing by pushing onto and popping off a stack goes back to a classic algorithm, Dijkstra's two-stack algorithm for expression evaluation; we won't go deep into it here, and interested readers can look it up.

Run-time optimization is just as important at this stage. The JVM design team concentrated performance optimization in this stage so that class files not produced by javac can also benefit from compiler optimizations. Which optimization techniques are used? Some representative ones are common subexpression elimination, array bounds check elimination, method inlining, and escape analysis.
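For readers who want to see the stack-based evaluation idea in code, here is a minimal sketch. It assumes the algorithm meant above is Dijkstra's two-stack algorithm for arithmetic expressions, as presented in "Algorithms (4th edition)", and it assumes fully parenthesized input with tokens separated by spaces. It illustrates the idea only; it is not how the JVM executes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TwoStackEvaluator {
    /** Evaluates a fully parenthesized expression such as "( 1 + ( 2 * 3 ) )". */
    static double evaluate(String expression) {
        Deque<String> operators = new ArrayDeque<>();
        Deque<Double> values = new ArrayDeque<>();
        for (String token : expression.trim().split("\\s+")) {
            switch (token) {
                case "(":
                    break;                       // left parentheses are ignored
                case "+": case "-": case "*": case "/":
                    operators.push(token);       // operators wait on their own stack
                    break;
                case ")": {
                    // A right parenthesis applies the most recent operator
                    // to the two most recent values.
                    String op = operators.pop();
                    double right = values.pop();
                    double left = values.pop();
                    switch (op) {
                        case "+": values.push(left + right); break;
                        case "-": values.push(left - right); break;
                        case "*": values.push(left * right); break;
                        default:  values.push(left / right); break;
                    }
                    break;
                }
                default:
                    values.push(Double.parseDouble(token)); // anything else is a number
            }
        }
        return values.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("( 1 + ( ( 2 + 3 ) * ( 4 * 5 ) ) )")); // prints 101.0
    }
}
```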

GC

Finally, the program reaches the stage of death. How does the JVM decide that an object is dead? It uses reachability analysis. The basic idea is to take a set of objects called "GC Roots" as starting points and search downward from them; the paths traversed are called reference chains. When an object has no reference chain connecting it to any GC Root (in graph-theory terms, the object is unreachable from the GC Roots), the object is proven to be unusable and is judged to be collectible.

Now that we know which objects will be collected, when is garbage collection triggered? At safepoints, which are specific places where program execution can be paused so that GC can run. It follows that the length of GC pauses is the central concern of garbage collection. All garbage collection algorithms and the collectors derived from them are built around minimizing GC pauses, and the G1 collector can even build a predictable pause-time model and plan its work to avoid collecting the entire Java heap at once. When introducing the memory areas, we talked about the young generation and the old generation. Different garbage collectors may work on the young generation, on the old generation, or across the whole heap without being confined to one generation (such as the G1 collector). The following sections introduce the garbage collection algorithms and the corresponding collectors in detail.
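Reachability analysis is, at its core, a graph traversal. The sketch below models the heap as a map from an object name to the names it references, which is an assumption made purely for illustration, and computes everything reachable from the GC Roots.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachabilityDemo {
    /** Returns every object reachable from the roots by following reference chains. */
    static Set<String> reachable(Map<String, List<String>> references, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {   // first visit: follow its outgoing references
                pending.addAll(references.getOrDefault(current, List.of()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // A -> B is reachable from the root A; C and D only reference each other.
        Map<String, List<String>> heap = Map.of(
                "A", List.of("B"),
                "C", List.of("D"),
                "D", List.of("C"));
        Set<String> live = reachable(heap, List.of("A"));
        System.out.println(live.contains("B")); // prints true
        System.out.println(live.contains("C")); // prints false: the C/D cycle is collectible
    }
}
```

Note that C and D reference each other and are still judged collectible, because no chain leads to them from a root.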

Mark-sweep algorithm

This is the most basic collection algorithm. It has two phases, marking and sweeping: first mark all objects that need to be collected, and once marking is complete, reclaim all marked objects in one pass. Its main drawbacks are that it is inefficient and that it leaves behind a lot of discontiguous memory fragments. When the program later needs to allocate a large object, there may be enough free memory in the heap in total but no contiguous block large enough, so another GC has to be triggered early. The collector corresponding to this algorithm is the CMS collector.
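The fragmentation problem is easy to reproduce in a toy model. In this sketch the heap is an array of slots, each holding an object id or 0 for free; the model and the names are assumptions for illustration, not a real collector.

```java
import java.util.Set;

final class MarkSweepDemo {
    static final int FREE = 0;

    /** Sweep phase: frees every slot whose object was marked for collection. */
    static void sweep(int[] heap, Set<Integer> markedForCollection) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != FREE && markedForCollection.contains(heap[i])) {
                heap[i] = FREE;
            }
        }
    }

    static int freeSlots(int[] heap) {
        int count = 0;
        for (int slot : heap) {
            if (slot == FREE) count++;
        }
        return count;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        sweep(heap, Set.of(2, 4, 6, 8));           // heap is now 1 0 3 0 5 0 7 0
        System.out.println(freeSlots(heap));       // prints 4
        System.out.println(largestFreeRun(heap));  // prints 1
    }
}
```

Four slots are free, yet an object that needs two adjacent slots cannot be placed, which is exactly the situation described above.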

Copying algorithm

The copying algorithm was designed to solve the efficiency problem. It divides the available memory into two blocks of equal size and uses only one at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleaned up in one go. Each collection therefore works on only half of the memory, and memory fragmentation is not a problem. Most of today's commercial virtual machines use this algorithm to collect the young generation, and they do not need to split the memory 1:1. HotSpot, for example, defaults to a size ratio of 8:1 between Eden (one Eden space) and Survivor (two Survivor spaces). Eden and one Survivor space are in use at any time, so the usable space is 90% of the whole young generation. During a collection, the objects still alive in Eden and the in-use Survivor space are copied in one pass to the other Survivor space, and then Eden and the just-used Survivor space are cleaned up. Careful readers may have spotted a problem: what if the unused Survivor space is not large enough during copying? In that case the old generation is relied on as an allocation guarantee. If the guarantee succeeds, the surviving objects from Eden and the Survivor space are moved into the old generation; if it fails, a garbage collection of the old generation is triggered.

To expand on this: a collection of the young generation is called a minor GC. Because most Java objects die young, minor GCs are very frequent and usually fast. A collection of the old generation is called a major GC or full GC, and a major GC is generally much slower than a minor GC. From the analysis above we can infer that a major GC is often accompanied by a minor GC, though not always. So the goal of GC tuning is to reduce the frequency of major GCs as much as possible.

The collectors corresponding to this algorithm are the Serial collector, the ParNew collector (the multi-threaded version of the Serial collector, which works together with the old-generation collector CMS mentioned earlier), and the Parallel Scavenge collector.
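
The 8:1 ratio and the 90% figure follow from simple arithmetic, sketched below. The parameter survivorRatio mirrors the meaning of HotSpot's -XX:SurvivorRatio option (Eden size divided by the size of one Survivor space); the class and method names are made up, and a real JVM also aligns the region sizes, which this sketch ignores.

```java
final class YoungGenLayout {
    /** Returns {eden, survivor, survivor} sizes for a young generation of youngBytes. */
    static long[] split(long youngBytes, int survivorRatio) {
        // Young generation = Eden + 2 survivors = (survivorRatio + 2) survivor-sized parts.
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    /** Fraction of the young generation usable at any time: Eden plus one survivor. */
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1.0) / (survivorRatio + 2.0);
    }

    public static void main(String[] args) {
        long[] sizes = split(100L * 1024 * 1024, 8);       // a 100 MB young generation
        System.out.println(sizes[0] / (1024 * 1024));      // prints 80 (MB of Eden)
        System.out.println(sizes[1] / (1024 * 1024));      // prints 10 (MB per survivor)
        System.out.println(usableFraction(8));             // prints 0.9
    }
}
```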

Mark-compact algorithm

This algorithm is used to collect the old generation, which is not collected as frequently as the young generation and cannot afford the space that the copying algorithm wastes. The marking phase is the same as in mark-sweep, but the next step is not to reclaim the collectible objects directly. Instead, all surviving objects are moved toward one end, and the memory beyond the end boundary is then cleared directly. The collectors corresponding to this algorithm are the Serial Old collector and the Parallel Old collector.
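The "move survivors to one end, then clear beyond the boundary" step can be sketched with an array of slots, where each slot holds an object id or 0 for free. The model and names are assumptions for illustration; a real collector must also update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

final class MarkCompactDemo {
    /**
     * Slides surviving objects toward index 0 and clears everything beyond them.
     * Returns the boundary, that is, the index of the first free slot.
     */
    static int compact(int[] heap, Set<Integer> live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && live.contains(heap[i])) {
                heap[next++] = heap[i];            // move the survivor toward one end
            }
        }
        Arrays.fill(heap, next, heap.length, 0);   // clear the memory past the boundary
        return next;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 7, 8};
        int boundary = compact(heap, Set.of(1, 3, 5, 7));
        System.out.println(Arrays.toString(heap)); // prints [1, 3, 5, 7, 0, 0, 0, 0]
        System.out.println(boundary);              // prints 4
    }
}
```

After compaction the free space is one contiguous block, so allocation can continue from the boundary.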

Generational collection algorithm

Current commercial virtual machines all use this algorithm. The idea is the one we mentioned earlier when describing the heap: the heap is divided into the young generation and the old generation, and different regions use different collection algorithms. The young generation uses the copying algorithm; the old generation uses mark-compact or mark-sweep.

Review

After all that, you may have a rough idea of the life of a piece of Java code, or it may still feel fuzzy. Let's review the whole process with an example: what happens when we new an object? Putting the earlier pieces together, when the JVM encounters a new instruction, it first checks whether the instruction's operand can locate a symbolic reference to a class in the constant pool of the method area, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class loading process must be performed first. Once the class-loading check passes, the JVM allocates memory for the new object on the heap; the required size is fully known once class loading is complete. If heap memory is regular (used memory on one side, free memory on the other), allocation simply moves a pointer by a distance equal to the object size; this is called "bump the pointer". If heap memory is fragmented, the JVM maintains a list of the memory blocks that are available, allocates from it, and updates the list; this is called the "free list". Which method is used depends on whether the heap is regular, which in turn depends on the garbage collector mentioned earlier.

After the memory is allocated, the virtual machine performs the necessary initialization and then makes the necessary settings on the object, which are stored in the object header (class metadata information, the object's hash code, the object's GC generational age, and so on). From the virtual machine's point of view a new object now exists, but we are not finished. Next, the <init>() method is called to assign the object's fields as the programmer intended, and finally the reference on the stack is set to the object's memory address in the heap (a direct reference). Only then has a truly usable object been created. As for the later operations on the object and its eventual death, those are handled by the bytecode execution engine and the GC discussed earlier, which should no longer be unfamiliar.
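
The pointer-moving allocation scheme described above is simple enough to sketch in a few lines. BumpAllocator is a hypothetical class that hands out offsets in an imaginary memory region; it illustrates the idea only and ignores alignment, thread safety, and everything else a real JVM has to deal with.

```java
final class BumpAllocator {
    private final int capacity;
    private int top = 0;   // boundary between used memory (below) and free memory (above)

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new object, or -1 if not enough space is left. */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1;     // a real JVM would trigger a garbage collection at this point
        }
        int address = top;
        top += size;       // move the pointer by exactly the object size
        return address;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(32);
        System.out.println(heap.allocate(16)); // prints 0
        System.out.println(heap.allocate(8));  // prints 16
        System.out.println(heap.allocate(16)); // prints -1: only 8 units are left
    }
}
```

This only works while free memory stays contiguous, which is why collectors that compact or copy can allocate this way, while a fragmented heap needs the free-list approach.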

Reference

This article draws on "In-Depth Understanding of the Java Virtual Machine (2nd edition)" and "Algorithms (4th edition)"; interested readers can consult them.
