JVM Tuning Summary (1): Basic Garbage Collection Algorithms

Data types

In the Java virtual machine, data types fall into two categories: primitive types and reference types. A variable of a primitive type holds the value itself, that is, the value it represents is the number itself. A variable of a reference type holds a reference value. A reference value refers to an object rather than being the object; the object itself is stored at the address that the reference value represents.

Primitive types include: byte, short, int, long, char, float, double, boolean, returnAddress

Reference types include: class types, interface types, and arrays.

Heap and Stack

The heap and the stack are central to how a program runs, so it is worth making their relationship clear.

Figure 1 Heap and stack of programs

The stack is the unit of execution at run time, and the heap is the unit of storage.

The stack solves the problem of running the program, that is, how the program executes and how it processes data. The heap solves the problem of data storage, that is, how data is laid out and where it is placed.

In Java, each thread has its own thread stack, which is easy to understand: different threads execute different logic, so each needs a separate stack. The heap, by contrast, is shared by all threads. Because the stack is the unit of execution, what it stores relates to the current thread (or program): local variables, program execution state, method return values, and so on. The heap is responsible only for storing object information.

Why separate the heap from the stack? Couldn't the data be stored in the stack as well?

First, from a software design point of view, the stack represents processing logic and the heap represents data. Separating them makes the processing logic clearer. This is divide and conquer; the ideas of isolation and modularity show up in every aspect of software design.

Second, separating the heap from the stack lets the contents of the heap be shared by multiple stacks (in other words, multiple threads can access the same object). This sharing has many benefits. On the one hand, it provides an effective means of exchanging data (for example, shared memory). On the other hand, constants and caches kept in the heap can be accessed by all stacks, which saves space.

Third, the stack needs to be divided into address segments because of run-time requirements such as saving the execution context. Because a stack can only grow in one direction, the amount of content it can hold is limited. Objects in the heap, in contrast, can grow dynamically as needed. Splitting the stack from the heap therefore makes dynamic growth possible: the stack only needs to record an address that points into the heap.

Object orientation is the perfect combination of heap and stack. In fact, object-oriented programs execute no differently from earlier structured programs. What object orientation changed is the way we think about problems, bringing it closer to natural ways of thinking. If we take an object apart, we find that its attributes are data, stored in the heap, and its behavior (methods) is the execution logic, placed on the stack. When we write an object, we are really writing both a data structure and the logic that processes it. I have to admit, object-oriented design really is beautiful.

In Java, the main function is the starting point of the stack and the starting point of the program

A program always needs a starting point. As in C, main is that starting point in Java. Whatever the Java program, find main and you have found the entry point of execution :)

What is stored in the heap? What is stored in the stack?

Objects are stored in the heap. Primitive values and references to objects in the heap are stored in the stack. The size of an object is unpredictable, or may change dynamically, but in the stack an object corresponds to just one 4-byte reference (the benefit of separating heap and stack).

Why not put primitive types in the heap as well? They generally occupy only 1 to 8 bytes, so they need little space, and because they are primitive types there is no dynamic growth: their length is fixed. Storing them in the stack is therefore enough, and putting them in the heap would be pointless (and would waste space, as explained later). Primitive values and object references are both stored in the stack, and both are numbers a few bytes long, so at run time they are handled in a uniform way. Primitive values, object references, and objects themselves do differ, however, because the first two are data in the stack and the last is data in the heap. One of the most common questions this raises is how parameters are passed in Java.

Are parameters in Java passed by value or by reference?

To illustrate this issue, you need to be clear about two points:

1. Do not try to draw an analogy with C; Java has no concept of pointers.

2. A program always runs on the stack, so parameter passing only ever involves passing primitive values and object references. The object itself is never passed directly.

With these two points clear: when a Java method call passes an argument, there are no pointers, so the call is always by value (compare C's call by value). Many books therefore say that Java passes arguments by value, which is correct, and it also spares Java some of C's complexity.

So where does the illusion of passing by reference come from? On the run-time stack, primitive values and references are handled identically: both are passed as values. Passing a reference to a method can therefore be understood as a by-value call that passes a "reference value", handled exactly like a primitive. Inside the called method, however, the program interprets (or looks up) the passed reference value to reach the object in the heap, and that is the real object. A modification made at this point changes the referenced object rather than the reference itself, that is, it changes data in the heap. That is why the modification persists.

Objects are, in a sense, composed of primitive types. You can view an object as a tree: each attribute that is itself an object is a subtree (a non-leaf node), and primitive values are the leaf nodes. When an argument is passed, the passed value itself cannot be modified by the callee, but if the value is a non-leaf node (an object reference), everything beneath that node can be modified.
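The by-value behavior described above can be checked with a small experiment. The following is a minimal sketch; the class PassByValueDemo and its Counter type are hypothetical names made up for illustration.

```java
class PassByValueDemo {
    // Simple mutable object used to show what a callee can and cannot change.
    static class Counter {
        int value;
        Counter(int value) { this.value = value; }
    }

    // Reassigning a primitive parameter only changes the callee's local copy.
    static void reassignPrimitive(int n) {
        n = 100;
    }

    // Reassigning a reference parameter only changes the callee's copy of the reference.
    static void reassignReference(Counter c) {
        c = new Counter(100);
    }

    // Following the copied reference reaches the same heap object,
    // so this change is visible to the caller.
    static void mutateThroughReference(Counter c) {
        c.value = 100;
    }

    public static void main(String[] args) {
        int n = 1;
        reassignPrimitive(n);
        System.out.println(n);             // 1

        Counter counter = new Counter(1);
        reassignReference(counter);
        System.out.println(counter.value); // 1

        mutateThroughReference(counter);
        System.out.println(counter.value); // 100
    }
}
```

Only the last call changes what the caller sees, because it modifies the object in the heap rather than the copied reference value on the stack.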

Of the heap and the stack, the stack is the more fundamental to running a program. A program can run without a heap, but not without a stack. The heap provides data storage for the stack; put plainly, the heap is a piece of shared memory. Yet it is precisely the idea of separating heap and stack that makes Java garbage collection possible.

In Java, the stack size is set with -Xss. When a lot of data is stored on the stack, this value needs to be increased appropriately, otherwise a java.lang.StackOverflowError is thrown. The most common cause of this error is recursion that never returns, because the stack stores the return point of every method call.
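As a rough illustration of recursion that never returns, the sketch below recurses until the thread stack is exhausted and reports how many frames fit. StackDepthDemo is a hypothetical name, and the depth reached depends on the JVM, the platform, and the stack size in effect.

```java
class StackDepthDemo {
    static int depth = 0;

    // Recursion with no base case: every call adds a frame until the thread stack is exhausted.
    static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of calls made before the stack overflowed.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound by the time control reaches here, so execution can continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

Running the same class with a larger -Xss value would typically report a greater depth, which is one way to see the effect of the setting.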

Size of Java objects

The size of primitive types is fixed and needs no further discussion. The size of non-primitive Java objects is worth a closer look. The following description is based on a 32-bit Oracle HotSpot JVM.

In Java, an empty Object instance is 8 bytes; that is the size in the heap of an object with no attributes at all. Consider the following statement:

Object ob = new Object();

This statement creates a Java object in the program, and the space it occupies is 4 bytes + 8 bytes. The 4 bytes are the space needed to hold the reference on the Java stack, as described in the previous section, and the 8 bytes are the object's information in the Java heap. Because every non-primitive Java object inherits from Object by default, any Java object must be at least 8 bytes.

Given the size of Object, we can calculate the size of other objects.

class NewObject {
    int count;
    boolean flag;
    Object ob;
}

Its size is: empty object size (8 bytes) + int size (4 bytes) + boolean size (1 byte) + object reference size, even when null (4 bytes) = 17 bytes. However, Java allocates object memory in multiples of 8 bytes, and the smallest multiple of 8 greater than 17 is 24, so this object occupies 24 bytes.
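The arithmetic above can be written out as a small calculation. This is only a sketch of the rounding rule using the 32-bit sizes quoted in the text; ObjectSizeEstimate and alignUp are hypothetical names, and the code does not measure real objects.

```java
class ObjectSizeEstimate {
    // Rounds size up to the next multiple of alignment (alignment must be positive).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        // Sizes taken from the text for a 32-bit HotSpot JVM.
        int header = 8, intField = 4, booleanField = 1, reference = 4;
        int raw = header + intField + booleanField + reference;
        System.out.println(raw + " -> " + alignUp(raw, 8)); // 17 -> 24
    }
}
```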

Pay attention here to the size of the wrapper types for primitives. Because a wrapped value has become an object, it has to be treated as one. A wrapper type takes at least 12 bytes (the space needed to declare an empty object), and those 12 bytes carry no useful information. Because Java object sizes are multiples of 8, a primitive wrapper is at least 16 bytes. That footprint is alarming: it is N times the size of the primitive (N > 2), and for some types the overhead is extreme. Use wrapper classes as little as possible. Since JDK 5.0 added autoboxing, the Java virtual machine applies corresponding storage optimizations.

Reference types

Object references are divided into strong references, soft references, weak references, and phantom references.

Strong reference: the reference the virtual machine creates when we declare an object in the ordinary way. With strong references, the garbage collector must strictly determine whether an object is still strongly referenced; if it is, the object is not collected.

Soft reference: soft references are generally used for caches. The difference from a strong reference is that, during garbage collection, the virtual machine decides whether to reclaim a softly referenced object based on how much memory remains. If memory is tight, the virtual machine reclaims the space the soft reference points to; if memory is plentiful, it does not. In other words, by the time the virtual machine throws an OutOfMemoryError, no softly referenced objects remain.

Weak reference: weak references are similar to soft references and are also used for caches. Unlike a softly referenced object, however, a weakly referenced object is always reclaimed when garbage collection runs (provided no strong reference to it remains), so its lifetime lasts only until the next garbage collection cycle.

Strong references need no explanation; our systems generally use them. Soft references and weak references are comparatively rare. They are generally used for caching, and usually only when memory is fairly limited, because when memory is plentiful a strong reference can serve directly as the cache and gives more control. They are therefore most often seen in the caches of desktop applications.
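The reference kinds described above map onto the classes in java.lang.ref. The sketch below shows one possible way to observe them; ReferenceKindsDemo is a hypothetical name. System.gc() is only a request, so what gets printed after it is typical behavior, not something the JVM guarantees.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // While a strong reference to the object exists, a weak reference to it is never cleared.
    static boolean weakSurvivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // request a collection; the object is still strongly reachable
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("held strongly: " + weakSurvivesWhileStronglyHeld());

        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);

        strong = null; // drop the only strong reference
        System.gc();   // only a request; the JVM may ignore it

        // Usually null on HotSpot after a collection, but not guaranteed.
        System.out.println("weak after gc: " + weak.get());
        // Soft referents are typically kept until memory runs low.
        System.out.println("soft still present: " + (soft.get() != null));
    }
}
```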

The basic garbage collection algorithms are described below. They can be classified from several angles.

By basic collection strategy

Reference counting:

This is an older collection algorithm. The principle is that each new reference to an object increments a count and each removed reference decrements it. During garbage collection, only objects whose count is 0 are collected. The fatal weakness of this algorithm is that it cannot handle circular references.

Mark-sweep:

Figure 2 Mark-sweep strategy

This algorithm runs in two phases. The first phase starts from the reference root nodes and marks every referenced object; the second phase traverses the whole heap and clears the unmarked objects. This algorithm has to suspend the entire application, and it produces memory fragmentation.

Copying:

Figure 3 Copying strategy

This algorithm divides the memory space into two equal regions and uses only one of them at a time. During garbage collection it traverses the region in use and copies the objects still in use to the other region. Because each collection processes only the objects in use, the copying cost is low, and because memory is laid out afresh during the copy, there is no fragmentation problem. The disadvantage is equally obvious: it needs twice the memory space.

Mark-compact:

Figure 4 Mark-compact strategy

This algorithm combines the strengths of the mark-sweep and copying algorithms. It also runs in two phases: the first phase starts from the root nodes and marks every referenced object; the second phase traverses the whole heap, clears the unmarked objects, and "compacts" the surviving objects into one end of the heap, laying them out in order. This algorithm avoids the fragmentation problem of mark-sweep as well as the space problem of copying.
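The compaction step of mark-compact can be sketched on a toy heap. In the sketch below the heap is modeled as an array of slots and the mark phase is assumed to have already run; MarkCompactSketch and its method are hypothetical names. A real collector would also have to update every reference that points at a moved object, which this sketch leaves out.

```java
import java.util.Arrays;

class MarkCompactSketch {
    // Slides the marked (live) entries to the front of the heap array, preserving their order,
    // and clears every slot behind them. Returns the index of the first free slot.
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free] = heap[i];
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] marked = {true, false, true, false, true}; // result of an earlier mark phase
        int free = compact(heap, marked);
        System.out.println(Arrays.toString(heap)); // [A, C, E, null, null]
        System.out.println(free);                  // 3
    }
}
```

After compaction all free space is one contiguous block starting at the returned index, which is why this approach avoids fragmentation without needing a second region.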

By partitioning approach

Incremental collection (Incremental Collecting): a real-time garbage collection algorithm, that is, garbage is collected while the application is running. For reasons I do not know, the collectors in JDK 5.0 do not use this algorithm.

Generational collection (Generational Collecting): a garbage collection algorithm derived from analyzing object lifetimes. Objects are divided into a young generation, an old generation, and a permanent generation, and objects with different lifetimes are collected with different algorithms (chosen from those above). Current garbage collectors (since J2SE 1.2) all use this algorithm.

By system threads

Serial collection: serial collection uses a single thread for all garbage collection work. Because no multithreaded coordination is needed, it is easy to implement and relatively efficient. Its limitation is equally clear: it cannot exploit multiple processors, so serial collection suits single-processor machines. It can also be used on multiprocessor machines when the amount of data is small (around 100 MB).

Parallel collection: parallel collection uses multiple threads for garbage collection work, so it is fast and efficient. In theory, the more CPUs there are, the more the advantage of a parallel collector shows.

Concurrent collection: with both serial and parallel collection, the whole runtime environment has to be paused during garbage collection so that only the garbage collector runs. The system therefore pauses noticeably during garbage collection, and the pause grows as the heap grows. With concurrent collection, the application and the garbage collector run at the same time, so the application keeps running while garbage is collected.

How to identify garbage

The reference counting method described above determines garbage by counting the references created to and removed from each object, and the collector reclaims objects whose count is 0. This method cannot handle circular references, however. Later garbage detection algorithms therefore all start from the root nodes of the running program and traverse the object references to find the live objects. So where does garbage collection start under this approach? That is, where do we begin looking for the objects the system is currently using? As the earlier analysis of heap and stack showed, the stack is where the program actually executes, so finding the objects in use means starting from the Java stack. And since each stack corresponds to one thread, with multiple threads the stacks of all of them must be examined.

Figure 5 Root object and Object tree

Besides the stacks, registers and similar run-time locations also hold data the program is using. Starting from the references in the stack or in registers, we can find objects in the heap, and from those objects find references to other objects in the heap. This expansion continues until it ends at null references or primitive values, forming an object tree whose root is the object referenced from the Java stack. If the stack holds several references, several object trees are formed. The objects on these trees are the ones the system currently needs, and they cannot be garbage collected. All remaining objects can be regarded as unreachable and can be collected as garbage.

The starting points of garbage collection are therefore a set of root objects (the Java stack, static variables, registers, ...), and the simplest Java stack is the one for the main function that the Java program executes. This way of collecting is the mark-sweep approach described above.
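Tracing from root objects can be sketched on a toy object graph. The sketch below is not how HotSpot is implemented; MarkSweepSketch, Node, mark, and sweep are hypothetical names, the heap is modeled as a list, and the roots list stands in for references held on thread stacks. It also shows that a cycle no root reaches is collected, which is the case reference counting cannot handle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MarkSweepSketch {
    // A toy heap object that holds references to other toy objects.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: collect every object reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {         // true only the first time n is seen
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep phase: walk the whole heap and drop every unmarked object.
    // Returns the names of the collected objects in heap order.
    static List<String> sweep(List<Node> heap, Set<Node> marked) {
        List<String> collected = new ArrayList<>();
        heap.removeIf(n -> {
            if (marked.contains(n)) return false;
            collected.add(n.name);
            return true;
        });
        return collected;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                   // a -> b, reachable from the root
        c.refs.add(d);                   // c and d reference each other,
        d.refs.add(c);                   // but no root reaches either of them
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        List<Node> roots = List.of(a);   // stands in for a reference on a thread stack
        return sweep(heap, mark(roots));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [c, d]
    }
}
```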

How to handle fragmentation

Because different Java objects have different lifetimes, memory becomes fragmented after the program has run for a while unless it is compacted. The most direct consequences of fragmentation are that large blocks of memory cannot be allocated and that the program runs less efficiently. Of the basic garbage collection algorithms described above, both the copying and the mark-compact approaches solve the fragmentation problem.

How to reconcile simultaneous object creation and object collection

The garbage collection thread reclaims memory while the application threads consume (allocate) it. One reclaims, the other allocates, so from this point of view the two are in conflict. In existing garbage collectors, therefore, the whole application generally has to be suspended (that is, memory allocation is paused) before collection starts, and the application resumes only after collection completes. This is the most direct and most effective way to resolve the conflict between the two.

This approach has an obvious disadvantage, however: as the heap keeps growing, garbage collection takes longer, and the application pause grows with it. Applications with strict timing requirements, say a maximum pause of a few hundred milliseconds, are likely to exceed that limit once the heap is larger than a few gigabytes, and garbage collection then becomes a bottleneck for the system. Concurrent garbage collection algorithms exist to resolve this conflict: the garbage collection thread runs at the same time as the application threads. This solves the pause problem, but because objects are being reclaimed while new ones are being created, the algorithm becomes much more complex, the system's processing capacity drops accordingly, and fragmentation becomes harder to deal with.

Why collect by generation

The generational garbage collection strategy is based on the fact that different objects have different lifetimes. Objects with different lifetimes can therefore be collected in different ways, which improves collection efficiency.

While a Java program runs, it creates a large number of objects. Some of them relate to business information, such as Session objects in HTTP requests, threads, and socket connections; these are tied directly to the business and so live a long time. Others are mainly temporary variables produced as the program runs, and these live only briefly. String objects are an example: because String is immutable, the system produces large numbers of them, and some can be collected after being used only once.

Imagine collecting without distinguishing objects by lifetime. Every collection would cover the whole heap and take a relatively long time, and every collection would have to traverse all live objects. For long-lived objects this traversal accomplishes nothing, because they may be traversed many times and still be alive. Generational garbage collection therefore applies divide and conquer: objects with different lifetimes are placed in different generations, and each generation is collected with the method that suits it best.

How the generations are divided

Figure 6 Java object generations

As shown in the figure:

The virtual machine divides memory into three generations: the young generation, the old generation, and the permanent generation. The permanent generation mainly holds class information for Java classes and has little to do with the Java objects that garbage collection reclaims. The split between the young and old generations is what affects garbage collection most.

Young generation:

All newly created objects are placed in the young generation first. The goal of the young generation is to collect short-lived objects as quickly as possible. The young generation is divided into three areas: one Eden area and (in general) two Survivor areas. Most objects are created in Eden. When Eden fills up, the objects still alive are copied to one of the two Survivor areas. When that Survivor area fills up, its live objects are copied to the other Survivor area. When that one fills up too, objects that were copied from the first Survivor area and are still alive are copied to the old generation (tenured). Note that the two Survivor areas are symmetric, with no fixed order between them, so the same area may hold objects copied from Eden alongside objects copied from the other Survivor area, while the objects copied to the old generation come only from the first Survivor area. In addition, one of the Survivor areas is always empty. The Survivor area can also be configured as more than two areas depending on the needs of the program, which lengthens the time objects stay in the young generation and reduces the chance that they are moved to the old generation.

Old generation:

Objects in the young generation that are still alive after N garbage collections are moved to the old generation. The old generation can therefore be regarded as holding objects with longer lifetimes.

Permanent generation:

Stores static data such as Java classes and methods. The permanent generation has no significant effect on garbage collection, but some applications, Hibernate for example, dynamically generate or load classes at run time, and then a larger permanent generation is needed to hold the classes added while running. The permanent generation size is set with -XX:MaxPermSize=<n>.
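To make the promotion from the young generation to the old generation concrete, here is a deliberately simplified simulation. It folds Eden and the Survivor areas into a single young list, and all names (GenerationalSketch, Obj, tenuringThreshold) are hypothetical; the threshold plays the role of the "N garbage collections" mentioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class GenerationalSketch {
    static class Obj {
        final String name;
        int age;             // number of young-generation collections survived
        boolean live = true; // stands in for "still reachable from a root"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold; // hypothetical value for the "N" in the text

    GenerationalSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects always start in the young generation.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    // Young-generation collection: dead objects disappear, survivors age,
    // and survivors that reach the threshold are promoted to the old generation.
    void minorCollect() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.live) {
                it.remove();
            } else if (++o.age >= tenuringThreshold) {
                it.remove();
                old.add(o);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch(2);
        heap.allocate("session");           // long-lived object
        heap.allocate("temp").live = false; // short-lived temporary

        heap.minorCollect();
        System.out.println(heap.young.size() + " " + heap.old.size()); // 1 0

        heap.minorCollect();
        System.out.println(heap.young.size() + " " + heap.old.size()); // 0 1
    }
}
```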

When garbage collection is triggered

Because objects are handled by generation, the region and the timing of garbage collection differ as well. There are two types of GC: Scavenge GC and Full GC.

Scavenge GC

In general, when a new object is created and space for it cannot be allocated in Eden, a Scavenge GC is triggered. It collects the Eden area, clears out dead objects, and moves the surviving objects to a Survivor area, then tidies the two Survivor areas. This kind of GC works on the young generation and does not affect the old generation. Because most objects start out in Eden, and Eden is not allocated much space, Eden GCs are frequent. A fast, efficient algorithm is therefore needed so that Eden is freed up as soon as possible.

Full GC

Cleans up the whole heap, including the young, tenured, and perm areas. Because it has to collect the entire heap, a Full GC is slower than a Scavenge GC, so the number of Full GCs should be kept as low as possible. A large part of JVM tuning is about controlling Full GC. A Full GC can occur for the following reasons:

· The old generation (tenured) is full

· The permanent generation (perm) is full

· System.gc() is called explicitly

· The heap's allocation strategy for its regions changed dynamically after the last GC
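One way to watch how often each kind of collection runs is the standard java.lang.management API. The sketch below lists the collectors the running JVM exposes along with their counts and accumulated times; GcActivityReport is a hypothetical name, and which collector names appear (and which of them corresponds to young or full collections) depends on the JVM and the collector selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class GcActivityReport {
    // Returns the garbage collector beans exposed by the running JVM.
    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        System.gc(); // an explicit request, one of the Full GC triggers listed above
        for (GarbageCollectorMXBean gc : collectors()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```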

Generational garbage collection process

Figure 7-1 Generational garbage collection 1

Figure 7-2 Generational garbage collection 2

Figure 7-3 Generational garbage collection 3

Figure 7-4 Generational garbage collection 4

Choosing an appropriate garbage collection algorithm

Serial collector

Figure 8 Serial collector

All garbage collection work is handled by a single thread. Because no multithreaded coordination is needed, this is relatively efficient. However, it cannot exploit multiple processors, so this collector suits single-processor machines. It can also be used on multiprocessor machines when the amount of data is small (around 100 MB). Enable it with -XX:+UseSerialGC.

Parallel collector

Figure 9 Parallel collector

Collecting the young generation in parallel reduces garbage collection time. This collector is typically used on multithreaded, multiprocessor machines. Enable it with -XX:+UseParallelGC. The parallel collector was introduced in J2SE 5.0 update 6 and enhanced in Java SE 6.0 so that the old generation can be collected in parallel as well. If the old generation is not collected in parallel, it is collected by a single thread by default, which limits scalability. Enable parallel old-generation collection with -XX:+UseParallelOldGC.

Use -XX:ParallelGCThreads=<n> to set the number of threads for parallel garbage collection. This value can be set equal to the number of processors on the machine.

This collector can be configured as follows:

Maximum garbage collection pause: specifies the maximum pause time for garbage collection, set with -XX:MaxGCPauseMillis=<n>, where <n> is in milliseconds. If this value is specified, the heap size and other garbage collection parameters are adjusted to try to meet it. Setting this value may reduce application throughput.

Throughput: throughput is the ratio of garbage collection time to non-garbage-collection time, set with -XX:GCTimeRatio=<n>. The fraction of time spent in garbage collection is 1/(1+N). For example, -XX:GCTimeRatio=19 means that 5% of the time is used for garbage collection. The default is 99, that is, 1% of the time is used for garbage collection.
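The 1/(1+N) formula is easy to check numerically. The sketch below only evaluates the formula for the two values mentioned in the text; GcTimeRatio and gcTimeFraction are hypothetical names and do not read anything from the JVM.

```java
import java.util.Locale;

class GcTimeRatio {
    // Fraction of total time allowed for garbage collection for a given -XX:GCTimeRatio value n.
    static double gcTimeFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.println(String.format(Locale.ROOT, "N=%d -> %.1f%%", n, 100 * gcTimeFraction(n)));
        }
        // N=19 -> 5.0%
        // N=99 -> 1.0%
    }
}
```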
Concurrent collector

This collector ensures that most of the work is done concurrently (the application does not stop), so garbage collection pauses the application only briefly. It suits medium and large applications with strict response-time requirements. Enable it with -XX:+UseConcMarkSweepGC.

Figure 10 Concurrent collector

The concurrent collector mainly reduces pause times in the old generation. It uses separate garbage collection threads to trace reachable objects without stopping the application. In each old-generation collection cycle, the concurrent collector briefly pauses the whole application at the start of the collection and pauses it again partway through. The second pause is slightly longer than the first, and during it multiple threads perform garbage collection work at the same time.

The concurrent collector trades processor resources for shorter pauses. On a system with N processors, the concurrent part of the collection uses K/N of the available processors, typically with 1 <= K <= N/4.

When the concurrent collector is used on a host with only one processor, setting it to incremental mode also yields shorter pause times.

Floating garbage: because garbage collection runs while the application is running, some garbage may be produced just as a collection finishes. This "floating garbage" cannot be reclaimed until the next collection cycle. The concurrent collector therefore generally needs about 20% of the heap reserved for floating garbage.

Concurrent mode failure: the concurrent collector collects while the application runs, so the heap must have enough space for the program to use during the collection; otherwise the heap fills up before the collection completes. When that happens, a "concurrent mode failure" occurs and the whole application is paused for garbage collection.

Starting the concurrent collector: because concurrent collection happens while the application runs, there must be enough memory for the program to use until the collection completes, otherwise a "concurrent mode failure" occurs. Set -XX:CMSInitiatingOccupancyFraction=<n> to specify how much of the heap is occupied (that is, how much remains free) when concurrent collection starts.


Serial collector:

-- Applicable to: small data volumes (around 100 MB), a single processor, and no response-time requirements.
-- Disadvantage: suitable only for small applications.

Parallel collector:

-- Applicable to: medium and large applications with high throughput requirements, multiple CPUs, and no strict response-time requirements. Examples: background processing, scientific computing.
-- Disadvantage: application response time may lengthen during garbage collection.

Concurrent collector:

-- Applicable to: medium and large applications with strict response-time requirements and multiple CPUs. Examples: web servers/application servers, telecom switching, integrated development environments.

This article is reposted from: http://pengjiaheng.iteye.com/blog/518623
