Principles of garbage collection in Java

Source: Internet
Author: User
Garbage Collection Overview

A user program (the mutator) modifies the set of objects in the heap: it obtains space from the storage manager, creates objects, and introduces and removes references to existing objects.

When the mutator can no longer reach an object, that object becomes garbage.

Objective: find the unreachable objects and hand them to the storage manager, which tracks free space, so that the memory they occupy can be reclaimed.

Some Basic Concepts

Type safety: the type of every data component can be determined.

If the types can be determined at compile time, the language is statically type-safe; if they can only be determined at run time, it is dynamically type-safe.

A language that is neither statically nor dynamically type-safe is not suitable for automatic garbage collection.

In Java, only primitive values and references are stored on the stack; all objects are allocated on the heap. This design frees programmers from managing object lifetimes, but at the cost of generating more garbage.

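Reachability

Data that the program can access directly, without dereferencing any pointer, forms the root set. An object is reachable if it belongs to the root set or can be reached by following references from a reachable object.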

Locality Principle

If a storage location that a program has accessed is likely to be accessed again within a short time, the program has temporal locality. If locations near an accessed location are likely to be accessed within a short time, the program has spatial locality.

It is generally held that a program spends 90% of its time executing 10% of its code.
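
As a small illustration of spatial locality, the sketch below sums the same two-dimensional array in two traversal orders. The class and method names are made up for this example, and the array is assumed to be rectangular. Both methods return the same sum; the row-by-row version touches neighbouring elements in sequence and so typically makes better use of the cache, although the actual difference depends on the hardware and the JIT compiler.

```java
/** Hypothetical demo class: same work, two memory access patterns. */
class LocalityDemo {
    /** Visits elements in the order they are laid out within each row array. */
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                sum += grid[row][col];
            }
        }
        return sum;
    }

    /** Jumps to a different row array on every step (assumes a rectangular grid). */
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid.length == 0 ? 0 : grid[0].length;
        for (int col = 0; col < cols; col++) {
            for (int row = 0; row < grid.length; row++) {
                sum += grid[row][col];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] grid = new int[512][512];
        for (int[] row : grid) {
            java.util.Arrays.fill(row, 1);
        }
        System.out.println(sumRowMajor(grid) == sumColumnMajor(grid));
    }
}
```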

Principles of several garbage collectors

Mark-sweep collector

The collector first traverses the object graph and marks every reachable object, then scans the heap to find unmarked objects and frees their memory. This type of collector generally runs in a single thread and stops all other work while it runs. Because it only frees unmarked objects and does not compact the marked ones, it fragments memory heavily, which wastes space.
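
The two phases described above can be sketched on a simulated heap. This is a toy model for illustration only, not how HotSpot is implemented: the heap is a plain list, and the names ToyMarkSweep, Node, roots, and collect are all assumptions of this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy collector over a simulated heap; every name here is hypothetical. */
class ToyMarkSweep {
    /** A simulated heap object: a name, outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> heap = new ArrayList<>();   // every allocated object
    final List<Node> roots = new ArrayList<>();  // the root set

    Node allocate(String name) {
        Node n = new Node(name);
        heap.add(n);
        return n;
    }

    /** Mark phase: traverse the object graph starting from the roots. */
    void mark() {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;      // already visited (handles cycles)
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: scan the whole heap, free unmarked objects, reset marks. */
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;        // ready for the next collection
            } else {
                it.remove();             // "free" the unreachable object
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Node a = gc.allocate("a"), b = gc.allocate("b");
        Node c = gc.allocate("c"), d = gc.allocate("d");
        gc.roots.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);                   // c and d form an unreachable cycle
        System.out.println("freed " + gc.collect() + ", live " + gc.heap.size());
        // prints: freed 2, live 2
    }
}
```

Note that the survivors stay where they are, so the free space left behind is scattered; that is the fragmentation mentioned above.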

Mark-compact collector

Also called the mark-sweep-compact collector, it has the same mark phase as the mark-sweep collector. In the second phase, the marked objects are moved into a contiguous region of the heap, which compacts the heap. This collector also stops all other work while it runs.

Copying collector

This collector divides the heap into two regions, often called semispaces. Only one semispace is in use at a time, and the JVM allocates new objects in it. When GC runs, it copies the reachable objects into the other semispace, which compacts the heap as a side effect. This method suits short-lived objects; long-lived objects are copied again and again, which lowers efficiency. In addition, a heap with a given usable size needs twice that much memory, because only half of it is in use at any time.
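
A minimal sketch of the copying step, assuming a simulated heap in which each semispace is a list. It follows the general shape of a Cheney-style scan (to-space doubles as the work queue), but the class ToySemispace and all of its members are hypothetical names for this example, and a map stands in for the forwarding pointers a real collector would store in object headers.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy semispace collector; every name here is hypothetical. */
class ToySemispace {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>();     // semispace currently in use
    List<Obj> toSpace = new ArrayList<>();       // stays empty between collections
    final List<Obj> roots = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    /** Copies reachable objects into to-space, then swaps the two semispaces. */
    void collect() {
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, forward(roots.get(i), forwarded));
        }
        // Scan the copies in order; copying a referent appends it to toSpace.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj scanned = toSpace.get(scan);
            for (int i = 0; i < scanned.refs.size(); i++) {
                scanned.refs.set(i, forward(scanned.refs.get(i), forwarded));
            }
        }
        fromSpace = toSpace;                     // survivors are now densely packed
        toSpace = new ArrayList<>();             // everything left behind is garbage
    }

    /** Returns the to-space copy of an object, creating it on first visit. */
    private Obj forward(Obj old, Map<Obj, Obj> forwarded) {
        Obj existing = forwarded.get(old);
        if (existing != null) return existing;
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs);             // still old addresses; fixed when scanned
        forwarded.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        ToySemispace heap = new ToySemispace();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("garbage");
        a.refs.add(b);
        b.refs.add(a);
        heap.roots.add(a);
        heap.collect();
        System.out.println("live after copy: " + heap.fromSpace.size());
        // prints: live after copy: 2
    }
}
```

The cost of a collection here depends only on the number of live objects, which is why the scheme suits short-lived data.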

Incremental collector

The incremental collector divides the heap into multiple regions, that is, into small blocks, and collects garbage from only one block at a time. Each interruption of the application is therefore short, so users hardly notice that the garbage collector is working.

Partial collection principle

Typically, 80% to 90% of newly allocated objects die within a few million instructions.

Generational collector (generational garbage collection)

It is based on the copying collector and the partial collection principle.

It is an effective way to exploit the fact that most objects die young.

The heap is divided into a series of small areas numbered 0, 1, 2, ..., n. The smaller the number, the younger the objects stored in the area. Objects are first created in area 0, which is collected when it fills up; the reachable objects are moved to area 1. Each round of garbage collection covers every area whose number is less than or equal to i, where i is the highest-numbered area that is currently full.

Whenever area i is collected, all areas numbered less than i are collected as well. Younger generations usually contain more garbage, so they are collected more often.

The oldest generation holds the most mature objects. Collecting it is the most expensive, because it amounts to a full collection, and it can cause a long pause. One solution is the train algorithm.

Garbage collectors in the J2SE 5.0 HotSpot JVM

A garbage collector is responsible for: 1) allocating memory; 2) ensuring that referenced objects remain in memory; 3) reclaiming the memory of unreachable objects.

Designing a garbage collection algorithm involves several choices: 1) serial or parallel; 2) concurrent or stop-the-world; 3) compacting, non-compacting, or copying.

All four collectors in the J2SE 5.0 HotSpot JVM are generational, as described below:

HotSpot (J2SE 5.0; according to the white paper, this also applies to 6.0) uses generational collection. It divides objects into young and old (plus a permanent generation); a young object that survives several collections, that is, lives long enough, becomes an old object. The mechanism rests on the following observation: most newly created objects are not referenced for long, that is, they die young, and only a few stay referenced.
Young generation collections are therefore frequent, so the algorithm used there must be fast. The old generation holds a large amount of data, so the algorithm used there must be space-efficient. Dividing objects into generations makes it possible to apply a different algorithm to each.
Correspondingly, HotSpot JVM memory is divided into the young generation, the old generation, and the permanent generation. By default, all the objects are placed on the left of the region.

All four garbage collectors in the J2SE HotSpot JVM are generational.
A new object is usually allocated in the young generation. The old generation holds objects that have survived several young generation collections, along with some large objects that are allocated in the old generation from the start. The permanent generation holds class information and metadata.
The young generation consists of one Eden space and two survivor spaces. Objects are created in Eden. After one GC, the surviving objects move to a survivor space; after surviving further collections, they move to the old generation. At any time, only one survivor space holds objects and the other is empty.
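
The movement of objects between Eden, the two survivor spaces, and the old generation can be sketched as follows. This is a simplified model, not HotSpot's implementation: the class ToyGenerations, the threshold value of 2, and the live set passed to minorGc (which stands in for the result of a reachability trace) are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of young generation aging; every name here is hypothetical. */
class ToyGenerations {
    static final int TENURING_THRESHOLD = 2;   // assumed value, for illustration only

    static final class Obj {
        final String name;
        int age;                               // number of collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivorFrom = new ArrayList<>();  // the occupied survivor space
    List<Obj> survivorTo = new ArrayList<>();    // the empty survivor space
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Young generation collection; live is the set of objects found reachable. */
    void minorGc(Set<Obj> live) {
        evacuate(eden, live);
        evacuate(survivorFrom, live);
        eden.clear();
        survivorFrom.clear();
        List<Obj> tmp = survivorFrom;          // swap roles: To becomes the occupied space
        survivorFrom = survivorTo;
        survivorTo = tmp;
    }

    private void evacuate(List<Obj> space, Set<Obj> live) {
        for (Obj o : space) {
            if (!live.contains(o)) continue;   // garbage is simply not copied
            o.age++;
            if (o.age > TENURING_THRESHOLD) {
                old.add(o);                    // survived long enough: promote
            } else {
                survivorTo.add(o);
            }
        }
    }

    public static void main(String[] args) {
        ToyGenerations heap = new ToyGenerations();
        Obj keep = heap.allocate("keep");
        heap.allocate("temp");
        Set<Obj> live = new HashSet<>();
        live.add(keep);
        for (int i = 0; i < 3; i++) {
            heap.minorGc(live);
        }
        System.out.println("old generation size: " + heap.old.size());
        // prints: old generation size: 1
    }
}
```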

When the young generation is full, the young generation GC algorithm runs. When the old or permanent generation is full, the young generation is collected first, and then the old generation algorithm runs on the old and permanent generations. If the old generation is too full to accept the objects that a young generation collection would promote, the young generation algorithm is skipped and the old generation algorithm runs on the young generation as well (because memory is low). The CMS collector is the exception, because its old generation algorithm cannot collect the young generation; the reasons are described below.

For multithreaded applications, the JVM uses thread-local allocation buffers (TLABs): each thread is given its own region to allocate from, which eliminates contention between threads. When a thread's buffer fills up, it takes a lock to obtain a new one.
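
The idea can be sketched as a bump-pointer allocator over a simulated address space. The class ToyTlabAllocator, the sizes, and the use of plain integers as addresses are assumptions made for this example; the real JVM mechanism is considerably more involved.

```java
/** Toy thread-local allocation buffers; every name and size here is hypothetical. */
class ToyTlabAllocator {
    static final int HEAP_SIZE = 1024;   // total simulated heap, in abstract words
    static final int TLAB_SIZE = 64;     // size of each per-thread buffer

    private int heapTop = 0;             // next unreserved address, guarded by "this"

    /** Per-thread buffer bounds: {next free address, end of buffer}. */
    private final ThreadLocal<int[]> tlab =
            ThreadLocal.withInitial(() -> new int[] {0, 0});

    /** Returns the start address of the new block, or -1 if the heap is exhausted. */
    int allocate(int size) {
        if (size <= 0 || size > TLAB_SIZE) {
            throw new IllegalArgumentException("unsupported size: " + size);
        }
        int[] buf = tlab.get();
        if (buf[0] + size > buf[1]) {        // buffer full: slow path, takes the lock
            int start = reserveBuffer();
            if (start < 0) return -1;
            buf[0] = start;                  // the unused tail of the old buffer is wasted
            buf[1] = start + TLAB_SIZE;
        }
        int address = buf[0];                // fast path: no lock, just bump the pointer
        buf[0] += size;
        return address;
    }

    /** The only shared operation: hands out a fresh buffer under the lock. */
    private synchronized int reserveBuffer() {
        if (heapTop + TLAB_SIZE > HEAP_SIZE) return -1;
        int start = heapTop;
        heapTop += TLAB_SIZE;
        return start;
    }

    public static void main(String[] args) {
        ToyTlabAllocator allocator = new ToyTlabAllocator();
        System.out.println(allocator.allocate(16));  // prints 0
        System.out.println(allocator.allocate(16));  // prints 16
        System.out.println(allocator.allocate(40));  // prints 64 (a new buffer was needed)
    }
}
```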

The four HotSpot collectors:
Serial collector:
Feature: the application is paused during collection.
Young generation: the surviving objects in Eden and in the occupied survivor space (From) are copied to the other survivor space (To); large objects are placed directly in the old generation. If the To space fills up, the remaining surviving objects are copied to the old generation.
Old generation: uses the mark-sweep-compact algorithm, which marks the surviving objects, sweeps away the dead ones, and slides all surviving objects toward one end of the region, leaving one large block of free space.
Applicability: most client applications can use this collector, which is also HotSpot's default on client-class machines. On current hardware, a full collection of a 64 MB heap takes less than half a second.

Parallel collector:
Feature: it can take advantage of multi-core CPUs.
Young generation: it pauses the application and works like the serial collector, but uses multiple threads to finish faster.
Old generation: same as the serial collector.
Applicability: machines with multiple cores.

Parallel compacting collector:
Compared with the parallel collector, the main difference is a new algorithm for the old generation. According to the white paper, this collector will eventually replace the parallel collector.
Young generation: same as the parallel collector.
Old generation: first, the generation is divided into several contiguous regions. Then the live objects are marked in parallel (first the objects that are directly referenced, then everything reachable from them). Next, the regions are examined to determine their density (regions on the left are usually denser than those on the right). Finally, starting at the point where the regions stop being dense, the regions to the right are compacted in parallel.
Applicability: on multi-core machines with pause-time requirements, it is a better choice than the parallel collector. On heavily shared servers (one server running many applications), however, one application's GC may affect the others, because an old generation collection takes a long time and occupies several threads. A possible remedy is to configure a smaller number of GC threads (for example with -XX:ParallelGCThreads=n).

Concurrent mark-sweep (CMS) collector:
Young generation: same as the parallel collector.
Old generation: collection proceeds in several phases.
Initial mark: when a collection is needed, the application is paused and all directly referenced objects are marked.
Concurrent mark: the application resumes, and the collector traces from the marked objects to find all surviving objects.
Remark: the application is paused again, the objects that the application modified (created or dropped) during the concurrent mark are re-examined, and the surviving ones are marked. This pause is relatively long, so multiple threads are used. When the phase ends, every surviving object is marked, and the unmarked objects are garbage.
Concurrent sweep: the application resumes, and the space of all garbage objects is freed while it runs.


Differences from the other algorithms:
First, it does not compact. However, it may merge or split free blocks according to its estimate of future memory demand.
Second, a collection does not wait until the old generation is full; it starts once the free space falls below a threshold.
Third, because there is no compaction, fragmentation occurs.

CMS can also run in incremental mode: the concurrent phases do only part of their work at a time and then hand the processor back to the application. The collector's work is split into small pieces that are scheduled in the idle time between two young generation collections. This mode is generally used when short pauses are required and few processors are available (one or two cores).
Overall, compared with the parallel collector, CMS shortens old generation pauses (sometimes dramatically) and slightly lengthens young generation pauses (promoting an object into the old generation takes longer because, without compaction, a suitable free block has to be found first). It lowers the throughput of the system as a whole and considerably increases memory requirements.
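
Which collectors a running JVM actually uses depends on its version and startup options, so it can be useful to check. The sketch below lists them through the standard java.lang.management API; the class name GcInspector is made up for this example, and the names printed vary from one JVM to another.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the collectors active in this JVM. */
class GcInspector {
    /** Names of the collectors as reported by the JVM (implementation-specific). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined".
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

A JVM typically reports one entry for the young generation collector and one for the old generation collector.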

Some Java programming suggestions

Based on how GC works, a few techniques can make GC run more efficiently and better meet an application's requirements. Some suggestions for program design:

1. The most basic suggestion is to release references to useless objects as early as possible. Most temporary variables need no attention, because a local reference is dropped automatically when it goes out of scope. Special care is needed with complex object graphs such as arrays, queues, trees, and graphs, whose members refer to one another in more complicated ways; GC is generally less efficient at reclaiming them. Where the program permits, set references that are no longer needed to null as early as possible, which can speed up GC.

2. Use finalize() as little as possible. The finalize() method gives programmers a chance to release resources before an object is reclaimed, but it adds to the GC's workload, so avoid relying on it to reclaim resources.

3. To cache frequently used images, consider soft references (java.lang.ref.SoftReference). They keep the images in memory for as long as possible so the program can reuse them, without causing an OutOfMemoryError.

4. Pay attention to collection data types such as arrays, trees, graphs, and linked lists; reclaiming these structures is more complex for GC. Also pay attention to global and static variables. They easily keep references to objects that are no longer needed, which wastes memory.

5. When the program has some idle time, the programmer can call System.gc() to suggest that the GC run, but the Java language specification does not guarantee that a collection will actually take place. Using an incremental GC can shorten the pauses of a Java program.
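
Two of the suggestions above lend themselves to a short sketch. SimpleStack illustrates suggestion 1 for an array-backed structure: a popped slot is set to null so the array no longer keeps the object alive. SoftCache illustrates suggestion 3: values are held through SoftReference and reloaded if the collector has cleared them. Both class names, and the loader function passed to the cache, are assumptions of this example rather than part of any library.

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;
import java.util.EmptyStackException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Suggestion 1: clear obsolete references held inside a data structure. */
class SimpleStack<T> {
    private Object[] elements = new Object[16];
    private int size;

    void push(T item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) throw new EmptyStackException();
        T item = (T) elements[--size];
        elements[size] = null;   // without this, the array would still reference the object
        return item;
    }

    int size() { return size; }
}

/** Suggestion 3: a cache whose entries the GC may reclaim when memory runs low. */
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // hypothetical callback that (re)loads a value

    SoftCache(Function<K, V> loader) { this.loader = loader; }

    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();  // null if never loaded or already cleared
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

A caller might create the cache as new SoftCache<String, byte[]>(name -> loadImageBytes(name)), where loadImageBytes stands for whatever loading routine the application already has.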

References

Four garbage collectors for J2SE 5.0 HotSpot: http://blog.csdn.net/dmy_110/article/details/8000007

Java garbage collection mechanism: http://developer.51cto.com/art/201009/227691.htm

Memory Management in the Java HotSpot Virtual Machine (white paper)
