Java Virtual Machine (JVM) Performance Optimization, Part 3: Garbage Collection in Detail


The Java platform's garbage collection mechanism greatly improves developer productivity, but a poorly implemented garbage collector can consume far too much of an application's resources. In this third part of the Java Virtual Machine performance optimization series, Eva Andreasson introduces Java beginners to the Java platform's memory model and garbage collection mechanism. She explains why fragmentation (rather than garbage collection itself) is the major problem for Java application performance, and why generational garbage collection and compaction are currently the leading (though not the most innovative) approaches to dealing with heap fragmentation in Java applications.

The purpose of garbage collection (GC) is to free the memory occupied by Java objects that are no longer referenced by any live object. It is a central part of the Java Virtual Machine's dynamic memory management. In a typical garbage collection cycle, all objects that are still referenced (and therefore reachable) are kept, while objects that are no longer referenced are freed and the space they occupied is reclaimed for allocation to new objects.

To understand garbage collection and the various garbage collection algorithms, you first need to know a few things about the Java platform's memory model.

Garbage collection and the Java platform memory model

When you start a Java program from the command line and specify the startup parameter -Xmx (for example, java -Xmx2g MyApp), memory of the specified size is assigned to the Java process. This is the Java heap. This dedicated memory address space is used to store objects created by the Java program (and sometimes by the JVM). As the application runs and keeps allocating memory for new objects, the Java heap (that is, the dedicated address space) slowly fills up.

Eventually the Java heap fills up, meaning that an allocating thread cannot find a large enough contiguous block of free memory for a new object. At that point the JVM notifies the garbage collector and a garbage collection begins. Garbage collection can also be requested from within a program by calling System.gc(), but the call does not guarantee that a collection will actually run. Before any collection starts, the garbage collection mechanism first determines whether it is safe to begin; a collection can start only when all of the application's active threads are at a safepoint. For example, garbage collection cannot start while an object is in the middle of being allocated, or while a sequence of CPU instructions is in the middle of being optimized, because context could be lost and the final result corrupted.
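As a small illustration of the point that System.gc() is only a request, the sketch below prints heap usage before and after the call. The class name HeapProbe and the allocation sizes are made up for this example; only standard java.lang.Runtime methods are used. The numbers it prints vary by JVM and settings, and nothing guarantees that the last figure is lower than the one before it.

```java
// Hypothetical demo class; only standard java.lang APIs are used.
final class HeapProbe {
    /** Bytes of heap currently in use, as reported by the running JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx limit (or the JVM default if unset).
        System.out.println("max heap bytes:   " + Runtime.getRuntime().maxMemory());

        byte[][] garbage = new byte[64][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[64 * 1024];   // allocate 64 KiB blocks
        }
        System.out.println("used after alloc: " + usedHeapBytes());

        garbage = null;   // the blocks are now unreachable
        System.gc();      // a request only; the JVM may ignore or delay it
        System.out.println("used after hint:  " + usedHeapBytes());
    }
}
```

Running it with different heap limits, for example java -Xmx64m HeapProbe, changes the first line of output.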

A garbage collector must never reclaim an object that still has a live reference; doing so would violate the Java Virtual Machine specification. It is not required to reclaim dead objects immediately, because they will eventually be reclaimed by a later collection cycle. Although there are many ways to implement garbage collection, these two points hold for all of them. The real challenge of garbage collection is identifying which objects are live and reclaiming memory without affecting the application, so a garbage collector has two goals:

1. Quickly free unreferenced memory so that the application's allocation needs are met and it does not run out of memory.
2. Minimize the impact on the running application's performance (latency and throughput) while reclaiming memory.

Two types of garbage collection

In the first installment of this series, I introduced two approaches to garbage collection: reference counting and tracing collection. Next we look at both approaches in more detail and introduce some of the tracing collector algorithms used in production environments.

Reference counting collectors

A reference counting collector keeps track of how many references point to each Java object. Once the count for an object reaches 0, the object can be reclaimed immediately. This immediacy is the main benefit of reference counting, and there is almost no overhead from holding on to unreferenced memory. Keeping the reference count of every object up to date, however, is costly.

The main difficulty with reference counting collectors is keeping the reference counts accurate. Another well-known difficulty is handling circular references. If two objects reference each other and are not referenced by any other live object, their memory will never be reclaimed, because neither object's count ever drops to 0. Reclaiming circular structures requires major analysis (a global analysis of the Java heap), which makes the algorithm more complex and adds overhead to the application.
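The circular reference problem can be seen in a toy model. The sketch below is not how any JVM works; RefCounted and RefCountCycleDemo are hypothetical classes that maintain counts by hand, and the "root" references are simulated by incrementing and decrementing the count directly.

```java
// Toy model of reference counting; names are hypothetical.
final class RefCounted {
    final String name;
    int refCount;        // number of incoming references
    RefCounted field;    // a single outgoing reference, for simplicity

    RefCounted(String name) { this.name = name; }

    /** Points this object's field at target, keeping both counts up to date. */
    void setField(RefCounted target) {
        if (target != null) target.refCount++;
        if (field != null) field.refCount--;
        field = target;
    }

    boolean reclaimable() { return refCount == 0; }
}

final class RefCountCycleDemo {
    /** Builds a <-> b, drops the external references, returns both counts. */
    static int[] countsAfterDroppingRoots() {
        RefCounted a = new RefCounted("a");
        RefCounted b = new RefCounted("b");
        a.refCount++;      // simulated reference from a root (e.g. a local variable)
        b.refCount++;
        a.setField(b);     // a -> b
        b.setField(a);     // b -> a
        a.refCount--;      // the roots go away...
        b.refCount--;      // ...but each object still counts the other
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingRoots();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // prints "a=1, b=1"
    }
}
```

Neither count reaches 0, so a pure reference counting collector would never reclaim either object even though no live object can reach them.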

Tracing collectors

A tracing collector is based on the assumption that all live objects can be found by iteratively following references (and the references of the referenced objects) from a known initial set of live objects. The initial set of live objects (also known as root objects, or roots) is found by analyzing registers, global fields, and stack frames. Once the initial set is determined, the tracing collector follows the references from these objects and marks each object they point to as live, so the set of known live objects keeps growing. This process continues until all referenced objects have been marked live, and the memory of the objects that were not marked is reclaimed.
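The marking process described above can be sketched as a graph traversal. In this minimal example, Node and Tracer are hypothetical names, the root set is passed in explicitly rather than discovered from registers and stack frames, and an identity-based set stands in for mark bits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object model: each node holds references to other nodes.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Tracer {
    /** Returns every node reachable from the roots (the live set). */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {        // first visit: mark it, then follow its references
                work.addAll(n.refs);
            }
        }
        return live;
    }

    /** root -> a <-> b is reachable; the cycle c <-> d is not. */
    static int liveCountForExample() {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        Node c = new Node("c"), d = new Node("d");
        root.refs.add(a);
        a.refs.add(b);
        b.refs.add(a);   // reachable cycle
        c.refs.add(d);
        d.refs.add(c);   // unreachable cycle: never marked, so it can be reclaimed
        return mark(Collections.singletonList(root)).size();
    }

    public static void main(String[] args) {
        System.out.println("live objects: " + liveCountForExample()); // prints "live objects: 3"
    }
}
```

Because liveness is decided by reachability from the roots, the unreachable cycle is simply never visited, which is why tracing handles circular structures without extra analysis.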

Tracing collectors differ from reference counting collectors mainly in that they can handle circular reference structures. Most tracing collectors discover the unreferenced objects in a circular structure during the marking phase.

Tracing collectors are the most common approach to memory management in dynamic languages. They are by far the most common in Java today and have been proven in production environments for many years. I'll introduce tracing collectors by starting with some of the algorithms that implement them.

Tracing collector algorithms

Copying collectors and mark-and-sweep collectors are nothing new, but they are still the two most common algorithms used to implement tracing collection today.

Copying collectors

A traditional copying collector uses two address spaces in the heap: a from space and a to space. During a garbage collection, the live objects in the from space are copied to the to space. Once all live objects have been moved out of the from space (translator's note: copied to the to space or to the old generation), the entire from space can be reclaimed. When allocation starts again, it begins in the to space (translator's note: the to space of one cycle becomes the from space of the next).

In early implementations of this algorithm, the from space and the to space kept switching places: when the to space filled up, a garbage collection was triggered and the to space became the from space, as shown in Figure 1.

Figure 1. A traditional copying garbage collection sequence

More recent copying algorithms allow any address space within the heap to serve as the to space or the from space. The two spaces then do not need to physically switch places; they only switch roles logically.

The advantage of a copying collector is that the objects copied into the to space are laid out compactly, with no fragmentation at all. Fragmentation is a common problem for other garbage collectors, and it is something I discuss later in this article.
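To make the from/to mechanics concrete, here is a toy semispace model. SemiSpaceHeap is a hypothetical class: each slot stands for one object, and liveness is passed in as a boolean array, whereas a real collector would determine it by tracing from the roots.

```java
import java.util.Arrays;

// Toy semispace model: each slot holds an object's name, or null if free.
final class SemiSpaceHeap {
    private String[] from;
    private String[] to;
    private int top;   // bump-allocation index in the from space

    SemiSpaceHeap(int slotsPerSpace) {
        from = new String[slotsPerSpace];
        to = new String[slotsPerSpace];
    }

    /** Bump-allocates one slot; returns its index, or -1 if the space is full. */
    int allocate(String name) {
        if (top == from.length) return -1;
        from[top] = name;
        return top++;
    }

    /** Copies objects for which isLive[i] is true into the to space, then flips. */
    void collect(boolean[] isLive) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (isLive[i]) to[next++] = from[i];    // survivors end up contiguous
        }
        Arrays.fill(from, null);                    // the whole from space is reclaimed
        String[] tmp = from; from = to; to = tmp;   // the spaces swap roles logically
        top = next;
    }

    int used() { return top; }

    String[] snapshot() { return Arrays.copyOf(from, from.length); }

    public static void main(String[] args) {
        SemiSpaceHeap heap = new SemiSpaceHeap(4);
        for (String n : new String[] { "a", "b", "c", "d" }) heap.allocate(n);
        heap.collect(new boolean[] { true, false, true, false });
        System.out.println(Arrays.toString(heap.snapshot())); // prints "[a, c, null, null]"
    }
}
```

The survivors sit next to each other after the copy, so allocation can continue by bumping a single index. The cost is visible too: half of the modeled memory is always idle.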

Downsides of copying collectors

Copying collectors are usually stop-the-world collectors, meaning that the application cannot run while a garbage collection is in progress. With this kind of implementation, the more you have to copy, the greater the impact on application performance, which is a disadvantage for applications that are sensitive to response time. With a copying collector you also have to consider the worst case, in which every object in the from space is live. You need enough room to move all the live objects, so the to space must be large enough to hold everything in the from space. Because of this constraint, the copying algorithm uses memory somewhat inefficiently (translator's note: in the worst case the to space must be as large as the from space, so utilization is only 50%).

Mark-and-sweep collectors

Most commercial JVMs deployed in enterprise production environments run mark-and-sweep (or marking) collectors, which do not have the performance impact on applications that copying collectors do. Some of the best-known marking collectors are CMS, G1, GenPar, and DeterministicGC.

A mark-and-sweep collector traces object references and marks each object it finds as live by setting a flag bit. The flag usually corresponds to an address, or a set of addresses, on the heap. For example, the live bit can be stored as a bit in the object header, in a bit vector, or in a bitmap.

When marking is complete, the sweep phase begins. The sweep phase usually traverses the heap (the entire heap, not just the objects marked live) to locate unmarked contiguous chunks of memory address space (unmarked memory is free and reclaimable), and the collector then links these chunks together in a free list. A garbage collector can have several free lists (usually organized by chunk size), and some JVMs (for example, JRockit Real Time) divide the free lists dynamically based on application profiling and object-size statistics.

After the sweep phase, the application can allocate memory again. When memory for a new object is allocated from a free list, the chosen chunk needs to fit the size of the new object, the average object size of the thread, or the application's TLAB (thread-local allocation buffer) size. Finding right-sized chunks for new objects helps optimize memory use and reduce fragmentation.
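The sweep phase and free-list allocation can be sketched over a simplified heap of equal-sized slots. Sweeper and FreeBlock are hypothetical names, the mark bits are given as a boolean array, and first-fit is used only as the simplest possible lookup policy; real collectors use size-segregated lists and more careful fitting.

```java
import java.util.ArrayList;
import java.util.List;

final class Sweeper {
    /** A run of free slots covering [start, start + length). */
    static final class FreeBlock {
        final int start;
        final int length;
        FreeBlock(int start, int length) { this.start = start; this.length = length; }
    }

    /** Walks the whole heap and coalesces unmarked slots into free blocks. */
    static List<FreeBlock> sweep(boolean[] markBits) {
        List<FreeBlock> freeList = new ArrayList<>();
        int i = 0;
        while (i < markBits.length) {
            if (markBits[i]) { i++; continue; }            // live slot: skip it
            int start = i;
            while (i < markBits.length && !markBits[i]) i++;
            freeList.add(new FreeBlock(start, i - start)); // one contiguous free run
        }
        return freeList;
    }

    /** First-fit lookup: start of a block with at least size slots, or -1. */
    static int firstFit(List<FreeBlock> freeList, int size) {
        for (FreeBlock b : freeList) {
            if (b.length >= size) return b.start;
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] marks = { true, false, false, true, false, true, false, false, false };
        List<FreeBlock> freeList = sweep(marks);
        System.out.println(firstFit(freeList, 3)); // prints "6"
        System.out.println(firstFit(freeList, 4)); // prints "-1"
    }
}
```

In the example six slots are free in total, yet a request for four contiguous slots fails because the largest free run is three. The sweep loop also visits every slot regardless of how many are live, which is why sweep time follows heap size.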

Downsides of mark-and-sweep collectors

The duration of the mark phase depends on the number of live objects on the heap, while the duration of the sweep phase depends on the size of the heap. Memory cannot be reclaimed until both phases have finished, so with larger heaps and larger sets of live objects the mark-and-sweep algorithm runs into pause-time problems.

For applications that consume a lot of memory, you can adjust garbage collection parameters to suit different application scenarios and needs. In many cases, tuning at least postpones the point at which the mark or sweep phase becomes a risk to the application or to its service-level agreement (SLA; here, the response time the application is required to meet). However, tuning is valid only for a specific load and memory allocation rate. Whenever the load changes or the application itself is modified, the tuning has to be done again.

Implementations of mark-and-sweep collectors

At least two approaches to implementing mark-and-sweep garbage collection have been proven commercially. One is parallel garbage collection; the other is concurrent (or mostly concurrent) garbage collection.

Parallel collectors

Parallel collection means that the resources assigned to garbage collection are used in parallel by multiple garbage collection threads. Most commercial implementations of parallel collection are stop-the-world collectors: all application threads are paused until the garbage collection cycle completes. Because the collector can use resources efficiently, these collectors usually score well on throughput benchmarks such as SPECjbb. If throughput is critical to your application, a parallel garbage collector is a good choice.

The main cost of parallel collection, especially in production environments, is that application threads cannot do any work during a garbage collection, just as with a copying collector. Using a parallel collector can therefore have a significant impact on applications that are sensitive to response time, particularly when the heap contains many complex live object structures and so many object references have to be traced. (Recall that the time a mark-and-sweep collector needs to reclaim memory is the time to trace the live set plus the time to traverse the entire heap.) With the parallel approach, the application is paused for the entire garbage collection.

Concurrent collectors

Concurrent garbage collectors are better suited to applications that are sensitive to response time. Concurrent means that garbage collection threads and application threads run at the same time. The garbage collection threads do not own all the resources, so the collector has to decide when to start a collection, leaving itself enough time to trace the live set and reclaim memory before the application runs out of memory. If garbage collection does not finish in time, the application throws an out-of-memory error. On the other hand, you do not want garbage collection to run for too long, because it consumes resources the application needs and so reduces throughput. Maintaining this balance is tricky, so heuristics are used to decide when to start a garbage collection and when to apply the various garbage collection optimizations.
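One way to picture such a heuristic is a simple race between the allocation rate and the expected length of a collection cycle. The sketch below is purely illustrative and is not taken from any particular JVM; the class, the method, and the safetyFactor parameter are assumptions made for this example.

```java
// Illustrative start-of-cycle heuristic; not the algorithm of any real collector.
final class ConcurrentGcTrigger {
    /**
     * Returns true when a concurrent cycle should start now: the time until
     * the heap fills at the current allocation rate is no longer than the
     * expected cycle duration padded by a safety factor.
     */
    static boolean shouldStartCycle(long freeBytes, long allocBytesPerSec,
                                    double expectedCycleSec, double safetyFactor) {
        if (allocBytesPerSec <= 0) return false;   // nothing is allocating
        double secondsUntilFull = (double) freeBytes / allocBytesPerSec;
        return secondsUntilFull <= expectedCycleSec * safetyFactor;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 1 GiB free at 256 MiB/s leaves 4 s; a 2 s cycle with 1.5x padding needs 3 s.
        System.out.println(shouldStartCycle(1024 * mib, 256 * mib, 2.0, 1.5)); // prints "false"
        // 512 MiB free leaves only 2 s, so the cycle has to start.
        System.out.println(shouldStartCycle(512 * mib, 256 * mib, 2.0, 1.5));  // prints "true"
    }
}
```

Starting too late risks running out of memory before the cycle finishes; a larger safety factor starts cycles earlier at the price of more collector work competing with the application.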

Another difficulty is determining when it is safe to perform operations that need a complete and accurate snapshot of the heap. For example, the collector needs to know when the mark phase has finished so that the sweep phase can begin. This is not a problem for a stop-the-world parallel collector, because the world is already stopped (the application threads are paused and the garbage collection threads have the heap to themselves). For a concurrent collector, however, it may not be safe to switch from the mark phase to the sweep phase immediately. If an application thread modifies a section of memory that the collector has already traced and marked, new, unmarked references may have been created. In some concurrent collector implementations, this can trap the application in a long cycle of re-marking, leaving it unable to get free memory when it needs it.

The takeaway from the discussion so far is that there are many garbage collectors and garbage collection algorithms, each suited to particular application types and workloads. There are not only different algorithms but also different implementations of each algorithm. So before choosing a garbage collector, it is best to understand your application's requirements and characteristics. Next we'll look at some of the pitfalls of the Java platform memory model, by which I mean the assumptions Java programmers tend to make that hurt application performance in a dynamically changing production environment.

Why tuning cannot replace garbage collection

Most Java programmers know that there are many options for optimizing a Java program. The choice of JVMs, garbage collectors, and performance tuning parameters lets developers spend a great deal of time on endless performance tuning. This has led some people to conclude that garbage collection is bad, and that tuning to make garbage collection happen less often or finish faster is a good workaround. But doing so is risky.

Consider tuning for a specific application. Most tuning parameters (such as memory allocation rate, object size, and response time) are set according to the application's allocation rate (translator's note: or other parameters) at the data volume currently used in testing. This can lead to either of the following outcomes:

1. A use case that passed in testing fails in production.
2. A change in data volume or in the application requires re-tuning.

Tuning is iterative. Concurrent garbage collectors in particular may require a lot of tuning (especially in a production environment), and heuristics are needed to meet the needs of the application. To cover worst-case scenarios, tuning may end in a very rigid configuration, which also wastes a lot of resources. This approach to tuning is a quixotic quest. In fact, the more you tune the garbage collector to match a specific load, the further you move from the dynamic nature of the Java runtime. After all, how many applications have a stable load, and how reliable is the load you expect?

So what can you do to prevent out-of-memory errors and improve response times if you don't focus on tuning? The first step is to find the main factor that affects the performance of Java applications.

Fragmentation

The factor that affects the performance of Java applications is not the garbage collector itself but fragmentation, and how the garbage collector handles it. Fragmentation is the state in which the heap has free space, but no contiguous block large enough to allocate a new object. As mentioned in the first article, a fragment is either space left over in a TLAB (thread-local allocation buffer) or space freed by a small object in between long-lived objects.

As time goes on and the application keeps running, fragments spread across the heap. In some cases, statically tuned parameters make things worse because they cannot meet the application's dynamic needs. The application cannot make effective use of the fragmented space. If nothing is done, the result is back-to-back garbage collections in which the collector tries to free memory for new objects. In the worst case, even back-to-back collections cannot free enough memory (there is too much fragmentation), and the JVM has to throw an out-of-memory error. You can clear fragmentation by restarting the application, which gives the Java heap contiguous memory to allocate to new objects again. But restarting causes downtime, and after a while the heap fills with fragments again and another restart is needed.

Out-of-memory errors that bring the process down, together with logs showing that the garbage collector is overworked, indicate that garbage collection is trying to free memory and that the heap is heavily fragmented. Some programmers try to solve fragmentation by tuning the garbage collector yet again. But I think we should look for more innovative ways to solve this problem. The next sections focus on two approaches to fragmentation: generational garbage collection and compaction.

Generational garbage collection

You've probably heard the theory that most objects in a production environment are short-lived. Generational garbage collection is a strategy derived from this theory. In generational garbage collection, the heap is divided into different spaces (or generations), each holding objects of a different age. An object's age is the number of garbage collection cycles it has survived (that is, the number of cycles after which it was still referenced).

When the young generation has no space left to allocate, its live objects are moved (promoted) to the old generation. (There are usually only two generations. Translator's note: only objects that have reached a certain age are promoted.) Generational garbage collection often uses one-direction copying collectors, and some more modern JVMs use parallel collectors in the young generation. You can, of course, use different garbage collection algorithms for the young and old generations. If you use a parallel or copying collector, your young generation collector is a stop-the-world collector (see the earlier explanations).

The old generation holds objects that have been moved out of the young generation, either because they have been referenced for a long time or because they are referenced by a collection of objects in the young generation. Occasionally, large objects are allocated directly in the old generation, because moving large objects is relatively expensive.
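The aging and promotion rules can be sketched with two lists standing in for the two generations. GenerationalHeap, its Obj type, and the tenuringThreshold parameter are hypothetical, and reachability is a flag set by hand rather than the result of tracing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy two-generation heap; reachability is set by hand for the demo.
final class GenerationalHeap {
    static final class Obj {
        final String name;
        int age;                    // number of young collections survived
        boolean reachable = true;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    private final int tenuringThreshold;   // hypothetical promotion age

    GenerationalHeap(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    /** New objects always start in the young generation in this model. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    /** Young collection: drop dead objects, age survivors, promote old enough ones. */
    void minorCollect() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.reachable) { it.remove(); continue; }   // short-lived garbage dies young
            o.age++;
            if (o.age >= tenuringThreshold) {
                it.remove();
                old.add(o);                                // promoted to the old generation
            }
        }
    }

    public static void main(String[] args) {
        GenerationalHeap heap = new GenerationalHeap(2);
        heap.allocate("cache");
        heap.allocate("temp").reachable = false;
        heap.minorCollect();
        heap.minorCollect();
        System.out.println("young=" + heap.young.size() + ", old=" + heap.old.size());
        // prints "young=0, old=1"
    }
}
```

The short-lived object never reaches the old generation, so it cannot leave a hole between long-lived objects there.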

Generational garbage collection techniques

In generational garbage collection, collections run less often in the old generation and more often in the young generation, and we also want young generation collection cycles to be shorter. In rare cases, young generation collections may be less frequent than old generation collections. This can happen if you make the young generation too large and most of the objects in your application live for a long time. In that case, if the old generation is too small to hold all the long-lived objects, old generation collections will struggle to free space for the objects being moved in. Typically, though, generational garbage collection gives applications better performance.

Another benefit of having a young generation is that it goes some way toward solving the fragmentation problem, or at least postpones the worst case. Small, short-lived objects that would otherwise cause fragmentation are cleaned up in young generation collections. The old generation is also more compact, because long-lived objects are packed together when they are moved into it. Over time (if your application runs long enough) the old generation becomes fragmented too, which requires one or several full garbage collections, and the JVM may throw an out-of-memory error. But for many applications, having a young generation postpones the worst case long enough. For most applications, it does reduce the frequency of stop-the-world garbage collection and the chance of out-of-memory errors.

Tuning generational garbage collection

As mentioned earlier, using generational garbage collection brings repetitive tuning work, such as adjusting the young generation size, the promotion rate, and so on. I cannot overemphasize the trade-off for a specific application: choosing fixed sizes can optimize the application, but it also reduces the garbage collector's ability to respond to dynamic changes, and that is unavoidable.

The rule of thumb for the young generation is to make it as large as possible while keeping stop-the-world pauses acceptable, and while leaving enough space in the heap for long-lived objects. Here are some additional factors to consider when tuning a generational garbage collector:

1. Most young generation collectors are stop-the-world collectors, so the larger the young generation, the longer the corresponding pause. For applications that are affected by garbage collection pause times, think carefully about how large the young generation should be.

2. Different garbage collection algorithms can be used for different generations. For example, you can use parallel garbage collection in the young generation and concurrent garbage collection in the old generation.

3. Frequent promotion failures (translator's note: promotion means moving an object from the young generation to the old generation) indicate that the old generation is too fragmented, meaning that it does not have enough space to store the objects moved out of the young generation. In that case you can adjust the promotion rate (that is, adjust the promotion age), or make sure that the old generation's garbage collection algorithm performs compaction (discussed in the next section) and tune the compaction to fit the application load. You can also increase the heap size and the size of each generation, but this further lengthens old generation pauses. Keep in mind that fragmentation is unavoidable.

4. Generational garbage collection is best suited to applications that have many small, short-lived objects, most of which are reclaimed in their first garbage collection cycle. For such applications, generational garbage collection reduces fragmentation well and postpones the point at which fragmentation starts to have an impact.


Compaction

Although generational garbage collection postpones fragmentation and out-of-memory errors, compaction is the only way to truly solve the fragmentation problem. Compaction is a garbage collection strategy that moves objects together to free up contiguous blocks of memory, creating enough space for new objects.

Moving objects and updating the references to them is a stop-the-world operation that carries a cost (one exception will be discussed in the next article in this series). The more live objects there are, the longer the pause caused by compaction. When little space is left and fragmentation is severe (usually because the program has been running for a long time), compacting a region with many live objects can cause a pause of a few seconds, and compacting the entire heap when the JVM is close to running out of memory can take tens of seconds.
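A sliding compaction over a toy heap shows both effects: the free space becomes contiguous, and every moved object gets a new address that references must be updated to. Compactor is a hypothetical class, each array slot stands for one object, and the returned forwarding table is one simple way to record old-to-new positions.

```java
import java.util.Arrays;

// Toy sliding compaction; each slot holds an object's name, or null if free.
final class Compactor {
    /** Length of the longest run of free (null) slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /**
     * Slides live objects toward index 0 in place and returns a forwarding
     * table: forwarding[oldIndex] == newIndex, or -1 for slots that were free.
     */
    static int[] compact(String[] heap) {
        int[] forwarding = new int[heap.length];
        Arrays.fill(forwarding, -1);
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            forwarding[i] = next;            // every reference to i must become next
            heap[next] = heap[i];
            if (next != i) heap[i] = null;
            next++;
        }
        return forwarding;
    }

    public static void main(String[] args) {
        String[] heap = { "a", null, "b", null, null, "c", null };
        System.out.println(largestFreeRun(heap));   // prints "2"
        int[] forwarding = compact(heap);
        System.out.println(Arrays.toString(heap));  // prints "[a, b, c, null, null, null, null]"
        System.out.println(largestFreeRun(heap));   // prints "4"
        System.out.println(Arrays.toString(forwarding));
        // prints "[0, -1, 1, -1, -1, 2, -1]"
    }
}
```

Before compaction four slots are free but only two are adjacent; afterwards all four are. The work grows with the number of live objects moved and the number of references that have to be rewritten from the forwarding table, and the application cannot safely run while those references are inconsistent.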

The pause time for compaction depends on the amount of memory that has to be moved and the number of references that have to be updated. Statistical analysis shows that the larger the heap, the more live objects have to be moved and the more references updated. The pause is approximately 1 second for every 1 GB to 2 GB of live data moved, and a 4 GB heap is likely to hold about 25% live data, so pauses of about 1 second occur occasionally.
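The arithmetic in the paragraph above can be written out as a back-of-the-envelope estimate. The class and method names are hypothetical, and the formula is just the stated rule of thumb (live data divided by a move rate of 1 to 2 GB per second), not a measurement.

```java
// Rough pause estimate from the rule of thumb in the text; not a benchmark.
final class CompactionPauseEstimate {
    /** Seconds needed to move heapGb * liveFraction of data at gbMovedPerSecond. */
    static double pauseSeconds(double heapGb, double liveFraction, double gbMovedPerSecond) {
        return heapGb * liveFraction / gbMovedPerSecond;
    }

    public static void main(String[] args) {
        // 4 GB heap with 25% live data is 1 GB to move.
        System.out.println(pauseSeconds(4.0, 0.25, 1.0));  // prints "1.0" (slow end, 1 GB/s)
        System.out.println(pauseSeconds(4.0, 0.25, 2.0));  // prints "0.5" (fast end, 2 GB/s)
        // The same live fraction on a 20 GB heap is 5 GB to move.
        System.out.println(pauseSeconds(20.0, 0.25, 1.0)); // prints "5.0"
    }
}
```

Under the same assumed live fraction the estimate scales linearly with heap size, which is the reason larger heaps run into longer compaction pauses.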

Compaction and the application memory wall

The application memory wall is the size to which a heap can be set before garbage collection (for example, compaction) starts to cause pauses. Depending on the system and the application, the memory wall for most Java applications lies between 4 GB and 20 GB. This is why most enterprise applications are deployed on several smaller JVMs rather than on a few larger ones. Consider how much of modern enterprise Java application design and deployment is dictated by the JVM's compaction limits. To work around the pause time of defragmenting the heap, we accept multi-instance deployments that cost more to manage. This is somewhat surprising given the large memory capacity of today's hardware and the growing memory needs of enterprise Java applications. Why should each instance be limited to a few gigabytes of memory? Concurrent compaction will break the memory wall, and it is the subject of my next article.


Conclusion

This article is an introduction to garbage collection, meant to help you understand its concepts and mechanisms and, hopefully, to encourage you to read further. Much of what is discussed here has been around for a long time; some newer concepts will be introduced in the next article. Concurrent compaction, for example, is currently implemented in Azul's Zing JVM. It is a new garbage collection technology that even attempts to redefine the Java memory model, especially given how memory and processing capacity keep growing.

Here are some of the key points I've summed up about garbage collection:

1. Different garbage collection algorithms and implementations suit the needs of different applications. Tracing collectors are the most common type of garbage collector in commercial Java virtual machines.

2. Parallel garbage collection uses all available resources in parallel while it collects. It is usually a stop-the-world collector and therefore achieves higher throughput, but the application's worker threads must wait for the garbage collection threads to finish, which affects the application's response time.

3. With concurrent garbage collection, the application's worker threads keep running while the collection is in progress. A concurrent collector has to finish collecting before the application needs the memory.

4. Generational garbage collection helps postpone fragmentation but cannot eliminate it. It divides the heap into two spaces, one holding new objects and the other holding old objects. Generational garbage collection suits applications that have many small, short-lived objects.

5. Compaction is the only way to resolve fragmentation. Most garbage collectors compact in a stop-the-world manner. The longer the program runs, the more complex the object references, and the more uneven the object sizes, the longer compaction takes. Heap size also affects compaction time, because a larger heap may hold more live objects and more references to update.

6. Tuning helps postpone out-of-memory errors, but excessive tuning results in a rigid configuration. Before you start tuning by trial and error, make sure you know the production load, the application's object types, and the characteristics of its object references. An overly rigid configuration is likely to be unable to cope with dynamic loads, so be sure you understand the consequences whenever you set a non-dynamic value.

The next article in this series takes an in-depth look at the C4 (Continuously Concurrent Compacting Collector) garbage collection algorithm. Stay tuned!
