1. Introduction
One of the biggest advantages of the Java platform is automatic memory management, which frees developers from writing their own memory management code and lets them concentrate on business logic instead.
This article is a general introduction to memory management in the Java HotSpot virtual machine shipped with Sun's J2SE 5.0 release. It describes the garbage collectors available for memory management and offers advice on selecting and configuring a collector and on sizing the memory areas it operates on. It also serves as a resource, listing the most commonly used virtual machine options that affect garbage collector behavior and providing links to more detailed documentation.
Section 2 is for readers who are new to automatic memory management. It discusses the benefits of automatic memory management compared with requiring developers to manage memory manually. Section 3 introduces garbage collection concepts, design choices, and performance metrics. It also introduces a commonly used way of organizing memory by object lifetime, known as generational collection, in which memory is divided into several regions (generations) that hold objects of different ages. Generational collection has proven effective at reducing garbage collection pause times and overall cost across a wide range of applications.
The rest of the article is specific to the HotSpot virtual machine. Section 4 describes the four available garbage collectors, including one that is new in J2SE 5.0 Update 6, and describes the generational memory organization they all use. For each collector, Section 4 summarizes the collection algorithms used and when that collector is an appropriate choice.
Section 5 describes a technique new in J2SE 5.0 called ergonomics, which aims to make memory management easier to use. It combines two things: (1) automatic selection of the garbage collector, the heap size, and the virtual machine version (client or server) based on the platform and operating system the application runs on; and (2) dynamic tuning of garbage collection toward user-specified behavior goals (for example, a limit on garbage collection pause time or on the proportion of time spent in garbage collection).
Section 6 offers recommendations for selecting and configuring a garbage collector, and advice on what to do about OutOfMemoryError. Section 7 briefly describes some of the tools that can be used to evaluate garbage collection performance, and Section 8 lists commonly used options for selecting a garbage collector and controlling how it works. Finally, Section 9 provides links to more detailed documentation on the topics covered in this article.
2. Manual vs Automatic memory management
The main task of memory management is to recognize when the memory allocated to an object is no longer needed by the application, and then to release that memory so it can be reused for later allocations. In some programming languages, memory management is the developer's responsibility. Its complexity leads to common errors that can cause unexpected or incorrect program behavior and even crashes. As a result, a large share of development time is often spent debugging and fixing such errors.
One problem that often occurs with manual memory management is the dangling reference. It arises when object A frees a memory space M while another object B still holds a reference to M. The space has been reclaimed, but B's reference still points to it, so that reference is now dangling. If B later uses the reference, the original contents are gone and the space may already have been reallocated to a new object, so the access produces unpredictable results.
Another common problem is the memory leak, which occurs when memory is allocated and no longer used by the program but is never released. For example, suppose you intend to free a linked list but mistakenly free only its first node. The remaining nodes can no longer be reached by the program, yet their memory cannot be reused or reclaimed. If this keeps happening, memory consumption grows until no memory is left.
A scheme now commonly used, especially by object-oriented languages, is automatic memory management by a program called a garbage collector. Automatic memory management allows higher-level interface abstractions and more robust code: memory management is encapsulated behind the platform, so developers can concentrate on business logic without worrying about its complexity.
Garbage collection avoids the dangling reference problem, because memory that is still referenced is never considered free and is never garbage collected. Garbage collection also resolves the memory leak problem described above, because memory that is no longer referenced is freed automatically.
3. The concept of garbage collection
A garbage collector is primarily responsible for the following tasks:
- Allocating memory
- Ensuring that any object that is still referenced remains in memory
- Reclaiming the memory used by objects that are no longer referenced
(In this discussion, an object and the memory space it occupies are treated as equivalent.) Objects that are referenced are said to be live. Objects that are no longer referenced are considered dead: the application no longer uses them, and they are termed garbage. The process of finding and freeing the space used by these objects is known as garbage collection.
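The live/dead distinction above can be illustrated with a toy reachability walk. This is only a sketch under stated assumptions: the `Node` class, the `liveSet` method, and the explicit list of roots are hypothetical names invented for illustration, and a real collector traces the actual heap rather than a hand-built graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: Node and liveSet are hypothetical, not part of any JVM API.
final class ReachabilitySketch {

    /** A stand-in for a heap object that may reference other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Returns every node reachable from the roots, that is, the live set. */
    static Set<Node> liveSet(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {           // first visit: mark it and follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.refs.add(b);                   // a -> b; nothing references c
        Set<Node> live = liveSet(Arrays.asList(a));
        for (Node n : Arrays.asList(a, b, c)) {
            System.out.println(n.name + (live.contains(n) ? " is live" : " is garbage"));
        }
    }
}
```

Anything not in the returned set is garbage in this model, which is exactly the space a collector would be free to reclaim.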
Garbage collection solves many memory management problems, but not all of them. You could, for example, keep creating objects and keep referencing them until no more memory is available. Garbage collection is itself a complex task that consumes time and resources.
The precise algorithm used to organize memory and to allocate and free space is implemented by the garbage collector and hidden from the programmer. Memory is typically allocated from a large pool called the heap.
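Although the collector's algorithms are hidden, a running program can still observe the size of its heap. The following minimal sketch uses the standard `java.lang.Runtime` methods `maxMemory`, `totalMemory`, and `freeMemory`; the class name `HeapInfoSketch` and the helper `usedBytes` are assumptions made for illustration.

```java
// Reads the current JVM's heap figures through java.lang.Runtime.
final class HeapInfoSketch {

    /** Bytes of the currently reserved heap that are occupied (live objects plus uncollected garbage). */
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (bytes):      " + rt.maxMemory());
        System.out.println("reserved heap (bytes): " + rt.totalMemory());
        System.out.println("free in heap (bytes):  " + rt.freeMemory());
        System.out.println("used (bytes):          " + usedBytes(rt));
    }
}
```

The figures are snapshots and change as the application allocates and as the collector runs, so they are best treated as rough indicators.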
The timing of garbage collection is decided by the garbage collector. Typically, the entire heap or part of it is collected either when it fills up or when its occupancy reaches a threshold.
Fulfilling an allocation request means finding a suitable block of unused memory in the heap, which is a difficult task. The main problem for most dynamic memory allocation algorithms is avoiding fragmentation while keeping both allocation and deallocation efficient.
Desirable garbage collector characteristics
A garbage collector must be both safe and comprehensive. That is, live data must never be freed erroneously, and garbage should not remain unreclaimed for more than a small number of collection cycles.
It is also desirable that a garbage collector operate efficiently, without introducing long pauses during which the application is not running. As with most computer systems, however, there are trade-offs among time, space, and collection frequency. For example, if the heap is small, each collection is fast but the heap fills up quickly, so collections are needed more often. Conversely, a large heap takes longer to fill, so collections are less frequent, but each one takes longer.
Another desirable characteristic is the limitation of fragmentation. When the memory of garbage objects is freed, the free space may end up scattered across many small, non-contiguous chunks of the heap, so that no single contiguous area is large enough to satisfy an allocation request for a large object. One approach to eliminating fragmentation, called compaction, is discussed under the design choices below.
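The fragmentation problem described above can be sketched with a toy heap made of equally sized slots. The slot model and the names `totalFree` and `largestFreeRun` are assumptions for illustration only; real heaps track free space very differently.

```java
// Toy model: the heap is an array of equally sized slots, true = in use.
final class FragmentationSketch {

    /** Total number of free slots anywhere in the heap. */
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) {
                free++;
            }
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Every other slot has been freed: used, free, used, free, ...
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("free slots:       " + totalFree(heap));      // 4
        System.out.println("largest free run: " + largestFreeRun(heap)); // 1
        // A request for 2 contiguous slots cannot be met although 4 slots are free.
    }
}
```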
Scalability is also important. Memory allocation should not become a scalability bottleneck for multithreaded applications on multiprocessor systems, and neither should garbage collection.
Design choices
A number of choices must be made when designing or selecting a garbage collection algorithm:
Serial versus Parallel
With serial (single-threaded) collection, only one thing happens at a time. For example, even when multiple processors are available, only one is used to perform the collection. With parallel (multithreaded) collection, the garbage collection task is split into subtasks that are executed simultaneously on different processors. Doing the work simultaneously makes the collection faster, at the cost of some additional complexity and potential fragmentation.
Concurrent versus Stop-the-world
With stop-the-world collection, execution of the application is completely suspended during the collection. With concurrent collection, one or more garbage collection tasks can run concurrently, that is, at the same time as the application. Typically, a concurrent collector does most of its work concurrently but occasionally has to stop the application briefly for work that cannot be done otherwise. Stop-the-world collection is simpler than concurrent collection, because the heap is frozen and objects do not change during the collection. Its drawback is that the application is paused, which some applications cannot tolerate. Concurrent collection shortens the pauses, but the collector must take extra care because it is operating on objects that the application may be updating at the same time. This adds time and space overhead that affects performance and requires a larger heap.
Compacting versus Non-compacting versus Copying
After a garbage collector has determined which objects in the heap are live and which are garbage, it can compact the memory, that is, move all the live objects together and reclaim the remaining memory as one block. After compaction, the next allocation is fast and easy, because a simple pointer can track the next location available for allocation. In contrast, a non-compacting collector releases the space used by garbage objects in place and does not move the live objects together to create one large contiguous free region. The benefit is that garbage collection completes faster; the drawback is potential fragmentation. In general, allocating from a non-compacted heap is more expensive than allocating from a compacted heap, because the allocator may have to search the heap for a contiguous free area large enough for the new object. A third alternative is a copying collector, which copies the live objects from one memory area (typically a region of the heap rather than the whole heap) to another, free memory area. The benefit is that the source area can then be treated as empty and used for fast and easy subsequent allocations; the drawback is the extra time needed for copying and the extra space needed to hold the copied objects.
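To make compaction and pointer-based allocation concrete, here is a minimal sketch of a sliding compaction step followed by bump-pointer allocation. The slot-array heap, the class `CompactionSketch`, and its methods are hypothetical and chosen for illustration; in particular, a real compacting collector must also update every reference to a moved object, which this toy model omits.

```java
import java.util.Arrays;

// Toy model: each heap slot holds an object id, or 0 when the slot is free.
final class CompactionSketch {
    final int[] slots;
    int top = -1;       // first free slot; unknown (-1) until compact() has run

    CompactionSketch(int[] slots) {
        this.slots = slots.clone();
    }

    /** Slides all live slots to the low end of the heap, preserving their order. */
    void compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != 0) {
                slots[next++] = slots[i];
            }
        }
        Arrays.fill(slots, next, slots.length, 0);  // everything above is now free
        top = next;
    }

    /** Bump-pointer allocation: returns the start index, or -1 if there is no room. */
    int allocate(int id, int size) {
        if (top < 0 || top + size > slots.length) {
            return -1;
        }
        int start = top;
        Arrays.fill(slots, start, start + size, id);
        top += size;                                // just move the pointer forward
        return start;
    }

    public static void main(String[] args) {
        CompactionSketch heap = new CompactionSketch(new int[]{1, 0, 2, 0, 0, 3, 0, 0});
        heap.compact();
        System.out.println(Arrays.toString(heap.slots));   // [1, 2, 3, 0, 0, 0, 0, 0]
        System.out.println(heap.allocate(4, 3));           // 3
        System.out.println(Arrays.toString(heap.slots));   // [1, 2, 3, 4, 4, 4, 0, 0]
    }
}
```

Before compaction the largest free run in this heap is two slots, so the three-slot request would have failed; after compaction it succeeds with a single bounds check and a pointer increment.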
Performance metrics
Several metrics are used to evaluate garbage collector performance, including:
- Throughput: the percentage of total time not spent in garbage collection, considered over a long period of time.
- Garbage collection overhead: the inverse of throughput, that is, the percentage of total time spent in garbage collection.
- Pause time: the length of time during which application execution is stopped while garbage collection takes place.
- Frequency of collection: how often collection occurs, relative to application execution.
- Footprint: a measure of size, such as the size of the heap.
- Promptness: the time between when an object becomes garbage and when its memory is reclaimed.
An interactive application may require short pause times, whereas overall execution time matters more for a non-interactive one. A real-time application may demand small upper bounds on both garbage collection pause time and the proportion of time spent in garbage collection. An application running on a small personal computer or an embedded system may care most about a small footprint.
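Several of the metrics listed above can be derived from a record of collection pauses. The sketch below assumes a purely stop-the-world collector, so that pause time equals time spent in collection; the class `GcMetricsSketch`, its method names, and the sample numbers are invented for illustration.

```java
// Assumes stop-the-world collection, so each pause is time spent collecting.
final class GcMetricsSketch {

    /** Garbage collection overhead: percentage of total time spent collecting. */
    static double overheadPercent(long[] pausesMs, long totalMs) {
        long gcMs = 0;
        for (long p : pausesMs) {
            gcMs += p;
        }
        return 100.0 * gcMs / totalMs;
    }

    /** Throughput: percentage of total time not spent collecting. */
    static double throughputPercent(long[] pausesMs, long totalMs) {
        return 100.0 - overheadPercent(pausesMs, totalMs);
    }

    /** Longest single pause, the figure an interactive application cares about. */
    static long maxPauseMs(long[] pausesMs) {
        long max = 0;
        for (long p : pausesMs) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** Frequency of collection, in collections per second of run time. */
    static double collectionsPerSecond(long[] pausesMs, long totalMs) {
        return pausesMs.length * 1000.0 / totalMs;
    }

    public static void main(String[] args) {
        long[] pausesMs = {20, 30, 150};   // made-up sample data
        long totalMs = 10_000;             // made-up total run time
        System.out.println("overhead %:    " + overheadPercent(pausesMs, totalMs));
        System.out.println("throughput %:  " + throughputPercent(pausesMs, totalMs));
        System.out.println("max pause ms:  " + maxPauseMs(pausesMs));
        System.out.println("collections/s: " + collectionsPerSecond(pausesMs, totalMs));
    }
}
```

With this sample, throughput looks excellent at 98 percent while the worst pause is 150 ms, which shows why a single metric rarely suits every kind of application.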
Generational collection
In a virtual machine that uses generational collection, memory is divided into generations, that is, separate regions that hold objects of different ages (how long the object has survived). For example, the most widely used configuration has two generations: one for young objects, which have not lived long, and one for old objects, which have.
Different generations can be collected with different algorithms, each optimized for the characteristics commonly observed in that generation. Generational collection exploits the following observations, known as the weak generational hypothesis, which hold for applications written in several programming languages, including the Java programming language:
- Most allocated objects are not referenced (considered live) for long; that is, they die young.
- Few references from older objects to younger objects exist.
Young generation collections occur relatively frequently and are efficient and fast, because the young generation space is usually small and likely to contain many objects that are no longer referenced.
Objects that are still referenced after surviving a certain number of young generation collections are promoted to the old generation, as shown in Figure 1. The old generation is typically larger than the young generation, and its occupancy grows more slowly. As a result, old generation collections are infrequent, but each one takes considerably longer.
Because young generation collections are frequent, the algorithm chosen for the young generation puts a premium on speed. In contrast, the old generation is typically managed by an algorithm that is more space efficient, because the old generation occupies most of the heap and its algorithm must work well when garbage density (the proportion of space occupied by garbage) is low.
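The promotion behavior described above can be sketched with a toy two-generation model. Everything here is an assumption for illustration: the class `GenerationalSketch`, the `referenced` flag standing in for reachability, and the `TENURE_THRESHOLD` value, which is not a HotSpot default.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of two generations; not how HotSpot lays out or collects memory.
final class GenerationalSketch {
    static final int TENURE_THRESHOLD = 2;   // assumed value, for illustration only

    static final class Obj {
        final String name;
        boolean referenced = true;   // stands in for "reachable by the application"
        int age = 0;                 // young collections survived so far
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    /** New objects always start in the young generation. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    /** Collects only the young generation; returns the number of objects freed. */
    int youngCollection() {
        int freed = 0;
        for (Iterator<Obj> it = young.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!o.referenced) {
                it.remove();         // dead: its space is reclaimed
                freed++;
            } else if (++o.age >= TENURE_THRESHOLD) {
                it.remove();         // survived long enough: promote
                old.add(o);
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");                      // stays referenced
        heap.allocate("tmp").referenced = false;     // dies young
        System.out.println("freed: " + heap.youngCollection()
                + ", young: " + heap.young.size() + ", old: " + heap.old.size());
        System.out.println("freed: " + heap.youngCollection()
                + ", young: " + heap.young.size() + ", old: " + heap.old.size());
    }
}
```

Note that the old generation is never scanned by `youngCollection`, which is the source of the speed advantage: each frequent collection touches only the small region where most garbage is expected.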
4. Garbage collectors in the J2SE 5.0 HotSpot JVM
As of J2SE 5.0 Update 6, the HotSpot virtual machine includes four garbage collectors, all of which are generational. This section describes the generations and the types of collection, discusses why object allocation is often fast and efficient, and then describes each garbage collector in detail.
HotSpot generations
Memory in the HotSpot virtual machine is divided into three generations: a young generation, an old generation, and a permanent generation. Most objects are initially allocated in the young generation.
Memory management in the Java HotSpot Virtual machine (Chinese translation)