Java Theory and Practice: A Brief History of Garbage Collection

Source: Internet
Author: User

The benefits of garbage collection are indisputable: improved reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention, but this is a different problem.) However, garbage collection is not without its costs, among them performance impact, pauses, configuration complexity, and nondeterministic finalization.

An ideal garbage collection implementation would be completely invisible: there would be no garbage collection pauses, no CPU time would be lost to garbage collection, the garbage collector would not interact negatively with virtual memory or the cache, and the heap would not need to be any larger than the residency (heap occupancy) of the application. Of course, there is no perfect garbage collector, but garbage collectors have improved significantly over the past decade.

Options and choices

The 1.3 JDK includes three different garbage collection strategies. The 1.4.1 JDK includes six, along with more than a dozen command-line options for configuring and tuning garbage collection. How do they differ? Why are so many options needed?

The various garbage collection implementations use different strategies to identify and reclaim unreachable objects, and they interact differently with the user program and the scheduler. Different kinds of applications also place different demands on garbage collection: real-time applications require short collection pauses of bounded duration, whereas enterprise applications may tolerate longer or less predictable pauses in exchange for higher throughput.

How does garbage collection work?

There are several basic strategies for garbage collection: reference counting, mark-sweep, mark-compact, and copying. In addition, some algorithms can do their work incrementally (the entire heap need not be collected at once, which results in shorter collection pauses), and some can run while the user program runs (concurrent collectors). Others must perform an entire collection at once while the user program is suspended (so-called stop-the-world collectors). Finally, there are hybrid collectors, such as the generational collector used by JDK 1.2 and later, which use different collection algorithms on different areas of the heap.

When evaluating garbage collection algorithms, we may consider all of the following criteria:

  • Pause time. Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?
  • Pause predictability. Can garbage collection pauses be scheduled at times that are convenient for the user program, rather than for the garbage collector?
  • CPU usage. What percentage of the total available CPU time is spent in garbage collection?
  • Memory footprint. Many garbage collection algorithms require dividing the heap into separate memory spaces, some of which may be inaccessible to the user program at certain times. This means that the actual size of the heap may be several times larger than the maximum heap residency of the user program.
  • Virtual memory interaction. On systems with limited physical memory, a full garbage collection may fault nonresident pages into memory in order to examine them. Because the cost of a page fault is high, it is desirable for the garbage collector to manage locality of reference properly.
  • Cache interaction. Even on systems where the entire heap fits in main memory, which is true of virtually all Java applications, garbage collection often flushes data used by the user program out of the cache, which imposes a performance cost on the user program.
  • Effects on program locality. While some believe that the job of the garbage collector is simply to reclaim unreachable memory, others believe that it should also try to improve the locality of reference of the user program. Compacting and copying collectors relocate objects during collection, which has the potential to improve locality.
  • Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute them. What is the performance impact of these requirements? Do they interfere with compile-time optimizations?

Regardless of the algorithm chosen, trends in hardware and software have made garbage collection far more practical. Empirical studies from the 1970s and 1980s show garbage collection consuming between 25% and 40% of the runtime in large Lisp programs. Garbage collection may not yet be completely invisible, but it has come a long way.

Basic Algorithms

All garbage collection algorithms face the same problem: identifying blocks of memory that have been allocated by the allocator but are unreachable by the user program. What do we mean by unreachable? A memory block can be reached in one of two ways: either the user program holds a reference to the block in a root, or a reference to the block is held in another reachable block. In a Java program, a root is a reference to an object held in a static variable or in a local variable on an active stack frame. The set of reachable objects is the transitive closure of the root set under the points-to relation.
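To make the definition of reachability concrete, here is a minimal sketch in Java. It is not how any JVM represents its heap: the `Node` class, the `refs` list, and the `reachableFrom` method are hypothetical names chosen for illustration, and each `Node` simply stands in for a block of memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
    // Hypothetical stand-in for a heap block: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns every node reachable from the roots by following references,
    // that is, the transitive closure of the root set under "points to".
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (seen.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                  // root -> a -> b
        c.refs.add(d);                  // c and d point at each other,
        d.refs.add(c);                  // but no root reaches either of them
        Set<Node> live = reachableFrom(Arrays.asList(a));
        System.out.println(live.size());        // prints 2
        System.out.println(live.contains(c));   // prints false
    }
}
```

Anything outside the returned set is garbage by definition, even if, like `c` and `d` here, the objects still reference each other.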

Reference counting

The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but it requires significant cooperation from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count: the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, for example through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the reference count of the referenced object. If an object's reference count drops to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.

Many ANSI C++ library classes, such as string, use reference counting to provide the appearance of garbage collection. By overloading the assignment operator and exploiting the deterministic finalization provided by C++ scoping, C++ programs can use the string class as if it were garbage collected. Reference counting is simple, lends itself well to incremental collection, and tends to have good locality of reference during collection. Nevertheless, it is rarely used in production garbage collectors for a number of reasons, such as its inability to reclaim unreachable cyclic structures (objects that reference each other directly or indirectly, like a circularly linked list or a tree that contains back-pointers to the parent node).
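The bookkeeping described above, and the cycle problem, can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not a real runtime: `Obj`, `retain`, `release`, `assignField`, and the `reclaimed` counter are all hypothetical names, each object holds at most one outgoing reference, and "reclaiming" an object just increments a counter.

```java
class RefCountSketch {
    static int reclaimed = 0;       // how many objects have been reclaimed so far

    // Hypothetical heap object with a count and a single outgoing reference.
    static final class Obj {
        int refCount = 0;
        Obj field;
    }

    static void retain(Obj o) { o.refCount++; }

    static void release(Obj o) {
        if (--o.refCount == 0) {
            reclaimed++;                              // the block can be recycled now
            if (o.field != null) release(o.field);    // and so may whatever it referenced
            o.field = null;
        }
    }

    // Roughly what a compiler would have to emit for "holder.field = target".
    static void assignField(Obj holder, Obj target) {
        if (target != null) retain(target);   // count the new reference first
        Obj old = holder.field;
        holder.field = target;
        if (old != null) release(old);        // then drop the overwritten one
    }

    public static void main(String[] args) {
        // A simple chain: dropping the last reference to a frees a and then b.
        Obj a = new Obj(), b = new Obj();
        retain(a);                  // a local variable refers to a
        assignField(a, b);
        release(a);                 // the local variable goes out of scope
        System.out.println(reclaimed);      // prints 2

        // A cycle: x and y keep each other's count above zero forever.
        Obj x = new Obj(), y = new Obj();
        retain(x);
        retain(y);
        assignField(x, y);
        assignField(y, x);
        release(x);                 // both local variables go out of scope
        release(y);
        System.out.println(reclaimed);      // still prints 2: the cycle leaked
    }
}
```

After the last two `release` calls nothing in the program can reach `x` or `y`, yet each still has a count of one, which is exactly the cyclic-structure weakness mentioned above.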

Tracing collectors

None of the standard garbage collectors in the JDK uses reference counting. Instead, they all use some form of tracing collector. A tracing collector stops the world (although not necessarily for the entire duration of the collection) and starts tracing objects, beginning at the root set and following references until all reachable objects have been examined. Roots can be found in program registers, in local (stack-based) variables in each thread's stack, and in static variables.

Mark-sweep collectors

The most basic form of tracing collector, first proposed by Lisp inventor John McCarthy in 1960, is the mark-sweep collector. The world is stopped, and the collector visits each live node, starting from the roots, and marks each node it visits. When there are no more references to follow, marking is complete. The heap is then swept (that is, every object in the heap is examined), and any object that is not marked is reclaimed as garbage and returned to the free list. Figure 1 illustrates a heap prior to garbage collection. The shaded blocks are garbage because they are unreachable by the user program:

Figure 1. Reachable and unreachable objects

Mark-sweep is simple to implement, can reclaim cyclic structures easily, and places no burden on the compiler or the mutator the way reference counting does. But it has deficiencies: collection pauses can be long, and the entire heap is visited in the sweep phase, which can have very negative performance consequences on virtual memory systems where parts of the heap may be paged out.

The big problem with mark-sweep is that every active (that is, allocated) object, whether reachable or not, is visited during the sweep phase. Because a significant percentage of objects are likely to be garbage, the collector spends considerable effort examining and handling garbage. Mark-sweep collectors also tend to leave the heap fragmented, which can cause locality problems and can cause allocation failures even when enough free memory appears to be available.
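The two phases can be sketched as follows. This is a toy model rather than a real collector: the `MarkSweepSketch` class, its `heap` and `roots` lists, and the `allocate` and `collect` methods are hypothetical names, and removing an object from the `heap` list stands in for returning its block to the free list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    // Hypothetical heap object: a mark bit plus outgoing references.
    static final class Obj {
        boolean marked;
        final List<Obj> refs = new ArrayList<>();
    }

    final List<Obj> heap = new ArrayList<>();    // every allocated object
    final List<Obj> roots = new ArrayList<>();   // the root set

    Obj allocate() {
        Obj o = new Obj();
        heap.add(o);
        return o;
    }

    // Runs one stop-the-world collection and returns how many objects were reclaimed.
    int collect() {
        // Mark phase: only reachable objects are visited.
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
        // Sweep phase: every allocated object is visited, live or not.
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;       // reset the mark for the next cycle
            } else {
                it.remove();            // unmarked: reclaim it
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        MarkSweepSketch gc = new MarkSweepSketch();
        Obj a = gc.allocate(), b = gc.allocate(), c = gc.allocate(), d = gc.allocate();
        a.refs.add(b);
        c.refs.add(d);                  // an unreachable cycle
        d.refs.add(c);
        gc.roots.add(a);
        System.out.println(gc.collect());       // prints 2
        System.out.println(gc.heap.size());     // prints 2
    }
}
```

Note that the sweep loop runs over the whole `heap` list no matter how little of it is live, which is the cost the paragraph above describes, while the unreachable cycle is reclaimed without any special handling.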

Copying collectors

In a copying collector, another form of tracing collector, the heap is divided into two equally sized semi-spaces, one of which contains active data while the other is unused. When the active space fills up, the world is stopped and live objects are copied from the active space into the inactive space. The roles of the two spaces are then swapped, and the formerly inactive space becomes the new active space.

Copying collection has the advantage of visiting only live objects, which means that garbage objects are never examined and never need to be paged into memory or brought into the cache. The duration of a collection cycle in a copying collector is driven by the number of live objects. However, copying collectors carry the added cost of copying data from one space to the other and adjusting all references to point to the new copy. In particular, long-lived objects are copied back and forth on every collection.
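The copy-and-flip cycle can be sketched with a worklist that doubles as the new space, in the style usually attributed to Cheney. As before, this is an illustrative model and not JVM code: the two semi-spaces are represented by Java lists, and `Obj`, `forwardedTo`, `forward`, `activeSpace`, and `collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CopyingSketch {
    // Hypothetical heap object with two reference fields.
    static final class Obj {
        final String name;
        Obj left, right;
        Obj forwardedTo;    // set on the old copy once the object has been moved
        Obj(String name) { this.name = name; }
    }

    List<Obj> activeSpace = new ArrayList<>();   // the semi-space currently in use
    final List<Obj> roots = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        activeSpace.add(o);
        return o;
    }

    // Copies live objects into a fresh space, flips the spaces,
    // and returns the number of objects that were copied.
    int collect() {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, forward(roots.get(i), toSpace));
        }
        // Scan the copies in order; forwarding their fields may append more copies.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            o.left = forward(o.left, toSpace);
            o.right = forward(o.right, toSpace);
        }
        activeSpace = toSpace;      // flip: the old space is discarded wholesale
        return toSpace.size();
    }

    // Returns the new copy of an object, copying it on first encounter.
    private static Obj forward(Obj old, List<Obj> toSpace) {
        if (old == null) return null;
        if (old.forwardedTo == null) {
            Obj copy = new Obj(old.name);
            copy.left = old.left;       // still point into the old space;
            copy.right = old.right;     // fixed when the scan reaches this copy
            old.forwardedTo = copy;
            toSpace.add(copy);
        }
        return old.forwardedTo;
    }

    public static void main(String[] args) {
        CopyingSketch heap = new CopyingSketch();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("garbage");       // never referenced, so never visited
        a.left = b;
        b.right = a;                    // cycles are handled by the forwarding pointer
        heap.roots.add(a);
        System.out.println(heap.collect());             // prints 2
        System.out.println(heap.roots.get(0) != a);     // prints true: the root now points at the copy
    }
}
```

The garbage object is never touched during `collect`, so the work done is proportional to the live data only, while every surviving object is copied again on each cycle.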

Heap compaction

Copying collectors have another benefit: the set of live objects is compacted into the bottom of the heap. This not only improves the locality of reference of the user program and eliminates heap fragmentation, but also greatly reduces the cost of object allocation, because allocation becomes a simple pointer addition on the top-of-heap pointer. There is no need to maintain free lists or look-aside lists, or to run best-fit or first-fit algorithms. Allocating N bytes is as simple as adding N to the top-of-heap pointer and returning its previous value, as shown in Listing 1:

Listing 1. Inexpensive memory allocation in a copying collector

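/* heapStart and heapTop are assumed to be char * globals bounding the free part of the active space. */
void *malloc(int n) {
    if (heapTop - heapStart < n)
        doGarbageCollection();      /* not enough room left: collect first */

    void *wasStart = heapStart;     /* the new block begins at the old pointer */
    heapStart += n;                 /* bump the pointer past the new block */
    return wasStart;
}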

 

Developers who have implemented sophisticated memory management schemes for non-garbage-collected languages may be surprised at how inexpensive allocation is in a copying collector: a simple pointer addition. This may be one of the reasons for the widespread belief that object allocation is expensive. Earlier JVM implementations did not use copying collectors, and developers still implicitly assume that allocation costs are similar to those of other languages, such as C, when in fact allocation may be significantly cheaper in the Java runtime. Not only is the cost of allocation lower, but for objects that become garbage before the next collection, the deallocation cost is zero, because the garbage object is neither visited nor copied.

Mark-compact collectors

The copying algorithm has excellent performance characteristics, but it has the drawback of requiring twice as much memory as a mark-sweep collector. The mark-compact algorithm combines mark-sweep and copying in a way that avoids this problem, at the cost of some increased collection complexity. Like mark-sweep, mark-compact is a two-phase process in which each live object is visited and marked in the marking phase. Then, marked objects are copied so that all the live objects are compacted at the bottom of the heap. If a complete compaction is performed at every collection, the resulting heap is similar to the result of a copying collector: there is a clear demarcation between the active portion of the heap and the free area, so allocation costs are comparable to those of a copying collector. Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied repeatedly as they are in a copying collector.
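A sliding compaction of this kind can be sketched on a heap modeled as parallel arrays, where a reference is just a slot index. The layout is an assumption made for brevity: `MarkCompactSketch`, `name`, `ref`, `top`, and `collect` are hypothetical names, and each object holds at most one reference.

```java
import java.util.Arrays;

class MarkCompactSketch {
    static final int NULL = -1;     // the "no reference" value

    final String[] name;            // payload of each slot
    final int[] ref;                // each object's single reference: a slot index or NULL
    int top = 0;                    // first free slot; everything below it is allocated

    MarkCompactSketch(int capacity) {
        name = new String[capacity];
        ref = new int[capacity];
    }

    // Bump allocation: hand out the slot at top and advance the pointer.
    int allocate(String n) {
        if (top == name.length) throw new IllegalStateException("heap full");
        name[top] = n;
        ref[top] = NULL;
        return top++;
    }

    // Marks from the given roots, slides live objects to the bottom of the heap,
    // and returns the new slot index of each root.
    int[] collect(int[] roots) {
        // Phase 1: mark everything reachable from the roots.
        boolean[] marked = new boolean[top];
        for (int r : roots) {
            for (int i = r; i != NULL && !marked[i]; i = ref[i]) {
                marked[i] = true;
            }
        }
        // Phase 2a: compute where each live object will end up.
        int[] forward = new int[top];
        int next = 0;
        for (int i = 0; i < top; i++) {
            forward[i] = marked[i] ? next++ : NULL;
        }
        // Phase 2b: slide objects down in address order, rewriting references.
        for (int i = 0; i < top; i++) {
            if (!marked[i]) continue;
            int dst = forward[i];               // dst <= i, so nothing live is overwritten
            name[dst] = name[i];
            ref[dst] = (ref[i] == NULL) ? NULL : forward[ref[i]];
        }
        Arrays.fill(name, next, top, null);     // clear the vacated slots
        top = next;                             // live data now ends exactly at top
        int[] moved = new int[roots.length];
        for (int k = 0; k < roots.length; k++) {
            moved[k] = (roots[k] == NULL) ? NULL : forward[roots[k]];
        }
        return moved;
    }

    public static void main(String[] args) {
        MarkCompactSketch heap = new MarkCompactSketch(8);
        heap.allocate("garbage0");
        int a = heap.allocate("a");
        heap.allocate("garbage1");
        int b = heap.allocate("b");
        heap.ref[a] = b;
        int[] roots = heap.collect(new int[] { a });
        System.out.println(roots[0]);           // prints 0: a slid to the bottom
        System.out.println(heap.top);           // prints 2: allocation resumes right above b
    }
}
```

Because objects keep their relative order and only move downward, an object that has already settled at the bottom stays where it is on later collections, and only one space is ever needed.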

So which one does the JDK use?

Which of these approaches does the JDK take for garbage collection? In a sense, all of them. Early JDKs used a single-threaded mark-sweep or mark-compact collector. JDK 1.2 and later use a hybrid approach called generational collection, in which the heap is divided into several sections based on the age of the objects, and different generations are collected separately, possibly with different collection algorithms.

Generational collection turns out to be very effective, even though it requires additional bookkeeping at runtime. In next month's Java Theory and Practice, we will explore how generational collection works and how the 1.4.1 JVM uses it, and introduce all the other garbage collection options the 1.4.1 JVM offers. In the installment after that, we will look at the performance impact of garbage collection, including debunking some performance myths related to memory management.
