Java Theory and Practice: A Brief History of Garbage Collection

Last Update:2017-02-27 Source: Internet

Author: User



The Java language may be the most widely used programming language that relies on garbage collection, but it is by no means the first. Garbage collection has been an integral part of many programming languages, including Lisp, Smalltalk, Eiffel, Haskell, ML, Scheme, and Modula-3, and has been in use since the early 1960s. In this installment of Java Theory and Practice, Brian Goetz describes the most commonly used techniques for garbage collection.

The benefits of garbage collection are indisputable: improved reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention, but this is a different problem.) However, garbage collection is not without its costs, among them performance impact, pauses, configuration complexity, and nondeterministic finalization.
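Unintentional object retention can be illustrated with a small sketch. The class and method names below (RetentionSketch, LeakyStack, popLeaky, popClean, stillHolds) are hypothetical and do not come from the article. The sketch assumes a simple array-backed stack: a pop that leaves the old slot populated keeps the popped object reachable, so the collector cannot reclaim it.

```java
import java.util.Arrays;

class RetentionSketch {
    /** Hypothetical array-backed stack used only for illustration. */
    static final class LeakyStack {
        private Object[] elements = new Object[4];
        private int size;

        void push(Object o) {
            if (size == elements.length) {
                elements = Arrays.copyOf(elements, size * 2);
            }
            elements[size++] = o;
        }

        /** Pops without clearing the slot: the array still references the object. */
        Object popLeaky() {
            if (size == 0) throw new IllegalStateException("empty stack");
            return elements[--size];
        }

        /** Pops and clears the slot so the object can become unreachable. */
        Object popClean() {
            if (size == 0) throw new IllegalStateException("empty stack");
            Object o = elements[--size];
            elements[size] = null;
            return o;
        }

        /** True if the backing array still holds a reference to the given object. */
        boolean stillHolds(Object o) {
            for (Object e : elements) {
                if (e == o) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        LeakyStack stack = new LeakyStack();
        Object payload = new Object();

        stack.push(payload);
        stack.popLeaky();
        System.out.println("retained after popLeaky: " + stack.stillHolds(payload));

        stack.push(payload);
        stack.popClean();
        System.out.println("retained after popClean: " + stack.stillHolds(payload));
    }
}
```

The object is not leaked in the C sense, because the program could still reach it through the array. It is simply kept alive longer than the programmer intended.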

An ideal garbage collection implementation would be totally invisible: there would be no garbage collection pauses, no CPU time would be lost to garbage collection, the garbage collector would not interact negatively with virtual memory or the cache, and the heap would not need to be any larger than the residency (heap occupancy) of the application. Of course, there are no perfect garbage collectors, but garbage collectors have improved significantly over the past ten years.

Options and Choices

The 1.3 JDK includes three different garbage collection strategies; the 1.4.1 JDK includes six, along with more than a dozen command-line options for configuring and tuning garbage collection. How do they differ? Why do we need so many?

Different garbage collection implementations use different strategies for identifying and reclaiming unreachable objects, and they interact differently with the user program and the scheduler. Different sorts of applications have different requirements for garbage collection: real-time applications demand short collection pauses of bounded duration, whereas enterprise applications may tolerate longer or less predictable pauses in exchange for higher throughput.

How does garbage collection work?

There are several basic strategies for garbage collection: reference counting, mark-sweep, mark-compact, and copying. In addition, some algorithms can do their job incrementally (the entire heap need not be collected at once, which results in shorter collection pauses), and some can run while the user program runs (concurrent collectors). Others must perform an entire collection at once while the user program is suspended (so-called stop-the-world collectors). Finally, there are hybrid collectors, such as the generational collector used by the 1.2 and later JDKs, which use different collection algorithms on different areas of the heap.

When evaluating a garbage collection algorithm, we may want to consider all of the following criteria:

Pause time. Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?

Pause predictability. Can garbage collection pauses be scheduled at times that are convenient for the user program, rather than for the garbage collector?

CPU usage. What percentage of the total available CPU time is spent in garbage collection?

Memory footprint. Many garbage collection algorithms require dividing the heap into separate memory spaces, some of which may be inaccessible to the user program at certain times. This means that the actual size of the heap may be several times larger than the maximum heap residency of the user program.

Virtual memory interaction. On systems with limited physical memory, a full garbage collection may fault nonresident pages into memory so that they can be examined during collection. Because page faults are expensive, it is desirable for a garbage collector to manage locality of reference well.

Cache interaction. Even on systems where the entire heap fits in main memory, which is true of virtually all Java applications, garbage collection often flushes data used by the user program out of the cache, which degrades the user program's performance.

Effects on program locality. While some believe that the garbage collector's job is simply to reclaim unreachable memory, others believe that it should also try to improve the locality of reference of the user program. Compacting and copying collectors relocate objects during collection, which has the potential to improve locality.

Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute them. What is the performance impact of these requirements? Do they interfere with compile-time optimizations?
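The bookkeeping that reference counting attaches to each pointer assignment can be sketched as follows. This is only an illustrative model, not how any JVM works: Cell, retain, assignNext, and release are hypothetical names, and the count is kept in an ordinary field so the extra work per assignment is visible.

```java
class RefCountSketch {
    /** Hypothetical heap cell carrying an explicit reference count. */
    static final class Cell {
        final String name;
        int refCount;
        boolean freed;
        Cell next; // a single outgoing reference keeps the sketch small

        Cell(String name) { this.name = name; }
    }

    /** Models a root (for example, a local variable) taking a reference. */
    static Cell retain(Cell c) {
        c.refCount++;
        return c;
    }

    /** The bookkeeping a compiler would have to emit for "holder.next = target". */
    static void assignNext(Cell holder, Cell target) {
        if (target != null) {
            target.refCount++; // increment first so reassigning the same cell is safe
        }
        Cell old = holder.next;
        holder.next = target;
        release(old);
    }

    /** Drops one reference; reclaims the cell and its referent when the count hits zero. */
    static void release(Cell c) {
        if (c == null) return;
        if (--c.refCount == 0) {
            c.freed = true;
            Cell child = c.next;
            c.next = null;
            release(child);
        }
    }

    public static void main(String[] args) {
        Cell a = retain(new Cell("a")); // a is held by a root
        Cell b = new Cell("b");

        assignNext(a, b);               // b is now referenced only by a
        System.out.println("b count after assignment: " + b.refCount);

        assignNext(a, null);            // overwriting the pointer drops b's last reference
        System.out.println("b freed: " + b.freed);
    }
}
```

Every assignment costs an increment, a decrement, and a zero check, which is the kind of compiler and runtime overhead this criterion asks about.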

Regardless of the algorithm chosen, trends in hardware and software have made garbage collection far more practical. Empirical studies from the 1970s and 1980s show garbage collection consuming between 25 and 40 percent of the runtime in large Lisp programs. While garbage collection may not yet be totally invisible, it has certainly come a long way.

Basic algorithms

The problem faced by all garbage collection algorithms is the same: identify blocks of memory that have been dispensed by the allocator but are unreachable by the user program. What do we mean by unreachable? A memory block can be reached in one of two ways: the user program holds a reference to that block in a root, or a reference to that block is held in another reachable block. In a Java program, a root is a reference to an object held in a static variable or in a local variable on an active stack frame. The set of reachable objects is the transitive closure of the root set under the points-to relation.
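The transitive-closure definition of reachability can be sketched as a simple graph traversal. The names here (ReachabilitySketch, Block, reachable) are hypothetical, and the model is an assumption made for illustration: each block holds an explicit list of references, and the roots are passed in as a list rather than discovered from static variables and stack frames.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilitySketch {
    /** Hypothetical stand-in for a memory block that points to other blocks. */
    static final class Block {
        final String name;
        final List<Block> refs = new ArrayList<>();

        Block(String name) { this.name = name; }
    }

    /** Computes the transitive closure of the root set under the points-to relation. */
    static Set<Block> reachable(List<Block> roots) {
        // Identity-based set: two blocks are the same only if they are the same object.
        Set<Block> seen = Collections.newSetFromMap(new IdentityHashMap<Block, Boolean>());
        Deque<Block> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Block b = pending.pop();
            if (seen.add(b)) {          // first visit: follow its outgoing references
                pending.addAll(b.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Block root = new Block("root");
        Block child = new Block("child");
        Block orphan = new Block("orphan");

        root.refs.add(child);
        child.refs.add(root);   // a cycle among reachable blocks is handled by the seen set
        orphan.refs.add(child); // pointing at a live block does not make orphan reachable

        Set<Block> live = reachable(Arrays.asList(root));
        System.out.println("child reachable:  " + live.contains(child));
        System.out.println("orphan reachable: " + live.contains(orphan));
    }
}
```

Anything outside the returned set is garbage by the definition above, regardless of how many references it holds to other blocks.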

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
