History of garbage collection algorithms

Pioneering age
Most programmers in China first experienced the appeal of garbage collection through the Java language, so many people regard Java and garbage collection as an inseparable whole. In fact, garbage collection had been developing and maturing for more than 30 years before Java appeared. What Java did was bring this remarkable technology to the broad mass of programmers.

If garbage collection needs a twin brother, LISP is the obvious candidate. LISP, born at MIT around 1960, was the first language to depend heavily on dynamic memory allocation: almost all data in LISP takes the form of "lists", and the space a list occupies is allocated dynamically in the heap. This inherently dynamic memory management forced the designers of LISP to solve the problem of automatically releasing every memory block in the heap (otherwise LISP programmers would inevitably drown in countless free or delete statements), and this led directly to the birth and development of garbage collection. A teacher once told us that LISP is the language that has contributed the most to modern software development technology. I did not agree at the time: how could a language so full of parentheses that it looked like a labyrinth be greater than C or Pascal? But now that I know that garbage collection, data structures, artificial intelligence, parallel processing, virtual machines, metadata, and many other techniques familiar to programmers all originated in LISP, I especially want to apologize to that teacher and take back my childish opinion.

Knowing the close relationship between LISP and garbage collection, we can easily understand why J. McCarthy and M. L. Minsky, two pioneers of garbage collection, are also important figures in the history of LISP. J. McCarthy is the father of LISP: he invented the language and gave the first complete description of a garbage collection algorithm and its implementation. M. L. Minsky, in the course of developing LISP, laid the foundations for several of today's mainstream garbage collection algorithms. Like many technical masters of that era, both men made remarkable achievements in many different fields. Perhaps, in the pioneering 1960s of software development history, researchers with agile minds and strong wills were more likely to become all-round frontier heroes.

Before learning about the origins of the garbage collection algorithm, it is necessary to review the main methods of memory allocation. We know that most mainstream languages or runtime environments support three basic memory allocation methods:

1. Static allocation: the allocation used for static and global variables. Statically allocated memory is like the durable furniture in a home. It normally never needs to be released or recycled, because nobody throws the wardrobe out of the window as garbage every day.

2. Automatic allocation: the allocation of memory for local variables on the stack. Memory on the stack is released automatically by the pop operation when the code block exits. It is like guests visiting the house: every evening they go back to their own homes, so, apart from a few who do not know when to leave, we generally do not need to bundle guests into garbage bags and sweep them out of the door.

3. Dynamic allocation: the allocation of memory in the heap at run time to store data. A memory block in the heap is like the napkins we use every day: once used, a napkin has to go into the garbage bin, otherwise the house will be littered with them. Lazy people like me dream of having a household robot to do the cleaning. In software development, if you cannot be bothered to release memory yourself, you need a similar robot: a garbage collector implemented by some specific algorithm.

In other words, every garbage collection algorithm discussed below is an algorithm for collecting and cleaning up discarded "napkins" while the program is running. They operate neither on static variables nor on local variables, but on the memory blocks allocated in the heap.

Reference Counting Algorithm
Before 1960, when people were designing a garbage collection mechanism for the still-embryonic LISP, the first algorithm that came to mind was reference counting. Using the napkin as an example, the principle of the algorithm can be roughly described as follows:

At lunch, I take a napkin out of the napkin pack, planning to sketch a system architecture blueprint on it. According to the "Napkin Usage Statute, Reference Counting Edition", before drawing I must first write the count 1 in a corner of the napkin to show that I am using it. If you then want to look at my blueprint, you must add 1 to the count on the napkin, making it 2, which shows that two people are now using the napkin at the same time (of course, I will not let you use it to wipe your nose). When you have finished looking, you must subtract 1 from the count to show that your use of the napkin has ended. Likewise, once I have copied everything on the napkin into my notebook, I dutifully subtract 1 from the count as well. The count should now be 0, and the garbage collector (assume it is a cleaning robot) will pick the napkin up and throw it into the garbage bin, because the garbage collector's only mission is to find and clear away every napkin whose count is 0.

The advantages and defects of the reference counting algorithm are equally obvious. The collection work itself is fast, but the algorithm imposes extra work on every memory allocation and pointer operation in the program (incrementing or decrementing the reference count of a memory block). More importantly, reference counting cannot correctly release memory blocks that reference each other in a cycle. On this point, D. Hillis has a funny and incisive story:

One day, a student came to Moon and said, "I know how to design a better garbage collector. We must record the number of pointers pointing to each node." Moon patiently told the student the following story: "One day, a student came to Moon and said, 'I know how to design a better garbage collector...'"

D. Hillis's story resembles one we used to tell as children: "Once there was a mountain, on the mountain there was a temple, and in the temple there was an old monk...", a story that loops back on itself forever. It shows that reference counting alone is not enough to solve every problem in garbage collection. For this reason, researchers often exclude reference counting from garbage collection algorithms in the narrow sense. Of course, as the simplest and most intuitive solution, reference counting has advantages of its own that are hard to replace. Around the 1980s, D. P. Friedman, D. S. Wise, H. G. Baker, and others made several improvements to it, so that reference counting and its variants (such as deferred reference counting) can still prove their worth in simple environments and in some modern garbage collection systems that combine multiple algorithms.
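The napkin rules and the cycle problem described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: Block, add_ref, release, and link are hypothetical names invented here, and the "heap" is just a handful of Python objects, not a real allocator.

```python
class Block:
    """A toy heap block carrying an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.count = 0      # how many references currently point at this block
        self.refs = []      # blocks that this block itself points to
        self.freed = False


def add_ref(block):
    """Someone starts using the block: the count goes up by one."""
    block.count += 1


def release(block):
    """Someone stops using the block; free it once the count reaches zero."""
    block.count -= 1
    if block.count == 0:
        block.freed = True
        children, block.refs = block.refs, []
        for child in children:
            release(child)   # a freed block no longer references its children


def link(src, dst):
    """Store a pointer to dst inside src."""
    src.refs.append(dst)
    add_ref(dst)


# The napkin from the story: two users, then both let go.
napkin = Block("napkin")
add_ref(napkin)
add_ref(napkin)
release(napkin)
release(napkin)          # the count is now 0, so the napkin is freed

# Two blocks that point at each other: dropping the last outside
# reference leaves both counts at 1, so neither is ever freed.
a, b = Block("a"), Block("b")
add_ref(a)               # one reference from outside the cycle
link(a, b)
link(b, a)
release(a)
print(napkin.freed, a.freed, b.freed)   # prints: True False False
```

The second half of the sketch is the student's mistake in miniature: both blocks are unreachable, yet each keeps the other's count above zero.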

Mark-Sweep Algorithm
The first practical and complete garbage collection algorithm was the mark-sweep algorithm, proposed by J. McCarthy and others in 1960 and successfully applied in LISP. Using the napkin as an example again, the mark-sweep algorithm runs as follows:

During lunch, everyone in the restaurant uses napkins as needed. When the garbage collection robot wants to collect the used napkins, it makes all the diners stop eating and then asks each person in turn: "Are you using a napkin? Which napkin are you using?" The robot marks the napkins that people are still using according to their answers. When the questioning is over, the robot finds every unmarked napkin scattered on the tables (these are obviously the used ones) and throws them all into the garbage bin.

As its name implies, the mark-sweep algorithm runs in two phases, "mark" and "sweep". This phased execution laid the conceptual foundation for modern garbage collection algorithms. Unlike reference counting, mark-sweep does not need the runtime environment to monitor every memory allocation and pointer operation; it only needs to trace where each pointer variable points during the mark phase. (Garbage collectors built on this idea are often called tracing collectors.)
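The two phases can be sketched in a few lines. This is a minimal illustration, not McCarthy's original implementation: Obj, mark, sweep, and collect are hypothetical names, the heap is modelled as a plain Python list, and the roots stand in for the diners who say which napkins they are using.

```python
class Obj:
    """A toy heap object with a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing pointers
        self.marked = False


def mark(roots):
    """Mark phase: trace every pointer reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, reset the marks."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False    # ready for the next collection
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


# a -> b is reachable from the root; c <-> d is an unreachable cycle.
a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)
heap = collect([a, b, c, d], roots=[a])
print([obj.name for obj in heap])   # prints: ['a', 'b']
```

Unlike the reference counting scheme, the unreachable cycle c, d is reclaimed here, because nothing is counted: an object survives only if the trace from the roots actually reaches it.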

Along with the success of LISP, the mark-sweep algorithm shone in most early LISP runtime environments. Seen from today, the original version still has many defects, such as low efficiency (marking and sweeping are both quite time-consuming), but almost all modern garbage collection algorithms continue the mark-sweep idea. In this respect alone, the contribution of J. McCarthy and others to garbage collection is no less than their achievement in the LISP language.

Copy Algorithm
To address the inefficiency of the mark-sweep algorithm, M. L. Minsky published a famous paper in 1963, "A LISP Garbage Collector Algorithm Using Serial Secondary Storage". The algorithm described in that paper came to be called the copying algorithm, and M. L. Minsky himself successfully introduced it into an implementation of LISP.

The copying algorithm splits the heap into two halves and completes garbage collection with simple copy operations. The idea is quite interesting. Using the napkin metaphor, we can understand M. L. Minsky's copying algorithm as follows:

The garbage collection robot divides the restaurant into two identical halves, the South District and the North District. At lunch, everyone first eats in the South District (because space is limited, the number of diners is naturally halved) and uses napkins freely. When the robot decides it is time to collect the used napkins, it orders all diners to move from the South District to the North District as quickly as possible, taking the napkins they are still using with them. Once everyone has moved, the robot simply throws every napkin left scattered in the South District into the garbage bin, and the job is done. The next collection works the same way, except that people now move from the North District to the South District. In this way, each collection needs only one move (that is, one copy), and the collection speed is unmatched. Of course, the diners find it tiring to rush back and forth between north and south, but the garbage collection robot never shows mercy.

M. L. Minsky's invention is truly a stroke of imagination. The idea of partitioning and copying not only greatly improves the efficiency of garbage collection, it also makes the previously complicated memory allocation algorithm simpler than ever: since each collection reclaims an entire half-space, allocation no longer needs to consider complications such as memory fragmentation. It only has to move the top-of-heap pointer and allocate memory in order. This is a miracle! However, every miracle has its price. For the copying algorithm, the price of higher efficiency is that the available memory is artificially cut in half. To be honest, this price is far too high.
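The move from one half of the restaurant to the other can be sketched as follows. This is an illustration only, and it does not reproduce Minsky's original formulation (which used serial secondary storage): it uses a breadth-first scan over the new half in the style usually attributed to C. J. Cheney. Cell and copy_collect are hypothetical names, and appending to a Python list stands in for bumping the top-of-heap pointer.

```python
class Cell:
    """A toy heap cell; forward records where the cell was copied to."""

    def __init__(self, value):
        self.value = value
        self.refs = []
        self.forward = None   # forwarding pointer, set once the cell is copied


def copy_collect(roots):
    """Copy everything reachable from roots into a fresh to-space.

    Returns (new_roots, to_space). Cells left behind in the old half
    are garbage and can be discarded all at once.
    """
    to_space = []             # allocation here is a simple append

    def copy(cell):
        if cell.forward is None:
            clone = Cell(cell.value)
            clone.refs = list(cell.refs)   # still old addresses, fixed below
            cell.forward = clone
            to_space.append(clone)
        return cell.forward

    new_roots = [copy(root) for root in roots]
    scan = 0
    while scan < len(to_space):            # scan pointer chases the free pointer
        cell = to_space[scan]
        cell.refs = [copy(ref) for ref in cell.refs]
        scan += 1
    return new_roots, to_space


# a <-> b form a reachable cycle; c is garbage left behind in the old half.
a, b, c = Cell("a"), Cell("b"), Cell("c")
a.refs.append(b)
b.refs.append(a)
new_roots, to_space = copy_collect([a])
print([cell.value for cell in to_space])   # prints: ['a', 'b']
```

The forwarding pointer is what keeps shared structure and cycles intact: the second time a cell is reached, the collector reuses the existing copy instead of making another one.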

Whatever its advantages and disadvantages, the copying algorithm proved as successful in practice as the mark-sweep algorithm. Besides M. L. Minsky's own work on LISP, from the late 1960s to the early 1970s R. R. Fenichel, J. C. Yochelson, and others improved the copying algorithm in different implementations of LISP, and S. Arnborg successfully applied it to the Simula language.

At this point, the three traditional garbage collection algorithms (reference counting, mark-sweep, and copying) had all appeared around 1960. Each has its own strengths, and each has fatal defects. From the late 1960s on, researchers gradually turned to improving or combining these three traditional algorithms, building on their strengths and avoiding their weaknesses, in order to meet the higher demands that programming languages and runtime environments placed on the efficiency and timeliness of garbage collection.

From the 1970s on, as research and practical application deepened, people gradually realized that an ideal garbage collector should not suspend the application while it runs, nor should it occupy large amounts of memory and CPU resources. None of the three traditional algorithms meets these requirements. Newer algorithms or ideas were needed to solve the many problems encountered in practice. At the time, researchers' goals included:

First, improve the efficiency of garbage collection. A garbage collector using the mark-sweep algorithm consumes a considerable amount of CPU resources while it works. In early LISP runtime environments, garbage collection took as much as 40% of total system running time! This inefficiency gave LISP a bad reputation for execution speed, and even today many people mistakenly believe that all LISP programs are unbearably slow.

Second, reduce memory usage during garbage collection. This problem arises mainly in the copying algorithm. Although copying achieves a qualitative breakthrough in efficiency, the cost of sacrificing half of the memory space is still enormous. In the early days of computing, when memory was priced by the kilobyte, wasting half of a customer's memory was nothing short of extortion or robbery in disguise.

Third, find real-time garbage collection algorithms. Whatever their efficiency, all three traditional algorithms must interrupt the program's current work while collecting garbage. The resulting delay is unacceptable to many programs, especially those performing critical tasks. How to improve the traditional algorithms so that a real-time garbage collector runs quietly in the background without affecting the current process, or at least without appearing to, is obviously a more challenging task.

The researchers' determination to explore unknown territory and the pace of their progress are equally astonishing: in little more than a decade, from the 1970s to the 1980s, a large number of new algorithms and ideas that performed excellently in practical systems came to the fore. It is precisely because of these increasingly mature garbage collection algorithms that today, in the runtime environments provided by Java or .NET, we can allocate memory blocks as we please without worrying about the risks of releasing them.

Mark-Compact Algorithm
The mark-compact algorithm is an organic combination of the mark-sweep algorithm and the copying algorithm. Combining the memory economy of mark-sweep with the execution efficiency of copying is what everyone hoped for. However, merging two garbage collection algorithms is not as simple as 1 plus 1 equals 2; some new ideas have to be introduced. Around 1970, G. L. Steele, C. J. Cheney, D. S. Wise, and other researchers gradually found the right direction, and the outline of the mark-compact algorithm became clear:

In our familiar restaurant, the garbage collection robot no longer divides the room into a north half and a south half. When a collection is needed, the robot first performs the first step of the mark-sweep algorithm and marks all the napkins still in use. It then orders all diners to take their marked napkins and gather at the south end of the restaurant, throwing the unmarked, discarded napkins toward the north end as they go. The robot then only has to stand at the north end, hold out the garbage bin, and catch the discarded napkins as they arrive.

Experiments show that the overall execution efficiency of the mark-compact algorithm is higher than that of mark-sweep, and that it does not need to sacrifice half of the storage space as the copying algorithm does. This is obviously a very satisfying result. Many modern garbage collectors use the mark-compact algorithm or an improved version of it.
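A rough sketch of the idea: mark the live objects, work out where each one will end up, then slide them together at one end of the heap and fix up the pointers. The representation is an assumption made for illustration (mark_compact is a hypothetical name, the heap is a Python list, and a pointer is simply an index into that list); real collectors work on raw memory and differ in many details.

```python
def mark_compact(heap, roots):
    """Compact `heap` in place and return the relocated roots.

    heap  : list of dicts {"value": ..., "refs": [indices into heap]}
    roots : list of indices into heap
    """
    # Phase 1: mark everything reachable from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        index = stack.pop()
        if index not in marked:
            marked.add(index)
            stack.extend(heap[index]["refs"])

    # Phase 2: assign new addresses, keeping live objects in their old order.
    new_addr = {}
    for old in range(len(heap)):
        if old in marked:
            new_addr[old] = len(new_addr)

    # Phase 3: rewrite pointers and slide each live object down.
    # new_addr[old] <= old, so no unread slot is ever overwritten.
    for old in range(len(heap)):
        if old in new_addr:
            obj = heap[old]
            obj["refs"] = [new_addr[ref] for ref in obj["refs"]]
            heap[new_addr[old]] = obj

    del heap[len(new_addr):]   # everything past the live data is free space
    return [new_addr[root] for root in roots]


heap = [
    {"value": "A", "refs": [2]},
    {"value": "garbage1", "refs": []},
    {"value": "C", "refs": [0]},
    {"value": "garbage2", "refs": [1]},
    {"value": "E", "refs": []},
]
roots = mark_compact(heap, roots=[0, 4])
print([obj["value"] for obj in heap], roots)   # prints: ['A', 'C', 'E'] [0, 2]
```

After compaction the live objects sit next to each other, so allocation can again be a simple pointer bump, as in the copying algorithm, without reserving a second half-space.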

Incremental Collection Algorithm
Research into real-time garbage collection led directly to the birth of the incremental collection algorithm.

The initial idea for real-time garbage collection was this: design a multi-process runtime environment in which, for example, one process performs garbage collection while another executes the program code. Garbage collection then appears to be done quietly in the background, without interrupting the running program.

In the napkin example, this idea means that the garbage collection robot looks for discarded napkins and throws them into the garbage bin while people are still eating. This seemingly simple idea runs into conflicts between processes when it is designed and implemented. For example, if a collection consists of a mark phase and a sweep phase, the marks that the collector painstakingly produced in the first phase may well be invalidated by memory operations in the other process, so that the second phase cannot proceed.

M. L. Minsky and D. E. Knuth studied the technical difficulties of real-time garbage collection early on. In 1975, G. L. Steele published a paper entitled "Multiprocessing Compactifying Garbage Collection", which describes a real-time garbage collection algorithm known as the "Minsky-Knuth-Steele algorithm". E. W. Dijkstra, L. Lamport, R. R. Fenichel, and J. C. Yochelson also made their own contributions in this field. In 1978, H. G. Baker published "List Processing in Real Time on a Serial Computer", which describes the incremental collection algorithm used for garbage collection in a multi-process environment.

The incremental collection algorithm is still based on the traditional mark-sweep and copying algorithms. By handling inter-process conflicts properly, it allows the garbage collection process to complete marking, sweeping, or copying in stages. Analyzing the internals of the various incremental collection algorithms in detail is rather tedious. Here, readers only need to understand that the efforts of H. G. Baker and others turned the dream of real-time garbage collection into reality, so that we no longer have to worry about garbage collection interrupting the running program.
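One common way to resolve the conflict between the collector and the running program is tri-color marking with a write barrier, an approach associated with the work of Dijkstra and others. The sketch below shows that technique only; it is not Baker's algorithm and not the Minsky-Knuth-Steele algorithm. Node, IncrementalMarker, and sweep are hypothetical names, and a single-threaded "step" budget stands in for the two processes.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"   # unvisited / pending / done


class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE


class IncrementalMarker:
    """Marks the heap a little at a time, between steps of the program."""

    def __init__(self, roots):
        self.gray = []
        for root in roots:
            self._shade(root)

    def _shade(self, node):
        if node.color == WHITE:
            node.color = GRAY
            self.gray.append(node)

    def step(self, budget=1):
        """Process at most `budget` gray nodes; return True when marking is done."""
        while budget > 0 and self.gray:
            node = self.gray.pop()
            for child in node.refs:
                self._shade(child)
            node.color = BLACK
            budget -= 1
        return not self.gray

    def write(self, src, dst):
        """Write barrier: the program stores pointers through this method."""
        src.refs.append(dst)
        if src.color == BLACK:    # src will not be scanned again,
            self._shade(dst)      # so dst must be queued explicitly


def sweep(heap):
    """Anything still white after marking is garbage."""
    return [node for node in heap if node.color != WHITE]


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.refs.append(b)
marker = IncrementalMarker(roots=[a])
marker.step(1)            # the collector finishes a, then the program resumes
marker.write(a, c)        # the program stores a new pointer into finished a
while not marker.step(1):
    pass
print([node.name for node in sweep([a, b, c, d])])   # prints: ['a', 'b', 'c']
```

Without the barrier in write, c would still be white when marking finished and would be swept even though the program can reach it, which is exactly the kind of conflict described above.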

Generational Collection Algorithm
As with most software development technologies, statistics can act as a powerful catalyst for technical progress. Around 1980, researchers skilled in applying statistical analysis found that most memory blocks have very short lifetimes, so a garbage collector should concentrate on checking and clearing newly allocated memory blocks. The benefit of this discovery for garbage collection can be summed up with an example:

Suppose the garbage collection robot is smart enough to learn people's napkin habits in the restaurant: some people use one napkin before the meal and one after, some hold on to a single napkin from start to finish, and some use a napkin every time they sneeze. The robot can then devise a better collection plan and pick up the garbage soon after people discard their napkins. This statistics-based approach can of course greatly improve the cleanliness of the restaurant.

D. E. Knuth, T. Knight, G. Sussman, R. Stallman, and others did the earliest research on classifying and processing memory garbage. In 1983, H. Lieberman and C. Hewitt published a paper entitled "A Real-Time Garbage Collector Based on the Lifetimes of Objects". This famous paper marks the formal birth of the generational collection algorithm. After that, through the joint efforts of H. G. Baker, R. L. Hudson, J. E. B. Moss, and others, generational collection gradually became the mainstream technology in the garbage collection field.

The generational collection algorithm usually divides the memory blocks in the heap into two classes, old and young. The garbage collector processes the two classes with different collection algorithms or policies, and spends most of its working time on the young memory blocks. Generational collection lets the garbage collector work more effectively with limited resources. This gain in efficiency is best demonstrated in today's Java virtual machines.
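The division into young and old blocks might be sketched like this. All names (Obj, GenerationalHeap, PROMOTE_AGE, minor_collect) are hypothetical, and the promotion threshold of 2 is an arbitrary choice. One simplification is labelled in the code: instead of a real remembered set or card table, the sketch looks at the outgoing pointers of every old object to find young objects that are kept alive from the old generation.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.age = 0          # minor collections survived so far


class GenerationalHeap:
    PROMOTE_AGE = 2           # hypothetical promotion threshold

    def __init__(self):
        self.young = []       # newly allocated blocks, collected often
        self.old = []         # long-lived blocks, left alone by minor_collect

    def allocate(self, name):
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def minor_collect(self, roots):
        """Collect only the young generation."""
        young_ids = {id(obj) for obj in self.young}
        stack = list(roots)
        # Simplification: scan pointers held by old objects in place of
        # a remembered set of old-to-young references.
        for old_obj in self.old:
            stack.extend(old_obj.refs)

        live = set()
        while stack:
            obj = stack.pop()
            # Old objects are never traced here; only young ones are.
            if id(obj) in young_ids and id(obj) not in live:
                live.add(id(obj))
                stack.extend(obj.refs)

        survivors = []
        for obj in self.young:
            if id(obj) in live:
                obj.age += 1
                if obj.age >= self.PROMOTE_AGE:
                    self.old.append(obj)      # promote a long-lived block
                else:
                    survivors.append(obj)
        self.young = survivors


heap = GenerationalHeap()
keep = heap.allocate("keep")
heap.allocate("temp1")                 # short-lived, never referenced
heap.minor_collect(roots=[keep])
heap.allocate("temp2")
heap.minor_collect(roots=[keep])       # keep has survived twice: promoted
print([o.name for o in heap.young], [o.name for o in heap.old])
# prints: [] ['keep']
```

Most of the work lands on the short-lived blocks, which is where the statistics say most garbage is found; the old generation would be cleaned by a separate, less frequent full collection that this sketch leaves out.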

Application Wave
LISP was the first beneficiary of garbage collection, but clearly not the last. After LISP, many traditional, modern, and post-modern languages embraced garbage collection. To name a few: the Simula language was born in 1964, Smalltalk in 1969, Prolog in 1970, ML in 1973, Scheme in 1975, Modula-3 in 1983, Eiffel in 1986, Haskell in 1987... They all use automatic garbage collection. Of course, the garbage collection algorithm may differ from language to language, and most languages and runtime environments even use several algorithms at once. But these examples all show that garbage collection has never been an esoteric technology with a narrow audience.

Garbage collection can also play a big role in the C and C++ languages we know so well. As we learned at school, C and C++ do not provide a garbage collection mechanism themselves, but nothing prevents us from using function libraries or class libraries with garbage collection in our programs. For example, as early as 1988, H. J. Boehm and A. J. Demers successfully implemented a function library that uses a conservative garbage collection algorithm (see http://www.hpl.hp.com/personal/Hans_Boehm/gc ). We can use this library in C or C++ to get automatic garbage collection. If necessary, traditional C/C++ code can even work side by side in one program with C/C++ code that uses automatic garbage collection.

The Java language, born in 1995, turned garbage collection overnight into one of the most popular technologies in software development. From a certain point of view, it is hard to say whether Java benefited from garbage collection or whether garbage collection became famous through the spread of Java. It is worth noting that different versions of the Java virtual machine use different garbage collection mechanisms, and that the Java virtual machine has in fact evolved from the simple to the complex. In Java virtual machine version 1.4.1, the garbage collection algorithms available include generational collection, copying collection, incremental collection, mark-compact, parallel copying, parallel scavenging, and concurrent collection. The running speed of Java programs has improved greatly thanks to the development and refinement of garbage collection.

Although many application platforms and operating systems with garbage collection have appeared over the years, Microsoft .NET is the first truly practical common language runtime that includes a garbage collection mechanism. In fact, all languages on the .NET platform, including C#, Visual Basic .NET, Visual C++ .NET, J#, and so on, can use the garbage collection mechanism provided by the .NET platform in almost exactly the same way. It seems safe to assert that .NET represents a major change for garbage collection in the application field: it turns garbage collection from a pure technology into part of the inherent culture of the application environment and even the operating system. The influence of this change on future software development may far exceed the commercial value of the .NET platform itself.

General Trend
Today, those dedicated to garbage collection research are still working tirelessly. Their research directions include garbage collection in distributed systems, garbage collection in complex transactional environments, and garbage collection for specific systems such as databases.

However, many programmers remain dismissive of garbage collection. They would rather trust the free or delete statements they have written themselves, line by line, than hand the heavy work of garbage collection to collectors that seem clumsy and dim-witted to them.

Personally, I think the spread of garbage collection is the trend of the times, as surely as life will keep getting better. Today's programmers may be put off by the CPU resources a garbage collector consumes, but just over 20 years ago programmers insisted on writing programs in machine language because high-level languages were too slow! Now that hardware speeds improve by the day, should we cling to our worries about lost time, or should we stand firmly on the side of the garbage collector, the cleaner of our code and runtime environment?

Reposted from: http://blog.csdn.net/zhoufoxcn/article/details/1365786
