Classic Garbage Collection Algorithms

Source: Internet
Author: User

This article focuses on garbage collection algorithms. Garbage collection first appeared in Lisp, the world's second-oldest high-level language. Jean E. Sammet once said that one of the Lisp language's most enduring contributions is a non-linguistic feature: the term and technique for the system's automatic management of memory, garbage collection (GC). Below we introduce several classic garbage collection algorithms. Although these algorithms appeared decades ago, they are still used by the garbage collectors of the CLR, the JVM, and other runtimes.

Reference Counting Algorithm

The reference counting algorithm tracks the number of pointers that refer to each object. When a new pointer refers to the object, its count is incremented by 1; when a pointer stops referring to it, the count is decremented by 1. When the count drops to 0, no pointer refers to the object any more, so it can be safely destroyed. The following figure shows this intuitively:

 

One advantage of reference counting is that the cost of memory management is spread across the whole run of the application. Execution is very "smooth", and the application never has to be suspended for garbage collection. A second advantage is better spatial locality of reference: when an object's count drops to 0, the system does not need to touch cells on other pages of the heap, whereas the collectors described later traverse all live cells before reclaiming anything, which may cause paging. Finally, reference counting offers something similar to stack allocation, in that discarding an object reclaims it immediately. With the collectors described later, a discarded object lingers for some time before it is reclaimed.
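
The counting rule and the "discard means reclaim" behaviour can be observed with C++'s std::shared_ptr, which is a library-level reference count rather than a garbage collector. The sketch below is only an illustration under that assumption; the Node type and the function name are hypothetical and do not come from the article.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type; the flag lets us observe when the destructor runs.
struct Node {
    bool* destroyed;
    explicit Node(bool* flag) : destroyed(flag) {}
    ~Node() { *destroyed = true; }
};

// Returns true if the object was destroyed at the moment its count reached 0.
bool reclaimed_immediately() {
    bool destroyed = false;
    std::shared_ptr<Node> a = std::make_shared<Node>(&destroyed);
    assert(a.use_count() == 1);       // one pointer refers to the object
    {
        std::shared_ptr<Node> b = a;  // a second pointer: count becomes 2
        assert(a.use_count() == 2);
    }                                 // b is gone: count is back to 1
    assert(a.use_count() == 1);
    assert(!destroyed);               // still referenced, so still alive
    a.reset();                        // last pointer dropped: count 0, destroyed now
    return destroyed;
}
```

No separate collection pass is involved: the destructor runs inside the call that drops the last reference.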

Reference counting has many advantages, but its disadvantages are just as obvious. The first is time overhead: every time a reference to an object is created or released, the count must be updated, which adds extra work. The second is space overhead: because a count is maintained for each object, extra space must be allocated to store it. The biggest drawback of reference counting is that it cannot handle circular references, as shown in the following figure:

 

Here, the blue objects are neither reachable nor reclaimable: they reference each other, so neither count ever reaches 0. Reference counting is powerless in this situation, whereas the other garbage collection algorithms handle circular references well.
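
The cycle problem can be reproduced with std::shared_ptr, used here only as a stand-in for a reference-counted runtime. In this minimal sketch the Node type, the live_nodes counter, and the function name are hypothetical; the cycle is broken by hand at the end purely so that the demonstration itself does not leak.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // objects whose destructor has not run yet

// Hypothetical node holding one counted reference to a peer.
struct Node {
    std::shared_ptr<Node> next;
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Builds a two-object cycle, drops every external pointer, and returns
// how many objects are still alive at that point.
int survivors_of_counted_cycle() {
    Node* raw = nullptr;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;             // cycle: each object keeps the other's count at 1
        raw = a.get();           // kept only so the demo can clean up afterwards
    }                            // a and b go out of scope; both counts stay at 1
    int survivors = live_nodes;  // unreachable through smart pointers, yet alive
    raw->next.reset();           // break the cycle by hand; both objects now die
    return survivors;
}
```

Both objects survive the loss of every external pointer, which is exactly the situation shown with the blue objects.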

The best-known use of reference counting is Microsoft's COM technology, with its famous IUnknown interface:

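    interface IUnknown
    {
        // Asks the component for another interface; on success *ppv receives it.
        virtual HRESULT __stdcall QueryInterface(const IID& iid, void** ppv) = 0;
        // Increments the reference count.
        virtual ULONG __stdcall AddRef() = 0;
        // Decrements the reference count.
        virtual ULONG __stdcall Release() = 0;
    };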

AddRef and Release let the component manage its own lifetime, so the client program deals only with interfaces and does not need to care about the component's life cycle. A simple usage example follows:

    int main()
    {
        IUnknown* pI = CreateInstance();
        IX* pIX = NULL;
        HRESULT hr = pI->QueryInterface(IID_IX, (void**)&pIX);
        if (SUCCEEDED(hr))
        {
            pIX->DoSomething();
            pIX->Release();   // done with the IX interface
        }
        pI->Release();        // done with the IUnknown interface
    }

The client above does not need to call AddRef, because CreateInstance has already called it; it only needs to call Release once it has finished with an interface, so that the count maintained by the component is updated. The code below gives a simple implementation of AddRef and Release:

    ULONG __stdcall AddRef()
    {
        return ++m_cRef;
    }

    ULONG __stdcall Release()
    {
        if (--m_cRef == 0)
        {
            delete this;   // last reference released: the component destroys itself
            return 0;
        }
        return m_cRef;
    }

The Python programming language also uses reference counting. When an object's reference count reaches 0, its __del__ method is called. As for why Python chose reference counting, an article I read argued that, because Python is a scripting language that often has to interact with C/C++, reference counting avoids changing an object's location in memory. Python also introduced the gc module to solve the circular reference problem. Python's GC scheme is therefore a hybrid of two mechanisms: reference counting and tracing (the three algorithms discussed next).

Mark-Sweep Algorithm

The mark-sweep algorithm relies on a global traversal of all live objects to determine which objects can be reclaimed. The traversal starts from the roots and finds every reachable object; all other, unreachable objects are garbage that can be reclaimed. The whole process has two phases: the mark phase finds all live objects, and the sweep phase removes all garbage objects.

Mark phase:

 

Sweep phase:

 

Compared with reference counting, the mark-sweep algorithm handles circular references naturally, and it avoids the overhead of maintaining a count whenever objects are created and destroyed. Its disadvantage is that it is a "stop-start" algorithm: the application must be paused while the garbage collector runs. Reducing this pause time is therefore an important concern, and the generational garbage collector, discussed later, was designed for that purpose. In addition, the mark phase has to traverse all live objects, which has a cost, and sweeping leaves a certain amount of memory fragmentation.
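
The two phases can be sketched on a toy heap. This is a minimal illustration and not how the CLR or JVM implement it: the heap is assumed to be a vector of objects that point to each other by index, and the names Object, Heap, mark, sweep, and collect are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap: each object lists the heap indices it points to.
struct Object {
    std::vector<std::size_t> refs;
    bool marked = false;
    bool allocated = true;
};

using Heap = std::vector<Object>;

// Mark phase: flag every object reachable from the roots.
void mark(Heap& heap, const std::vector<std::size_t>& roots) {
    std::vector<std::size_t> worklist(roots);
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        if (heap[i].marked) continue;  // already visited, so cycles terminate
        heap[i].marked = true;
        for (std::size_t child : heap[i].refs) worklist.push_back(child);
    }
}

// Sweep phase: free every unmarked object and clear the marks for the
// next collection. Returns the number of objects freed.
std::size_t sweep(Heap& heap) {
    std::size_t freed = 0;
    for (Object& obj : heap) {
        if (!obj.allocated) continue;
        if (obj.marked) {
            obj.marked = false;
        } else {
            obj.allocated = false;     // the slot becomes a hole in the heap
            obj.refs.clear();
            ++freed;
        }
    }
    return freed;
}

std::size_t collect(Heap& heap, const std::vector<std::size_t>& roots) {
    mark(heap, roots);
    return sweep(heap);
}
```

An unreachable cycle is never marked, so it is swept like any other garbage. The freed slots stay where they were, scattered between the survivors, which is the fragmentation mentioned above.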

Mark-Compact Algorithm

The mark-compact algorithm was created to solve the memory fragmentation problem. The whole process can be described as: mark all live objects; compact the object graph by relocating the live objects; update the pointers to the objects that were moved.

Mark phase:

 

Sweep phase:

 

Compact phase:

 

The biggest difficulty of the mark-compact algorithm is choosing the compaction algorithm. A poorly chosen one can cause serious performance problems, such as a low cache hit rate. Based on where objects end up after compaction, compaction algorithms generally fall into three types:

1. Arbitrary: objects are moved without regard to their original order or to the references between them.
2. Linearizing: an object and the objects it points to are placed next to each other as far as possible, for better spatial locality.
3. Sliding: objects are slid toward one end of the heap, squeezing out the free cells between live objects while preserving their original allocation order.
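
As a rough illustration of the sliding variant, the sketch below compacts a toy heap in three passes: compute each survivor's new position, rewrite the pointers, then move the objects. It assumes a preceding mark phase has already set the live flag and that live objects only point to live objects. The Slot type and slide_compact are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical heap slot; `live` is assumed to be set by a preceding mark phase.
struct Slot {
    bool live = false;
    int id = 0;                     // stand-in for the object's payload
    std::vector<std::size_t> refs;  // heap indices this object points to
};

// Slides live objects toward index 0, preserving their order.
// Returns the number of live objects; slots from that index on are free.
std::size_t slide_compact(std::vector<Slot>& heap) {
    // Pass 1: compute the new position of every live object.
    std::vector<std::size_t> new_pos(heap.size(), 0);
    std::size_t next_free = 0;
    for (std::size_t i = 0; i < heap.size(); ++i)
        if (heap[i].live) new_pos[i] = next_free++;

    // Pass 2: rewrite pointers so they refer to the new positions.
    for (Slot& s : heap)
        if (s.live)
            for (std::size_t& r : s.refs) r = new_pos[r];

    // Pass 3: move objects. new_pos[i] <= i, so a move never overwrites
    // a live object that has not been moved yet.
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].live) { heap[i] = Slot{}; continue; }
        if (new_pos[i] != i) {
            heap[new_pos[i]] = std::move(heap[i]);
            heap[i] = Slot{};
        }
    }
    return next_free;
}
```

After the call, the survivors occupy one contiguous run at the start of the heap in their original order, and the free space forms a single block behind them.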

Node-Copying Algorithm

The node-copying algorithm divides the whole heap into two semispaces (from and to). A collection is simply the process of copying live objects from one semispace to the other; in the next collection the two semispaces swap roles. Once objects have been moved, the pointers that refer to them are updated. Before GC starts:

 

After GC:

 

Because the node-copying algorithm compacts memory as it copies, there is no memory fragmentation and no separate compaction step is needed. Its biggest drawback is that it requires twice the space.
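
One common way to organise the copy is a breadth-first scan of the to-space (often called Cheney's algorithm). The article does not say which traversal it has in mind, so the sketch below is only one possible reading. The two semispaces are modelled as vectors, and the names Cell, Space, evacuate, and copy_collect are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap cell. `forwarded` and `new_index` record where an
// object was copied to, so each object is copied exactly once.
struct Cell {
    int id = 0;                     // stand-in for the object's payload
    std::vector<std::size_t> refs;  // indices into the same semispace
    bool forwarded = false;
    std::size_t new_index = 0;
};

using Space = std::vector<Cell>;

// Copies from[i] into to-space if that has not happened yet and returns
// its index there. The copy's refs still hold from-space indices.
std::size_t evacuate(Space& from, Space& to, std::size_t i) {
    Cell& old = from[i];
    if (!old.forwarded) {
        old.forwarded = true;
        old.new_index = to.size();
        Cell copy;
        copy.id = old.id;
        copy.refs = old.refs;
        to.push_back(copy);
    }
    return old.new_index;
}

// Copies everything reachable from the roots into a fresh to-space,
// updates the roots in place, and returns the to-space.
Space copy_collect(Space& from, std::vector<std::size_t>& roots) {
    Space to;
    for (std::size_t& r : roots) r = evacuate(from, to, r);
    // Cells before `scan` already hold to-space indices; cells after it
    // have been copied but still hold from-space indices.
    for (std::size_t scan = 0; scan < to.size(); ++scan) {
        for (std::size_t k = 0; k < to[scan].refs.size(); ++k) {
            // Use a temporary: evacuate may grow `to` and move its cells.
            std::size_t target = evacuate(from, to, to[scan].refs[k]);
            to[scan].refs[k] = target;
        }
    }
    from.clear();  // the whole from-space is free; the roles swap next time
    return to;
}
```

Survivors are packed densely at the start of the to-space, which is why no fragmentation remains. Garbage is never visited at all, but only half of the total heap is usable at any time.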

Summary

This article has described four classic garbage collection algorithms. The last three are usually grouped together as tracing garbage collection. Because reference counting collects garbage smoothly, without pauses, it often appears in real-time systems, but it cannot solve the circular reference problem. Tracing collectors, on the other hand, traverse or copy all live objects in every collection, which is very time-consuming. A good solution is to divide the objects on the heap into regions and use a different collection algorithm for each region; the generational garbage collector is one such design. Both the CLR and the JVM use generational garbage collection, though they differ somewhat in the details; a follow-up article covers the differences between the two collectors.

Classic Garbage Collection Algorithms

The basic work of a garbage collector can be divided into two parts:
1) Garbage detection: distinguish garbage objects from live objects.
There are two basic approaches to garbage detection: reference counting and tracing.
2) Garbage reclamation: reclaim the memory occupied by garbage objects and make it available for reuse.

1. Reference counting
With reference counting, garbage detection and reclamation are interleaved with the execution of the user program. The method therefore has a steady overhead: it does not need to seize a large share of system resources at irregular intervals, and the user program is never interrupted for long. This makes it a better fit for the requirements of real-time systems.
Reference counting has two major defects: 1) it does not work for objects in a circular reference; 2) it is inefficient. The inefficiency comes from the fact that the collector's bookkeeping must consume system resources constantly.

2. Mark-sweep
The main problem with traditional mark-sweep is that collection produces a large amount of memory fragmentation, so memory has to be reorganized regularly. In addition, garbage detection and reclamation must be carried out over all objects in memory, so time efficiency is not high. The mark-compact and copying methods were introduced to address these two problems.

3. Mark-compact
In the garbage detection phase, mark-compact uses the same mechanism as mark-sweep to mark reachable objects. It then moves the live objects one after another so that each sits next to its neighbours. In the end, all live objects occupy one contiguous region of memory and the reclaimed free space is likewise consolidated, which overcomes the fragmentation problem of mark-sweep.
However, mark-compact is less efficient than mark-sweep. After marking is complete, the collector processes all live objects two to three times: it first determines a new location in memory for each live object, then modifies all pointers to these objects, and only then moves the objects. Collection therefore slows down when a large proportion of objects are live.

4. Stop-and-copy
The advantage of the stop-and-copy method is that it avoids heap fragmentation and therefore needs no periodic memory reorganization. Stop-and-copy is a fast garbage collection algorithm, but it requires twice the memory. It is not suitable for real-time systems, because it freezes the system from time to time.
