A detailed description of the C# garbage collection mechanism

Source: Internet
Author: User

The past and present of GC

Although this article discusses GC with .NET as its target, the concept of GC is far from new. As early as 1958, the Lisp language, implemented by the famous Turing Award winner John McCarthy, already provided GC; this was the first appearance of GC. Lisp programmers held that memory management is too important to be left to the programmer. In the years that followed, however, Lisp did not take off, and languages with manual memory management, represented by C, gained the upper hand. Starting from the same premise, C programmers reached the opposite conclusion: memory management is too important to be left to the system, and they mocked Lisp programs for running as slowly as a tortoise. Indeed, in an era when every byte was carefully counted, the GC's speed and its heavy use of system resources were unacceptable to many people. Later, David Ungar's work on Smalltalk introduced generational garbage collection, which is discussed later in this article, but Smalltalk itself never came into wide use.
It was not until the mid-1990s that GC took center stage, thanks to the progress of Java; today's GC is a far cry from what it once was. Java uses a VM (virtual machine), and the VM's management of a running program includes GC. Around 2000, .NET appeared and adopted an approach similar to Java's, with programs managed by the CLR (Common Language Runtime). The emergence of these two camps ushered in an era of development based on virtual platforms, and GC has attracted more and more attention since then.
Why use GC? Put another way, why use automatic memory management? There are several reasons:
1. It raises the level of abstraction in software development;
2. Programmers can focus on the actual problem without being distracted by memory management;
3. It makes module interfaces clearer and reduces coupling between modules;
4. It greatly reduces bugs caused by human error in memory management;
5. It makes memory management more efficient.
In general, GC frees programmers from complex memory problems, which improves the speed, quality, and safety of software development.

What is GC

GC, as the name suggests, is garbage collection, and here it applies only to memory. The garbage collector (also abbreviated GC where no confusion arises) starts from the application's roots and traverses all the objects that the application has dynamically allocated on the heap [2], checking whether each one is still referenced in order to decide which objects are dead and which are still needed. Objects that are no longer referenced by the application's roots or by other objects are dead objects, the so-called garbage, and need to be reclaimed. That is how GC works in principle. Several algorithms implement this principle; the common ones are reference counting, mark-sweep, and copying collection. The current mainstream virtual systems, the .NET CLR, the Java VM, and Rotor, all use the mark-sweep algorithm.

I. Mark-Compact algorithm
The .NET GC algorithm can be understood simply as the Mark-Compact algorithm.
Phase 1: Mark-Sweep (mark and clear)
Assume at first that every object in the heap can be reclaimed; then find the objects that cannot be reclaimed and mark them. In the end, every unmarked object in the heap can be reclaimed.
Phase 2: Compact
After objects are reclaimed, the free space in the heap becomes discontiguous. The surviving objects are moved so that they are laid out contiguously again starting from the heap's base address, much like defragmenting a disk.

After the heap has been reclaimed and compacted, the earlier allocation scheme can continue to be used: a single pointer records the address at which the next heap allocation starts.
Main processing steps: suspend threads => determine roots => build the reachable objects graph => reclaim objects => compact the heap => fix pointers.
Roots can be understood this way: the reference relationships among objects in the heap are intricate (cross references, circular references) and form a complex graph; roots are the various entry points into that graph that the CLR can find outside the heap. The places the GC searches for roots include global objects, static variables, local objects, function call parameters, object pointers in the current CPU registers (and the finalization queue), and so on. They fall mainly into two categories: static variables that have been initialized, and objects still in use by threads (stack plus CPU registers).
Reachable objects: objects that can be reached from the roots by following object references. For example, if local variable A of the currently executing function is a root object and one of its member variables refers to object B, then B is a reachable object. The reachable objects graph is built starting from the roots; the remaining objects are unreachable and can be reclaimed.

Pointer fixing is needed because the compact phase moves heap objects and so changes their addresses; every reference to a moved object must be updated, including pointers on the stack, pointers in CPU registers, and references held by other objects in the heap.
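The mark, sweep, and compact steps described above can be sketched with a toy model. This is only an illustration, not how the CLR is implemented: ToyObject, ToyHeap, and ToyHeapDemo are hypothetical names, "addresses" are just list indices, and pointer fixing is reduced to assigning each survivor its new index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: an "address" is a position in a list, not real memory.
public sealed class ToyObject
{
    public string Name;
    public int Address;                                   // position in the toy heap
    public bool Marked;                                   // set during the mark phase
    public List<ToyObject> References = new List<ToyObject>();
    public ToyObject(string name) { Name = name; }
}

public static class ToyHeap
{
    // Mark phase: flag every object reachable from the roots.
    public static void Mark(IEnumerable<ToyObject> roots)
    {
        var pending = new Stack<ToyObject>(roots);
        while (pending.Count > 0)
        {
            ToyObject current = pending.Pop();
            if (current.Marked) continue;                 // already visited; this also handles cycles
            current.Marked = true;
            foreach (ToyObject child in current.References) pending.Push(child);
        }
    }

    // Sweep and compact: drop unmarked objects, slide survivors toward
    // the heap base, and give each survivor its new address.
    public static List<ToyObject> Compact(List<ToyObject> heap)
    {
        List<ToyObject> survivors = heap.Where(o => o.Marked).ToList();
        for (int i = 0; i < survivors.Count; i++)
        {
            survivors[i].Address = i;                     // simplified "pointer fix"
            survivors[i].Marked = false;                  // reset for the next cycle
        }
        return survivors;
    }
}

public static class ToyHeapDemo
{
    public static string Run()
    {
        var a = new ToyObject("A");
        var b = new ToyObject("B");
        var c = new ToyObject("C");
        var d = new ToyObject("D");
        var heap = new List<ToyObject> { c, a, d, b };
        for (int i = 0; i < heap.Count; i++) heap[i].Address = i;

        a.References.Add(b);                              // root A refers to B
        c.References.Add(d);                              // C and D refer to each other,
        d.References.Add(c);                              // but nothing reachable refers to them

        ToyHeap.Mark(new[] { a });                        // A is the only root
        List<ToyObject> survivors = ToyHeap.Compact(heap);
        return string.Join(",", survivors.Select(o => o.Name + "@" + o.Address));
    }
}
```

Calling ToyHeapDemo.Run() returns "A@0,B@1": the C and D cycle is discarded as a whole because neither object is reachable from a root, and the survivors are packed at the start of the heap.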
Debug and release builds behave slightly differently. In release mode, an object that is not referenced by any subsequent code is already unreachable; in debug mode, the object becomes unreachable only after the current function has finished executing, so that the contents of local objects can be inspected while debugging.
Managed objects passed to COM+ also become roots and carry a reference counter for compatibility with COM+'s memory management mechanism. When the reference counter reaches 0, these objects may become eligible for collection.
Pinned objects are objects that cannot be moved once allocated, for example objects passed to unmanaged code (or pinned with the fixed keyword). During pointer fixing the GC cannot update pointers held by unmanaged code, so moving such objects would cause errors. Pinned objects can fragment the heap, but in most cases objects passed to unmanaged code should be reclaimable by the time the GC runs.
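As a small sketch of pinning, the following uses GCHandle with GCHandleType.Pinned to keep a byte array at a fixed address while its raw pointer is in use. PinningDemo and ReadFirstByteWhilePinned are hypothetical names chosen for illustration, and the method assumes a non-empty array.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins the array, reads its first byte through the raw address, then unpins.
    // Assumes buffer contains at least one element.
    public static byte ReadFirstByteWhilePinned(byte[] buffer)
    {
        // While the handle is allocated, the GC will not move the array.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadByte(address);
        }
        finally
        {
            // Release the pin as soon as possible to limit heap fragmentation.
            handle.Free();
        }
    }
}
```

Keeping the pinned region as short as the try block here follows from the fragmentation concern described above: the longer an object stays pinned, the longer the compact phase has to work around it.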
II. Generational algorithm
A program may use hundreds of MB or even several GB of memory, and running the GC over a memory area that large is costly. The generational algorithm rests on a statistical basis and improves GC performance noticeably.
Objects are divided into new and old according to their lifetime. Following the statistical distribution of lifetimes, the new and old regions can use different collection strategies and algorithms. Collection effort is concentrated on the new region, so that the many short-lived local objects discarded along the execution path are reclaimed promptly, at low cost, within a short time and over a small memory area.
Assumptions behind the generational algorithm:
1. Most newly created objects have short lifetimes, while older objects tend to live longer;
2. Collecting part of memory is faster than collecting all of it;
3. Newly created objects are usually strongly related to one another. Objects allocated on the heap are contiguous, and strong relatedness improves the CPU cache hit rate.
.NET divides the heap into 3 generations: Gen 0, Gen 1, Gen 2

Corresponding to the 3 generations of the heap, there are 3 kinds of GC: # Gen 0 collections, # Gen 1 collections, and # Gen 2 collections. When Gen 0 reaches its threshold, a generation 0 GC is triggered, and the objects in Gen 0 that survive it move into Gen 1. When Gen 1 reaches its threshold, a generation 1 GC is triggered; it collects both the Gen 0 heap and the Gen 1 heap, and the surviving objects move into Gen 2. A generation 2 GC collects the Gen 0, Gen 1, and Gen 2 heaps together.
Gen 0 and Gen 1 are relatively small; together these two generations stay at around 16 MB. The size of Gen 2 is determined by the application and can reach several GB. As a result, generation 0 and generation 1 GCs are cheap, while a generation 2 GC, called a full GC, is usually expensive. As a rough estimate, generation 0 and generation 1 GCs should complete within a few milliseconds to a few tens of milliseconds, whereas a full GC of a large Gen 2 heap may take several seconds. In a typical .NET application, the frequencies of generation 2, generation 1, and generation 0 GCs should be roughly 1:10:100.
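Promotion between generations can be observed with GC.GetGeneration. The sketch below is illustrative only: GenerationDemo is a hypothetical name, and the exact generation numbers reported depend on the runtime, the GC mode, and the build configuration, so no particular output is promised.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one object before any collection,
    // after one forced collection, and after a second forced collection.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();

        int atAllocation = GC.GetGeneration(survivor);
        GC.Collect();                                     // survivor is still referenced, so it survives
        int afterFirstCollection = GC.GetGeneration(survivor);
        GC.Collect();
        int afterSecondCollection = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);                           // keep the reference live until this point
        return new[] { atAllocation, afterFirstCollection, afterSecondCollection };
    }
}
```

A freshly allocated small object normally starts in generation 0 and moves toward GC.MaxGeneration as it survives collections, which matches the Gen 0 to Gen 1 to Gen 2 flow described above.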

III. Finalization queue and freachable queue

These two queues are related to the Finalize method that .NET objects can provide. Neither queue stores actual objects; each stores a set of pointers to objects. When a program uses the new operator to allocate space on the managed heap, the GC analyzes the object, and if the object has a Finalize method, a pointer to it is added to the finalization queue. When a GC starts, the mark phase identifies which objects are garbage. The GC then searches the garbage, and if it finds objects that are pointed to from the finalization queue, it separates them from the garbage and moves their pointers into the freachable queue. This process is called resurrection: an object that was dead has been brought back to life. Why rescue it? Because its Finalize method has not run yet, so it cannot be allowed to die. The freachable queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the corresponding object is run and the pointer is then removed from the queue; only then can the object die in peace. The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former asks the system to call the object's Finalize method; the latter asks the system not to call it. ReRegisterForFinalize in fact adds a pointer to the object back into the finalization queue. This leads to an interesting phenomenon: because objects in the finalization queue can be resurrected, calling ReRegisterForFinalize inside an object's Finalize method creates an object on the heap that never dies. Like a phoenix reborn from the ashes, it comes back to life every time it dies.
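The resurrection effect of ReRegisterForFinalize can be sketched as follows. Phoenix and FinalizationDemo are hypothetical names, and the rebirths are capped at two so the object does eventually die. Finalizer timing is not deterministic, so the returned count is what one would typically observe rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Phoenix
{
    public static int FinalizeCount;                      // how many times the finalizer has run
    public const int MaxRebirths = 2;                     // cap, so the object is not immortal

    ~Phoenix()
    {
        int count = Interlocked.Increment(ref FinalizeCount);
        if (count <= MaxRebirths)
        {
            // Put this object back on the finalization queue,
            // so the next collection resurrects it again.
            GC.ReRegisterForFinalize(this);
        }
    }
}

public static class FinalizationDemo
{
    // Allocate in a separate, non-inlined method so that no local
    // variable in Run keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateUnreferenced()
    {
        new Phoenix();
    }

    public static int Run()
    {
        AllocateUnreferenced();
        for (int i = 0; i < 5; i++)
        {
            GC.Collect();                                 // unreachable finalizable objects move to the freachable queue
            GC.WaitForPendingFinalizers();                // let the finalizer thread drain that queue
        }
        return Volatile.Read(ref Phoenix.FinalizeCount);
    }
}
```

With the cap removed, the finalizer would re-register on every run and the object would never be reclaimed, which is the "never dies" case described above.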

Managed Resources:

All types in .NET derive, directly or indirectly, from System.Object.

Types in the CTS fall into two main categories: reference types (also called managed types), which are allocated on the managed heap, and value types. Value types are generally allocated on the stack when they are local variables; a value-type field of a reference type lives inside its containing object on the heap.

Values on the stack follow first-in, last-out order, so the lifetimes of value-type variables are strictly ordered, which guarantees that a value-type variable releases its resources before it goes out of scope. This is simpler and more efficient than reference types. The stack allocates memory from high addresses toward low addresses.

Reference types are allocated on the managed heap. Declaring a variable reserves a slot on the stack, and when new is used to create the object, the object's address is stored in that variable. In contrast to the stack, the managed heap allocates memory from low addresses toward high addresses.

More than 80% of the resources in .NET are managed resources.

Unmanaged Resources:

Classes such as ApplicationContext, Brush, Component, ComponentDesigner, Container, Context, Cursor, FileStream, Font, Icon, Image, Matrix, Object, OdbcDataReader, OleDbDataReader, Pen, Regex, Socket, StreamWriter, Timer, and ToolTip, as well as file handles, GDI resources, database connections, and so on. You may be using many of these without noticing!

.NET's GC mechanism has two problems:

First, the GC cannot release all resources. It does not automatically release unmanaged resources.

Second, the GC is not real-time, which can cause bottlenecks and unpredictability in system performance.

This is why the IDisposable interface exists. IDisposable defines the Dispose method, which programmers call explicitly to release unmanaged resources. The using statement simplifies this kind of resource management.

Example

/// <summary>
/// Executes the SQL statement and returns the number of records affected.
/// </summary>
/// <param name="SQLString">SQL statement</param>
/// <returns>Number of records affected</returns>
public static int ExecuteSQL(string SQLString)
{
    // connectionString is assumed to be defined elsewhere in the containing class.
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlCommand cmd = new SqlCommand(SQLString, connection))
        {
            try
            {
                connection.Open();
                int rows = cmd.ExecuteNonQuery();
                return rows;
            }
            catch (System.Data.SqlClient.SqlException)
            {
                connection.Close();
                throw; // rethrow without resetting the stack trace
            }
            finally
            {
                // Redundant with the using blocks, but harmless:
                // Dispose and Close can safely be called more than once.
                cmd.Dispose();
                connection.Close();
            }
        }
    }
}
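For readers without a database at hand, the same guarantee can be seen in a self-contained sketch. TracedResource and UsingDemo are hypothetical names; the example only records the order of events to show that Dispose runs when the using block is left, even when an exception is thrown inside it.

```csharp
using System;
using System.Collections.Generic;

// A stand-in for a real resource: it only records when it is acquired and released.
public sealed class TracedResource : IDisposable
{
    private readonly List<string> log;

    public TracedResource(List<string> log)
    {
        this.log = log;
        log.Add("acquired");
    }

    public void Dispose()
    {
        log.Add("disposed");
    }
}

public static class UsingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        try
        {
            using (var resource = new TracedResource(log))
            {
                log.Add("working");
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time the handler runs, the using block has already called Dispose.
            log.Add("caught");
        }
        return log;
    }
}
```

UsingDemo.Run() returns the entries acquired, working, disposed, caught, in that order, because the compiler expands using into a try/finally that calls Dispose.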

When you release unmanaged resources in the Dispose method, you should call GC.SuppressFinalize. If the object is in the finalization queue, GC.SuppressFinalize prevents the GC from calling its Finalize method, which matters because calling Finalize costs some performance. If your Dispose method has already cleaned up the object's resources, there is no need for the GC to call the object's Finalize method again (MSDN). The MSDN code is attached below for reference.

using System;
using System.ComponentModel;

public class BaseResource : IDisposable
{
    // Handle to an external unmanaged resource.
    private IntPtr handle;
    // Other managed resource used by this class.
    private Component components;
    // Tracks whether Dispose has been called; controls what the finalizer does.
    private bool disposed = false;

    // Constructor.
    public BaseResource()
    {
        // Insert appropriate constructor code here.
    }

    // Implements IDisposable.
    // Do not make this method virtual.
    // A derived class should not be able to override it.
    public void Dispose()
    {
        Dispose(true);
        // Take this object off the finalization queue
        // so that its finalizer code does not run
        // a second time.
        GC.SuppressFinalize(this);
    }

    // Dispose(bool disposing) executes in two distinct scenarios.
    // If disposing equals true, the method has been called directly
    // or indirectly by user code. Managed and unmanaged resources can both be released.
    // If disposing equals false, the method has been called by the runtime from inside the finalizer.
    // Other objects must not be referenced; only unmanaged resources can be released.
    protected virtual void Dispose(bool disposing)
    {
        // Check whether Dispose has already been called.
        if (!this.disposed)
        {
            // If disposing equals true, release all managed and unmanaged resources.
            if (disposing)
            {
                // Release the managed resources (the field may never have been assigned).
                if (components != null)
                {
                    components.Dispose();
                }
            }
            // Release the unmanaged resources. If disposing is false,
            // only the following code is executed.
            CloseHandle(handle);
            handle = IntPtr.Zero;
            // Note that this is not thread safe.
            // Another thread could start disposing the object after the managed resources are released
            // but before the disposed flag is set to true.
            // If thread safety is required, the client must implement it.

        }
        disposed = true;
    }

    // Use interop to call the method needed
    // to clean up the unmanaged resource.
    [System.Runtime.InteropServices.DllImport("Kernel32")]
    private extern static Boolean CloseHandle(IntPtr handle);

    // Use the C# destructor syntax for the finalizer code.
    // It runs only if the Dispose method was not called.
    // It gives the base class the opportunity to finalize.
    // Do not provide destructors in types derived from this class.
    ~BaseResource()
    {
        // Do not duplicate the cleanup code here.
        // Calling Dispose(false) is the best approach for reliability and maintainability.
        Dispose(false);
    }

    // Dispose may be called multiple times,
    // but other methods throw an exception once the object has been disposed.
    // Whenever you do something with the object,
    // check whether it has been disposed.
    public void DoSomething()
    {
        if (this.disposed)
        {
            // ObjectDisposedException requires the object name as an argument.
            throw new ObjectDisposedException(GetType().Name);
        }
    }


    // Do not make this method virtual.
    // A derived class should not be allowed to override it.
    public void Close()
    {
        // Call the parameterless Dispose method.
        Dispose();
    }

    public static void Main()
    {
        // Insert code here to create
        // and use a BaseResource object.
    }
}

GC.Collect() method

Purpose: forces a garbage collection.

Overloads of GC.Collect:

Name | Description
Collect() | Forces an immediate garbage collection of all generations.
Collect(Int32) | Forces an immediate garbage collection from generation 0 through the specified generation.
Collect(Int32, GCCollectionMode) | Forces a garbage collection from generation 0 through the specified generation, at a time specified by a GCCollectionMode value.
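The three overloads can be exercised as in the sketch below, which uses GC.CollectionCount to confirm that collections actually happened. CollectDemo is a hypothetical name; the exact counts depend on what else the runtime is doing, so only the increases are reported. Forcing collections like this is for demonstration; in production code the GC's own scheduling is usually the better choice.

```csharp
using System;

public static class CollectDemo
{
    // Returns how many generation 0 collections and how many
    // full (highest generation) collections occurred during the calls below.
    public static int[] Run()
    {
        int gen0Before = GC.CollectionCount(0);
        int fullBefore = GC.CollectionCount(GC.MaxGeneration);

        GC.Collect(0);                                            // generation 0 only
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);    // generation 0 through the highest, immediately
        GC.Collect();                                             // all generations

        return new[]
        {
            GC.CollectionCount(0) - gen0Before,
            GC.CollectionCount(GC.MaxGeneration) - fullBefore
        };
    }
}
```

Because collecting a given generation also collects every younger one, the generation 0 count rises with every call, while the full-collection count rises only for the last two.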


GC considerations:

1. The GC manages memory only. Unmanaged resources such as file handles, GDI resources, and database connections still have to be managed by the user.

2. Circular references, mesh-like structures, and the like become simple to implement. The GC's mark-compact algorithm detects these relationships effectively and removes a mesh structure that is no longer referenced as a whole.

3. The GC determines whether an object is still accessible by traversing from the program's root objects, rather than by reference counting as COM does.

4. The GC runs on a separate thread to remove memory that is no longer referenced.

5. The GC compacts the managed heap each time it runs.

6. You are responsible for releasing unmanaged resources. You can ensure they are released by defining a finalizer in the type.

7. An object's finalizer runs at an indeterminate time after the object is no longer referenced. Note that, unlike a C++ destructor, it does not run immediately when the object goes out of scope.

8. Using finalizers has a performance cost. Objects that need finalization are not reclaimed immediately: their finalizers must run first, and finalizers are not called on the thread that performs the GC. Instead, the GC places every object that needs finalization into a queue, and a separate thread runs all of those finalizers while the GC thread goes on removing other garbage. The memory of the finalized objects is reclaimed in the next GC cycle.

9. The .NET GC uses the concept of generations to optimize performance. Generations help the GC identify more quickly the objects most likely to be garbage. Objects created since the last garbage collection are generation 0 objects. Objects that have survived one GC cycle are generation 1 objects. Objects that have survived two or more GC cycles are generation 2 objects. The purpose of generations is to distinguish local variables from objects that need to live for the lifetime of the application. Most generation 0 objects are local variables. Member variables and global variables quickly become generation 1 objects and eventually generation 2 objects.

10. The GC applies different checking policies to different generations to optimize performance. Generation 0 objects are checked in every GC cycle. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects. Reconsider the cost of finalization in this light: an object that needs finalization may stay in memory for 9 more GC cycles than one that does not. If it still has not been finalized by then, it becomes a generation 2 object and stays in memory even longer.
