C# technology: garbage collection mechanism (GC)


GC's past and present

Although this article discusses GC in the context of .NET, the concept of GC is far from new. As early as 1958, the Lisp language, created by the famous Turing Award winner John McCarthy, already provided garbage collection, which was the first appearance of GC. Lisp programmers held that memory management is too important to be left to programmers.

Lisp did not take off in the years that followed, however, and languages with manual memory management, represented by C, gained the upper hand. From the same premise people drew opposite conclusions: C programmers held that memory management is too important to be left to the system, and they mocked Lisp programs for running as slowly as a tortoise. Indeed, in an era when every byte was carefully counted, the speed of GC and its heavy use of system resources were unacceptable to many. Later, David Ungar's work on Smalltalk was among the first to use generational garbage collection (discussed later), but Smalltalk never became widely used.

It was not until the mid-1990s that GC took center stage, thanks to the rise of Java, which did much to establish GC as it is today. Java uses a VM (virtual machine): programs run on the VM, which also manages the GC. Around 2000, .NET appeared; .NET uses an approach similar to Java's, with programs managed by the CLR (Common Language Runtime). The emergence of these two camps ushered in an era of development on virtual platforms, and GC has attracted more and more attention since then.

Why use GC? Or, put another way, why use automatic memory management? There are several reasons:

1. It raises the level of abstraction in software development;

2. Programmers can focus on the actual problem without being distracted by memory management;

3. Module interfaces become clearer, and coupling between modules is reduced;

4. Bugs caused by human mismanagement of memory are greatly reduced;

5. Memory management becomes more efficient.

In general, GC frees programmers from complex memory problems, which improves the speed, quality, and security of software development.

What is GC

As its name says, GC means garbage collection, and the garbage here is, of course, memory. The garbage collector (also abbreviated GC where no confusion arises) starts from the application's roots and traverses all the objects that the application has dynamically allocated on the heap [2], determining by whether they are referenced which objects are dead and which are still in use. Objects that are no longer referenced by the application's roots or by other reachable objects are dead objects, known as garbage, and need to be reclaimed. That is how GC works. Several algorithms implement this principle; the more common ones are reference counting, mark-sweep, and copying collection. The current mainstream virtual platforms (the .NET CLR, Java VMs, and Rotor) all use mark-sweep-based algorithms.

I. Mark-compact algorithm

Simply put, the .NET GC algorithm can be regarded as a mark-compact algorithm. Phase 1 is mark-sweep: first assume that all objects in the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them; in the end, every unmarked object in the heap can be reclaimed. Phase 2 is compact: after objects are reclaimed, the free space in the heap becomes discontinuous, so the surviving objects are moved so that they are laid out contiguously again from the heap's base address, much like defragmenting a disk.

After the heap has been reclaimed and compacted, the previous heap allocation method can continue to be used, in which a single pointer records the start address for the next allocation. The main processing steps are: suspend threads → determine roots → build the reachable-object graph → reclaim objects → compact the heap → fix up pointers. Roots can be understood as follows: the reference relationships among objects in the heap are intricate (cross-references, circular references) and form a complex graph; roots are the various entry points that the CLR can find outside the heap.

The places where the GC searches for roots include global objects, static variables, local objects, function call parameters, object pointers in the current CPU registers (as well as the finalization queue), and so on. They fall into two main categories: static variables that have been initialized, and objects still in use by threads (stack + CPU registers). Reachable objects are objects that can be reached from the roots by following object references. For example, if local variable object A of the currently executing function is a root object and one of its member variables refers to object B, then B is a reachable object. The reachable-object graph is built from the roots; the remaining objects are unreachable and can be reclaimed.
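The idea of marking from roots can be sketched with a toy object graph. This is only an illustration under simplifying assumptions: the Node class, the explicit "heap" list, and the MarkPhaseSketch name are hypothetical and do not reflect how the CLR actually represents objects or roots.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a heap object: a name plus outgoing references.
// Hypothetical illustration only, not the CLR's real object layout.
sealed class Node
{
    public readonly string Name;
    public readonly List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class MarkPhaseSketch
{
    // Returns the set of objects reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current)) continue; // already visited, so cycles terminate
            foreach (Node next in current.References) pending.Push(next);
        }
        return marked;
    }

    // Builds a small graph and returns the names of the unreachable objects.
    public static List<string> FindGarbage()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b); // root A refers to B, so B is reachable
        c.References.Add(d); // C and D refer to each other (a cycle),
        d.References.Add(c); // but no root leads to either of them

        var heap = new List<Node> { a, b, c, d };
        var roots = new List<Node> { a };
        HashSet<Node> live = Mark(roots);

        var garbage = new List<string>();
        foreach (Node n in heap)
            if (!live.Contains(n)) garbage.Add(n.Name);
        return garbage;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", FindGarbage())); // prints "C,D"
    }
}
```

Note that C and D are found to be garbage even though they reference each other, which is the case that plain reference counting cannot handle.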

Pointer fix-up is needed because the compact phase moves heap objects, so object addresses change and all reference pointers must be repaired, including pointers on the stack, in CPU registers, and in other heap objects.

There is a slight difference between debug and release execution modes. In release mode, an object that is not referenced by subsequent code is unreachable; in debug mode, such an object does not become unreachable until the current function has finished executing. The purpose is to let you inspect the contents of local objects while debugging.

Managed objects passed to COM+ also become roots and carry a reference counter for compatibility with COM+'s memory management mechanism; when the reference counter reaches 0, these objects can become candidates for collection.

Pinned objects are objects that cannot be moved after allocation, such as objects passed to unmanaged code (or objects fixed with the fixed keyword). The GC cannot modify reference pointers held in unmanaged code during pointer fix-up, so moving such objects would cause errors. Pinned objects can cause heap fragmentation; in most cases, though, objects passed to unmanaged code should be reclaimable by the time a GC occurs.
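Pinning can be observed from managed code with GCHandle. The following is a minimal sketch, assuming a plain byte array as the buffer handed to unmanaged code; the PinningSketch class and method names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinningSketch
{
    // Pins a buffer, forces a collection, and reports whether the address stayed the same.
    public static bool AddressIsStableWhilePinned()
    {
        byte[] buffer = new byte[256];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();
            GC.Collect(); // a compacting GC may move unpinned objects, but not this one
            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            handle.Free(); // unpin promptly to limit heap fragmentation
        }
    }

    static void Main()
    {
        Console.WriteLine(AddressIsStableWhilePinned()); // prints "True"
    }
}
```

Freeing the handle in a finally block keeps the pinned period as short as possible, which matters because long-lived pins are what fragment the heap.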

II. Generational algorithm

A program may use hundreds of MB or even several GB of memory, and running GC over such a large memory area is costly. The generational algorithm rests on a statistical basis and improves GC performance significantly. Objects are divided into young and old according to their lifetime. Based on the statistical distribution of lifetimes, the young and old regions can use different collection strategies and algorithms: collection effort is concentrated on the young region, so that within a short time, over a small memory area, and at low cost, the large number of recently discarded local objects along the execution path are reclaimed promptly. The generational algorithm rests on these assumptions:

1. Most newly created objects have short lifetimes, while older objects tend to live longer;

2. Collecting part of the memory is faster than collecting the whole heap;

3. Newly created objects are usually strongly related to one another. Heap allocation is contiguous, and strong locality increases the CPU cache hit rate.

.NET divides the heap into 3 generations: Gen 0, Gen 1, and Gen 2. There are 3 corresponding kinds of GC, which appear in the performance counters as # Gen 0 Collections, # Gen 1 Collections, and # Gen 2 Collections. If the Gen 0 heap reaches its threshold, a generation 0 GC is triggered, after which the surviving objects in Gen 0 move into Gen 1. If Gen 1 reaches its threshold, a generation 1 GC is triggered; it collects the Gen 0 heap and the Gen 1 heap, and the surviving objects move into Gen 2.

A generation 2 GC collects the Gen 0, Gen 1, and Gen 2 heaps. Gen 0 and Gen 1 are relatively small; together the two generations stay at around 16 MB, whereas the size of Gen 2 is determined by the application and can reach several GB. The cost of generation 0 and generation 1 GCs is therefore very low, while a generation 2 GC, called a full GC, is usually expensive. As a rough estimate, generation 0 and generation 1 GCs should complete in a few milliseconds to a few tens of milliseconds, whereas a full GC may take several seconds when the Gen 2 heap is large. Generally speaking, while a .NET application runs, the frequencies of generation 2, generation 1, and generation 0 GCs should be roughly 1:10:100.
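Promotion between generations can be observed with GC.GetGeneration. The sketch below is an assumption-laden illustration: the GenerationSketch name is hypothetical, and the exact generation numbers reported depend on the runtime and GC configuration, so no fixed output is claimed.

```csharp
using System;

static class GenerationSketch
{
    // Returns the generation of one surviving object before and after two forced collections.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int atStart = GC.GetGeneration(survivor);
        GC.Collect();                         // survivor lives through one collection
        int afterOne = GC.GetGeneration(survivor);
        GC.Collect();                         // and through a second one
        int afterTwo = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor);               // keep the object rooted until this point
        return new[] { atStart, afterOne, afterTwo };
    }

    static void Main()
    {
        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
        Console.WriteLine("Generations seen:   " + string.Join(" -> ", ObserveGenerations()));
        // Collection counts per generation give a feel for the frequency ratio.
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            Console.WriteLine("Gen " + gen + " collections so far: " + GC.CollectionCount(gen));
    }
}
```

On the desktop CLR a small object usually starts in generation 0 and moves up one generation per collection it survives, but this should be treated as typical behavior rather than a guarantee.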

III. Finalization queue and freachable queue

Both queues are related to the Finalize method provided by .NET objects. Neither queue stores the objects themselves; each stores a set of pointers to objects. When a program uses the new operator to allocate space on the managed heap, the GC analyzes the object and, if it has a Finalize method, adds a pointer to the object to the finalization queue.

After a GC starts, the mark phase determines which objects are garbage. The GC then searches the garbage, and if it finds objects there that are pointed to by pointers in the finalization queue, it separates those objects from the garbage and moves the pointers to them into the freachable queue. This process is known as resurrection: an object that was dead has been brought back to life. Why rescue it? Because the object's Finalize method has not yet run, so it cannot be allowed to die. The freachable queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the corresponding object is triggered, after which the pointer is removed from the queue; only then can the object die in peace.

The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former requests that the system run the object's Finalize method; the latter requests that the system not run it. ReRegisterForFinalize actually adds a pointer to the object back into the finalization queue. This leads to an interesting phenomenon: because objects in the finalization queue can be resurrected, calling ReRegisterForFinalize inside an object's Finalize method creates an object on the heap that never dies. Like a phoenix rising from the ashes, it revives every time it dies.
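The effect of SuppressFinalize can be sketched as follows. The Tracked class, its counters, and FinalizationSketch are hypothetical names for this illustration, and whether the ordinary object's finalizer has already run by the time Run returns depends on the runtime, so only the suppressed case is treated as certain.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

sealed class Tracked
{
    public static int Finalized;           // finalizers that ran on ordinary objects
    public static int FinalizedSuppressed; // finalizers that ran despite SuppressFinalize (should stay 0)
    readonly bool suppressed;

    public Tracked(bool suppress)
    {
        suppressed = suppress;
        if (suppress) GC.SuppressFinalize(this); // ask the GC to skip this object's finalizer
    }

    ~Tracked()
    {
        if (suppressed) Interlocked.Increment(ref FinalizedSuppressed);
        else Interlocked.Increment(ref Finalized);
    }
}

static class FinalizationSketch
{
    // Allocates in a separate, non-inlined method so that no local variable keeps the objects alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate()
    {
        new Tracked(false); // finalizer remains registered
        new Tracked(true);  // finalizer suppressed
    }

    public static void Run()
    {
        Allocate();
        GC.Collect();                  // both objects are now garbage
        GC.WaitForPendingFinalizers(); // let the finalizer thread process queued finalizers
    }

    static void Main()
    {
        Run();
        Console.WriteLine("Ordinary finalizers run:   " + Tracked.Finalized);
        Console.WriteLine("Suppressed finalizers run: " + Tracked.FinalizedSuppressed);
    }
}
```

The ordinary object needs at least one more collection after its finalizer runs before its memory is reclaimed, whereas the suppressed object can be reclaimed right away.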

Managed Resources:

All types in .NET derive (directly or indirectly) from the System.Object type.

The types in the CTS fall into two main categories: reference types (also called managed types), which are allocated on the heap, and value types, which are typically allocated on the stack.

Value types live on the stack, which is first in, last out, so the lifetimes of value-type variables are strictly ordered; this guarantees that a value-type variable releases its resources before it leaves scope. This is simpler and more efficient than for reference types. The stack allocates memory from high addresses toward low addresses.

Reference types are allocated on the managed heap. Declaring a variable reserves a slot on the stack, and when new is used to create the object, the object's address is stored in that variable. The managed heap, in contrast to the stack, allocates memory from low addresses toward high addresses.
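The practical consequence of "a variable holds the value itself" versus "a variable holds the object's address" shows up on assignment. A small sketch, with PointValue, PointRef, and TypeKindSketch as made-up names:

```csharp
using System;

struct PointValue { public int X; } // value type: the variable holds the data itself
class PointRef { public int X; }    // reference type: the variable holds the object's address

static class TypeKindSketch
{
    public static int[] Run()
    {
        var v1 = new PointValue { X = 1 };
        var v2 = v1;  // copies the whole value
        v2.X = 2;     // changes only the copy

        var r1 = new PointRef { X = 1 };
        var r2 = r1;  // copies only the reference; both variables point at one heap object
        r2.X = 2;     // visible through r1 as well

        return new[] { v1.X, r1.X };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints "1,2"
    }
}
```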

More than 80% of the resources in .NET are managed resources.

Unmanaged Resources: 

Resources such as ApplicationContext, Brush, Component, ComponentDesigner, Container, Context, Cursor, FileStream, Font, Icon, Image, Matrix, Object, OdbcDataReader, OleDbDataReader, Pen, Regex, Socket, StreamWriter, Timer, Tooltip, file handles, GDI resources, database connections, and so on. You may have used many of these without noticing!

.NET's GC mechanism has two problems:

First, the GC is not able to release all resources. It does not automatically release unmanaged resources.

Second, the GC is not real-time, which will cause bottlenecks and uncertainties in system performance.

This is why the IDisposable interface exists. It defines the Dispose method, which programmers call explicitly to release unmanaged resources. The using statement simplifies this resource management.

Example:

// Requires: using System.Data.SqlClient;

/// <summary>
/// Executes a SQL statement and returns the number of records affected.
/// </summary>
/// <param name="SQLString">SQL statement</param>
/// <returns>Number of records affected</returns>
public static int ExecuteSQL(string SQLString)
{
    // connectionString is assumed to be a static field defined elsewhere in the class.
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlCommand cmd = new SqlCommand(SQLString, connection))
        {
            try
            {
                connection.Open();
                int rows = cmd.ExecuteNonQuery();
                return rows;
            }
            catch (System.Data.SqlClient.SqlException)
            {
                connection.Close();
                throw; // rethrow without resetting the stack trace
            }
            finally
            {
                // Redundant but harmless: the using blocks dispose both objects anyway.
                cmd.Dispose();
                connection.Close();
            }
        }
    }
}
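To see why the using statement is enough on its own, it helps to watch the order of events when an exception is thrown inside the block. The sketch below uses a hypothetical LoggingResource class in place of a real connection, so it runs without a database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a real resource such as a database connection.
sealed class LoggingResource : IDisposable
{
    readonly List<string> log;
    public LoggingResource(List<string> log) { this.log = log; log.Add("open"); }
    public void Dispose() { log.Add("dispose"); }
}

static class UsingSketch
{
    // Records the order in which events happen when the using body throws.
    public static List<string> Run()
    {
        var log = new List<string>();
        try
        {
            using (var resource = new LoggingResource(log))
            {
                log.Add("work");
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints "open,work,dispose,caught"
    }
}
```

Dispose runs before the exception reaches the outer catch block, because the compiler expands using into a try/finally around the body.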

When you release unmanaged resources in the Dispose method, you should call GC.SuppressFinalize. If the object is in the finalization queue, GC.SuppressFinalize prevents the GC from calling its Finalize method, because calling Finalize costs some performance. If your Dispose method has already cleaned up the object's resources, there is no need for the GC to call the object's Finalize method again (MSDN). The MSDN sample code is attached below for reference.

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public class BaseResource : IDisposable
{
    // Handle to an external unmanaged resource.
    private IntPtr handle;
    // Other managed resource used by this class.
    private Component components;
    // Tracks whether Dispose has been called; controls the cleanup behavior.
    private bool disposed = false;

    // Constructor.
    public BaseResource()
    {
        // Insert appropriate constructor code here.
    }

    // Implements IDisposable.
    // Do not declare this method virtual;
    // subclasses must not be able to override it.
    public void Dispose()
    {
        Dispose(true);
        // Take this object off the finalization queue
        // so that its finalizer code does not run.
        GC.SuppressFinalize(this);
    }

    // Dispose(bool disposing) runs in two distinct scenarios.
    // If disposing is true, the method was called directly or indirectly
    // by user code; both managed and unmanaged resources can be released.
    // If disposing is false, the method was called by the finalizer;
    // do not reference other objects, and release only unmanaged resources.
    protected virtual void Dispose(bool disposing)
    {
        // Check whether Dispose has already been called.
        if (!this.disposed)
        {
            // If disposing is true, release all managed and unmanaged resources.
            if (disposing)
            {
                // Release the managed resources.
                if (components != null)
                {
                    components.Dispose();
                }
            }
            // Release the unmanaged resources. If disposing is false,
            // only the following code is executed.
            CloseHandle(handle);
            handle = IntPtr.Zero;
            // Note that this is not thread-safe.
            // Another thread could start disposing the object
            // after the managed resources are released
            // but before the disposed flag is set to true.
            // If thread safety is required, the client must implement it.
        }
        disposed = true;
    }

    // Use interop to call the method
    // that cleans up the unmanaged resource.
    [DllImport("Kernel32")]
    private extern static Boolean CloseHandle(IntPtr handle);

    // Use the C# destructor syntax for the finalizer code.
    // It runs only if the Dispose method was not called.
    // It gives the base class the opportunity to finalize.
    // Do not provide destructors in subclasses.
    ~BaseResource()
    {
        // Do not duplicate the cleanup code here.
        // Calling Dispose(false) is best for reliability and maintainability.
        Dispose(false);
    }

    // Dispose may be called multiple times,
    // but other methods throw an exception if the object has already been disposed.
    // Whenever you operate on the object,
    // check whether it has been disposed.
    public void DoSomething()
    {
        if (this.disposed)
        {
            // ObjectDisposedException requires the object name as an argument.
            throw new ObjectDisposedException(GetType().Name);
        }
    }

    // Do not make this method virtual;
    // inheriting classes must not override it.
    public void Close()
    {
        // Call the parameterless Dispose method.
        Dispose();
    }

    public static void Main()
    {
        // Insert code here to create
        // and use a BaseResource object.
    }
}

GC.Collect() method

Purpose: forces a garbage collection.

Overloads of GC.Collect (name: description):

Collect(): Forces an immediate garbage collection of all generations.

Collect(Int32): Forces an immediate garbage collection from generation 0 through the specified generation.

Collect(Int32, GCCollectionMode): Forces a garbage collection from generation 0 through the specified generation, at a time specified by a GCCollectionMode value.
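The three overloads can be tried side by side. This is a rough sketch with a made-up CollectSketch class; it only counts how many generation 0 collections happened, and the exact number may vary because the Optimized mode lets the runtime decide whether to collect at all.

```csharp
using System;

static class CollectSketch
{
    // Calls each overload and returns how many generation 0 collections occurred meanwhile.
    public static int Gen0CollectionsTriggered()
    {
        int before = GC.CollectionCount(0);

        GC.Collect();                                             // all generations, immediately
        GC.Collect(0);                                            // generation 0 only
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);    // generation 0 through the highest, immediately
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Optimized); // the runtime may decide to skip this one

        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine("Gen 0 collections triggered: " + Gen0CollectionsTriggered());
    }
}
```

Forcing collections like this is useful for experiments and tests; in production code it usually works against the GC's own tuning, so it is best used sparingly.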

GC Considerations:

1. The GC manages only memory; unmanaged resources such as file handles, GDI resources, and database connections still need to be managed by the user.

2. Circular references and the implementation of network (graph) structures become simple. The GC's mark-compact algorithm detects these relationships effectively and removes entire graph structures that are no longer referenced.

3. The GC determines whether an object can be reached from other objects by traversing from the program's root objects, rather than by using reference counting as COM does.

4. The GC can run on a separate thread (for example in server or background GC modes) to reclaim memory that is no longer referenced.

5. The GC may compact the managed heap each time it runs.

6. You are responsible for releasing unmanaged resources. You can make sure resources are released by defining a finalizer in the type.

7. An object's finalizer runs at an indeterminate time after the object is no longer referenced. Note that, unlike in C++, the destructor does not run immediately when the object goes out of scope.

8. Using finalizers has a performance cost. Objects that need finalization are not removed immediately; their finalizers must run first, and they do not run on the thread performing the GC. The GC puts each object that needs finalization into a queue, and a separate thread runs all of these finalizers while the GC thread continues to remove the other objects to be reclaimed. In the next GC cycle, the memory of the objects whose finalizers have finished is reclaimed.

9. The .NET GC uses the concept of generations to optimize performance. Generations help the GC identify more quickly the objects most likely to be garbage. Objects created since the last garbage collection are generation 0 objects. Objects that have survived one GC cycle are generation 1 objects. Objects that have survived two or more GC cycles are generation 2 objects. The purpose of generations is to distinguish local variables from objects that need to live for the lifetime of the application. Most generation 0 objects are local variables. Member variables and global variables quickly become generation 1 objects and eventually generation 2 objects.

10. The GC applies different checking policies to different generations to optimize performance. Generation 0 objects are checked in every GC cycle. About 1/10 of GC cycles check generation 0 and generation 1 objects. About 1/100 of GC cycles check all objects. Reconsider the cost of finalization: an object that needs finalization may stay in memory for 9 more GC cycles than one that does not. If it has still not been finalized by then, it becomes a generation 2 object and stays in memory even longer.
