Understanding .NET CLR Garbage Collection (Learn .NET Well series)

Source: Internet
Author: User

Introduction

Memory management is a complex and interesting area of computer science. In the decades since computers were invented, memory management techniques have advanced steadily, allowing systems to use memory, one of a computer's essential resources, more efficiently.

In general, memory management falls into three categories: hardware management (such as the TLB), operating system management (such as the buddy system, paging, and segmentation), and application management (such as the memory management mechanisms of C++, Java, and .NET). Given limited space and the limits of the author's knowledge, this article covers only a small part of the field: memory management in .NET. .NET is a modern application framework that manages memory automatically, a technique commonly known as garbage collection (hereinafter GC), which makes it a representative subject for analysis.

The history and benefits of GC

Although this article uses .NET to explain GC, the concept is far from new. As early as 1958, Lisp, implemented by the Turing Award winner John McCarthy, already provided GC; this was its first appearance. Lisp programmers held that memory management is too important to be left to the programmer. Lisp did not stay in the mainstream, however, and languages with manual memory management, with C as their representative, gained the upper hand. From the same premise C programmers drew the opposite conclusion: memory management is too important to be left to the system, and they ridiculed Lisp programs for running at a turtle's pace. Indeed, in an age when every byte was carefully accounted for, the speed of GC and its heavy use of system resources were unacceptable to many. Later, the Smalltalk system developed by Dave Ungar was the first to adopt generational garbage collection (discussed later), but Smalltalk was never widely used.

It was not until the mid-1990s that GC took center stage, thanks to Java; much of GC's current standing is owed to Java's progress. Java runs programs on a virtual machine (VM), and the VM is also responsible for GC. Around the turn of the century .NET appeared and took an approach similar to Java's, with programs managed by the CLR (Common Language Runtime). The rise of these two camps ushered in an era of development on virtual platforms, and GC has drawn more and more attention since.

Why use GC? In other words, why manage memory automatically? There are several reasons:

• It raises the level of abstraction of software development;

• Programmers can focus on the real problem without being distracted by memory management;

• It makes module interfaces clearer and reduces coupling between modules;

• It greatly reduces errors caused by improper memory management;

• It makes memory management more efficient.

In general, GC frees programmers from complex memory problems, which improves the speed, quality, and safety of software development.

What is a GC

GC stands for garbage collection; the garbage here is, of course, just memory. The garbage collector (also abbreviated GC where no confusion arises) starts from the application's roots [1] and traverses the objects that the application has dynamically allocated on the heap [2], deciding which objects are dead and which are still needed by checking whether they are referenced. Objects that are no longer referenced, directly or indirectly, from the application's roots are dead objects, known as garbage, and need to be reclaimed. That is the principle behind GC. Several algorithms implement it; the more common ones are Reference Counting, Mark Sweep, and Copy Collection. The current mainstream virtual systems, the .NET CLR, the Java VM, and Rotor, all use the Mark Sweep algorithm. Because this article is based on .NET, only the Mark Sweep algorithm is described.

Related GC Algorithms

Mark Sweep

As the program runs, heap space is continually allocated to objects. When so much of the heap is occupied that the next object cannot be allocated, the Mark Sweep algorithm is triggered: garbage memory is reclaimed and returned to the free list [3].

As its name suggests, Mark Sweep runs in two phases, the mark phase and the sweep phase. The mark phase starts from the roots, follows the references between objects through the heap, and marks every object referenced by a root or by another marked object. Objects left unmarked are garbage. The sweep phase then reclaims all of the garbage.

The Mark Sweep algorithm is faster than Reference Counting and avoids the memory leaks caused by circular references. It also has drawbacks. First, it has to visit every object in the heap (live objects are traversed in the mark phase and dead objects in the sweep phase), so its speed is not ideal. Second, reclaiming garbage in place can leave a lot of memory fragmentation.
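The two phases can be illustrated with a toy model. The following C# sketch is not how the CLR is implemented; the Node and ToyMarkSweep types are hypothetical and the "heap" is just a list. It only shows that marking from the roots, then sweeping everything unmarked, also reclaims a cycle that reference counting would leak.

```csharp
using System;
using System.Collections.Generic;

// Toy heap object (hypothetical type): a name plus outgoing references.
class Node
{
    public string Name;
    public bool Marked;
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class ToyMarkSweep
{
    // Mark phase: flag everything reachable from the roots.
    static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Marked) continue;          // already visited (handles cycles)
            current.Marked = true;
            foreach (Node child in current.References) pending.Push(child);
        }
    }

    // Sweep phase: walk the whole heap, keep marked objects (clearing the
    // mark for the next cycle) and reclaim the unmarked ones.
    static List<string> Sweep(List<Node> heap)
    {
        var reclaimed = new List<string>();
        var survivors = new List<Node>();
        foreach (Node node in heap)
        {
            if (node.Marked) { node.Marked = false; survivors.Add(node); }
            else reclaimed.Add(node.Name);
        }
        heap.Clear();
        heap.AddRange(survivors);
        return reclaimed;
    }

    public static List<string> Collect(List<Node> heap, IEnumerable<Node> roots)
    {
        Mark(roots);
        return Sweep(heap);
    }

    // Builds a small heap: root -> a -> b, plus an unreachable cycle c <-> d.
    public static List<string> Demo()
    {
        var root = new Node("root");
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");
        var d = new Node("d");
        root.References.Add(a);
        a.References.Add(b);
        c.References.Add(d);
        d.References.Add(c);
        var heap = new List<Node> { root, a, b, c, d };
        return Collect(heap, new[] { root });
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Demo()));   // prints "c, d"
    }
}
```

Note that the sweep loop visits every object in the toy heap, live or dead, which is the cost the paragraph above describes.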

Put simply, the .NET GC algorithm can be regarded as a mark-compact algorithm.

Phase 1, mark-sweep: first assume that every object in the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them; in the end, the unmarked objects in the heap can be reclaimed.

Phase 2, compact: after objects are reclaimed, the free space in the heap is no longer contiguous, so the surviving objects are moved and laid out again starting from the heap's base address, much like defragmenting a disk. Once the heap has been reclaimed and compacted, the earlier allocation scheme can continue to be used, in which a single pointer records the start address of the next allocation.

Main processing steps: suspend threads → determine roots → build the graph of reachable objects → reclaim objects → compact the heap → fix up pointers.

Roots: the reference relationships between objects in the heap are intricate (cross-references, circular references) and form a complex graph. Roots are the various entry points into that graph that the CLR can find outside the heap. The places the GC searches for roots include global objects, static variables, local objects, function call parameters, object pointers in the current CPU registers, and the finalization queue. They fall into two kinds: static variables that have been initialized, and objects still in use by threads (stack plus CPU registers).

Reachable objects: objects that can be reached from the roots by following object references. For example, if local variable object A of the currently executing function is a root and one of its member variables refers to object B, then B is reachable. The graph of reachable objects can be built from the roots; the remaining objects are unreachable and can be reclaimed.

To address these two problems, the Mark Sweep algorithm has been improved. The first improvement adds a compact phase: mark the surviving objects, move them so that they are contiguous in memory, and finally update the addresses that refer to them and the free list. This is the Mark Compact algorithm, which solves the fragmentation problem. To improve speed, the concept of generations was introduced.
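The compact step can be sketched in the same toy style. In the C# sketch below, the Block and ToyCompactor types, the addresses, and the sizes are all made up for illustration, and the Live flag is assumed to have been set by an earlier mark phase. Live blocks are slid toward the heap base, and a forwarding table records old and new addresses so that references could be fixed up afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical heap block: start address, size in bytes, and a liveness
// flag that an earlier mark phase is assumed to have set.
class Block
{
    public string Name;
    public int Address;
    public int Size;
    public bool Live;
    public Block(string name, int address, int size, bool live)
    {
        Name = name; Address = address; Size = size; Live = live;
    }
}

static class ToyCompactor
{
    // Slides live blocks toward the heap base and returns the first free
    // address after compaction. 'forwarding' maps each live block's old
    // address to its new one, which is what a pointer fix-up pass needs.
    public static int Compact(List<Block> heap, int heapBase, Dictionary<int, int> forwarding)
    {
        int next = heapBase;
        var survivors = new List<Block>();
        foreach (Block block in heap)          // heap is assumed sorted by address
        {
            if (!block.Live) continue;         // garbage: its space is reused
            forwarding[block.Address] = next;
            block.Address = next;
            next += block.Size;
            survivors.Add(block);
        }
        heap.Clear();
        heap.AddRange(survivors);
        return next;
    }

    public static int Demo(Dictionary<int, int> forwarding)
    {
        var heap = new List<Block>
        {
            new Block("a", 0, 16, true),
            new Block("b", 16, 32, false),
            new Block("c", 48, 8, true),
            new Block("d", 56, 24, false),
            new Block("e", 80, 16, true),
        };
        return Compact(heap, 0, forwarding);
    }

    static void Main()
    {
        var forwarding = new Dictionary<int, int>();
        int nextFree = Demo(forwarding);
        Console.WriteLine($"c moved from 48 to {forwarding[48]}");   // 16
        Console.WriteLine($"e moved from 80 to {forwarding[80]}");   // 24
        Console.WriteLine($"next free address: {nextFree}");         // 40
    }
}
```

After compaction the free space is a single contiguous region starting at the returned address, so allocation can go back to simply advancing one pointer.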

Generation

A generational garbage collector (also known as an ephemeral garbage collector) is based on the following assumptions:

• The younger an object is, the shorter its lifetime;

• The older an object is, the longer its lifetime;

• Young objects tend to be strongly related to one another and are accessed frequently;

• Collecting and compacting a portion of the heap is faster than collecting the entire heap.

The idea of generations is to manage the objects in the heap by age: the heap is divided into chunks, and the objects in each chunk have a different expected lifetime. An object is first allocated in generation 0, and when generation 0 runs out of space the Mark Compact algorithm is started. If the object is still alive after several GCs, it is moved to generation 1. Likewise, if it survives several more GCs, it is moved to generation 2, and so on until it reaches the highest generation, where it stays until it is finally reclaimed or dies with the program. The biggest benefit of generations is that a GC does not have to process the entire heap each time; it processes one small chunk at a time. Objects in generation 0 are the most likely to die, so generation 0 can be collected more often, while generations whose objects are less likely to die can be collected less often. This improves GC speed to a certain extent. It also raises several questions: how many generations there should be, how large each generation should be, and how many GCs an object should survive before it is promoted. An example later in this article tests how the .NET CLR handles these questions.

Related Data Structures

Three data structures are associated with the .NET GC: the managed heap, the finalization queue, and the freachable queue.

Managed Heap

The managed heap is a simple, optimized heap that differs from the traditional C runtime heap. Its management is kept simple to make the heap efficient, and it rests on a simple (and impossible) assumption: that memory is infinite. The managed heap keeps a pointer called NextObjPtr, which marks the address just past the last object on the heap, that is, where the next object will be placed. Allocating a new object only requires adding the new object's size to NextObjPtr to form the new NextObjPtr. This is a single addition. When the value of NextObjPtr would pass the boundary of the managed heap, the heap is full and the GC is started.
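The pointer-bump allocation described above can be modelled in a few lines. This C# sketch is a toy and not the CLR's allocator: the ToyManagedHeap class, its fixed 64-byte size, and the -1 return value that stands in for "start a GC" are all assumptions made for illustration.

```csharp
using System;

// Toy model of pointer-bump allocation on the managed heap.
class ToyManagedHeap
{
    private readonly int limit;   // heap boundary (hypothetical fixed size)
    private int nextObjPtr;       // where the next object will be placed

    public ToyManagedHeap(int sizeInBytes)
    {
        limit = sizeInBytes;
        nextObjPtr = 0;
    }

    public int NextObjPtr => nextObjPtr;

    // Returns the address of the new object, or -1 when the heap is full,
    // which is the point at which a real collector would start a GC.
    public int Allocate(int objectSize)
    {
        if (objectSize <= 0) throw new ArgumentOutOfRangeException(nameof(objectSize));
        if (nextObjPtr + objectSize > limit) return -1;
        int address = nextObjPtr;
        nextObjPtr += objectSize;   // the whole allocation is this one addition
        return address;
    }
}

static class ToyHeapDemo
{
    static void Main()
    {
        var heap = new ToyManagedHeap(64);
        Console.WriteLine(heap.Allocate(24));   // 0
        Console.WriteLine(heap.Allocate(24));   // 24
        Console.WriteLine(heap.Allocate(24));   // -1: would pass the boundary
        Console.WriteLine(heap.NextObjPtr);     // 48
    }
}
```

Because allocation never searches for a free block, it stays cheap, but it only works if a compacting collector keeps the free space contiguous.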

Finalization queue and freachable queue

These two queues are related to the Finalize [4] method provided by .NET objects. They do not store the objects themselves but sets of pointers to objects. When the new operator is used to allocate an object on the managed heap, the runtime checks its type and, if the object has a Finalize method, adds a pointer to the object to the finalization queue. When a GC starts, the mark phase determines which objects are garbage. The GC then searches the garbage, and if it finds objects that are pointed to from the finalization queue, it separates them from the garbage and moves their pointers into the freachable queue. This process is known as resurrection: a dead object is brought back to life. Why rescue it? Because its Finalize method has not yet been executed, so it cannot be allowed to die. The freachable queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the corresponding object is executed and the pointer is then removed from the queue; only then can the object die in peace.

The System.GC class of the .NET Framework provides two methods for controlling finalization, ReRegisterForFinalize and SuppressFinalize. The former requests that the system call the object's Finalize method; the latter requests that the system not call it. ReRegisterForFinalize in effect adds a pointer to the object back to the finalization queue. This leads to an interesting phenomenon: because objects in the finalization queue can be resurrected, calling ReRegisterForFinalize inside an object's Finalize method creates an object on the heap that never dies. Like a phoenix reborn from the ashes, it revives every time it dies.
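A small experiment can make the effect of SuppressFinalize visible. The sketch below uses only real System.GC calls, but the Tracked and FinalizeDemo types are hypothetical, and when (or whether) a finalizer runs depends on the runtime and build settings, so the first counter is usually 1 but is not guaranteed. What can be relied on is that the finalizer of the suppressed object is not called.

```csharp
using System;
using System.Runtime.CompilerServices;

class Tracked
{
    public static int FinalizedCount;
    public static int SuppressedFinalizedCount;
    private readonly bool suppressed;

    public Tracked(bool suppress)
    {
        suppressed = suppress;
        // Ask the runtime not to call this object's finalizer.
        if (suppress) GC.SuppressFinalize(this);
    }

    // C# finalizer syntax; the compiler turns it into an override of Finalize.
    ~Tracked()
    {
        if (suppressed) SuppressedFinalizedCount++;
        else FinalizedCount++;
    }
}

static class FinalizeDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in Run keeps the objects alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateGarbage()
    {
        new Tracked(suppress: false);
        new Tracked(suppress: true);
    }

    public static void Run()
    {
        AllocateGarbage();
        GC.Collect();                    // finalizable garbage goes to the freachable queue
        GC.WaitForPendingFinalizers();   // block until the queued finalizers have run
        GC.Collect();                    // the finalized objects can now be reclaimed
    }

    static void Main()
    {
        Run();
        Console.WriteLine($"finalized: {Tracked.FinalizedCount}");
        Console.WriteLine($"finalized despite SuppressFinalize: {Tracked.SuppressedFinalizedCount}");
    }
}
```

The second GC.Collect() call reflects the resurrection described above: an object with a pending finalizer survives the collection that discovers it and can only be reclaimed by a later one.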

Managed Resources:

All types in .NET derive (directly or indirectly) from the System.Object type.

Types in the CTS fall into two main categories: reference types (also called managed types), which are allocated on the managed heap, and value types, which are typically allocated on the stack.

Value types live on the stack, which is last in, first out, so the lifetimes of value type variables are ordered. This ensures that a value type variable releases its resources when it goes out of scope, which is simpler and more efficient than for reference types. The stack allocates memory from high addresses to low addresses.

Reference types are allocated on the managed heap. Declaring a variable stores it on the stack, and when the object is created with new, the object's address is stored in that variable. The managed heap, in contrast, allocates memory from low addresses to high addresses.

More than 80% of the resources in .NET are managed resources.

Unmanaged Resources: 

Resources such as ApplicationContext, Brush, Component, ComponentDesigner, Container, Context, Cursor, FileStream, Font, Icon, Image, Matrix, Object, OdbcDataReader, OleDbDataReader, Pen, Regex, Socket, StreamWriter, Timer, Tooltip, file handles, GDI resources, database connections, and so on. You may have used many of them without noticing.

.NET's GC mechanism has two problems:

First, the GC cannot release all resources. It does not automatically release unmanaged resources.

Second, the GC is not real-time, which can cause bottlenecks and uncertainty in system performance.

This is why the IDisposable interface exists. IDisposable defines the Dispose method, which programmers call explicitly to release unmanaged resources. The using statement simplifies this kind of resource management.
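As a minimal sketch of this pattern, the hypothetical LogFile class below wraps a StreamWriter (and therefore a file handle) and implements IDisposable. The class name, the temporary file, and the IsDisposed property are assumptions made for illustration; IDisposable, the using statement, and the System.IO calls are the real APIs.

```csharp
using System;
using System.IO;

// Hypothetical wrapper around a resource that the GC cannot release promptly.
class LogFile : IDisposable
{
    private readonly StreamWriter writer;
    public bool IsDisposed { get; private set; }

    public LogFile(string path)
    {
        writer = new StreamWriter(path);
    }

    public void Write(string message)
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(LogFile));
        writer.WriteLine(message);
    }

    public void Dispose()
    {
        if (IsDisposed) return;   // Dispose should be safe to call more than once
        writer.Dispose();         // flushes and releases the file handle now
        IsDisposed = true;
    }
}

static class DisposeDemo
{
    public static LogFile WriteOnce(string path)
    {
        var log = new LogFile(path);
        using (log)
        {
            log.Write("hello");
        }   // Dispose() is called here, even if Write throws
        return log;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        LogFile log = WriteOnce(path);
        Console.WriteLine(log.IsDisposed);                  // True
        Console.WriteLine(File.ReadAllText(path).Trim());   // hello
        File.Delete(path);
    }
}
```

The file handle is released at the closing brace of the using block, at a known point in the program, instead of at whatever later time the GC and the finalizer thread get to it.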

Direct Control of the GC

The System.GC class of the .NET Framework provides several ways to control the GC directly. The System.Runtime.InteropServices.GCHandle class provides a way to access managed objects from unmanaged memory (not discussed here). The following example controls the GC directly through System.GC.

    using System;

    namespace GCTest
    {
        class GCDemo
        {
            private static void GenerationDemo()
            {
                // Let's see how many generations the GC supports (we know it's 2)
                Console.WriteLine("Maximum GC generations: {0}", GC.MaxGeneration);

                // Create a new GenObj in the heap
                GenObj obj = new GenObj("Generation");

                // Since this object is newly created, it should be in generation 0
                obj.DisplayGeneration();    // Displays 0

                for (int i = 1; i <= GC.MaxGeneration; i++)
                {
                    // Performing a garbage collection promotes the object's generation
                    GC.Collect();
                    obj.DisplayGeneration();    // Displays i
                }

                obj = null;    // Destroy the strong reference to this object

                for (int i = 0; i <= GC.MaxGeneration; i++)
                {
                    GC.Collect(i);
                    // Suspend this thread until the freachable queue of
                    // the i generation has been emptied
                    GC.WaitForPendingFinalizers();
                    // Only when i = GC.MaxGeneration, the finalization method
                    // of obj would be performed
                }

                Console.WriteLine("Demo stop: understanding generations.");

                // Total GC times
                // generation 0: 5 times
                // generation 1: 4 times
                // generation 2: 3 times
            }

            public static void Main()
            {
                GenerationDemo();
            }
        }

        class GenObj
        {
            private string objName;

            public GenObj(string name)
            {
                this.objName = name;
            }

            public void DisplayGeneration()
            {
                Console.WriteLine("I am in generation {0}", GC.GetGeneration(this));
            }
        }
    }

This is an interesting example. First, GC.MaxGeneration shows that the GC in the .NET CLR uses a three-generation structure, generations 0 to 2. Next, a GenObj instance, obj, is allocated on the managed heap. At first obj is in generation 0; then the entire managed heap is collected twice. As the output shows, an object that survives a GC is promoted one generation each time until it reaches generation 2. Setting obj = null removes the strong reference from the root and turns obj into garbage. GC.Collect(i) then collects generations 0 through i of the managed heap. GC.WaitForPendingFinalizers() suspends the current thread until the Finalize methods of the objects pointed to from the freachable queue have been called. Its purpose is to guarantee that the garbage identified by this GC is fully reclaimed rather than being kept alive because of an object's Finalize method.

Some of what this example shows gives a direct view of how the .NET CLR handles GC. For more specific data, readers can examine .NET applications with the Performance Monitor (Perfmon.exe) provided by Windows.

Also worth mentioning is how the GC handles large objects. It treats them much as described above, except that it does not perform the compact step, because moving a large object in memory would clearly hurt system performance.
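The "Total GC times" comment at the end of the example can be checked with GC.CollectionCount, which reports how many collections of a given generation have occurred so far. The sketch below repeats the example's collection pattern and measures the difference; the CollectionCountDemo class and its helper are hypothetical names. The runtime may trigger collections of its own, so the counts are lower bounds rather than exact figures.

```csharp
using System;

static class CollectionCountDemo
{
    // Returns how many collections of each generation happened while
    // 'action' ran. Collections triggered by the runtime itself are
    // counted as well.
    public static int[] CountCollections(Action action)
    {
        int generations = GC.MaxGeneration + 1;
        var before = new int[generations];
        for (int g = 0; g < generations; g++) before[g] = GC.CollectionCount(g);

        action();

        var delta = new int[generations];
        for (int g = 0; g < generations; g++) delta[g] = GC.CollectionCount(g) - before[g];
        return delta;
    }

    public static int[] Run()
    {
        return CountCollections(() =>
        {
            // Same collection pattern as the example above, without the object.
            for (int i = 1; i <= GC.MaxGeneration; i++) GC.Collect();
            for (int i = 0; i <= GC.MaxGeneration; i++) GC.Collect(i);
        });
    }

    static void Main()
    {
        int[] delta = Run();
        for (int g = 0; g < delta.Length; g++)
            Console.WriteLine($"generation {g}: {delta[g]} collections");
    }
}
```

Collecting generation i also collects every younger generation, which is why the count for generation 0 is never smaller than the count for generation 2.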

GC.Collect() method

Purpose: forces a garbage collection.

Overloads of GC.Collect (name, then description):

• Collect(): forces an immediate garbage collection of all generations.

• Collect(Int32): forces an immediate garbage collection from generation 0 through the specified generation.

• Collect(Int32, GCCollectionMode): forces a garbage collection from generation 0 through the specified generation, at a time specified by the GCCollectionMode value.
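The three overloads can be called as follows. This is a small sketch with a hypothetical CollectOverloadsDemo class; GCCollectionMode.Forced and GCCollectionMode.Optimized are real enum members, and with Optimized the runtime is free to decide that no collection is needed.

```csharp
using System;

static class CollectOverloadsDemo
{
    // Calls each overload and returns how many generation 0 collections
    // were observed while doing so.
    public static int Run()
    {
        int before = GC.CollectionCount(0);

        GC.Collect();                              // all generations, immediately
        GC.Collect(0);                             // generation 0 only
        GC.Collect(1, GCCollectionMode.Forced);    // generations 0 to 1, immediately
        // Optimized lets the runtime decide whether collecting now is
        // worthwhile, so this call may or may not result in a collection.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Optimized);

        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine($"generation 0 collections observed: {Run()}");
    }
}
```

In application code, forcing collections like this is rarely needed; the calls are mainly useful in tests and demonstrations such as the ones in this article.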

GC Considerations:

1. The GC manages only memory. Unmanaged resources, such as file handles, GDI resources, and database connections, still have to be managed by the user.

2. Circular references, and with them network-like structures, become simple to implement. The GC's mark-compact algorithm detects these relationships effectively and removes an entire mesh of objects once it is no longer referenced.

3. The GC determines whether an object is reachable from other objects by starting at the program's root objects, rather than by using reference counting as COM does.

4. The GC runs in a separate thread to remove memory that is no longer referenced.

5. The GC compacts the managed heap when it runs (large objects excepted, as described above).

6. You are responsible for releasing unmanaged resources. Defining a finalizer in the type ensures that they are eventually freed.

7. An object's finalizer is executed at an indeterminate time after the object is no longer referenced. Note that, unlike a C++ destructor, it is not executed as soon as the object goes out of scope.

8. Finalizers have a performance cost. Objects that need finalization are not removed immediately; their finalizers must be executed first, and not on the thread that is performing the GC. The GC places each object that needs finalization in a queue and a separate thread executes all of the finalizers, while the GC thread goes on to remove the other objects to be reclaimed. The memory of the finalized objects is reclaimed in the next GC cycle.

9. The .NET GC uses the concept of generations to optimize performance. Generations help the GC identify more quickly the objects that are most likely to be garbage. Objects created since the last garbage collection are generation 0 objects. Objects that have survived one GC cycle are generation 1 objects. Objects that have survived two or more GC cycles are generation 2 objects. The role of generations is to distinguish local variables from objects that need to live for the lifetime of the application. Most generation 0 objects are local variables. Member variables and global variables quickly become generation 1 objects and eventually generation 2 objects.

10. The GC applies a different checking policy to each generation to optimize performance. Generation 0 objects are checked in every GC cycle. About 1 in 10 GC cycles checks generation 0 and generation 1 objects. About 1 in 100 GC cycles checks all objects. Reconsider the cost of finalization in this light: an object that needs finalization may stay in memory for 9 more GC cycles than one that does not. If it has still not been finalized by then, it becomes a generation 2 object and stays in memory even longer.
