.NET garbage collection: principles and algorithms (I)

With the garbage collector in the Microsoft .NET CLR, programmers do not need to decide when to release memory. Releasing memory is handled entirely by the GC and is transparent to the programmer. Even so, a .NET programmer should understand how garbage collection works. In this article, we look at how .NET allocates and manages managed memory, and then describe the algorithm the garbage collector uses.

Designing an appropriate memory management policy for a program is difficult and tedious, and it distracts you from the problem the program itself is meant to solve. Is there a built-in mechanism that can solve the memory management problem for developers? Of course: in .NET, it is the GC, the garbage collector.

Every program uses resources of one kind or another, such as memory, screen display, network connections, and database resources. In an object-oriented environment, each type needs some memory to store its data. An object uses memory in the following steps:
1. Allocate memory for the type.
2. Initialize the memory so that the object is usable.
3. Access the object's members.
4. Tear down the object's state to clean up.
5. Release the memory.

This seemingly simple pattern is the source of many program bugs. Sometimes programmers forget to release objects that are no longer in use, and sometimes they try to access objects that have already been released. These two kinds of bugs are usually well hidden and hard to find. Unlike logic errors, they cannot simply be fixed once noticed: a program may crash unexpectedly because of a memory leak only after it has run for some time. There are tools that can help developers detect memory problems, such as Task Manager, the System Monitor ActiveX Control, and Rational Purify.

The GC frees developers from deciding when to release memory. However, the garbage collector does not know how to manage every kind of resource. For some resources it has no idea how to reclaim them, so developers must write their own cleanup code. In the .NET Framework, developers usually put this code in a Close, Dispose, or Finalize method. Later, we will look at the Finalize method, which the garbage collector calls automatically.
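
As an illustration of explicit cleanup through Dispose, here is a minimal sketch. The LogFile class and its IsOpen property are hypothetical names invented for this example; only IDisposable and the using statement come from the .NET Framework itself.

```csharp
using System;

// Hypothetical wrapper around a resource the GC cannot reclaim by itself.
class LogFile : IDisposable
{
    public bool IsOpen { get; private set; } = true;

    public void Dispose()
    {
        if (!IsOpen) return;   // calling Dispose twice is harmless
        IsOpen = false;        // a real type would release its handle here
        Console.WriteLine("resource released");
    }
}

class Program
{
    static void Main()
    {
        // The using statement calls Dispose when the block exits,
        // even if an exception is thrown inside it.
        using (var file = new LogFile())
        {
            Console.WriteLine(file.IsOpen);
        }
    }
}
```

With this approach the resource is released at a known point in the code rather than whenever the garbage collector happens to run.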

However, many objects do not need any cleanup of their own, such as Rectangle. To clean up a Rectangle, it is enough to discard its left, right, width, and height fields, which the garbage collector can do entirely by itself. Next, let's look at how memory is allocated to objects.

Object allocation:

The .NET CLR allocates all reference-type objects on the managed heap. This is similar to the C runtime heap, except that you never need to release an object yourself: objects are released automatically once they are no longer in use. This raises a question: how does the garbage collector know that an object is no longer used and can be collected? We will explain this later.

There are several garbage collection algorithms, each tuned for a particular environment. This article focuses on the CLR garbage collection algorithm. Let's start with a basic concept.

When a process is initialized, the runtime reserves a contiguous region of empty address space, which is the managed heap. The managed heap maintains a pointer called NextObjPtr, which points to the address where the next object will be allocated. Initially, NextObjPtr points to the start of the managed heap.

An application creates an object with the new operator. The operator first makes sure that the object fits in the remaining space of the managed heap. If it does, the object is placed at the address NextObjPtr points to, the object's constructor is called, and the new operator returns the address of the object.

Figure 1 managed heap

At this point, NextObjPtr is advanced past the object and points to where the next object will be placed on the managed heap. Figure 1 shows a managed heap containing three objects: A, B, and C. The next object will be placed at the position NextObjPtr points to (immediately after object C).

Now let's look at how the C runtime heap allocates memory. In the C runtime heap, allocating memory requires walking a linked list of data structures until a large enough block is found. That block may have to be split, and after the split the linked-list pointers must be updated to point to the remaining memory and keep the list intact. On the managed heap, allocating an object only means adding a value to the NextObjPtr pointer, which is very fast. In fact, allocating an object on the managed heap is nearly as fast as allocating memory on the thread's stack.

So far, allocation on the managed heap looks faster than on the C runtime heap, and simpler to implement. Of course, the managed heap gains this advantage by assuming that the address space is infinite. Obviously, this assumption does not hold, so there must be a mechanism that makes it workable. This mechanism is the garbage collector. Let's see how it works.

When an application calls the new operator to create an object, there may not be enough memory left to store it. The managed heap detects this by adding the size of the new object to NextObjPtr: if the result is beyond the end of the heap, the heap is full and a garbage collection is required.
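
The allocation scheme described so far can be sketched as a toy simulation. This is only a sketch under stated assumptions, not the CLR's implementation: the BumpHeap class, the use of integer offsets as "addresses", and the -1 return value that stands for "a collection is needed" are all invented for illustration.

```csharp
using System;

// Toy model of allocation on the managed heap. Addresses are offsets
// into a fixed-size region; none of this mirrors real CLR internals.
class BumpHeap
{
    private readonly int capacity;

    // Offset where the next object will be placed (NextObjPtr in the text).
    public int NextObjPtr { get; private set; }

    public BumpHeap(int capacity)
    {
        this.capacity = capacity;
        NextObjPtr = 0;
    }

    // Returns the offset of the new object, or -1 (hypothetical signal)
    // when the object does not fit and a collection would be required.
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (size > capacity - NextObjPtr) return -1;

        int address = NextObjPtr;
        NextObjPtr += size;      // allocation is just a pointer bump
        return address;
    }
}

class Program
{
    static void Main()
    {
        var heap = new BumpHeap(100);
        Console.WriteLine(heap.Allocate(40)); // 0
        Console.WriteLine(heap.Allocate(40)); // 40
        Console.WriteLine(heap.Allocate(40)); // -1: the heap is full
    }
}
```

Note that Allocate never searches for a free block, which is why this style of allocation is so much cheaper than walking a free list.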

In reality, a collection is triggered when generation 0 is full. Generations are a mechanism the garbage collector uses to improve performance. The idea is that newly created objects belong to a young generation, and objects that were created earlier and have survived collections belong to an older generation. Separating objects into generations allows the garbage collector to collect a single generation rather than all objects on the heap.
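
Generations can be observed from code through the System.GC class. The following is a small sketch; it prints no fixed expected values because the generation an object lands in depends on the runtime and its configuration. The variable name survivor is just an illustrative choice.

```csharp
using System;

class Program
{
    static void Main()
    {
        var survivor = new object();

        Console.WriteLine("Highest generation: " + GC.MaxGeneration);
        Console.WriteLine("Generation when new: " + GC.GetGeneration(survivor));

        // Force a collection. 'survivor' is still referenced below,
        // so it is reachable and survives.
        GC.Collect();

        Console.WriteLine("Generation after a collection: " + GC.GetGeneration(survivor));
        Console.WriteLine("Generation 0 collections so far: " + GC.CollectionCount(0));

        GC.KeepAlive(survivor); // the object stays reachable up to this point
    }
}
```

On typical runtimes the object is reported in a higher generation after surviving the collection, which is the promotion the paragraph above describes.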

Garbage collection algorithm:

The garbage collector checks whether the heap contains objects that the application no longer uses. If such objects exist, the memory they occupy can be reclaimed (if no more memory is available for the heap, the new operator throws an OutOfMemoryException). How does the garbage collector know whether an object is still in use? This question is not easy to answer.
Every application has a set of roots. Roots are storage locations that either refer to an object on the managed heap or are null. For example, all global and static object pointers are roots of the application; local variables and parameters on a thread's stack that hold object pointers are roots; and CPU registers that contain pointers to objects on the managed heap are roots as well. The list of active roots is maintained by the JIT (just-in-time) compiler and the CLR, and is accessible to the garbage collector.

When the garbage collector starts running, it assumes that all objects on the managed heap are garbage. In other words, it assumes that none of the application's roots refers to any object on the heap. The garbage collector then walks the roots and builds a graph of all objects reachable from them.

Figure 2 shows a managed heap in which the application's roots refer directly to objects A, C, D, and F. These objects all become part of the graph. Object D refers to object H, so object H is added to the graph as well. The garbage collector continues to walk through all reachable objects recursively.

Figure 2 objects on the managed heap

The garbage collector walks the roots and the objects they reference one by one. If it reaches an object that is already in the graph, it stops walking down that path. This serves two purposes: it improves performance, because no object is visited more than once, and it prevents infinite loops when objects reference each other in a cycle.

Once all roots have been checked, the garbage collector's graph contains every object that is reachable from the application's roots. Any object on the managed heap that is not in the graph is garbage to be reclaimed. The garbage collector now walks the managed heap linearly, looking for contiguous blocks of garbage objects (which are now considered free space). It then moves the non-garbage objects down in memory (using the standard C memcpy function), removing all the gaps in the heap. Of course, moving objects invalidates every pointer to them. Therefore, the garbage collector must update the application's roots so that they point to the objects' new addresses. In addition, if an object contains a pointer to another object, the garbage collector is responsible for correcting that pointer too. Figure 3 shows the managed heap after a collection.
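
The mark and compact steps can be sketched as a toy simulation. The Node and ToyCollector types are hypothetical, the heap is modeled as a plain list, and the extra references (H back to D, B to E) are assumptions added only to show a cycle and unreachable objects. Because the sketch uses ordinary object references instead of raw addresses, it does not model the pointer fix-up step.

```csharp
using System;
using System.Collections.Generic;

// Toy object: a name plus references to other objects on the toy heap.
class Node
{
    public string Name;
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class ToyCollector
{
    // Mark phase: build the set of objects reachable from the roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var reachable = new HashSet<Node>();
        var pending = new Stack<Node>();
        foreach (var root in roots)
            if (root != null) pending.Push(root);

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            // Add returns false for an object already in the graph, so the
            // walk stops there; this also prevents infinite loops on cycles.
            if (!reachable.Add(current)) continue;
            foreach (var next in current.References)
                pending.Push(next);
        }
        return reachable;
    }

    // Compact phase: keep the survivors in their original order, no gaps.
    public static List<Node> Compact(List<Node> heap, HashSet<Node> reachable)
    {
        return heap.FindAll(n => reachable.Contains(n));
    }
}

class Program
{
    static void Main()
    {
        var heap = new List<Node>();
        var byName = new Dictionary<string, Node>();
        foreach (var name in new[] { "A", "B", "C", "D", "E", "F", "G", "H" })
        {
            var node = new Node(name);
            heap.Add(node);
            byName[name] = node;
        }
        byName["D"].References.Add(byName["H"]); // D references H, as in Figure 2
        byName["H"].References.Add(byName["D"]); // assumed cycle: the walk still ends
        byName["B"].References.Add(byName["E"]); // assumed: garbage pointing at garbage

        var roots = new[] { byName["A"], byName["C"], byName["D"], byName["F"] };
        var survivors = ToyCollector.Compact(heap, ToyCollector.Mark(roots));
        Console.WriteLine(string.Join(" ", survivors.ConvertAll(n => n.Name)));
        // prints "A C D F H"
    }
}
```

Object E is referenced by B, but B itself is unreachable from any root, so both are treated as garbage. Reachability from the roots is what counts, not whether something points at the object.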

Figure 3 managed heap after recycling

After a collection, all garbage has been identified, all non-garbage objects have been compacted, and all pointers to non-garbage objects have been updated to the new addresses. NextObjPtr now points just past the last non-garbage object, and the new operator can once again create objects successfully.

As you can see, a garbage collection carries a significant performance cost, which is the major drawback of the managed heap. Keep in mind, however, that a collection occurs only when the managed heap is full; until then, the managed heap performs better than the C runtime heap. The garbage collector also applies some performance optimizations at runtime. We will discuss them in the next article.

The following code illustrates how objects are created and managed:

    using System;
    using System.Collections;

    class Application {
        public static int Main(String[] args) {

            // ArrayList object created in heap, myArray is now a root
            ArrayList myArray = new ArrayList();

            // Create 10000 objects in the heap
            for (int x = 0; x < 10000; x++) {
                myArray.Add(new Object()); // Object object created in heap
            }

            // Right now, myArray is a root (on the thread's stack). So,
            // myArray is reachable and the 10000 objects it points to are also
            // reachable.
            Console.WriteLine(myArray.Count);

            // After the last reference to myArray in the code, myArray is not
            // a root.
            // Note that the method doesn't have to return, the JIT compiler
            // knows to make myArray not a root after the last reference to it
            // in code.

            // Since myArray is not a root, all 10001 objects are not reachable
            // and are considered garbage. However, the objects are not
            // collected until a GC is performed.
            return 0;
        }
    }

 

You may ask: if the GC is so good, why doesn't ANSI C++ have one? The reason is that a garbage collector must be able to find the application's roots and every object pointer. In C++, a pointer can be cast from one type to another, so there is no way to know what kind of object a pointer refers to. In the CLR, the managed heap always knows the actual type of an object, and metadata is used to determine which members of an object refer to other objects.

Garbage collection and Finalization

The garbage collector offers an additional feature: it can automatically call an object's Finalize method after the object has been identified as garbage (provided that the object's type overrides the Finalize method of Object).

Finalize is a virtual method of Object. You can override it if needed, but in C# it can only be overridden using syntax that looks like a C++ destructor. For example:

    class Foo
    {
        ~Foo() {
            Console.WriteLine("Foo Finalize");
        }
    }

Programmers who have used C++ should pay special attention here: although a Finalize method is written exactly like a C++ destructor, the Finalize method in .NET is not a destructor. Managed objects cannot be destructed deterministically; they can only be reclaimed by garbage collection.

When designing a class, it is best to avoid overriding the Finalize method, for the following reasons:
1. Objects that implement Finalize are promoted to older generations, which increases memory pressure and keeps the objects they reference from being reclaimed as garbage.
2. These objects take longer to allocate.
3. Making the garbage collector execute Finalize methods can significantly degrade performance. Remember that Finalize must be executed for every object that implements it: if an array holds 10000 such objects, the Finalize method must be called on each of them.
4. Objects that override Finalize may reference other objects that do not implement Finalize. The reclamation of those objects is delayed as well.
5. You have no control over when the Finalize method runs. If you release resources such as database connections in a Finalize method, they may not be released until much later.
6. When the program terminates, some objects are still reachable and their Finalize methods will not be executed. This happens, for example, when objects are in use by background threads, or when objects are created during program exit or AppDomain unloading. In addition, by default the Finalize methods of unreachable objects are not executed when the application exits, so that it can terminate quickly. Of course, all operating system resources are reclaimed, but the objects on the managed heap get no chance to clean up. You can change this behavior by calling the RequestFinalizeOnShutdown method of GC.
7. The runtime gives no guarantee about the order in which the Finalize methods of multiple objects are executed, even though the cleanup of some objects may depend on that order.

If your type must implement a Finalize method, make sure the method runs as quickly as possible and avoids any operation that could block, including thread synchronization. Also make sure the Finalize method does not let any exception escape. In the runtime version this article was written against, the garbage collector ignores such an exception and continues with the Finalize methods of other objects; since .NET Framework 2.0, an unhandled exception in a finalizer terminates the process by default.

When a compiler generates code for a constructor, it automatically inserts a call to the base class constructor. Likewise, a C++ compiler automatically adds a call to the base class destructor to a destructor. In the runtime version this article was written against, Finalize is different: the compiler gives it no special treatment, so if you want the base class's Finalize method to run, you must add the call yourself. Current C# compilers, however, translate the ~Foo() syntax into a Finalize override that calls the base class's Finalize automatically in a finally block.

Note that the Finalize method in C # is written in the same way as the destructor in c ++, but C # does not support the destructor. Do not trick you into this method.

Internal Implementation of GC calling Finalize method

On the surface, using Finalize looks simple: you create an object, and its Finalize method is called when the object is collected. In reality, it is more complicated.

When an application creates a new object, the new operator allocates memory on the heap. If the object's type implements a Finalize method, a pointer to the object is placed on the finalization queue. The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object whose Finalize method must be called before the object's memory can be reclaimed.

The heap shown here contains several objects, some of which are reachable from the application's roots and some of which are not. When objects C, E, F, I, and J were created, the system detected that they implement a Finalize method and placed pointers to them on the finalization queue.

 

A Finalize method usually releases resources that the garbage collector cannot reclaim itself, such as file handles and database connections.

When a garbage collection occurs, objects B, E, G, H, I, and J are determined to be garbage. The garbage collector scans the finalization queue for pointers to these objects. When it finds one, it removes the pointer from the finalization queue and appends it to the freachable queue. The freachable queue is another internal data structure controlled by the garbage collector. Each pointer in the freachable queue identifies an object whose Finalize method is ready to be called.

After the garbage collection, the managed heap looks as shown in Figure 5. You can see that the memory of objects B, G, and H has been reclaimed, because these objects have no Finalize method. The memory of objects E, I, and J, however, has not been reclaimed, because their Finalize methods have not yet been executed.

Figure 5 managed heap after garbage collection

At runtime, a dedicated thread is responsible for calling the Finalize methods of objects in the freachable queue. When the freachable queue is empty, this thread sleeps. When entries appear in the queue, the thread wakes, removes each entry from the queue, and calls the object's Finalize method. For this reason, a Finalize method should make no assumptions about the thread that executes it; for example, do not access thread-local storage in a Finalize method.

The interaction between the finalization queue and the freachable queue is clever. First, here is where the name freachable comes from. The f obviously stands for finalization: every entry in this queue is an object waiting to have its Finalize method executed. The reachable part means that the objects are reachable. In other words, the freachable queue is treated as a root, just like global or static variables. Therefore, if an object is in the freachable queue, the object is reachable and is not garbage.

In short, when an object is unreachable, the garbage collector considers it garbage. Then, when the garbage collector moves an object's entry from the finalization queue to the freachable queue, the object is no longer garbage and its memory is not reclaimed. At this point, the garbage collector has finished identifying garbage, and some objects that were identified as garbage have been reclassified as not garbage. The garbage collector compacts the reclaimable memory, and the dedicated runtime thread empties the freachable queue, executing each object's Finalize method.
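
The two-queue mechanism can be sketched as a toy simulation using the object names from the text. The ToyFinalizationHeap class and its members are hypothetical; objects are represented by their names only, and RunFinalizers merely stands in for the dedicated finalizer thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of the finalization queue and the freachable queue.
class ToyFinalizationHeap
{
    public List<string> Heap = new List<string>();
    public List<string> FinalizationQueue = new List<string>();
    public Queue<string> Freachable = new Queue<string>();

    public void Allocate(string name, bool hasFinalizer)
    {
        Heap.Add(name);
        if (hasFinalizer) FinalizationQueue.Add(name);
    }

    // One collection. 'reachableFromRoots' plays the role of the graph
    // built from the application's roots.
    public void Collect(ISet<string> reachableFromRoots)
    {
        // Garbage objects with a pending finalizer move to the freachable queue.
        var pending = FinalizationQueue
            .Where(n => !reachableFromRoots.Contains(n))
            .ToList();
        foreach (var name in pending)
        {
            FinalizationQueue.Remove(name);
            Freachable.Enqueue(name);
        }
        // The freachable queue acts as a root, so its objects survive.
        Heap.RemoveAll(n => !reachableFromRoots.Contains(n) && !Freachable.Contains(n));
    }

    // Stands in for the finalizer thread: empties the freachable queue.
    public List<string> RunFinalizers()
    {
        var finalized = new List<string>();
        while (Freachable.Count > 0) finalized.Add(Freachable.Dequeue());
        return finalized;
    }
}

class Program
{
    static void Main()
    {
        var heap = new ToyFinalizationHeap();
        var finalizable = new HashSet<string> { "C", "E", "F", "I", "J" };
        foreach (var name in new[] { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J" })
            heap.Allocate(name, finalizable.Contains(name));

        var reachable = new HashSet<string> { "A", "C", "D", "F" };

        heap.Collect(reachable);
        Console.WriteLine(string.Join(" ", heap.Heap)); // A C D E F I J

        heap.RunFinalizers();
        heap.Collect(reachable);
        Console.WriteLine(string.Join(" ", heap.Heap)); // A C D F
    }
}
```

The first collection reclaims only B, G, and H. Objects E, I, and J need a second collection, after their finalizers have run, before their memory is freed.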

Figure 6 managed heap after garbage collection is executed again

When garbage collection runs again, the objects that implement Finalize are truly reclaimed: their Finalize methods have been executed and the freachable queue has been emptied, so nothing refers to them any more.

Garbage collection brings objects back to life
As mentioned above, when the program no longer uses an object, the object is reclaimed. However, if an object implements a Finalize method, it is only considered reclaimable, and its memory is truly reclaimed only after its Finalize method has been executed. In other words, such an object is first identified as garbage, then revived by being placed in the freachable queue, and finally reclaimed after Finalize has run. It is the call to the Finalize method that gives the object a chance to come back to life: if the Finalize method makes some root strongly reference the object, the garbage collector no longer considers the object garbage, and the object is resurrected.

 

The following demo code shows resurrection:

    public class Foo {
        ~Foo() {
            // Storing a reference in a static field makes this object reachable again.
            Application.ObjHolder = this;
        }
    }

    class Application {
        static public Object ObjHolder = null;
    }

In this case, when the object's Finalize method executes, the object becomes strongly referenced by the static field ObjHolder of Application, which is a root. The object is resurrected, and so are the objects it references. However, the Finalize methods of those objects may already have been executed, so using them may cause unexpected errors.

 

In fact, when you design your own types, objects of your type can be finalized and resurrected entirely outside your control. This is not desirable; a common practice is to define a bool field in the class that indicates whether the object's Finalize method has run. If it has, the other methods throw an exception when they are called.
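
The flag-based guard can be sketched as follows. The Connection class, its Query method, and the MarkFinalized helper are hypothetical names; the helper exists only so the demo can simulate finalization deterministically, since the moment at which the real Finalize method runs cannot be controlled.

```csharp
using System;

class Connection
{
    private bool finalized;   // set once the Finalize method has run

    ~Connection()
    {
        MarkFinalized();
    }

    // Hypothetical helper: lets the demo simulate finalization on demand.
    internal void MarkFinalized()
    {
        finalized = true;
    }

    public string Query()
    {
        // Refuse to work on an object that may have been resurrected
        // after its cleanup already happened.
        if (finalized)
            throw new InvalidOperationException("object has already been finalized");
        return "ok";
    }
}

class Program
{
    static void Main()
    {
        var connection = new Connection();
        Console.WriteLine(connection.Query()); // ok

        connection.MarkFinalized();            // simulate that Finalize ran
        try
        {
            connection.Query();
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);      // object has already been finalized
        }
    }
}
```

Every public method of such a type would need the same check, which is part of why relying on resurrection is discouraged.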

 

Now, if some other code sets Application.ObjHolder to null, the object becomes unreachable. Eventually, the garbage collector treats the object as garbage and reclaims its memory. Note that this time the object does not appear in the finalization queue, so its Finalize method is not executed again.

 

Resurrection is useful only in a few scenarios, and you should avoid it whenever possible. If you do use it, it is better to re-add the object to the finalization queue when it is resurrected. GC provides the static method ReRegisterForFinalize for this purpose, as in the following code:

    public class Foo {
        ~Foo() {
            Application.ObjHolder = this;
            // Put the object back on the finalization queue.
            GC.ReRegisterForFinalize(this);
        }
    }

When the object is resurrected, it is added to the finalization queue again. Note that if an object is already in the finalization queue and GC.ReRegisterForFinalize(obj) is called for it, the object's Finalize method will be executed more than once.

 

The purpose of the garbage collection mechanism is to simplify memory management for developers.

Next time, we will discuss weak references, generations in garbage collection, garbage collection in multithreaded applications, and the performance counters related to garbage collection.

 

This article is a translation. The original is at:

http://msdn.microsoft.com/zh-cn/magazine/bb985010(en-us).aspx

If your English is good, please read the original. If you find mistakes in my translation, corrections are welcome.

 

My Weibo address is http://weibo.com/yukaizhao. I post short technical notes there; you are welcome to follow.
