English Original: Jeffrey Richter
Compilation: Zhao Yu Open
With the garbage collector in the Microsoft .NET CLR, programmers do not need to decide when to release memory; the GC frees memory on their behalf, transparently. Nonetheless, .NET programmers should understand how garbage collection works. This article looks at how .NET allocates and manages managed memory, and then describes, step by step, the algorithm the garbage collector uses.
Implementing a proper memory management strategy is difficult and tedious, and the work distracts you from the problems the program is actually meant to solve. Is there a built-in mechanism that handles memory management for developers? There is: in .NET it is the GC, the garbage collector.
Every program uses resources of some kind: screen displays, network connections, database resources, and so on. In an object-oriented environment, every type needs some memory to hold its data, and using an object involves the following steps:
1. Allocate memory for the type
2. Initialize the memory so that the object is in a usable state
3. Access the members of the object
4. Tear down the object's state to clean up
5. Free the memory
This seemingly simple pattern of memory usage is a major source of bugs. Sometimes programmers forget to release objects that are no longer in use, and sometimes they try to access objects that have already been freed. Both kinds of bug tend to be well hidden and hard to find; unlike logic errors, they cannot simply be fixed once they are noticed. They may leak memory or cause an unexpected crash only after the program has been running for some time. There are many tools that help developers detect memory problems, such as Task Manager, the System Monitor ActiveX control, and Rational Purify.
With a GC, the developer does not have to track when to release memory at all. The garbage collector cannot manage every kind of resource, however. For some resources it does not know how to clean up, and developers must write the cleanup code themselves. In the .NET Framework, developers typically put such code in a Close, Dispose, or Finalize method. We look at the Finalize method later; it is called automatically by the garbage collector.
Many objects, however, need no cleanup code of their own. A Rectangle, for example, is cleaned up completely by discarding its left, right, width, and height fields, which the garbage collector can do by itself. Let's take a look at how memory is allocated to objects.
The .NET CLR allocates all reference-type objects on the managed heap. This is much like the C-runtime heap, except that you never have to free objects; they are freed automatically when they are no longer in use. This raises a question: how does the garbage collector know that an object is no longer in use and can be reclaimed? We'll answer that shortly.
Several garbage collection algorithms are in use today, each tuned for a particular environment. This article focuses on the CLR's garbage collection algorithm. Let's start with a basic concept.
When a process is initialized, the runtime reserves a contiguous region of empty address space, which is the managed heap. The managed heap maintains a pointer, which we call NextObjPtr, that indicates where the next object will be allocated. Initially it points to the base address of the managed heap.
When the application uses the new operator to create an object, the operator first checks that the object fits in the space remaining on the managed heap. If it fits, the object is placed at the address in NextObjPtr, the object's constructor is called, and the new operator returns the address of the object.
Figure 1 Managed heap
At this point, NextObjPtr is advanced past the new object and indicates where the next object will be placed on the managed heap. Figure 1 shows a managed heap containing three objects: A, B, and C. The next object will be placed where NextObjPtr points (immediately after object C).
Now let's look at how the C-runtime heap allocates memory. In the C-runtime heap, an allocation walks a linked list of data structures until it finds a block of memory that is large enough. That block may then have to be split, and the pointers in the list updated to refer to the remaining space so that the list stays intact. On the managed heap, allocating an object simply means adding a value to NextObjPtr, which is very fast. In fact, allocating an object on the managed heap is nearly as fast as allocating memory on a thread's stack.
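The pointer-bump allocation described above can be sketched as a small simulation. This is only an illustration under stated assumptions: ToyHeap, TryAllocate, and the byte sizes are hypothetical names and values, not CLR APIs, and the real runtime works on raw memory rather than an integer offset.

```csharp
using System;

// Toy model of the managed heap's allocation path (hypothetical, not a CLR API).
public sealed class ToyHeap
{
    private readonly int _capacity;   // size of the reserved region, in bytes

    // Offset at which the next object will be placed (the article's NextObjPtr).
    public int NextObjPtr { get; private set; }

    public ToyHeap(int capacity)
    {
        if (capacity < 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        NextObjPtr = 0;               // initially the base of the region
    }

    // Returns true and the object's address if it fits; false means the heap
    // is full and a garbage collection would be needed before retrying.
    public bool TryAllocate(int size, out int address)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (size > _capacity - NextObjPtr)
        {
            address = -1;
            return false;
        }
        address = NextObjPtr;         // the new object starts here
        NextObjPtr += size;           // bump the pointer past the new object
        return true;
    }
}
```

Note that no free list is searched and no block is split: an allocation is one comparison and one addition, which is why it is comparable in cost to a stack allocation.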
So far, allocation on the managed heap looks much faster and simpler to implement than on the C-runtime heap. The managed heap gains this advantage because it makes a big assumption: that address space and storage are infinite. This assumption is obviously false, so there must be a mechanism that allows the heap to behave as if it were true. That mechanism is the garbage collector. Let's see how it works.
When an application calls the new operator to create an object, there may not be enough space left to hold it. The managed heap detects this by checking whether NextObjPtr, advanced by the size of the new object, would go beyond the end of the heap. If it would, the heap is full and a garbage collection must be performed.
In reality, a garbage collection is triggered when generation 0 is full. Generations are a mechanism the garbage collector uses to improve performance: newly created objects belong to the young generation, and objects that have survived a collection belong to an older generation. Dividing objects into generations lets the garbage collector collect a single generation rather than every object on the heap.
Garbage collection algorithm:
The garbage collector checks whether there are objects on the heap that the application no longer uses. If there are, the memory they occupy can be reclaimed (if there is still not enough memory available on the heap, the new operator throws an OutOfMemoryException). You may ask how the garbage collector can tell whether an object is still in use. The question is not easy to answer.
Every application has a set of roots. A root is a storage location that either refers to an object on the managed heap or is null. For example, all global and static object pointers are roots of the application, as are local variables and parameters on a thread's stack, and CPU registers that contain pointers to objects on the managed heap. The list of active roots is maintained by the JIT (just-in-time) compiler and the CLR, and is made available to the garbage collector.
When the garbage collector starts running, it assumes that every object on the managed heap is garbage. That is, it assumes that none of the application's roots refers to any object on the heap. It then walks the roots and builds a graph of all objects that are reachable from them.
Figure 2 shows a managed heap in which the application's roots refer directly to objects A, C, D, and F. These objects become part of the graph. Object D refers to object H, so object H is added to the graph as well. The garbage collector continues in this way, recursively, through all reachable objects.
Figure 2 Objects on the managed heap
The garbage collector walks each root and the objects it references in turn. If it reaches an object that is already in the graph, it stops following that path and continues with another. This serves two purposes: it improves performance, and it prevents infinite loops when objects reference each other in a cycle.
After all the roots have been checked, the garbage collector's graph contains every object that is reachable from the application. Any object on the managed heap that is not in the graph is garbage. The garbage collector then walks the heap linearly, looking for contiguous blocks of garbage (which are now considered free space). It moves the non-garbage objects down in memory (using the standard memcpy function), closing all the gaps in the heap. Moving objects invalidates every pointer to them, so the garbage collector must update the application's roots to point to the objects' new locations. In addition, if an object contains a pointer to another object, the garbage collector corrects that pointer as well. Figure 3 shows the managed heap after a collection.
Figure 3 Managed heap after recycling
Figure 3 shows that after a collection, all garbage has been identified and removed, and all non-garbage objects have been compacted. All pointers to non-garbage objects have been updated to the objects' new addresses, and NextObjPtr points just past the last non-garbage object. The new operator can now be retried, and the object is created successfully.
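The mark and compact steps described above can be sketched as a toy simulation. This is an illustration under stated assumptions, not the CLR's implementation: ToyObject and ToyCollector are hypothetical names, the heap is a list assumed to be in address order, and object references stand in for raw pointers, so assigning a new Address stands in for pointer fix-up.

```csharp
using System;
using System.Collections.Generic;

// Toy object: a name, a size in bytes, an address, and outgoing references.
public sealed class ToyObject
{
    public string Name { get; }
    public int Size { get; }
    public int Address { get; set; }
    public List<ToyObject> References { get; } = new List<ToyObject>();

    public ToyObject(string name, int size) { Name = name; Size = size; }
}

public static class ToyCollector
{
    // Mark phase: walk the graph from the roots; everything visited is live.
    public static HashSet<ToyObject> Mark(IEnumerable<ToyObject> roots)
    {
        var live = new HashSet<ToyObject>();
        var pending = new Stack<ToyObject>();
        foreach (var root in roots)
            if (root != null) pending.Push(root);   // a root may be null

        while (pending.Count > 0)
        {
            var obj = pending.Pop();
            // Skipping objects already in the graph saves work and
            // prevents infinite loops on cyclic references.
            if (!live.Add(obj)) continue;
            foreach (var child in obj.References)
                if (child != null) pending.Push(child);
        }
        return live;
    }

    // Compact phase: drop garbage, slide live objects down in address order,
    // and return the new NextObjPtr (just past the last live object).
    public static int Compact(List<ToyObject> heap, HashSet<ToyObject> live)
    {
        heap.RemoveAll(o => !live.Contains(o));
        int next = 0;
        foreach (var obj in heap)
        {
            obj.Address = next;
            next += obj.Size;
        }
        return next;
    }
}
```

With eight 10-byte objects A through H, roots referring to A, C, D, and F, and D referring to H, Mark finds five live objects and Compact leaves them packed at the bottom of the heap.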
As you can see, a garbage collection carries a significant performance cost, which is the main drawback of using a managed heap. Remember, however, that a collection occurs only when the heap is full; until then, the managed heap performs better than the C-runtime heap. The runtime's garbage collector also applies several performance optimizations, which we'll discuss in the next article.
The following code illustrates how an object is created and managed:
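A minimal sketch of such code is shown here. It is an assumption about what an example of this kind might look like, not the article's original listing; AllocationDemo and CreateObjects are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class AllocationDemo
{
    // Creates `count` objects on the managed heap and returns how many were created.
    public static int CreateObjects(int count)
    {
        // The list is allocated on the managed heap; the local variable
        // `items` on the thread's stack is a root that refers to it.
        var items = new List<object>();

        for (int i = 0; i < count; i++)
        {
            // Each new object is reachable through the list, so a collection
            // running at this point would treat all of them as live.
            items.Add(new object());
        }

        int created = items.Count;

        // Once the method returns, no root refers to the list any more.
        // The list and its objects are then unreachable and eligible for
        // collection; no explicit free is required.
        return created;
    }
}
```

Calling AllocationDemo.CreateObjects(10000) allocates 10,000 objects and never frees any of them explicitly; reclaiming them is left entirely to the garbage collector.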
You might ask: if GC is so good, why isn't it part of ANSI C++? The reason is that a garbage collector must be able to find the application's roots and must be able to find every pointer to an object. In C++, a pointer can be cast from one type to another, and there is no way to know what kind of object a pointer actually refers to. In the CLR, the managed heap always knows the actual type of an object, and metadata is used to determine which members of an object refer to other objects.
Garbage collection and finalization
The garbage collector provides an additional feature: after an object has been identified as garbage, the garbage collector automatically calls the object's Finalize method (provided the object's type overrides the Finalize method of Object).
Finalize is a virtual method of Object that you can override if you want. In C#, however, it can be overridden only by using C++ destructor syntax. For example:
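A minimal sketch of a finalizable type follows. BaseObj and FinalizedCount are hypothetical names chosen for illustration; the counter exists only so that the effect of the finalizer can be observed.

```csharp
using System;

public class BaseObj
{
    // Number of finalizers that have run so far (for observation only).
    public static int FinalizedCount;

    // C# does not accept "protected override void Finalize()"; the compiler
    // turns this destructor-style syntax into an override of Object.Finalize.
    ~BaseObj()
    {
        // Resource cleanup code would go here.
        System.Threading.Interlocked.Increment(ref FinalizedCount);
    }
}
```

When the finalizer actually runs is decided by the garbage collector, so code should not rely on FinalizedCount changing at any particular moment.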
C++ programmers should pay special attention here: a Finalize method is written the same way as a C++ destructor, but Finalize in .NET and a C++ destructor are not the same thing. A managed object cannot be destroyed explicitly; it can only be reclaimed by the garbage collector.
When you design a class, it is best to avoid overriding the Finalize method, for the following reasons:
1. Objects that implement Finalize are promoted to older generations, which increases memory pressure and prevents the object, and the objects it refers to, from being reclaimed when they first become garbage.
2. These objects take longer to allocate, because a pointer to each one must be placed in the finalization queue.
3. Having the garbage collector call Finalize methods can hurt performance noticeably. Keep in mind that the Finalize method must be called for every object that implements it; if you have an array of 10,000 such objects, Finalize must be called for each one.
4. Objects that override Finalize may refer to other objects that do not, and the collection of those objects is delayed as well.
5. You have no control over when the Finalize method runs. If the Finalize method releases a resource such as a database connection, the resource may be held long after it is no longer needed.
6. When an application terminates, some objects are still reachable, and their Finalize methods have no chance to run. This can happen when background threads are using the objects, or when objects are created while the application is exiting or an AppDomain is unloading. Also, by default, Finalize methods are not called for unreachable objects when an application exits, so that the application can terminate quickly. All operating system resources are reclaimed, of course, but objects on the managed heap do not get to clean up gracefully. You can change this behavior by calling the GC type's RequestFinalizeOnShutdown method.
7. The runtime makes no guarantee about the order in which the Finalize methods of multiple objects are called, and sometimes objects need to be cleaned up in a particular order.
If you define a type that must implement the Finalize method, make sure the method executes as quickly as possible and avoids anything that might block, including thread synchronization operations. Also make sure the Finalize method does not let any exception escape; if one does, the garbage collector ignores the exception and continues calling the Finalize methods of other objects.
When a compiler generates code for a constructor, it automatically inserts a call to the base class's constructor. Similarly, a C++ compiler automatically adds a call to the base class's destructor in a destructor. Finalize in .NET is different: the compiler does not treat the Finalize method specially. If you want your Finalize method to call the Finalize method of the base class, you must add that call explicitly yourself.
Note that in C# the Finalize method is written with the same syntax as a destructor in C++, but C# does not support destructors, so don't let the syntax mislead you.
How the GC calls the Finalize method internally
On the surface, finalization looks simple: you create an object, and its Finalize method is called when the object is reclaimed. In practice it is a little more complicated.
When an application creates a new object, the new operator allocates memory on the heap. If the object's type implements the Finalize method, a pointer to the object is placed in the finalization queue. The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object whose Finalize method must be called before the object's memory can be reclaimed.
The heap shown contains several objects, some of which are reachable from the application's roots and some of which are not. When objects C, E, F, I, and J were created, the system detected that they implement the Finalize method and placed pointers to them in the finalization queue.
A Finalize method usually releases resources that the garbage collector cannot reclaim itself, such as file handles and database connections.
When a garbage collection occurs, objects B, E, G, H, I, and J are determined to be garbage. The garbage collector scans the finalization queue for pointers to these objects. When it finds one, it removes the pointer from the finalization queue and appends it to the freachable queue. The freachable queue is another internal data structure controlled by the garbage collector. Each pointer in the freachable queue identifies an object whose Finalize method is ready to be called.
After the garbage collection, the managed heap looks like Figure 5. The memory of objects B, G, and H has been reclaimed because these objects have no Finalize method. Objects E, I, and J have not been reclaimed yet, because their Finalize methods have not yet been called.
Figure 5 Managed heap after garbage collection
The runtime has a dedicated thread that calls the Finalize methods of the objects in the freachable queue. When the freachable queue is empty, the thread sleeps. When entries appear in the queue, the thread wakes up, removes each entry from the queue, and calls the object's Finalize method. For this reason, a Finalize method should not make assumptions about the thread that executes it; for example, do not access thread-local storage in a Finalize method.
The interaction between the finalization queue and the freachable queue is ingenious. First, here is where the name freachable comes from. The f stands for finalization: every entry in this queue is waiting to have its Finalize method called. Reachable means that the objects can be reached. In other words, the freachable queue is treated as a root, just like global or static variables. Therefore, if an object is in the freachable queue, it is reachable and is not garbage.
Briefly, when an object is unreachable, the garbage collector considers it garbage. Then, when the garbage collector moves the object's entry from the finalization queue to the freachable queue, the object is no longer garbage, and its memory is not reclaimed. At this point the garbage collector has finished identifying garbage; some objects that were identified as garbage have been reclassified as non-garbage. The garbage collector compacts the reclaimable memory, and the dedicated runtime thread empties the freachable queue, calling the Finalize method of each object in it.
Figure 6 Managed heap after garbage collection is performed again
The next time a garbage collection runs, the objects that implement the Finalize method are actually reclaimed: their Finalize methods have already been called, and the freachable queue no longer refers to them.
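The interplay of the two queues over two collections can be sketched as a toy model. This is a simplified illustration under stated assumptions: ToyFinalizationModel and its members are hypothetical names, objects are identified by name only, and references between objects are ignored (in the real runtime, anything referenced by a freachable object also survives).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of the two queues (hypothetical names, not CLR types).
public sealed class ToyFinalizationModel
{
    public HashSet<string> Heap { get; } = new HashSet<string>();
    public List<string> FinalizationQueue { get; } = new List<string>();
    public Queue<string> FReachableQueue { get; } = new Queue<string>();
    public List<string> Finalized { get; } = new List<string>();

    // "new": finalizable objects are registered when they are created.
    public void Allocate(string name, bool hasFinalizer)
    {
        Heap.Add(name);
        if (hasFinalizer) FinalizationQueue.Add(name);
    }

    // One collection. `reachable` is the set of objects found from the
    // application's roots. Returns the names whose memory was reclaimed.
    public List<string> Collect(ISet<string> reachable)
    {
        // Unreachable objects with a pending finalizer move from the
        // finalization queue to the freachable queue.
        var pending = FinalizationQueue.Where(n => !reachable.Contains(n)).ToList();
        foreach (var name in pending)
        {
            FinalizationQueue.Remove(name);
            FReachableQueue.Enqueue(name);
        }

        // The freachable queue acts as a root, so its objects survive.
        var reclaimed = Heap
            .Where(n => !reachable.Contains(n) && !FReachableQueue.Contains(n))
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
        foreach (var name in reclaimed) Heap.Remove(name);
        return reclaimed;
    }

    // What the dedicated finalizer thread does: drain the queue,
    // "calling Finalize" on each entry.
    public void RunFinalizers()
    {
        while (FReachableQueue.Count > 0)
            Finalized.Add(FReachableQueue.Dequeue());
    }
}
```

With objects A through J, finalizers on C, E, F, I, and J, and only A, C, D, and F reachable, the first Collect reclaims B, G, and H; after RunFinalizers, a second Collect reclaims E, I, and J.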
Resurrecting objects during garbage collection
As the previous section explained, when an application no longer uses an object, the object is reclaimed. If the object implements a Finalize method, however, it is only considered reclaimable, and its memory is actually reclaimed only after its Finalize method has been called. In other words, such an object is first identified as garbage, then resurrected when it is placed in the freachable queue, and finally reclaimed after Finalize has run. The call to the Finalize method also gives the object another chance to come back to life: if the Finalize method stores a strong reference to the object somewhere reachable, the garbage collector no longer considers the object garbage, and the object is resurrected.
The following code demonstrates resurrection:
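A minimal sketch of resurrection is shown here, assuming a static field named Application.ObjHolder as the surrounding text describes. ResurrectableObj is a hypothetical type name, and the listing is a reconstruction for illustration rather than the article's original code.

```csharp
using System;

public class Application
{
    // A static field is a root. Defaults to null.
    public static object ObjHolder;
}

public class ResurrectableObj
{
    ~ResurrectableObj()
    {
        // Storing `this` in a static field makes the object reachable again,
        // so the garbage collector can no longer treat it as garbage.
        Application.ObjHolder = this;
    }
}
```

The assignment happens on the finalizer thread at a time chosen by the runtime, so code reading Application.ObjHolder should be prepared to see either null or a resurrected object.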
In this case, when the object's Finalize method executes, a strong reference to the object is stored in the application's static field ObjHolder, which is a root. The object is resurrected, and so are the objects it refers to. The Finalize methods of those objects may already have been called, however, so using them may cause unexpected errors.
In fact, when you design your own types, the finalization and resurrection of objects can get completely out of your control. This is not a good thing. A common way to deal with it is to define a bool field in the class that records whether the object's Finalize method has been called, and to throw an exception if any other method is called after Finalize has run.
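The flag-based guard described above might look like the following sketch. GuardedObj, DoWork, and WorkCount are hypothetical names, and the choice of InvalidOperationException is an assumption; the text only says that an exception should be thrown.

```csharp
using System;

public class GuardedObj
{
    // Set once the finalizer has run; volatile because the finalizer
    // executes on a different thread from the object's other callers.
    private volatile bool _finalized;

    public int WorkCount { get; private set; }

    ~GuardedObj()
    {
        _finalized = true;
    }

    public void DoWork()
    {
        // Refuse to operate on an object that has already been finalized,
        // for example one that was resurrected by another finalizer.
        if (_finalized)
            throw new InvalidOperationException("Object has already been finalized.");
        WorkCount++;
    }
}
```

Every public method would need the same check, which is one more reason to avoid designs that depend on resurrection.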
Now, if some other code sets Application.ObjHolder to null, the object becomes unreachable. Eventually the garbage collector treats the object as garbage and reclaims its memory. Note that this time the object has no entry in the finalization queue, so its Finalize method is not called again.
Resurrection has only a few legitimate uses, and you should avoid it as much as possible. When you do use it, it is best to add the object back to the finalization queue. The GC type provides a static method, ReRegisterForFinalize, for this purpose.
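A minimal sketch of re-registering a resurrected object follows. GC.ReRegisterForFinalize is the real static method on System.GC; ReRegisteringObj is a hypothetical type name, and Application.ObjHolder is the static holder field the text refers to, repeated here so that the listing is self-contained.

```csharp
using System;

public class Application
{
    // A static field is a root. Defaults to null.
    public static object ObjHolder;
}

public class ReRegisteringObj
{
    ~ReRegisteringObj()
    {
        // Resurrect the object by making it reachable from a root.
        Application.ObjHolder = this;

        // Put the object back in the finalization queue so that Finalize
        // is called again the next time the object becomes garbage.
        GC.ReRegisterForFinalize(this);
    }
}
```

Because this finalizer resurrects and re-registers the object every time it runs, the object is never reclaimed while the process keeps running; real code would need some condition that eventually stops the cycle.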
When the object is resurrected, it is added back to the finalization queue. Note that if you call GC.ReRegisterForFinalize(obj) on an object that is already in the finalization queue, the object's Finalize method will be called more than once.
The purpose of the garbage collection mechanism is to simplify memory management for developers.
The next article covers weak references, the role of generations in garbage collection, garbage collection in multithreaded applications, and performance counters related to garbage collection.
Principles of the .NET Garbage Collection Mechanism (I)