Garbage Collection Part 2: Automatic Memory Management in the Microsoft .NET Framework




Jeffrey Richter


This article assumes that you are familiar with C and C++.

Summary: The first part of this two-part article explained how the garbage collection (GC) algorithm works, how resources can be cleaned up properly when the garbage collector decides to free an object's memory, and how to force an object to clean up when it is freed. This concluding part explains how strong and weak object references help manage memory for large objects, and describes object generations and how they improve performance. It also covers the methods and properties used to control garbage collection, the resources available for monitoring collection performance, and garbage collection in multithreaded applications.

Last month, I described the motivation for garbage-collected environments: simplifying memory management for the programmer. I also discussed the general algorithm used by the common language runtime (CLR) and some of its internal workings. In addition, I explained how programmers must still explicitly manage and clean up resources by implementing Finalize, Close, or Dispose methods. This month, I will continue my discussion of the CLR garbage collector.


I will start by exploring a feature called weak references, which you can use to reduce the memory pressure that large objects place on the managed heap. Next, I will discuss how the garbage collector uses generations to improve performance. Finally, I will cover a few other performance optimizations offered by the garbage collector, such as multithreaded collections, and the performance counters that the CLR exposes for monitoring the garbage collector in real time.


Weak References

When a root points to an object, the object cannot be collected because the application's code can still reach it. A root that points to an object is called a strong reference to that object. However, the garbage collector also supports weak references. A weak reference allows the garbage collector to collect the object while still allowing the application to access it. How can this be? It all comes down to timing.

If only weak references to an object exist when the garbage collector runs, the object is collected, and the application can no longer access it. On the other hand, to access a weakly referenced object, the application must first obtain a strong reference to it. If the application obtains that strong reference before the garbage collector collects the object, the garbage collector cannot collect it. I know this sounds confusing, so let's clear it up by examining the code in Figure 1:

Figure 1 Strong and Weak References

    void Method() {
        Object o = new Object();    // Create a strong reference to an object

        // Create a strong reference to a short WeakReference object.
        // The WeakReference object tracks the Object.
        WeakReference wr = new WeakReference(o);

        o = null;    // Remove the strong reference to the object

        o = wr.Target;    // Try to obtain a strong reference again
        if (o == null) {
            // A GC occurred and the Object was reclaimed.
        } else {
            // A GC did not occur and we can successfully access the Object
            // using o.
        }
    }

Why might you use weak references? Some data structures are easy to create but require a lot of memory. For example, your application might need to know all the directories and files on the user's hard disk. You can easily build a tree that reflects this information, and while the application runs it refers to the tree in memory instead of accessing the user's hard disk directly. This greatly improves the performance of your application.


The problem is that the tree can require a great deal of memory. If the user moves on to another part of the application, the tree may no longer be needed and is wasting memory. You could delete the tree, but when the user returns to the first part of the application, you would have to reconstruct it. Weak references let you handle this scenario simply and efficiently.


When the user switches away from the first part of the application, you can create a weak reference to the tree and destroy all strong references to it. If the other parts of the application do not need much memory, the garbage collector will not reclaim the tree's objects. When the user returns to the first part of the application, the application tries to obtain a strong reference to the tree. If it succeeds, the application does not need to traverse the user's hard disk again.
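
The directory-tree scenario can be sketched roughly as follows. This is only an illustration under stated assumptions: TreeCache, BuildTree, GetTree, and BuildCount are hypothetical names, and a short list of strings stands in for the real tree. Only WeakReference and its Target property come from the article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache for the directory-tree scenario; all names are illustrative.
public sealed class TreeCache
{
    private WeakReference weakTree;   // short weak reference to the tree
    public int BuildCount;            // how many times the tree was (re)built

    // Stand-in for the expensive scan of the user's hard disk.
    private List<string> BuildTree()
    {
        BuildCount++;
        return new List<string> { @"C:\", @"C:\Docs", @"C:\Docs\Report.txt" };
    }

    // Returns a strong reference to the tree, rebuilding it only if it was collected.
    public List<string> GetTree()
    {
        List<string> tree = weakTree == null ? null : weakTree.Target as List<string>;
        if (tree == null)
        {
            tree = BuildTree();
            weakTree = new WeakReference(tree);
        }
        return tree;
    }
}

public static class TreeCacheDemo
{
    public static void Main()
    {
        TreeCache cache = new TreeCache();
        List<string> tree = cache.GetTree();   // built on first use
        Console.WriteLine("Entries: " + tree.Count);

        tree = null;                           // the user leaves this part of the program
        // ... other work happens here; a collection may or may not occur ...

        tree = cache.GetTree();                // reused if still alive, rebuilt otherwise
        Console.WriteLine("Builds so far: " + cache.BuildCount);
    }
}
```

Whether the second call reuses the tree depends on whether a collection ran in between, so the build count is not predictable in general.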


The WeakReference type offers two constructors:

WeakReference(Object target);
WeakReference(Object target, Boolean trackResurrection);


The target parameter identifies the object that the WeakReference object should track. The trackResurrection parameter indicates whether the WeakReference object should keep tracking the object after its Finalize method has been called. Usually false is passed for trackResurrection; the first constructor creates a WeakReference that does not track resurrection. (For an explanation of resurrection, see Part 1 of this article.)


For convenience, a weak reference that does not track resurrection is called a short weak reference, and one that does track resurrection is called a long weak reference. If an object's type does not offer a Finalize method, short and long weak references behave identically. It is strongly recommended that you avoid long weak references: they allow an object to be resurrected after it has been finalized, when the state of the object is unpredictable.


Once you have created a weak reference to an object, you usually set the strong reference to null. If any strong reference remains, the garbage collector cannot collect the object.


To use the object again, you must turn the weak reference back into a strong reference. You do this by reading the WeakReference object's Target property and assigning the result to one of your application's roots. If Target returns null, the object was collected. If it does not return null, the root is now a strong reference to the object, and the code can manipulate the object. As long as that strong reference exists, the object cannot be collected.


Weak Reference Internals

From the preceding discussion, it should be obvious that WeakReference objects do not behave like other object types. Normally, if your application has a root that refers to an object, and that object refers to another object, both objects are reachable and the garbage collector cannot reclaim the memory of either one. However, if your application has a root that refers to a WeakReference object, the object that the WeakReference refers to is not considered reachable and may be collected.


To understand how weak references work, let's look inside the managed heap again. The managed heap contains two internal data structures whose sole purpose is to manage weak references: the short weak reference table and the long weak reference table. Both tables simply contain pointers to objects allocated in the managed heap.


Initially, both tables are empty. When you create a WeakReference object, no object is allocated from the managed heap. Instead, an empty slot is located in one of the weak reference tables: short weak references use the short weak reference table, and long weak references use the long weak reference table.


Once an empty slot is found, the slot is set to the address of the object you want to track, that is, the object pointer passed to the WeakReference constructor. The value returned by the new operator is the address of the slot. Obviously, the two weak reference tables are not considered part of the application's roots; otherwise the garbage collector could never reclaim the objects that the tables point to.

Now, here is what happens when a garbage collection runs:


1. The garbage collector builds a graph of all reachable objects. Part 1 of this article discussed how this works.

2. The garbage collector scans the short weak reference table. If a pointer in the table refers to an object that is not part of the graph, the pointer identifies an unreachable object, and the slot is set to null.

3. The garbage collector scans the finalization queue. If a pointer in the queue refers to an object that is not part of the graph, the pointer identifies an unreachable object, and the pointer is moved from the finalization queue to the freachable queue. At this point the object is added to the graph, because it is now considered reachable.

4. The garbage collector scans the long weak reference table. If a pointer in the table refers to an object that is not part of the graph (which now includes the objects pointed to by entries in the freachable queue), the pointer identifies an unreachable object, and the slot is set to null.

5. The garbage collector compacts the memory, squeezing out the holes left by the unreachable objects.
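
The five steps above can be traced with a small toy model. This is not how the runtime is implemented: objects are plain integer ids, the graph consists of the roots only, compaction is left out, and every name (WeakTableModel, ShortTable, LongTable, and so on) is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: objects are integer ids, not real managed objects.
public static class WeakTableModel
{
    public static int?[] ShortTable = new int?[0];
    public static int?[] LongTable = new int?[0];
    public static List<int> FinalizationQueue = new List<int>();
    public static List<int> FreachableQueue = new List<int>();

    public static HashSet<int> Collect(IEnumerable<int> roots)
    {
        // Step 1: build the graph of reachable objects (here: just the roots).
        HashSet<int> graph = new HashSet<int>(roots);

        // Step 2: null out short weak reference slots for unreachable objects.
        ClearDeadSlots(ShortTable, graph);

        // Step 3: move unreachable finalizable objects to the freachable queue;
        // they count as reachable again until their Finalize method has run.
        foreach (int id in FinalizationQueue.ToArray())
        {
            if (!graph.Contains(id))
            {
                FinalizationQueue.Remove(id);
                FreachableQueue.Add(id);
                graph.Add(id);
            }
        }

        // Step 4: null out long weak reference slots for objects still unreachable.
        ClearDeadSlots(LongTable, graph);

        // Step 5 (compaction) is omitted; the survivors are returned instead.
        return graph;
    }

    private static void ClearDeadSlots(int?[] table, HashSet<int> graph)
    {
        for (int i = 0; i < table.Length; i++)
        {
            if (table[i].HasValue && !graph.Contains(table[i].Value))
                table[i] = null;
        }
    }

    public static void Main()
    {
        // Object 1 is rooted, object 2 is finalizable garbage, object 3 is plain garbage.
        ShortTable = new int?[] { 1, 2, 3 };
        LongTable = new int?[] { 1, 2, 3 };
        FinalizationQueue = new List<int> { 2 };
        FreachableQueue = new List<int>();

        Collect(new[] { 1 });

        Console.WriteLine(ShortTable[1].HasValue);  // False: short slot cleared at once
        Console.WriteLine(LongTable[1].HasValue);   // True: object 2 awaits finalization
        Console.WriteLine(LongTable[2].HasValue);   // False: object 3 is simply gone
    }
}
```

The ordering matters: because the short table is scanned before the finalization queue and the long table after it, only the long table still sees a finalizable object that has just become garbage.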


Once you understand the logic of the collection process, it is easy to understand how weak references work. Reading a WeakReference's Target property returns the value in the corresponding slot of the weak reference table. If the slot contains null, the object was collected.


A short weak reference does not track resurrection. This means that the garbage collector sets the pointer in the short weak reference table to null as soon as it determines that the object is unreachable. If the object has a Finalize method, the method has not been called yet, so the object still exists. If the application reads the WeakReference object's Target property, null is returned even though the object still exists.


A long weak reference tracks resurrection. This means that the garbage collector sets the pointer in the long weak reference table to null only when the object's storage can be reclaimed. If the object has a Finalize method, the method has already been called and the object was not resurrected.
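
The difference between the two kinds of weak reference can be observed with a finalizable object, as in the sketch below. Finalizable and ShortVsLongDemo are hypothetical names. The IsAlive property is a real member of WeakReference that the article does not discuss; it is used here so that the check itself does not hold a strong reference. The exact output depends on the runtime and build settings, so none is shown.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type with a Finalize method (a C# destructor).
public sealed class Finalizable
{
    ~Finalizable() { /* cleanup would go here */ }
}

public static class ShortVsLongDemo
{
    public static WeakReference ShortRef;
    public static WeakReference LongRef;

    // Allocate in a separate, non-inlined method so that no local variable
    // in Main keeps the object alive by accident.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CreateReferences()
    {
        Finalizable obj = new Finalizable();
        ShortRef = new WeakReference(obj);        // does not track resurrection
        LongRef = new WeakReference(obj, true);   // tracks resurrection
    }

    public static void Main()
    {
        CreateReferences();

        GC.Collect();   // the object is found unreachable and queued for finalization
        Console.WriteLine("After first collection:  short alive = " + ShortRef.IsAlive
                          + ", long alive = " + LongRef.IsAlive);

        GC.WaitForPendingFinalizers();   // let the Finalize method run
        GC.Collect();                    // the storage can now be reclaimed
        Console.WriteLine("After second collection: short alive = " + ShortRef.IsAlive
                          + ", long alive = " + LongRef.IsAlive);
    }
}
```

If the runtime behaves as described above, the short reference should report the object as gone after the first collection, while the long reference should keep reporting it until the finalizer has run and a later collection reclaims the storage.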



Generations

When I first started working in a garbage-collected environment, I had many concerns about its performance. After all, I have been a C/C++ programmer for more than 15 years, and I understand the overhead of allocating and freeing memory blocks from a heap. Of course, each version of Windows and each version of the C runtime has tuned its heap algorithms to improve performance.

Like the developers of Windows and the C runtime, the GC developers keep tuning the garbage collector to improve its performance. One feature of the garbage collector, called generations, exists purely to improve performance. A generational garbage collector (also known as an ephemeral garbage collector) makes the following assumptions:

The newer an object is, the shorter its lifetime will be.
The older an object is, the longer its lifetime will be.
Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
Compacting a portion of the heap is faster than compacting the whole heap.

Of course, many studies have shown that these assumptions hold for a large set of existing applications. So let's discuss how these assumptions have influenced the implementation of the garbage collector.


When initialized, the managed heap contains no objects. Objects added to the heap are said to be in generation 0, as shown in Figure 2. Simply put, generation 0 objects are new objects that the garbage collector has never examined.

Now, if more objects are added to the heap, the heap fills up and a garbage collection must occur. When the garbage collector analyzes the heap, it builds the graph of garbage objects (shown in purple) and non-garbage objects. Any objects that survive the collection are compacted into the leftmost portion of the heap. These objects have survived a collection, are older, and are now considered to be in generation 1 (see Figure 3).

As more objects are added to the heap, the new objects are placed in generation 0. If generation 0 fills up again, a collection is performed. This time, all objects in generation 1 that survive are compacted and considered to be in generation 2 (see Figure 4). All survivors in generation 0 are compacted and considered to be in generation 1. Generation 0 now contains no objects, and all new objects go into generation 0.

Currently, generation 2 is the highest generation that the runtime's garbage collector supports. In future collections, any surviving objects in generation 2 simply stay in generation 2.


Generational GC Performance Optimizations

As I stated earlier, generational garbage collection improves performance. When the heap fills and a collection occurs, the garbage collector can choose to examine only the objects in generation 0 and ignore the objects in higher generations. After all, the newer an object is, the shorter its lifetime is expected to be. So collecting and compacting generation 0 objects is likely to reclaim a significant amount of space, and it is much faster than examining the objects in all generations.

This is the simplest optimization that generational GC offers. A generational collector can improve performance further by not traversing every object in the managed heap. If a root or an object refers to an object in an old generation, the garbage collector can ignore the old object's inner references, which reduces the time needed to build the graph of reachable objects. Of course, an old object may refer to a new object. To make sure that such objects are examined, the collector can take advantage of the system's write-watch support (provided by the Win32 GetWriteWatch function in Kernel32.dll). This support tells the collector which old objects, if any, have been written to since the last collection. Only those specific old objects need to have their references checked to see whether they refer to any new objects.


If collecting generation 0 does not free enough storage, the collector can try collecting generations 1 and 0. If that still is not enough, the collector can collect all generations: 2, 1, and 0. The exact algorithm the collector uses to decide which generations to collect is one of those areas that Microsoft will keep tuning.


Most heaps (such as the C runtime heap) allocate an object wherever they find free space. Therefore, if I create several objects consecutively, it is quite possible that they will be widely separated in the address space. In the managed heap, however, allocating several objects consecutively ensures that the objects are contiguous in memory.


One of the assumptions stated earlier was that newer objects tend to have strong relationships to each other and are frequently accessed around the same time. Because new objects are allocated contiguously in memory, you gain performance from locality of reference. More specifically, it is highly likely that all of these objects can reside in the CPU's cache. Your application can access them very quickly, because the CPU can do most of its work without cache misses that force it to access RAM.


Microsoft's performance tests show that managed heap allocations are faster than standard allocations performed by the Win32 HeapAlloc function. These tests also show that a collection of generation 0 takes less than 1 millisecond on a Pentium machine. Microsoft's goal is to make a garbage collection take no more time than an ordinary page fault.

Direct Control with System.GC

The System.GC type gives your application some direct control over the garbage collector. For starters, you can query the maximum generation supported by the managed heap by reading the GC.MaxGeneration property. Currently, GC.MaxGeneration always returns 2.


You can also force the garbage collector to perform a collection by calling one of the following two methods:

void GC.Collect(Int32 Generation)
void GC.Collect()


The first method lets you specify which generation to collect. You may pass any integer from 0 to GC.MaxGeneration, inclusive. Passing 0 collects generation 0; passing 1 collects generations 1 and 0; passing 2 collects generations 2, 1, and 0. The version of Collect that takes no parameters forces a full collection of all generations and is equivalent to calling:

GC.Collect(GC.MaxGeneration);
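
The claim that collecting a generation also collects every younger generation can be checked with the sketch below. It relies on GC.CollectionCount, a real method of System.GC that is not covered in this article, and CollectDemo and Snapshot are hypothetical names.

```csharp
using System;

public static class CollectDemo
{
    // Records how many collections each generation has seen so far.
    // GC.CollectionCount is a real API, but it is not discussed in the article.
    public static int[] Snapshot()
    {
        int[] counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            counts[gen] = GC.CollectionCount(gen);
        return counts;
    }

    public static void Main()
    {
        int[] before = Snapshot();
        GC.Collect(1);                 // collects generations 1 and 0
        int[] after = Snapshot();

        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            Console.WriteLine("Generation " + gen + ": "
                              + (after[gen] - before[gen]) + " new collection(s)");
        }
    }
}
```

After GC.Collect(1), the counts for generations 0 and 1 should both have grown, while the count for generation 2 changes only if the runtime happened to run a full collection on its own.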


In most circumstances, you should avoid calling the Collect methods; it is best to let the garbage collector run on its own as needed. However, because your application knows more about its own behavior than the runtime does, you can help matters by explicitly forcing a collection. For example, it might make sense to force a full collection after the user saves a data file. I imagine an Internet browser performing a full collection when a page is unloaded. You might also force a collection while your application is performing some other lengthy operation; this hides the time the collection takes and prevents a collection from occurring while the user is interacting with the application.


The GC type also offers a WaitForPendingFinalizers method. This method simply suspends the calling thread until the thread that processes the freachable queue has emptied it, calling each object's Finalize method. Most applications will never need to call this method.
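
A minimal sketch of WaitForPendingFinalizers in use follows. Tracked, FinalizedCount, and FinalizerWaitDemo are hypothetical names, and the counter exists only so that the effect of the finalizer is visible.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type whose Finalize method only counts how often it has run.
public sealed class Tracked
{
    public static int FinalizedCount;
    ~Tracked() { Interlocked.Increment(ref FinalizedCount); }
}

public static class FinalizerWaitDemo
{
    // Allocate in a non-inlined helper so that Main holds no reference to the object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void AllocateGarbage()
    {
        new Tracked();
    }

    public static void Main()
    {
        AllocateGarbage();
        GC.Collect();                    // moves the dead object to the freachable queue
        GC.WaitForPendingFinalizers();   // blocks until the finalizer thread has drained it
        Console.WriteLine("Finalizers run: " + Tracked.FinalizedCount);
    }
}
```

Without the call to WaitForPendingFinalizers, the counter could still be read before the finalizer thread has had a chance to run.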


Finally, the GC type offers two methods for determining which generation an object is currently in:

Int32 GetGeneration(Object obj)
Int32 GetGeneration(WeakReference wr)


The first version takes an object reference as its parameter, and the second takes a reference to a WeakReference object. The returned value is, of course, an integer between 0 and GC.MaxGeneration, inclusive.
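
Both overloads can be tried side by side, as in this short sketch. GetGenerationDemo and Generations are hypothetical names; GC.KeepAlive is a real method, not discussed in the article, that keeps the object strongly referenced until both queries have run.

```csharp
using System;

public static class GetGenerationDemo
{
    // Returns the generation reported by each GetGeneration overload.
    public static int[] Generations()
    {
        object obj = new object();
        WeakReference wr = new WeakReference(obj);

        int byObject = GC.GetGeneration(obj);   // overload taking an object reference
        int byWeakRef = GC.GetGeneration(wr);   // overload taking a WeakReference

        GC.KeepAlive(obj);   // keep the object strongly referenced up to this point
        return new[] { byObject, byWeakRef };
    }

    public static void Main()
    {
        int[] generations = Generations();
        Console.WriteLine("Via object reference: " + generations[0]);
        Console.WriteLine("Via weak reference:   " + generations[1]);
    }
}
```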

The code in Figure 5 helps you understand how generations work and demonstrates the GC methods just discussed.

Figure 5 GC Methods Demonstration

    // GenObj and Display are helpers from the sample program and are not shown here.
    private static void GenerationDemo() {
        // Let's see how many generations the GC heap supports (we know it's 2)
        Display("Maximum GC generations: " + GC.MaxGeneration);

        // Create a new GenObj in the heap
        GenObj obj = new GenObj("Generation");

        // Since this object is newly created, it should be in generation 0
        obj.DisplayGeneration();    // Displays 0

        // Performing a garbage collection promotes the object's generation
        GC.Collect();
        obj.DisplayGeneration();    // Displays 1

        GC.Collect();
        obj.DisplayGeneration();    // Displays 2

        GC.Collect();
        obj.DisplayGeneration();    // Displays 2 (max generation)

        obj = null;                 // Destroy the strong reference to this object

        GC.Collect(0);                    // Collect objects in generation 0
        GC.WaitForPendingFinalizers();    // We should see nothing

        GC.Collect(1);                    // Collect objects in generation 1
        GC.WaitForPendingFinalizers();    // We should see nothing

        GC.Collect(2);                    // Same as GC.Collect()
        GC.WaitForPendingFinalizers();    // Now, we should see the Finalize
                                          // method run

        Display(-1, "Demo stop: Understanding Generations.", 0);
    }


Performance for Multithreaded Applications

In the previous sections, I explained the GC algorithm and its optimizations. However, that discussion rested on one big assumption: that only one thread is running. In the real world, it is quite likely that multiple threads will be accessing the managed heap, or at least manipulating objects allocated in it. When one thread triggers a collection, other threads must not access any objects (including object references on their own stacks), because the collector is likely to move those objects and change their memory addresses.


So, when the garbage collector wants to start a collection, all threads executing managed code must be suspended. The runtime has several mechanisms for safely suspending threads so that a collection can proceed. There are multiple mechanisms in order to keep threads running as long as possible while keeping overhead as low as possible. I don't want to go into all the details here, but suffice it to say that Microsoft has done a lot of work to reduce the overhead of performing a collection, and it will continue to modify these mechanisms over time to keep garbage collection efficient.


The following paragraphs describe a few of the mechanisms that the garbage collector employs when an application has multiple threads:


Fully Interruptible Code: When a collection starts, the collector suspends all application threads. The collector then determines where each thread was suspended. Using tables produced by the just-in-time (JIT) compiler, the collector can tell where in a method the thread stopped, which object references the code is currently accessing, and where those references are held (in a variable, a CPU register, and so on).


Hijacking: The collector can modify a thread's stack so that the return address points to a special function. When the currently executing method returns, this special function runs and suspends the thread. Stealing the thread's execution path in this way is called hijacking the thread. When the collection is complete, the thread resumes and returns to the method that originally called it.


Safe Points: As the JIT compiler compiles a method, it can insert calls to a special function that checks whether a GC is pending. If one is, the thread is suspended, the GC runs to completion, and the thread is then resumed. The positions where the compiler inserts these calls are called GC safe points.

Note that thread hijacking allows threads that are executing unmanaged code to continue running while a garbage collection occurs. This is not a problem, because unmanaged code does not access objects on the managed heap unless those objects are pinned and contain no object references. A pinned object is one that the garbage collector is not allowed to move in memory. If a thread that is executing unmanaged code returns to managed code, the thread is hijacked and suspended until the collection completes.
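
The safe point idea can be illustrated with an ordinary threading sketch. This is only an analogy built from standard synchronization primitives: it does not touch the real garbage collector, and every name in it (SafePointModel, SafePoint, Run, and the fields) is hypothetical. Worker threads poll a flag at a fixed point in their loop and park themselves until the simulated collection finishes.

```csharp
using System;
using System.Threading;

// Toy model: nothing here interacts with the real garbage collector.
public static class SafePointModel
{
    static volatile bool gcPending;
    static volatile bool stop;
    static int parked;
    static readonly ManualResetEventSlim gcFinished = new ManualResetEventSlim(true);

    // Stand-in for the check that the JIT compiler would insert at a GC safe point.
    static void SafePoint()
    {
        if (!gcPending) return;
        Interlocked.Increment(ref parked);
        gcFinished.Wait();              // suspended until the "collection" completes
    }

    static void Worker()
    {
        while (!stop)
        {
            Thread.SpinWait(1000);      // pretend to run managed code
            SafePoint();
        }
    }

    // Starts the workers, runs one simulated collection, and returns
    // how many threads were parked while it ran.
    public static int Run(int workerCount)
    {
        parked = 0;
        stop = false;
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(Worker);
            workers[i].Start();
        }

        gcFinished.Reset();
        gcPending = true;                                 // request suspension
        while (Volatile.Read(ref parked) < workerCount)   // wait until every thread is parked
            Thread.Yield();
        int suspended = Volatile.Read(ref parked);        // objects could be moved safely here

        gcPending = false;
        stop = true;
        gcFinished.Set();                                 // resume all threads
        foreach (Thread t in workers) t.Join();
        return suspended;
    }

    public static void Main()
    {
        Console.WriteLine("Threads parked at safe points: " + Run(4));
    }
}
```

The design point is that each thread suspends itself at a location where its state is known, which is what distinguishes safe points from suspending a thread at an arbitrary instruction.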

In addition to the mechanisms I just mentioned, the garbage collector offers some further improvements that enhance the performance of object allocation and collection in multithreaded applications.


Synchronization-Free Allocations: On a multiprocessor system, generation 0 of the managed heap is split into multiple memory arenas, one per thread. This allows multiple threads to allocate simultaneously without requiring exclusive access to the heap.


Scalable Collections: On a multiprocessor system running the server version of the execution engine (MSCorSvr.dll), the managed heap is split into several sections, one per CPU. When a collection is initiated, the collector has one thread per CPU, and all threads collect their own sections simultaneously. The workstation version of the execution engine (MSCorWks.dll) does not support this feature.
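
A program can ask which of the two collectors it is running under. The sketch below assumes a later release of the framework, where the System.Runtime.GCSettings.IsServerGC property is available; that property is not mentioned in this article, and GcModeDemo and Describe are hypothetical names.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // Describes the collector flavor of the current process.
    // GCSettings.IsServerGC is a real property, but it postdates the article.
    public static string Describe()
    {
        return (GCSettings.IsServerGC ? "server" : "workstation")
               + " GC on " + Environment.ProcessorCount + " logical processor(s)";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```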


Garbage-Collecting Large Objects

There is one more performance improvement you should be aware of. Large objects (those of 20 KB or more) are allocated from a special large object heap. Objects in this heap are finalized and freed just like the small objects I have been discussing. However, large objects are never compacted, because shifting 20 KB blocks of memory down the heap would waste too much CPU time.
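
One way to see the large object heap at work is to ask which generation a freshly allocated array is in. The sketch below is an illustration, not part of the article: LargeObjectDemo and GenerationOfNewArray are hypothetical names, and the 1,000,000-byte size is chosen to be well above both the 20 KB figure given here and the 85,000-byte threshold documented for later releases.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array of the given size and reports its generation.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] buffer = new byte[sizeInBytes];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer);   // keep the array referenced until the query is done
        return generation;
    }

    public static void Main()
    {
        Console.WriteLine("Small array (100 bytes):       generation "
                          + GenerationOfNewArray(100));
        Console.WriteLine("Large array (1,000,000 bytes): generation "
                          + GenerationOfNewArray(1000000));
    }
}
```

On current runtimes, objects in the large object heap are reported as belonging to the highest generation from the moment they are allocated, whereas a small array normally starts out in generation 0.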

Note that all of these mechanisms are transparent to your application code. To you, the developer, it looks as if there is just one managed heap; these mechanisms exist simply to improve application performance.


Monitoring Garbage Collections

The runtime team at Microsoft has created a set of performance counters that provide real-time statistics about many of the runtime's operations. You can view these statistics with the Windows 2000 System Monitor ActiveX control. The easiest way to access the System Monitor control is to run perfmon.exe and click the + button on the toolbar, which displays the window shown in Figure 6.

To monitor the runtime's garbage collector, select the COM+ Memory Performance object (named .NET CLR Memory on Windows 2003). You can then select a specific application from the instance list. At this point, System Monitor graphs the selected statistics in real time. Figure 7 describes the function of each counter.
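
Some of the same figures can also be read from inside the process, without System Monitor. The sketch below is an alternative to the performance counters rather than a way of reading them: it uses GC.GetTotalMemory and GC.CollectionCount, real System.GC methods that this article does not cover, and GcMonitorDemo and Snapshot are hypothetical names.

```csharp
using System;

public static class GcMonitorDemo
{
    // Builds a one-line summary of heap size and per-generation collection counts.
    public static string Snapshot()
    {
        string line = "heap bytes=" + GC.GetTotalMemory(false);
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            line += " gen" + gen + "=" + GC.CollectionCount(gen);
        return line;
    }

    public static void Main()
    {
        Console.WriteLine(Snapshot());

        // Create some short-lived garbage so that the counters have a reason to move.
        for (int i = 0; i < 100000; i++)
        {
            byte[] garbage = new byte[1024];
            garbage[0] = 1;
        }

        Console.WriteLine(Snapshot());
    }
}
```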




That is just about the full story on garbage collection. Last month, I covered how resources are allocated, how automatic garbage collection works, how to use finalization to let an object clean up after itself, and how resurrection can restore access to an object. This month, I explained how weak and strong references to objects work, how classifying objects into generations yields performance benefits, and how you can manually control garbage collection with System.GC. I also covered the mechanisms the garbage collector uses to improve performance in multithreaded applications, what happens with objects larger than 20 KB, and finally how to use the Windows 2000 System Monitor to track garbage collection performance. With this information, you should be able to simplify memory management and improve the performance of your applications.

