As an advanced .NET topic, the garbage collector (GC) is something every developer needs to understand. In keeping with the "easy to understand" principle, this article explains how the garbage collector works in the CLR.
Basic knowledge
Managed heap
Let's start with MSDN's explanation: when you initialize a new process, the runtime reserves a contiguous region of address space for the process. This reserved address space is called the managed heap.
"The managed heap is still a heap." Why stress this? So that the term does not cause confusion. The prerequisite for this point is the difference between value types and reference types. This article assumes the reader already knows the usual rule of thumb that "value types are stored on the stack, and reference types are stored on the heap" (with the reference to a reference-type object itself stored on the stack). This is a simplification, since a value type that is a field of a reference-type object lives on the heap together with that object, but it is sufficient here. Following this rule, the CLR requires that every reference-type object be allocated from the managed heap.
The managed heap maintains a pointer, named NextObjPtr, that indicates where the next object will be allocated in the heap.
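The pointer-based allocation described above can be sketched with a toy model. This is a Python simulation, not CLR code: the class ToyManagedHeap, its fields, and the byte sizes are hypothetical and only illustrate the idea that allocating an object amounts to advancing a single pointer.

```python
class ToyManagedHeap:
    """Toy model of a managed heap that allocates by bumping a pointer."""

    def __init__(self, size):
        self.size = size          # total bytes reserved for the heap
        self.next_obj_ptr = 0     # offset where the next object will go
        self.objects = {}         # name -> (start offset, size in bytes)

    def allocate(self, name, nbytes):
        # An allocation that does not fit is the point at which a real
        # runtime would start a garbage collection; the toy just fails.
        if self.next_obj_ptr + nbytes > self.size:
            raise MemoryError("heap full: a collection would be needed")
        start = self.next_obj_ptr
        self.objects[name] = (start, nbytes)
        self.next_obj_ptr += nbytes   # bump the pointer past the new object
        return start


heap = ToyManagedHeap(size=64)
addr_a = heap.allocate("A", 16)
addr_b = heap.allocate("B", 24)
print(addr_a, addr_b, heap.next_obj_ptr)  # 0 16 40
```

Because each allocation only checks the remaining space and moves the pointer, consecutive objects end up next to each other in memory.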
CPU registers
This is basic computer knowledge, reviewed here to make the "root" concept below easier to understand.
CPU registers are the CPU's own "temporary memory", and accessing them is faster than accessing main memory. Ordered by distance from the CPU, the registers are closest, followed by the caches (level 1 through level 3), and finally main memory.
Roots
Static fields defined in a class, method parameters, local variables (reference-type variables only), and so on are all roots, and object pointers held in CPU registers are roots as well. Roots are the entry points, located outside the heap, from which the CLR can find objects on the heap.
Reachable and unreachable objects
If an object in the heap is referenced by a root, either directly or through a chain of other objects, the object is "reachable"; otherwise it is unreachable.
Reasons for garbage collection
From the computer's point of view, every program must reside in memory to run, and memory is a limited resource. The managed heap is likewise limited in size. If the managed heap had no size limit, allocating objects in C# would be faster than in C, because the structure of the managed heap makes allocating an object faster than allocating from the C runtime heap. Because address space and storage are finite, the managed heap relies on garbage collection to keep operating normally, so that objects can be allocated without running out of memory.
Fundamentals of Garbage Collection
Collection is divided into two phases: mark, then compact.
The mark phase is the process of determining whether each object is reachable. Once all the roots have been checked, the heap contains reachable (marked) objects and unreachable (unmarked) objects.
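The mark phase can be illustrated as a graph traversal that starts at the roots. The sketch below is a simplified Python model, not the CLR's implementation: the function mark, the object names A to F, and the reference table are all made up for illustration.

```python
def mark(roots, references):
    """Return the set of objects reachable from the roots.

    roots: iterable of object names referenced directly by a root.
    references: dict mapping an object name to the names it references.
    """
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue              # already visited; also guards against cycles
        marked.add(obj)
        pending.extend(references.get(obj, ()))
    return marked


# Hypothetical heap: A..F are objects, each mapped to the objects it references.
references = {"A": ["D"], "B": [], "C": ["E"], "D": ["F"], "E": ["C"], "F": []}
roots = ["A", "B"]

reachable = mark(roots, references)
unreachable = set(references) - reachable
print(sorted(reachable))    # ['A', 'B', 'D', 'F']
print(sorted(unreachable))  # ['C', 'E']
```

Note that C and E reference each other but are still unmarked, because no root leads to either of them.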
When marking is complete, the compact phase begins. In this phase, the garbage collector traverses the heap linearly, looking for contiguous blocks of memory occupied by unreachable objects. It then moves the reachable objects into that space to compact the heap. This process is somewhat similar to defragmenting a disk.
As shown in the figure, a green box indicates a reachable object, and a yellow box indicates an unreachable object. Once the unreachable objects are cleared, moving the reachable objects compacts the memory (makes it more compact).
After compaction, the variables and CPU registers that held pointers to the moved objects are no longer valid, so the garbage collector must revisit all the roots and update them to point to each object's new memory location. This carries a significant performance cost, which is a major drawback of the managed heap.
Given these characteristics, deciding when to trigger a collection is itself a subject of study, because waiting until the managed heap is completely full before starting a collection would be really "slow".
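The two steps just described, sliding the surviving objects together and then patching the roots, can be sketched as follows. This is a toy Python model under stated assumptions: the function compact, the tuple layout, and the root names are hypothetical, and only root pointers are updated here, whereas a real collector must also update references stored inside other objects.

```python
def compact(heap, marked, roots):
    """Slide marked objects toward address 0 and patch the roots.

    heap: list of (name, address, size) tuples in address order.
    marked: set of names that survived the mark phase.
    roots: dict mapping a root name to the address it points at.
    Returns the compacted heap and the new next-allocation pointer.
    """
    forwarding = {}     # old address -> new address
    compacted = []
    next_free = 0
    for name, address, size in heap:
        if name not in marked:
            continue    # unreachable: its space is simply reused
        forwarding[address] = next_free
        compacted.append((name, next_free, size))
        next_free += size
    # Second pass: every root still holds an old address and must be updated.
    for root, address in roots.items():
        roots[root] = forwarding[address]
    return compacted, next_free


heap = [("A", 0, 16), ("C", 16, 8), ("B", 24, 16), ("E", 40, 8), ("D", 48, 32)]
roots = {"static_field": 0, "local_var": 24, "register": 48}

heap, next_obj_ptr = compact(heap, marked={"A", "B", "D"}, roots=roots)
print(heap)          # [('A', 0, 16), ('B', 16, 16), ('D', 32, 32)]
print(roots)         # {'static_field': 0, 'local_var': 16, 'register': 32}
print(next_obj_ptr)  # 64
```

The second loop is the costly part the text refers to: every root has to be visited again after objects move.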
Garbage collection algorithm: the generational algorithm
Generations are a mechanism used by the CLR garbage collector, and their sole purpose is to improve application performance: collecting a single generation is faster than collecting the entire heap.
The CLR managed heap supports three generations: generation 0, generation 1, and generation 2. Generation 0 is about 256 KB, generation 1 is about 2 MB, and generation 2 is about 10 MB. (These figures are approximate starting budgets and vary with the CLR version and the machine.) Newly constructed objects are allocated in generation 0.
As shown in the figure, when generation 0 fills up, the garbage collector starts a collection: the unreachable objects (C and E) are reclaimed, and the surviving objects are promoted to generation 1.
When generation 0 is full and generation 1 has also accumulated many unreachable objects and is close to full, both generations are collected. The surviving (reachable) objects are promoted: those in generation 0 move to generation 1, and those in generation 1 move to generation 2.
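The promotion rule can be modeled in a few lines. The following Python sketch is only an illustration: the function collect, its parameters, and the object names are hypothetical, and it tracks object names rather than memory.

```python
def collect(generations, reachable, up_to):
    """Collect generations 0..up_to and promote the survivors.

    generations: list of three lists of object names (gen 0, 1, 2).
    reachable: set of names that are still reachable.
    up_to: highest generation included in this collection.
    """
    # Walk from the oldest collected generation down so that an object
    # is promoted by at most one generation per collection.
    for gen in range(up_to, -1, -1):
        survivors = [obj for obj in generations[gen] if obj in reachable]
        target = min(gen + 1, 2)      # generation 2 is the oldest
        if target == gen:
            generations[gen] = survivors
        else:
            generations[target].extend(survivors)
            generations[gen] = []
    return generations


gens = [["A", "B", "C", "D", "E"], [], []]
collect(gens, reachable={"A", "B", "D"}, up_to=0)
print(gens)  # [[], ['A', 'B', 'D'], []]

gens[0] = ["F", "G"]                       # new allocations land in gen 0
collect(gens, reachable={"A", "D", "F"}, up_to=1)
print(gens)  # [[], ['F'], ['A', 'D']]
```

The first call collects only the youngest generation and reclaims C and E; the second collects two generations, so B and G are reclaimed while A and D move on to the oldest generation.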
The CLR's actual generational collection is more "intelligent" than this. If newly created objects have short lifetimes, the garbage in generation 0 is reclaimed almost immediately, without waiting for the allocated space to fill up. Conversely, if a generation 0 collection finds that many objects are still reachable and therefore frees little memory, the collector raises the generation 0 budget to 512 KB. Collections then happen less often, but each one reclaims more memory. If still not much memory has been freed, the garbage collector performs a full collection (all three generations), and if that is still not enough, an out-of-memory exception (OutOfMemoryException) is thrown.
In other words, the garbage collector dynamically adjusts the space budget of each generation based on how much memory each collection reclaims, which amounts to automatic tuning.
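A rough way to picture this feedback loop is the Python sketch below. Everything in it is an assumption made for illustration: the function adjust_gen0_budget, the low and high thresholds, the halving step, and the 64 KB floor are invented, and the CLR's real heuristics are more involved. Only the doubling from 256 KB to 512 KB mirrors the example in the text.

```python
KB = 1024


def adjust_gen0_budget(budget, bytes_before, bytes_freed,
                       low=0.25, high=0.75):
    """Return a new generation 0 budget after a collection.

    low and high are made-up thresholds on the fraction of memory freed.
    """
    freed_ratio = bytes_freed / bytes_before
    if freed_ratio < low:
        return budget * 2                  # little was freed: collect less often
    if freed_ratio > high:
        return max(budget // 2, 64 * KB)   # most objects died young: shrink
    return budget


budget = 256 * KB
budget = adjust_gen0_budget(budget, bytes_before=256 * KB, bytes_freed=16 * KB)
print(budget // KB)  # 512
```

A larger budget means fewer collections that each have more garbage to reclaim, which is the trade-off described above.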
Summary
There is a basic idea behind garbage collection: the programming language (most of them, at least) appears to have access to unlimited memory, so developers can allocate, allocate, and allocate again, as if by magic, from an inexhaustible supply.
The .NET garbage collector works, in essence, by clearing unreachable objects using the basic mark-and-sweep principle, then compacting the remaining memory much as a disk defragmenter does, and finally optimizing performance through the generational algorithm.