Design of CLR garbage collection
Author: Maoni Stephens (@maoni0) - 2015
Note: For general information on garbage collection, refer to The Garbage Collection Handbook referenced in the Resources section at the end of this article.
Component architecture
The GC has two components: the allocator and the collector. The allocator is responsible for acquiring more memory and triggering the collector when appropriate. The collector reclaims the memory of objects that are no longer used by the program.
There are other ways the collector can be invoked, such as manually calling GC.Collect, or the finalizer thread receiving an asynchronous notification of low memory (which triggers the collector).
Design of the Memory Allocator
The allocator is called by the allocation helper functions of the Execution Engine (EE), with the following information:
- Size of the request
- Thread allocation context
- Flags such as whether the object is finalizable
The GC has no special treatment for different kinds of objects; it consults the EE to get the size of an object.
Based on its size, the GC divides objects into two categories: small objects (< 85,000 bytes) and large objects (>= 85,000 bytes). In principle small and large objects could be treated the same way, but compacting large objects is more expensive, so the GC treats them differently.
The GC hands out memory to the allocator in allocation contexts. The size of an allocation context is defined by the allocation quantum:
- Allocation contexts are smaller regions of a given heap segment, each dedicated for use by a given thread. On a single-processor (meaning one logical processor) machine, a single context is used, which is the generation 0 allocation context.
- The allocation quantum is the amount of memory the allocator acquires each time it needs more memory in order to perform object allocations within an allocation context. The quantum is typically 8k, and the average managed object is around 35 bytes, so a single quantum can satisfy many object allocations.
Large objects do not use allocation contexts or quantums. A single large object can itself be larger than these small regions of memory, and the benefits of these regions (discussed below) are specific to small objects. Large objects are allocated directly on a heap segment.
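To make the fast path concrete, here is a minimal sketch, with simplified names (alloc_context_sketch, allocate_small and the fields are illustrative, not the actual CLR code), of how a small-object allocation is served from a thread's allocation context by bumping a pointer:

#include <cstddef>
#include <cstdint>

// Sketch only: an allocation context is essentially an [alloc_ptr, alloc_limit)
// window into the ephemeral segment, private to one thread.
struct alloc_context_sketch
{
    uint8_t* alloc_ptr;    // next free byte in the current quantum
    uint8_t* alloc_limit;  // end of the current quantum
};

// Hypothetical fast path: bump the pointer if the object fits; otherwise the
// caller must ask the GC for a new quantum (try_allocate_more_space in the
// real code), which may in turn trigger a GC.
void* allocate_small(alloc_context_sketch* acontext, size_t size)
{
    uint8_t* result = acontext->alloc_ptr;
    if (result + size <= acontext->alloc_limit)
    {
        // No lock needed: the quantum is written by this thread only,
        // and its memory was zeroed when the quantum was handed out.
        acontext->alloc_ptr = result + size;
        return result;
    }
    return nullptr; // slow path: get a new allocation quantum
}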
The allocator is designed to achieve the following:
- Triggering GCs when appropriate: the allocator triggers a GC when the allocation budget (a threshold set by the collector) is exceeded, or when the allocator can no longer allocate on a given segment. The allocation budget and heap segments are discussed in more detail later in this article.
- Preserving object locality: objects allocated together on the same heap segment are stored at virtual addresses close to each other.
- Achieving cache efficiency: the allocator acquires memory in allocation quantums, not on an object-by-object basis. It zeroes out that memory ahead of time to warm up the CPU cache, because objects will be allocated there immediately afterwards. The allocation quantum is usually 8k.
- Efficient locking: the thread affinity of allocation contexts and quantums guarantees that only one thread ever writes to a given allocation quantum. As a result, object allocations require no lock as long as the current allocation context is not exhausted.
- Memory integrity: the GC always zeroes out the memory for newly allocated objects to prevent object references from pointing at random memory.
- Keeping the heap walkable: the allocator makes a free object out of any leftover memory in a quantum. For example, if only 30 bytes remain in the current quantum and the next object needs 40 bytes, the allocator turns those 30 bytes into a free object and acquires a new allocation quantum (see the sketch after this list).
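As a rough sketch (the names and object layout below are invented for illustration, not the actual CLR implementation), keeping the heap walkable amounts to writing a free-object header over whatever is left of the current quantum:

#include <cstddef>
#include <cstdint>

// Hypothetical layout: every heap object starts with a method table pointer,
// and a free object is simply an object whose method table marks it as free
// and whose size covers the unused gap, so walking the segment object by
// object never falls into a hole.
struct free_object_sketch
{
    void*  method_table;  // points at the shared "free object" method table
    size_t size;          // number of bytes covered by this free object
};

void* g_free_object_method_table; // hypothetical method table shared by all free objects

void make_free_object(uint8_t* start, size_t leftover)
{
    free_object_sketch* free_obj = reinterpret_cast<free_object_sketch*>(start);
    free_obj->method_table = g_free_object_method_table;
    free_obj->size = leftover;
}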
Memory allocation APIs
Object* GCHeap::Alloc(size_t size, DWORD flags);
Object* GCHeap::Alloc(alloc_context* acontext, size_t size, DWORD flags);
The above functions can be used to allocate both small objects and large objects. There is also a function that allocates directly on the large object heap:
Object* GCHeap::AllocLHeap(size_t size, DWORD flags);
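A hypothetical EE-side allocation helper might route requests to these entry points roughly as follows. The g_gc_heap global, the GetAllocContext accessor, and the helper itself are assumptions made for illustration; only the GCHeap methods shown above come from the source:

// Illustration only: route a request to the small or large object path.
Object* allocate_object(Thread* thread, size_t size, DWORD flags)
{
    if (size >= 85000)
    {
        // Large objects bypass allocation contexts and quantums and are
        // allocated directly on the large object heap.
        return g_gc_heap->AllocLHeap(size, flags);
    }

    // Small objects are served from the thread's allocation context.
    alloc_context* acontext = thread->GetAllocContext();
    return g_gc_heap->Alloc(acontext, size, flags);
}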
Design of the Collector
Goals of the GC
The GC strives to manage memory efficiently while requiring very little effort from people who write "managed code". Efficient means:
- GC should occur frequently enough to avoid having a large number of allocated but useless objects (garbage) on the managed heap, resulting in unnecessary use of memory.
- GCs should happen as infrequently as possible to avoid using otherwise useful CPU time, even though frequent GCs would result in using less memory.
- A GC should be productive. If a GC reclaims only a small amount of memory, then the GC (and the CPU cycles it spent) was wasted.
- Each GC should be fast. Many workloads have low latency requirements.
- Managed code programmers should be able to achieve efficient memory usage without having to know too much detail about the GC.
- The GC should adjust itself to accommodate different memory usage patterns.
Logical form of the managed heap
The CLR GC is a generational collector, which means objects are logically divided into generations. When generation N is collected, the surviving objects are marked as belonging to generation N+1. This process is called promotion. There are exceptions when we decide to demote or not to promote objects.
For small objects the heap is divided into 3 generations: Gen0, Gen1 and Gen2. Large objects have only one generation: Gen3. Gen0 and Gen1 are known as the ephemeral (short-lived) generations.
For the small object heap, the generation number represents age, Gen0 being the youngest generation. This does not mean all objects in Gen0 are younger than any object in Gen1 or Gen2; there are exceptions, which are mentioned later in this article. Collecting a generation means collecting objects in that generation and all of its younger generations.
In principle large objects could be handled the same way as small objects, but compacting large objects is very expensive, so they are treated differently: there is only one generation of large objects, and for performance reasons they are always collected with Gen2. Both Gen2 and Gen3 can be large, whereas the cost of collecting the ephemeral generations (Gen0 and Gen1) should be bounded.
Allocations are made in the youngest generation: always Gen0 for small objects, and Gen3 for large objects, since there is only one generation for them.
Physical form of the managed heap
The managed heap is a set of managed heap segments. A heap segment is a contiguous block of memory that the GC acquires from the operating system. Heap segments are partitioned into small object segments and large object segments, matching the small/large object distinction. On each heap the segments are chained together. There is at least one small object segment and one large object segment, and they are reserved when the CLR is loaded.
There is always exactly one ephemeral segment in each small object heap, which is where the ephemeral generations (Gen0 and Gen1) live. This segment may or may not also contain Gen2 objects. In addition to the ephemeral segment there can be zero, one, or more additional segments, which are Gen2 segments since they contain only Gen2 objects.
There are one or more segments on the large object heap.
Heap segments are consumed from lower addresses to higher addresses, meaning objects at lower addresses in a segment have been around longer than objects at higher addresses. Again, there are exceptions, described below.
Heap segments can be acquired on demand and are deleted when they no longer contain any live objects, but the initial segment on each heap always exists. Segments are acquired one at a time per heap: this happens during a GC for small objects, and at allocation time for large objects. This design gives better performance because large objects are collected only with Gen2 collections (which are expensive).
Heap segments are chained together in the order they were acquired, and the last segment in the chain is always the ephemeral segment. A collected segment (one with no live objects) can be reused instead of deleted and becomes the new ephemeral segment. Segment reuse is implemented only for the small object heap. Each time a large object is allocated, the whole large object heap is considered; small object allocations consider only the ephemeral segment.
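Conceptually, the segment chain can be pictured with a structure like the sketch below; the names are invented and much simpler than the real heap_segment:

#include <cstdint>

// Sketch: segments are linked in the order they were acquired, and for the
// small object heap the last segment in the chain is the ephemeral segment.
struct heap_segment_sketch
{
    uint8_t*             mem;        // start of the memory block acquired from the OS
    uint8_t*             allocated;  // high-water mark of allocations in this segment
    uint8_t*             reserved;   // end of the reserved region
    heap_segment_sketch* next;       // next segment in acquisition order, or nullptr
};

// The ephemeral segment is simply the tail of the small object segment chain.
heap_segment_sketch* ephemeral_segment(heap_segment_sketch* first)
{
    heap_segment_sketch* seg = first;
    while (seg->next != nullptr)
        seg = seg->next;
    return seg;
}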
Allocation budget
The allocation budget is a logical concept associated with each generation. It is a size limit that triggers a GC for that generation when exceeded.
The budget is a property set on each generation based on that generation's survival rate. If the survival rate is high, the budget is made larger, so that the next GC of that generation achieves a better ratio of dead objects to surviving objects.
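As an illustration only (the real heuristics are more involved and the constants below are made up), the budget can be thought of as scaling with the generation's survival rate:

#include <cstddef>

// Illustration: a higher survival rate yields a larger budget, so that the
// next GC of this generation sees a better ratio of dead to surviving objects.
size_t compute_allocation_budget(size_t base_budget, double survival_rate)
{
    // survival_rate is the fraction of the generation that survived the last
    // GC of that generation (0.0 to 1.0).
    double scale = 1.0 + 4.0 * survival_rate;  // e.g. 100% survival -> 5x the base budget
    return static_cast<size_t>(base_budget * scale);
}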
Determining which generation to collect
When a GC is triggered, the GC must decide which generation to collect. Besides the allocation budget there are other factors to consider (see the sketch after this list):
- Fragmentation of a generation: if a generation is highly fragmented, collecting that generation is likely to be productive.
- If the memory load on the machine is too high, the GC may collect more aggressively to free up space. This is important to prevent unnecessary paging.
- If the ephemeral segment is running out of space, the GC may do more aggressive ephemeral collections (meaning more Gen1 collections) to avoid acquiring a new heap segment.
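The sketch below shows the kind of decision being made; the inputs, thresholds, and function name are invented and far simpler than the real generation_to_condemn:

#include <cstddef>

// Illustration of choosing which generation to condemn. The real logic in
// generation_to_condemn weighs many more signals than shown here.
int choose_generation_to_condemn(const size_t allocated[3],
                                 const size_t budget[3],
                                 const double fragmentation_ratio[3],
                                 bool high_memory_load,
                                 bool ephemeral_segment_low_on_space)
{
    int condemned = 0;

    // Condemn the oldest generation whose allocation budget was exceeded.
    for (int gen = 0; gen <= 2; gen++)
        if (allocated[gen] >= budget[gen])
            condemned = gen;

    // A heavily fragmented older generation makes collecting it productive.
    for (int gen = condemned + 1; gen <= 2; gen++)
        if (fragmentation_ratio[gen] > 0.3)
            condemned = gen;

    // High memory load on the machine pushes toward a full collection.
    if (high_memory_load)
        condemned = 2;

    // A nearly full ephemeral segment pushes toward at least a Gen1 collection.
    if (ephemeral_segment_low_on_space && condemned == 0)
        condemned = 1;

    return condemned;
}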
Flow of a GC
Mark phase
The goal of the mark phase is to find all live objects.
The benefit of a generational collector is the ability to look at just part of the heap instead of all objects every time. When collecting the ephemeral generations, the GC needs to find the live objects in those generations, which are reported by the execution engine. Besides the objects kept alive by the execution engine, objects in older generations can also keep objects in younger generations alive by referencing them.
The GC uses cards to mark references from older generations. Cards are set by JIT helper functions during assignment operations: if the JIT helper sees that the assigned object is in the ephemeral range, it sets the byte containing the card that represents the source location. When collecting ephemeral generations, the GC looks at the cards set for the rest of the heap and only examines the objects those cards correspond to.
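A card-marking write barrier can be sketched as follows; the card size, globals, and function below are illustrative and not the actual JIT helper:

#include <cstddef>
#include <cstdint>

// Illustrative card table write barrier: when a reference is stored and the
// target is in the ephemeral range, mark the card that covers the *source*
// location so an ephemeral GC only has to rescan locations with set cards.
const size_t card_size = 256;       // bytes of heap covered by one card (illustrative)
uint8_t* g_card_table;              // one byte per card
uint8_t* g_ephemeral_low;           // start of the ephemeral range
uint8_t* g_ephemeral_high;          // end of the ephemeral range
uint8_t* g_lowest_address;          // lowest heap address covered by the card table

void write_barrier(void** slot, void* value)
{
    *slot = value;  // the assignment itself

    uint8_t* target = static_cast<uint8_t*>(value);
    if (target >= g_ephemeral_low && target < g_ephemeral_high)
    {
        // Set the card corresponding to the source location (the written slot).
        size_t card = (reinterpret_cast<uint8_t*>(slot) - g_lowest_address) / card_size;
        g_card_table[card] = 1;
    }
}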
Plan phase
The plan phase simulates a compaction to determine its outcome. If the result of compacting would be productive, the GC starts an actual compaction; otherwise it sweeps.
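In spirit, the outcome of that simulation feeds a decision like the sketch below; the threshold and inputs are invented for illustration:

#include <cstddef>

// Illustration: after simulating compaction, compact only if the space it
// would recover is a significant fraction of the condemned generations.
bool should_compact(size_t recoverable_fragmentation, size_t condemned_size)
{
    return condemned_size > 0 &&
           static_cast<double>(recoverable_fragmentation) / condemned_size > 0.25;
}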
Relocate phase
If the GC decides to compact, objects will move, so references to those objects must be updated. The relocate phase needs to find all references that point to objects in the generations being collected. In contrast, the mark phase only marks live objects, so it does not need to consider weak references.
Compact phase
This phase is straightforward because the new addresses objects should move to were already calculated during the plan phase; the compact phase simply copies the objects there.
Sweep phase
The sweep phase looks for the dead space in between live objects. It creates free objects in that space, coalescing adjacent dead objects into one free object, and puts all free objects on the free list.
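The sweep over a single segment can be sketched as a linear walk that turns each run of dead objects into one free-list entry; the helper functions below are assumed, not the actual CLR routines:

#include <cstddef>
#include <cstdint>

// Assumed helpers (not the actual CLR functions):
bool   is_marked(uint8_t* obj);                         // was this object marked live?
size_t object_size(uint8_t* obj);                       // size of the object starting here
void   make_free_object(uint8_t* start, size_t size);   // write a free object over the gap
void   add_to_free_list(uint8_t* start, size_t size);   // record the free object on the free list

// Walk the segment in address order, skipping live objects and coalescing
// consecutive dead objects into a single free object.
void sweep_segment(uint8_t* start, uint8_t* end)
{
    uint8_t* current = start;
    while (current < end)
    {
        if (is_marked(current))
        {
            current += object_size(current);
            continue;
        }

        uint8_t* gap_start = current;
        while (current < end && !is_marked(current))
            current += object_size(current);

        size_t gap_size = static_cast<size_t>(current - gap_start);
        make_free_object(gap_start, gap_size);
        add_to_free_list(gap_start, gap_size);
    }
}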
Code flow
Terms:
- WKS GC: workstation GC.
- SVR GC: server GC.
Functional behavior
WKS GC with concurrent GC off
- A user thread runs out of allocation budget and triggers a GC.
- The GC calls SuspendEE to suspend the managed threads.
- The GC decides which generation to collect.
- The mark phase is performed.
- The plan phase is performed, and the GC decides whether to compact.
- If compacting, the relocate and compact phases are performed; otherwise the sweep phase is performed.
- The GC calls RestartEE to resume the managed threads.
- The user thread resumes execution.
WKS GC with concurrent GC on
This illustrates how a background GC is done:
- A user thread runs out of allocation budget and triggers a GC.
- The GC calls SuspendEE to suspend the managed threads.
- The GC decides whether a background GC should be run.
- If so, the background GC thread is woken up to do the work. The background GC thread calls RestartEE to resume the managed threads.
- Managed threads continue running and allocating while the background GC does its work.
- User threads may run out of allocation budget and trigger ephemeral GCs (which we call foreground GCs). These proceed the same way as in the "WKS GC with concurrent GC off" scenario.
- The background GC thread calls SuspendEE again to finish marking, then calls RestartEE and performs concurrent sweeping while the user threads run.
- The background GC finishes its work.
SVR GC with concurrent GC off
- A user thread runs out of allocation budget and triggers a GC.
- The server GC threads are woken up and call SuspendEE to suspend the managed threads.
- The server GC threads do the GC work (same as in the "WKS GC with concurrent GC off" scenario).
- The server GC threads call RestartEE to resume the managed threads.
- The user thread resumes execution.
SVR GC with concurrent GC on
This scenario is the same as "WKS GC with concurrent GC on", except that the non-background GCs are done on the server GC threads.
Physical architecture
This section is meant to help you follow the code flow.
When a user thread runs out of its allocation quantum, it calls try_allocate_more_space to get a new quantum.
try_allocate_more_space calls GarbageCollectGeneration when a GC needs to be triggered.
For WKS GC with concurrent GC off, GarbageCollectGeneration is done entirely on the user thread that triggered the GC, and the code flow is:
GarbageCollectGeneration()
{
    SuspendEE();
    garbage_collect();
    RestartEE();
}

garbage_collect()
{
    generation_to_condemn();
    gc1();
}

gc1()
{
    mark_phase();
    plan_phase();
}

plan_phase()
{
    // actual plan phase work to decide to
    // compact or not
    if (compact)
    {
        relocate_phase();
        compact_phase();
    }
    else
        make_free_lists();
}
For WKS GC with concurrent GC on (this is the default case), the code flow for a background GC is:
GarbageCollectGeneration()
{
    SuspendEE();
    garbage_collect();
    RestartEE();
}

garbage_collect()
{
    generation_to_condemn();
    // decide to do a background GC
    // wake up the background GC thread to do the work
    do_background_gc();
}

do_background_gc()
{
    init_background_gc();
    start_c_gc();
    // wait until restarted by the BGC
    wait_to_proceed();
}

bgc_thread_function()
{
    while (1)
    {
        // wait on an event
        // wake up
        gc1();
    }
}

gc1()
{
    background_mark_phase();
    background_sweep();
}
Resources
- .NET CLR GC Implementation
- The Garbage Collection Handbook: The Art of Automatic Memory Management
- Garbage collection (Wikipedia)