1. Stack and heap
Each thread has its own stack, which the CLR creates when the thread is created; the stack's main job is to track function execution. Value-type variables (function parameters and non-member variables such as locals) are allocated on the stack; reference-type objects are allocated on the heap, and the reference pointer to the heap object is stored on the stack. The GC is responsible only for releasing heap objects and managing heap memory
Heap memory allocation
With the exception of pinned objects, memory allocation in the heap is simple: a pointer records the next free address in the heap, and memory is allocated contiguously according to the size of each object
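This bump-pointer scheme can be sketched in a few lines of C#; the class below is a hypothetical simulation for illustration, not the CLR's actual allocator:

```csharp
// Hypothetical illustration of bump-pointer allocation: a single pointer
// marks the next free position, and each allocation just advances it.
public class BumpAllocator
{
    private readonly byte[] _heap;  // simulated heap memory
    private int _next;              // the "allocation pointer": next free offset

    public BumpAllocator(int size) { _heap = new byte[size]; }

    // Returns the start offset of the new object, or -1 when the heap is
    // exhausted (the point at which the real CLR would trigger a GC).
    public int Allocate(int objectSize)
    {
        if (_next + objectSize > _heap.Length) return -1;
        int start = _next;
        _next += objectSize;        // advancing one pointer makes allocation O(1)
        return start;
    }
}
```

Because allocation is only a pointer increment, creating a small object in .NET is about as cheap as a stack allocation.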
Stack structure
Each function call logically creates a frame (stack frame) on the thread stack, and the corresponding stack frame is released when the function returns
Use a simple function to see how the CLR handles the stack at execution time:
static void Main(string[] args)
{
    int r = Sum(2, 3, 4, 5, 6);
}

private static int Sum(int a, int b, int c, int d, int e)
{
    return a + b + c + d + e;
}
The main assembly code after JIT compilation is as follows (in other situations the assembly may differ, but this simple function is enough to show how the stack is managed):
; ==== function Main ====
push 4                          ; push the 3rd through 5th parameters onto the stack
push 5
push 6
mov  edx, 3                     ; the 2nd parameter goes into the edx register
mov  ecx, 2                     ; the 1st parameter goes into the ecx register
call dword ptr ds:[00AD96B8h]   ; call function Sum; the return address (the address of the mov below) is pushed automatically
mov  dword ptr [ebp-0Ch], eax   ; store the return value into local variable r (after the call, the return value is in eax)

; ==== function Sum ====
push ebp                        ; save the original ebp register
mov  ebp, esp                   ; save the current stack pointer in ebp; parameters and locals are addressed via ebp from here on
sub  esp, 8                     ; allocate space for two local variables
mov  dword ptr [ebp-4], ecx     ; store the 1st parameter into a local slot
mov  dword ptr [ebp-8], edx     ; store the 2nd parameter into a local slot
......                          ; check code for the CLR
mov  eax, dword ptr [ebp-4]     ; a + b + c + d + e:
add  eax, dword ptr [ebp-8]     ; 1st parameter + 2nd parameter (2 + 3)
add  eax, dword ptr [ebp+10h]   ; + 3rd parameter (4)
add  eax, dword ptr [ebp+0Ch]   ; + 4th parameter (5)
add  eax, dword ptr [ebp+8]     ; + 5th parameter (6)
mov  esp, ebp                   ; restore the stack pointer (locals are freed)
pop  ebp                        ; restore the original ebp register value
ret  0Ch                        ; return: 1) the return address is popped automatically; 2) esp is increased by 0Ch (12 bytes), removing the call parameters from the stack; 3) the return value is in eax
The stack state during execution is shown below (the stack base is at a high address; the stack top is at a low address):
Stack state Change Process:
a). The caller pushes the 3rd, 4th, and 5th parameters onto the stack and places the 1st and 2nd into the ecx and edx registers respectively
b). The call instruction pushes the return address onto the stack automatically and jumps to function Sum to begin execution
c). Function Sum pushes the ebp register onto the stack and copies esp into ebp, which is later used to address parameters and local variables
d). Local variables are allocated; additional code unrelated to Sum's own logic is omitted
e). The addition is performed and the result is saved in the eax register
f). The esp register is restored, releasing all of Sum's local variables and any other stack allocations
g). The original ebp value is popped back into ebp, fully restoring the stack to its state on entry to the Sum call
h). The ret instruction returns from the function: the return value is in eax, the return address (pushed by the call instruction) is popped automatically, and the 0Ch operand tells the processor to remove another 12 bytes from the stack on return, i.e. the callee clears the pushed parameters. After the function returns, all stack allocations for this Sum call have been released
This calling convention is similar to __fastcall
The following code adds a reference-type variable and a ref value-type parameter; its simplified stack state is shown below:
Code:
public static void Run(int i)
{
    int j = 9;
    MyClass1 c = new MyClass1();
    c.x = 8;
    int result = Sum(i, 5, ref j, c);
}

public static int Sum(int a, int b, ref int c, MyClass1 obj)
{
    int r = a + b + c + obj.x;
    return r;
}

public class MyClass1
{
    public int x;
}
Stack Status:
Reference-type objects are always allocated in the heap; only the object's reference address is stored on the stack. After the Run function finishes executing, the MyClass1 object c in the heap becomes recyclable garbage and is reclaimed during the next GC
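A WeakReference makes this observable: it tracks an object without keeping it alive. The sketch below is illustrative only — GC timing is nondeterministic, debug builds extend local lifetimes (as discussed later), and the forced GC.Collect is purely for demonstration:

```csharp
using System;
using System.Runtime.CompilerServices;

public class MyClass1 { public int x; }

public class Program
{
    // NoInlining keeps the JIT from stretching c's lifetime into Main.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference Run()
    {
        var c = new MyClass1 { x = 8 };  // object in the heap, reference on the stack
        return new WeakReference(c);     // tracks the object without rooting it
    }                                    // after Run returns, nothing references the object

    static void Main()
    {
        WeakReference w = Run();
        GC.Collect();                    // force a full collection (demo only)
        GC.WaitForPendingFinalizers();
        Console.WriteLine(w.IsAlive);    // typically False: the object was reclaimed
    }
}
```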
2. The mark-compact algorithm
Simply put, the .NET GC uses the mark-compact algorithm
Phase 1: mark-sweep, the marking phase
First assume that every object in the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them; finally, any object in the heap left unmarked can be reclaimed.
Phase 2: compact, the compaction phase
After garbage objects are reclaimed, heap memory becomes discontinuous; the surviving objects are moved so that they line up contiguously again from the heap base address, similar to defragmenting disk space
After the heap has been reclaimed and compacted, the previous allocation scheme can continue to be used, with a single pointer recording the start address for the next heap allocation.
Main processing steps: suspend threads and determine the roots => build the reachable-objects graph => reclaim objects => compact the heap => fix pointers
Roots: the object references inside the heap are intricate (cross references, circular references) and form a complex graph; the roots are the various entry points into that graph that the CLR can find outside the heap. The roots the GC searches include global objects, static variables, local objects, function call parameters, object pointers in the current CPU registers (as well as the finalization queue), and so on. They can be grouped into 2 categories: static variables that have been initialized, and objects still in use by threads (stack + CPU registers)
Reachable objects: objects that can be reached from the roots by following object references. For example, if a local variable of the currently executing function references object A, then A is a root object; if A's member variable references object B, then B is a reachable object. From the roots the collector builds the reachable-objects graph; the remaining objects are unreachable and can be reclaimed
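The reachable-objects graph is built by exactly this kind of traversal. A simplified C# sketch of the mark phase (a hypothetical object model, not the CLR's internal data structures):

```csharp
using System.Collections.Generic;

// Hypothetical heap object: just its outgoing references plus a mark bit.
public class HeapObject
{
    public List<HeapObject> References = new List<HeapObject>();
    public bool Marked;   // objects left unmarked after the mark phase are garbage
}

public static class Marker
{
    // Mark everything reachable from the roots by walking the reference graph.
    public static void Mark(IEnumerable<HeapObject> roots)
    {
        var pending = new Stack<HeapObject>(roots);
        while (pending.Count > 0)
        {
            HeapObject obj = pending.Pop();
            if (obj.Marked) continue;       // already visited: this handles cycles
            obj.Marked = true;
            foreach (var r in obj.References)
                pending.Push(r);
        }
    }
}
```

Cross references and circular references are handled naturally: an already-marked object is never expanded twice, so the traversal terminates even on cyclic graphs.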
Pointer fixing is needed because the compact phase moves heap objects and their addresses change; every reference pointer must then be fixed up, including pointers on the stack, pointers in CPU registers, and reference pointers held in other heap objects
There is a slight difference between the debug and release execution modes: in release mode an object not referenced by any subsequent code becomes unreachable immediately, while in debug mode objects do not become unreachable until the current function finishes executing, so that local objects can still be inspected for debugging purposes
Managed objects passed to COM+ also become roots and carry a reference counter, for compatibility with COM+'s memory management mechanism; only when the reference counter reaches 0 can these objects become candidates for collection
Pinned objects are objects that cannot be moved after allocation, such as objects passed to unmanaged code (or pinned with the fixed keyword); during pointer fixing the GC cannot modify the reference pointers held in unmanaged code, so moving those objects would cause errors. Pinned objects can cause heap fragmentation, but in most cases objects passed to unmanaged code should still be reclaimed by the GC eventually
3. The generational algorithm
A program may use hundreds of megabytes or even gigabytes of memory, and a GC pass over such a large memory area is expensive. The generational algorithm has a solid statistical basis, and its effect on GC performance is significant
Objects are divided into new and old according to their lifetimes. Based on the observed lifetime distribution, the new and old regions can adopt different collection strategies and algorithms: collection effort is concentrated on the new region, at short intervals and over a small memory area, so that the large numbers of freshly discarded local objects produced along the execution path are reclaimed promptly and at low cost
Hypothetical premises of the generational algorithm:
a). Most newly created objects have short lifetimes, while older objects tend to live longer
b). Reclaiming part of the memory is faster than a collection over the whole heap
c). Newly created objects are usually strongly associated with each other. Allocating them contiguously in the heap keeps correlated objects together, which improves the CPU cache hit rate.
.NET divides the heap into 3 generations: Gen 0, Gen 1, and Gen 2
With the heap divided into 3 generations, there are correspondingly 3 kinds of GC: generation-0 collections, generation-1 collections, and generation-2 collections. When the Gen 0 heap reaches its threshold, a generation-0 GC is triggered; the objects in Gen 0 that survive it move into Gen 1. When the Gen 1 heap reaches its threshold, a generation-1 GC is triggered, collecting both the Gen 0 and Gen 1 heaps, and surviving objects move into Gen 2. A generation-2 GC collects the Gen 0, Gen 1, and Gen 2 heaps
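Promotion can be observed directly through GC.GetGeneration, which reports the generation an object currently lives in (forcing collections with GC.Collect is for demonstration only):

```csharp
using System;

public class Program
{
    static void Main()
    {
        object o = new object();
        Console.WriteLine(GC.GetGeneration(o)); // new objects start in Gen 0
        GC.Collect();                           // o survives a collection...
        Console.WriteLine(GC.GetGeneration(o)); // ...and is promoted (typically to 1)
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(o)); // promoted again (typically to 2)
        Console.WriteLine(GC.MaxGeneration);    // 2: the oldest generation the CLR manages
        GC.KeepAlive(o);                        // keep o rooted throughout the demo
    }
}
```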
Gen 0 and Gen 1 are small; together these two generations stay at around 16 MB, while the size of Gen 2 is determined by the application and can reach several gigabytes. The cost of generation-0 and generation-1 GCs is therefore very low, while the generation-2 GC, called a full GC, is usually expensive. As a rough estimate, generation-0 and generation-1 GCs should complete within milliseconds to tens of milliseconds, whereas a full GC over a large Gen 2 heap may take several seconds. In general, while a .NET application runs, generation-2, generation-1, and generation-0 GCs should occur at a frequency ratio of roughly 1:10:100
The figure shows a Performance Monitor trace of an ASP.NET application: the Gen 0 heap size (red) averages 6 MB, Gen 1 (blue) averages 5 MB, Gen 2 (yellow) reached 620 MB; Gen 0 + Gen 1 averages 13.2 MB with a maximum of 19.8 MB
Intuitively, a program's run consists of a series of function calls that create many local objects, leaving behind large numbers of objects to reclaim as each function completes. A generational algorithm that concentrates garbage collection on the young generations therefore usually improves collection efficiency greatly, unless the program is very unusual or its object associations are poorly designed. For an ASP.NET application, for example, you should ensure that most objects used for HTTP request processing are released by generation-0 and generation-1 collections
A few pointers into the heap are enough to mark the generation boundaries, so while building the reachable-objects graph the collector can tell from an object's address which generation it belongs to. If a generation-0 GC encounters a Gen 1 or Gen 2 object while building the graph, it can skip it without traversing further. For the case where an older-generation object references a younger-generation object, the CLR subscribes to memory write notifications through the Win32 API GetWriteWatch and records them in a "card table", which helps the younger-generation GC build its graph correctly
4. LOH
In .NET 1.1 and 2.0, objects under 85,000 bytes are called small objects and are allocated in the Gen 0 heap, while objects of 85,000 bytes or more are called large objects and are allocated in the large object heap (LOH). This is because the GC would otherwise move large blocks of memory during heap compaction, consuming significant CPU time; performance tuning practice settled on the 85,000-byte threshold
The LOH is collected only during a generation-2 GC, using the mark-sweep algorithm without compaction, so memory allocation in the LOH is discontinuous; a free list records the free blocks in the LOH and manages the reclaimed space
After Obj1 and Obj2 are released, their space is merged into a single free-list node, which is later used to allocate Obj4
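The threshold is easy to demonstrate: GC.GetGeneration reports LOH objects as belonging to the oldest generation, since the LOH is collected only together with Gen 2:

```csharp
using System;

public class Program
{
    static void Main()
    {
        var small = new byte[80000];  // below the 85,000-byte threshold: small object heap
        var large = new byte[90000];  // above the threshold: large object heap
        Console.WriteLine(GC.GetGeneration(small)); // typically 0 (a fresh small object)
        Console.WriteLine(GC.GetGeneration(large)); // 2: LOH objects report as Gen 2
    }
}
```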
When is a garbage collection triggered?
As mentioned earlier, generation-0 and generation-1 collections are driven mainly by thresholds. The initial Gen 0 heap size is related to the size of the CPU cache, and at run time the CLR dynamically adjusts the Gen 0 heap size based on memory allocation patterns, but the total size of Gen 0 and Gen 1 stays around 16 MB
The Gen 2 heap and the LOH are reclaimed during a full GC, and a full GC is triggered mainly by 2 kinds of events:
a). The amount of data entering the Gen 2 heap and the LOH exceeds a certain proportion. The RegisterForFullGCNotification parameters maxGenerationThreshold and largeObjectHeapThreshold set this value for the Gen 2 heap and the LOH respectively
b). Operating system memory runs low. The CLR receives a low-memory notification from the operating system and triggers a full GC
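The notification API mentioned in a) can be used roughly as follows. This is only a sketch: the API requires concurrent GC to be disabled (otherwise RegisterForFullGCNotification throws InvalidOperationException), and the load-balancer comments are hypothetical application logic:

```csharp
using System;
using System.Threading;

public class Program
{
    static void Main()
    {
        // Ask the CLR to signal when a full GC is approaching. The two
        // threshold arguments (1-99) control how early the signal is raised:
        // maxGenerationThreshold for Gen 2, largeObjectHeapThreshold for the LOH.
        GC.RegisterForFullGCNotification(10, 10);

        var watcher = new Thread(() =>
        {
            if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
            {
                // e.g. take this server out of the load balancer, then
                // induce the full GC at a convenient moment:
                GC.Collect();
            }
            if (GC.WaitForFullGCComplete() == GCNotificationStatus.Succeeded)
            {
                // full GC finished: put the server back into rotation
            }
        });
        watcher.IsBackground = true;
        watcher.Start();

        // ... application work that allocates ...

        GC.CancelFullGCNotification();  // stop notifications when done
    }
}
```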
5. Heap details, expansion and contraction
The generational division of the heap is a logical structure; the heap's actual memory is requested, allocated, and released in segments. In workstation GC mode a segment is 16 MB; in server GC mode a segment is 64 MB. Gen 0 and Gen 1 always live in the same segment, called the ephemeral segment, so max(Gen 0 heap size + Gen 1 heap size) ≈ 16 MB or 64 MB. The Gen 2 heap consists of 0 or more segments, and the LOH consists of one or more segments
When a .NET program starts, the CLR creates 2 segments for the heap, one as the ephemeral segment and the other for the LOH. .NET uses VirtualAlloc to request and allocate heap memory. When there is not enough space in the LOH to allocate a new object, or not enough room for the objects a generation-1 GC promotes into Gen 2, .NET allocates a new segment for the LOH or the small object heap. If allocating a new segment fails, the execution engine throws an OutOfMemory exception
Segments that are completely empty after a full GC are freed and their memory is returned to the operating system
An important GC improvement in .NET 2.0 is reducing heap fragmentation as much as possible. Heap fragmentation is mainly caused by pinned objects, and there are 2 main countermeasures. The first is deferred promotion: if the ephemeral segment contains pinned objects, their promotion to Gen 2 is postponed as long as possible, accommodating the pinned objects while making maximum use of the current ephemeral segment's space. The second is reusing Gen 2 space: if a Gen 2 segment that contains pinned objects still has enough free room, it may be reused as the ephemeral segment
6. GC Mode
There are 3 modes: workstation GC with concurrent GC off, workstation GC with concurrent GC on, and server GC
Workstation GC with concurrent GC off: aims at high throughput on a single-CPU machine. A series of policies observe memory allocation and the state of each GC, dynamically adjusting the GC strategy so the program collects efficiently as its runtime state changes, but all threads are frozen while a GC is in progress
Workstation GC with concurrent GC on: for interactive programs where response time matters, such as streaming media playback (if a full GC paused the application for several seconds or more than 10 seconds, users could not tolerate it). This mode uses multiple CPUs to process the full GC concurrently: instead of freezing all threads for the whole full GC, it freezes them only during several short intervals, and outside those intervals the application still runs normally and allocates memory. This is implemented mainly by making the Gen 0 heap much larger than under non-concurrent GC, so threads can still allocate from the Gen 0 heap while the GC is working; if the Gen 0 heap is exhausted before the GC finishes, threads will nevertheless block. The cost of this mode is a larger working set, and the GC takes more time than non-concurrent GC.
Server GC: for server applications on multi-CPU machines, aiming at high throughput and scalability and making full use of the server's large memory. .NET creates one set of heaps (Gen 0, 1, 2, and LOH) and one GC thread per CPU; each GC thread performs collection independently on its corresponding heap while the other CPUs continue normal processing. The best case is when the memory layout is roughly the same across the threads because they perform the same or similar work
Only workstation GC can be used on a single-CPU machine, with workstation GC with concurrent GC on as the default; configuring server GC on a single-CPU machine has no effect, and workstation GC is still used. ASP.NET on a multi-CPU server uses server GC mode by default, and server GC cannot be combined with concurrent mode
Concurrent GC can be used on a single-CPU machine; it is independent of the number of CPUs
For an ASP.NET program, you should try to ensure that each CPU corresponds to only one GC thread, to prevent performance problems caused by multiple GC threads contending on the same CPU. If you use a Web garden, you should use workstation GC with concurrent GC off. A Web garden raises throughput at the cost of multiplied memory usage, with much duplication between worker processes; the best fit for a Web garden is an application that shares a resource pool among the processes, avoiding duplicated memory while raising throughput as much as possible. In that respect server GC resembles a Web garden, except that a Web garden spans multiple processes while server GC is implemented with multiple threads inside one process; little further information about server GC is available, and much can only be inferred from what is published.
Disable concurrent GC for workstation GC:
<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
To enable the server GC:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
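You can verify which mode is actually in effect at run time via System.Runtime.GCSettings (IsServerGC is available from .NET 2.0, LatencyMode from .NET 3.5):

```csharp
using System;
using System.Runtime;

public class Program
{
    static void Main()
    {
        // True when <gcServer enabled="true"/> (or the host) selected server GC.
        Console.WriteLine(GCSettings.IsServerGC);
        // Interactive corresponds to concurrent GC on; Batch to concurrent GC off.
        Console.WriteLine(GCSettings.LatencyMode);
    }
}
```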
7. Finalization
When an object with a Finalize method is garbage collected, .NET calls its Finalize method before reclaiming it. The processing works as follows:
a). When an object with a Finalize method is created on the heap, a pointer to it is placed in the finalization queue;
b). During garbage collection, if an object with a Finalize method becomes unreachable, its pointer is removed from the finalization queue and placed in the freachable queue, and the object is not reclaimed in this collection; other unreachable objects without a Finalize method are reclaimed normally. Objects in the freachable queue count as reachable (and so do the objects they reference)
c). After the collection completes, if the freachable queue is non-empty, a dedicated runtime thread, the finalizer thread, is awakened; it calls the Finalize method of each object in the freachable queue in turn, then removes the object's pointer from the queue
d). After step c), these objects are unreachable at the next garbage collection and are reclaimed normally
Because the Finalize method is designed for releasing unmanaged resources, which can take a long time, the Finalize calls are handed off to a dedicated thread, the finalizer thread, and processed asynchronously to keep garbage collection fast; this is also why an object with a Finalize method needs 2 garbage collections before it is reclaimed
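Because finalization delays reclamation by a whole extra GC cycle, the usual practice is the Dispose pattern: release unmanaged resources deterministically and call GC.SuppressFinalize so the object can skip the finalization queue. A minimal sketch (the IntPtr field stands in for a real unmanaged handle):

```csharp
using System;

public class ResourceHolder : IDisposable
{
    private IntPtr _handle = new IntPtr(1);  // stands in for an unmanaged resource
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);  // skip the finalization queue:
                                    // the object can now die in a single GC
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        // release the unmanaged resource here
        _handle = IntPtr.Zero;
        _disposed = true;
    }

    ~ResourceHolder()  // Finalize: a safety net if Dispose was never called
    {
        Dispose(false);
    }
}
```

With using (var r = new ResourceHolder()) { ... }, Dispose runs deterministically and the finalizer only matters when a caller forgets to dispose.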
Reference:
Garbage Collection - Past, Present and Future, Patrick Dussud; Chinese translation: "The .NET Garbage Collector: Past, Present and Future" (I), (II)
C# Heap(ing) vs Stack(ing) in .NET: Part I, Part II, Part III, Part IV, Matthew Cochran
Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework, Jeffrey Richter
Garbage Collection Part 2: Automatic Memory Management in the Microsoft .NET Framework, Jeffrey Richter
CLR Inside Out: Large Object Heap Uncovered, Maoni Stephens
Heap: Pleasures and Pains, Murali R. Krishnan
The Dangers of the Large Object Heap, Andrew Hunter
Garbage Collection Notifications
Garbage Collector Basics and Performance Hints, Rico Mariani
CLR Inside Out: Investigating Memory Issues, Claudio Caldato and Maoni Stephens
Understanding Garbage Collection in .NET, Andrew Hunter
Using GC Efficiently Part 1, Part 2, Part 3, Part 4, Maoni Stephens
Notes on the CLR Garbage Collector, Vineet Gupta
The Mystery of Concurrent GC, Mark Smith
Garbage Collection curriculum, Paulo Ferreira, Luís Veiga
Java theory and practice: A brief history of garbage collection, Brian Goetz
Original article: http://www.cnblogs.com/riccc/archive/2009/09/01/dotnet-memory-management-and-garbage-collection.html (.NET memory management and garbage collection)