.NET garbage collection mechanism (II)

Source: Internet
Author: User

I. The need for GC

1. An application's use of a resource can usually be broken down into the following steps: allocate memory for the resource → initialize the memory → use the resource → clean up the resource → free the memory.

2. Common approaches to managing resources (memory) in an application include:

[1] Manual management: C, C++

[2] Reference counting: COM

[3] Automatic management: .NET, Java, PHP, Go ...

3. However, the complexity of manual management and reference counting easily leads to the following typical problems:

[1] The programmer forgets to release memory

[2] The application accesses memory that has already been freed

The consequences are serious. Common symptoms include memory leaks, corrupted data, and access violations, and most of the time the program's behavior becomes strange and unpredictable.

The solution adopted by .NET, Java, and others is automatic memory management through a garbage collector (GC). Problem 1 is thereby resolved naturally, and problem 2 cannot arise in the first place.

Summary: memory management that is not automated very easily produces bugs that affect system stability. This is especially painful in an online multi-server cluster, where a bug must first be traced to a particular server and the memory then dumped for analysis. It greatly dampens developers' enthusiasm, and a steady stream of similar bugs wears people out.

II. How the GC works

The GC workflow is mainly divided into the following steps:

Mark → Plan → Sweep → Relocate (reference update) → Compact

  

1. Marking

Goal: find all live instances, that is, instances that are still referenced

Method: find all GC roots and put them in a queue, then traverse each root and, recursively, every child node it references, marking every node visited as live. Weak references are not taken into account.
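The marking step described above can be sketched on a toy object graph. This is only an illustration of queue-based reachability, not the CLR's actual implementation; the Node class and the MarkPhaseSketch.Mark method are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Toy object-graph node (hypothetical; not a real CLR structure).
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public bool Marked { get; set; }
    public Node(string name) { Name = name; }
}

public static class MarkPhaseSketch
{
    // Marks every node reachable from the roots and returns the names in visit order.
    public static List<string> Mark(IEnumerable<Node> roots)
    {
        var live = new List<string>();
        var queue = new Queue<Node>(roots);
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            if (node.Marked) continue;      // already visited, so cycles terminate
            node.Marked = true;
            live.Add(node.Name);
            foreach (Node child in node.References)
                queue.Enqueue(child);
        }
        return live;
    }

    public static void Main()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b);
        b.References.Add(a);                // cycle A <-> B
        c.References.Add(d);                // C and D are not reachable from the root
        Console.WriteLine(string.Join(", ", Mark(new[] { a })));   // prints "A, B"
    }
}
```

Note that C references D, yet both stay unmarked: what matters is reachability from a root, not whether some reference to the object exists.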

2. Planning and cleanup

[1] Plan

Goal: decide whether compaction is required

Method: iterate over all marked (live) instances in the generation being collected and make the decision based on specific algorithms

[2] Sweep

Goal: reclaim all free space

Method: iterate over all instances (live or dead) in the generation being collected and add every memory block that lies between live instances to the free list

3. Reference updates and compression

[1] Reference update

Goal: update all referenced addresses

Method: calculate the address each instance will have after compaction, find all GC roots and put them in a queue, then traverse each root and, recursively, every child node it references, updating the addresses stored in every node visited, including weak references.

[2] Compaction

Goal: reduce memory fragmentation

Method: move each instance to the location given by its newly calculated address.
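The plan, reference update, and compaction steps can be sketched on a toy heap. The Block class, its fields, and CompactSketch.Compact are hypothetical names for this illustration; addresses and sizes are arbitrary units, and the real CLR works on raw memory rather than a list of objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy heap block (hypothetical). Refs holds the addresses of the blocks it points to.
public sealed class Block
{
    public int Address;
    public int Size;
    public bool Live;                       // result of the marking step
    public List<int> Refs = new List<int>();
}

public static class CompactSketch
{
    // Expects blocks sorted by address; live blocks may only reference live blocks.
    public static List<Block> Compact(List<Block> heap)
    {
        // Plan: compute the address each live block will have after compaction.
        var forwarding = new Dictionary<int, int>();
        int next = 0;
        foreach (Block b in heap.Where(b => b.Live))
        {
            forwarding[b.Address] = next;
            next += b.Size;
        }

        // Relocate + compact: rewrite references and "move" each survivor.
        var survivors = new List<Block>();
        foreach (Block b in heap.Where(b => b.Live))
        {
            survivors.Add(new Block
            {
                Address = forwarding[b.Address],
                Size = b.Size,
                Live = true,
                Refs = b.Refs.Select(r => forwarding[r]).ToList()
            });
        }
        return survivors;
    }

    public static void Main()
    {
        var heap = new List<Block>
        {
            new Block { Address = 0,  Size = 16, Live = true,  Refs = new List<int> { 48 } },
            new Block { Address = 16, Size = 32, Live = false },
            new Block { Address = 48, Size = 8,  Live = true }
        };
        foreach (Block b in Compact(heap))
            Console.WriteLine($"address {b.Address}, size {b.Size}, refs [{string.Join(", ", b.Refs)}]");
        // address 0, size 16, refs [16]
        // address 16, size 8, refs []
    }
}
```

The dead 32-unit block disappears, the block that used to sit at address 48 slides down to 16, and the reference that pointed at 48 is rewritten to 16.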

III. GC roots

The "GC root node" that appears repeatedly in this article is what is usually called a GC root.

Each application contains a set of roots. Each root is a storage location that contains a pointer to an object of a reference type. The pointer either references an object in the managed heap or is null.

In an application, an object becomes a target of the garbage collector as soon as it becomes unreachable, that is, as soon as no root references it directly or indirectly.

A concise description in English: GC roots are not objects in themselves but are instead references to objects. Also, any object referenced by a GC root will automatically survive the next garbage collection.

In .NET, the following can act as GC roots:

1. Global variables

2. Static variables

3. All local variables on the stack (JIT)

4. Parameter variables passed on the stack

5. Variables in registers

Note that only variables of a reference type are considered roots; a variable of a value type is never considered a root. A value-type variable holds its data directly (a local of a value type typically lives on the stack), whereas instances of reference types are stored on the managed heap.
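The effect of a root can be observed with WeakReference, which tracks an object without keeping it alive. The following is a minimal sketch; RootsSketch and its members are hypothetical names. The rooted object is guaranteed to survive, while the result for the unrooted object depends on the runtime and build settings, so no output is promised for it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RootsSketch
{
    private static object s_rooted;     // a static field acts as a GC root

    // NoInlining keeps the temporary local out of the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateRooted()
    {
        s_rooted = new object();
        return new WeakReference(s_rooted);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateUnrooted()
    {
        return new WeakReference(new object());
    }

    public static void Main()
    {
        WeakReference rooted = CreateRooted();
        WeakReference unrooted = CreateUnrooted();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine($"rooted alive:   {rooted.IsAlive}");     // True: still referenced by s_rooted
        Console.WriteLine($"unrooted alive: {unrooted.IsAlive}");   // usually False, but not guaranteed
    }
}
```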

IV. When a GC occurs

1. The application allocates a new object and the budget of a GC generation has been exceeded, for example generation 0 is full;

2. The code explicitly calls System.GC.Collect();

3. Other special cases, such as Windows reporting low memory, the CLR unloading an AppDomain, the CLR shutting down, and even some extreme situations in which changes to system parameter settings may trigger a collection.
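The first trigger can be observed with GC.CollectionCount, which reports how many times a generation has been collected so far. The sketch below uses hypothetical names (AllocationTriggerSketch, CountGen0Collections); the number printed depends on the runtime, the GC mode, and the machine, so no particular value should be expected.

```csharp
using System;

public static class AllocationTriggerSketch
{
    // Allocates many short-lived arrays and returns how many
    // generation 0 collections happened in the meantime.
    public static int CountGen0Collections(int allocations, int bytesEach)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            byte[] temp = new byte[bytesEach];
            temp[0] = 1;                    // touch the array so the allocation is used
        }
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        // No explicit GC.Collect() here: any collections counted were
        // triggered by allocation pressure alone.
        Console.WriteLine(CountGen0Collections(200000, 1024));
    }
}
```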

V. Generations in the GC

Generations were introduced primarily to improve performance by avoiding collection of the entire heap. A generational garbage collector makes the following assumptions:

1. The newer an object is, the shorter its lifetime;

2. The older an object is, the longer its lifetime;

3. Collecting part of the heap is faster than collecting the entire heap.

.NET's garbage collector divides objects into three generations (generation 0, generation 1, generation 2). The contents of each generation are as follows:

1. G0: small objects (size < 85000 bytes), that is, newly allocated objects smaller than 85000 bytes.

2. G1: objects from G0 that survived a GC.

3. G2: large objects (size >= 85000 bytes), and objects from G1 that survived a GC.

    object o = new byte[85000];               // large object
    Console.WriteLine(GC.GetGeneration(o));   // output is 2, not 0

It is important to know that the CLR requires all resources to be allocated from the managed heap. The CLR manages two kinds of heap, the small object heap (SOH) and the large object heap (LOH). All allocations of 85000 bytes or more are made on the LOH.

Generation collection rule: when generation n is collected, the objects that survive are promoted to generation n+1. The GC checks different generations at different frequencies to optimize performance. Generation 0 objects are checked in every GC cycle. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects.
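Promotion can be watched with GC.GetGeneration. The following is a small sketch with hypothetical names (GenerationSketch, ObserveGenerations); it forces collections only for demonstration, and the generation numbers in the comments are what a typical run shows rather than a guarantee.

```csharp
using System;

public static class GenerationSketch
{
    // Returns the generation of one small object before and after two forced collections.
    public static int[] ObserveGenerations()
    {
        object o = new object();
        var seen = new int[3];
        seen[0] = GC.GetGeneration(o);      // typically 0: freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(o);      // typically 1: survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(o);      // typically 2: survived two collections
        GC.KeepAlive(o);                    // keep o reachable up to this point
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" -> ", ObserveGenerations()));
        Console.WriteLine($"Highest generation: {GC.MaxGeneration}");
        Console.WriteLine($"Collections so far: gen0={GC.CollectionCount(0)}, " +
                          $"gen1={GC.CollectionCount(1)}, gen2={GC.CollectionCount(2)}");
    }
}
```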

VI. Explicit invocation of the GC

A GC is usually expensive and its timing is nondeterministic, so Microsoft's programming guidelines strongly recommend that you do not invoke the GC explicitly. You can still use some of the GC's methods in your code to collect manually, provided you understand the GC's principles well. Otherwise, invoking the GC manually in certain scenarios can easily interfere with normal collection or even introduce unpredictable errors.

For example, the following code:

    void SomeMethod()
    {
        object o1 = new object();
        object o2 = new object();
        o1.ToString();
        GC.Collect();   // this forces o2 into Gen1, because it's still referenced
        o2.ToString();
    }

Without GC.Collect(), o1 and o2 would both be reclaimed as Gen0 objects in the next garbage collection. With GC.Collect(), o2 is promoted to Gen1, which means a generation 0 collection will no longer release the memory occupied by o2.

In other cases, nonstandard coding can even lead to deadlocks, as in this widely circulated code:

    public class MyClass
    {
        private bool isDisposed = false;

        ~MyClass()
        {
            Console.WriteLine("Enter destructor...");
            lock (this)   // in some situations this leads to a deadlock
            {
                if (!isDisposed)
                {
                    Console.WriteLine("Do stuff...");
                }
            }
        }
    }

Called by the following code:

    var instance = new MyClass();
    Monitor.Enter(instance);
    instance = null;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("Instance is garbage collected");

The code above will cause a deadlock. The cause is as follows:

1. The client's main thread calls Monitor.Enter(instance), which locks the instance

2. A GC is then triggered manually, and the finalizer thread executes the MyClass destructor

3. Inside the MyClass destructor, lock(this) is used, but the main thread has not released its lock on instance (which is this here), so the finalizer thread can only wait, while the main thread in turn waits in GC.WaitForPendingFinalizers()

Strictly speaking, the problem in the code above is not the GC's fault, and it has little to do with multithreading as such; it is caused by incorrect use of lock.

Also note that some GC behavior is completely different in debug and release builds (Jeffrey Richter illustrates this with a timer example in "CLR via C#"). With the code above, for example, you may find that it runs normally in debug mode but deadlocks in release mode.

VII. When the GC meets multiple threads

One major premise of the garbage collection algorithm discussed so far is that it runs on only one thread. In real-world development, multiple threads often access the managed heap concurrently, or at least manipulate objects in the heap at the same time. When one thread triggers a garbage collection, the other threads must not access any objects, because the garbage collector may move those objects and change their memory locations. When the CLR wants to perform a garbage collection, all threads executing managed code are suspended immediately; threads executing unmanaged code are not suspended. The CLR then examines each thread's instruction pointer to determine where the thread is executing. The instruction pointer is compared with a JIT-generated table to determine what code the thread is running.

If a thread's instruction pointer is exactly at an offset marked in the table, the thread has reached a safe point. A thread can be safely suspended at a safe point until the garbage collection ends. If the instruction pointer is not at an offset marked in the table, the thread is not at a safe point and the CLR cannot start the garbage collection. In this case, the CLR hijacks the thread: it modifies the thread's stack so that the return address points to a special function inside the CLR. The thread then resumes execution. When the current method returns, this special function runs and safely suspends the thread.

However, a thread sometimes stays in its current method for a long time. So after the thread resumes, the CLR waits about 250 milliseconds for the hijack to take effect. After this time, the CLR suspends the thread again and checks its instruction pointer. If the thread has reached a safe point, garbage collection can begin. If it has not, the CLR checks whether the thread has called another method. If so, the CLR modifies the thread's stack again so that the thread is hijacked when the most recently called method returns. The CLR then resumes the thread and waits for the next hijacking attempt.

Garbage collection can begin only after all threads have reached a safe point or been hijacked. When the collection completes, all threads are resumed, the application continues to run, and the hijacked threads return to the methods that originally called them.

In practice, the CLR suspends threads by hijacking most of the time, rather than by using the JIT-generated table to determine whether a thread has reached a safe point. The reason is that the JIT-generated tables require a lot of memory, which increases the working set and thus severely affects performance.

Here is a real case. A web application that used a large number of Tasks began to misbehave inexplicably after it went into production. The database logs (in practice, Event Tracing for Windows (ETW), IIS logs, and dump files can be used as well) showed intermittent unhandled exceptions during task execution. After analysis, CLR garbage collection was suspected as the cause, and the problem was only exposed under high concurrency.

VIII. Some suggestions for development

Because a GC is expensive, paying attention to good programming habits during everyday development can have a positive effect on the GC; neglecting them can have the opposite effect.

1. Try not to allocate very large objects. Large objects (>= 85000 bytes) go directly into G2, and by default the GC never compacts the large object heap (LOH), because moving memory blocks of 85000 bytes or more within the heap would waste too much CPU time;

2. Do not frequently allocate objects with very short lifetimes. Frequent garbage collection and frequent compaction are costly and may lead to a lot of memory fragmentation. A well-designed, stable object pool (ObjectPool) can be used to avoid this problem;

3. Use better programming techniques, such as better algorithms, better data structures, and better solutions.
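As one possible illustration of the pooling advice in item 2, the sketch below reuses buffers from ArrayPool<byte>.Shared instead of allocating a new array on every call. ArrayPool lives in System.Buffers (built into .NET Core and later; on .NET Framework it is assumed to come from the System.Buffers package). PoolSketch and ProcessChunk are hypothetical names.

```csharp
using System;
using System.Buffers;

public static class PoolSketch
{
    // Fills a pooled buffer with a simple pattern and returns the sum of its bytes.
    public static int ProcessChunk(int size)
    {
        // Rent may hand back an array larger than requested, so only use [0, size).
        byte[] buffer = ArrayPool<byte>.Shared.Rent(size);
        try
        {
            for (int i = 0; i < size; i++) buffer[i] = (byte)(i % 256);
            int sum = 0;
            for (int i = 0; i < size; i++) sum += buffer[i];
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if processing throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        for (int i = 0; i < 1000; i++) ProcessChunk(4096);   // reuses pooled buffers
        Console.WriteLine(ProcessChunk(4));                  // 0 + 1 + 2 + 3 = 6
    }
}
```

A rented buffer must not be used after it has been returned, which is why the work is kept inside the try block.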

Update: .NET 4.5.1 and later support compaction of the large object heap, which is controlled through System.Runtime.GCSettings.LargeObjectHeapCompactionMode.
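A minimal sketch of requesting a one-time LOH compaction with that setting follows. LohCompactionSketch and CompactLohOnce are hypothetical names; whether compacting is worthwhile depends on the application, since it adds to pause time.

```csharp
using System;
using System.Runtime;

public static class LohCompactionSketch
{
    // Asks the GC to compact the LOH during the next blocking generation 2 collection.
    public static void CompactLohOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();   // a full blocking collection; the LOH is compacted as part of it
    }

    public static void Main()
    {
        CompactLohOnce();
        // The setting is documented to reset to Default once the compaction has happened.
        Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);
    }
}
```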

IX. GC thread and finalizer thread

The GC runs on a separate thread to reclaim memory that is no longer referenced.

The finalizer thread is another independent, high-priority CLR thread that runs the finalizers of objects that define one.

An object's finalizer is executed at an indeterminate time after the object is no longer referenced, not, as in C++, as soon as the object goes out of scope.

The GC places every object that needs finalization into a queue (moving it from the finalization list to the freachable queue) and then lets another thread, not the GC thread itself, execute all these finalizers, while the GC thread continues to reclaim other objects.

In the next GC cycle, the memory of the objects whose finalizers have run is reclaimed. This means that an object that implements the Finalize method needs two GCs before it is fully released. It also means that an object with a Finalize method (the default Object.Finalize does not count) automatically has its lifetime "extended" by the GC.

Special note: the thread responsible for calling Finalize does not guarantee the order in which the finalizers of individual objects are called, which can cause subtle dependency problems (see "CLR via C#" for an interesting dependency problem).
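That finalizers run on their own thread can be checked by recording the managed thread ID inside a finalizer. In the sketch below, Tracked and FinalizerThreadSketch are hypothetical names. Whether the finalizer has run by the time the ID is printed depends on the runtime, so -1 (not run yet) is a possible result.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class Tracked
{
    public static int FinalizerThreadId = -1;   // -1 means no finalizer has run yet

    ~Tracked()
    {
        FinalizerThreadId = Environment.CurrentManagedThreadId;
    }
}

public static class FinalizerThreadSketch
{
    // Allocates in a separate, non-inlined method so no local in Main keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void AllocateGarbage()
    {
        new Tracked();
    }

    public static void Main()
    {
        AllocateGarbage();
        GC.Collect();                    // the object is found unreachable and queued for finalization
        GC.WaitForPendingFinalizers();   // block until the finalizer thread has drained its queue

        Console.WriteLine($"main thread:      {Environment.CurrentManagedThreadId}");
        Console.WriteLine($"finalizer thread: {Tracked.FinalizerThreadId}");
    }
}
```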

X. Summary of GC essentials

1. .NET resources are either managed or unmanaged. The .NET GC reclaims unused managed objects well, whereas unmanaged resources (such as file access and network access) must be cleaned up manually (released explicitly).

2. .NET provides two ways to release unmanaged resources:

1) Finalizer: it looks like a C++ destructor but is in essence very different. A finalizer is called before the object is reclaimed by the GC and is intended for releasing unmanaged resources. However, because the timing of GC runs is nondeterministic, unmanaged resources are usually not released in a timely manner. In addition, a finalizer can have unexpected side effects: for example, an object being reclaimed is no longer referenced by any reachable object, but its finalizer makes it reachable again, which breaks the atomicity of the collection process and increases GC overhead.

2) Dispose pattern: C# provides the using keyword to support the Dispose pattern for releasing resources. This releases unmanaged resources deterministically, and the using construct is exception safe. It is therefore generally recommended to use the Dispose pattern, backed by a check in the finalizer: if the caller forgets to call Dispose explicitly, the finalizer releases the resource.

3. Reclaiming managed resources: to decide whether an object should be reclaimed, it is enough to determine whether any valid reference to the object or to the sub-objects it contains still exists.

4. The cost of GC: first, managed resources are no longer reclaimed in real time; second, the management of managed and unmanaged resources in C# is not unified, which leads to conceptual fragmentation.

5. .NET types fall into two categories, reference types and value types. Value types are typically allocated on the stack and need no GC reclamation; reference types are allocated on the heap, and releasing and reclaiming them is the GC's job.

6. The system has a separate thread for GC, and memory is reclaimed according to a certain priority algorithm.

7. Generations: introduced to improve performance, on the assumption that older objects live longer. .NET generally uses three generations, G0, G1, and G2; G0 is collected first.

8. Garbage collection steps: marking, compacting, and finishing.

9. GC.SuppressFinalize is generally called from Dispose(), so that the finalizer does not repeat cleanup for an object whose resources have already been released.
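The Dispose pattern from item 2, together with the GC.SuppressFinalize call from item 9, can be sketched as follows. NativeBuffer is a hypothetical class that owns a block of unmanaged memory obtained with Marshal.AllocHGlobal; the IsDisposed property exists only so the example can show the state change.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public class NativeBuffer : IDisposable
{
    private IntPtr _handle;
    private bool _disposed;

    public NativeBuffer(int bytes)
    {
        _handle = Marshal.AllocHGlobal(bytes);
    }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup is done, so the finalizer need not run
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;       // safe to call more than once

        // When disposing is true, managed IDisposable fields would be released here.
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);
            _handle = IntPtr.Zero;
        }
        _disposed = true;
    }

    // Safety net in case the caller forgets to call Dispose.
    ~NativeBuffer()
    {
        Dispose(false);
    }
}

public static class DisposeSketch
{
    public static void Main()
    {
        NativeBuffer captured;
        using (var buffer = new NativeBuffer(1024))
        {
            captured = buffer;
            Console.WriteLine(captured.IsDisposed);   // False
        }   // Dispose() runs here, even if the body had thrown
        Console.WriteLine(captured.IsDisposed);       // True
    }
}
```

In this sketch the finalizer only frees the unmanaged block; it does not touch other managed objects, because the order in which finalizers run is not guaranteed.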
