Garbage collection (GC): a summary of how it works and a discussion of how to use it (see also: out of memory)

Source: Internet
Author: User

I. Why GC is required

An application's use of a resource can usually be broken down into the following steps:

1. Allocate memory for the resource

2. Initialize the memory

3. Use the resource

4. Clean up the resource

5. Free the memory

Applications commonly manage resources (memory) in one of the following ways:

1. Manual management: C, C++

2. Reference counting: COM

3. Automatic management: .NET, Java, PHP, Go ...

However, the complexity of manual management and reference counting easily leads to the following typical problems:

1. The programmer forgets to release memory

2. The application accesses memory that has already been freed

The consequences are serious: memory leaks, corrupted data, access violations, and, most of the time, program behavior that becomes strange and unpredictable.

The solution offered by .NET, Java, and similar platforms is to manage memory through an automatic garbage collection (GC) mechanism. Problem 1 is then resolved naturally, and problem 2 cannot arise in the first place.

Summary: Memory management that is not automated is very prone to bugs that undermine system stability. This is especially painful in a production multi-server cluster, where a bug must first be traced to a specific server and that server's memory then dumped for analysis. It also seriously dampens developers' enthusiasm, and a steady stream of similar bugs wears people down.

II. How the GC works

The GC workflow is mainly divided into the following steps:

1. Mark

2. Plan

3. Sweep

4. Relocate (reference update)

5. Compact

(i) Mark

Goal: Find all instances that are still referenced (live)

Method: Find all GC root nodes (GC roots) and put them in a queue, then traverse from each root through every node it references, directly or indirectly, marking each node visited as live. Weak references are not followed during this traversal.
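The marking step described above can be sketched as a plain graph traversal. This is only an illustration under stated assumptions: Node and MarkPhase are hypothetical names, and the real CLR marks object headers in the managed heap rather than a Live field on a wrapper class.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a heap object; not a CLR type.
public sealed class Node
{
    public string Name;
    public bool Live;                                  // set by the mark phase
    public List<Node> References = new List<Node>();   // outgoing strong references
    public Node(string name) { Name = name; }
}

public static class MarkPhase
{
    // Marks every node reachable from the roots and returns how many were marked.
    public static int Mark(IEnumerable<Node> roots)
    {
        var queue = new Queue<Node>();
        foreach (var root in roots)
            if (root != null) queue.Enqueue(root);

        int marked = 0;
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (node.Live) continue;      // already visited; this also handles cycles
            node.Live = true;
            marked++;
            foreach (var child in node.References)
                if (child != null) queue.Enqueue(child);
        }
        return marked;
    }
}
```

Calling MarkPhase.Mark with a list of root nodes leaves Live set only on nodes reachable from those roots; everything else is garbage as far as the later phases are concerned.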

(ii) Plan and sweep

1. Plan

Goal: Decide whether compaction is required

Method: Walk all marked (live) instances in the generation being collected and make the decision based on a specific algorithm

2. Sweep

Goal: Reclaim all free space

Method: Walk all instances (live or dead) in the generation being collected and add the memory blocks that lie between live instances to the free memory list
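As a rough illustration of the sweep step, the sketch below walks a simulated heap in address order and collects the gaps between live instances into a free list. Block and SweepPhase are hypothetical names, and the sketch assumes the blocks are contiguous and already marked; the CLR's actual free-list bookkeeping is more involved.

```csharp
using System.Collections.Generic;

// Hypothetical description of one block in a simulated heap.
public readonly struct Block
{
    public readonly int Address;
    public readonly int Size;
    public readonly bool Live;
    public Block(int address, int size, bool live)
    {
        Address = address;
        Size = size;
        Live = live;
    }
}

public static class SweepPhase
{
    // Walks blocks in address order and returns (address, size) free ranges,
    // merging adjacent dead blocks into a single range.
    public static List<(int Address, int Size)> Sweep(IReadOnlyList<Block> blocks)
    {
        var free = new List<(int Address, int Size)>();
        int start = -1, length = 0;
        foreach (var b in blocks)
        {
            if (!b.Live)
            {
                if (start < 0) start = b.Address;   // a new gap begins here
                length += b.Size;
            }
            else if (start >= 0)
            {
                free.Add((start, length));          // a live block closes the gap
                start = -1;
                length = 0;
            }
        }
        if (start >= 0) free.Add((start, length));  // trailing gap at the end of the heap
        return free;
    }
}
```

Merging neighbouring dead blocks is a design choice of this sketch: it keeps the free list short and makes larger allocations easier to satisfy from it.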

(iii) Relocate and compact

1. Relocate (reference update)

Goal: Update all referenced addresses

Method: Compute the address each instance will have after compaction, find all GC root nodes (GC roots) and put them in a queue, then traverse from each root through every node it references, directly or indirectly, updating the addresses stored in every node visited. Weak references are included.

2. Compact

Goal: Reduce memory fragmentation

Method: Move each instance to its newly computed address.
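The relocate and compact steps can be illustrated on a simulated heap in which references are stored as integer addresses. This is a simplified sketch, not the CLR's implementation: HeapObject and CompactPhase are hypothetical names, the heap is assumed to start at address 0, marking is assumed to have run already, and the sketch walks the object list instead of re-traversing from the roots.

```csharp
using System.Collections.Generic;

// Hypothetical object in a simulated heap; references are stored as addresses.
public sealed class HeapObject
{
    public int Address;                              // current start address
    public int Size;
    public bool Live;                                // result of the mark phase
    public List<int> References = new List<int>();   // addresses of referenced objects
}

public static class CompactPhase
{
    // objects must be sorted by Address; roots holds root addresses and is updated in place.
    public static void RelocateAndCompact(List<HeapObject> objects, List<int> roots)
    {
        // Plan: slide every live object down to the next free address.
        var newAddress = new Dictionary<int, int>();
        int next = 0;
        foreach (var o in objects)
        {
            if (!o.Live) continue;
            newAddress[o.Address] = next;
            next += o.Size;
        }

        // Relocate: rewrite every root and every reference held by a live object.
        for (int i = 0; i < roots.Count; i++)
            roots[i] = newAddress[roots[i]];
        foreach (var o in objects)
        {
            if (!o.Live) continue;
            for (int i = 0; i < o.References.Count; i++)
                o.References[i] = newAddress[o.References[i]];
        }

        // Compact: drop dead objects and move the survivors to their new addresses.
        objects.RemoveAll(o => !o.Live);
        foreach (var o in objects)
            o.Address = newAddress[o.Address];
    }
}
```

The order matters: new addresses are computed first, references are rewritten second, and objects move last, so every lookup is done against the old addresses.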

III. GC roots

The GC root nodes that keep coming up in this article are what is usually called GC roots.

Each application has a set of roots. Each root is a storage location that contains a pointer to a reference-type object. The pointer either refers to an object in the managed heap or is null.

In an application, as soon as an object becomes unreachable, that is, no root refers to it directly or indirectly, it becomes a target for the garbage collector.

Put concisely: GC roots are not objects in themselves but are instead references to objects. Any object referenced by a GC root automatically survives the next garbage collection.

In .NET, the following can serve as GC roots:

1. Global variables

2. Static variables

3. All local variables on the stack (JIT)

4. Parameter variables passed on the stack

5. Variables in registers

Note that only variables of a reference type are considered roots; a variable of a value type is never a root. Understanding why requires a solid grasp of how memory allocation and management differ between reference types and value types.
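To make the effect of a root visible, the following sketch keeps one object reachable through a static field and leaves another without any root, then observes both through WeakReference. RootDemo and its members are hypothetical names chosen for illustration; the exact moment an unrooted object is reclaimed is up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RootDemo
{
    private static object s_root;   // a static field acts as a GC root

    // NoInlining keeps the temporary object out of the caller's stack frame,
    // so no local variable of the caller accidentally acts as a root.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference Allocate(bool keepInStaticField)
    {
        var obj = new object();
        if (keepInStaticField) s_root = obj;
        return new WeakReference(obj);   // a weak reference does not keep obj alive
    }

    // Forces a full blocking collection.
    public static void FullCollect()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}
```

After calling RootDemo.Allocate(true), RootDemo.Allocate(false), and then RootDemo.FullCollect(), the first weak reference should still report IsAlive as true because the static field roots its target, while the second typically reports false.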

Incidentally, in Java the following objects can serve as GC roots:

1. Objects referenced from the virtual machine (JVM) stack

2. Objects referenced by static class fields in the method area

3. Objects referenced by constants in the method area (mainly constant values declared final)

4. Objects referenced by JNI in the native method stack

IV. When a GC occurs

1. The application allocates a new object and a generation's budget has reached its threshold, for example when generation 0 is full

2. Code explicitly calls System.GC.Collect()

3. Other special cases, for example Windows reports low memory, the CLR unloads an AppDomain, or the CLR shuts down; in some extreme cases, system parameter settings may also trigger a GC
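The first two triggers can be observed with GC.CollectionCount. The sketch below is a hedged illustration: TriggerDemo is a hypothetical helper, and how many collections a given amount of allocation causes depends on the runtime's generation 0 budget, so only the explicit call has a predictable effect.

```csharp
using System;

public static class TriggerDemo
{
    // Returns how many generation 0 collections happened while allocating
    // `count` small arrays (trigger 1: the allocation budget is exceeded).
    public static int Gen0CollectionsDuring(int count)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < count; i++)
        {
            var tmp = new byte[1024];   // short-lived garbage
            GC.KeepAlive(tmp);          // keeps the allocation from being optimized away
        }
        return GC.CollectionCount(0) - before;
    }

    // Returns how many generation 0 collections one explicit GC.Collect() adds
    // (trigger 2: the code asks for a collection).
    public static int Gen0CollectionsFromExplicitCollect()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();
        return GC.CollectionCount(0) - before;
    }
}
```

A full collection also counts as a collection of the younger generations, which is why an explicit GC.Collect() raises the generation 0 counter as well.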

V. Generations in the GC

Generations were introduced primarily to improve performance by avoiding collection of the entire heap. A generational garbage collector makes the following assumptions:

1. The newer an object is, the shorter its lifetime

2. The older an object is, the longer its lifetime

3. Collecting part of the heap is faster than collecting the entire heap

.NET's garbage collector divides objects into three generations (generation 0, generation 1, generation 2). Their contents are as follows:

1. G0: Newly allocated small objects (size < 85,000 bytes)

2. G1: G0 objects that survived a GC

3. G2: Large objects (size >= 85,000 bytes); G1 objects that survived a GC

    object o = new byte[85000]; // large object
    Console.WriteLine(GC.GetGeneration(o)); // output is 2, not 0

PS: It is important to know that the CLR requires all resources to be allocated from the managed heap. The CLR manages two kinds of heap, the small object heap (SOH) and the large object heap (LOH); every allocation of 85,000 bytes or more is made on the LOH. An interesting question is: why 85,000 bytes?

Generational collection rule: When generation n is collected, the objects in it that survive are promoted to generation n+1. The GC applies different checking policies to different generations to optimize performance. Generation 0 objects are checked in every GC cycle. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects.
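The promotion rule can be observed with GC.GetGeneration. The following is a small sketch (GenerationDemo is a hypothetical name) that records the generation of one object before and after two forced collections. On common configurations the sequence is usually 0, 1, 2, but the runtime makes no guarantee that every collection promotes every survivor.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one small object at allocation time,
    // after a first forced collection, and after a second one.
    public static int[] TrackPromotion()
    {
        var obj = new object();
        var generations = new int[3];
        generations[0] = GC.GetGeneration(obj);
        GC.Collect();
        generations[1] = GC.GetGeneration(obj);
        GC.Collect();
        generations[2] = GC.GetGeneration(obj);
        GC.KeepAlive(obj);   // keeps obj rooted until this point
        return generations;
    }
}
```

The forced collections are used here only to make promotion visible; they are not something production code needs in order for objects to age.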

VI. Explicitly invoking the GC

A GC is usually expensive and its timing is nondeterministic, so Microsoft's programming guidelines strongly recommend against invoking it explicitly. You can still use some of the GC's methods in your code to collect manually, provided you understand the GC's collection principles in depth; otherwise, invoking the GC by hand in the wrong scenario can easily interfere with its normal collection or even introduce unpredictable errors.

For example, the following code:

    void SomeMethod()
    {
        object o1 = new object();
        object o2 = new object();
        o1.ToString();
        GC.Collect(); // This forces o2 into Gen1, because it's still referenced
        o2.ToString();
    }

Without GC.Collect(), both o1 and o2 would be reclaimed by the next generation 0 collection. With GC.Collect(), o2 is promoted to Gen1, so a generation 0 collection no longer frees the memory o2 occupies.

In other cases, careless code can even lead to a deadlock, as in this widely circulated example:

    public class MyClass
    {
        private bool isDisposed = false;

        ~MyClass()
        {
            Console.WriteLine("Enter destructor...");
            lock (this) // Some situation leads to deadlock
            {
                if (!isDisposed)
                {
                    Console.WriteLine("Do Stuff...");
                }
            }
        }
    }

Called by the following code:

    var instance = new MyClass();
    Monitor.Enter(instance);
    instance = null;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("instance is garbage collected");

The code above will cause a deadlock. The causes are analyzed as follows:

1. The main thread calls Monitor.Enter(instance) and locks the instance

2. GC is then triggered manually, and the CLR's dedicated finalizer thread runs the MyClass destructor

3. Inside the MyClass destructor, lock (this) tries to take the same lock, but the main thread has not released instance (which is this here), so the finalizer thread can only wait; the main thread, in turn, is blocked in GC.WaitForPendingFinalizers() waiting for the finalizer thread

Strictly speaking, the problem above is not the GC's fault, and at first glance it does not look like a multithreading issue either; it is caused by incorrect use of lock.

Also note that some GC behavior differs completely between debug and release mode (Jeffrey Richter illustrates this with a timer example in <<CLR via C#>>). With the code above, for example, you may find that it runs normally in debug mode but deadlocks in release mode.

VII. When the GC meets multiple threads

This section mainly draws on the thread hijacking discussion in <<CLR via C#>>.

One major premise of the garbage collection algorithm discussed so far is that only one thread is running. In real-world development, multiple threads often access the managed heap concurrently, or at least manipulate objects in the heap at the same time. When one thread triggers a garbage collection, the other threads absolutely must not access any objects, because the garbage collector may move those objects and change their memory locations. When the CLR wants to perform a garbage collection, it immediately suspends all threads that are executing managed code; threads executing unmanaged code are not suspended. The CLR then examines each thread's instruction pointer to determine where the thread is executing, and compares the instruction pointer against JIT-generated tables to determine what code the thread is running.

If a thread's instruction pointer is exactly at an offset marked in the table, the thread has reached a safe point. A thread can be safely suspended at a safe point until garbage collection ends. If the instruction pointer is not at an offset marked in the table, the thread is not at a safe point and the CLR cannot start the garbage collection. In that case the CLR hijacks the thread: it modifies the thread's stack so that the return address points to a special function inside the CLR, and then resumes the thread. When the current method returns, this special function runs and safely suspends the thread. However, a thread may sometimes spend a long time in the current method. So after resuming the thread, the CLR waits about 250 milliseconds for the hijack to take effect. After that time, the CLR suspends the thread again and checks its instruction pointer. If the thread has reached a safe point, garbage collection can begin. If it has not, the CLR checks whether the thread has called another method. If so, the CLR modifies the thread's stack again so that the thread is hijacked when the most recently called method returns, and then resumes the thread to wait for the next hijacking attempt. Garbage collection can start only after all threads have reached a safe point or been hijacked. When the collection completes, all threads are resumed, the application continues running, and the hijacked threads return to the methods that originally called them.

In practice, the CLR suspends threads most of the time by hijacking them rather than by using the JIT-generated tables to determine whether a thread has reached a safe point. The reason is that the JIT-generated tables require a lot of memory, which enlarges the working set and can seriously hurt performance.

That concludes the conceptual part; my hands are sore from all the copying ^_^. The book is expensive, but its theoretical depth justifies the price.

Here is a real case. A web application that ran a large number of tasks began misbehaving inexplicably after it went into production. Based on the database logs (and, in fact, on Event Tracing for Windows (ETW), IIS logs, and dump files), we found irregular unhandled exceptions during task execution. Analysis pointed to CLR garbage collection as the suspected cause, and the problem surfaced only under high concurrency.

VIII. Some suggestions and comments for development

Because a GC is expensive, good everyday programming habits can have a positive effect on the GC, while bad ones can have undesirable effects.

1. Try not to allocate very large objects. Large objects (>= 85,000 bytes) go directly into G2, and the GC's collection algorithm does not compact the large object heap (LOH), because moving memory blocks of 85,000 bytes or more down the heap would waste too much CPU time

2. Do not frequently allocate objects with very short lifetimes. Frequent garbage collection and frequent compaction may follow, and a lot of memory fragmentation can result. A well-designed, stable object pool (ObjectPool) can be used to work around this problem

3. Use better programming techniques, such as better algorithms, better data structures, and better overall solutions
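As an illustration of the object pool idea in suggestion 2, here is a minimal sketch. SimplePool is a hypothetical class written for this article, not a framework type, and a production pool would at least cap its size and define how returned objects are reset.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical minimal pool; it grows without bound, which real code should avoid.
public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public SimplePool(Func<T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Reuses a pooled instance when one is available, otherwise creates a new one.
    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    // Hands an instance back; the caller must reset its state and stop using it.
    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        _items.Add(item);
    }
}
```

A caller would create the pool once, for example with a factory that builds a StringBuilder, call Rent() where it previously used new, and call Return() when done. Reusing instances this way reduces the number of short-lived allocations that generation 0 has to absorb.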

Update: .NET 4.5.1 and later support compacting the large object heap; this is controlled through System.Runtime.GCSettings.LargeObjectHeapCompactionMode.
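A minimal sketch of how this setting is typically used follows. LohCompaction is a hypothetical wrapper; GCSettings.LargeObjectHeapCompactionMode and GCLargeObjectHeapCompactionMode.CompactOnce are the framework members. According to the documentation, the setting applies to the next blocking full collection and then reverts to Default.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Requests a one-time LOH compaction, runs a blocking full collection,
    // and returns the mode observed afterwards.
    public static GCLargeObjectHeapCompactionMode CompactOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();   // the LOH is compacted as part of this full collection
        return GCSettings.LargeObjectHeapCompactionMode;
    }
}
```

Because compacting large blocks is expensive, this is best treated as an occasional, deliberate operation, for example after releasing many large buffers, rather than something to call routinely.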

From experience, the idea of trading space for time sometimes must not be applied carelessly. Used badly, it not only fails to guarantee system performance but may also lead to out-of-memory (OOM) errors. For more on OOM, see an article I wrote earlier, a memo on effectively preventing OOM in .NET applications.

I once maintained a system that contained a lot of large-data processing logic but no batching or paging. As the data volume kept growing, hidden problems kept surfacing. I then rewrote it, designing everything around batching and combining multithreading, multiple processes, and distributed clustering. Large data volumes are now handled well, performance does not degrade, and the system has become more stable and reliable.

IX. GC thread and finalizer thread

The GC runs on a separate thread to reclaim memory that is no longer referenced.

The finalizer thread is another independent (high-priority CLR) thread; it runs the finalizers of objects that have one so that their memory can be reclaimed.

An object's finalizer runs at some indeterminate time after the object is no longer referenced, unlike a C++ destructor, which runs as soon as the object's lifetime ends.

The GC places every object whose finalizer needs to run into a queue (moving it from the finalization list to the freachable queue) and then has another thread, not the GC thread itself, run all these finalizers, while the GC thread goes on to reclaim the other objects.

In the next GC cycle, the memory of the objects whose finalizers have completed is reclaimed. In other words, an object that implements a Finalize method needs two GCs before it is fully released. It also means that an object with a Finalize method (the default Object.Finalize does not count) automatically has its lifetime "extended" by the GC.
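The two-collection lifetime of a finalizable object can be sketched as follows. Tracked and FinalizerDemo are hypothetical names used only for illustration, and the comments describe the sequence the text above lays out; exact timing remains up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Tracked
{
    public static int FinalizedCount;   // incremented on the finalizer thread

    ~Tracked()
    {
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizerDemo
{
    // Allocates in a separate, non-inlined method so that no local variable
    // of the caller keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateTracked()
    {
        // trackResurrection: true keeps the weak reference usable
        // until the object's memory is actually reclaimed.
        return new WeakReference(new Tracked(), trackResurrection: true);
    }

    // Returns whether the object is still alive after the full sequence.
    public static bool CollectTwice(WeakReference weak)
    {
        GC.Collect();                    // first GC: object goes to the freachable queue
        GC.WaitForPendingFinalizers();   // finalizer thread runs ~Tracked()
        GC.Collect();                    // second GC: the memory can now be reclaimed
        return weak.IsAlive;
    }
}
```

After FinalizerDemo.CollectTwice(FinalizerDemo.AllocateTracked()), Tracked.FinalizedCount is expected to have increased, which shows that the finalizer ran between the two collections rather than as part of the first one.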

Special note: The thread responsible for calling Finalize does not guarantee the order in which the finalizers of individual objects are called, which can cause subtle dependency problems (see <<CLR via C#>> for an interesting dependency problem).

Finally, reading one good book is far more rewarding than reading 10 or 20 unreliable ones.

References:

<<CLR via C#>>

<<In-Depth Understanding of the Java Virtual Machine>>

<<C# in Depth>>

<<Thinking in Java>>

https://msdn.microsoft.com/en-us/library/ms979205.aspx

http://msdn.microsoft.com/zh-cn/magazine/cc188793%28en-us%29.aspx
