Garbage Collection (GC): A Summary of the Mechanism and a Discussion of How to Make Good Use of It

Source: Internet
Author: User

I. Why GC?

An application generally goes through the following steps when it uses a resource:

1. Allocate memory for the resource

2. Initialize the memory

3. Use the resource

4. Clean up the resource

5. Release the memory

Applications manage resources (memory usage) in one of the following ways:

1. Manual management: C, C++

2. Reference counting: COM

3. Automatic management: .NET, Java, PHP, Go...

However, the complexity of manual management and reference counting easily leads to the following typical problems:

1. The programmer forgets to release memory.

2. The application accesses memory that has already been released.

The consequences are serious. Common ones include memory leaks, corrupted data, and access violations, and much of the time the program's behavior simply becomes strange and unpredictable.

.NET, Java, and similar platforms solve this by managing memory with an automatic garbage collection (GC) mechanism. Problem 1 is thereby solved automatically, and problem 2 can no longer arise.

Conclusion: memory management that is not automated is prone to bugs that affect system stability. This is especially painful in production environments with multi-server clusters: when such a bug appears at runtime, you first have to find the server it occurred on and then dump its memory to analyze it. This badly dampens developers' enthusiasm for programming, and a steady stream of similar bugs is wearying.


II. How does GC work?

The GC workflow consists of the following steps:

1. Mark

2. Plan

3. Sweep

4. Relocate

5. Compact


(1) Mark

Objective: find all instances that still have references to them (live instances).

Method: locate all GC root nodes and put them in a queue, then traverse from each root through every node it references, directly or indirectly, marking every node visited as live. Weak references are not taken into account.
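The marking method can be sketched in a few lines of C#. This is only a toy model, not the CLR's implementation: Node, References, and IsLive are hypothetical names invented for the illustration, and the roots are passed in as a plain collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object; not a CLR type.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public bool IsLive { get; set; }

    public Node(string name) { Name = name; }
}

public static class MarkPhase
{
    // Marks every node reachable from the given roots as live (breadth-first).
    public static void Mark(IEnumerable<Node> roots)
    {
        var queue = new Queue<Node>();
        foreach (var root in roots)
        {
            if (root != null && !root.IsLive)
            {
                root.IsLive = true;
                queue.Enqueue(root);
            }
        }

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            foreach (var child in current.References)
            {
                // Skipping already-marked nodes also keeps cycles from looping forever.
                if (!child.IsLive)
                {
                    child.IsLive = true;
                    queue.Enqueue(child);
                }
            }
        }
    }

    public static void Main()
    {
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");   // never reachable from a root
        a.References.Add(b);
        b.References.Add(a);     // cycle: a <-> b

        Mark(new[] { a });
        Console.WriteLine($"{a.IsLive} {b.IsLive} {c.IsLive}"); // True True False
    }
}
```

Anything left unmarked after the traversal (c in this example) is what the later steps treat as garbage.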

(2) Plan and sweep

1. Plan

Objective: determine whether compaction is required.

Method: traverse the live marks in every generation being collected and decide according to a specific algorithm.

2. Sweep

Objective: reclaim all free space.

Method: traverse all marks (live or dead) in every generation being collected and add the memory blocks lying between live instances to the free-memory linked list.
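A minimal sketch of the sweep step, assuming the heap can be modeled as a sequence of blocks in address order. Block and SweepPhase are hypothetical names, and a List stands in for the free-memory linked list to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one heap block after marking.
public readonly struct Block
{
    public int Address { get; }
    public int Size { get; }
    public bool IsLive { get; }

    public Block(int address, int size, bool isLive)
    {
        Address = address;
        Size = size;
        IsLive = isLive;
    }
}

public static class SweepPhase
{
    // Walks the blocks in address order and returns the free ranges,
    // merging dead blocks that sit next to each other.
    public static List<(int Address, int Size)> Sweep(IReadOnlyList<Block> blocksInAddressOrder)
    {
        var free = new List<(int Address, int Size)>();
        foreach (var block in blocksInAddressOrder)
        {
            if (block.IsLive) continue;

            int last = free.Count - 1;
            if (last >= 0 && free[last].Address + free[last].Size == block.Address)
            {
                // Extend the previous free range instead of adding a new one.
                free[last] = (free[last].Address, free[last].Size + block.Size);
            }
            else
            {
                free.Add((block.Address, block.Size));
            }
        }
        return free;
    }

    public static void Main()
    {
        var blocks = new[]
        {
            new Block(0, 16, true),
            new Block(16, 8, false),
            new Block(24, 8, false),
            new Block(32, 16, true),
            new Block(48, 4, false),
        };

        foreach (var range in Sweep(blocks))
        {
            Console.WriteLine($"free: address={range.Address}, size={range.Size}");
        }
        // free: address=16, size=16
        // free: address=48, size=4
    }
}
```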

(3) Relocate references and compact

1. Relocate (update references)

Objective: update all referenced addresses.

Method: compute the new address each instance will have after compaction. Then locate all GC root nodes, put them in a queue, traverse from each root through every node it references, directly or indirectly, and update the addresses stored in every node visited, including weak references.

2. Compact

Objective: reduce memory fragmentation.

Method: move each instance to its computed new address.
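The relocate and compact steps can be illustrated with a simple sliding compaction. This is a sketch under simplifying assumptions, not the CLR's algorithm: HeapObject and CompactPhase are hypothetical names, references are stored as plain integer addresses, and every reference held by a live object is assumed to point at another live object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a heap object that refers to others by address.
public sealed class HeapObject
{
    public int Address;
    public int Size;
    public bool IsLive;
    public List<int> References = new List<int>();
}

public static class CompactPhase
{
    // Slides live objects toward heapStart and rewrites all addresses.
    // Returns the surviving objects in their new address order.
    public static List<HeapObject> Compact(
        List<HeapObject> objectsInAddressOrder, List<int> roots, int heapStart)
    {
        // Step 1: compute the new address of every live object.
        var newAddress = new Dictionary<int, int>();
        int next = heapStart;
        foreach (var obj in objectsInAddressOrder)
        {
            if (!obj.IsLive) continue;
            newAddress[obj.Address] = next;
            next += obj.Size;
        }

        // Step 2: update the addresses held by roots and by live objects.
        for (int i = 0; i < roots.Count; i++)
        {
            roots[i] = newAddress[roots[i]];
        }
        var survivors = new List<HeapObject>();
        foreach (var obj in objectsInAddressOrder)
        {
            if (!obj.IsLive) continue;
            for (int i = 0; i < obj.References.Count; i++)
            {
                obj.References[i] = newAddress[obj.References[i]];
            }
            survivors.Add(obj);
        }

        // Step 3: "move" each survivor to its computed address.
        foreach (var obj in survivors)
        {
            obj.Address = newAddress[obj.Address];
        }
        return survivors;
    }

    public static void Main()
    {
        var a = new HeapObject { Address = 0, Size = 16, IsLive = true };
        var dead = new HeapObject { Address = 16, Size = 24, IsLive = false };
        var c = new HeapObject { Address = 40, Size = 8, IsLive = true };
        a.References.Add(40); // a -> c
        c.References.Add(0);  // c -> a
        var roots = new List<int> { 0 };

        Compact(new List<HeapObject> { a, dead, c }, roots, heapStart: 0);
        Console.WriteLine($"a at {a.Address}, c at {c.Address}, a -> {a.References[0]}");
        // a at 0, c at 16, a -> 16
    }
}
```

All new addresses are computed before anything moves, which is why the reference update can run before the move, in the same order the text describes.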


III. GC Root Node

What is the GC Root node that appears repeatedly in this article?

Each application has a set of roots. Each root is a storage location that contains a pointer to a reference-type object. This pointer either refers to an object in the managed heap or is null.

In an application, as soon as an object becomes unreachable, that is, no root references it directly or indirectly, the object becomes a target for the garbage collector.

GC roots are not objects themselves; they are references to objects. In addition, any object referenced by a GC root automatically survives the next garbage collection.

In .NET, the following can be treated as GC roots:

1. Global variables

2. Static variables

3. All local variables on the stack (JIT)

4. Parameters passed on the stack

5. Variables in registers

Note that only variables of reference types are considered roots; variables of value types never are. Only with a solid understanding of how memory allocation and management differ between reference types and value types does it become clear why a root can only be of a reference type.
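A small experiment can make the role of a root visible. The sketch below is an illustration only: RootDemo and its members are hypothetical names, and WeakReference is used purely to observe an object without keeping it alive. The object held by the static field is expected to survive a collection. The unrooted object is usually collected, but that depends on the runtime and build configuration, so no particular output is claimed for it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RootDemo
{
    // A static field is a GC root: whatever it references stays alive.
    private static readonly object s_rooted = new object();

    // Returns a weak reference to the object held by the static root.
    public static WeakReference TrackRooted()
    {
        return new WeakReference(s_rooted);
    }

    // Creates an object that no root references once this method returns.
    // NoInlining keeps the temporary out of the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference TrackUnrooted()
    {
        return new WeakReference(new object());
    }

    public static void Main()
    {
        WeakReference rooted = TrackRooted();
        WeakReference unrooted = TrackUnrooted();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("rooted object alive:   " + rooted.IsAlive);
        Console.WriteLine("unrooted object alive: " + unrooted.IsAlive);
    }
}
```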

In Java, the following can serve as GC roots:

1. Objects referenced from the Java Virtual Machine (JVM) stack

2. Objects referenced by static class fields in the method area

3. Objects referenced by constants in the method area (mainly values declared as final)

4. Objects referenced by JNI in the native method stack


IV. When does GC occur?

1. The application allocates a new object and the budget of a GC generation has reached its threshold, for example when generation 0 is full.

2. The code explicitly calls System.GC.Collect().

3. Other special cases, for example: Windows reports low memory, the CLR unloads an AppDomain, or the CLR shuts down. In some extreme cases a change to system parameter settings can also trigger a collection.
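The first two triggers can be observed with GC.CollectionCount. The sketch below is only an illustration; GcTriggerDemo and its method names are hypothetical. How many generation 0 collections the allocation loop causes depends on the runtime and the machine, so only the explicit call is relied on to produce at least one collection.

```csharp
using System;

public static class GcTriggerDemo
{
    // Storing into a static field keeps the allocations on the managed heap.
    private static byte[] s_sink = new byte[0];

    // Trigger 1: allocate until the generation 0 budget is exceeded.
    // Returns how many generation 0 collections happened meanwhile.
    public static int AllocateAndCountGen0(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            s_sink = new byte[1024];
        }
        return GC.CollectionCount(0) - before;
    }

    // Trigger 2: an explicit call. Returns the generation 0 collections it caused.
    public static int ForceAndCountGen0()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("gen 0 collections caused by allocation: " + AllocateAndCountGen0(200000));
        Console.WriteLine("gen 0 collections caused by GC.Collect(): " + ForceAndCountGen0());
    }
}
```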


V. Generations in GC

Generations are introduced mainly to improve performance by avoiding collection of the entire heap. A generational garbage collector makes the following assumptions:

1. The newer an object is, the shorter its lifetime

2. The older an object is, the longer its lifetime

3. Collecting part of the heap is faster than collecting the whole heap

The .NET garbage collector divides objects into three generations (Generation 0, Generation 1, Generation 2). Their contents are as follows:

1. G0: small objects (size < 85000 bytes)

2. G1: G0 objects that survived a GC

3. G2: large objects (size >= 85000 bytes); G1 objects that survived a GC

    object o = new Byte[85000];              // large object
    Console.WriteLine(GC.GetGeneration(o));  // output is 2, not 0

P.S. The CLR requires all resources to be allocated from the managed heap. It manages two kinds of heap, the small object heap (SOH) and the large object heap (LOH); every allocation of 85000 bytes or more is made on the LOH. An interesting question: why 85000 bytes?

Generation collection rules: when generation N is collected, the objects in it that survive are promoted to generation N + 1. The GC applies different checking policies to different generations to optimize performance. Every GC cycle checks generation 0 objects. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects.
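Promotion can be watched directly with GC.GetGeneration. The sketch below is illustrative, and PromotionDemo is a hypothetical name. On the desktop CLR a small object that stays referenced typically reports generations 0, 1, and 2 across two forced collections, but the exact numbers are a runtime detail rather than a guarantee.

```csharp
using System;

public static class PromotionDemo
{
    // Records the generation of one small object before and after two forced collections.
    public static int[] ObserveGenerations()
    {
        var o = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(o); // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(o); // survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(o); // survived two collections

        GC.KeepAlive(o); // make sure o counts as referenced up to this point
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveGenerations()));
    }
}
```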

VI. Call GC explicitly only with care

A GC is usually expensive, and when it runs is nondeterministic. Microsoft's programming guidelines strongly recommend that you do not call the GC explicitly. You can still use the GC methods the framework provides to trigger a collection manually in your code, provided you understand the collection principles thoroughly. Otherwise, manual GC calls can easily interfere with normal collection and, in specific scenarios, even introduce unpredictable errors.

For example, the following code:

    void SomeMethod()
    {
        object o1 = new Object();
        object o2 = new Object();
        o1.ToString();
        GC.Collect(); // this forces o2 into Gen1, because it's still referenced
        o2.ToString();
    }

Without GC.Collect(), both o1 and o2 would be reclaimed by the next automatic generation 0 collection. With GC.Collect() added, o2 is promoted to Gen1, which means a generation 0 collection can no longer reclaim the memory o2 occupies.

Careless coding can also lead to deadlocks. For example, here is a piece of code that circulates widely:

    public class MyClass
    {
        private bool isDisposed = false;

        ~MyClass()
        {
            Console.WriteLine("Enter destructor...");
            lock (this) // some situations lead to deadlock
            {
                if (!isDisposed)
                {
                    Console.WriteLine("Do Stuff...");
                }
            }
        }
    }

Run the following code:

    var instance = new MyClass();
    Monitor.Enter(instance);   // Monitor lives in System.Threading
    instance = null;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("instance is garbage collected");

The code above deadlocks. Cause analysis:

1. The client's main thread calls Monitor.Enter(instance) and takes the lock on instance.

2. A GC is then triggered manually, and the Finalizer thread runs the MyClass destructor.

3. Inside the destructor, lock (this) tries to take the same lock, but the main thread has never released instance (which is this here). The Finalizer thread can only wait, while the main thread in turn waits inside GC.WaitForPendingFinalizers() for the Finalizer thread to finish.

Strictly speaking, this is not a GC bug, and on the surface it does not even look like a multithreading problem; it is caused by incorrect use of lock.
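One way to avoid this class of deadlock, sketched here as a suggestion rather than as the original author's fix, is to lock on a private object instead of this and to keep the finalizer free of locks. SafeResource and its members are hypothetical names.

```csharp
using System;
using System.Threading;

public sealed class SafeResource : IDisposable
{
    // Private lock object: code outside this class cannot take this lock,
    // unlike lock (this), which any holder of the instance can take.
    private readonly object _gate = new object();
    private bool _isDisposed;

    public bool IsDisposed
    {
        get { lock (_gate) { return _isDisposed; } }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            if (_isDisposed) return;
            _isDisposed = true;
            Console.WriteLine("Do Stuff...");
        }
        GC.SuppressFinalize(this); // cleanup is done, the finalizer is not needed
    }

    ~SafeResource()
    {
        // No locks here, so the Finalizer thread can never block on a caller's lock.
        Console.WriteLine("Enter destructor...");
    }
}

public static class SafeResourceDemo
{
    public static void Main()
    {
        // Same sequence as the deadlocking example, but it runs to completion.
        var instance = new SafeResource();
        Monitor.Enter(instance);
        instance = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("no deadlock");
    }
}
```

Calling Dispose explicitly, for example with a using statement, also removes the dependency on when the finalizer happens to run.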

Also note that some GC behavior differs completely between Debug and Release builds (Jeffrey Richter illustrates this with a Timer example in CLR via C#). For example, code that runs normally in Debug mode may deadlock in Release mode.


VII. When GC meets multithreading

This section draws on the discussion of thread hijacking in CLR via C#.

The garbage collection algorithm discussed so far rests on one major premise: it runs on a single thread. In real applications, multiple threads often access the managed heap at the same time, or at least operate on objects in the heap at the same time. When one thread triggers a garbage collection, the other threads must not access any objects, because the garbage collector may move those objects and change their memory locations. So when the CLR wants to perform a garbage collection, it immediately suspends all threads that are executing managed code; threads that are executing unmanaged code are not suspended. The CLR then examines each thread's instruction pointer to determine where the thread is executing, and compares the instruction pointer with the tables generated by the JIT compiler to determine what code the thread is running.

If a thread's instruction pointer is at an offset marked in a table, the thread is said to have reached a safe point. A thread at a safe point can be safely suspended until the garbage collection ends. If the instruction pointer is not at a marked offset, the thread is not at a safe point and the CLR cannot start the garbage collection. In that case the CLR hijacks the thread: it modifies the thread's stack so that the return address points to a special function inside the CLR, and then resumes the thread. When the current method returns, this special function runs and suspends the thread safely.

However, a thread can sometimes stay in its current method for a long time. So after resuming the thread, the CLR waits about 250 milliseconds for the hijack to take effect. After that it suspends the thread again and checks its instruction pointer. If the thread has reached a safe point, garbage collection can begin. If not, the CLR checks whether the thread has called another method; if so, it modifies the thread's stack again so that the thread is hijacked when the most recently called method returns. The CLR then resumes the thread and waits for the next hijacking attempt. Garbage collection can start only when every thread has reached a safe point or has been hijacked. When the collection ends, all threads are resumed, the application continues running, and the hijacked threads return to the methods that originally called them.

In practice, the CLR usually suspends threads by hijacking them rather than by consulting the JIT-generated tables to decide whether a thread has reached a safe point. The reason is that the JIT-generated tables require a lot of memory, which enlarges the working set and seriously hurts performance.

That covers the concepts, which I copied out of the book by hand ^_^. The book sells well, and its theory is just as good.

Here is a real case. A web application made heavy use of Task and then hit a puzzling problem in production: from time to time the program did not work properly. The database log (you can also trace Windows events (ETW), IIS logs, and dump files) showed unhandled exceptions occurring at irregular intervals during Task execution. After analysis, CLR garbage collection was suspected as the cause. Of course, this kind of problem only surfaces under high concurrency.


VIII. Suggestions and opinions for development

Because GC is expensive, good programming habits during development can have a positive effect on it, and bad ones a negative effect.

1. Avoid allocating large objects where you can. Large objects (>= 85000 bytes) go straight into generation 2, and by default the GC never compacts the large object heap (LOH), because moving memory blocks of 85000 bytes or more within the heap would waste too much CPU time.

2. Do not create short-lived objects too frequently. Doing so causes frequent garbage collection and frequent compaction, and may produce a lot of memory fragmentation. A well-designed, stable object pool (ObjectPool) can avoid this kind of problem.

3. Use better programming techniques: better algorithms, better data structures, better solutions, and so on.
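As an illustration of suggestion 2, here is a deliberately small object pool. It is a sketch, not a production design: SimplePool, Rent, and Return are hypothetical names, the retention cap is only approximate under concurrency, and callers are assumed to reset an object before returning it.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;
    private readonly int _maxRetained;

    public SimplePool(Func<T> factory, int maxRetained = 16)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        _factory = factory;
        _maxRetained = maxRetained;
    }

    // Hands out a pooled instance if one is available, otherwise creates one.
    public T Rent()
    {
        T item;
        return _items.TryTake(out item) ? item : _factory();
    }

    // Puts an instance back; extra instances are dropped and left to the GC.
    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        if (_items.Count < _maxRetained)
        {
            _items.Add(item);
        }
    }
}

public static class SimplePoolDemo
{
    public static void Main()
    {
        var pool = new SimplePool<byte[]>(() => new byte[4096]);

        byte[] first = pool.Rent();
        pool.Return(first);
        byte[] second = pool.Rent();

        Console.WriteLine(ReferenceEquals(first, second)); // True: the buffer was reused
    }
}
```

For plain arrays, the framework's System.Buffers.ArrayPool<T> covers similar ground and is usually a better starting point than a hand-written pool.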

Update: .NET 4.5.1 and later support compacting the large object heap. You can control LOH compaction through System.Runtime.GCSettings.LargeObjectHeapCompactionMode.
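A minimal sketch of requesting a one-time LOH compaction with that setting. LohCompactionDemo is a hypothetical name. The setting asks for compaction during the next blocking generation 2 collection, and the explicit GC.Collect() is there only to make that collection happen right away.

```csharp
using System;
using System.Runtime;

public static class LohCompactionDemo
{
    // Requests a one-time LOH compaction and forces a full collection.
    // Returns the mode that was in effect just before the collection ran.
    public static GCLargeObjectHeapCompactionMode RequestLohCompaction()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GCLargeObjectHeapCompactionMode requested = GCSettings.LargeObjectHeapCompactionMode;

        GC.Collect(); // blocking full collection
        return requested;
    }

    public static void Main()
    {
        Console.WriteLine("requested: " + RequestLohCompaction());
        Console.WriteLine("after the collection: " + GCSettings.LargeObjectHeapCompactionMode);
    }
}
```

Compacting the LOH is exactly the expensive operation suggestion 1 warns about, so this is something to do occasionally and deliberately, not on every request.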

From experience, the classic programming idea of trading space for time should not be applied indiscriminately. Used badly, it not only fails to guarantee system performance but can also lead to out-of-memory (OOM) errors. On how to guard against this, see an article I wrote earlier on experience in effectively preventing OOM in .NET applications.

While maintaining a system some time ago, I found a lot of logic that processed large volumes of data with no batching or paging at all. As the data volume kept growing, the hidden problems kept surfacing. During the rewrite I designed the processing to run in multiple batches. Combined with multithreading, multiple processes, and distributed clustering, large volumes of data can then be processed well without hurting performance, and the system becomes more stable and reliable.
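The batching idea can be sketched as a small helper that yields fixed-size chunks, so that only one batch needs to be held in memory at a time. Batching and InBatches are hypothetical names, and this is not the code from the system described above.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into lists of at most batchSize items, lazily.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh batch
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final, possibly smaller batch
        }
    }

    public static void Main()
    {
        var ids = new List<int>();
        for (int i = 1; i <= 10; i++) ids.Add(i);

        foreach (var batch in InBatches(ids, 4))
        {
            Console.WriteLine(string.Join(",", batch));
        }
        // 1,2,3,4
        // 5,6,7,8
        // 9,10
    }
}
```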


IX. GC thread and Finalizer thread

The GC runs on a separate thread to reclaim memory that is no longer referenced.

Finalizers are executed by another separate (high-priority CLR) thread, so that the memory of objects with finalizers can be reclaimed.

An object's finalizer runs at some indeterminate time after the object is no longer referenced. This differs from C++, where the destructor runs as soon as the object's lifetime ends.

The GC places each object that has a finalizer into a queue (moving it from the finalization list to the freachable queue) and then has a separate thread, not the thread the GC itself runs on, execute all of these finalizers while the GC thread goes on reclaiming other objects.

These objects' memory is reclaimed in the next GC cycle. In other words, an object that implements a Finalize method needs two GCs before it is completely released. It also means that objects with a Finalize method (the default one on Object does not count) automatically have their lifetime "extended" in the GC.
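The two-collection lifetime can be observed with a finalizer and a weak reference that tracks resurrection. The sketch below is an experiment, not a specification: Tracked and FinalizerDemo are hypothetical names, and the values printed depend on the runtime, so none are claimed here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Tracked
{
    public static int FinalizedCount;

    ~Tracked()
    {
        // Runs on the Finalizer thread, not on the thread that called GC.Collect().
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizerDemo
{
    // Creates an object that is unreferenced once this method returns.
    // trackResurrection: true keeps the weak reference usable while the
    // object waits in the freachable queue for its finalizer to run.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateTracked()
    {
        return new WeakReference(new Tracked(), trackResurrection: true);
    }

    public static void Main()
    {
        WeakReference weak = CreateTracked();

        GC.Collect();                  // 1st GC: object is queued for finalization
        GC.WaitForPendingFinalizers(); // the Finalizer thread runs ~Tracked()
        Console.WriteLine("finalizers run: " + Tracked.FinalizedCount
            + ", memory still held: " + weak.IsAlive);

        GC.Collect();                  // 2nd GC: the memory can now be reclaimed
        Console.WriteLine("memory still held after 2nd GC: " + weak.IsAlive);
    }
}
```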

Note: the thread responsible for calling Finalize does not guarantee the order in which the objects' Finalize methods are called, which can cause subtle dependency problems (see the interesting dependency problem described in CLR via C#).

Finally, I have come to feel that reading one good book over and over is far more rewarding than reading ten or twenty less reliable ones.


CLR via C#

Understanding the Java Virtual Machine in Depth

C# in Depth

Thinking in Java

