Step by Step C# Technical Discussion 4: Garbage Collection Mechanism (GC)

Source: Internet
Author: User

GC's past and present

Although this article uses .NET as the setting for describing GC, the concept of GC is far from new. Lisp, which Turing Award winner John McCarthy began designing in 1958, already provided GC, and this was the first appearance of GC. Lisp programmers believed that memory management was too important to be left to programmers. In the years that followed, however, Lisp never became mainstream, and languages with manual memory management, represented by C, gained the upper hand. From the same premise, different people drew opposite conclusions: C programmers believed that memory management was too important to be left to the system, and they sneered that Lisp programs ran at a tortoise's pace. Indeed, in an era when every byte was carefully counted, many people could not accept the speed of GC or the large amount of system resources it consumed. Later, in 1984, David Ungar applied generational garbage collection (discussed below) to Smalltalk, one of the earliest uses of the technique, but Smalltalk was never widely adopted.
GC did not truly step onto the stage of history until the mid-1990s, thanks to the rise of Java. Today's GC is no longer what it once was. Java uses a Virtual Machine (VM) to manage program execution, including GC. Around 2000, .NET emerged; it is managed by the CLR (Common Language Runtime) in a way similar to Java. The emergence of these two camps brought developers into an age of development based on virtual platforms, and GC has grown steadily more popular since.
Why use GC? Put another way, why use automatic memory management? There are several reasons:
1. It raises the level of abstraction in software development;
2. Programmers can focus on the actual problem without being distracted by memory management;
3. Module interfaces can be cleaner, and coupling between modules is reduced;
4. Bugs caused by improper memory management are greatly reduced;
5. Memory management becomes more efficient.
In general, GC frees programmers from complicated memory problems, improving the speed, quality, and safety of software development.
 

What is GC?

GC, as the name suggests, means garbage collection; here this refers only to memory. The Garbage Collector (also abbreviated GC where no confusion arises) starts from the application's roots and traverses all objects that the application has dynamically allocated on the Heap [2], checking whether each is referenced in order to determine which objects are dead and which are still in use. An object that is no longer referenced by the application's roots or by another live object is dead; it is the so-called garbage and needs to be reclaimed. This is the working principle of GC. Several algorithms implement this principle; common ones include Reference Counting, Mark Sweep, and Copy Collection. The mainstream virtual systems today, the .NET CLR, the Java VM, and Rotor, all adopt the Mark Sweep algorithm.

1. Mark-Compact Algorithm
The .NET GC algorithm can be regarded as a Mark-Compact algorithm.
Phase 1: Mark-Sweep phase
First assume that every object in the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them. Finally, every unmarked object in the heap can be reclaimed.
Phase 2: Compact phase
After objects are reclaimed, the free space in the heap is no longer contiguous. The surviving objects are moved so that they are laid out contiguously from the heap base address, similar to defragmenting a disk.
 
After the heap has been reclaimed and compacted, the original heap allocation method can continue to be used: a single pointer records the start address of the next heap allocation.
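The single-pointer allocation described above can be sketched roughly as follows. This is only an illustrative model, not the CLR's actual allocator; the `BumpHeap` class and its members are hypothetical names introduced for this example.

```csharp
using System;

// Hypothetical model of a compacted heap: one pointer marks where the next
// allocation starts. This is NOT the CLR's real allocator.
public sealed class BumpHeap
{
    private readonly byte[] _memory;
    private int _next; // start address of the next allocation

    public BumpHeap(int capacity) { _memory = new byte[capacity]; }

    public int NextFree => _next;

    // Returns the start offset of the new block, or -1 when the heap is full
    // (the point at which a real runtime would trigger a GC).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (size > _memory.Length - _next) return -1;
        int address = _next;
        _next += size; // "bump" the pointer past the new object
        return address;
    }
}

public static class BumpHeapDemo
{
    public static void Main()
    {
        var heap = new BumpHeap(64);
        Console.WriteLine(heap.Allocate(16)); // 0
        Console.WriteLine(heap.Allocate(24)); // 16
        Console.WriteLine(heap.Allocate(32)); // -1: only 24 bytes are left
    }
}
```

Because free space is always one contiguous block after compaction, allocation needs no free-list search; it is just a bounds check and an addition.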
Main processing steps: suspend threads => determine roots => build the reachable objects graph => reclaim objects => compact heap => repair pointers
This can be understood as follows: the reference relationships between objects in the heap are intricate (cross references and circular references) and form a complex graph. Roots are the various entry points that the CLR can find outside the heap. The places GC searches for roots include global objects, static variables, local objects, function call parameters, object pointers in the current CPU registers, and the finalization queue. They fall into two main types: initialized static variables, and objects still in use by threads (stack + CPU registers).
Reachable objects: objects that can be reached from the roots by following object references. For example, if local variable object A of the currently executing function is a root object and one of its member variables references object B, then B is a reachable object. Starting from the roots, a reachable objects graph can be built; the remaining objects are unreachable and can be reclaimed.
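To make the idea of reachability concrete, here is a minimal sketch of a mark phase over a toy object graph. The `Node` and `Marker` types are hypothetical and only model the traversal; the real CLR walks actual object headers and reference fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical heap object: a name plus outgoing references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class Marker
{
    // Returns every object reachable from the roots (the "marked" set).
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current)) continue; // already visited; also handles cycles
            foreach (Node child in current.References) pending.Push(child);
        }
        return marked;
    }

    public static void Main()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b); // root A references B
        c.References.Add(d); // C and D reference each other,
        d.References.Add(c); // but no root leads to them
        var heap = new[] { a, b, c, d };

        HashSet<Node> live = Mark(new[] { a });
        foreach (Node n in heap)
            Console.WriteLine($"{n.Name}: {(live.Contains(n) ? "reachable" : "garbage")}");
    }
}
```

Note that C and D are reported as garbage even though they reference each other: circular references do not keep objects alive when no root leads to them.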
 
Pointer repair is needed because the compact phase moves heap objects and so changes their addresses. All reference pointers must be repaired, including pointers on the stack, in CPU registers, and in other objects in the heap.
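The compaction and pointer repair steps can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the `Cell` and `Compactor` types are hypothetical, references are modeled as integer addresses, and the `Marked` flags are assumed to have been set correctly by an earlier mark phase (so every address held by a marked cell or a root belongs to a marked cell).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical heap cell: references are stored as addresses, so they go
// stale when objects move and must be repaired.
public sealed class Cell
{
    public int Address;
    public int Size;
    public bool Marked; // assumed to be set by an earlier mark phase
    public List<int> References = new List<int>();
}

public static class Compactor
{
    // Slides marked cells toward address 0, then rewrites every stored address.
    // Returns the new allocation pointer (first free address).
    public static int Compact(List<Cell> heap, int[] roots)
    {
        var forwarding = new Dictionary<int, int>(); // old address -> new address
        int next = 0;
        foreach (Cell cell in heap.Where(c => c.Marked).OrderBy(c => c.Address))
        {
            forwarding[cell.Address] = next;
            next += cell.Size;
        }

        heap.RemoveAll(c => !c.Marked); // reclaim garbage
        foreach (Cell cell in heap)
        {
            cell.Address = forwarding[cell.Address];
            for (int i = 0; i < cell.References.Count; i++)
                cell.References[i] = forwarding[cell.References[i]];
        }
        for (int i = 0; i < roots.Length; i++) // stands in for stack and register slots
            roots[i] = forwarding[roots[i]];
        return next;
    }

    public static void Main()
    {
        var a = new Cell { Address = 0, Size = 16, Marked = true };
        var b = new Cell { Address = 16, Size = 32, Marked = false }; // garbage
        var c = new Cell { Address = 48, Size = 8, Marked = true };
        a.References.Add(c.Address);
        var heap = new List<Cell> { a, b, c };
        int[] roots = { a.Address };

        int nextFree = Compact(heap, roots);
        // Prints: C moved to 16, A now points to 16, next free = 24
        Console.WriteLine($"C moved to {c.Address}, A now points to {a.References[0]}, next free = {nextFree}");
    }
}
```

The forwarding table is one common way to repair pointers; it is used here for clarity and is not claimed to be what the CLR does internally.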
The Debug and Release execution modes differ slightly. In Release mode, an object that is not referenced by subsequent code is unreachable. In Debug mode, such an object becomes unreachable only after the current function finishes executing, so that the contents of local objects can be inspected during debugging.
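The difference can be observed, at least in principle, with a `WeakReference`, and `GC.KeepAlive` can be used when an object must stay reachable regardless of build mode. The sketch below is illustrative only: the class and method names are made up for this example, and the result of the first method depends on the build configuration and JIT optimization level, so no fixed output is claimed for it.

```csharp
using System;

public static class KeepAliveDemo
{
    // The local `buffer` is never used after the WeakReference is created.
    // In an optimized Release build the JIT may treat it as dead, so the array
    // may be collected; in a Debug build it normally survives until the method ends.
    public static bool SurvivesWithoutKeepAlive()
    {
        var buffer = new byte[1024];
        var tracker = new WeakReference(buffer);
        GC.Collect();
        return tracker.IsAlive;
    }

    // GC.KeepAlive keeps `buffer` reachable up to the call in every mode,
    // so the array is still alive while GC.Collect runs.
    public static bool SurvivesWithKeepAlive()
    {
        var buffer = new byte[1024];
        var tracker = new WeakReference(buffer);
        GC.Collect();
        bool alive = tracker.IsAlive;
        GC.KeepAlive(buffer);
        return alive;
    }

    public static void Main()
    {
        Console.WriteLine($"Without KeepAlive: {SurvivesWithoutKeepAlive()}");
        Console.WriteLine($"With KeepAlive:    {SurvivesWithKeepAlive()}");
    }
}
```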
Managed objects passed to COM+ also become roots and carry a reference counter, for compatibility with COM+'s memory management mechanism. When the reference counter reaches 0, these objects may become eligible for collection.
Pinned objects are objects that cannot be moved after allocation, for example objects passed to unmanaged code (or objects fixed with the fixed keyword). GC cannot modify reference pointers held by unmanaged code during pointer repair, so moving these objects would cause errors. Pinned objects can fragment the heap, but in most cases objects passed to unmanaged code are short-lived and should already be collectable by the time GC runs.
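As a small illustration of pinning, the sketch below uses `GCHandle` with `GCHandleType.Pinned` (an alternative to the `fixed` keyword that does not require an unsafe context). The `PinDemo` class is a hypothetical name for this example.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    // Pins a byte array, forces a collection, and checks that the address
    // handed out for unmanaged use did not change.
    public static bool AddressStableWhilePinned()
    {
        byte[] buffer = new byte[256];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();
            GC.Collect(); // a compacting GC may move unpinned objects here
            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            handle.Free(); // unpin promptly to limit heap fragmentation
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddressStableWhilePinned());
    }
}
```

Freeing the handle in a `finally` block keeps the pinned period short, which matches the advice above that pinned objects should not stay pinned across many collections.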
2. Generational Algorithm
A program may use several hundred MB or even several GB of memory, and running GC over such a large memory area is very expensive. The generational algorithm rests on certain statistical observations and significantly improves GC performance.
Objects are divided into new and old objects according to their lifetimes. Based on the statistical distribution of lifetimes, different collection policies and algorithms can be applied to the new and old regions. Collection effort is concentrated on the new region, aiming to reclaim, at low cost, within short intervals and over a small memory area, the large number of recently discarded local objects along the execution path.
Assumptions behind the generational algorithm:
1. Most newly created objects have short lifetimes, while older objects tend to have longer lifetimes.
2. Reclaiming part of the memory is faster than reclaiming all of it.
3. Newly created objects are usually strongly related to one another. Heap allocation is contiguous, so closely related objects sit together, which improves the CPU cache hit rate.
.NET divides the heap into three generations: Gen 0, Gen 1, and Gen 2.
 
Because the heap is divided into three generations, there are correspondingly three kinds of GC: Gen 0 collections, Gen 1 collections, and Gen 2 collections (counted by the performance counters # Gen 0 Collections, # Gen 1 Collections, and # Gen 2 Collections). When Gen 0 heap memory reaches its threshold, a generation 0 GC is triggered; afterwards, the objects that survived in Gen 0 move into Gen 1. When Gen 1 memory reaches its threshold, a generation 1 GC is performed; it collects the Gen 0 heap and Gen 1 heap together, and the survivors move into Gen 2. A generation 2 GC collects the Gen 0 heap, Gen 1 heap, and Gen 2 heap together.
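Promotion between generations can be observed with `GC.GetGeneration`. The following is a minimal sketch; the `GenerationDemo` class is a hypothetical name, and the exact generation numbers reported depend on the runtime and GC configuration, so the code only records what it sees.

```csharp
using System;

public static class GenerationDemo
{
    // Records the generation of one surviving object before any forced GC,
    // after a generation 0 GC, and after a generation 1 GC.
    public static int[] ObserveGenerations()
    {
        var survivor = new object();
        var seen = new int[3];
        seen[0] = GC.GetGeneration(survivor);
        GC.Collect(0); // collects Gen 0 only
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect(1); // collects Gen 0 and Gen 1
        seen[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor); // keep the object reachable throughout
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine($"Max generation: {GC.MaxGeneration}");
        Console.WriteLine(string.Join(" -> ", ObserveGenerations()));
    }
}
```

On a typical runtime the surviving object is seen moving to an older generation after each collection, which is the promotion behaviour described above.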
Gen 0 and Gen 1 are relatively small; together these two generations stay at around 16 MB. The size of Gen 2 is determined by the application and may reach several GB. The cost of generation 0 and generation 1 GCs is therefore very low, while a generation 2 GC, called a full GC, is usually expensive. As a rough estimate, generation 0 and generation 1 GCs should complete within a few milliseconds to a few tens of milliseconds, whereas a full GC may take several seconds when the Gen 2 heap is large. In general, over the lifetime of a .NET application, generation 2 GCs should occur far less often than generation 1 GCs, which in turn should occur less often than generation 0 GCs.
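The relative frequency of the three kinds of GC can be checked from inside a program with `GC.CollectionCount`. This is a rough sketch: the `CollectionCountDemo` class and the allocation loop are made up for illustration, and the actual counts vary with the machine, runtime, and GC settings.

```csharp
using System;

public static class CollectionCountDemo
{
    // Allocates many short-lived arrays, then reports how many collections
    // have occurred for generations 0, 1 and 2 since the process started.
    public static int[] AllocateAndCount(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            var temp = new byte[1024]; // short-lived garbage
            temp[0] = 1;
        }
        return new[]
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2)
        };
    }

    public static void Main()
    {
        int[] counts = AllocateAndCount(1000000);
        Console.WriteLine($"Gen 0: {counts[0]}, Gen 1: {counts[1]}, Gen 2: {counts[2]}");
    }
}
```

If the Gen 2 count grows nearly as fast as the Gen 0 count, many objects are surviving long enough to be promoted, which is usually worth investigating.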

 

3. Finalization Queue and Freachable Queue

Both queues are related to the Finalize method provided by .NET objects. They do not store the objects themselves but sets of pointers to objects. When a program uses the new operator to allocate space on the Managed Heap, GC examines the object; if it has a Finalize method, a pointer to the object is added to the Finalization Queue. When a GC starts, the Mark phase identifies which objects are garbage. GC then searches the garbage: if a pointer in the Finalization Queue points to an object in the garbage, that object is separated from the garbage and the pointer to it is moved to the Freachable Queue. This process is called the resurrection of the object: an object that was already dead has been brought back. Why bring it back? Because its Finalize method has not yet been executed, so it cannot be allowed to die. The Freachable Queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the object it points to is triggered; the pointer is then removed from the queue, and the object can at last die quietly.
The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former asks the system to run the object's Finalize method; the latter asks the system not to run it. ReRegisterForFinalize works by adding the pointer to the object back into the Finalization Queue. This leads to an interesting phenomenon, because an object in the Finalization Queue can be resurrected: if ReRegisterForFinalize is called inside the object's own Finalize method, the result is an object on the heap that never dies. Like a phoenix rising from the ashes, it comes back to life each time it dies.
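The "object that never dies" can be sketched as below. The `Phoenix` class and its `Lives` limit are hypothetical additions for this example (the limit exists only so that the demonstration terminates). Finalizer timing is not deterministic, so the program reports how many times Finalize actually ran instead of assuming a value.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Phoenix
{
    public static int FinalizeCalls; // how many times the finalizer has run
    public static int Lives = 3;     // hypothetical limit so the demo terminates

    ~Phoenix()
    {
        Interlocked.Increment(ref FinalizeCalls);
        if (Interlocked.Decrement(ref Lives) > 0)
        {
            // Put the pointer back into the Finalization Queue, so the next GC
            // moves this object to the Freachable Queue once more.
            GC.ReRegisterForFinalize(this);
        }
    }
}

public static class FinalizeDemo
{
    // Kept in a separate, non-inlined method so that no local variable in the
    // caller keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CreateGarbage()
    {
        new Phoenix();
    }

    public static int Run()
    {
        CreateGarbage();
        for (int i = 0; i < 5; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        return Phoenix.FinalizeCalls;
    }

    public static void Main()
    {
        Console.WriteLine($"Finalize ran {Run()} time(s) for a single object");
    }
}
```

In the opposite direction, a type that has released its resources explicitly (for example in a `Dispose` method) would typically call `GC.SuppressFinalize(this)` so that the object skips the Freachable Queue and can be reclaimed in a single GC.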

 

Managed resources:

All types in .NET derive (directly or indirectly) from the System.Object type.

Types in the CTS fall into two categories: reference types (also called managed types), which are allocated on the managed heap, and value types, which are typically allocated on the stack.

Value types live on the stack, which is last in, first out, so the lifetimes of value type variables are strictly ordered. This ensures that a value type variable releases its resources before it leaves its scope, which is simpler and more efficient than for reference types. The stack is allocated from high addresses toward low addresses.
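The practical difference between the two categories shows up in assignment: copying a value type copies the data, while copying a reference type copies only the reference to a single heap object. The `PointValue` and `PointRef` types below are hypothetical names used for illustration.

```csharp
using System;

public struct PointValue { public int X; }     // value type
public sealed class PointRef { public int X; } // reference type

public static class TypeDemo
{
    public static void Main()
    {
        var v1 = new PointValue { X = 1 };
        PointValue v2 = v1; // copies the whole value
        v2.X = 2;

        var r1 = new PointRef { X = 1 };
        PointRef r2 = r1;   // copies only the reference; there is one heap object
        r2.X = 2;

        Console.WriteLine(v1.X); // 1: v1 is unaffected by changes to v2
        Console.WriteLine(r1.X); // 2: r1 and r2 refer to the same object
        Console.WriteLine(typeof(PointValue).IsValueType); // True
        Console.WriteLine(typeof(PointRef).IsValueType);   // False
        // Both kinds ultimately derive from System.Object.
        Console.WriteLine(typeof(object).IsAssignableFrom(typeof(PointValue))); // True
    }
}
```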
