I happened to read that JavaScript's V8 and Python base their garbage collection on reference counting and mark-sweep. Are there other garbage collection algorithms?
Reply content:
Several earlier answers already contain reliable material, and adding to them piece by piece would be tedious, so I'll just recommend a book:
[Garbage Collection] [automatic reclamation of unused memory] related readings
The Garbage Collection Handbook is a real eye-opener and answers the asker's question perfectly.
If you don't want to buy a book, start with this website: Introduction to memory management
You are also welcome at the HLLVM group, a forum about high-level language virtual machines
I've written a few posts there before, for example one introducing the basic implementation of a copying GC: A problem with HotSpot VM Serial GC
One on the implementation of G1 GC: [HotSpot VM] Ask the principle of G1 algorithm
And a comparison of several GCs: why the Concurrent garbage collector (CMS) is not using tags
For the discussion of reference counting versus tracing GC, I'll just leave a link: How does reference counting maintain all object references in the garbage collection mechanism? - Rednaxelafx's answer
=======================================================
Also, a quick gripe about the question description:
"I happened to read that JavaScript's V8 and Python base their garbage collection on reference counting and mark-sweep. Are there other garbage collection algorithms?"
CPython is indeed based on reference counting, with mark-sweep as a backup.
But V8's GC is not based purely on mark-sweep.
When it was first released, V8 was still relatively simple: it used a two-generation GC in which the new space used a copying GC, and the full GC chose between mark-sweep and mark-compact depending on fragmentation.
The high-level idea can be seen in this entry function: v8/mark-compact.cc at 0.1 · V8/V8 GitHub
    void MarkCompactCollector::CollectGarbage() {
      Prepare();
      MarkLiveObjects();
      SweepLargeObjectSpace();
      if (compacting_collection_) {
        EncodeForwardingAddresses();
        UpdatePointers();
        RelocateObjects();
        RebuildRSets();
      } else {
        SweepSpaces();
      }
      Finish();
    }
I find the Wikipedia entries on Reference counting and Tracing garbage collection quite good. In general, GCs can be categorized along several axes:
Some GCs must traverse the graph of GC-managed objects to obtain precise information about which objects are alive and which are dead. These are called tracing GCs. Those that do not traverse the graph are called non-tracing GCs (reference counting, for example, obtains only approximate information, so it cannot handle cycles in the graph).
Some GCs need the programmer or compiler to cooperate so that they know exactly which values are pointers to objects (the objects that need GC). Others do without that cooperation and rely on guessing (and guessing is hard too!): anything they cannot rule out has to be treated as a pointer, and they still have to collect. The former are called precise GCs and the latter conservative GCs (such as Boehm GC). The discussion below is mainly about precise GC.
With some GCs, a block of memory may be moved elsewhere after it has been allocated, to prevent fragmentation and improve cache locality. Such a GC is called a moving GC, and one that never does this is a non-moving GC. A moving GC is naturally a tracing GC, because it has to know how to traverse the object graph; otherwise it cannot move anything (after all, when an object moves, every place that stores its address must be updated to point to the new location).
Some GCs process the entire object graph on every collection, while others are optimized to handle only the newer objects most of the time. This optimization rests on an observed phenomenon: new objects tend to point to old objects, while old objects rarely point to new ones; new objects tend to die young, and objects that have already lived a long time are likely to keep living. Many programming styles produce this pattern, immutable data structures for example.
To exploit it, the GC distinguishes objects by age and splits the allocation area into a (larger) old region and a (smaller, for cache locality) new region. Depending on which region is full, each collection decides whether to collect only the new region or everything. When only the new region is collected, the traversal treats any object it meets in the old region as alive and does not follow its references any further. For an object in the new region, the number of collections it has survived determines whether it is moved to the old region.
Of course the phenomenon is not absolute: what if an old object does point to a new object? Then every modification of an object has to be checked (this check is called a GC write barrier): is the object being modified an old object, and is the value being written a new object? If both are true, some mechanism (such as a remembered set or card marking) records the exception. A GC of this kind is called a generational GC.
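The write barrier and remembered set described above can be sketched roughly as follows. This is only an illustration in Python, not how any particular VM does it: the names `Obj`, `write_field`, `remembered_set` and `minor_collect` are made up for the example, objects carry a simple `old` flag instead of living in real memory regions, and survivors are merely returned rather than moved.

```python
class Obj:
    def __init__(self, name, old=False):
        self.name = name
        self.old = old        # True once the object lives in the old region
        self.fields = {}      # field name -> referenced Obj (or None)


remembered_set = set()        # old objects that may point into the new region


def write_field(owner, key, value):
    """Write barrier: every pointer store goes through this check."""
    if owner.old and value is not None and not value.old:
        remembered_set.add(owner)     # record the old -> new exception
    owner.fields[key] = value


def minor_collect(young, roots):
    """Collect only the new region; old objects are assumed to be alive."""
    # Start from the ordinary roots that are young, plus every young object
    # referenced by an old object recorded in the remembered set.
    stack = [r for r in roots if not r.old]
    for owner in remembered_set:
        stack.extend(v for v in owner.fields.values()
                     if v is not None and not v.old)
    live = set()
    while stack:
        obj = stack.pop()
        if obj.old or obj in live:
            continue                  # cut the traversal off at the old region
        live.add(obj)
        stack.extend(v for v in obj.fields.values() if v is not None)
    return [o for o in young if o in live]


cache = Obj("cache", old=True)
tmp1, tmp2, kept = Obj("tmp1"), Obj("tmp2"), Obj("kept")
young = [tmp1, tmp2, kept]
write_field(cache, "entry", kept)     # old -> new store, caught by the barrier
write_field(tmp1, "next", tmp2)       # new -> new store, nothing recorded
survivors = minor_collect(young, roots=[cache])
```

Here `kept` survives only because the barrier remembered `cache`; without the remembered set, a collection that never looks inside the old region would have reclaimed it.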
The most common kind of non-tracing GC is reference counting, which can be seen in Python, Objective-C, C++, and Rust. For an easy-to-read shared_ptr implementation (as an alternative to reading the one in libstdc++), those who like reading C++ can look at protobuf: protobuf/shared_ptr.h at Master Google/protobuf GitHub (its readability is quite good). For Rust, see rust/rc.rs at Master Rust-lang/rust GitHub
Naive mark-and-sweep is one of the simpler tracing GCs. It needs two passes over the objects: first mark, then sweep. I have a toy Scheme interpreter that uses it: Overminder/sanya-c GitHub, see SGC.C (the code is still fairly noisy, because the stack is not entirely a root set and some locations have to be skipped)
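A minimal sketch of the two passes, assuming a toy heap that is just a Python list of objects (the names `Obj`, `mark`, `sweep` and `collect` are invented for this example and are not taken from sanya-c):

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references to other Obj instances
        self.marked = False


def mark(roots):
    """Pass 1: mark everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Pass 2: walk the whole heap, keep marked objects, drop the rest."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False        # reset the mark for the next collection
            survivors.append(obj)
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)              # c and d form a cycle that no root reaches
heap = collect([a, b, c, d], roots=[a])
live_names = [o.name for o in heap]
```

Because liveness is decided by reachability from the roots, the unreachable cycle `c <-> d` is reclaimed, which plain reference counting could not do.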
Cheney's semi-space copying GC is a relatively simple moving GC. It allocates two equally sized blocks of memory; when the first block runs out, the GC runs and moves the live objects into the second block, ignoring the dead ones. The two blocks then swap roles and the cycle repeats. I have a toy Scheme with a JIT that uses it: overminder/sanya-native GitHub, see Gc.cpp.
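The copying step can be sketched like this. It is a simplification under stated assumptions: Python lists stand in for the two fixed-size memory blocks, a `forward` attribute stands in for the forwarding pointer normally written into the old copy, and `Obj`, `copy` and `cheney_collect` are names invented for the example rather than anything from sanya-native.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []  # references to Obj
        self.forward = None   # forwarding pointer, set once the object is copied


def copy(obj, to_space):
    """Copy obj into to_space at most once and return its new location."""
    if obj.forward is None:
        clone = Obj(obj.name, list(obj.fields))   # fields still point to old copies
        obj.forward = clone
        to_space.append(clone)
    return obj.forward


def cheney_collect(roots):
    """Return (new_roots, to_space); to_space holds only the live objects."""
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    # to_space doubles as the work queue: objects before `scan` are finished,
    # objects from `scan` onward still hold references into the old space.
    while scan < len(to_space):
        obj = to_space[scan]
        obj.fields = [copy(f, to_space) for f in obj.fields]
        scan += 1
    return new_roots, to_space


a, b, c, dead = Obj("a"), Obj("b"), Obj("c"), Obj("dead")
a.fields = [b, c]
c.fields = [a]                # a cycle back to a
from_space = [a, b, c, dead]
roots, to_space = cheney_collect([a])
names = [o.name for o in to_space]
```

The dead object is never touched, so the cost of a collection depends on the amount of live data rather than on the size of the heap; the price is that half of the memory is always kept in reserve.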
I also have a toy Scheme compiler that implements a simple generational GC: yac/scm_generational_gc.c at Master Overminder/yac GitHub. It has only 2 generations and uses a remembered set. This one really is more complex to implement than the other GCs; I ran into all sorts of segfaults back then and spent a long time debugging with Valgrind...
——————————
Edited in the evening: the term should be precise GC rather than exact GC. Also added a readable shared_ptr implementation.
Garbage collection algorithms fall into two categories: those based on reference counting and those based on tracing. The tracing family has grown to include copying collection, mark-sweep, mark-compact, generational collection, and later G1, which divides the whole heap into blocks and maintains a remembered set for each block. How these methods are deployed, their advantages and disadvantages, and the corresponding memory allocation methods can all be found in the Garbage Collection Handbook, second edition (2011). For G1 you have to look up the paper from Sun. I answered a similar question before; see: What garbage collection algorithms are used in the implementation of various programming languages? What are the advantages and disadvantages of these algorithms? - Shai Zhili's answer
Reference counting is a way to determine which objects need to be reclaimed, while mark-copy, mark-sweep, and mark-compact are ways of actually collecting the garbage.
Mark-and-sweep Garbage Collection
CSAPP covers it briefly, but I personally don't remember...
Reference-counting GC
What is reference counting
Reference counting is a form of garbage collection in which each object carries a count of how many references point to it. The count changes in the following situations:
- When an object gains a reference, for example when it is assigned to a variable or a property, or passed into a method, its reference count is incremented by 1.
- When an object loses a reference, for example when a variable leaves scope, a property is assigned a different object reference, the object containing the property is reclaimed, or the method it was passed into returns, its reference count is decremented by 1.
- When the reference count reaches 0, nothing references the object any more, and it can be marked as garbage and reclaimed.
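The three rules above can be sketched as follows. This is a hand-rolled illustration with invented names (`Obj`, `incref`, `decref`, `set_field`); it is not how CPython or any other runtime implements its counts.

```python
class Obj:
    """A toy heap object with a manually maintained reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}      # property name -> referenced Obj
        self.freed = False


def incref(obj):
    obj.refcount += 1


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        # Nothing references the object any more: release it and drop the
        # references it held, which may release further objects.
        obj.freed = True
        for child in list(obj.fields.values()):
            decref(child)
        obj.fields.clear()


def set_field(owner, key, value):
    """Assign a property, adjusting the counts of the new and old targets."""
    incref(value)                     # increment first, in case old is value
    old = owner.fields.get(key)
    owner.fields[key] = value
    if old is not None:
        decref(old)


# A chain a -> b is released completely once the last reference to a is gone.
a, b = Obj("a"), Obj("b")
incref(a)                  # a variable now refers to a
set_field(a, "next", b)
decref(a)                  # the variable leaves scope

# A cycle x <-> y keeps both counts above zero, so neither is ever released.
x, y = Obj("x"), Obj("y")
incref(x)
set_field(x, "next", y)
set_field(y, "next", x)
decref(x)                  # the variable leaves scope, but y still refers to x
```

The second half shows the weakness of pure reference counting: `x` and `y` are unreachable from the program, yet each keeps the other's count at 1.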
Reference-tracing GC
What is reference traversal
The garbage collector traverses objects starting from points known as GC roots. Every object that can be reached is marked as alive; the unreachable objects in the heap are marked as garbage and then cleared away.
What are GC roots
- Classes: classes loaded by the system class loader. These classes are never unloaded, and they can hold references to objects through static fields. Note that classes loaded by a custom class loader generally cannot be GC roots
- Threads: live threads
- Local variables or parameters on the Java method stack
- Local variables or parameters on the JNI method stack
- JNI global references
- Objects used as synchronization monitors
- Objects held by the JVM that are kept from GC for special purposes. These may include the system class loaders, some important exception classes, some objects preallocated for exception handling, and custom class loaders that are in the middle of loading a class. Exactly which objects fall into this group depends on the specific JVM implementation.
To learn more, see how the garbage collector handles circular references
There are four main types of GC (Garbage Collection).
1. Reference counting: each object has a counter that records the number of references to it; when the counter reaches 0 the object is collected.
2. Mark and Sweep: traverse every object that can still be found (being reachable by the traversal means a reference to it exists) and mark it, then collect all the objects without a mark.
3. Copy Collection: a revision of 2. The sweep in 2 has to scan every object in the heap, which costs a lot of CPU. Instead of marking, copy collection simply copies each traversed object to another heap and then empties the old heap to complete the GC.
4. Generational Collection: also a revision of 2, because the copying in 3 is slow as well. Put simply, the heap is divided into four areas: Eden, Survivor1, Survivor2, and Tenured. When Eden is full, a GC is triggered that covers the Eden and Survivor1/Survivor2 areas. Objects in Eden that survive the GC are moved into the Survivor1/Survivor2 areas, and objects in the survivor areas that survive multiple GCs are moved to the Tenured area. Tenured is also collected, but much less frequently. In one sentence: I have checked you several times without finding a problem, so I will check you less often.
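The promotion policy in item 4 can be sketched like this. It is a toy model under several assumptions: `PROMOTE_AGE` is an arbitrary threshold, the `is_live` callback stands in for a real reachability trace, and the `Heap` class and its attribute names are invented for the example (real JVMs size and tune these areas very differently).

```python
PROMOTE_AGE = 2   # assumed threshold: minor GCs survived before promotion


class Obj:
    def __init__(self, name):
        self.name = name
        self.age = 0          # number of minor GCs survived so far


class Heap:
    def __init__(self):
        self.eden = []
        self.survivor_from = []   # Survivor1 and Survivor2 swap roles each GC
        self.survivor_to = []
        self.tenured = []

    def allocate(self, name):
        obj = Obj(name)
        self.eden.append(obj)
        return obj

    def minor_gc(self, is_live):
        """Collect Eden and the occupied survivor area only."""
        for obj in self.eden + self.survivor_from:
            if not is_live(obj):
                continue                      # dead objects are simply not copied
            obj.age += 1
            if obj.age >= PROMOTE_AGE:
                self.tenured.append(obj)      # survived enough GCs: promote
            else:
                self.survivor_to.append(obj)
        self.eden = []
        self.survivor_from, self.survivor_to = self.survivor_to, []


heap = Heap()
session = heap.allocate("session")
heap.allocate("temp1")
heap.minor_gc(lambda o: o is session)     # temp1 dies, session ages to 1
heap.allocate("temp2")
heap.minor_gc(lambda o: o is session)     # session reaches age 2 and is promoted
```

After the second collection `session` sits in the Tenured area and is no longer visited by minor GCs, which is the "check you less often" idea from the list.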
There are many weird and wonderful GCs, but they are generally improvements on these few kinds.
For example, current JVMs generally use
Generational Collection + Parallel Collector + Concurrent Mark and Sweep Collector
I don't really like answers that are too long and too detailed, so I won't go into the shortcomings of each.
Reference counting is not a reclamation mechanism; like root reachability analysis, it is an algorithm for determining whether an object is alive.
The garbage collection algorithms are the following:
- Mark-sweep
- Mark-compact
- Generational collection
- Copying
Read the relevant chapters in these three books:
1. CLR via C#, which describes only the approach Microsoft considers best.
2. In-Depth Understanding of the Java Virtual Machine: JVM Advanced Features and Best Practices, which covers more variants of the algorithms and describes their evolution and optimization in more detail.
3. The Cocos2d-x Authoritative Guide, which can serve as an introduction to Objective-C, that is, garbage collection in iOS development; it presents the counting-based approach and how to use it more conveniently.
Altogether it is not many pages.