Transferred from: http://blog.codingnow.com/2008/06/gc.html
In essence, reference counting and garbage collection are both forms of automatic resource management. Automatic management means that the logic layer does not know when a resource is released and relies on the underlying library to maintain the resource's lifetime.
Manual management, by contrast, means knowing precisely how long a resource lives and reclaiming it at a precise location. In C++, this shows up as the delete calls written in a destructor to release the resources an object used, together with the compiler-generated code that destroys base classes and member variables.
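As a rough illustration of that split, the sketch below (assuming C++11 or later) has one hand-written delete in a destructor, while the member and the base class are torn down by compiler-generated code. The names Widget, Member, Base, Buffer, and destruction_order are made up for this example and do not come from the original post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destruction order so the sequence can be inspected (demo only).
static std::vector<std::string> g_log;

struct Buffer { ~Buffer() { g_log.push_back("Buffer"); } };
struct Member { ~Member() { g_log.push_back("Member"); } };
struct Base   { ~Base()   { g_log.push_back("Base"); } };

struct Widget : Base {
    Member member;                 // destroyed by compiler-generated code
    Buffer* buffer = new Buffer;   // owned raw resource, released by hand below

    Widget() = default;
    Widget(const Widget&) = delete;             // copying would double-delete buffer
    Widget& operator=(const Widget&) = delete;

    ~Widget() {
        delete buffer;             // the manually written part
        g_log.push_back("Widget");
    }
};

// Creates and destroys one Widget and returns the observed destruction order.
std::vector<std::string> destruction_order() {
    g_log.clear();
    {
        Widget w;
        assert(g_log.empty());     // nothing has been destroyed yet
    }                              // w dies exactly here
    return g_log;
}
```

Calling destruction_order() yields Buffer, Widget, Member, Base: the destructor body runs first, then the members, then the base class.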
Therefore, writing a garbage collector for C++ does not conflict with managing resources manually. Automatic management is used in nearly every C++ project of any size; those projects simply use reference counting rather than garbage collection. In other words, we have long combined manual and automatic management when building systems in C++ or C. Whether the automatic part is reference counting or garbage collection is an implementation detail; wherever manual management is appropriate, we can still manage manually.
Why manage resource lifetimes automatically?
Consider object orientation. If everything were an object, each object would be responsible for its own lifetime, and we could determine the moment of its death directly and precisely. Unfortunately, many things are not purely objects. The most important is the object container: in addition to its own properties, it maintains references to a group of similar objects.
An object can be referenced by several containers at once, which makes a container different from entity objects such as a cat or a dog. A container referencing a thing does not mean that the thing is part of the container (sometimes it is, sometimes it is not). When we divide the whole world into objects, with everything down to atoms grouped into layers of objects, we find that some concepts can never be captured as objects. Reference, as opposed to ownership, is unavoidable.
The essence of object orientation is to extract what many objects have in common. Because of this, containers of all kinds cannot be avoided.
For the same reason, an object by itself does not know whether it can be declared dead, unless it knows its relationships to other objects (and such a relationship is not itself an object). Resources can be objects, and what automatic management manages is precisely these relationships between objects.
Reference counting is one of the easiest schemes to implement: record how many times an object is referenced, not exactly who references it. This reduces the cost of establishing and removing a reference. But something is lost in exchange: the information about who references the object. As a result, reference counting pays a higher cost when handling indirect references.
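A minimal sketch of this idea, assuming a single thread and an intrusive counter, might look like the following. Counted, retain, release, and refcount_demo are hypothetical names chosen for illustration; they are not taken from any particular library.

```cpp
#include <cassert>

// Hypothetical minimal intrusive reference count (not thread-safe).
struct Counted {
    int refs = 0;          // how many references exist, not who holds them
    bool* destroyed;       // set to true on destruction, for demonstration only
    explicit Counted(bool* flag) : destroyed(flag) {}
    ~Counted() { *destroyed = true; }
};

// Establishing a reference: O(1), a single increment.
inline void retain(Counted* p) { ++p->refs; }

// Dropping a reference: O(1); the last release destroys the object.
inline void release(Counted* p) {
    if (--p->refs == 0) delete p;
}

// Walks through a retain/release sequence and reports whether the object
// was destroyed exactly when the last reference went away.
bool refcount_demo() {
    bool destroyed = false;
    Counted* obj = new Counted(&destroyed);
    retain(obj);               // first holder
    retain(obj);               // second holder; the object cannot tell them apart
    assert(obj->refs == 2);
    release(obj);
    bool alive_after_first_release = !destroyed;
    release(obj);              // count reaches zero, object is deleted
    return alive_after_first_release && destroyed;
}
```

The counter is a plain integer, so the object knows only that two references exist, which is the lost information the paragraph above describes.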
The criterion for an object's death is whether it is still connected to the world, directly or indirectly. An object may therefore be detached from the world even while another object still references it directly. To handle this, a reference-counted system must, at the moment an object becomes detached from the world, notify that object and the objects associated with it. This raises the cost of destroying objects, which is the weak point of the reference counting strategy.
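To make that destruction cost concrete, here is a small sketch using std::shared_ptr (assuming C++11 or later). Dropping the single outside reference to the head of a chain forces every node in the chain to be visited and destroyed right then. Node, g_destroyed, and release_chain are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

static std::size_t g_destroyed = 0;   // counts destructor calls (demo only)

struct Node {
    std::shared_ptr<Node> next;       // each node keeps the following node alive
    ~Node() { ++g_destroyed; }
};

// Builds a chain of `length` nodes, drops the only outside reference to the
// head, and returns how many nodes were destroyed as a consequence.
std::size_t release_chain(std::size_t length) {
    g_destroyed = 0;
    std::shared_ptr<Node> head;
    for (std::size_t i = 0; i < length; ++i) {
        auto node = std::make_shared<Node>();
        node->next = head;            // new node points at the previous head
        head = node;
    }
    assert(g_destroyed == 0);         // everything is still reachable via head
    head.reset();                     // one release cascades down the whole chain
    return g_destroyed;
}
```

Note that the cascade here is recursive (each destructor releases the next node), so this sketch is only suitable for short chains.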
How often objects are destroyed depends on their average lifetime. That lifetime is affected, on the one hand, by object granularity: the finer the granularity, the shorter the average lifetime (the two are not directly linked on the surface, but actual designs often lead to this result). On the other hand, we tend to implement containers and reference relationships as objects too (even though, conceptually, they are not objects). For example, many smart pointers that automatically maintain a reference count are small containers holding a single reference to an object, and they are themselves implemented as small objects.
In general, the nature of an object does not change with its position in memory. But a reference relationship, usually implemented with pointers, is tied to memory addresses. C++ lacks semantics for moving an object in memory; the equivalent is to copy-construct a new object in a new block of memory and destroy the original.
On the other hand, in the program's execution sequence, the nested scopes that function calls create on the stack can also be viewed as containers. Execution passes through these scopes, and temporarily constructed references to objects (smart pointers) are placed in them. The more frequently functions are called, the more frequently these scopes are created and destroyed.
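The effect of a function scope acting as a temporary holder can be observed with std::shared_ptr::use_count. The sketch below (assuming C++11 or later) uses made-up function names: passing the pointer by value adds a holder for the duration of the call, while passing by const reference does not.

```cpp
#include <cassert>
#include <memory>

// Taking the smart pointer by value creates a temporary reference that lives
// in this function's scope: one increment on entry, one decrement on exit.
long count_inside_by_value(std::shared_ptr<int> p) {
    return p.use_count();
}

// Taking a const reference adds no new holder, so the count is untouched.
long count_inside_by_ref(const std::shared_ptr<int>& p) {
    return p.use_count();
}

// Returns true if the counts behave as described in the comments.
bool scope_demo() {
    auto owner = std::make_shared<int>(42);
    long before = owner.use_count();
    assert(before == 1);                            // only `owner` holds it
    long by_value = count_inside_by_value(owner);   // 2 while the call runs
    long by_ref   = count_inside_by_ref(owner);     // still 1
    long after    = owner.use_count();              // back to 1 after the calls
    return by_value == 2 && by_ref == 1 && after == 1;
}
```

Every by-value call pays for an increment and a decrement, which is the per-scope overhead the paragraph describes.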
This forces C++ to rely heavily on inline functions, so that the compiler can see more context and reduce the burden of creating and destroying small objects (smart pointers). STL implementations must also optimize for this; STLport, for example, special-cases POD types. Unfortunately, smart pointers are not POD types. Expecting the compiler to be smart enough to merge reference increments and decrements within the execution sequence is almost impossible once multithreading is taken into account, unless the compiler has knowledge of the threads.
C++ has many advantages over C for object-oriented programming. One is that when an object is part of another object, the lifetime of the part is maintained automatically through constructors and destructors. What the language does not address is how to handle lifetime when the relationship is merely a reference. For the former there is essentially one clear and concise solution; for the latter there are many options depending on actual needs, so C++ provides no uniform solution at the language level. Unfortunately, C++ has never offered a simple, easy-to-use, general-purpose GC library. Everyone favors reference counting because it is easier to implement. After all, C lacks the language support needed to implement GC (and C++ grew out of C at the implementation level).
Now consider garbage collection. The more mature algorithms are based on mark-sweep (or mark-compact) and their variants. Put simply, the collector framework records the relationships between objects (where this information is stored does not matter: it can live in the object's own memory layout or somewhere separate, as long as the collector can access it). A root of the world is chosen, and the collector periodically traverses the world from that root, marks the objects reachable from it, and finally reclaims the objects left unmarked.
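The procedure just described can be sketched as a toy collector. This is a deliberately simplified illustration, assuming C++14, a single root, and relationship data stored inside each object; Object, Heap, link, collect, and mark_sweep_demo are hypothetical names and do not describe the author's own collector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy collector: names and layout are illustrative only.
struct Object {
    std::vector<Object*> refs;   // outgoing references, visible to the collector
    bool marked = false;
};

class Heap {
public:
    // Allocates an object owned by the heap and returns a non-owning pointer.
    Object* create() {
        objects_.push_back(std::make_unique<Object>());
        return objects_.back().get();
    }

    // Recording a relationship is an amortized O(1) append.
    static void link(Object* from, Object* to) { from->refs.push_back(to); }

    // Marks everything reachable from root, then frees the rest.
    // Returns the number of objects reclaimed.
    std::size_t collect(Object* root) {
        // Mark phase: iterative traversal with an explicit stack.
        std::vector<Object*> pending;
        if (root) pending.push_back(root);
        while (!pending.empty()) {
            Object* o = pending.back();
            pending.pop_back();
            if (o->marked) continue;
            o->marked = true;
            for (Object* r : o->refs) pending.push_back(r);
        }
        // Sweep phase: keep marked objects (clearing the mark), drop the others.
        std::size_t before = objects_.size();
        std::vector<std::unique_ptr<Object>> live;
        for (auto& o : objects_) {
            if (o->marked) {
                o->marked = false;
                live.push_back(std::move(o));
            }
        }
        objects_.swap(live);     // unmarked objects are destroyed with `live`
        return before - objects_.size();
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<Object>> objects_;
};

// Builds root -> a -> b plus two objects c and d that reference each other
// but are unreachable from the root, then runs one collection.
std::size_t mark_sweep_demo() {
    Heap heap;
    Object* root = heap.create();
    Object* a = heap.create();
    Object* b = heap.create();
    Object* c = heap.create();
    Object* d = heap.create();
    Heap::link(root, a);
    Heap::link(a, b);
    Heap::link(c, d);
    Heap::link(d, c);            // still directly referenced, yet detached
    std::size_t reclaimed = heap.collect(root);
    assert(heap.size() == 3);    // root, a, and b survive
    return reclaimed;
}
```

In this sketch c and d are reclaimed even though each is still directly referenced by the other, because neither is connected to the root.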
From an algorithmic point of view, the time cost of establishing a relationship between two objects under GC matches that of reference counting: both are O(1). In practice, though, the former usually costs more. Its space cost is also slightly higher, but not by an order of magnitude.
Destroying GC-managed objects is much cheaper: there is no need to notify the object or the objects associated with it.
That is why software that uses GC is sometimes more efficient than software that uses reference counting.
However, GC has an additional time cost that comes from marking. A complete collection pass must traverse every live object in the world. The cost is O(n), and n grows with the total number of objects. So we should reduce the number of objects managed by GC, and here manual management still makes sense: when you explicitly define an object as part of another object, consider managing it manually.
Another drawback is that implementations tend to store the relationship information in the memory layout of the objects themselves, so traversing the objects in the world means touching the memory of every object. When the virtual memory in use exceeds physical memory, this causes page swapping. I think this is, to a large extent, the root cause of the occasional slowness of large systems written in languages such as Java or C#. Of course, this can be improved; it is not a problem with the algorithm itself.
It can be said that GC (garbage collection) moves the cost that RC (reference counting) pays for destroying short-lived objects into a single mark-and-sweep pass. This is an orthogonal decomposition of logic processing and resource management, and the decomposed problem becomes easier to speed up as hardware advances (multicore, for example). In smaller software or standalone modules, however, the advantage is not very pronounced. There, the fact that GC itself is far more complex than RC becomes its Achilles' heel.
For software that does not need object orientation, automatic resource management does not even come up. In that case, both GC and RC are useless.
The humble garbage collector I wrote is only a modest attempt to give people building software in C or C++ a few more options.
(Repost) A comparison of reference counting and garbage collection