Python's GC relies primarily on reference counting to track and reclaim garbage.
On top of reference counting, mark and sweep resolves the circular references that container objects can produce, and generational collection improves the efficiency of garbage collection by trading space for time.
Reference counting
In Python, the life cycle of most objects is managed by their reference counts. Broadly speaking, reference counting is itself a garbage collection mechanism, and it is one of the simplest and most intuitive garbage collection techniques.
Principle: when a reference to an object is created or copied, the object's reference count is incremented by 1. When a reference to an object is destroyed, the count is decremented by 1. When the count drops to 0, nothing is using the object any longer and the memory it occupies can be freed.
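The principle above can be observed with sys.getrefcount. This is a small sketch that assumes CPython; the absolute numbers it reports vary between versions, so the example only compares counts against a baseline. The names obj, alias, and holder are made up for the illustration.

```python
import sys

# Absolute values from sys.getrefcount differ between CPython versions,
# so every measurement is compared with a baseline taken first.
obj = object()
baseline = sys.getrefcount(obj)

alias = obj                      # copying a reference: count + 1
after_alias = sys.getrefcount(obj)

holder = [obj]                   # storing it in a container: count + 1
after_holder = sys.getrefcount(obj)

del alias                        # destroying a reference: count - 1
holder.clear()                   # the list drops its reference: count - 1
after_release = sys.getrefcount(obj)

# Differences relative to the baseline: +1, +2, then back to 0.
print(after_alias - baseline, after_holder - baseline, after_release - baseline)
```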
Although reference counting adds bookkeeping work every time memory is allocated and freed, it has one major advantage over other mainstream garbage collection techniques: it is real-time. As soon as no references to a piece of memory remain, that memory is reclaimed immediately. Other garbage collection techniques can reclaim dead memory only under certain conditions (for example, when a memory allocation fails).
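The real-time behaviour can be sketched with a weak reference, which lets us check whether an object still exists without keeping it alive. This assumes CPython, where reference counting frees the object as soon as the last strong reference disappears; Buffer is a hypothetical class used only here.

```python
import weakref

class Buffer:
    """Hypothetical class used only for this illustration."""

buf = Buffer()
probe = weakref.ref(buf)   # a weak reference does not keep buf alive

alive_before = probe() is not None
del buf                    # the last strong reference goes away
alive_after = probe() is not None

# On CPython the object is gone right after `del`, with no collector run.
print(alive_before, alive_after)
```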
Efficiency of reference counting: the extra work needed to maintain reference counts is proportional to the number of memory allocations, releases, and reference assignments performed while Python runs. This is a weakness compared with other mainstream garbage collection techniques such as mark and sweep or stop and copy, whose extra work depends essentially only on the amount of memory to be reclaimed.
If execution efficiency were the only weakness of reference counting, that would be merely unfortunate. Reference counting also has a fatal weakness, and it is because of this weakness that garbage collection in the narrow sense never includes reference counting. The cause of this fatal weakness is the circular reference (also called a cross-reference).
Problem:
A circular reference keeps the reference counts of a group of objects from ever reaching 0, even though none of these objects is referenced by any object outside the group; they only reference each other. Nobody can ever use this group of objects again, so the memory they occupy should be reclaimed, yet because of the mutual references each object's count stays above 0 and the memory is never freed. For example:
a = []
b = []
a.append(b)
b.append(a)
print(a)
[[[...]]]
print(b)
[[[...]]]
This is fatal: it is no different from the memory leaks caused by manual memory management.
To solve this problem, Python introduces two other garbage collection techniques to make up for the flaw in reference counting: mark and sweep, and generational collection.
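The following sketch shows a cycle that reference counting alone does not free, and that an explicit run of the cycle collector does reclaim. It assumes CPython and uses the standard gc and weakref modules; Node is a hypothetical class (plain lists cannot be weakly referenced, so a class instance is used instead of the lists from the example).

```python
import gc
import weakref

class Node:
    """Hypothetical container class; instances can reference each other."""
    def __init__(self):
        self.other = None

gc.disable()               # keep the cycle collector from running on its own
a, b = Node(), Node()
a.other, b.other = b, a    # a and b now reference each other
probe = weakref.ref(a)

del a, b                   # no external references remain
leaked = probe() is not None    # reference counting alone cannot free the pair

found = gc.collect()       # run the cycle collector explicitly
freed = probe() is None
gc.enable()

# found is the number of unreachable objects the collector detected.
print(leaked, freed, found >= 2)
```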
Mark and sweep
Mark and sweep exists to solve the problem of circular references. Only container objects, that is, objects that can hold references to other objects (for example list, set, dict, and class instances), can produce circular references.
We have to accept one fact: if two objects each have a reference count of 1 but the only references are the circular ones between them, then both objects should be reclaimed. In other words, although their reference counts appear to be nonzero, their effective reference counts are 0. So we first remove the circular references, after which the effective counts of the two objects emerge. Suppose the two objects are A and B. Starting from A, which holds a reference to B, we subtract 1 from B's reference count. Then we follow the reference to B; because B holds a reference to A, we likewise subtract 1 from A's count. This removes the references that form the cycle between the objects.
But this raises a problem. Suppose object A holds a reference to object C, and C does not reference A. If we subtract 1 from C's count and A ends up not being reclaimed, then we have wrongly decremented C's count, and at some point in the future this leaves a dangling reference to C. We would then have to restore C's reference count whenever A is not deleted, and with that scheme the complexity of maintaining reference counts multiplies.
Principle: mark and sweep takes a better approach. Instead of modifying the real reference counts, it makes a copy of the reference count of every object in the collection and modifies only the copies.
No change to a copy affects the management of the object's lifetime.
The only purpose of these copied counts is to find the root object set (objects in this set cannot be reclaimed).
Once the root object set has been found, the current linked list of objects is split in two. One list holds the root object set and becomes the root list; the other holds the remaining objects and becomes the unreachable list. Two lists are needed because the current unreachable list may contain objects that are referenced, directly or indirectly, by objects in the root list, and such objects must not be reclaimed. Whenever the marking phase finds such an object, it moves it from the unreachable list to the root list. When marking finishes, every object left in the unreachable list is genuine garbage, so the reclamation that follows only needs to work on the unreachable list.
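The steps described above can be sketched as a toy model in pure Python. This is not the interpreter's implementation: the refcount and edges dictionaries are hypothetical stand-ins for interpreter state, and find_unreachable is a made-up name. The sketch copies the counts, subtracts references that come from inside the tracked set, treats objects with a positive copy as roots, and then rescues everything reachable from the roots.

```python
def find_unreachable(refcount, edges):
    """Toy model of the cycle detector.

    refcount: real reference count of each tracked container object.
    edges:    for each object, the tracked objects it references.
    Both inputs are hypothetical stand-ins for interpreter state.
    """
    # 1. Work on a copy so the real counts are never touched.
    gc_refs = dict(refcount)

    # 2. Subtract every reference that comes from inside the tracked set.
    for src in edges:
        for dst in edges[src]:
            gc_refs[dst] -= 1

    # 3. Objects with a positive copy are referenced from outside: the roots.
    reachable = [name for name in gc_refs if gc_refs[name] > 0]
    unreachable = set(gc_refs) - set(reachable)

    # 4. Mark: anything a reachable object references is rescued.
    pending = list(reachable)
    while pending:
        src = pending.pop()
        for dst in edges[src]:
            if dst in unreachable:
                unreachable.remove(dst)
                reachable.append(dst)
                pending.append(dst)

    return sorted(reachable), sorted(unreachable)


# a <-> b form an isolated cycle; c is held by a variable and references d.
refcount = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
kept, garbage = find_unreachable(refcount, edges)
print(kept, garbage)   # ['c', 'd'] ['a', 'b']
```

Note that d ends up with a copied count of 0 after step 2, just like a and b, yet it is rescued in step 4 because the root c references it. This is why the copied counts are used only to find the roots, not to decide what is garbage.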
Generational collection
Background: generational garbage collection is a mechanism developed in the early 1980s. A series of studies showed that, regardless of the language used, the kind of software being developed, or the size of the program, one observation holds.
Namely: a certain proportion of memory blocks have a fairly short life cycle, typically the time of a few million machine instructions, while the remaining memory blocks live much longer, some from the start of the program until it ends.
Looking back at mark and sweep, the extra work this mechanism introduces is tied to the total number of memory blocks in the system. The balance between its two phases shifts with the amount of garbage: when many blocks need to be reclaimed, garbage detection accounts for more of the extra work and reclamation for less; conversely, when few blocks need to be reclaimed, detection costs less than reclamation. To make garbage collection more efficient, Python adopts a space-for-time strategy.
Principle: all memory blocks in the system are divided into sets according to how long they have survived. Each set is called a generation, and the older a generation is, the less frequently it is collected.
In other words, the longer an object has been alive, the less likely it is to be garbage and the less often it should be examined. Survival time is usually measured in garbage collection runs: the more collections an object has survived, the longer it is considered to have lived.
Example:
When a memory block M has survived 3 garbage collection runs, we place M in set A, while newly allocated memory goes into set B. When garbage collection runs, in most cases it collects only set B, and set A is collected only after a fairly long interval. The collector therefore has less memory to process each time, and efficiency naturally improves.
During this process, some memory blocks in set B are moved to set A because they have lived long enough. Set A will of course contain some garbage too, and because of the generational mechanism that garbage is reclaimed later than it otherwise would be.
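The example above can be played through with a toy model. Everything here is hypothetical: the block names, the live set, and the collect helper are invented for the illustration, and the promotion rule (3 survived collections) is taken from the example rather than from the interpreter.

```python
PROMOTE_AFTER = 3   # survivals needed to move from set B to set A (from the example)

def collect(blocks, is_live):
    """Collect one set: drop dead blocks, bump the survival count of the rest."""
    survivors = []
    for name, survived in blocks:
        if is_live(name):
            survivors.append((name, survived + 1))
    return survivors

set_a = []                                          # old blocks, collected rarely
set_b = [("config", 0), ("tmp1", 0), ("tmp2", 0)]   # newly allocated blocks
live = {"config"}                                   # only this block stays in use

for _ in range(3):
    set_b = collect(set_b, live.__contains__)
    # Promote blocks that have survived enough collections.
    set_a += [blk for blk in set_b if blk[1] >= PROMOTE_AFTER]
    set_b = [blk for blk in set_b if blk[1] < PROMOTE_AFTER]

print(set_a, set_b)   # [('config', 3)] []
```

Only set B is scanned in each round, so the short-lived blocks are dropped cheaply while the long-lived block ends up in set A, where it would be examined far less often.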
Python has 3 generations in total; that is, Python actually maintains 3 linked lists.
For details, see the Python source code.
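The generations can be inspected from Python code through the standard gc module. The sketch below assumes CPython; it prints whatever thresholds and counters the running interpreter reports rather than assuming particular values, since the defaults differ between versions.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters per generation

print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation, then run a full collection.
young = gc.collect(0)
full = gc.collect()
print("unreachable found:", young, full)
```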