In-depth analysis of Python's garbage collection mechanism
I. Overview:
CPython manages memory mainly through reference counting, which tracks and reclaims garbage. On top of reference counting, the gc module uses mark-and-sweep to resolve the circular references that container objects can form. Generational collection then trades space for time to make garbage collection more efficient.
II. Reference counting
In Python, the lifetime of most objects is managed through reference counting. Broadly speaking, reference counting is itself a form of garbage collection, and it is the most intuitive and simplest garbage collection technique.
Principle: when a reference to an object is created or copied, the object's reference count is incremented by 1. When a reference to an object is destroyed, the count is decremented by 1. When an object's reference count drops to 0, nothing uses the object any longer, and the memory it occupies can be released.
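You can watch these counting rules in CPython with sys.getrefcount. The absolute value includes a temporary reference for the call itself and varies between versions. This small sketch therefore compares only differences from a baseline (the variable names are illustrative):

```python
import sys

data = []                        # one reference, held by the name "data"
base = sys.getrefcount(data)     # baseline (may include a temporary for the call)

alias = data                     # copying a reference: count + 1
after_alias = sys.getrefcount(data)

holder = [data]                  # storing it in a container: count + 1
after_holder = sys.getrefcount(data)

del alias                        # destroying a reference: count - 1
del holder                       # the list is freed, releasing its reference too
after_del = sys.getrefcount(data)

print(after_alias - base, after_holder - base, after_del - base)  # 1 2 0
```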
Reference counting adds bookkeeping work every time a reference is created or destroyed. Even so, its biggest advantage over other mainstream garbage collection techniques is that it works in real time: any piece of memory is reclaimed the moment it is no longer referenced. Other garbage collection techniques reclaim dead memory only under particular conditions (for example, when a memory allocation fails).
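This immediacy is easy to see in CPython, where a `__del__` method runs as soon as the last reference disappears. The sketch assumes CPython. Implementations that do not use reference counting, such as PyPy, may finalize objects later. The class Resource and the events list are illustrative.

```python
events = []

class Resource:
    """Illustrative object that records when it is finalized."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(f"freed {self.name}")

r = Resource("r1")
events.append("before del")
del r                      # last reference gone: count hits 0, object freed here
events.append("after del")

print(events)  # ['before del', 'freed r1', 'after del']
```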
Execution efficiency of reference counting: the extra work of maintaining reference counts is proportional to the number of memory allocations, releases, and reference operations the Python runtime performs. This is a weakness compared with other mainstream techniques such as "mark-and-sweep" and "stop-and-copy", whose extra work depends mainly on the amount of memory being collected.
Execution overhead is not the only weakness of reference counting. It also has a fatal flaw, which is why garbage collection in the narrow sense never includes reference counting: circular references (also called cross references).
Problem description:
Circular references can keep the reference counts of a group of objects above 0 even though no external object refers to them; the objects refer only to one another. Nobody can use this group any more, so its memory should be reclaimed. Because of the mutual references, however, each object's count stays above 0, and the memory is never released. For example:
a = []
b = []
a.append(b)
b.append(a)
print(a)  # [[[...]]]
print(b)  # [[[...]]]
This is fatal: the result is no different from a memory leak caused by manual memory management.
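The same leak can be reproduced with two objects that point at each other. This sketch disables the automatic collector so that only reference counting runs. It watches one object through a weak reference, which does not keep it alive. It then calls gc.collect() to show that the cycle detector, not reference counting, reclaims the pair. The class Node is illustrative.

```python
import gc
import weakref

class Node:
    """Illustrative object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None

gc.disable()                   # from here on, only reference counting runs
try:
    a = Node("a")
    b = Node("b")
    a.partner = b              # a -> b
    b.partner = a              # b -> a: a reference cycle
    probe = weakref.ref(a)     # observes a without adding a strong reference

    del a, b                   # no external references remain
    alive_after_del = probe() is not None      # still alive: counts are 1, not 0

    gc.collect()               # run the cycle detector explicitly
    alive_after_collect = probe() is not None  # now reclaimed
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```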
To solve this problem, Python adds two more garbage collection mechanisms to make up for the defects of reference counting: "mark-and-sweep" and "generational collection".
III. Mark-and-sweep
Mark-and-sweep solves the circular reference problem. Only container objects, which can hold references to other objects (such as lists, sets, dicts, classes, and instances), can form circular references, so these are the objects the collector has to examine.
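CPython's gc module provides gc.is_tracked(), which reports whether the cycle collector is watching an object. The sketch below shows that atomic values such as integers and strings are not tracked, while containers and class instances are. CPython may also untrack some simple containers as an optimization, so results for edge cases such as dicts holding only atomic values can differ. The class Point is illustrative.

```python
import gc

class Point:
    """Illustrative user-defined class."""

samples = {
    "int": 42,
    "str": "hello",
    "list": [],
    "dict holding a list": {"k": []},
    "instance": Point(),
}

tracked = {label: gc.is_tracked(obj) for label, obj in samples.items()}
for label, flag in tracked.items():
    print(f"{label:20} tracked={flag}")
```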
Consider two objects that each have a reference count of 1, where the only references to them come from each other. Both should be reclaimed: their reference counts are not 0, but their effective reference counts are 0. To find the effective counts, we must first remove the circular references. Suppose the two objects are A and B. Start from A: because A references B, decrement B's count by 1. Then follow the reference to B: because B references A, decrement A's count by 1. This removes the cycle between the two objects.
However, there is a problem. Suppose object A also references an object C, and C does not reference A. If we decrement C's count by 1 and A is not reclaimed in the end, we have wrongly decremented C's count. At some point this could leave a dangling reference to C, because C might be freed while it is still in use. To avoid that, we would have to restore C's count whenever A survives, and this would multiply the complexity of maintaining reference counts.
Principle: mark-and-sweep takes a better approach. It does not modify the actual reference counts. Instead, it copies the reference count of each object in the set being examined and modifies only the copy. Changes to the copy do not affect the maintenance of the objects' lifetimes.
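Here is a minimal sketch of the copy-and-subtract step on a hand-built object graph. CPython keeps this copy in a per-object field called gc_refs. The dictionaries below are only a model of it. The names A, B, C, the count values, and the helper copy_and_subtract are illustrative. In this graph, A and B reference each other, A also references C, and one external variable references A.

```python
# Illustrative object graph: name -> names it references.
graph = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": [],
}
# Real reference counts: A is held by B and by one external variable,
# B is held by A, and C is held by A.
refcount = {"A": 2, "B": 1, "C": 1}

def copy_and_subtract(graph, refcount):
    """Return copied counts with references from inside the set removed."""
    gc_refs = dict(refcount)        # work on a copy, never on the real counts
    for targets in graph.values():
        for target in targets:
            gc_refs[target] -= 1    # this reference comes from inside the set
    return gc_refs

gc_refs = copy_and_subtract(graph, refcount)
roots = [name for name, count in gc_refs.items() if count > 0]

print(gc_refs)   # {'A': 1, 'B': 0, 'C': 0}
print(refcount)  # {'A': 2, 'B': 1, 'C': 1}  (real counts untouched)
print(roots)     # ['A']
```

An object whose copied count stays above 0 must be referenced from outside the set, which makes it part of the root set.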
The sole purpose of the copied count is to find the root object set; objects in this set cannot be reclaimed. Once the root set is found, the current linked list of objects is split in two. One list holds the root set and is called the root list. The other holds the remaining objects and is called the unreachable list. The split is needed because objects currently in the unreachable list may be referenced, directly or indirectly, by objects in the root list, and such objects cannot be reclaimed. Whenever marking finds such an object, it is moved from the unreachable list to the root list. After marking completes, every object left in the unreachable list is genuine garbage, so the subsequent collection only needs to work on the unreachable list.
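The sketch below uses the same kind of simplified model. It computes copied counts, splits objects into a root list and an unreachable list, and moves anything reachable from a root back before reporting what is left. The helper find_garbage and the three-object graph are illustrative. This is not CPython's actual code. The third call covers the case where only C is referenced from outside: A and B are reported as garbage and C survives, and no real count is modified at any point.

```python
from collections import deque

def find_garbage(graph, refcount):
    """Illustrative sketch: return the objects that are truly unreachable."""
    # Step 1: copy the counts and subtract references internal to the set.
    gc_refs = dict(refcount)
    for targets in graph.values():
        for target in targets:
            gc_refs[target] -= 1

    # Step 2: split into the root list and the tentatively unreachable list.
    root_list = [obj for obj in graph if gc_refs[obj] > 0]
    unreachable = {obj for obj in graph if gc_refs[obj] == 0}

    # Step 3: anything reachable from a root is moved back to the root list.
    queue = deque(root_list)
    while queue:
        obj = queue.popleft()
        for target in graph[obj]:
            if target in unreachable:
                unreachable.remove(target)
                root_list.append(target)
                queue.append(target)

    # Whatever remains is garbage; collection only needs to look here.
    return unreachable

graph = {"A": ["B", "C"], "B": ["A"], "C": []}

print(find_garbage(graph, {"A": 2, "B": 1, "C": 1}))          # set()
print(sorted(find_garbage(graph, {"A": 1, "B": 1, "C": 1})))  # ['A', 'B', 'C']
print(sorted(find_garbage(graph, {"A": 1, "B": 1, "C": 2})))  # ['A', 'B']
```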
IV. Generational collection
Background: generational garbage collection was developed in the early 1980s. A series of studies showed that programs share one trait regardless of their language, type, or size. A certain proportion of memory blocks have short lifetimes, typically on the order of millions of machine instructions. The remaining blocks live much longer, possibly from the start of the program to its end.
With a collector such as mark-and-sweep, the extra work is tied to the total number of memory blocks in the system, not only to the number that turn out to be garbage: every collection has to examine all candidate blocks. When many blocks need to be reclaimed, that detection work pays off. When few blocks need to be reclaimed, most of the detection work is spent on blocks that survive anyway. To improve collection efficiency, Python adopts a "space for time" strategy.
Principle: all memory blocks in the system are divided into sets according to how long they have survived, and each set is called a "generation". The collection frequency of a generation decreases as its survival time increases. In other words, the longer an object has been alive, the less likely it is to be garbage, and the less often its generation is collected. Survival time is usually measured by the number of collections an object has survived: the more collections it has survived, the longer it is considered to have lived.
Example:
Suppose memory block M is still alive after three garbage collections. We move M into set A, while newly allocated memory goes into set B. In most cases a collection examines only set B. Set A is collected only after a much longer interval. This reduces the amount of memory the collector has to process and naturally improves efficiency. Over time, some memory blocks in set B are promoted to set A because they have lived long enough. Set A does contain some garbage, and this generational scheme delays its collection.
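Here is a toy model of the two-set example above. The class ToyGenerationalHeap, the promotion age of 3, and the choice to scan set A only on every fifth collection are illustrative assumptions. They are not CPython's real thresholds. Liveness is passed in explicitly instead of being discovered by marking.

```python
class ToyGenerationalHeap:
    """Illustrative two-generation model: young set B, old set A."""

    PROMOTE_AFTER = 3   # survivals needed before moving from set B to set A
    OLD_EVERY = 5       # set A is scanned only on every 5th collection

    def __init__(self):
        self.young = {}         # set B: name -> number of collections survived
        self.old = set()        # set A: long-lived blocks
        self.collections = 0

    def allocate(self, name):
        self.young[name] = 0    # new memory always starts in set B

    def collect(self, live):
        """Run one collection; `live` is the set of names still in use."""
        self.collections += 1
        # Set B is collected every time.
        for name in list(self.young):
            if name not in live:
                del self.young[name]          # garbage in set B is freed
            else:
                self.young[name] += 1
                if self.young[name] >= self.PROMOTE_AFTER:
                    del self.young[name]
                    self.old.add(name)        # promoted to set A
        # Set A is collected only occasionally, so its garbage lingers.
        if self.collections % self.OLD_EVERY == 0:
            self.old &= live

heap = ToyGenerationalHeap()
heap.allocate("M")          # long-lived block
heap.allocate("tmp1")       # short-lived block
heap.collect(live={"M"})    # tmp1 is freed by the first collection
heap.collect(live={"M"})
heap.collect(live={"M"})    # M has now survived three collections
print(sorted(heap.young), sorted(heap.old))  # [] ['M']

heap.collect(live=set())    # M is garbage now, but set A is not scanned yet
print(sorted(heap.old))     # ['M']
heap.collect(live=set())    # the 5th collection also scans set A
print(sorted(heap.old))     # []
```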
In Python there are three generations in total, which means Python maintains three linked lists. For details, see the Python source code.
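The generational machinery can be inspected from Python through the gc module. This sketch only prints what the running interpreter reports. The default thresholds, and in some releases the internal generation layout, differ between CPython versions, so no particular values should be assumed.

```python
import gc

# Per-generation collection thresholds configured in this interpreter.
print("thresholds:", gc.get_threshold())

# Current counters the collector uses to decide when each generation runs.
print("counts:", gc.get_count())

# Cumulative statistics, one dict per generation.
for generation, stats in enumerate(gc.get_stats()):
    print(generation, stats["collections"], stats["collected"], stats["uncollectable"])

# A collection of a single generation (here the youngest) can be requested.
unreachable_found = gc.collect(0)
print("unreachable objects found:", unreachable_found)
```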
Advantages and principles of Java's garbage collection mechanism
A notable feature of Java is its garbage collection mechanism, which removes the memory management burden that troubles C++ programmers most: Java programmers do not need to manage memory explicitly. Because of garbage collection, objects in Java have no notion of scope; only the references to them have a scope. Garbage collection prevents many kinds of memory leaks and makes efficient use of available memory. The garbage collector usually runs as a separate low-priority thread that reclaims heap objects that are no longer reachable. Programmers cannot make the collector reclaim a particular object, or all objects, on demand. Collection techniques include generational copying collection, mark-based collection, and incremental collection.
About the garbage collection mechanism
If an object you created is still referenced, the garbage collector will not reclaim it, because the collector cannot know whether the object is still needed. If you set the variable to null, the object it pointed to loses that reference. Once nothing else refers to the object, the garbage collector can reclaim it automatically. In other words, the garbage collector can reclaim only objects it considers garbage, that is, objects that are no longer reachable.
For example, suppose the following declaration appears inside a method:
Student s = new Student();
When the method finishes and returns, the local variable s goes out of scope. The Student object then becomes eligible for garbage collection, provided no other variable still refers to it.