Some understanding of the Python garbage collection mechanism

Source: Internet
Author: User

Overview: CPython's garbage collection is based primarily on reference counting. Every object records how many references point to it, and as soon as that count drops to zero the object's memory is freed. Objects that refer to each other in a cycle, however, can never be freed this way, because each one keeps the other's count above zero. Python therefore has a second mechanism to deal with cycles: mark-and-sweep.

Mark-and-sweep: the collector scans all container objects and divides them into two groups, those that can be deleted and those that cannot. The ones that can be deleted are then reclaimed. Simple objects such as int and str are not scanned, because they cannot hold references to other objects and so can never be part of a cycle. How are the container objects organized? Python tracks them in a doubly linked list: a container object is normally inserted into the list when it is created. To make this possible, each container object carries a PyGC_Head structure, placed in memory just before the object itself.
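Both mechanisms can be observed from Python with a weak reference, which does not add to an object's reference count. The sketch below assumes CPython (PyPy, for example, does not use reference counting), and the Node class and function names are illustrative only. It switches off automatic collection so that only reference counting is active, shows that an ordinary object disappears as soon as its last reference is deleted, and shows that a two-object cycle survives until gc.collect() runs. The last function uses gc.is_tracked() to confirm that ints and strings are not tracked by the collector while lists are.

```python
import gc
import weakref


class Node:
    """Illustrative container object that can point at another Node."""

    def __init__(self):
        self.other = None


def freed_by_refcount():
    """True if an acyclic object is freed as soon as its last reference goes."""
    gc.disable()  # only reference counting is active from here on
    try:
        node = Node()
        probe = weakref.ref(node)  # weak references do not raise the count
        del node                   # count drops to 0 -> freed immediately
        return probe() is None
    finally:
        gc.enable()


def cycle_needs_collector():
    """Return (alive_after_del, freed_after_collect) for an A <-> B cycle."""
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a    # each count stays >= 1 because of the cycle
        probe = weakref.ref(a)
        del a, b
        alive_after_del = probe() is not None
        gc.collect()               # an explicit collection runs even while disabled
        return alive_after_del, probe() is None
    finally:
        gc.enable()


def tracked_by_collector():
    """Containers are tracked; atomic objects such as int and str are not."""
    return gc.is_tracked(12345), gc.is_tracked("text"), gc.is_tracked([])
```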
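In CPython 3.5, PyGC_Head is defined as follows:

```c
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;  /* next object in the generation's list */
        union _gc_head *gc_prev;  /* previous object in the list */
        Py_ssize_t gc_refs;       /* working field used by the cycle detector */
    } gc;
    double dummy;  /* force worst-case alignment */
} PyGC_Head;
```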
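A pure-Python model makes the role of the two link fields concrete. This is a sketch only: the GCHead class stands in for PyGC_Head, and the helper names loosely mirror static functions in CPython's gcmodule.c (gc_list_append, gc_list_remove, gc_list_merge), but the code is not CPython's, and gc_list_items is a hypothetical helper for inspection. Each list is circular with a sentinel head node, so appending an object, unlinking a single object, and splicing one whole list onto another all take constant time; that is what the pair of pointers buys.

```python
class GCHead:
    """Stand-in for PyGC_Head (model only): two links plus a gc_refs field."""

    __slots__ = ("gc_next", "gc_prev", "gc_refs", "obj")

    def __init__(self, obj=None):
        # A fresh node links to itself, so it can also serve as an empty list head.
        self.gc_next = self
        self.gc_prev = self
        self.gc_refs = 0
        self.obj = obj


def gc_list_append(node, head):
    """Link `node` in at the tail of the circular list headed by `head`."""
    node.gc_prev = head.gc_prev
    node.gc_next = head
    head.gc_prev.gc_next = node
    head.gc_prev = node


def gc_list_remove(node):
    """Unlink `node` in O(1): possible because both neighbours are known."""
    node.gc_prev.gc_next = node.gc_next
    node.gc_next.gc_prev = node.gc_prev
    node.gc_next = node.gc_prev = node


def gc_list_merge(src, dst):
    """Splice all nodes of `src` onto the tail of `dst` and leave `src` empty."""
    if src.gc_next is not src:
        tail = dst.gc_prev
        tail.gc_next = src.gc_next
        src.gc_next.gc_prev = tail
        dst.gc_prev = src.gc_prev
        src.gc_prev.gc_next = dst
    src.gc_next = src.gc_prev = src


def gc_list_items(head):
    """Hypothetical helper: the objects in list order, for inspection."""
    items = []
    node = head.gc_next
    while node is not head:
        items.append(node.obj)
        node = node.gc_next
    return items
```

The merge operation is what lets all survivors of a collection be moved into the next older generation in one step.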
In the PyGC_Head structure, gc_next and gc_prev are the two pointers that link container objects into a doubly linked list, while gc_refs is a working field used when detecting reference cycles, as explained below. Because each container object is added to one of these internally maintained lists when it is created, a collection can walk the lists, mark objects, and then sweep away the garbage.

There is a catch: while a collection runs, the program is paused, and execution resumes only after the collection has finished. If every collection had to traverse all container objects, a process with many live objects would spend noticeably more time paused. The optimization Python uses is generational collection.

Generational collection: this is based on an empirical observation about programs. Most objects are allocated and released again within a short time, while the relatively small fraction that survives for a while tends to live much longer, often until the program exits. Python therefore divides tracked container objects into three generations according to how long they have survived: generation 0 (young), generation 1 (middle-aged) and generation 2 (old). Each generation is a separate doubly linked list of the kind described above. A newly created container object is added to generation 0, and an object that survives a collection of its generation is moved to the next older generation, up to generation 2. Because older objects are less likely to have become garbage, the young generation is scanned most often and the older generations much less frequently, which reduces the total scanning work.
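The promotion rule can be illustrated with a toy model. Everything in the sketch below is hypothetical: objects are plain values, a `dead` set stands in for the collector's reachability analysis, and the counters follow the CPython 3.5 rules for the default thresholds (700, 10, 10) discussed in the next paragraph. The model ignores deallocations and the extra condition CPython checks before a full (generation 2) collection.

```python
class ToyGenerationalGC:
    """Hypothetical model of three-generation collection; not CPython code."""

    def __init__(self, thresholds=(700, 10, 10)):
        self.thresholds = thresholds
        self.generations = [[], [], []]  # generation 0 (young), 1, 2 (old)
        # counts[0]: allocations since the last gen-0 collection;
        # counts[1], counts[2]: collections of the next younger generation.
        self.counts = [0, 0, 0]
        self.collections = [0, 0, 0]     # how often each generation was collected
        self.dead = set()                # stand-in for "unreachable" objects

    def allocate(self, obj):
        """Track a new object in generation 0, collecting if the threshold is exceeded."""
        self.generations[0].append(obj)
        self.counts[0] += 1
        if self.counts[0] > self.thresholds[0]:
            # Collect the oldest generation whose count exceeds its threshold.
            for gen in (2, 1, 0):
                if self.counts[gen] > self.thresholds[gen]:
                    self.collect(gen)
                    break

    def collect(self, gen):
        """Collect `gen` and every younger generation; promote the survivors."""
        candidates = []
        for g in range(gen + 1):
            candidates.extend(self.generations[g])
            self.generations[g] = []
        survivors = [obj for obj in candidates if obj not in self.dead]
        # Survivors move one generation up; generation 2 survivors stay put.
        self.generations[min(gen + 1, 2)].extend(survivors)
        if gen + 1 < 3:
            self.counts[gen + 1] += 1
        for g in range(gen + 1):
            self.counts[g] = 0
        self.collections[gen] += 1
        return len(candidates) - len(survivors)
```

In this model, allocating 701 objects triggers exactly one generation-0 collection; after 11 such collections, the next trigger collects generation 1 (together with generation 0) instead.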
Each generation has a threshold, and exceeding it triggers a collection. In CPython 3.5.2 the default thresholds are 700, 10 and 10, but they count different things. Generation 0 is collected once the number of container objects allocated minus the number deallocated since its last collection exceeds 700. Generation 1 is collected once generation 0 has been collected more than 10 times since generation 1's last collection, and generation 2 is collected once generation 1 has been collected more than 10 times. Collecting a generation also collects all younger generations.

So how does the mark-and-sweep pass work? Suppose we are collecting generation 1. First, the collector finds the objects in this list that are referenced from outside the list; these form the set of root objects. Starting from the roots, it then follows references to mark everything reachable. The remaining objects, which are kept alive only by references among themselves (that is, by reference cycles), go into an unreachable set, and the objects in the unreachable set are reclaimed.

How are the objects that are referenced only cyclically identified? Suppose objects A and B each have a reference count of 1, but A refers to B and B refers to A, and nothing else refers to either. Then A and B can be reclaimed, and they must end up in the unreachable set. To discount the internal references, the collector walks the list and, for every reference from one object in the list to another, decrements the referenced object's count by 1. Afterwards the counts of A and B are 0. Any object whose count is still greater than 0 must be referenced from outside the list, so it cannot be deleted, and it goes into the root set. (PS: This subtraction must not touch the real reference counts; otherwise an object C whose count dropped to 0 would be freed even though other objects still reference it. This is what the gc_refs field in PyGC_Head is for: the collector first copies each object's reference count into gc_refs and does all the subtraction on that copy.) Now we have the root set, and the objects in it cannot be deleted.
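This first phase, copying each reference count and subtracting the references that come from inside the list, can be sketched in a few lines. It is a hypothetical model rather than CPython's update_refs()/subtract_refs(): objects are strings, `refcount` holds their true counts, and `references` lists what each object points to. The example uses the A/B cycle from above plus a root R, referenced from outside the list, that holds the only reference to C.

```python
def find_roots(refcount, references):
    """Phase 1 of cycle detection (model).

    refcount:   object -> its true reference count (never modified)
    references: object -> objects it refers to
    Returns (gc_refs, roots).
    """
    # Work on a copy, as CPython does with PyGC_Head.gc_refs.
    gc_refs = dict(refcount)
    for targets in references.values():
        for target in targets:
            if target in gc_refs:  # only references inside the list are subtracted
                gc_refs[target] -= 1
    # Anything still above 0 is referenced from outside the list: a root.
    roots = {obj for obj, count in gc_refs.items() if count > 0}
    return gc_refs, roots


refcount = {"A": 1, "B": 1, "R": 1, "C": 1}
references = {"A": ["B"], "B": ["A"], "R": ["C"], "C": []}
gc_refs, roots = find_roots(refcount, references)
```

Here C ends up with a gc_refs of 0 even though R still refers to it, so a count of 0 alone does not prove an object is garbage.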
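Objects referenced by the root objects cannot be deleted either, and neither can anything reachable from them in turn. The collector therefore traverses the list starting from the roots and marks every object it reaches; whatever remains unmarked is unreachable. This pass is also what rescues an object like C: its gc_refs may have dropped to 0, but if a root object refers to it, it is marked reachable and kept. Once marking is complete, the unreachable objects are reclaimed.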
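The marking pass from the roots can be modeled as an ordinary graph traversal. This sketch uses a plain breadth-first search over hypothetical string-named objects; CPython's move_unreachable() reaches the same result by moving objects between linked lists in place, which the model does not attempt.

```python
from collections import deque


def find_unreachable(references, roots):
    """Phase 2 of cycle detection (model): return objects not reachable from roots.

    references: object -> objects it refers to (the list being collected)
    roots:      objects known to be referenced from outside the list
    """
    reachable = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for target in references.get(obj, ()):
            # Only follow references into the list being collected.
            if target in references and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return set(references) - reachable


references = {"A": ["B"], "B": ["A"], "R": ["C"], "C": []}
unreachable = find_unreachable(references, roots={"R"})
```

With R as the only root, C is reached through R and survives, while the A/B cycle is left in the unreachable set.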

A caveat concerns __del__(). When a user-defined class implements a __del__() method, Python must call that method before releasing the resources the object occupies. Because the garbage collector does not guarantee the order in which the objects in a cycle are destroyed, a.__del__() could try to use an object b that had already been destroyed, causing an exception.

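To avoid this, Python before version 3.4 took a conservative approach: a reference cycle containing an instance of a class that defines __del__() was not garbage collected. Such objects were placed in the gc.garbage list instead, which is never freed while the program runs. From the interpreter's point of view this is not a memory leak, since gc.garbage still references the objects, but from the program's point of view the memory has in fact leaked. Since Python 3.4, PEP 442 ("Safe object finalization") has removed this restriction: the collector first calls the finalizers of all objects in an unreachable cycle and only then breaks the cycle, so such cycles are collected normally, including in the 3.5.2 release mentioned above.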
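On Python 3.4 or later this can be checked directly. In the sketch below, the Resource class and the `log` list are illustrative names. Each finalizer reads its partner's name, which works because under PEP 442 all finalizers in the cycle run before any object is cleared, and afterwards none of the objects appear in gc.garbage. On a release older than 3.4, the same cycle would have ended up in gc.garbage instead.

```python
import gc


class Resource:
    """Illustrative class whose finalizer touches another object in its cycle."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.partner = None

    def __del__(self):
        # Under PEP 442 the partner has not been cleared yet, so this is safe.
        self.log.append((self.name, self.partner.name))


def collect_cycle_with_finalizers():
    """Build an a <-> b cycle of finalizable objects and collect it."""
    log = []
    gc.disable()  # keep automatic collection from running mid-example
    try:
        a, b = Resource("a", log), Resource("b", log)
        a.partner, b.partner = b, a
        del a, b
        found = gc.collect()  # number of unreachable objects found
        return found, sorted(log), list(gc.garbage)
    finally:
        gc.enable()
```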
