A deep analysis of Python's garbage collection mechanism

Source: Internet
Author: User
Tags: garbage collection

First, overview

Python's garbage collection uses reference counting to track and reclaim garbage. On top of reference counting, mark and sweep solves the problem of circular references that container objects may create, and generational collection trades space for time to further improve the efficiency of garbage collection.

Second, reference counting

In Python, the life cycle of most objects is managed through their reference counts. Broadly speaking, reference counting is also a garbage collection mechanism, and it is one of the simplest and most intuitive garbage collection techniques.

Principle: when a reference to an object is created or copied, the object's reference count is increased by 1; when a reference to the object is destroyed, the count is decreased by 1. When the count drops to 0, no one is using the object any more, and the memory it occupies can be released.
Although maintaining reference counts adds extra work every time memory is allocated and freed, reference counting has one major advantage over other mainstream garbage collection techniques: it is real-time. Any memory that is no longer referenced is reclaimed immediately. Other garbage collection techniques reclaim dead memory only under some special condition (such as a memory allocation failure).
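
The counting rule above can be observed directly in CPython. The following is a minimal sketch, assuming the CPython interpreter; `Payload` and `refcount_demo` are hypothetical names used only for illustration. `sys.getrefcount` also counts the temporary reference held by its own argument, so the sketch compares counts instead of relying on absolute values.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class; its instances support weak references."""


def refcount_demo():
    obj = Payload()
    before = sys.getrefcount(obj)

    holder = [obj]            # storing obj in a list creates one more reference
    after_add = sys.getrefcount(obj)

    holder.clear()            # dropping that reference lowers the count again
    after_drop = sys.getrefcount(obj)

    probe = weakref.ref(obj)  # a weak reference does not raise the count
    del obj                   # the last strong reference is gone
    freed_immediately = probe() is None

    return after_add - before, after_drop - before, freed_immediately


print(refcount_demo())
```

On CPython the object is released as soon as `del obj` runs, without waiting for any collector pass, which is the real-time property described above. Other Python implementations may behave differently.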

Reference counting has an efficiency problem: the extra work of maintaining the counts is proportional to the amount of memory allocated and freed while Python runs, and to the number of reference assignments. This is a weakness compared with other mainstream garbage collection techniques such as mark and sweep and stop and copy, whose extra work is basically related only to the amount of memory to be reclaimed.
If execution efficiency were the only weakness of reference counting, that would be tolerable. Unfortunately, reference counting also has a fatal weakness, and because of it garbage collection in the narrow sense has never included reference counting. That fatal weakness is circular references (also called cross-references).

Problem Description:

Circular references leave a group of objects with nonzero reference counts even though no external object actually refers to them; they only refer to each other. No one will ever use this group of objects again, so the memory they occupy should be reclaimed. However, because of the mutual references, each object's reference count is not 0, so reference counting alone will never free that memory. For example:

a = []
b = []
a.append(b)   # a now holds a reference to b
b.append(a)   # b now holds a reference to a, closing the cycle
print(a)      # [[[...]]]
print(b)      # [[[...]]]

This is fatal: it is no different from the memory leaks produced by manual memory management.
To solve this problem, Python introduces two other garbage collection techniques to make up for the flaw in reference counting: mark and sweep, and generational collection.
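
The leak and its cure can be observed with the standard `gc` and `weakref` modules. This is a minimal sketch, assuming CPython; `Node` and `cycle_demo` are hypothetical names introduced for illustration. Automatic cyclic collection is switched off for the duration of the demo so that only reference counting is at work until `gc.collect()` is called explicitly.

```python
import gc
import weakref


class Node:
    """Hypothetical container object that can refer to another Node."""

    def __init__(self):
        self.other = None


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()                      # leave only reference counting active
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # a and b now refer to each other
        probe = weakref.ref(a)        # lets us check whether a is still alive

        del a, b                      # no external references remain
        alive_after_del = probe() is not None

        gc.collect()                  # run the cycle collector by hand
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()               # restore the previous collector state


print(cycle_demo())
```

The first value shows that the pair survives the loss of every external reference, and the second shows that the cycle collector reclaims it.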

Third, mark and sweep

Mark and sweep exists to solve the problem of circular references. Only container objects, which can hold references to other objects (such as list, set, dict, class, and instance), can produce circular references.
We have to admit a fact: if two objects each have a reference count of 1, and the only references are the circular ones between them, then both objects need to be reclaimed. In other words, although their reference counts are not 0, their effective reference counts are 0. So we first have to remove the circular references, and then the effective counts of the two objects appear. Suppose the two objects are A and B. We start with A: because it holds a reference to B, we subtract 1 from B's reference count. Then we follow the reference to B: because B holds a reference to A, we subtract 1 from A's reference count as well. This removes the circular references between the objects.
But there is a problem. Suppose object A holds a reference to an object C, and C does not refer to A. If we subtract 1 from C's count and A turns out not to be reclaimed in the end, then we have wrongly reduced C's reference count by 1, which will produce a dangling reference to C at some point in the future. We would then have to restore C's reference count whenever A is not deleted, and with such a scheme the complexity of maintaining reference counts would multiply.

Principle: mark and sweep takes a better approach. Instead of changing the real reference counts, it copies the reference count of every object in the collection and modifies only the copies. No change made to a copy affects the management of the object's lifetime.
The only purpose of these copied counts is to find the root object set (objects in this set cannot be reclaimed). Once the root object set has been found, the current linked list of objects is split in two: one list holds the root object set and becomes the root list, and the other holds the remaining objects and becomes the unreachable list. The list is split because the unreachable list may still contain objects that are referenced, directly or indirectly, by objects in the root list, and such objects cannot be reclaimed. As soon as such an object is discovered during marking, it is moved from the unreachable list to the root list. When marking is finished, every object still in the unreachable list is genuine garbage, and the reclamation that follows only has to deal with the unreachable list.
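
The copy-based approach described above can be sketched on a toy object graph. This is a simplified simulation under stated assumptions, not CPython's actual implementation: objects are plain string names, `refcount` and `edges` are hypothetical inputs, and `find_unreachable` is a name invented for this example.

```python
def find_unreachable(refcount, edges):
    """Split tracked objects into a root list and an unreachable list.

    refcount: dict mapping object name -> real reference count
    edges:    dict mapping object name -> names of objects it refers to
    """
    gc_refs = dict(refcount)          # work on a copy; real counts stay intact

    # Subtract every reference that comes from inside the tracked set.
    for targets in edges.values():
        for target in targets:
            if target in gc_refs:
                gc_refs[target] -= 1

    # Objects whose copied count is still positive are referenced from outside.
    root = {name for name, count in gc_refs.items() if count > 0}

    # Marking: anything reachable from a root object joins the root list.
    stack = list(root)
    while stack:
        name = stack.pop()
        for target in edges.get(name, ()):
            if target in gc_refs and target not in root:
                root.add(target)
                stack.append(target)

    unreachable = set(gc_refs) - root
    return root, unreachable


# a and b form an isolated cycle; c is held by an outside variable and refers to d.
counts = {"a": 1, "b": 1, "c": 1, "d": 1}
graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
root, unreachable = find_unreachable(counts, graph)
print(sorted(root), sorted(unreachable))
```

Here `d` starts with a copied count of 0 after the subtraction step, yet it is rescued during marking because the root object `c` refers to it. This is the case that motivates splitting the list in two.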

Fourth, generational collection

Background: generational garbage collection is a mechanism developed in the early 1980s. A series of studies showed that programs share one trait regardless of the language used, the kind of software, or the size of the program: a certain proportion of memory blocks have a short life cycle, usually the time of a few million machine instructions, while the remaining blocks live much longer, sometimes from the start of the program to its end.
Looking at a mechanism such as mark and sweep, the extra work it brings is related to the total number of memory blocks in the system. The more blocks that need to be reclaimed, the more extra work garbage detection brings and the less extra work garbage reclamation brings; conversely, the fewer blocks that need to be reclaimed, the less extra work detection brings compared with reclamation. To improve the efficiency of garbage collection, a strategy of trading space for time is adopted.

Principle: all memory blocks in the system are divided into different sets according to how long they have survived. Each set is called a "generation", and the frequency of garbage collection decreases as the survival time of a generation increases. In other words, the longer an object has lived, the less likely it is to be garbage, so it should be examined less often. How is survival time measured? Usually by the number of garbage collections: the more collections an object has survived, the longer it is considered to have lived.

For example:

When a memory block M is still alive after 3 garbage collections, we place M into a set A, while newly allocated memory is placed into a set B. When garbage collection runs, in most cases it collects only set B, and set A is collected only after a considerably longer interval. The collector therefore has less memory to process, and efficiency improves. During this process, some memory blocks in set B are moved to set A because they have lived long enough. Of course, set A does contain some garbage, and its reclamation is delayed by the generational mechanism.
In Python, there are 3 generations in total; that is, Python actually maintains 3 linked lists. For details, see the Python source code.
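
The generations can be inspected from Python code through the standard `gc` module. This is a small sketch, assuming CPython; `generation_info` and `collect_youngest` are hypothetical helper names. The concrete threshold values and their exact meaning differ between CPython versions, so the sketch only reads them and makes no claim about what they are.

```python
import gc


def generation_info():
    thresholds = gc.get_threshold()   # collection thresholds, one per generation
    counts = gc.get_count()           # current collection counters
    return thresholds, counts


def collect_youngest():
    # Collect only generation 0; returns the number of unreachable objects found.
    return gc.collect(0)


thresholds, counts = generation_info()
print("thresholds:", thresholds)
print("counts:    ", counts)
print("found by a generation 0 pass:", collect_youngest())
```

Calling `gc.collect()` with no argument runs a full collection over all generations, which is the expensive pass that the generational scheme tries to make rare.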
