Python garbage collection mechanism

Source: Internet
Author: User

Python (more precisely CPython) relies mainly on "reference counting" to track and reclaim garbage. On top of reference counting, it uses "mark-sweep" (mark and sweep) to handle the circular references that container objects can create, and "generational collection", which trades space for time, to make garbage collection more efficient.

One, reference counting

In Python, the lifetime of most objects is managed by the object's reference count. Broadly speaking, reference counting is itself a garbage collection mechanism, and one of the simplest and most intuitive.

Principle: when a reference to an object is created or copied, the object's reference count is incremented by 1; when a reference to it is destroyed, the count is decremented by 1. When the count drops to 0, nothing is using the object any more, and the memory it occupies is freed.
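The principle can be observed from Python code. The following is a minimal sketch that assumes CPython, where sys.getrefcount reports an object's current count (the value includes one temporary reference held by the call itself, so only the differences matter). The class name Node is a hypothetical placeholder.

```python
import sys


class Node:
    """Hypothetical placeholder class, used only for illustration."""


obj = Node()
base = sys.getrefcount(obj)        # includes the call's own temporary reference

alias = obj                        # copying a reference adds 1
after_copy = sys.getrefcount(obj)

del alias                          # destroying a reference subtracts 1
after_del = sys.getrefcount(obj)

print(after_copy - base, after_del - base)
```

The absolute numbers depend on the interpreter and on where the code runs, which is why the sketch compares counts rather than printing them directly.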

Reference counting has to update the count every time memory is allocated or freed and every time a reference is assigned. Compared with other mainstream garbage collection techniques, however, it has one major advantage: it is "real-time". Any memory is reclaimed as soon as no reference to it remains. Other garbage collection techniques generally reclaim dead memory only under certain conditions (such as a failed memory allocation).
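The "real-time" property can be checked with a small experiment. This sketch assumes CPython and uses weakref.finalize from the standard library to record the moment an object is reclaimed; the cyclic collector is switched off during the demonstration so that only reference counting is at work. The class name Resource is hypothetical.

```python
import gc
import weakref


class Resource:
    """Hypothetical class; plain instances support weak references."""


gc.disable()                       # rule out the cyclic collector for this demo
events = []

r = Resource()
weakref.finalize(r, events.append, "reclaimed")   # runs when r is reclaimed

del r                              # the last reference disappears here
freed_immediately = events == ["reclaimed"]

gc.enable()
print(freed_immediately)
```

On interpreters that do not use reference counting, the callback may run later, so the result describes CPython's behavior rather than the language as a whole.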

Execution efficiency of reference counting: the extra work needed to maintain reference counts is proportional to the number of memory allocations and releases, and to the number of reference assignments, made while the Python program runs. This is a weakness compared with other mainstream garbage collection mechanisms such as "mark-sweep" and "stop-and-copy", whose extra work depends essentially on the amount of memory involved in a collection rather than on how often references are assigned.

If execution efficiency were its only weakness, reference counting would be in good shape. Unfortunately it also has a fatal one, and it is because of this weakness that garbage collection in the narrow sense has never included reference counting. The fatal weakness is the circular reference (also called a reference cycle).

Problem description:

Circular references can keep the reference counts of a group of objects above 0 even though no outside object refers to them; they refer only to each other. Nobody can ever use this group of objects again, so the memory they occupy should be reclaimed. However, because of the mutual references, no object's count reaches 0, and reference counting alone will never free them. For example:

    a = []
    b = []
    a.append(b)   # a now holds a reference to b
    b.append(a)   # b now holds a reference to a, closing the cycle
    print(a)      # [[[...]]]
    print(b)      # [[[...]]]

This is fatal: it is no different from the memory leaks produced by manual memory management.

To solve this problem, Python introduces two further collection techniques to make up for the shortcomings of reference counting: "mark-sweep" and "generational collection".
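Both the leak and its remedy can be seen in a short experiment. This is a sketch assuming CPython: automatic cyclic collection is disabled so that reference counting works alone, and a weak reference is used as a probe to check whether an object is still alive. The class name Node and the attribute other are hypothetical.

```python
import gc
import weakref


class Node:
    """Hypothetical container-like object that can refer to another Node."""

    def __init__(self):
        self.other = None


gc.disable()                       # leave only reference counting active

a, b = Node(), Node()
a.other, b.other = b, a            # a and b now refer to each other
probe = weakref.ref(a)             # a weak reference does not keep a alive

del a, b                           # no outside references remain
alive_after_del = probe() is not None       # the cycle keeps both alive

gc.collect()                       # run the cyclic collector by hand
alive_after_collect = probe() is not None   # the cycle has been reclaimed

gc.enable()
print(alive_after_del, alive_after_collect)
```

In normal operation the cyclic collector runs automatically, so such cycles are reclaimed eventually without an explicit gc.collect() call.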

Two, mark-sweep

"Mark-sweep" exists to deal with circular references. Only container objects, that is, objects that can hold references to other objects (for example list, set, dict, class, and instance objects), can form circular references.
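In CPython, the standard gc module exposes gc.is_tracked, which reports whether the cyclic collector is currently tracking an object. The sketch below assumes CPython and only illustrates the distinction between containers and atomic objects; some containers may be untracked as an optimization, so the function is not a general test for "is a container".

```python
import gc

tracked_list = gc.is_tracked([])   # a list can hold references to other objects
tracked_int = gc.is_tracked(0)     # an int cannot take part in a cycle
tracked_str = gc.is_tracked("a")   # neither can a str

print(tracked_list, tracked_int, tracked_str)
```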

We have to accept one fact: if two objects each have a reference count of 1, but the only references are the ones they hold to each other, then both objects need to be reclaimed. Their reference counts are not 0, but their effective reference counts are. To expose the effective counts, the circular references have to be removed first. Suppose the two objects are A and B. Starting from A: A holds a reference to B, so B's count is reduced by 1. Then, following that reference to B: B holds a reference to A, so A's count is reduced by 1 as well. This removes the cycle between the two objects.

There is a problem, however. Suppose object A holds a reference to an object C, and C does not refer back to A. If C's count is reduced by 1 and A turns out not to be reclaimed after all, then C's count has been decremented wrongly, and C will become a dangling reference at some point in the future. We would have to restore C's count whenever A is not deleted, and the complexity of maintaining reference counts would multiply under such a scheme.

Principle: "mark-sweep" takes a better approach. Instead of changing the real reference counts, it makes a copy of the reference count of every object in the collection and modifies only the copies. Changes made to a copy have no effect on how the object's lifetime is managed.

The only purpose of these copied counts is to find the root object set (objects in this set cannot be reclaimed). Once the root set has been found, the current list of objects is split in two: one linked list holds the root set and becomes the root list; the other holds the remaining objects and becomes the unreachable list. The split is needed for one reason: the unreachable list may still contain objects that are referenced, directly or indirectly, by objects in the root list, and these cannot be reclaimed. Whenever such an object is found during marking, it is moved from the unreachable list to the root list. When marking is complete, everything left in the unreachable list is real garbage, and the reclamation that follows is confined to the unreachable list.
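The procedure described above can be sketched on a toy object graph. This is an illustrative model, not CPython's actual implementation: objects are represented by names, and the function find_unreachable, its parameters, and the dictionaries refcounts and references are all hypothetical. Sets and a work list stand in for the two linked lists.

```python
def find_unreachable(refcounts, references):
    """Return the names of objects that only cycles keep alive.

    refcounts:  name -> real reference count (never modified here)
    references: name -> names of the objects it refers to
    """
    # Work on a copy so the real counts stay untouched.
    gc_refs = dict(refcounts)

    # Subtract every reference that comes from inside the collection.
    for targets in references.values():
        for target in targets:
            gc_refs[target] -= 1

    # A copy still above 0 means an outside reference exists: the root set.
    root = [name for name, count in gc_refs.items() if count > 0]
    unreachable = set(gc_refs) - set(root)

    # Anything reachable from the root set is moved out of "unreachable".
    worklist = list(root)
    while worklist:
        current = worklist.pop()
        for target in references.get(current, ()):
            if target in unreachable:
                unreachable.discard(target)
                worklist.append(target)
    return unreachable


# A and B refer only to each other; C is held by an outside variable
# and refers to D.
refcounts = {"A": 1, "B": 1, "C": 1, "D": 1}
references = {"A": ["B"], "B": ["A"], "C": ["D"], "D": []}

garbage = find_unreachable(refcounts, references)
print(sorted(garbage))             # ['A', 'B']
```

D's copied count also falls to 0, yet D is rescued because it is reachable from C. This is the case that makes the second pass over the unreachable set necessary.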

Three, generational collection

Background: generational garbage collection is a mechanism developed in the early 1980s. A series of studies showed that, whatever the language, the type of program, or its size, programs have one thing in common: a certain proportion of memory blocks have a short lifetime, usually on the order of a few million machine instructions, while the remaining blocks live much longer, some of them from the start of the program until it ends.

As the "mark-sweep" mechanism above shows, the extra work this kind of collector does is tied to the total number of memory blocks in the system. The more blocks there are to reclaim, the more extra work garbage detection does and the less the reclamation step does; conversely, when few blocks need to be reclaimed, detection costs less extra work than reclamation. To make garbage collection more efficient, a "trade space for time" strategy is adopted.

Principle: all memory blocks in the system are divided into sets according to how long they have survived, and each set is called a "generation". The frequency of garbage collection decreases as the survival time of a generation increases. In other words, the longer an object has lived, the less likely it is to be garbage, and the less often it needs to be examined. Survival time is usually measured in garbage collections: the more collections an object has survived, the longer it is considered to have lived.
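CPython's gc module exposes some of the generational machinery. The sketch below assumes CPython, where gc.get_threshold and gc.get_count each return one value per generation and gc.collect accepts a generation number; the concrete numbers, and the exact meaning of the generations, vary between versions.

```python
import gc

thresholds = gc.get_threshold()    # collection thresholds, one per generation
counts = gc.get_count()            # current counters, one per generation

print("thresholds:", thresholds)
print("counts:    ", counts)

# Collect only the youngest generation; the return value is the number
# of unreachable objects the collector found.
found = gc.collect(0)
print("unreachable objects found:", found)
```

The thresholds can be adjusted with gc.set_threshold if the defaults do not suit a particular workload.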
