Python garbage collection mechanism

Source: Internet
Author: User

Python relies on three main garbage collection mechanisms. The first is reference counting, which tracks and reclaims garbage.
To solve the circular reference problem, Python adds mark-and-sweep. The extra work that mark-and-sweep introduces is tied to the
total number of memory blocks in the system: the more blocks that have to be examined, the more overhead each garbage check brings.
To improve collection efficiency, Python therefore applies a "space-for-time" strategy in the form of a generational mechanism,
which reduces how often objects that have survived for a long time are examined again.

First, look at Python's memory management architecture:

Layer 3: Object-specific memory (int/dict/list/...)

Layer 2: Python's object allocator
Python implements and maintains an interface (PyObject_New/PyObject_Del) that creates and destroys Python objects, including the handling of object parameters.

Layer 1: Python's raw memory allocator (PyMem_ API)
Python implements and maintains a wrapper around the layer 0 memory management interface, providing a unified raw memory management interface. Reason for the wrapper: the behavior of the C library is not necessarily consistent across operating systems, so the wrapper ensures portability, that is, the same behavior for the same semantics.

Layer 0: Underlying general-purpose allocator (e.g. C library malloc)
The memory management interface provided by the operating system. It is implemented and managed by the operating system, and Python cannot interfere with the behavior of this layer.

Reference counting mechanism

Reference counting is a garbage collection mechanism, and it is also the most intuitive and simplest one. When a reference to an object is created or copied, the object's reference count is incremented by 1; when a reference to the object is destroyed, the count is decremented by 1. If the count drops to 0, the object is no longer used by anyone and the memory it occupies can be released.
Advantage of reference counting: it works in real time. As soon as nothing references a block of memory, it is reclaimed at once, without waiting for a collection threshold to be reached.
Disadvantage of reference counting: the extra work of maintaining the counts is proportional to the number of memory allocations, releases, and reference assignments performed while Python runs.
To work well with reference counting and to make allocation and release as efficient as possible, Python uses extensive memory pooling,
which reduces the number of malloc and free calls at run time.

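>>> from sys import getrefcount
>>> a = [1, 2, 3]
>>> getrefcount(a)
2
>>> b = a
>>> getrefcount(a)
3

In CPython the reported value typically includes the temporary reference created by passing a to getrefcount() as an argument.
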
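The real-time behavior described above can be observed with a weak reference, which watches an object without adding to its reference count. This is a minimal sketch that assumes CPython; Payload is a hypothetical placeholder class, not something from the original article.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class; plain instances support weak references."""


obj = Payload()
probe = weakref.ref(obj)        # a weak reference does not raise the count

alias = obj                     # copying the reference adds 1
count_with_alias = sys.getrefcount(obj)
del alias                       # destroying the reference subtracts 1
count_without_alias = sys.getrefcount(obj)

del obj                         # last strong reference gone
reclaimed = probe() is None     # the weak reference is already dead

print(count_with_alias - count_without_alias, reclaimed)
```

No call into the gc module is needed here: the object is released by reference counting alone at the moment its last strong reference disappears.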
Mark-and-sweep mechanism

A fatal weakness of reference counting is the circular reference problem:
the reference counts of a set of objects are all nonzero, yet none of these objects is referenced by any external variable; they only reference each other.
Nobody can use this set of objects any more, so their memory should be reclaimed. But because of the mutual references, no object's count ever reaches 0,
and under reference counting alone the memory they occupy is never reclaimed.
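The problem can be reproduced in a few lines. The sketch below assumes CPython and temporarily disables the automatic collector so that only reference counting is at work; Node is a hypothetical class used for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical container-like class that can point at another Node."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                    # leave only reference counting active
try:
    a, b = Node(), Node()
    a.other, b.other = b, a     # a and b now reference each other
    probe = weakref.ref(a)      # watch a without keeping it alive

    del a, b                    # no external variable refers to the pair
    alive_after_del = probe() is not None

    found = gc.collect()        # run the cycle collector explicitly
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect, found)
```

After del the pair is still in memory because each object keeps the other's count above 0; only the explicit gc.collect() call reclaims it.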
Mark-and-sweep exists to solve the circular reference problem. First, only container objects can form reference cycles. A container object is an object
that can hold references to other objects, such as a list, a dict, or a class instance; objects such as PyIntObject and PyStringObject cannot form cycles.
So when the Python garbage collector runs, it only needs to check container objects. To keep track of every container, these objects have to be organized into a collection.
Python uses a doubly linked list for this, and every container object is inserted into the list after it is created. This list is also known as the list of collectable objects.
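Whether an object is on the collector's list can be checked from Python with gc.is_tracked(). The sketch below assumes CPython; the sample values are arbitrary.

```python
import gc

samples = {
    "int": 42,            # cannot hold references to other objects
    "str": "text",        # cannot hold references to other objects
    "list": [1, 2, 3],    # container object
}

# Ask the collector which of these objects it is keeping track of.
tracked = {name: gc.is_tracked(value) for name, value in samples.items()}
print(tracked)
```

Only the list is reported as tracked, which matches the rule that non-container objects never need to be examined for cycles.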

To solve the circular reference problem, the concept of an effective reference count is introduced: two objects in a cycle both have a nonzero reference count, but their effective reference count is actually 0.
Take two objects A and B. Start from A: because A holds a reference to B, decrement B's count by 1. Then follow the reference to B: because B holds a reference to A,
decrement A's count by 1 as well. This removes the references that the members of the cycle hold to each other. However, modifying the real reference counts directly could lead to dangling references,
so the collector modifies a copy of the count instead.
The only purpose of this copy is to find the root object set (objects in this set cannot be reclaimed). Once the root set has been found,
the collector starts from the root objects and follows the chain of references, marking one object after another as not reclaimable. To do this, the current list is first split into two linked lists:
one holds the root object set and is called the root list; the other holds the remaining objects and is called the unreachable list. The reason for splitting into two lists
is that the unreachable list may still contain objects that are referenced, directly or indirectly, by objects in the root list, and these cannot be reclaimed.
Whenever such an object is found during marking, it is moved from the unreachable list to the root list. When marking is complete, every object left in the unreachable list
is genuinely garbage, and the reclamation step that follows only has to process the unreachable list.
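The procedure above can be sketched as a toy model in pure Python. It is an illustration of the idea only, not CPython's implementation: objects are plain names, refs maps each name to the names it references, and external_refs gives the number of references coming from outside the tracked set. All three parameters and the function name are assumptions made for this example.

```python
def find_unreachable(objects, refs, external_refs):
    """Return the names that would remain in the unreachable list."""
    # Real reference count = external references + references from tracked objects.
    refcount = {name: external_refs.get(name, 0) for name in objects}
    for src in objects:
        for dst in refs.get(src, []):
            refcount[dst] += 1

    # Work on a copy so the real counts are never modified.
    gc_refs = dict(refcount)
    for src in objects:
        for dst in refs.get(src, []):
            gc_refs[dst] -= 1          # remove references held by tracked objects

    # Objects whose copied count is still positive form the root set.
    reachable = {name for name in objects if gc_refs[name] > 0}

    # Mark phase: everything reachable from a root moves to the root list.
    stack = list(reachable)
    while stack:
        src = stack.pop()
        for dst in refs.get(src, []):
            if dst not in reachable:
                reachable.add(dst)
                stack.append(dst)

    return [name for name in objects if name not in reachable]


objects = ["a", "b", "c", "d"]
refs = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(objects, refs, external_refs={"c": 1})
print(garbage)
```

Both pairs form a cycle, but only a and b are reported: c is referenced from outside, so it is a root, and d is rescued because it is reachable from c.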

Generational collection

The idea of generational collection: all memory blocks in the system are divided into different sets according to how long they have survived, and each set is called a "generation".
The frequency of garbage collection decreases as the survival time of a generation increases. In other words, the longer an object has lived, the less likely it is to be garbage,
and the less often it should be examined. When an object in one generation goes through a garbage collection and survives, it is moved into the next generation.
Python has three generations in total, and each of them is in fact one of the lists of collectable objects described above. The following array holds the
three generations used for generational garbage collection.

    #define NUM_GENERATIONS 3
    #define GEN_HEAD(n) (&generations[n].head)

    /* The three generations are stored in this array */
    /* linked lists of container objects */
    static struct gc_generation generations[NUM_GENERATIONS] = {
        /* PyGC_Head,                     threshold, count */
        {{{GEN_HEAD(0), GEN_HEAD(0), 0}}, 700,       0},  /* 700 containers; exceeding this triggers garbage collection immediately */
        {{{GEN_HEAD(1), GEN_HEAD(1), 0}}, 10,        0},  /* 10 */
        {{{GEN_HEAD(2), GEN_HEAD(2), 0}}, 10,        0},  /* 10 */
    };

    PyGC_Head *_PyGC_generation0 = GEN_HEAD(0);
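The count field of each generation can be read from Python with gc.get_count(). The sketch below assumes a standard CPython build and disables the automatic collector while it measures, so that no collection resets the counter in between. The number of lists created is arbitrary, and the exact increase may vary because CPython can reuse cached list objects.

```python
import gc

was_enabled = gc.isenabled()
gc.disable()                    # keep automatic collections from resetting the counter
try:
    before = gc.get_count()     # one counter per generation
    keep_alive = [[] for _ in range(1000)]   # many new container objects
    after = gc.get_count()
finally:
    if was_enabled:
        gc.enable()

print(before, after)
```

The first element grows as containers are allocated, which is the value the collector compares against the generation-0 threshold.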

There are three thresholds: 700, 10, and 10.
They can be read with the gc.get_threshold() function:

    import gc
    print(gc.get_threshold())
    (700, 10, 10)
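The thresholds can also be changed at run time with gc.set_threshold(). The values used below are purely illustrative, not a tuning recommendation, and the sketch restores the original settings when it is done.

```python
import gc

original = gc.get_threshold()       # remember the current settings

gc.set_threshold(1000, 15, 15)      # illustrative values only
changed = gc.get_threshold()

gc.set_threshold(*original)         # put the original settings back
restored = gc.get_threshold()

print(original, changed, restored)
```

Raising the first threshold makes generation-0 collections less frequent, at the cost of letting more cyclic garbage accumulate between collections.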

The first threshold means that the generation-0 list can accumulate up to 700 container objects (allocations minus deallocations since the last collection); once this limit is exceeded, garbage collection starts immediately.

The next two thresholds of 10 describe the relationship between generations: every 10 generation-0 collections are accompanied by one generation-1 collection, and every 10 generation-1 collections
are accompanied by one generation-2 collection. This is where the space-for-time trade-off shows.
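The cascade between the generations can be illustrated with a small simulation. This is a toy model of the rule as stated above, not of CPython's exact bookkeeping (the real trigger condition may differ by one); the function name and its parameters are assumptions made for this example.

```python
def simulate(gen0_collections, thresholds=(700, 10, 10)):
    """Count how many collections each generation sees in the toy model."""
    runs = [0, 0, 0]        # collections performed per generation
    pending_gen1 = 0        # generation-0 collections since the last generation-1 run
    pending_gen2 = 0        # generation-1 collections since the last generation-2 run

    for _ in range(gen0_collections):
        runs[0] += 1
        pending_gen1 += 1
        if pending_gen1 == thresholds[1]:
            pending_gen1 = 0
            runs[1] += 1
            pending_gen2 += 1
            if pending_gen2 == thresholds[2]:
                pending_gen2 = 0
                runs[2] += 1
    return runs


print(simulate(100))    # → [100, 10, 1]
```

In this model the oldest generation is examined only once per hundred generation-0 collections, so long-lived objects are examined far less often than new ones.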


The process of garbage collection:
--When memory is allocated for a container, the generation-0 count is checked against its threshold; exceeding the threshold triggers garbage collection
--Merge the lists of collectable objects (the lists of all generations younger than the one being processed are merged into the current generation)
--Calculate the effective reference counts
--Split the objects into two sets according to the effective reference count: equal to 0 and greater than 0
-Objects with an effective reference count greater than 0, and objects reachable from them, move into the next generation
-Objects with an effective reference count equal to 0 that remain unreachable are reclaimed
--Reclaiming clears the elements held by each container, decrementing the reference counts of those elements (this breaks the circular reference)
--Python's underlying memory management mechanism then frees the memory
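A collection pass can be observed from Python through the gc.callbacks list, which CPython calls at the start and at the end of every collection. The sketch below assumes CPython; the callback name, the number of cycles created, and the use of plain lists are illustrative choices.

```python
import gc

events = []


def record(phase, info):
    # phase is "start" or "stop"; info reports the generation being collected
    # and, once the pass has finished, how many objects were collected.
    events.append((phase, info.get("generation"), info.get("collected", 0)))


was_enabled = gc.isenabled()
gc.disable()                        # make sure only the explicit collection runs
gc.callbacks.append(record)
try:
    for _ in range(5):
        a, b = [], []
        a.append(b)                 # a references b
        b.append(a)                 # b references a: a cycle of two lists
    del a, b                        # drop the last external references
    found = gc.collect()            # full collection of all generations
finally:
    gc.callbacks.remove(record)
    if was_enabled:
        gc.enable()

print(found, events)
```

The recorded "stop" event reports the oldest generation that was processed and the number of objects that were reclaimed, which here includes the ten lists that made up the five cycles.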

Reference Documentation:
http://www.cnblogs.com/vamei/p/3232088.html
http://python.jobbole.com/83548/
http://python.jobbole.com/82061/
Python Source code anatomy
