Python's garbage collection mechanism

Source: Internet
Author: User

In general, Python's garbage collection mechanism is based on reference counting. To solve the problem of circular references and to improve efficiency, it uses mark-and-sweep and generational collection as supplementary mechanisms.

1. Reference counting

The core idea of reference counting is that each object carries a counter, ob_refcnt, that records how many references point to the object (for example, one variable referring to it gives a count of 1, two variables give a count of 2). When the count drops to 0, the system reclaims the object immediately.
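The counting rule can be observed from Python code. The following is a minimal sketch that assumes CPython and uses sys.getrefcount(); the Node class and the variable names are only illustrative. Because sys.getrefcount() counts its own argument as one extra temporary reference, the sketch compares readings with each other rather than relying on the raw numbers.

```python
import sys


class Node:
    """Tiny example class; any ordinary Python object would do."""

    def __init__(self, val):
        self.value = val


obj = Node("ABC")

# sys.getrefcount() counts its own argument as a temporary reference,
# so compare readings with each other instead of trusting the raw number.
base = sys.getrefcount(obj)

alias = obj                      # a second variable now refers to the object
after_alias = sys.getrefcount(obj)

del alias                        # dropping that variable undoes the increment
after_del = sys.getrefcount(obj)

print(after_alias - base, after_del - base)
```

On CPython the first difference should be 1 and the second 0: binding another name adds one reference, and deleting it removes that reference again.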

Advantages:

    1. Simple

    2. Real-time: an object is reclaimed as soon as it is no longer referenced

Disadvantages:

    1. Maintaining the reference counts consumes resources

    2. Circular references

Detailed Description:

1.1 Principle of reference counting

Let's start by describing how Python objects get memory space.

Here's a simple example:

    class Node(object):
        def __init__(self, val):
            self.value = val

    n1 = Node("ABC")

When the object is created, Python immediately requests memory from the operating system. (Python actually implements its own memory allocation system, which provides an abstraction layer on top of the operating system heap. For the sake of simplicity, we will not go into it here.)

Let's say we've created three Python node objects:

Internally, when an object is created, Python always stores an integer in the object's C struct, called the reference count. When an object has just been created, Python sets this value to 1:

A value of 1 indicates that exactly one pointer points to, or references, each of the three objects. Suppose we now create a new Node instance, JKL:

As before, Python sets the reference count of JKL to 1. Note, however, that since we changed n1 to point to JKL, it no longer points to ABC, so Python sets the reference count of ABC to 0.
At this moment the Python garbage collector springs into action! Whenever an object's reference count drops to 0, Python releases the object immediately and its memory is handed back to the memory allocator:

Above, Python reclaims the memory used by the ABC Node instance. This garbage collection algorithm is called reference counting. It was invented by George E. Collins in 1960. You can picture it as a roommate with mild OCD (obsessive-compulsive disorder) who keeps cleaning up behind you: the moment you put down a dirty dish or cup, someone is ready to put it in the dishwasher!
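The immediate reclamation described above can be checked with a weak reference, which observes an object without keeping it alive. This is a small sketch that assumes CPython; the name abc_probe is made up for the example.

```python
import weakref


class Node:
    def __init__(self, val):
        self.value = val


n1 = Node("ABC")
abc_probe = weakref.ref(n1)      # a weak reference does not raise the count

n1 = Node("JKL")                 # n1 is rebound, so ABC loses its only reference

# On CPython the ABC object is reclaimed as soon as its count reaches zero,
# so the weak reference is already dead at this point.
abc_is_gone = abc_probe() is None
print(abc_is_gone)
```

Other Python implementations may delay reclamation, so the result is specific to reference-counting interpreters such as CPython.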

Let's look at a second example. Suppose we make n2 refer to the same object as n1:

Python decrements the reference count of DEF, shown on the left, to 0, and the garbage collector reclaims the DEF instance immediately. At the same time, the reference count of JKL becomes 2, because both n1 and n2 now point to it.
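The second example can be reproduced in a few lines. The sketch below assumes CPython; it uses weakref.finalize() to record when DEF is reclaimed and sys.getrefcount() to watch the count of JKL. The list name reclaimed is hypothetical.

```python
import sys
import weakref


class Node:
    def __init__(self, val):
        self.value = val


reclaimed = []                   # labels of nodes that have been finalized

n1 = Node("JKL")
n2 = Node("DEF")
weakref.finalize(n2, reclaimed.append, "DEF")

jkl_before = sys.getrefcount(n1)
n2 = n1                          # DEF loses its last reference, JKL gains one
jkl_after = sys.getrefcount(n1)

print(reclaimed, jkl_after - jkl_before)
```

The finalizer runs at the moment n2 is rebound, and the count of JKL rises by exactly one, matching the description above.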

1.2 Circular References

With the reference counting algorithm, if a data structure refers to itself, that is, if it is a circular data structure, then some of the reference counts will never drop to zero. To better understand the problem, let's look at an example. The following code shows the Node class that we used earlier:

We have a constructor (called __init__ in Python) that stores a single property in an instance variable. After the class definition we create two nodes, ABC and DEF, shown as the rectangles on the left of the diagram. The reference count of both nodes is initialized to 1, because each node is pointed to by exactly one reference (n1 and n2 respectively).

Now, let's define two additional attributes on the nodes, next and prev:

We set n1.next to point to n2, and n2.prev to point back to n1. Our two nodes now form a doubly linked list through circular references. Note also that the reference counts of ABC and DEF have both increased to 2: each node is now pointed to by two references, first n1 or n2, and then next or prev.

Now, assuming our program no longer uses these two nodes, we set both n1 and n2 to null (None in Python).

Python reduces the reference count of each node to 1, as usual. But neither object has an external reference any more. In other words, our program no longer uses these Node objects, so we would like Python's garbage collector to be smart enough to release them and reclaim the memory they occupy. With reference counting alone this is not possible, because both reference counts are 1 instead of 0. The reference counting algorithm cannot handle objects that point to one another.
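The situation can be reproduced directly. The sketch below assumes CPython and temporarily disables the automatic cycle collector (covered in the next section) so that only reference counting is at work; the probe variables are hypothetical names for weak references used to observe the nodes.

```python
import gc
import weakref


class Node:
    def __init__(self, val):
        self.value = val
        self.next = None
        self.prev = None


was_enabled = gc.isenabled()
gc.disable()                     # keep the cycle collector out of the way
try:
    n1 = Node("ABC")
    n2 = Node("DEF")
    n1.next = n2                 # ABC -> DEF
    n2.prev = n1                 # DEF -> ABC, completing the cycle

    abc_probe = weakref.ref(n1)
    def_probe = weakref.ref(n2)

    n1 = None
    n2 = None

    # Reference counting alone cannot free the pair: each node is still
    # referenced by the other one.
    alive_after_drop = abc_probe() is not None and def_probe() is not None

    gc.collect()                 # run the cycle detector explicitly
    alive_after_collect = abc_probe() is not None or def_probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_drop, alive_after_collect)
```

Both nodes survive the assignments to None and disappear only after gc.collect() runs, which is the gap that the supplementary mechanisms are meant to close.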

2. Mark-and-sweep

Mark-and-sweep focuses only on objects that may produce circular references. Objects such as PyIntObject and PyStringObject clearly cannot produce circular references, because they cannot hold references to other objects. Circular references in Python always occur between container objects, that is, objects that can hold other objects internally, such as list, dict, class instances, and so on. As a result, the cost of the method depends on the number of container objects. In addition, the collector only keeps track of newly created container objects and of objects that are freed because their reference count reached zero.
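Which objects the cycle collector follows can be inspected with gc.is_tracked(). This is a small sketch assuming CPython; the dictionary name tracked and the Node class are only illustrative.

```python
import gc


class Node:
    def __init__(self, val):
        self.value = val


# Atomic objects cannot take part in a cycle, so the collector ignores them;
# containers such as lists and class instances are tracked.
tracked = {
    "int": gc.is_tracked(42),
    "str": gc.is_tracked("ABC"),
    "list": gc.is_tracked([]),
    "Node instance": gc.is_tracked(Node("ABC")),
}

for kind, is_tracked in tracked.items():
    print(f"{kind}: {is_tracked}")
```

The integer and the string are reported as untracked, while the list and the Node instance are tracked.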

Python walks through each object on the list of tracked containers, checks every object it references, decrements a copy of the referenced object's reference count according to the rule, and then reclaims objects according to whether they are reachable or unreachable. In practice the rule is a little more involved:

Essentially, these container objects are gathered into one large set. For each container object in the set, the collector checks whether it is referenced by another container object in the set and, if so, decrements its count by 1. This raises a problem: suppose A references B, but B does not reference A. Under this rule, B's ob_refcnt would drop to 0 and B would be deleted, even though A and B do not form a circular reference. Followed literally, this logic produces an error: A's reference to B becomes a dangling reference.

So Python adds a clever design here:

Step 1: Make a copy of each object's ob_refcnt and apply the decrements to the copy. In the example above, only the copy of B's ob_refcnt drops to 0; the real count is untouched, which avoids the dangling reference.

Step 2: Split the set of objects into two subsets, reachable and unreachable, according to whether the copied ob_refcnt is 0. Objects in the reachable set have a nonzero copy; objects in the unreachable set have a copy equal to 0 and are the candidates for reclamation. In the example, A belongs to reachable and B to unreachable.

Step 3: Examine the objects in the reachable set. If any of them references an object in the unreachable set, move that object from unreachable to reachable. In the example, because A references B, B is moved from unreachable to reachable.

Step 4: Check whether step 3 moved any objects from unreachable to reachable; if so, repeat step 3 (consider A referencing B and B referencing C to see why step 4 is needed).

Step 5: Clear the objects that remain in the unreachable set.
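The five steps can be sketched as a toy model in pure Python. This is not CPython's actual implementation: the function find_unreachable and the two dictionaries are hypothetical stand-ins, with refcounts playing the role of ob_refcnt and edges listing which tracked objects each object references.

```python
def find_unreachable(refcounts, edges):
    """Return the set of objects that the five steps would reclaim.

    refcounts: real reference count of every tracked container object.
    edges: for each object, the tracked objects it references.
    Both structures are hypothetical stand-ins for CPython's internals.
    """
    # Step 1: work on a copy so the real counts are never modified, then
    # subtract every reference that originates inside the tracked set.
    gc_refs = dict(refcounts)
    for targets in edges.values():
        for target in targets:
            gc_refs[target] -= 1

    # Step 2: a copy above zero means something outside the set still
    # holds the object.
    reachable = {obj for obj, count in gc_refs.items() if count > 0}
    unreachable = set(gc_refs) - reachable

    # Steps 3 and 4: rescue everything that a reachable object refers to,
    # repeating until no object changes sides.
    pending = list(reachable)
    while pending:
        obj = pending.pop()
        for target in edges.get(obj, ()):
            if target in unreachable:
                unreachable.remove(target)
                reachable.add(target)
                pending.append(target)

    # Step 5: whatever is left can be cleared.
    return unreachable


# A is held by a variable and references B, which references C.
# X and Y only reference each other.
refcounts = {"A": 1, "B": 1, "C": 1, "X": 1, "Y": 1}
edges = {"A": ["B"], "B": ["C"], "X": ["Y"], "Y": ["X"]}

garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))  # ['X', 'Y']
```

B and C start out in the unreachable set because their copied counts are 0, but they are rescued through A, which is the case steps 3 and 4 exist for. Only the isolated cycle X and Y is reported as garbage, and the real counts in refcounts are left unchanged.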

Disadvantage: the mark-and-sweep process is inefficient.

Mark-and-sweep clearly consumes a lot of system resources, which is why generational collection is introduced next.

3. Generational collection

This algorithm derives from the weak generational hypothesis, which consists of two points: young objects usually die quickly, while old objects are likely to survive for a long time.

In other words, because an old object has already survived for a long time, it is less likely to be garbage, so it can be checked less often. A new object is more likely to be garbage, so it should be checked somewhat more often.

The idea of generational collection is to divide all memory blocks in the system into sets according to how long they have survived. Each set is called a "generation", and the collection frequency decreases as the survival time of the generation increases. Survival time is usually measured by the number of garbage collections an object has survived.

By default, Python defines three generations of objects; the larger the generation index, the longer its objects have survived.
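The three generations are visible through the gc module. The sketch below assumes CPython and only reads the current settings; the concrete numbers differ between Python versions, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()  # one threshold per generation
counts = gc.get_count()          # current counters, youngest generation first

for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```

Both calls return a tuple with three entries, one for each generation.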

When a collection is triggered, Python runs the algorithm described above on generation 0, releases the "floating garbage", and moves the surviving objects to the generation 1 list.

Over time, the objects used by the program are gradually moved from the generation 0 list to the generation 1 list. Python handles the objects in the generation 1 list in the same way: once the allocation count and the release count accumulate to a certain threshold, Python moves the remaining live objects to the generation 2 list.

In this way, the objects that your code uses over the long term, the live objects that your code keeps accessing, are moved from generation 0 to generation 1 and then to generation 2. With different threshold settings, Python can process these objects at different intervals. Python processes generation 0 most frequently, then generation 1, and generation 2 least often.
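A collection of a single generation can also be requested by hand with gc.collect(generation). The sketch below assumes CPython and disables automatic collection first, so that the freshly created cycle is still in the youngest generation when gc.collect(0) runs; the names probe and found are only illustrative.

```python
import gc
import weakref


class Node:
    def __init__(self, val):
        self.value = val
        self.next = None


was_enabled = gc.isenabled()
gc.disable()                     # no automatic runs while we experiment
try:
    a = Node("ABC")
    b = Node("DEF")
    a.next = b
    b.next = a                   # a fresh two-node cycle
    probe = weakref.ref(a)
    a = b = None                 # the cycle is now garbage

    found = gc.collect(0)        # examine only the youngest generation
    young_cycle_reclaimed = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(found, young_cycle_reclaimed)
```

Because the cycle was created moments earlier, examining generation 0 alone is enough to reclaim it, which is the cheap common case that the generational design is built around.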

