Python memory management mechanism

Source: Internet
Author: User

As the saying goes, debts must be repaid sooner or later. There are still many Python topics I have not written up as blog posts; I owe too much, so here is a first installment.

1. Memory usage of Python objects

Memory management is an important aspect of language design and a major factor in a language's performance. Whether it is manual management as in C or garbage collection as in Java, the memory model is one of a language's defining features.

In Python, everything is an object, and using objects requires memory management. In short, using an object means borrowing system resources: memory is allocated for the object, and after use the borrowed resources must be released to prevent memory leaks. (An object that is no longer needed should be reclaimed; if another object that is still in use holds a reference to it, it cannot be reclaimed, so an object that should have been freed stays stuck in heap memory. That is a memory leak.) The Python interpreter takes on the complex task of memory management, so Python programmers do not have to manage memory themselves, but it is still worth understanding how Python's memory management works.
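The kind of leak described above can be reproduced in a few lines. This is a minimal sketch, assuming CPython; Session, registry and handle_request are hypothetical names made up for the illustration, and a weak reference is used only as a probe to see whether the object is still alive.

```python
import weakref


class Session:
    """Hypothetical object that is expected to be short-lived."""


registry = []  # long-lived container that is still in use


def handle_request():
    session = Session()
    registry.append(session)     # forgotten reference keeps the object alive
    return weakref.ref(session)  # weak reference: does not keep it alive


probe = handle_request()
leaked = probe() is not None  # still alive, because registry holds it
registry.clear()              # drop the stray reference
freed = probe() is None       # now the object can be reclaimed
print(leaked, freed)
```

The object only goes away once the container that is still in use lets go of it, which is exactly the situation the paragraph calls a memory leak.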

1) Reference counting mechanism

Python uses reference counting to manage memory.

In Python everything is an object, and at the core of every object is a struct, PyObject (shown here in simplified form):

    typedef struct _object {
        int ob_refcnt;
        struct _typeobject *ob_type;
    } PyObject;

PyObject is the part that every object must contain; its ob_refcnt field is the reference count. When an object gains a new reference, ob_refcnt is incremented; when a reference to it is deleted, ob_refcnt is decremented.

    /* increase count */
    #define Py_INCREF(op)   ((op)->ob_refcnt++)
    /* decrease count */
    #define Py_DECREF(op)                   \
        if (--(op)->ob_refcnt != 0)         \
            ;                               \
        else                                \
            _Py_Dealloc((PyObject *)(op))

When the reference count drops to 0, the object's life is over.
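The effect of the count reaching 0 can be observed from Python itself. The following is a minimal sketch, assuming CPython; Tracked and events are hypothetical names introduced for the illustration, and the __del__ method is used only to record the moment the object is finalized.

```python
events = []


class Tracked:
    """Hypothetical helper class that records when it is finalized."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(self.name + " finalized")


obj = Tracked("first")
alias = obj                        # second reference to the same object
del obj                            # one reference remains, nothing happens
events.append("after del obj")
del alias                          # last reference gone, finalized at once
events.append("after del alias")
print(events)
# On CPython: ['after del obj', 'first finalized', 'after del alias']
```

The finalizer runs between the two markers, that is, at the moment the last reference disappears rather than at some later collection pass.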

Take the common assignment statement as an example:

    a = 'hello'

When the Python interpreter executes this statement, it first creates a string object 'hello' and then assigns a reference to that object to a. Internally, Python keeps a tracking variable that records the number of references to every object in use; this variable is called the reference count.

The getrefcount() function in the sys module shows the reference count of an object:

     1 >>> import sys
     2 >>>
     3 >>> a = 'hello'
     4 >>> sys.getrefcount('hello')
     5 3
     6 >>> sys.getrefcount(a)
     7 2
     8 >>>
     9 >>> b = 'hello'
    10 >>> sys.getrefcount('hello')
    11 4
    12 >>> sys.getrefcount(a)
    13 3
    14 >>>
    15 >>> c = b
    16 >>> sys.getrefcount('hello')
    17 5
    18 >>> sys.getrefcount(a)
    19 4
    20 >>>

In lines 3-6 of the example above, the reference count of 'hello' is reported as 3. The explanation: 'hello' is first created with a reference count of 1; the reference is then assigned to a, raising the count to 2; and when the count is inspected through getrefcount(), passing the object as an argument adds 1 more, giving 3.

That explanation is not quite right, because:

    >>> import sys
    >>> sys.getrefcount('winter')
    3
    >>>

It is not clear to me why the reference count of 'winter' starts out at 3; this is left to be understood later. For reference, see:

Fun with Python's sys.getrefcount()

Similarly, the reference count of a is 1 after the assignment statement, and getrefcount() reports 2 because the call itself adds 1.

Lines 9-13 and 15-19 then illustrate what causes the reference count to increase.

2) Cases where the reference count increases
    • The object is created: x = 3.14
    • An additional alias is created: y = x
    • The object is passed as an argument to a function (a new local reference): foobar(x)
    • The object becomes an element of a container object: myList = [123, x, 'xyz']

3) Cases where the reference count decreases
    • A local reference leaves its scope, for example when the foobar() function ends
    • An alias of the object is explicitly destroyed: del y
    • An alias of the object is assigned to another object: x = 123
    • The object is removed from a container object: myList.remove(x)
    • The container object itself is destroyed: del myList
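Several of the cases listed above can be checked with sys.getrefcount(). The following is a minimal sketch, assuming CPython. Because the absolute numbers reported by getrefcount() vary between interpreter versions, it only looks at changes relative to a baseline; count_changes is a hypothetical helper name, and the function-argument case is left out because its exact value depends on the interpreter version.

```python
import sys


def count_changes():
    """Return the change in x's reference count after each step."""
    x = object()
    base = sys.getrefcount(x)          # baseline, whatever its absolute value

    y = x                              # additional alias
    after_alias = sys.getrefcount(x) - base

    my_list = [123, x, 'xyz']          # element of a container
    after_container = sys.getrefcount(x) - base

    my_list.remove(x)                  # removed from the container
    after_remove = sys.getrefcount(x) - base

    del y                              # alias explicitly destroyed
    after_del = sys.getrefcount(x) - base

    return after_alias, after_container, after_remove, after_del


print(count_changes())  # on CPython: (1, 2, 1, 0)
```

Each added reference raises the count by 1 and each removed reference lowers it by 1, so the count returns to its baseline at the end.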

Here del xxx has two effects. For example:

    >>> import sys
    >>>
    >>> a = 'hello'
    >>> b = a
    >>>
    >>> sys.getrefcount('hello')
    4
    >>> sys.getrefcount(a)
    3
    >>> sys.getrefcount(b)
    3
    >>>
    >>> del a
    >>>
    >>> sys.getrefcount('hello')
    3
    >>> sys.getrefcount(b)
    2
    >>> sys.getrefcount(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'a' is not defined
    >>>

The result: a is removed from the current namespace, and the reference counts of b and 'hello' each drop by 1.

2. Garbage collection mechanism

Eat too much and you get fat, and the same is true of Python. As a program creates more and more objects, they take up more and more memory. But there is no need to worry too much about Python's figure: it "loses weight" at the right moments by starting garbage collection and clearing away useless objects. Many languages have garbage collection, for example Java and Ruby. Although the ultimate goal is always a slim figure, the weight-loss plans differ a lot between languages (for Java, see "Java memory management and garbage collection").

Garbage collection in Python is based on reference counting, supplemented by mark-and-sweep and generational collection.

Python mainly uses reference counting to track and reclaim garbage. On top of reference counting, mark-and-sweep solves the problem of circular references that container objects can produce, and generational collection trades space for time to make garbage collection more efficient.

1) Pros and cons of reference counting

When a reference to an object is created or copied, the object's reference count is increased by 1; when a reference to an object is destroyed, the count is decreased by 1. When the count drops to 0, nobody is using the object any more, and the memory it occupies is freed.

Advantages:

Although reference counting has to do bookkeeping work on every allocation and release, it has one major advantage over other mainstream garbage collection techniques: it is real-time. Any memory is reclaimed as soon as no reference to it remains. Other garbage collection techniques reclaim dead memory only under certain conditions (for example, when a memory allocation fails).

Disadvantages:

The extra work of maintaining reference counts is proportional to the number of memory allocations and releases performed while Python runs, and to the number of reference assignments. This is a weakness compared with other mainstream garbage collection techniques such as mark-and-sweep and stop-and-copy, whose extra work depends essentially only on the amount of memory to be reclaimed.

If execution efficiency were the only weakness of reference counting, that would be tolerable. Unfortunately, it also has a fatal weakness, and it is precisely because of this weakness that garbage collection in the narrow sense has never included reference counting. What triggers this fatal weakness is the circular reference (also called a cross-reference).

2) Circular Reference

Container objects that can hold references to other objects (for example list, set, dict, class, instance) can produce circular references.

First, let's look at a small example:

     1 >>>
     2 >>> import sys
     3 >>>
     4 >>> a = []
     5 >>> b = []
     6 >>> a.append(b)
     7 >>>
     8 >>> id(b)
     9 41589704
    10 >>>
    11 >>> sys.getrefcount(a)
    12 2
    13 >>> sys.getrefcount(b)
    14 3
    15 >>> del b
    16 >>>
    17 >>>
    18 >>> sys.getrefcount(a)
    19 2
    20 >>> id(a[0])
    21 41589704
    22 >>>

In the example above, after a.append(b) the reference count of b is 2 (line 14 shows 3, but the real value is 2 because getrefcount() itself adds a reference). After del b, the count of the list that b referred to drops by 1, to 1. To confirm this, we compare id values, which show that the same [] still occupies the same memory (lines 8 and 21). The object is still accessible because a still references it. If we finally del a, the reference count of that [] drops by 1 to 0 and it is reclaimed.

Now let's look at a small example of a circular reference:

     1 >>> import sys
     2 >>>
     3 >>> a = []
     4 >>> b = []
     5 >>> a.append(b)
     6 >>> b.append(a)
     7 >>>
     8 >>> sys.getrefcount(a)
     9 3
    10 >>> sys.getrefcount(b)
    11 3
    12 >>>
    13 >>> del a
    14 >>> del b
    15 >>>

In the example above, after line 6 the reference counts of a and b are both 2 (ignoring the extra reference added by getrefcount()). After del a and del b, the counts of the two lists are 1, not 0, so reference counting alone cannot reclaim them, which results in a memory leak.
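That a cycle survives reference counting, and that the cycle collector is what reclaims it, can be checked directly. The following is a minimal sketch, assuming CPython. Lists cannot be weakly referenced, so a hypothetical Node class stands in for the two lists, and a weak reference serves as a probe; automatic collection is switched off for the duration so that it cannot interfere.

```python
import gc
import weakref


class Node:
    """Hypothetical container-like object that can be weakly referenced."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the cycle collector from running on its own
try:
    a = Node()
    b = Node()
    a.other = b
    b.other = a              # a and b now reference each other
    probe = weakref.ref(a)   # probe that does not keep a alive

    del a
    del b
    # Each object is still referenced by the other, so the counts are not 0.
    alive_after_del = probe() is not None

    gc.collect()             # run the cycle collector explicitly
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)  # on CPython: True False
```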

The circular reference problem is solved by the mark-and-sweep method:

3) Mark-and-sweep

Mark-and-sweep exists to solve the problem of circular references. Container objects that can hold references to other objects (for example list, set, dict, class, instance) can produce circular references.
We must accept one fact: if two objects each have a reference count of 1, but the only references are the ones they hold to each other, then both objects need to be reclaimed. Their reference counts are not 0, but their effective reference counts are. So we first strip out the circular references, and the effective counts of the two objects appear. Suppose the two objects are A and B. Starting from A: A holds a reference to B, so we subtract 1 from B's reference count. Then we follow the reference to B: B holds a reference to A, so we subtract 1 from A's count as well. This completes the removal of the cycle between the two objects.
However, there is a problem. Suppose object A holds a reference to object C, and C does not reference A. If we subtract 1 from C's count and A turns out not to be reclaimed, then we have wrongly decremented C's reference count, which will leave a dangling reference to C at some point in the future. We would then have to restore C's reference count whenever A is not deleted, and maintaining reference counts under such a scheme would become far more complex.

Principle: mark-and-sweep takes a better approach. It does not change the real reference counts. Instead, it makes a copy of the reference count of each object in the collection and modifies only the copy. Changes to the copy do not affect how object lifetimes are maintained.
The only purpose of this copy is to find the root object set (objects in this set cannot be reclaimed). Once the root object set has been found, the current list of objects is split in two: one linked list holds the root object set and becomes the root list, and the other holds the remaining objects and becomes the unreachable list. The split is needed for one reason: the unreachable list may still contain objects that are referenced, directly or indirectly, by objects in the root list, and those objects must not be reclaimed. Whenever such an object is found during marking, it is moved from the unreachable list to the root list. When marking is complete, everything left in the unreachable list is genuine garbage, and the subsequent reclamation only needs to work on the unreachable list.
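The procedure above can be sketched on a toy object graph. This is a simplified model written for illustration, not CPython's implementation: objects are plain names, find_unreachable is a hypothetical function, and two dictionaries stand in for the real reference counts and for the references each object holds.

```python
def find_unreachable(refcounts, referents):
    """Toy model of the cycle detection described above.

    refcounts: dict mapping name -> real reference count (never modified)
    referents: dict mapping name -> list of names that object refers to
    """
    # Work on a copy so that the real counts are left untouched.
    gc_refs = dict(refcounts)

    # Subtract every reference that comes from inside the tracked set.
    for targets in referents.values():
        for target in targets:
            if target in gc_refs:
                gc_refs[target] -= 1

    # Objects still above zero are referenced from outside: the root set.
    reachable = {name for name, count in gc_refs.items() if count > 0}

    # Anything a reachable object refers to is moved to the reachable side.
    pending = list(reachable)
    while pending:
        owner = pending.pop()
        for target in referents.get(owner, []):
            if target in gc_refs and target not in reachable:
                reachable.add(target)
                pending.append(target)

    # Whatever is left is real garbage.
    return set(gc_refs) - reachable


# A and B only reference each other; X is held by a variable and refers to C.
refcounts = {"A": 1, "B": 1, "X": 1, "C": 1}
referents = {"A": ["B"], "B": ["A"], "X": ["C"], "C": []}
print(sorted(find_unreachable(refcounts, referents)))  # ['A', 'B']
```

C's copied count also drops to 0, but because X is in the root set and refers to C, C is moved back to the reachable side. This is the case that would go wrong if the real counts were decremented directly.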

4) Generational collection

Background: generational garbage collection was developed in the early 1980s. A series of studies showed that, whatever the language, whatever the type of program and whatever its size, one thing holds: a certain proportion of memory blocks have a fairly short life, typically the time of a few million machine instructions, while the remaining blocks live much longer, some even from the start of the program until it exits.
As the mark-and-sweep mechanism above shows, the extra work of this kind of garbage collection depends on the total number of memory blocks in the system. The more blocks that need to be reclaimed, the more extra work garbage detection brings and the less extra work reclamation brings; conversely, when fewer blocks need to be reclaimed, garbage detection brings less extra work than reclamation. To make garbage collection more efficient, a strategy of trading space for time is adopted.

Principle: all memory blocks in the system are divided into sets according to how long they have survived. Each set is called a "generation", and the older a generation is, the less often it is collected. In other words, the longer an object has lived, the less likely it is to be garbage, and the less often it needs to be checked. How is survival time measured? Usually by the number of garbage collections an object has survived: the more collections it has come through, the longer it has lived.

To illustrate:

Suppose some memory blocks M are still alive after 3 rounds of garbage collection. We put M into a set A, and newly allocated memory goes into a set B. When garbage collection runs, in most cases it only collects set B, and set A is collected only after a fairly long interval. The collector therefore has less memory to process, and efficiency naturally improves. During this process, some blocks in set B are moved to set A because they have lived long enough. Of course, set A does contain some garbage, and its reclamation is delayed by the generational mechanism.
Python has 3 generations in total, that is, Python actually maintains 3 linked lists. For details, see the Python source code.

Python uses generational collection. Objects are divided into three generations. A newly created object is placed in the first generation. If it survives a garbage check of the first generation, it is moved to the second generation; likewise, if it survives a garbage check of the second generation, it is moved to the third generation.

The gc module keeps a counter with three entries, which can be obtained with gc.get_count().
For example, in (488, 3, 0), 488 is the number of memory allocations minus the number of deallocations made by Python since the last first-generation garbage check. Note that this counts memory allocations, not increases in reference counts.

3 is the number of first-generation garbage checks since the last second-generation check. Likewise, 0 is the number of second-generation garbage checks since the last third-generation check.
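The counter can be inspected from a running program. The following is a minimal sketch, assuming CPython; the names before, containers and after are illustrative, and the exact numbers printed differ from run to run and between Python versions, so no particular output is shown.

```python
import gc

before = gc.get_count()
# Lists are container objects, so creating them counts as tracked allocations.
containers = [[] for _ in range(100)]
after = gc.get_count()

print("before:", before)
print("after: ", after)
```

The first entry usually grows between the two calls. If an automatic collection happens to run in between, it is reset instead, which is why the sketch does not rely on a specific value.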

The gc module has thresholds for automatic garbage collection: a tuple of length 3 returned by gc.get_threshold(), such as (700, 10, 10).
Each time the counter increases, the gc module checks whether the new count has reached the threshold. If it has, it runs the garbage check for the corresponding generation and resets the counter.
For example, suppose the threshold is (700, 10, 10):

      • When the counter increases from (699, 3, 0) to (700, 3, 0), the gc module runs gc.collect(0), which checks first-generation objects for garbage, and resets the counter to (0, 4, 0)
      • When the counter increases from (699, 9, 0) to (700, 9, 0), the gc module runs gc.collect(1), which checks first- and second-generation objects for garbage, and resets the counter to (0, 0, 1)
      • When the counter increases from (699, 9, 9) to (700, 9, 9), the gc module runs gc.collect(2), which checks first-, second- and third-generation objects for garbage, and resets the counter to (0, 0, 0)
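The three cases above follow one rule, which can be written out as a small simulation. This is a toy model of the rules as this article describes them, not CPython's actual code (whose exact comparisons may differ); ToyGcCounter and its allocate method are hypothetical names.

```python
class ToyGcCounter:
    """Toy model of the counter rules described above (not CPython's code)."""

    def __init__(self, count, threshold=(700, 10, 10)):
        self.count = list(count)
        self.threshold = threshold

    def allocate(self):
        """Record one allocation; return the generation collected, or None."""
        self.count[0] += 1
        if self.count[0] < self.threshold[0]:
            return None
        # A first-generation check is due. It would bump the next counter,
        # so keep escalating while that bump reaches the next threshold.
        generation = 0
        while (generation < 2 and
               self.count[generation + 1] + 1 >= self.threshold[generation + 1]):
            generation += 1
        # Collecting generation N resets counters 0..N and bumps counter N+1.
        for i in range(generation + 1):
            self.count[i] = 0
        if generation < 2:
            self.count[generation + 1] += 1
        return generation


for start in [(699, 3, 0), (699, 9, 0), (699, 9, 9)]:
    counter = ToyGcCounter(start)
    collected = counter.allocate()
    print(start, "-> gc.collect(%d) ->" % collected, tuple(counter.count))
# (699, 3, 0) -> gc.collect(0) -> (0, 4, 0)
# (699, 9, 0) -> gc.collect(1) -> (0, 0, 1)
# (699, 9, 9) -> gc.collect(2) -> (0, 0, 0)
```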

To be continued ... (an extended look at the gc module)
