Python memory management


Python's memory management is based on reference counting: when an object's reference count drops to 0, its memory is reclaimed.

To explore how objects are stored in memory, we can use Python's built-in function id(). It returns the identity of an object; in CPython, the identity is the object's memory address. To check whether a and b refer to the same object (which is different from checking whether their values are equal), use the is operator. For example, with a = "good" and b = "good", print(a is b) prints True.

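a = 1
b = 1
print(a is b)  # True

a = "good"
b = "good"
print(a is b)  # True

a = "It's a very good day"
b = "It's a very good day"
print(a is b)  # False in the interactive interpreter (may be True in a script, where equal constants can be shared)

a = []
b = []
print(a is b)  # False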

This is because CPython caches small integers (from -5 to 256) and interns short, identifier-like strings, so only one copy of each exists. For example, every reference to the integer 1 points to the same object: an assignment statement only creates a new reference, not a new object. Longer strings and other objects can exist as several distinct objects with equal values, so an assignment statement can create a new object.
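A small sketch of these caches, assuming CPython (small-integer caching and string interning are implementation details, not language guarantees). The values are built at run time with int() and str.join() so that the compiler cannot merge them into one shared constant; sys.intern() forces two equal strings to share a single object.

```python
import sys

# Build the values at run time so the compiler cannot fold them into one shared constant.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")
print(small_a is small_b)  # True: CPython pre-allocates the integers -5..256
print(big_a is big_b)      # False: each int() call creates a new object

long_a = "".join(["It's a very ", "good day"])
long_b = "".join(["It's a very ", "good day"])
print(long_a == long_b)    # True: equal values
print(long_a is long_b)    # False: two distinct string objects

# sys.intern() returns the single canonical copy of an equal string.
print(sys.intern(long_a) is sys.intern(long_b))  # True
```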

In Python, every object keeps a count of the references pointing to it: its reference count. We can use getrefcount() from the sys module to see an object's reference count. Note that passing a reference to getrefcount() as an argument creates a temporary extra reference, so getrefcount() returns one more than expected. (The values in the comments below are typical CPython output and can vary between versions.)

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))  # 2

b = a
print(getrefcount(b))  # 3

When an object a is referenced by another object b, the reference count of a increases by 1 for each such reference:

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))  # 2

b = [a, a]
print(getrefcount(a))  # 4

References between container objects can form a complex topology. We can use the objgraph package to draw these reference relationships, for example:

import objgraph

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]

objgraph.show_refs([z], filename='ref_topo.png')

objgraph is a third-party Python package. Install xdot first, then objgraph:

sudo apt-get install xdot
sudo pip install objgraph

Two objects may reference each other, forming a so-called reference cycle. For example:

a = []
b = [a]
a.append(b)

Even a single object can form a reference cycle simply by referring to itself:

from sys import getrefcount

a = []
a.append(a)
print(getrefcount(a))  # 3

An object's reference count can also decrease. For example, the del keyword deletes a reference:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))  # 3

del a
print(getrefcount(b))  # 2

a = [1, 2, 3]
print(getrefcount(a))  # 2

b = [a, a]
print(getrefcount(a))  # 4
print(getrefcount(b))  # 2

If a reference points to object A and is then redirected to a different object B, the reference count of A decreases:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))  # 3

a = 1
print(getrefcount(b))  # 2

Eating too much always makes you fatter, and Python is no exception: as a program creates more and more objects, they occupy more and more memory. But there is no need to worry too much about Python's figure. At the right moments it "loses weight" by starting garbage collection, which clears away objects that are no longer used. Many languages have garbage collection, for example Java and Ruby. Although the goal is always a slim figure, the weight-loss programs of different languages differ considerably.

In principle, when the reference count of an object in Python falls to 0, no references to it remain, and the object becomes garbage to be reclaimed. For example, when a new object is assigned to a name, its reference count becomes 1. If that reference is then deleted and the count drops to 0, the object can be garbage collected. Take the following list:

a = [1, 2, 3]
del a

After del a, nothing refers to the list [1, 2, 3] created earlier, and the program has no way to reach or use this object. If it stayed in memory, it would become unhealthy fat. In CPython, such an object is in fact freed as soon as its reference count reaches 0; the periodic garbage collection described next exists mainly for objects whose count never reaches 0 (see reference cycles below).
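A minimal sketch showing that, in CPython, an object whose reference count reaches 0 is freed at once, even with the cyclic collector switched off. Plain lists cannot be weakly referenced, so a small hypothetical class Payload stands in for the list, and a weakref.ref serves as a probe that reports whether the object still exists.

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for [1, 2, 3]; lists do not support weak references."""
    def __init__(self, data):
        self.data = data

gc.disable()  # turn off the cyclic collector to show it is not needed here
try:
    a = Payload([1, 2, 3])
    probe = weakref.ref(a)    # a weak reference does not raise the reference count
    print(probe() is a)       # True: the object is alive
    del a                     # the only strong reference goes away, count drops to 0
    freed = probe() is None
finally:
    gc.enable()

print(freed)  # True: freed immediately by reference counting
```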

While garbage collection runs, Python cannot perform other tasks, so frequent garbage collection would greatly reduce Python's efficiency. If there are not many objects in memory, there is no need to start garbage collection all the time. Python therefore starts garbage collection automatically only under certain conditions: while running, it records the number of object allocations and object deallocations, and starts garbage collection when the difference between the two exceeds a certain threshold.

We can view this threshold with the get_threshold() function of the gc module:

import gc

print(gc.get_threshold())  # (700, 10, 10); defaults may differ in other CPython versions

The result is (700, 10, 10). The two trailing 10s are thresholds related to generational collection, explained below. 700 is the threshold that starts garbage collection. The thresholds can be reset with set_threshold() in the gc module.

Besides automatic garbage collection, you can start a collection manually with gc.collect(). Now for the two 10s, which are thresholds for generational collection. What is generational collection? Python uses a generational strategy, based on the assumption that the longer an object has survived, the less likely it is to become garbage later in the program. Programs tend to create a large number of objects, many of which appear and disappear quickly, while some are used for a long time. For efficiency, we trust that these long-lived objects remain useful, so garbage collection scans them less often.

Python divides all objects into three generations: 0, 1 and 2. All new objects belong to generation 0. When an object survives a garbage collection of its generation, it is moved to the next generation. Each time garbage collection starts, all generation-0 objects are scanned. After generation 0 has been collected a certain number of times, a scan of generations 0 and 1 is triggered. After generation 1 has been collected a certain number of times, generations 0, 1 and 2 are all scanned, that is, every object.

These two counts are the two 10s in the (700, 10, 10) returned by get_threshold() above: after every 10 collections of generation 0 there is one collection of generation 1, and after every 10 collections of generation 1 there is one collection of generation 2.

You can also adjust them with set_threshold(), for example to scan generation-2 objects more often:

import gc

gc.set_threshold(700, 10, 5)
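The thresholds can be read and changed at run time. A small sketch, assuming CPython: it saves the current thresholds, applies the (700, 10, 5) example above, and restores the originals afterwards so the rest of the program is unaffected. gc.get_count() returns the current counters that are compared against those thresholds; their exact values depend on what the interpreter has been doing.

```python
import gc

original = gc.get_threshold()
print("default thresholds:", original)  # (700, 10, 10) on many CPython versions

gc.set_threshold(700, 10, 5)   # collect generation 2 more often
changed = gc.get_threshold()
print("changed thresholds:", changed)
print("current counters:", gc.get_count())  # one counter per generation

gc.set_threshold(*original)    # restore the previous setting
```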

Reference cycles cause serious trouble for the mechanism above. A reference cycle may consist of objects that the program can no longer use but whose reference counts are not 0:

a = []
b = [a]
a.append(b)

del a
del b

We first create two list objects that reference each other, forming a reference cycle. After the references a and b are deleted, the program can no longer reach these two objects. However, because of the cycle, neither object's reference count drops to 0, so reference counting alone never reclaims them.
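A short sketch of this situation, assuming CPython. A hypothetical Node class replaces the two lists so that a weakref.ref probe can watch one of the objects. With automatic collection disabled, deleting a and b leaves the cycle alive; an explicit gc.collect() then finds and frees it. The exact number returned by gc.collect() varies between versions, so it is only printed.

```python
import gc
import weakref

class Node:
    """Hypothetical container standing in for the two lists in the text."""
    def __init__(self):
        self.refs = []

gc.disable()  # keep the automatic collector from running in the middle of the demo
try:
    a = Node()
    b = Node()
    b.refs.append(a)   # b -> a
    a.refs.append(b)   # a -> b: the two objects now form a reference cycle
    probe = weakref.ref(a)

    del a
    del b
    survived_del = probe() is not None   # each object is still referenced by the other

    found = gc.collect()                 # the cyclic collector detects the unreachable cycle
    survived_collect = probe() is not None
finally:
    gc.enable()

print(survived_del, survived_collect)  # True False
print("unreachable objects found:", found)
```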

To reclaim such reference cycles, Python copies the reference count of each object into a separate value, which we can call gc_ref; for object i, the copy is gc_ref_i. Python then iterates over all objects i, and for each object j referenced by i, it decrements gc_ref_j by 1. After the traversal, objects whose gc_ref is not 0 must be kept, along with the objects they reference and everything reachable further downstream. All other objects are garbage collected.

(1) For each container object, set a gc_refs value and initialize it to the object's reference count.

(2) For each container object, find all the objects it references and decrement the gc_refs value of each referenced object by 1.

(3) After step 2, every object whose gc_refs is still greater than 0 is referenced from outside the set of containers being examined, so at least one reference to it does not come from a cycle. These objects cannot be freed; move them to a separate set.

(4) If an object that cannot be freed in step 3 references other objects, those objects cannot be freed either, so move them to the separate set as well.

(5) The objects that remain are unreachable and can now be freed.
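The five steps can be simulated on a toy object graph. This is a hypothetical model, not CPython's implementation: objects are plain string names, refcount holds each object's total reference count, refs lists which objects each container points to, and find_unreachable() returns the set that the steps would free.

```python
def find_unreachable(refcount, refs):
    """Simulate the gc_refs algorithm on a toy graph (hypothetical model)."""
    # Step 1: copy each container's reference count into gc_refs.
    gc_refs = dict(refcount)

    # Step 2: subtract references that come from other containers in the set.
    for obj, targets in refs.items():
        for target in targets:
            if target in gc_refs:
                gc_refs[target] -= 1

    # Step 3: objects still above 0 are referenced from outside; keep them.
    keep = {obj for obj, count in gc_refs.items() if count > 0}

    # Step 4: everything reachable from a kept object must be kept too.
    stack = list(keep)
    while stack:
        obj = stack.pop()
        for target in refs.get(obj, ()):
            if target in gc_refs and target not in keep:
                keep.add(target)
                stack.append(target)

    # Step 5: whatever is left is unreachable and can be freed.
    return set(gc_refs) - keep


# A variable points to "x"; the cycle "x" <-> "y" is reachable through it.
# "a" <-> "b" is an isolated cycle, like the one built with a.append(b) in the text.
refcount = {"x": 2, "y": 1, "a": 1, "b": 1}
refs = {"x": ["y"], "y": ["x"], "a": ["b"], "b": ["a"]}
print(sorted(find_unreachable(refcount, refs)))  # ['a', 'b']
```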

Python is a dynamically typed language, in which objects and references are separate. This is very different from earlier procedural languages. To free memory effectively, Python has built-in garbage collection support. Because its main mechanism, reference counting, is relatively simple, Python must separately handle isolated reference cycles. In this respect Python has both features in common with other languages and features of its own. Understanding this memory management mechanism is an important step toward improving Python performance.

The gc module is the interface to Python's garbage collector. It can be used to enable and disable garbage collection, adjust the collection thresholds, and set debugging options.

If garbage collection has not been disabled, memory leaks in Python have two causes: either objects are referenced by objects with a longer lifetime, such as objects in global scope, or objects in a circular reference define __del__.

Garbage collection takes time, which may be unacceptable in scenarios that are sensitive to performance or memory. If you can break circular references yourself, garbage collection can be disabled.

The gc module's debug options make it easy to locate circular references, which can then be broken either by releasing them manually or by using weakref.

In Python, everything is an object, and objects are either mutable or immutable. The criterion is whether an object can be modified in place, where "in place" can be understood as "at the same address". An object's "address" can be viewed with id(): if the value is modified through a variable and the id does not change, the object is mutable; otherwise it is immutable.

Use == to check whether two variables are equal (have the same value), and is to check whether they point to the same object. For example, the variables a1 and a2 below both refer to an empty list: their values are equal, but they are not the same object.

>>> a1, a2 = [], []
>>> a1 == a2
True
>>> a1 is a2
False
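The id() test for mutability described above can be sketched as follows; the variable names are illustrative. Modifying a list in place keeps its identity, while "modifying" an int or a tuple binds the name to a new object.

```python
numbers = [1, 2]
before = id(numbers)
numbers.append(3)            # in-place change
list_same = id(numbers) == before

count = 1000
before = id(count)
count += 1                   # ints are immutable: a new int object is bound to count
int_same = id(count) == before

pair = (1, 2)
before = id(pair)
pair += (3,)                 # tuples are immutable: concatenation builds a new tuple
tuple_same = id(pair) == before

print(list_same, int_same, tuple_same)  # True False False
```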

Python has its own memory management mechanism that avoids frequently requesting and releasing memory from the operating system, which would be costly given the many small objects a program creates.

Python has its own memory buffer pool (layer 2) and object buffer pools (layer 3). Anyone who runs Python servers on Linux knows that Python does not immediately return freed memory to the operating system; this is because of the memory buffer pool. Immutable objects that are likely to be used frequently, such as small integers and short strings, are cached by Python in layer 3 to avoid repeatedly creating and destroying them.

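from sys import getrefcount

a = 1
print(getrefcount(a))  # e.g. 601; since Python 3.12, small ints are immortal and report a very large fixed count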

The reference count of the object 1 shows that Python's object buffer pool caches very common immutable objects, such as the integer 1 here.

What is a circular reference? It is an object that directly or indirectly references itself, so that the chain of references forms a ring.

In Python, objects that can hold references to other objects are called containers. Only containers can form circular references, and the garbage collector takes advantage of this when looking for objects to free. To keep track of all container objects, Python links every container into a doubly linked list, which makes inserting objects into and removing them from the set of containers fast and simple. Using this doubly linked list of all container objects, Python finds the objects to free during garbage collection with the following steps:

(1) For each container object, set a gc_refs value and initialize it to the object's reference count.

(2) For each container object, find all the objects it references and decrement their gc_refs values by 1.

(3) After step 2, every object whose gc_refs value is greater than 0 is referenced from outside the containers being examined, so at least one non-circular reference to it exists. These objects cannot be released; put them in a separate set.

(4) If an object that cannot be released in step 3 references other objects, those objects cannot be released either, so they also go into the separate set.

(5) The remaining objects are unreachable and can now be released.

About generational collection:

In addition, Python divides all objects into three generations, 0 to 2, according to how long they have lived. All newly created objects are assigned to generation 0. Objects that survive a garbage collection are moved to generation 1, and objects in generation 1 that survive a garbage collection are moved to generation 2. Python collects the different generations at different frequencies, which can be set with gc.set_threshold(threshold0[, threshold1[, threshold2]]). When the number of object allocations minus the number of deallocations exceeds threshold0, Python collects generation 0. Whenever generation 0 has been collected more than threshold1 times, generation 1 is collected as well. Similarly, when generation 1 has been collected more than threshold2 times, generation 2 is collected too.

Why generations? The algorithm is rooted in the weak generational hypothesis, which consists of two ideas: young objects usually die quickly (for example, the many objects that exist only in a local scope), while old objects are likely to live longer (for example, global objects, modules and classes).

That is the principle of garbage collection; the Python source code has the details. In practice, the collector also has to deal with __del__, weak references and so on, which makes it somewhat more complex.

Garbage collection is triggered in three situations:

The garbage collection threshold is reached, and the Python virtual machine runs it automatically

gc.collect() is called manually

The Python virtual machine exits
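One way to watch collections happen is gc.callbacks, a standard-library list of callables that CPython (3.3 and later) invokes with a phase ("start" or "stop") and an info dict before and after every collection. The sketch below, with a hypothetical log_collection callback, records a manually triggered collection; automatic collections triggered by the thresholds would be logged the same way.

```python
import gc

events = []

def log_collection(phase, info):
    """Hypothetical callback: remember the phase and which generation was collected."""
    events.append((phase, info["generation"]))

gc.callbacks.append(log_collection)
try:
    gc.collect()  # manual trigger: a full collection
finally:
    gc.callbacks.remove(log_collection)  # always unregister the callback

print(events)
```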

Garbage collection involves two important terms, reachable and collectable (and, correspondingly, unreachable and uncollectable), which come up often below.

Reachability is a property of Python objects: if an object can be found by following references from the root set, it is reachable; otherwise it is unreachable, which in practice means it exists only inside circular references. Python's garbage collection targets unreachable objects.

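Collectability is a property of unreachable objects: if such an object can be reclaimed, it is collectable; if it cannot, because an object in the circular reference defines __del__, it is uncollectable. Python's garbage collector does nothing with uncollectable objects, which causes a de facto memory leak. (This applies to Python 2 and to Python 3 before 3.4; since Python 3.4, PEP 442 allows the collector to reclaim cycles whose objects define __del__.)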
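On Python 3.4 and later, pure-Python objects with __del__ in a cycle are no longer uncollectable. A minimal sketch, assuming CPython 3.4+, with a hypothetical Resource class whose __del__ records that it ran: the cycle is collected, both finalizers are called, and nothing is left behind in gc.garbage.

```python
import gc

finalized = []

class Resource:
    """Hypothetical class with a finalizer; two instances will form a cycle."""
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)

gc.disable()  # make sure only the explicit collect() below reclaims the cycle
try:
    x = Resource("x")
    y = Resource("y")
    x.partner, y.partner = y, x   # reference cycle between objects that define __del__
    del x, y
    gc.collect()
finally:
    gc.enable()

print(sorted(finalized))                                  # ['x', 'y']
print(any(isinstance(o, Resource) for o in gc.garbage))   # False
```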

GC Module

The gc (garbage collector) module discussed here is part of the Python standard library and provides an interface to the garbage collection mechanism described in the previous section. With it you can turn the collector on and off, adjust how often collection runs, and output debugging information. The gc module underlies many other tools (such as the objgraph package), so its core API is introduced first.

gc.enable(); gc.disable(); gc.isenabled()

Enable garbage collection (it is enabled by default), disable it, and check whether it is enabled.

gc.collect()

Run a garbage collection. This works whether or not automatic collection is enabled.

gc.set_threshold(t0, t1, t2); gc.get_threshold()

Set the garbage collection thresholds, and get the current thresholds.

Note: gc.set_threshold(0) also disables automatic garbage collection.

gc.get_objects()

Returns all objects tracked by the garbage collector. This function is very basic! As long as the Python interpreter is running, a large number of objects are tracked by the collector, so calling this function is time-consuming.

For example, after starting Python from the command line:

>>> import gc
>>> len(gc.get_objects())
3749

gc.get_referents(*objs)

Returns the objects that the arguments refer to directly.

gc.get_referrers(*objs)

Returns all objects that refer directly to any of the arguments.
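A short sketch of the two functions, using illustrative variables: get_referents() looks at what an object points to, while get_referrers() searches all tracked objects for those pointing at it (which is slow, for the same reason as get_objects()). Identity checks with `is` are used because `in` would compare values.

```python
import gc

x = [1, 2, 3]
holder = {"key1": x}
outer = [x, holder]

# Objects that outer refers to directly: the list x and the dict holder.
referents = gc.get_referents(outer)
print(any(r is x for r in referents), any(r is holder for r in referents))  # True True

# Objects that refer directly to x include outer and holder.
referrers = gc.get_referrers(x)
print(any(r is outer for r in referrers), any(r is holder for r in referrers))  # True True
```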

gc.set_debug(flags)

Setting debug options is useful. Commonly used flags include:

gc.DEBUG_COLLECTABLE: print information about collectable objects found by the garbage collector

gc.DEBUG_UNCOLLECTABLE: print information about objects the garbage collector cannot free, that is, objects in cycles that define __del__

gc.DEBUG_SAVEALL: when this option is set, unreachable objects are not actually freed but appended to the gc.garbage list, which helps track down problems in a running service
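A minimal sketch of DEBUG_SAVEALL, assuming CPython: an isolated cycle of two hypothetical Leaky objects is collected while the flag is set, so instead of being freed the objects end up in gc.garbage, where they can be inspected. The flag is cleared and gc.garbage emptied afterwards, since saved objects would otherwise stay alive for the rest of the program.

```python
import gc

class Leaky:
    """Hypothetical object used to build an unreachable cycle."""
    def __init__(self):
        self.other = None

gc.set_debug(gc.DEBUG_SAVEALL)
try:
    first, second = Leaky(), Leaky()
    first.other, second.other = second, first   # reference cycle
    del first, second
    gc.collect()
    saved = [o for o in gc.garbage if isinstance(o, Leaky)]
    print(len(saved))  # 2: the cycle was saved, not freed
finally:
    gc.set_debug(0)      # turn the debug flag off again
    gc.garbage.clear()   # drop the saved objects so they can be freed
```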

Memory leaks

Since Python manages memory with reference counting and garbage collection, how can memory still leak? There are two situations:

The first is an object referenced by another object with a particularly long lifetime. For example, a web server may have a global singleton ConnectionManager that manages all connections. If a Connection that is no longer in use is never removed from the ConnectionManager, memory leaks.
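A minimal sketch of this first kind of leak and two ways out. Connection, LeakyConnectionManager, ExplicitConnectionManager, WeakConnectionManager and use_connection are hypothetical names, not a real library API. The leaky manager keeps a strong reference to every connection it has ever registered; the explicit one must be told to remove a connection; the weak one stores connections in weakref.WeakValueDictionary, so an entry disappears once nothing else uses the connection.

```python
import weakref

class Connection:
    """Hypothetical connection object."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class LeakyConnectionManager:
    """Global singleton that never forgets a connection: a memory leak."""
    def __init__(self):
        self.connections = {}

    def register(self, conn):
        self.connections[conn.conn_id] = conn

class ExplicitConnectionManager(LeakyConnectionManager):
    """Fix 1: remove the reference when the connection is no longer used."""
    def unregister(self, conn):
        self.connections.pop(conn.conn_id, None)

class WeakConnectionManager(LeakyConnectionManager):
    """Fix 2: hold connections weakly so the manager does not keep them alive."""
    def __init__(self):
        self.connections = weakref.WeakValueDictionary()

def use_connection(manager, release):
    """Register a connection, optionally unregister it, then drop the local reference."""
    conn = Connection(1)
    manager.register(conn)
    if release:
        manager.unregister(conn)

leaky = LeakyConnectionManager()
explicit = ExplicitConnectionManager()
weak = WeakConnectionManager()
use_connection(leaky, release=False)
use_connection(explicit, release=True)
use_connection(weak, release=False)

print(len(leaky.connections), len(explicit.connections), len(weak.connections))  # 1 0 0
```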

The second is objects in a circular reference that define a __del__ method, described in detail in the article "A List of Python Pitfalls and Defects Every Programmer Should Know". In short, if objects in a circular reference define __del__, the Python interpreter cannot determine the order in which to call the destructors, so it leaves them alone (this is the behavior before Python 3.4).

In any environment, whether server or client, memory leaks are a serious matter.

An online server must be monitored, with an alarm raised as soon as memory usage exceeds a set threshold; the earlier a leak is caught, the more can be saved. Of course, nobody wants to fix a memory leak in production, which is like changing the wheels of a moving car, so try to find and fix potential leaks in the development or stress-test environment. Finding the problem is the crucial part. Once it is found, it is easy to fix, because, as described above, there are only two kinds of memory leak. In the first case, remove the reference at the appropriate time. In the second case, either stop using __del__ and choose a different implementation, or break the circular reference.

So how do you find out where memory is leaking? The two main weapons are the gc and objgraph libraries.

The gc module was introduced above. In theory, it gives access to every object managed by the garbage collector and to the objects each one references and is referenced by, so you could draw a complete reference graph of all objects. In practice this is more complicated, because the process itself can inadvertently introduce new references. When a good wheel already exists, use it directly: objgraph.

Objgraph

The implementation of objgraph calls these gc functions: gc.get_objects(), gc.get_referents() and gc.get_referrers(), and uses them to build the reference relationships between objects. objgraph's code and documentation are well written and worth reading first.

Here are a few very useful APIs:

def count(typename)

Returns the number of objects of the given type. It actually gets the objects via gc.get_objects() and counts those of the specified type.

def by_type(typename)

Returns a list of the objects of that type. In an online project, this makes it easy to find a singleton object.

def show_most_common_types(limit=10)

Prints the top N (limit) types with the most instances. This function is very useful; the article "Python Memory Optimization" mentions that it can reveal objects whose memory use can be reduced with __slots__.

def show_growth()

Prints the types whose object counts have grown the most since the last call, which is very useful for spotting potential memory leaks. The function calls gc.collect() internally, so circular references do not distort the result.
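objgraph is a third-party package and may not be installed everywhere. The sketch below is a rough, stdlib-only approximation of count() and show_growth() built on gc.get_objects() as described above; type_counts(), growth() and the Session class are hypothetical helpers, not objgraph's API, and they only see objects tracked by the collector.

```python
import gc
from collections import Counter

def type_counts():
    """Count gc-tracked objects by type name (hypothetical helper)."""
    gc.collect()  # like show_growth(), collect first so dead cycles are not counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def growth(before, after, limit=10):
    """Return the types whose counts grew the most between two snapshots."""
    increases = {name: after[name] - before[name]
                 for name in after if after[name] > before[name]}
    return sorted(increases.items(), key=lambda item: item[1], reverse=True)[:limit]

class Session:
    """Hypothetical object that a buggy program keeps accumulating."""

before = type_counts()
leaked = [Session() for _ in range(100)]   # simulate a leak: references are never dropped
after = type_counts()
for name, delta in growth(before, after):
    print(f"{name:<20} +{delta}")
```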

Another, more convenient approach is weak references, through weakref, a standard library module that can be used to avoid circular references.
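A common way weakref avoids a cycle is a parent/child structure in which the child refers to its parent only weakly. A minimal sketch with a hypothetical TreeNode class, assuming CPython: even with the cyclic collector disabled, deleting the root frees it immediately, because no strong cycle exists.

```python
import gc
import weakref

class TreeNode:
    """Hypothetical tree node: children are strong references, the parent link is weak."""
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Dereference the weak link; None once the parent has been freed.
        return self._parent() if self._parent is not None else None

gc.disable()  # show that reference counting alone is enough here
try:
    root = TreeNode("root")
    leaf = TreeNode("leaf", parent=root)
    print(leaf.parent.name)     # root
    probe = weakref.ref(root)
    del root                    # no strong cycle, so the count drops to 0 at once
    root_freed = probe() is None
finally:
    gc.enable()

print(root_freed, leaf.parent)  # True None
```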

The weakref module provides the following useful APIs:

(1) weakref.ref(object, callback=None)

Creates a weak reference to object and returns a weakref object. If callback is given, it is called when object is about to be finalized; the standard library's logging package (logging/__init__.py) contains a usage example. Call the weak reference with () to dereference it; if the referent has already been deleted, the call returns None. For example:

import weakref


class OBJ(object):
    def f(self):
        print('HELLO')


if __name__ == '__main__':
    o = OBJ()
    w = weakref.ref(o)
    w().f()
    del o
    w().f()  # raises AttributeError: 'NoneType' object has no attribute 'f', because the referenced object has been deleted

(2) weakref.proxy(object, callback=None)

Creates a proxy and returns a weakproxy object; callback works as above. The proxy is used just like object itself; if object has been deleted, using the proxy raises ReferenceError: weakly-referenced object no longer exists.

# -*- coding: utf-8 -*-
import weakref


class OBJ(object):
    def f(self):
        print('HELLO')


if __name__ == '__main__':
    o = OBJ()
    w = weakref.proxy(o)
    w.f()
    del o
    w.f()  # raises ReferenceError: weakly-referenced object no longer exists

(3) weakref.WeakSet

This is a set of weak references: when an element of a WeakSet is garbage collected, it is removed from the set automatically. WeakSet is implemented with weakref.ref: when an object is added, it is wrapped in a weakref.ref whose callback removes it from the WeakSet. If you are interested, read the source code (_weakrefset.py). Here is an example:

# -*- coding: utf-8 -*-
import weakref


class OBJ(object):
    def f(self):
        print('HELLO')


if __name__ == '__main__':
    o = OBJ()
    ws = weakref.WeakSet()
    ws.add(o)
    print(len(ws))  # 1
    del o
    print(len(ws))  # 0
