Python in Depth 06: Python memory management in detail

Source: Internet
Author: User
Tags: garbage collection, memory usage in Python

Memory management is an important aspect of language design and a major factor in a language's performance. Whether it is manual management in C or garbage collection in Java, the approach to memory is one of a language's defining features. This article uses Python as an example of how a dynamically typed, object-oriented language manages memory.

Object memory usage

Assignment is one of the most common statements in any language. But even the simplest assignment can carry a lot of meaning, and Python's assignment statement is well worth studying.

a = 1

The integer 1 is an object, and a is a reference. The assignment statement makes the reference a point to the object 1. Python is a dynamically typed language (see the article on dynamic typing), and objects are separate from references. Python uses references like "chopsticks" to touch and turn over the real food, the objects.

References and objects

To explore how objects are stored in memory, we can turn to Python's built-in function id(). It returns the identity of an object. In CPython, this identity is the memory address of the object.

a = 1

print(id(a))
print(hex(id(a)))

On my computer, they print the following:

11246696
0xab9c68

These are the decimal and hexadecimal representations of the memory address, respectively.

Python caches small integers and short strings so that these objects can be reused. When we create several references to the value 1, all of them actually point to the same object.

a = 1
b = 1

print(id(a))
print(id(b))

The above program returns

11246696
11246696

As we can see, a and b are actually two references that point to the same object.

To check whether two references point to the same object, we can use the is keyword. is tests whether two references refer to the same object.

# True
a = 1
b = 1
print(a is b)

# True
a = 'good'
b = 'good'
print(a is b)

# False when each line is entered separately in the interactive interpreter;
# in a script, the compiler may merge identical constants and give True
a = "very good morning"
b = "very good morning"
print(a is b)

# False
a = []
b = []
print(a is b)

The comments above show the corresponding results. Because CPython caches small integers (from -5 to 256) and short strings, each such object exists only once. For example, all references to the integer 1 point to the same object, and an assignment creates only a new reference, not a new object. Long strings and other objects can exist as several equal copies, and an assignment can create a new object. This caching is an implementation detail, so use == rather than is to compare values.
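The difference between identity and equality can be sketched as follows. The objects are built at run time so that the compiler cannot merge identical constants. The results marked "on CPython" reflect caching in that implementation and are not a language guarantee. sys.intern() is the standard-library way to ask for a shared string explicitly.

```python
import sys

# Small integers built at run time still come from CPython's cache.
small_a = int("100")
small_b = int("100")
print(small_a is small_b)   # True on CPython (implementation detail)

# Larger integers built at run time are separate objects with equal values.
big_a = int("100000")
big_b = int("100000")
print(big_a == big_b)       # True
print(big_a is big_b)       # False on CPython

# Strings built at run time compare equal by value ...
text_a = " ".join(["very", "good", "morning"])
text_b = " ".join(["very", "good", "morning"])
print(text_a == text_b)     # True

# ... and sys.intern() returns one shared object for equal strings.
print(sys.intern(text_a) is sys.intern(text_b))   # True
```

In everyday code, == is the right tool for comparing values; is fits checks such as x is None.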

In Python, each object keeps a count of the references that point to it, called the reference count.

We can use getrefcount() from the sys module to see the reference count of an object. Note that when a reference is passed to getrefcount() as an argument, the argument itself creates a temporary reference. As a result, getrefcount() returns 1 more than expected.

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = a
print(getrefcount(b))

For the reason above, the two getrefcount() calls return 2 and 3 instead of the expected 1 and 2.
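Because the absolute numbers include the temporary reference and can vary between interpreter versions, it is often clearer to look at how the count changes. The following sketch compares counts before and after adding and removing a reference; the variable names are only illustrative.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)

alias = data                      # one more reference to the same list
after_alias = sys.getrefcount(data)

del alias                         # the extra reference is gone again
after_del = sys.getrefcount(data)

print(after_alias - before)       # 1
print(after_del - before)         # 0
```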

Objects referencing objects

Python's container objects, such as lists and dictionaries, can contain multiple objects. In fact, a container holds not the element objects themselves but a reference to each element object.
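A short sketch makes this visible: a list that contains the same inner list twice holds two references to one object, so a change made through any name shows up everywhere. The names inner and outer are only illustrative.

```python
inner = [1, 2]
outer = [inner, inner]        # two references to one list, not two copies

inner.append(3)               # change the object through the name `inner`

print(outer)                  # [[1, 2, 3], [1, 2, 3]]
print(outer[0] is inner)      # True
print(outer[0] is outer[1])   # True
```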

We can also define our own object that refers to other objects:

class From_obj(object):
    def __init__(self, to_obj):
        self.to_obj = to_obj

b = [1, 2, 3]
a = From_obj(b)
print(id(a.to_obj))
print(id(b))

The two ids are the same: a refers to the object that b points to.

Objects referencing objects is the most basic way Python is put together. Even the assignment a = 1 actually makes the entry with key "a" in a dictionary refer to the integer object 1. This dictionary object records all global references. We can view it with the built-in function globals().
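To see that an assignment is an entry in a dictionary, the sketch below runs an assignment against an explicit dictionary with exec() and then reads the entry back. The name namespace is an illustrative assumption; at module level, globals() returns the dictionary that plays the same role.

```python
# A plain dictionary used as the global namespace for one statement.
namespace = {}
exec("a = 1", namespace)

print(namespace["a"])               # 1

# globals() returns the dictionary that plays this role for the current module.
print(isinstance(globals(), dict))  # True
```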

When an object A is referenced by another object B, the reference count of A is incremented by 1.

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = [a, a]
print(getrefcount(a))

Object b references a twice, so the reference count of a increases by 2.

References among container objects can form complex topologies. We can use the objgraph package to draw their reference relationships, for example:

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]

import objgraph
objgraph.show_refs([z], filename='ref_topo.png')

objgraph is a third-party Python package. You need to install xdot before installing it.

sudo apt-get install xdot
sudo pip install objgraph
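If installing a third-party package is not an option, the same structure can be inspected without drawing it. The sketch below swaps in a different technique, gc.get_referents() from the standard library, which returns the objects that a container refers to directly.

```python
import gc

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]

# Objects that z refers to directly: the list y and the tuple (x, y).
referents = gc.get_referents(z)
print(len(referents))                                 # 2
print(any(obj is y for obj in referents))             # True

# Objects that y refers to directly include the list x.
print(any(obj is x for obj in gc.get_referents(y)))   # True
```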

Two objects may reference each other, forming a so-called reference cycle.

a = []
b = [a]
a.append(b)

Even a single object can form a reference cycle just by referencing itself.

from sys import getrefcount

a = []
a.append(a)
print(getrefcount(a))

Reference cycles cause a lot of trouble for the garbage collection mechanism, which I'll describe in more detail later.

Decreasing reference counts

The reference count of an object can also decrease. For example, you can use the del keyword to delete a reference:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

del a
print(getrefcount(b))

del can also be used to delete an element from a container, for example:

a = [1, 2, 3]
del a[0]
print(a)

If a reference points to object A, the reference count of A also decreases when the reference is redirected to a different object B:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

a = 1
print(getrefcount(b))

Garbage collection

Eat too much and you get fat, and the same goes for Python. As a program creates more and more objects, they take up more and more memory. But you don't have to worry too much about Python's figure: it "loses weight" at the right moments by starting garbage collection and clearing out useless objects. Many languages have garbage collection mechanisms, such as Java and Ruby. Although the ultimate goal is the same slim figure, the weight-loss programs of different languages differ significantly (compare this article with the one on Java memory management and garbage collection).

In principle, when the reference count of a Python object drops to 0, no reference points to the object any more, and the object becomes garbage to be collected. For example, when a new object is assigned to a reference, the object's reference count becomes 1. If the reference is deleted, the count drops to 0 and the object can be garbage collected. Take the following list:

a = [1, 2, 3]
del a

After del a, no reference points to the previously created list [1, 2, 3]. The program can no longer reach or use this object in any way. If it stayed in memory, it would be unhealthy fat. In CPython, the memory of such an object is released as soon as its reference count reaches 0; the collection passes described next exist for objects that reference counting alone cannot free.
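One way to observe an object disappearing is a weak reference, which watches an object without adding to its reference count. The sketch below is a minimal illustration; Payload is a hypothetical placeholder class (plain lists cannot be weakly referenced), and the immediate release it shows is CPython behavior that other implementations may not share.

```python
import weakref

class Payload:
    """Hypothetical placeholder class; plain lists cannot be weakly referenced."""

obj = Payload()
watcher = weakref.ref(obj)      # does not increase the reference count

print(watcher() is obj)         # True: the object is still alive
del obj                         # remove the only strong reference
print(watcher() is None)        # True on CPython: the object is already gone
```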

However, losing weight is expensive and laborious. Python cannot do other work while it is collecting garbage, so frequent collection would greatly reduce Python's efficiency. If there are not many objects in memory, there is no need to run garbage collection all the time. Python therefore starts garbage collection automatically only under certain conditions. While running, Python records the number of object allocations and object deallocations. Garbage collection starts when the difference between the two exceeds a threshold.

We can view this threshold with the get_threshold() function of the gc module:

import gc
print(gc.get_threshold())

This returns (700, 10, 10). The two 10s are thresholds related to generational collection, described below. 700 is the threshold at which garbage collection starts. The thresholds can be reset with the set_threshold() function of gc.

We can also start garbage collection manually with gc.collect().
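The calls mentioned here can be tried together in a few lines. This is only a sketch for exploration: the printed values depend on the interpreter version and on what the program has done so far, so no particular numbers should be expected.

```python
import gc

print(gc.isenabled())        # whether automatic collection is switched on
print(gc.get_threshold())    # e.g. (700, 10, 10); defaults can differ by version
print(gc.get_count())        # current counters for the three generations

unreachable = gc.collect()   # run a full collection now
print(unreachable)           # number of unreachable objects found
```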

Generational collection

Python also uses a generational collection strategy. Its basic assumption is that the longer an object has survived, the less likely it is to become garbage later in the program. Programs tend to create many objects; many of them appear and disappear quickly, but some are used for a long time. For the sake of efficiency, we trust such "long-lived" objects to remain useful and scan them less often during garbage collection.

Python divides all objects into three generations: 0, 1 and 2. All new objects belong to generation 0. When an object survives a garbage collection of its generation, it is moved to the next generation. Every garbage collection scans all generation 0 objects. After generation 0 has been collected a certain number of times, a collection that scans generations 0 and 1 is started. After generation 1 has been collected a certain number of times, a collection of generations 0, 1 and 2 is started, that is, all objects are scanned.

These two numbers are the two 10s in the (700, 10, 10) returned by get_threshold() above. That is, every 10 collections of generation 0 are accompanied by 1 collection of generation 1, and only every 10 collections of generation 1 is there 1 collection of generation 2.
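The arithmetic of this schedule can be sketched with a toy model. The function count_scans and its parameters are hypothetical and follow only the simplified description in this article; they are not how the interpreter implements its counters. The model counts how deep each collection pass scans.

```python
def count_scans(passes, threshold1=10, threshold2=10):
    """Count how deep each of `passes` collection passes scans (toy model)."""
    scans = {"gen 0 only": 0, "gen 0-1": 0, "gen 0-2": 0}
    passes_since_gen1 = 0
    gen1_since_gen2 = 0
    for _ in range(passes):
        passes_since_gen1 += 1
        if passes_since_gen1 < threshold1:
            scans["gen 0 only"] += 1
            continue
        passes_since_gen1 = 0          # every threshold1-th pass goes deeper
        gen1_since_gen2 += 1
        if gen1_since_gen2 < threshold2:
            scans["gen 0-1"] += 1
            continue
        gen1_since_gen2 = 0            # every threshold2-th deeper pass is full
        scans["gen 0-2"] += 1
    return scans

print(count_scans(100))
# {'gen 0 only': 90, 'gen 0-1': 9, 'gen 0-2': 1}
```

Under these assumptions, only 1 pass in 100 scans every object, which is the point of treating long-lived objects separately.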

The thresholds can likewise be adjusted with set_threshold(), for example to scan generation 2 objects more frequently.

import gc
gc.set_threshold(700, 10, 5)

Isolated reference cycles

Reference cycles cause a lot of trouble for the garbage collection mechanism described above. Such cycles can consist of objects that are unusable but whose reference counts are not 0.

a = []
b = [a]
a.append(b)

del a
del b

We first create two list objects that reference each other, forming a reference cycle. After the references a and b are deleted, the two objects can no longer be reached from the program and are useless. However, because of the cycle, neither object's reference count drops to 0, so reference counting alone never reclaims them.
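The effect can be observed with a weak reference, which watches an object without keeping it alive. In this sketch, Node is a hypothetical placeholder class (plain lists cannot be weakly referenced), and automatic collection is switched off during the demonstration so that only the explicit gc.collect() call can reclaim the cycle.

```python
import gc
import weakref

class Node:
    """Hypothetical placeholder class that can hold a reference to a partner."""
    def __init__(self):
        self.partner = None

gc.collect()                 # start from a clean state
gc.disable()                 # keep automatic collection out of the demonstration

a = Node()
b = Node()
a.partner = b
b.partner = a                # a and b now form a reference cycle
watcher = weakref.ref(a)

del a
del b
alive_after_del = watcher() is not None
print(alive_after_del)       # True: the cycle keeps both objects alive

gc.collect()                 # the cycle detector finds the isolated cycle
alive_after_collect = watcher() is not None
print(alive_after_collect)   # False

gc.enable()
```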

[Figure: an isolated reference cycle]

To reclaim such a reference cycle, Python makes a copy of each object's reference count, which we can call gc_ref. Suppose the copy for each object i is gc_ref_i. Python then traverses every object i. For each object j that object i references, it subtracts 1 from gc_ref_j.

[Figure: the result after traversal]

After the traversal, the objects whose gc_ref is not 0, the objects they reference, and the objects referenced further downstream must be kept. All other objects are garbage collected.
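The two steps described above can be sketched on a small hand-written object graph. This is a toy model, not the interpreter's implementation: find_garbage, the object names and the dictionaries of counts and references are all illustrative assumptions.

```python
def find_garbage(ref_counts, references):
    """Return the names of objects that the described traversal would discard.

    ref_counts: name -> total reference count of that object
    references: name -> names of the objects it references
    """
    # Step 1: copy the counts, then subtract every reference between objects.
    gc_ref = dict(ref_counts)
    for targets in references.values():
        for target in targets:
            gc_ref[target] -= 1

    # Step 2: objects with gc_ref > 0 are referenced from outside; keep them
    # and everything that can be reached from them.
    keep = set()
    stack = [name for name, count in gc_ref.items() if count > 0]
    while stack:
        name = stack.pop()
        if name not in keep:
            keep.add(name)
            stack.extend(references.get(name, []))
    return set(ref_counts) - keep

# a and b reference only each other; c is held by a variable and references d.
ref_counts = {"a": 1, "b": 1, "c": 1, "d": 1}
references = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}

print(sorted(find_garbage(ref_counts, references)))   # ['a', 'b']
```

Object d ends the first step with a gc_ref of 0, yet it is kept because c, which has an outside reference, still points to it.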

Summary

Python is a dynamically typed language in which objects and references are separate. This is very different from procedural languages. To free memory effectively, Python has built-in support for garbage collection. Python uses a relatively simple garbage collection mechanism, reference counting, and therefore has to deal with the problem of isolated reference cycles.

Python has things in common with other languages as well as its own peculiarities. Understanding its memory management mechanism is an important step toward improving the performance of Python programs.
