Python memory management

Source: Internet
Author: User
Memory management is an important aspect of language design and a major factor in a language's performance. Manual memory management in C and garbage collection in Java are among the defining features of those languages. This article uses Python as an example to describe how memory is managed in a dynamic, object-oriented language.

Memory usage of objects

Assignment is one of the most common statements in any language, yet even the simplest assignment can reveal a lot. Python's assignment statement is worth studying.

a = 1


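The integer 1 is an object, and a is a reference. The assignment statement makes the reference a point to the object 1. Python is a dynamically typed language in which objects and references are separate. Python uses references like chopsticks: it touches and turns over the real food, the object, only through them.

The built-in function id() returns the identity of an object, which in CPython is its memory address.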

a = 1
print(id(a))
print(hex(id(a)))


On my computer, they return:

11246696

'0xab9c68'

These are the memory address of the object in decimal and hexadecimal notation, respectively.

Python caches small integers and short strings for reuse. When we create multiple references to the integer 1, all of them actually point to the same object.

a = 1
b = 1
print(id(a))
print(id(b))



The above program returns

11246696

11246696

It can be seen that a and b are actually two references to the same object.

To check whether two references point to the same object, we can use the is keyword. is tests whether the objects behind the two references are the same object.

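# True
a = 1
b = 1
print(a is b)

# True
a = "good"
b = "good"
print(a is b)

# False when typed line by line in the interactive interpreter
# (a script may print True, because equal literals compiled together can be merged)
a = "very good morning"
b = "very good morning"
print(a is b)

# False
a = []
b = []
print(a is b)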



The comments above show the corresponding results. As you can see, because Python caches small integers and short strings, each of these objects has only one copy. For example, all references to the integer 1 point to the same object. Even an assignment statement only creates a new reference, not a new object. Long strings and other objects, however, can exist as multiple equal copies, and an assignment statement can create a new object.

In Python, each object keeps the total number of references that point to it, that is, its reference count.

We can use getrefcount() in the sys module to view the reference count of an object. Note that when a reference is passed to getrefcount() as an argument, the argument itself creates a temporary reference. Therefore, the getrefcount() result is 1 more than expected.
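The difference between equal objects and the same object can be sketched as follows. This is a minimal illustration that assumes CPython, where small integers are cached. The values are built at run time so that the compiler cannot merge equal literals, and all variable names are made up for the example.

```python
# Values are built at run time so the compiler cannot merge equal literals.
small_a = int("7")
small_b = int("7")
list_a = list(range(3))
list_b = list(range(3))

# == compares values; is compares the objects themselves.
same_small = small_a is small_b   # cached small integer (CPython behavior)
same_list = list_a is list_b      # two separately built lists

print(small_a == small_b, same_small)
print(list_a == list_b, same_list)
```

The two lists compare equal but are distinct objects, while the two small integers are expected to be one shared object on CPython.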

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = a
print(getrefcount(b))



For this reason, the two getrefcount() calls return 2 and 3 instead of the expected 1 and 2.

Objects referencing objects

A Python container object, such as a list or a dictionary, can contain multiple objects. In fact, what the container holds is not the element objects themselves but references to each element object.

We can also define our own class whose instances reference other objects:

class from_obj(object):
    def __init__(self, to_obj):
        self.to_obj = to_obj

b = [1, 2, 3]
a = from_obj(b)
print(id(a.to_obj))
print(id(b))



The two ids are the same, so a references the object that b points to.

Object references are the most basic building block of Python. Even the statement a = 1 actually makes the value stored under the key "a" in a dictionary reference the integer object 1. This dictionary records all global references, and you can view it with the built-in function globals().
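The idea that an assignment stores a name in a dictionary can be sketched with exec() and an explicit dictionary. Here namespace is a hypothetical stand-in for the dictionary that globals() returns for a real module. It is not how a module is normally written, only a way to look at the mechanism in isolation.

```python
namespace = {}                 # hypothetical stand-in for a module's global dictionary
exec("a = 1", namespace)       # run the assignment with that dictionary as its globals

value = namespace["a"]         # the key "a" now references the integer object 1
# exec() also adds a "__builtins__" entry, so filter out dunder keys.
names = [key for key in namespace if not key.startswith("__")]
print(names, value)
```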

When object A is referenced by another object B, the reference count of object A increases by 1.

from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = [a, a]
print(getrefcount(a))



Because object b references a twice, the reference count of a increases by 2.

References between container objects may form a complex topology. We can use the objgraph package to draw the reference relationships, for example:

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]

import objgraph
objgraph.show_refs([z], filename='ref_topo.png')



objgraph is a third-party package. It can be installed, together with xdot, as follows:

sudo apt-get install xdot
sudo pip install objgraph
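If installing a third-party package is not an option, the standard gc module can list the objects that a container refers to directly. The sketch below rebuilds the same x, y and z and uses gc.get_referents(); the describe helper is a made-up name for this illustration, and the result is a flat list rather than a drawing.

```python
import gc

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]

def describe(obj):
    """Return the type names of the objects that obj refers to directly."""
    return [type(ref).__name__ for ref in gc.get_referents(obj)]

z_refs = describe(z)   # z holds a list and a tuple
y_refs = describe(y)   # y holds a list and a dict
print(sorted(z_refs), sorted(y_refs))
```

The order in which referents are reported is not something to rely on, which is why the names are sorted before printing.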


Two objects may reference each other, forming a reference cycle.

a = []
b = [a]
a.append(b)



Even a single object that references itself forms a reference cycle.

from sys import getrefcount

a = []
a.append(a)
print(getrefcount(a))



Reference cycles cause a lot of trouble for the garbage collection mechanism, as described later in this article.

Reducing references

The reference count of an object may be reduced. For example, you can use the del keyword to delete a reference:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

del a
print(getrefcount(b))



del can also be used to delete an element from a container, for example:

a = [1, 2, 3]
del a[0]
print(a)



If a reference points to object A and is then redirected to another object B, the reference count of object A decreases:

from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

a = 1
print(getrefcount(b))
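The increases and decreases described so far can be checked together by comparing counts against a baseline. Absolute getrefcount() values can vary between interpreter versions, so this sketch only looks at differences. The names data, alias and holder are invented for the illustration.

```python
from sys import getrefcount

data = [1, 2, 3]
base = getrefcount(data)                 # baseline, including the temporary argument reference

alias = data                             # one more name
after_alias = getrefcount(data) - base

holder = [data, data]                    # a container referencing data twice
after_holder = getrefcount(data) - base

del alias                                # remove the extra name
holder.clear()                           # drop both references held by the container
after_cleanup = getrefcount(data) - base

print(after_alias, after_holder, after_cleanup)
```

Because every measurement carries the same temporary reference, it cancels out in the subtraction.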


Garbage collection

If you eat too much you get fat, and the same goes for Python. As a Python program creates more and more objects, they occupy more and more memory. But you don't have to worry too much about Python's figure: at the appropriate time it will "lose weight" by starting garbage collection and clearing out useless objects. Garbage collection exists in many languages, such as Java and Ruby. Although the ultimate goal is the same slim figure, the weight-loss plans differ greatly between languages (compare this article with one on Java memory management and garbage collection).

In principle, when the reference count of an object in Python drops to 0, no reference points to the object any more and the object becomes garbage to be reclaimed. For example, when a newly created object is assigned to a reference, its reference count becomes 1. If that reference is deleted, the reference count drops to 0 and the object can be reclaimed. Take the following list:

a = [1, 2, 3]
del a


After del a, no reference points to the [1, 2, 3] list created earlier, and the program can no longer access or use it in any way. Left in memory, this object would be unhealthy fat. Python reclaims the memory occupied by objects whose reference count is 0; in CPython this happens as soon as the count reaches 0.

However, losing weight is expensive and laborious. While garbage collection runs, Python cannot do other work, so frequent collection would greatly reduce Python's efficiency. If there are not many objects in memory, there is no need to start garbage collection at all. Python therefore starts garbage collection automatically only under certain conditions. While running, Python records the number of object allocations and object deallocations, and garbage collection starts only when the difference between the two exceeds a threshold.

We can view the threshold with the get_threshold() function of the gc module:
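One way to watch an object disappear once nothing references it is a weak reference, which observes an object without keeping it alive. The sketch below assumes CPython, where an object is freed as soon as its reference count reaches 0. Payload is a hypothetical class used only because plain lists cannot be weakly referenced.

```python
import weakref

class Payload:
    """Hypothetical placeholder class; plain lists do not support weak references."""

obj = Payload()
probe = weakref.ref(obj)              # does not increase the reference count

alive_before = probe() is not None    # the object is still reachable through obj
del obj                               # the last reference is gone
alive_after = probe() is not None     # on CPython the object has already been freed

print(alive_before, alive_after)
```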

import gc
print(gc.get_threshold())



This returns (700, 10, 10). The two 10s are thresholds related to generational collection, described below. 700 is the threshold for starting garbage collection. It can be reset with set_threshold() in gc.

You can also start garbage collection manually with gc.collect().

Generational collection

Python also adopts a generational collection strategy. Its basic assumption is that the longer an object has been alive, the less likely it is to become garbage later in the program. Our programs often produce a large number of objects; many are created and disappear quickly, but some are used for a long time. For the sake of efficiency, we trust such "long-lived" objects to remain useful and scan them less often during garbage collection.
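As a small sketch of these calls together, the gc module can report the configured thresholds, the current counters that are compared against them, and the result of a manual collection. The exact numbers depend on the interpreter version and on what the program has done so far, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()   # tuple of three integers
counts = gc.get_count()           # current counters compared against the thresholds
unreachable = gc.collect()        # full collection; returns the number of unreachable objects found

print(thresholds, counts, unreachable)
```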

Python divides all objects into three generations: 0, 1 and 2. All new objects belong to generation 0. When an object survives a garbage collection, it is moved to the next generation. Whenever garbage collection starts, all generation 0 objects are scanned. After generation 0 has been collected a certain number of times, a scan of generations 0 and 1 is started. After generation 1 has also been collected a certain number of times, a scan of generations 0, 1 and 2, that is, all objects, is started.

These two counts are the two 10s in the (700, 10, 10) returned by get_threshold(). That is, every 10 collections of generation 0 are accompanied by 1 collection of generation 1, and every 10 collections of generation 1 are accompanied by 1 collection of generation 2.

You can also use set_threshold() to adjust these values, for example to scan generation 2 objects more frequently.

import gc
gc.set_threshold(700, 10, 5)
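When experimenting with thresholds, it can help to restore the previous values afterwards and to check how often each generation has actually been collected. The sketch below uses gc.get_stats(), which reports per-generation statistics. How the second and third thresholds are interpreted may differ between interpreter versions, so only the first one is read back here.

```python
import gc

saved = gc.get_threshold()                 # remember the current settings
try:
    gc.set_threshold(700, 10, 5)
    first_threshold = gc.get_threshold()[0]
    gc.collect(0)                          # collect only the youngest generation
    stats = gc.get_stats()                 # one dictionary of statistics per generation
    collections_per_generation = [entry["collections"] for entry in stats]
finally:
    gc.set_threshold(*saved)               # put the original settings back

print(first_threshold, collections_per_generation)
```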



Isolated reference cycles

Reference cycles cause great difficulty for the garbage collection mechanism described above. The objects in a cycle may be unusable even though their reference counts are not 0.

a = []
b = [a]
a.append(b)

del a
del b



We first create two list objects that reference each other, forming a reference cycle. After the references a and b are deleted, the two objects can no longer be reached from the program, so they are useless. However, because of the cycle, their reference counts never drop to 0, so reference counting alone will not reclaim them.

Python's memory management has much in common with other languages and also has its own particularities. Understanding the memory management mechanism is an important step toward improving the performance of Python programs.
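The collector in CPython's gc module includes a cycle detector that can find such isolated cycles. The sketch below assumes CPython and disables automatic collection for the duration of the example, so that the state right after del is observable. Node is a hypothetical class standing in for the lists above, because plain lists cannot be weakly referenced.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class standing in for the lists in the text."""
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()                              # keep automatic collection out of the way

a = Node()
b = Node()
a.other = b
b.other = a                               # a and b now form a reference cycle
probe = weakref.ref(a)                    # observes a without keeping it alive

del a
del b
alive_after_del = probe() is not None     # the cycle still holds both objects

found = gc.collect()                      # run the cycle detector explicitly
alive_after_collect = probe() is not None

if was_enabled:
    gc.enable()                           # restore the previous setting
print(alive_after_del, found >= 2, alive_after_collect)
```

On CPython this is expected to print True True False: the cycle survives del, and an explicit collection reclaims it.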
