An in-depth look at Python's memory management

Source: Internet
Author: User

Memory management is an important aspect of language design and a major factor in a language's performance. Whether it is manual management in C or garbage collection in Java, the approach to memory is one of a language's defining features. This article uses Python to illustrate memory management in a dynamically typed, object-oriented language.

How objects use memory

The assignment statement is one of the most common language features, yet even the simplest assignment can carry a lot of meaning. Python's assignment statement is worth a closer look.

    a = 1

The integer 1 is an object, and a is a reference. The assignment statement makes the reference a point to the object 1. Python is a dynamically typed language, and its objects are separate from its references. Like someone using chopsticks, Python touches and turns the real food, the objects, by way of references.
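A minimal sketch of what it means for references to be separate from objects. The names first and second and the list values are made up for illustration.

```python
# Two names bound to the same list object.
first = [1, 2, 3]
second = first          # copies the reference, not the list

second.append(4)        # mutate the object through one reference
print(first)            # [1, 2, 3, 4]: the change is visible through the other

first = [0]             # rebinding 'first' leaves the original object untouched
print(second)           # [1, 2, 3, 4]
```

Rebinding a name only moves that one reference; the object it used to point to stays as it was for every other reference.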


References and objects

To explore how objects are stored in memory, we can turn to Python's built-in function id(), which returns an object's identity. In CPython, this identity is the object's memory address.

    a = 1
    print(id(a))
    print(hex(id(a)))

On my computer, they return the following:

    11246696
    0xab9c68

These are the decimal and hexadecimal representations of the memory address, respectively.

Python caches small integers and short strings so that they can be reused (CPython, for instance, keeps a single object for each integer from -5 to 256). When we create several references to the value 1, all of them actually point to the same object.

    a = 1
    b = 1
    print(id(a))
    print(id(b))

The program above prints:

    11246696
    11246696

As you can see, a and b are actually two references to the same object.

To check whether two references point to the same object, we can use the is keyword. is tests whether the objects that two references point to are one and the same.

    # True
    a = 1
    b = 1
    print(a is b)

    # True
    a = "good"
    b = "good"
    print(a is b)

    # False in an interactive session; may be True when run as a script
    a = "very good morning"
    b = "very good morning"
    print(a is b)

    # False
    a = []
    b = []
    print(a is b)

The comments above show the result of each test. Because Python caches small integers and short strings, there is only one copy of each such object. For example, every reference to the integer 1 points to the same object; an assignment creates only a new reference, not a new object. Longer strings and other objects, such as lists, can exist as multiple equal but distinct objects, and an assignment statement may create a new one. Whether two equal strings share one object is an implementation detail that also depends on how the code is run, so programs should not rely on it.
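The caching of small integers can be observed directly. The sketch below assumes CPython and builds its values with int() at run time, so that the compiler cannot merge equal constants; the variable names are illustrative only.

```python
# Build the values at run time so the compiler cannot merge equal constants.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)   # True in CPython: 256 is inside the cached range
print(big_a == big_b)       # True: the values are equal
print(big_a is big_b)       # typically False in CPython: two separate objects
```

This is why == should be used to compare values, with is reserved for questions of identity.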

In Python, every object keeps a count of the references that point to it: its reference count.

We can use getrefcount() from the sys module to see an object's reference count. Note that passing a reference to getrefcount() as an argument creates a temporary reference, so the result is 1 higher than you might expect.

    from sys import getrefcount

    a = [1, 2, 3]
    print(getrefcount(a))

    b = a
    print(getrefcount(b))

For this reason, the two getrefcount() calls return 2 and 3 rather than the expected 1 and 2. Exact values may differ between Python versions.
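Since the absolute numbers include the temporary reference, one hedged way to read getrefcount() is to compare two measurements, so that the extra reference cancels out. The names data and alias are illustrative.

```python
import sys

data = [1, 2, 3]
baseline = sys.getrefcount(data)

alias = data                              # add one reference
with_alias = sys.getrefcount(data)

del alias                                 # remove it again
without_alias = sys.getrefcount(data)

print(with_alias - baseline)              # 1
print(without_alias - baseline)           # 0
```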

Objects referencing objects

A Python container object, such as a list or a dictionary, can contain multiple objects. In fact, what the container holds is not the element objects themselves but references to them.
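A short illustration that a container stores references rather than copies; inner and outer are made-up names.

```python
inner = [1, 2]
outer = [inner, inner]      # both slots refer to the same list

inner.append(3)             # change the element object itself
print(outer)                # [[1, 2, 3], [1, 2, 3]]
print(outer[0] is inner)    # True
```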

We can also define our own object that references other objects:

    class from_obj(object):
        def __init__(self, to_obj):
            self.to_obj = to_obj

    b = [1, 2, 3]
    a = from_obj(b)
    print(id(a.to_obj))
    print(id(b))

As you can see, a.to_obj and b have the same identity: a references the list that b points to.

Objects referencing objects is the most basic way Python is put together. Even the assignment a = 1 actually makes the entry with key "a" in a dictionary refer to the integer object 1. That dictionary records all global references, and it is the dictionary that references the integer object 1. We can view it with the built-in function globals().

When an object A is referenced by another object B, the reference count of A increases by 1.

    from sys import getrefcount

    a = [1, 2, 3]
    print(getrefcount(a))

    b = [a, a]
    print(getrefcount(a))

Because object b references a twice, the reference count of a increases by 2.

References among container objects can form a complex topology. We can use the objgraph package to draw the reference relationships, for example:

    x = [1, 2, 3]
    y = [x, dict(key1=x)]
    z = [y, (x, y)]

    import objgraph
    objgraph.show_refs([z], filename='ref_topo.png')

objgraph is a third-party Python package. You need to install xdot before installing it.

    sudo apt-get install xdot
    sudo pip install objgraph

Two objects may reference each other, forming a so-called reference cycle.

    a = []
    b = [a]
    a.append(b)

Even a single object can form a reference cycle just by referencing itself.

    from sys import getrefcount

    a = []
    a.append(a)
    print(getrefcount(a))

Reference cycles cause a lot of trouble for the garbage collection mechanism, as I will explain later.

Decreasing the reference count

An object's reference count can also decrease. For example, you can delete a reference with the del keyword:

    from sys import getrefcount

    a = [1, 2, 3]
    b = a
    print(getrefcount(b))

    del a
    print(getrefcount(b))

del can also be used to delete an element from a container, for example:

    a = [1, 2, 3]
    del a[0]
    print(a)

If a reference points to object A, the reference count of A also decreases when that reference is rebound to a different object B:

    from sys import getrefcount

    a = [1, 2, 3]
    b = a
    print(getrefcount(b))

    a = 1
    print(getrefcount(b))

Garbage collection

Eat too much and you put on weight; the same goes for Python. As a program creates more and more objects, they take up more and more memory. You don't need to worry too much about Python's figure, though: at the right moment it "loses weight" by starting garbage collection and clearing out useless objects. Many languages have a garbage collection mechanism, Java and Ruby among them. Although the goal is always to slim down, the weight-loss plans differ a great deal from language to language.


In principle, when an object's reference count drops to 0, no reference points to it any more, and the object becomes garbage to be reclaimed. For example, when a new object is assigned to a reference, its reference count becomes 1. If that reference is deleted, the count falls to 0 and the object can be garbage collected. Take the following list:

    a = [1, 2, 3]
    del a

After del a, no reference points to the list [1, 2, 3] created earlier, and the user can no longer reach or use the object in any way. If it stayed in memory, it would be unhealthy fat. CPython frees an object like this as soon as its reference count reaches 0. Objects that keep a non-zero count only because they reference each other are left to the garbage collector described in the rest of this section.
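One hedged way to watch an object being reclaimed is a weak reference, which observes an object without adding to its reference count. The sketch assumes CPython; Payload is a hypothetical class introduced only because plain lists cannot be weakly referenced.

```python
import weakref

class Payload:
    """Hypothetical class used only for this demonstration."""

obj = Payload()
probe = weakref.ref(obj)    # a weak reference does not raise the reference count

alive_before = probe() is not None
del obj                     # remove the only strong reference
alive_after = probe() is not None

print(alive_before, alive_after)  # True False on CPython
```

On CPython the weak reference goes dead immediately after del, without any call to the collector.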

However, losing weight is expensive and laborious. While garbage collection runs, Python cannot do anything else, so frequent collection would greatly reduce Python's efficiency. If there are not many objects in memory, there is no need to keep starting the collector. Python therefore starts garbage collection automatically only under certain conditions. At run time it keeps track of the number of object allocations and object deallocations, and garbage collection starts when the difference between the two exceeds a threshold.

We can view this threshold with the get_threshold() function of the gc module:

    import gc
    print(gc.get_threshold())

This returns (700, 10, 10). The two 10s are thresholds for generational collection, discussed below; 700 is the threshold at which garbage collection starts. The thresholds can be reset with gc.set_threshold(). The default values may differ between Python versions.

We can also start garbage collection manually by calling gc.collect().
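The gc calls mentioned so far can be combined as in the sketch below. The value 1000 is an arbitrary example, not a recommendation, and the original thresholds are restored at the end.

```python
import gc

original = gc.get_threshold()        # remember the current settings
print(original)

gc.set_threshold(1000, original[1], original[2])   # raise the first threshold
print(gc.get_threshold())

gc.set_threshold(*original)          # put the original settings back

unreachable = gc.collect()           # run a full collection by hand
print(unreachable)                   # number of unreachable objects found
```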

Generational collection

Python also uses a generational collection strategy. Its basic assumption is that the longer an object has survived, the less likely it is to become garbage later in the program. Programs tend to produce large numbers of objects; many are created and disappear quickly, but some stay in use for a long time. In the interest of efficiency, we trust that these long-lived objects are still useful and scan them less often during garbage collection.


Python divides all objects into three generations: 0, 1 and 2. Every new object starts in generation 0. When an object survives a garbage collection of its generation, it is moved to the next generation. Whenever garbage collection starts, all generation 0 objects are scanned. Once generation 0 has been collected a certain number of times, a scan of generations 0 and 1 is triggered. Once generation 1 has been collected a certain number of times, a scan of generations 0, 1 and 2, that is, of all objects, is triggered.

These two counts are the two 10s in the (700, 10, 10) returned by get_threshold() above. That is, every 10 collections of generation 0 are accompanied by 1 collection of generation 1, and every 10 collections of generation 1 are accompanied by 1 collection of generation 2.
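The counters that the collector compares against these thresholds can be inspected with gc.get_count(). A small sketch that prints them side by side; the actual numbers depend on the interpreter and on what the program has done so far, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()
counts = gc.get_count()      # current counter for each generation

# Show each generation's counter next to the threshold it is compared with.
for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```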

You can also adjust these values with set_threshold(), for example to scan generation 2 objects more often.

    import gc
    gc.set_threshold(700, 10, 5)

Isolated reference cycles

Reference cycles make things very difficult for the garbage collection mechanism described above. A cycle may consist of objects that can no longer be used but whose reference counts are not 0.

    a = []
    b = [a]
    a.append(b)

    del a
    del b

We first create two list objects that reference each other, forming a reference cycle. Once the references a and b are deleted, the program can no longer reach either object. However, because of the cycle, neither object's reference count drops to 0, so reference counting alone never reclaims them.
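The effect can be observed with a weak reference. The sketch below assumes CPython and uses a hypothetical Node class in place of lists, because lists do not support weak references. Automatic collection is switched off for the duration so that only the manual gc.collect() call can reclaim the cycle.

```python
import gc
import weakref

class Node:
    """Hypothetical container class; unlike list, it supports weak references."""
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collection out of the experiment

a = Node()
b = Node()
a.other = b
b.other = a                       # a and b now form a reference cycle
probe = weakref.ref(a)

del a
del b
alive_after_del = probe() is not None      # True: the cycle keeps both objects alive

gc.collect()                               # run the cycle detector by hand
alive_after_collect = probe() is not None  # False: the cycle has been reclaimed

if was_enabled:
    gc.enable()                   # restore the previous setting

print(alive_after_del, alive_after_collect)
```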


To reclaim such a cycle, Python makes a copy of each object's reference count, which we can call gc_ref. Let gc_ref_i be the copy for object i. Python then traverses every object i, and for each object j that i references, it decrements gc_ref_j by 1.

When the traversal is finished, the objects whose gc_ref is not 0, the objects they reference, and the objects referenced further downstream must be kept. All other objects are garbage collected.
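The traversal described above can be modelled in a few lines. This is a toy simulation on plain dictionaries, not CPython's actual implementation; the function find_garbage, its parameters and the object names "a" to "d" are all hypothetical.

```python
def find_garbage(refcounts, references):
    """Toy model of the gc_ref traversal.

    refcounts:  maps each tracked object's name to its total reference count
    references: maps each object's name to the names of the objects it references
    """
    # Step 1: copy every reference count into gc_ref.
    gc_ref = dict(refcounts)

    # Step 2: for each object i, subtract 1 from gc_ref of every object j it references.
    for i, targets in references.items():
        for j in targets:
            gc_ref[j] -= 1

    # Step 3: objects with gc_ref > 0 are still referenced from outside the set;
    # keep them and everything reachable from them.
    keep = set()
    pending = [name for name, count in gc_ref.items() if count > 0]
    while pending:
        name = pending.pop()
        if name not in keep:
            keep.add(name)
            pending.extend(references.get(name, []))

    return set(refcounts) - keep


# "a" and "b" reference only each other; "c" is held by a variable and references "d".
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
references = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}

garbage = find_garbage(refcounts, references)
print(sorted(garbage))  # ['a', 'b']
```

In this model "d" ends up with gc_ref 0 as well, but it survives because it is reachable from "c", whose gc_ref stays at 1.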

Summary

Python is a dynamically typed language in which objects and references are separate, which is very different from earlier procedural languages. To free memory effectively, Python has built-in garbage collection support. It adopts a relatively simple mechanism, reference counting, and therefore has to deal with isolated reference cycles. Python's approach has both things in common with other languages and features of its own. Understanding this memory management mechanism is an important step toward improving the performance of Python programs.

Original article: Http://www.jb51.net/article/54544.htm
