Python deep learning-memory management and python deep learning
Language memory management is an important aspect of language design. It is an important factor that determines the language performance. Both the manual management of C language and Java garbage collection have become the most important features of the language. The following uses the Python language as an example to describe how to manage the memory in a dynamic and object-oriented language.
Memory usage of Objects
Value assignment statements are the most common functions of languages. However, even the simplest value assignment statement can be meaningful. The value assignment statement of Python is worth studying.
a = 1
Integer 1 is an object. A is a reference. Use the value assignment statement to reference a to object 1. Python is a dynamic language (refer to the dynamic type) that separates objects from references. Python uses quotes like chopsticks to touch and flip the real food-object.
References and objects
To explore the storage of objects in the memory, we can turn to the Python built-in function id (). It is used to return the identity of an object ). In fact, the identity here is the memory address of the object.
a = 1print(id(a))print(hex(id(a)))
On my computer, they return:
11246696
'0xab9c68'
These are in decimal and hexadecimal notation of the memory address, respectively.
In Python, integers and short characters are cached by Python for reuse. When we create multiple references equal to 1, all these references are actually directed to the same object.
a = 1b = 1print(id(a))print(id(b))
The above program returns
11246696
11246696
It can be seen that a and B actually point to two references of the same object.
To verify that two references direct to the same object, we can use the is keyword. Is used to determine whether the two referenced objects are the same.
# Truea = 1b = 1print(a is b)# Truea = "good"b = "good"print(a is b)# Falsea = "very good morning"b = "very good morning"print(a is b)# Falsea = []b = []print(a is b)
The preceding annotations are the corresponding running results. As you can see, since Python caches integers and short strings, each object only has one copy. For example, all references to integer 1 point to the same object. Even if the value assignment statement is used, a new reference is created instead of the object itself. Long strings and other objects can have multiple identical objects. You can use the value assignment statement to create new objects.
In Python, each object has a total number of references pointing to the object, that is, reference count ).
We can use getrefcount () in the sys package to view the reference count of an object. Note that when a reference is passed to getrefcount () as a parameter, the parameter actually creates a temporary reference. Therefore, the getrefcount () result is 1 more than expected.
from sys import getrefcounta = [1, 2, 3]print(getrefcount(a))b = aprint(getrefcount(b))
For the above reason, two getrefcount will return 2 and 3 instead of the expected 1 and 2.
Object reference object
A container object (container) of Python, such as tables and dictionaries, can contain multiple objects. In fact, the container object contains not the element object itself, but references to each element object.
We can also customize an object and reference other objects:
class from_obj(object): def __init__(self, to_obj): self.to_obj = to_objb = [1,2,3]a = from_obj(b)print(id(a.to_obj))print(id(b))
We can see that a references object B.
Object Reference is the most basic component of Python. Even if the expression a = 1 is used, an element of the dictionary's key value "a" is used to reference the integer object 1. This dictionary object is used to record all global references. This dictionary references the integer object 1. You can use the built-in function globals () to view the dictionary.
When object A is referenced by another object B, the reference count of object A increases by 1.
from sys import getrefcounta = [1, 2, 3]print(getrefcount(a))b = [a, a]print(getrefcount(a))
Because object B references a twice, the reference count of a increases by 2.
The reference of container objects may constitute a complex topology. We can use the objgraph package to draw its reference relationship, for example
x = [1, 2, 3]y = [x, dict(key1=x)]z = [y, (x, y)]import objgraphobjgraph.show_refs([z], filename='ref_topo.png')
Objgraph is a third-party Python package. Install xdot before installation.
sudo apt-get install xdotsudo pip install objgraph
Objgraph Official Website
The two objects may reference each other to form a reference cycle ).
a = []b = [a]a.append(b)
Even an object that only needs to reference itself can constitute a reference ring.
a = []a.append(a)print(getrefcount(a))
The reference loop will cause a lot of trouble to the garbage collection mechanism. I will detail this point later.
Reduce references
The reference count of an object may be reduced. For example, you can use the del keyword to delete a reference:
from sys import getrefcounta = [1, 2, 3]b = aprint(getrefcount(b))del aprint(getrefcount(b))
Del can also be used to delete elements in container elements, such:
a = [1,2,3]del a[0]print(a)
If A reference points to object A, when the reference is redirected to another object B, the reference count of object A is reduced:
from sys import getrefcounta = [1, 2, 3]b = aprint(getrefcount(b))a = 1print(getrefcount(b))
Garbage Collection
If you eat too much, it will always become fat, and so does Python. When Python has more and more objects, they will occupy larger and larger memory. But you don't have to worry too much about Python's shape. It will "lose weight" when appropriate, start garbage collection, and clear useless objects. Garbage collection mechanisms are available in many languages, such as Java and Ruby. Although the ultimate goal is to shape the slim reminder, there is a big difference between different language-based weight loss solutions.
On the basic principle, when the reference count of an object in Python is reduced to 0, it indicates that no reference points to this object, and the object becomes garbage to be recycled. For example, if a new object is assigned to a reference, the reference count of the object changes to 1. If the reference is deleted and the reference count of the object is 0, the object can be recycled. For example, the following table:
a = [1, 2, 3]del a
After del a, no reference points to the previously created [1, 2, 3] table. Users cannot access or use this object in any way. This object becomes unhealthy fat if it remains in the memory. When garbage collection is started, Python scans the object with a reference count of 0 and clears the memory occupied by it.
However, losing weight is expensive and laborious. During garbage collection, Python cannot perform other tasks. Frequent garbage collection will greatly reduce the efficiency of Python. If there are not many objects in the memory, there is no need to start garbage collection. Therefore, Python only automatically starts garbage collection under certain conditions. When Python is running, the number of times of object allocation and object deallocation is recorded. Garbage collection starts only when the difference between the two is higher than a threshold value.
We can view the threshold value through the get_threshold () method of the gc module:
import gcprint(gc.get_threshold())
Return Value (700, 10, 10). The next two 10 values are the threshold values related to generation recovery, which can be seen later. 700 is the threshold for starting garbage collection. You can use the set_threshold () method in gc to reset it.
You can also manually start garbage collection by Using gc. collect ().
Generational recovery
Python also adopts the generation recycling policy. The basic assumption of this policy is that objects that have been alive for a longer period of time cannot become garbage in subsequent programs. Our programs often produce a large number of objects, many objects are generated and disappear quickly, but some objects are used for a long time. Out of trust and efficiency, we believe in their usefulness for such "long-lived" objects, So we reduce the frequency of scanning them in garbage collection.
Check more
Python divides all objects into three generations: 0, 1 and 2. All new objects are generation 0 objects. When a generation of objects is still alive after garbage collection, they are classified as the next generation objects. When garbage collection is started, all the 0-generation objects will be scanned. If the 0th generation passes through garbage collection for a certain number of times, scanning for 0th generation and 1st generation will be started. When the first generation has also undergone a certain amount of garbage collection, it will start scanning 0, 1, 2, that is, all objects.
These two times are the two 10 returned by get_threshold () (700, 10, 10. That is to say, every 10 times of 0-generation garbage collection will be combined with 1-generation garbage collection; and every 10 times of 1-generation garbage collection, there will be 1 second-generation garbage collection.
You can also use set_threshold () to adjust it. For example, you can perform more frequent scans on the second-generation objects.
import gcgc.set_threshold(700, 10, 5)
Isolated reference Ring
The existence of the reference ring will bring great difficulties to the above garbage collection mechanism. These reference rings may constitute unusable objects, but the reference count is not 0.
a = []b = [a]a.append(b)del adel b
We first created two table objects and referenced them to form a reference ring. After the references of a and B are deleted, these two objects cannot be called from the program, so they are useless. However, because of the existence of the reference ring, the reference count of these two objects is not reduced to 0 and will not be recycled.
Isolated reference Ring
To recycle such a reference ring, copy the reference count of each object in Python as gc_ref. Assume that the count for each object I is gc_ref_ I. Python traverses all object I. For the object j referenced by each object I, subtract 1 from the corresponding gc_ref_j.
Result After Traversal
After the traversal ends, the objects whose gc_ref is not 0, the objects referenced by these objects, and the objects that continue to be referenced downstream must be retained. Other objects are recycled.
Summary
As a dynamic language, Python separates objects from references. This is very different from the previous process-oriented language. To effectively release the memory, Python has built-in support for garbage collection. Python adopts a relatively simple garbage collection mechanism, that is, reference counting, and therefore needs to solve the problem of isolated reference loops. Python and other languages are both common and special. Understanding of the memory management mechanism is an important step to improve Python performance.
How does Python manage memory?
In Python, all objects smaller than 256 bytes use the Allocator implemented by pymalloc, while large objects use the System
Malloc. In addition, Python objects, such as integers, floating-point numbers, and lists, all have their own private memory pools, and objects do not share their memory pools. That is to say, if you allocate and release a large number
The memory used to cache these integers cannot be distributed to floating point numbers.
In Python, many times the applied memory is small memory. These small memory will be released soon after the application exists. Because these memory applications are not used to create objects, so there is no object 1
Memory Pool mechanism. This means that Python will execute a large number of malloc and free operations during runtime, and switch between user and core States frequently, which will seriously affect
Python execution efficiency. This is what we mentioned earlier.
C ++ calling the python module may cause memory leakage. It seems that the python memory management mechanism is a problem. I am not sure if anyone has studied it.
Py_Finalize () will free up all the memory you use in python. If you get PyObject in C, Py_Finalize () should be left empty, it's always worth it.