This article gives a rough analysis of memory leaks in Python and looks at their causes, including the role of garbage collection.
Introduction
I used to blindly assume that Python programs could not leak memory. Then I noticed that the memory usage of one of our online projects kept growing the longer it ran, and I realized that the program I had written was leaking. One leak was caused by debug output from the logging module.
There turned out to be other causes as well. After a day of hard work, I finally found the leak. The project has now been running for a long time, and when traffic is low its memory usage drops back to the level it had at startup.
When do you not need to go to this trouble?
If your program just runs and exits, you don't have to spend a lot of time finding out whether it leaks memory, because all the memory Python allocated is released when the process exits. If your program needs to run continuously for a long time, however, you need to check carefully for memory leaks.
Scenario
How did the leak come about? The project is a TCP server that creates a connection instance to manage each incoming connection. After each disconnection, the connection instance was still held and not released. An instance is not released when a reference to it is still held somewhere. So over time, every new connection allocates memory and every disconnection fails to release it, and the process leaks memory.
Debugging method
Because I didn't know where the memory was leaking, I had to debug patiently.
Since I knew that the connection instance was not released on disconnect, I simulated creating a connection, sent some packets, and then disconnected. Meanwhile I observed the memory usage with the following shell one-liner, which samples the process's VSZ and RSS (columns 5 and 6 of ps aux) once per second:
PID=50662; while true; do ps aux | grep $PID | grep -v grep | awk '{print $5 " " $6}' > t; sleep 1; done
If the memory usage levels off after growing to a certain amount, there is no leak.
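The sampling loop above can also be sketched in Python. This is only a rough equivalent, assuming a POSIX ps that accepts the -o and -p options; the function names parse_vsz_rss, sample_memory and watch are made up for illustration and are not part of the original project.

```python
import subprocess
import time


def parse_vsz_rss(ps_output):
    """Parse the two numbers (VSZ and RSS, in kilobytes) from one line of ps output."""
    vsz, rss = ps_output.split()
    return int(vsz), int(rss)


def sample_memory(pid):
    """Ask ps for the virtual and resident memory size of a single process."""
    # "vsz=" and "rss=" request the columns with an empty header line.
    output = subprocess.check_output(
        ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)]
    )
    return parse_vsz_rss(output.decode())


def watch(pid, interval=1.0, samples=10):
    """Print one 'VSZ RSS' sample per interval, like the shell loop."""
    for _ in range(samples):
        print("%d %d" % sample_memory(pid))
        time.sleep(interval)
```

Calling watch(50662) would print ten samples one second apart. Selecting the process with -p avoids the grep-based matching, which can also pick up unrelated lines that happen to contain the same digits.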
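You can also check an object's reference count at the point where it should be released, using sys.getrefcount(obj). If the count has dropped to 2 (typically one reference from the variable you are holding and one temporary reference for the call's argument), the object will be reclaimed correctly once it goes out of scope.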
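A minimal sketch of this check follows. The Connection class and the registry list are hypothetical stand-ins. Because the absolute value returned by sys.getrefcount can differ between interpreter versions, the sketch compares counts against a baseline instead of relying on a fixed number.

```python
import sys


class Connection(object):
    """Hypothetical stand-in for a real connection class."""


registry = []
connection = Connection()

# Reference count while only the name `connection` holds the object.
baseline = sys.getrefcount(connection)

# Storing the object in a container adds exactly one reference.
registry.append(connection)
with_registry = sys.getrefcount(connection)

# Removing it from the container gives that reference back.
registry.remove(connection)
after_remove = sys.getrefcount(connection)

print(with_registry - baseline)  # 1
print(after_remove - baseline)   # 0
```

If the count at disconnect time is higher than the baseline, something else still holds a reference to the object.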
Cause
In the project, objects were not reclaimed properly in two cases:
- References held by objects that are reclaimed only at exit
- Circular references
References held by objects that are reclaimed only at exit
To keep track of connections, each connection object is also put in a list, and that list is reclaimed only when the program exits. If this is not handled correctly, the connection objects in it are likewise reclaimed only when the program exits.
Global variables and class variables are reclaimed only when the program exits:
_CONNECTIONS = []

# ...

class Connection(object):
    def __init__(self, sock, address):
        pass

def server_loop():
    # ...
    sock, address = server_sock.accept()
    connection = Connection(sock, address)
    _CONNECTIONS.append(connection)
    # ...
    sock.close()
All established connections are placed in the global variable _CONNECTIONS. If a connection object is not removed from the list when the connection is closed (which would drop the reference), it is never reclaimed. Every new connection then leaves behind a connection object, along with everything that object references.
The same applies if you put the object in a class attribute, because the class object is created when the program starts and is reclaimed only when the program exits.
The solution is to remove the reference to the object from the list (or other container) when the connection ends:
_CONNECTIONS = []

# ...

class Connection(object):
    def __init__(self, sock, address):
        pass

def server_loop():
    # ...
    sock, address = server_sock.accept()
    connection = Connection(sock, address)
    _CONNECTIONS.append(connection)
    try:
        # ...
        sock.close()
    finally:
        _CONNECTIONS.remove(connection)  # XXX
Circular references
Sometimes the object we store in an instance attribute needs to hold the instance itself as one of its own attributes. It is easiest to just look at the code:
class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection

class Connection(object):
    def __init__(self, sock, address):
        self._conn_handler = ConnectionHandler(self)  # XXX
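The code above creates a circular reference. Reference counting alone can never free the two objects, because each keeps the other's count above zero. Only the cyclic garbage collector can reclaim them, and for long-lived objects such as connections that usually happens in the older (second and third) generations, which are collected less often. This process may be slow.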
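The behaviour described above can be observed directly. The following sketch assumes CPython and uses a simplified Connection without a socket. It turns the automatic collector off for a moment so that the only collection is the explicit gc.collect() call.

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection  # strong reference back to the connection


class Connection(object):
    def __init__(self):
        self._conn_handler = ConnectionHandler(self)  # closes the cycle


gc.disable()  # keep the automatic collector out of the experiment
try:
    connection = Connection()
    probe = weakref.ref(connection)  # observes the object without owning it

    del connection  # drop the last outside reference
    alive_before_gc = probe() is not None

    gc.collect()  # run the cyclic collector by hand
    alive_after_gc = probe() is not None
finally:
    gc.enable()

print(alive_before_gc)  # True: the cycle keeps both objects alive
print(alive_after_gc)   # False: only the collector could free them
```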
To solve this problem, use weak references.
import weakref

class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection

class Connection(object):
    def __init__(self, sock, address):
        self._conn_handler = ConnectionHandler(weakref.proxy(self))  # XXX
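A short experiment shows what the weak reference changes. The sketch assumes CPython; the address attribute and the peer method are hypothetical additions that give the handler something to read through the proxy.

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection  # a weak proxy, does not keep the connection alive

    def peer(self):
        return self._conn.address  # attribute access goes through the proxy


class Connection(object):
    def __init__(self, address):
        self.address = address
        self._conn_handler = ConnectionHandler(weakref.proxy(self))


gc.disable()  # show that the cyclic collector is not needed here
try:
    connection = Connection(("127.0.0.1", 50662))
    handler = connection._conn_handler
    peer_while_alive = handler.peer()

    probe = weakref.ref(connection)
    del connection  # last strong reference is gone
    freed_without_gc = probe() is None

    try:
        handler.peer()
        dead_proxy_raises = False
    except ReferenceError:
        dead_proxy_raises = True
finally:
    gc.enable()

print(freed_without_gc)   # True: reference counting alone freed the object
print(dead_proxy_raises)  # True: the proxy no longer has a target
```

The trade-off is that the handler must not outlive its connection: once the connection is gone, any use of the proxy raises ReferenceError.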