Analysis of Memory Leaks in Python
Introduction
I had always blindly assumed that Python could never leak memory. But after watching the memory footprint of a production project grow along with its running time, I realized that my program was leaking. I had previously debugged a memory leak caused by the logging module.
This time the leak came from other places. After a day of struggle I finally tracked them down. Now, even after the project has been running for a long time, its memory footprint drops back to roughly the startup level whenever the workload is light.
When can you skip this trouble?
If your program just runs and exits, you don't need to go out of your way to look for memory leaks, because all of the memory the process allocated is released when it exits. If your program has to run for a long time, look carefully for memory leaks.
Scenario
How did the memory leak arise? The project is a TCP server that creates a connection instance to manage each incoming connection, and that instance stayed in memory instead of being released after each disconnect. An object that is never freed must still be referenced from somewhere. So over time, every new connection allocated memory, no disconnect released it, and the result was a memory leak.
Debugging methods
Since you don't know exactly where the leak is, you have to debug patiently, a little at a time.
Because I knew the leak came from objects not being released on disconnect, I repeatedly simulated creating a connection, sending packets, and disconnecting, while watching the memory footprint with the following shell one-liner:
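PID=50662; while true; do ps aux | grep $PID | grep -v grep | awk '{print $5, $6}' >> t; sleep 1; done
The fields printed here are the VSZ and RSS columns of ps aux (virtual and resident memory size, in KiB on Linux); pick whichever columns you want to watch.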
If the footprint levels off after growing to a certain point, there is no leak. If it keeps growing for as long as the test runs, something is leaking.
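The same "levels off or keeps growing" check can also be made from inside Python. The following is only a minimal sketch, not the project's code: it uses the standard tracemalloc module, and the names FakeConnection, leaky_cycle, clean_cycle and measure_growth are hypothetical stand-ins for a real connect, send, disconnect cycle.

```python
import tracemalloc

_connections = []  # hypothetical long-lived registry, as in a leaking server


class FakeConnection(object):
    """Hypothetical stand-in for a real connection object."""

    def __init__(self):
        self.buffer = bytearray(10000)  # some per-connection memory


def leaky_cycle():
    # Simulates connect + disconnect where the registry entry is never removed.
    _connections.append(FakeConnection())


def clean_cycle():
    # Simulates connect + disconnect where the registry entry is removed.
    conn = FakeConnection()
    _connections.append(conn)
    _connections.remove(conn)


def measure_growth(cycle, rounds=200):
    """Return the bytes of traced memory still allocated after `rounds` cycles."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(rounds):
        cycle()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("clean growth:", measure_growth(clean_cycle))
    print("leaky growth:", measure_growth(leaky_cycle))
```

With the leaky cycle the growth scales with the number of rounds; with the clean cycle it stays near zero no matter how many rounds are run.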
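You can also check an object's reference count at the point where it is supposed to be released, using sys.getrefcount(obj). If the count is down to 2 (in CPython, typically one for the variable that holds the object and one for the temporary reference made by the call itself; the exact number can vary between interpreter versions), the object will be reclaimed correctly once that variable goes out of scope.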
Cause
Two situations in the project kept objects from being reclaimed properly:
References held by objects that are reclaimed only at exit
Cross references
References held by objects that are reclaimed only at exit
To keep track of connections, each connection object is placed in a list. That list is reclaimed only when the program exits, so unless connections are removed from it properly, the objects it holds are also reclaimed only when the program exits.
Global variables and class variables are reclaimed only when the program exits:
    _connections = []

    # ...

    class Connection(object):
        def __init__(self, sock, address):
            pass

    def server_loop():
        # ...
        sock, address = server_sock.accept()
        connection = Connection(sock, address)
        _connections.append(connection)  # the global list now references it
        # ...
        sock.close()
Every established connection is placed in the global variable _connections. If a connection object is not removed from the list when it is closed (which would drop that reference), then neither the connection object nor any object it references will be reclaimed.
The same applies if you store objects in a class attribute, because class objects are created when the program starts and are reclaimed only when it exits.
The fix is to detach the object from the list (or whatever other long-lived object holds it) when the connection is closed:
    _connections = []

    # ...

    class Connection(object):
        def __init__(self, sock, address):
            pass

    def server_loop():
        # ...
        sock, address = server_sock.accept()
        connection = Connection(sock, address)
        _connections.append(connection)
        try:
            # ...
            sock.close()
        finally:
            # Drop the list's reference so the connection can be reclaimed.
            _connections.remove(connection)  # XXX
Cross Reference
Sometimes, when an object creates another object to store as one of its instance attributes, it also passes itself in, so that the new object can keep a reference back to it as an instance attribute of its own. That is a mouthful, so look at the code:
    class ConnectionHandler(object):
        def __init__(self, connection):
            self._conn = connection  # handler -> connection

    class Connection(object):
        def __init__(self, sock, address):
            # connection -> handler, which completes the cycle
            self._conn_handler = ConnectionHandler(self)  # XXX
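The code above creates a cross reference (a reference cycle): the connection refers to its handler, and the handler refers back to the connection. Reference counting alone can never free such a pair, because each object keeps the other's count above zero. They are reclaimed only by the cyclic garbage collector, often not until its second or third generation is collected, and that can take a long time.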
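The effect can be observed directly in CPython. In the sketch below, which simplifies the constructors (no sock or address) and disables automatic collection only to make the result deterministic, the object survives del and disappears only after an explicit gc.collect().

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection  # strong reference back to the owner


class Connection(object):
    def __init__(self):
        self._conn_handler = ConnectionHandler(self)  # completes the cycle


gc.disable()  # keep the automatic collector from running mid-demo
try:
    conn = Connection()
    probe = weakref.ref(conn)  # observes the object without keeping it alive

    del conn
    alive_after_del = probe() is not None  # the cycle keeps it alive

    gc.collect()  # run the cyclic collector explicitly
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```

In a long-running server nobody calls gc.collect() by hand, so how long such objects linger depends on when the collector happens to reach them.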
The way to solve this problem is to use a weak reference for the back reference:
    import weakref

    class ConnectionHandler(object):
        def __init__(self, connection):
            # A weak proxy does not raise the connection's reference count.
            self._conn = weakref.proxy(connection)

    class Connection(object):
        def __init__(self, sock, address):
            self._conn_handler = ConnectionHandler(self)  # XXX
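To see what a weak back reference changes, here is a minimal sketch assuming weakref.proxy is used for it. The address attribute and the peer method are hypothetical additions for illustration, and the constructor is simplified to take only an address.

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = weakref.proxy(connection)  # does not raise the owner's refcount

    def peer(self):
        return self._conn.address  # the proxy forwards attribute access


class Connection(object):
    def __init__(self, address):
        self.address = address
        self._conn_handler = ConnectionHandler(self)


gc.disable()  # show that the cyclic collector is not needed here
try:
    conn = Connection("127.0.0.1")
    handler = conn._conn_handler
    probe = weakref.ref(conn)
    peer_while_alive = handler.peer()

    del conn
    reclaimed_without_gc = probe() is None  # freed by reference counting alone

    try:
        handler.peer()
        dangling_access_failed = False
    except ReferenceError:
        # Using the proxy after its target is gone raises ReferenceError.
        dangling_access_failed = True
finally:
    gc.enable()

print(peer_while_alive, reclaimed_without_gc, dangling_access_failed)
```

The trade-off is that the handler must not outlive its connection: once the connection is gone, any use of the proxy raises ReferenceError instead of silently keeping the object alive.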