Intro
I had always assumed that Python programs do not leak memory. Then I noticed that the memory used by a project in production kept growing with uptime, and realized that the program I wrote was leaking. An earlier leak had been caused by the debug logging module.
This time the leak was somewhere else. After a day of fighting with it, I finally found the cause. The project has now been running for a long time, and when traffic is low its memory usage drops back to roughly what it was at startup.
When can you skip all this trouble?
If your program runs briefly and then exits, there is little point in hunting for memory leaks, because all of the memory it allocated is released when the process exits. If your program has to run for a long time, look carefully for leaks.
Scene
How did the leak arise? The project is a TCP server that creates a connection instance to manage each incoming connection, and that instance was not being released when the connection was closed. An instance that is not released must still be referenced from somewhere. So over time, each new connection allocated memory, each disconnect failed to free it, and memory leaked.
Debugging methods
Because I did not know exactly where the leak came from, I had to debug patiently, one step at a time.
Knowing that the instances were not released on disconnect, I repeatedly simulated a client that connects, sends a few packets, and disconnects, and watched the memory footprint with the following shell one-liner:
PID=50662; while true; do ps aux | grep $PID | grep -v grep | awk '{print $5","$6}' >> t; sleep 1; done
If memory usage levels off after some initial growth, there is no leak.
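Judging "levels off" by eye gets tedious, so the samples can also be checked with a few lines of Python. This is only a rough sketch under assumptions not stated in the original: each line of the file t is taken to hold two comma-separated numbers (VSZ and RSS in KiB, as printed by ps aux), and the function names, window size, and tolerance are all hypothetical choices.

```python
def parse_samples(text):
    """Parse lines like '123456,7890' (assumed VSZ,RSS in KiB) into a list of RSS values."""
    rss = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _vsz, rss_kib = line.split(",")
        rss.append(int(rss_kib))
    return rss


def still_growing(rss, window=5, tolerance_kib=1024):
    """Heuristic: compare the mean of the last `window` samples with the window before it."""
    if len(rss) < 2 * window:
        raise ValueError("not enough samples")
    recent = sum(rss[-window:]) / window
    earlier = sum(rss[-2 * window:-window]) / window
    return recent - earlier > tolerance_kib


# Synthetic data for illustration: one flat series and one that climbs 1000 KiB per sample.
flat = parse_samples("\n".join("1000,5000" for _ in range(10)))
climbing = parse_samples("\n".join("1000,%d" % (5000 + i * 1000) for i in range(10)))
flat_result = still_growing(flat)
climbing_result = still_growing(climbing)
```

In practice the input would be the contents of t, for example parse_samples(open("t").read()). A single comparison like this is crude, so it is best treated as a hint and not as proof of a leak.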
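You can also check an object's reference count with sys.getrefcount(obj) at the point where you expect it to be released. In CPython, a count of 2 typically means one reference from the name you are holding plus the temporary reference created by passing the object to getrefcount. In that case nothing else refers to the object, and it will be reclaimed as soon as that name goes out of scope.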
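Because the absolute number reported by sys.getrefcount includes that temporary reference and can differ between interpreter versions, it is often easier to compare counts before and after an operation. A minimal sketch, with a hypothetical registry list standing in for whatever container might be holding on to the object:

```python
import sys


class Connection(object):
    pass


registry = []  # hypothetical container that may keep objects alive
conn = Connection()

baseline = sys.getrefcount(conn)       # count before anything else refers to conn
registry.append(conn)
with_registry = sys.getrefcount(conn)  # the list now holds one more reference
registry.remove(conn)
after_remove = sys.getrefcount(conn)   # back to the baseline once removed
```

If the count stays above the baseline at the point where the connection should be finished, some container or attribute is still holding a reference.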
Cause
Two situations in the project prevented objects from being reclaimed properly:
- References held by objects that are only reclaimed at program exit
- Cross-references
References held by objects that are only reclaimed at program exit
To keep track of connections, each connection object is also placed in a list. That list is reclaimed only when the program exits, so if it is not maintained correctly, the objects stored in it are also reclaimed only when the program exits.
Global variables and class variables are reclaimed only when the program exits:
_connections = []

# ...

class Connection(object):
    def __init__(self, sock, address):
        pass

def server_loop():
    # ...
    sock, address = server_sock.accept()
    connection = Connection(sock, address)
    _connections.append(connection)
    # ...
    sock.close()
Every established connection is appended to the global list _connections. If a connection is not removed from the list when it is closed (which would drop that reference), the connection object is never reclaimed, and neither are the objects it references. Memory therefore grows with every connection that is established.
The same happens if you store an object in a class attribute, because the class object is created when the program starts and is reclaimed only when it exits.
The fix is to remove the object from the list (or other container) when the connection is closed:
_connections = []

# ...

class Connection(object):
    def __init__(self, sock, address):
        pass

def server_loop():
    # ...
    sock, address = server_sock.accept()
    connection = Connection(sock, address)
    _connections.append(connection)
    try:
        # ...
        sock.close()
    finally:
        _connections.remove(connection)  # XXX
Cross-references
Sometimes an object creates a helper object and stores it as an instance attribute, while the helper in turn stores a reference back to the object that created it. That sounds convoluted, so look at the code:
class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection

class Connection(object):
    def __init__(self, sock, address):
        self._conn_handler = ConnectionHandler(self)  # XXX
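The code above creates a cross-reference (a reference cycle): the Connection refers to its ConnectionHandler, which refers back to the Connection. Reference counting alone can never free such a pair, because each object keeps the other's count above zero. They can only be reclaimed by the interpreter's cyclic garbage collector, and objects that have survived into its older generations are examined less often, so reclamation can be slow.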
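The effect can be observed directly. In the following sketch the automatic collector is switched off for the duration of the demonstration, a weak reference acts as a probe, and the None arguments are placeholders for a real socket and address:

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection  # strong reference back to the owner


class Connection(object):
    def __init__(self, sock, address):
        self._conn_handler = ConnectionHandler(self)  # completes the cycle


gc.disable()  # switch off automatic cycle collection for the demonstration
try:
    conn = Connection(None, None)  # placeholders for a real socket/address
    probe = weakref.ref(conn)
    del conn  # drop the only outside reference

    alive_after_del = probe() is not None      # the cycle keeps the pair alive
    gc.collect()                               # run the cyclic collector by hand
    alive_after_collect = probe() is not None  # now the pair has been reclaimed
finally:
    gc.enable()
```

In a real server the collector is normally enabled, so such objects are reclaimed eventually. The point is that they linger until a collection pass happens to reach them.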
The way to solve this problem is to use weak references:
import weakref

class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection

class Connection(object):
    def __init__(self, sock, address):
        self._conn_handler = ConnectionHandler(weakref.proxy(self))  # XXX
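With the proxy in place, the connection is freed by reference counting alone, with no help from the cyclic collector. The sketch below checks this with the collector disabled. The address attribute and the peer method are hypothetical additions for the demonstration, and the socket argument is a placeholder. One consequence worth knowing: once the connection is gone, using the proxy raises ReferenceError, so the handler must not outlive its connection.

```python
import gc
import weakref


class ConnectionHandler(object):
    def __init__(self, connection):
        self._conn = connection  # a weak proxy, so no cycle is formed

    def peer(self):
        # Hypothetical helper: raises ReferenceError once the owner is gone.
        return self._conn.address


class Connection(object):
    def __init__(self, sock, address):
        self.address = address
        self._conn_handler = ConnectionHandler(weakref.proxy(self))


gc.disable()  # prove that the cyclic collector is not needed
try:
    conn = Connection(None, ("127.0.0.1", 50000))  # placeholder socket
    probe = weakref.ref(conn)
    handler = conn._conn_handler
    peer_while_alive = handler.peer()

    del conn  # last strong reference; CPython frees the object immediately
    freed_without_gc = probe() is None

    try:
        handler.peer()
        proxy_is_dead = False
    except ReferenceError:
        proxy_is_dead = True
finally:
    gc.enable()
```

If the handler needs to detect a vanished connection without an exception, weakref.ref is an alternative to weakref.proxy: calling the reference returns None once the object has been reclaimed.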