A record of debugging a Python memory leak

Source: Internet
Author: User
Tags: virtual environment

Reprint: http://www.jianshu.com/p/2d06a1a01cc3

Over the past two days, the company needed a server for receiving DICOM files (medical image files), so I wrote one. After many rounds of coding and debugging it finally went live, and I was relieved to see everything working normally.

The next day at work, the person in charge told me that reception was far too slow and the server was practically frozen. Could it be a problem with Python itself? (Typical programmer thinking.) Curious, I opened a terminal and typed

ps -aux | grep python

to find the process ID, which turned out to be 21610.

After transferring only a few images here, memory usage had already reached 78 MB, so it looked like a memory problem. The production environment actually uses far more; since production is confidential, I could only test with a small amount of data in the test environment, but in production memory consumption had once climbed to 3.7 GB.
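Besides reading the RSS column from ps, a process can check its own memory footprint from inside Python with only the standard library. A minimal sketch (assuming a Unix system; `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS):

```python
import resource

# Peak resident set size of the current process.
usage = resource.getrusage(resource.RUSAGE_SELF)
peak_rss = usage.ru_maxrss
print("peak RSS:", peak_rss)
```

Logging this value periodically from inside the server would have shown the steady growth without any external tooling.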

That alone was not decisive. But I noticed that memory consumption grew every time a new file was uploaded, so my initial hypothesis was that the memory was being held by objects related to the DICOM files. The first task was to find a tool that could debug memory leaks.

You may wonder: Python is a dynamically typed language with a garbage collector, so how can it leak memory? In fact it can, in several ways:

    1. An object is referenced by a global variable, and the global variable has a long life cycle.
    2. The garbage collector is disabled or set to a debug state, so collected memory is never freed.
    3. A very rare kind of leak, which is exactly today's problem; it took me two days to debug it, and I am sharing it here. Please keep reading.

There are actually quite a few tools for inspecting Python memory leaks. Briefly:

    • gc: Python's built-in module. Its functions are few and basic, and it is simple to use; essential knowledge for any Python developer.
    • objgraph: Can draw object reference graphs; suitable for programs with few object kinds and a relatively simple structure. My program wraps library upon library and uses a lot of memory, so it was not a good fit.
    • guppy: Can count the objects on the heap by type; quite practical.
    • pympler: Can report memory usage by type and get the size of individual objects.
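Even without guppy or pympler, a rough per-type census of the heap can be taken with the built-in gc module alone. A minimal sketch:

```python
import gc
from collections import Counter

# Count live objects tracked by the collector, grouped by type name --
# roughly the kind of summary guppy/pympler produce, stdlib only.
counts = Counter(type(o).__name__ for o in gc.get_objects())
top = counts.most_common(5)
print(top)
```

Running this twice, before and after handling a request, makes a growing type stand out immediately.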

These are all useful, but they gave me no real lead. All of them require changing the source program, which is laborious; code that is already live cannot simply be changed at will, and they are relatively weak anyway. Later I found two more powerful tools:

    • tracemalloc: Very powerful. It shows you directly which objects occupy the most space, what those objects are, and what the call stack looks like. It is built into Python 3; on Python 2 it requires recompiling the interpreter.
    • pyrasite: A third-party library that can attach to a running Python process and dynamically modify the data and code inside it (modifying code is actually done by modifying data).
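For reference, on Python 3 the tracemalloc workflow is only a few lines. A minimal sketch (the throwaway `blobs` allocation just gives the snapshot something large to report):

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable so it dominates the snapshot.
blobs = [bytearray(10000) for _ in range(100)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")  # sorted by size, descending
for stat in top_stats[:3]:
    print(stat)
```

Each statistic names the file and line where the memory was allocated, which is exactly the "who allocated this" answer the other tools cannot give.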

I started out keen to use tracemalloc, but it is particularly unfriendly on Python 2: it requires recompiling Python, only works with Python 2.7.8, and the recompiled interpreter is hard to embed in a virtual environment. My head hurt, so I decisively switched to the second tool.

Note: before using pyrasite you need to run echo 0 > /proc/sys/kernel/yama/ptrace_scope as root, otherwise it cannot attach to the process.

pyrasite ships with a tool called pyrasite-memory-viewer, which is similar in function to guppy but can also save snapshots of memory usage statistics and the reference relationships between objects. It is very easy to use and very powerful. Run

pyrasite-memory-viewer <pid>

You can see that the most memory-intensive objects are of type DicomFileLike. There were tens of thousands of them, which is intolerable.
For the moment, either of the first two leak patterns above might explain why these objects could not be reclaimed. Open the pyrasite shell:

pyrasite-shell <pid>

First, run

gc.isenabled()

to check whether the GC is working. It returned True, meaning the collector was running normally. Then I used gc.set_debug(gc.DEBUG_STATS) to put the GC into debug mode and ran gc.collect() to force a collection, but no more memory was released, so the second leak pattern was ruled out.
Now look at the objects that cannot be freed, in gc.garbage, and check whether any global variable points to them (most likely a list or a dictionary).
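The check described above can be reproduced in any interpreter; a minimal sketch (DEBUG_STATS prints collection statistics to stderr during the forced collection):

```python
import gc

# Step 1: is the collector even running?
enabled = gc.isenabled()

# Step 2: force a full collection with statistics printed, then
# restore normal behaviour.
gc.set_debug(gc.DEBUG_STATS)
unreachable = gc.collect()  # number of unreachable objects found
gc.set_debug(0)

print(enabled, unreachable)
```

If memory does not drop after a forced collection, the leaked objects are either still reachable or uncollectable.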

gc.garbage turned out to be stuffed with DicomFileLike objects.


So the goal is to pick one such object and then look one level up at what refers to it.

>>> d = gc.garbage[-1]  # grab a DicomFileLike object
>>> d
<dicom.filebase.DicomFileLike object at 0x7f362c305390>
>>> objs = gc.get_referrers(d)
>>> len(objs)
8
>>> objs.remove(gc.garbage)
>>> objs.remove(locals())
>>> objs[0]  # the output here is a large dictionary including __builtins__; it should be the locals() of <pid>
>>> objs[1]
<bound method DicomFileLike.write_leUS of <dicom.filebase.DicomFileLike object at 0x7f362c305390>>
>>> objs[2]
<bound method DicomFileLike.read_leUL of <dicom.filebase.DicomFileLike object at 0x7f362c305390>>
>>> objs[3]
<bound method DicomFileLike.read_leUS of <dicom.filebase.DicomFileLike object at 0x7f362c305390>>
>>> objs[4]
<bound method DicomFileLike.write_leUL of <dicom.filebase.DicomFileLike object at 0x7f362c305390>>
>>> objs[5]
<bound method DicomFileLike.read_le_tag of <dicom.filebase.DicomFileLike object at 0x7f362c305390>>

In fact, no remaining global variable points to d. Notice that the object address inside each bound method is the same as d's address: the object is referencing itself, because its own bound methods are stored as attributes on the instance.

But surely Python supports reclaiming objects with circular references? With this question in mind, I checked Stack Overflow:

Does Python GC deal with Reference-cycles?

The first answer to this question is very clear: if a class does not define a custom __del__ method, the GC can reclaim objects with self-references; but once you implement __del__ yourself, objects caught in such cycles become uncollectable. (This was true up to Python 3.3; PEP 442 changed the behaviour in Python 3.4.)
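The situation described above can be reproduced with a few lines. A minimal sketch of the same shape as DicomFileLike, with made-up names; under Python 2 the instance would end up in gc.garbage, while under Python 3.4+ (PEP 442) the cycle is collected normally:

```python
import gc

class SelfReferencing(object):
    def __init__(self):
        # Storing a bound method as an attribute creates a cycle:
        # instance -> bound method -> instance. pydicom's DicomFileLike
        # does this with its read/write methods.
        self.callback = self.on_data

    def on_data(self):
        pass

    def __del__(self):
        # On Python 2, any cycle containing an object with __del__ was
        # uncollectable and was parked in gc.garbage instead.
        pass

obj = SelfReferencing()
obj = None
gc.collect()
leaked = len(gc.garbage)  # non-zero on Python 2, zero on Python 3.4+
print(leaked)
```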

This is the third way Python can leak memory.

Looking back at the source of DicomFileLike: sure enough, its __init__ defines a __del__ method. I removed this method with a monkey patch, and the memory leak was solved:

def monkey_patch_dicom():
    """Fix the issue where dicom's DicomFileLike objects never release memory."""
    del dicom.filebase.DicomIO.__del__
Summary

That is the end of the debugging process, but in practice it involved many twists and turns. In pyrasite I found several references to a DicomFileLike object that were hard to identify; at first I thought a global object, such as a list, was referencing the DicomFileLike instances, but it turned out to be the locals() and globals() dictionaries. If you inspect the data saved by pyrasite-memory-viewer you will find a large list holding all the unreclaimed DicomFileLike objects; after poking around for a long time I realised it was actually gc.garbage itself, which was embarrassing, and for a while it made me suspect the first leak pattern, though I could never find such a global object. I also saw that the thread count had reached 140+, but the threads turned out to be largely unrelated: the count stayed stable at that number.

Some other hack techniques used in this process:
    • To view the number of threads for a process

      ps -o nlwp <pid>
    • Get an object dynamically from its id/address

      import ctypes
      obj = ctypes.cast(<addr_or_id>, ctypes.py_object).value
    • View the logs for garbage collection

      gc.set_debug(...)
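The ctypes trick above relies on a CPython implementation detail: id(obj) is the object's memory address, so the address can be cast straight back to the object, provided it is still alive. A minimal sketch:

```python
import ctypes

# CPython only: id() returns the object's address, and casting that
# address to py_object recovers the original object.
original = ["a", "b"]
addr = id(original)
recovered = ctypes.cast(addr, ctypes.py_object).value
print(recovered is original)
```

This is what makes it possible to grab an object seen in a memory snapshot from inside pyrasite-shell, given only its printed address.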


Weidwonder
Links: http://www.jianshu.com/p/2d06a1a01cc3
Source: Jianshu
Copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please specify the source.

