Extending Python with C modules, part 4: handling reference counting problems
Following on from the previous part: when we extend Python through the Python C/C++ API, reference counting has to be considered almost everywhere, and a careless mistake makes the extension module leak memory. Reference counting is the biggest headache when writing Python modules in C. The programmer has to understand every C API function used, sometimes down to its source code, to know exactly when to increment a reference count and when to decrement it.
This article is a translation. I think the original explains how reference counting works in the Python source very clearly, so I translated it into Chinese. Original: http://edcjones.tripod.com/refcount.html
Summary:
The struct definition of a Python Object contains a reference count and Object type:
#define PyObject_HEAD \
    int ob_refcnt; \
    struct _typeobject *ob_type;

typedef struct _object {
    PyObject_HEAD
} PyObject;
Python provides two sets of macros for reference counting [object.h]:
#define Py_INCREF(op) (                           \
    _Py_CHECK_THREAD_SAVE                         \
    _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA          \
    ((PyObject*)(op))->ob_refcnt++)

/* When the reference count reaches 0, the memory occupied by the object is released. */
#define Py_DECREF(op)                             \
    do {                                          \
        if (_Py_CHECK_THREAD_SAVE                 \
            _Py_DEC_REFTOTAL _Py_REF_DEBUG_COMMA  \
            --((PyObject *)(op))->ob_refcnt != 0) \
            _Py_CHECK_REFCNT(op)                  \
        else                                      \
            _Py_Dealloc((PyObject *)(op));        \
    } while (0)
The other set also handles the case where the object pointer is NULL:
#define Py_XINCREF(op) do { if ((op) == NULL) ; else Py_INCREF(op); } while (0)
#define Py_XDECREF(op) do { if ((op) == NULL) ; else Py_DECREF(op); } while (0)
In Python, nobody really owns an object; you only own references to it. An object's reference count is defined as the number of owned references to that object. The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. When the reference count drops to 0, Python may delete the object.
Every call to Py_INCREF() must eventually be matched by a call to Py_DECREF(), just as every malloc in C must eventually be matched by free(). In practice it is easy to forget to free memory allocated on the heap, and such leaks are hard to detect without tools: modern machines have plenty of physical and virtual memory, so leaks mostly become visible in long-running server programs.
When a pointer to a Python object has had the object's reference count incremented on its behalf, the object is said to be protected.
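The counting rule above can be sketched with a toy model in plain C. This is not CPython's implementation: ToyObject, toy_new, toy_incref, toy_decref and toy_live_objects are hypothetical names invented for this illustration, and only the counter itself is modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical): only the count is modelled. */
typedef struct ToyObject {
    int refcnt;
} ToyObject;

/* Number of ToyObjects currently allocated; a nonzero value at exit means a leak. */
int toy_live_objects = 0;

/* Create an object. The caller owns the single initial reference. */
ToyObject *toy_new(void)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    toy_live_objects++;
    return op;
}

/* Counterpart of Py_INCREF in this toy model: register one more owner. */
void toy_incref(ToyObject *op)
{
    op->refcnt++;
}

/* Counterpart of Py_DECREF in this toy model: drop one owner and
   free the object when the last owner is gone. */
void toy_decref(ToyObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        free(op);
        toy_live_objects--;
    }
}
```

Each toy_incref must be balanced by one toy_decref, plus one final toy_decref for the reference handed out by toy_new. If any of them is forgotten, toy_live_objects never returns to 0, which is the toy equivalent of a leak.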
When to call Py_INCREF() and Py_DECREF()
Returning a Python object from a function
Most Python objects are created by functions of the Python C API. These generally have the form PyObject* Py_Something(arguments): they create a Python object and return it to the caller. Inside Py_Something the object has usually been INCREFed (though not every function does this), so when you use an object returned by Py_Something, remember that its reference count has already been incremented on your behalf. When you no longer need the object, you must call Py_DECREF().
void MyCode(arguments) {
    PyObject* pyo;
    ...
    pyo = Py_Something(args);
MyCode calls Py_Something and is therefore responsible for the reference count of pyo. Once MyCode has finished with pyo, it must call Py_DECREF(pyo).
However, if MyCode needs to return pyo, for example:
PyObject* MyCode(arguments) {
    PyObject* pyo;
    ...
    pyo = Py_Something(args);
    ...
    return pyo;
}
In this case MyCode must not call Py_DECREF(). Instead, MyCode passes the responsibility for pyo's reference count on to its own caller.
Note: if a function returns None, the C code must increment the reference count of the None object first, like this:
Py_INCREF(Py_None);
return Py_None;
So far we have discussed the most common case, in which Py_Something creates a reference and passes the responsibility for it to its caller. The Python documentation calls this a new reference. For example, the documentation says:
PyObject* PyList_New(int len)
    Return value: New reference.
    Returns a new list of length len on success, or NULL on failure.
A reference that has been INCREFed is usually called "protected".
Sometimes the Python source code does not call Py_INCREF() on the object it returns:
PyObject *
PyTuple_GetItem(register PyObject *op, register int i)
{
    if (!PyTuple_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= ((PyTupleObject *)op) -> ob_size) {
        PyErr_SetString(PyExc_IndexError, "tuple index out of range");
        return NULL;
    }
    return ((PyTupleObject *)op) -> ob_item[i];
}
This situation is called borrowing a reference. The documentation says:
PyObject* PyTuple_GetItem(PyObject *p, int pos)
    Return value: Borrowed reference.
    Returns the object at position pos in the tuple pointed to by p. If pos is out of bounds, returns NULL and sets an IndexError exception.
The reference returned here is unprotected.
In the Python source code, the functions that return unprotected (borrowed) references include:
PyTuple_GetItem(), PyList_GetItem(), PyList_GET_ITEM(), PyList_SET_ITEM(), PyDict_GetItem(), PyDict_GetItemString(), PyErr_Occurred(), PyFile_Name(), PyImport_GetModuleDict(), PyModule_GetDict(), PyImport_AddModule(), PyObject_Init(), Py_InitModule(), Py_InitModule3(), Py_InitModule4(), and PySequence_Fast_GET_ITEM().
PyArg_ParseTuple() sometimes hands back PyObject pointers as well, through its output arguments of type PyObject*. For example, in sysmodule.c:
static PyObject *
sys_getrefcount(PyObject *self, PyObject *args)
{
    PyObject *arg;
    if (!PyArg_ParseTuple(args, "O:getrefcount", &arg))
        return NULL;
    return PyInt_FromLong(arg->ob_refcnt);
}
The implementation of PyArg_ParseTuple does not increment the reference count of arg, so arg is an unprotected reference. sys_getrefcount must therefore not DECREF arg before it returns.
Here is a complete function that computes the sum of the integers in a list.
Example 1:
long sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    /* Caller should use PyErr_Occurred() if a -1 is returned. */
    for (i = 0; i < n; i++) {
        /* PyList_GetItem does not INCREF "item".
           "item" is unprotected. */
        item = PyList_GetItem(list, i); /* Can't fail */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
    }
    return total;
}
PyList_GetItem() returns item as a PyObject* whose reference count has not been incremented. The function therefore does not DECREF item when it has finished with it.
Example 2:
long sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length. */
    /* Caller should use PyErr_Occurred() if a -1 is returned. */
    for (i = 0; i < n; i++) {
        /* PySequence_GetItem INCREFs item. */
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item);
    }
    return total;
}
Unlike example 1, the implementation of PySequence_GetItem() increments the reference count of the item it returns. The function must therefore call Py_DECREF(item) once it has finished with each item.
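The difference between the two examples can be reproduced in a toy model in plain C. This is only a sketch under stated assumptions: ToyObject, ToyList and every toy_ function are hypothetical names made up for this illustration. toy_list_get_borrowed plays the role of PyList_GetItem (borrowed reference) and toy_list_get_new plays the role of PySequence_GetItem (new reference).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a Python integer object (hypothetical). */
typedef struct ToyObject {
    int refcnt;
    long value;
} ToyObject;

/* Toy stand-in for a list with a fixed number of slots (hypothetical). */
#define TOY_LIST_SIZE 4
typedef struct ToyList {
    ToyObject *items[TOY_LIST_SIZE];
} ToyList;

ToyObject *toy_int_new(long value)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->value = value;
    return op;
}

void toy_incref(ToyObject *op) { op->refcnt++; }

void toy_decref(ToyObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0)
        free(op);
}

/* Borrowed reference: the count is left untouched. */
ToyObject *toy_list_get_borrowed(ToyList *list, int i)
{
    if (i < 0 || i >= TOY_LIST_SIZE)
        return NULL;
    return list->items[i];
}

/* New reference: the caller must call toy_decref when done. */
ToyObject *toy_list_get_new(ToyList *list, int i)
{
    ToyObject *item = toy_list_get_borrowed(list, i);
    if (item != NULL)
        toy_incref(item);
    return item;
}

/* Sum via borrowed references: no toy_decref inside the loop. */
long toy_sum_borrowed(ToyList *list)
{
    int i;
    long total = 0;
    for (i = 0; i < TOY_LIST_SIZE; i++) {
        ToyObject *item = toy_list_get_borrowed(list, i);
        if (item != NULL)
            total += item->value;
    }
    return total;
}

/* Sum via new references: every item is released after use. */
long toy_sum_new(ToyList *list)
{
    int i;
    long total = 0;
    for (i = 0; i < TOY_LIST_SIZE; i++) {
        ToyObject *item = toy_list_get_new(list, i);
        if (item == NULL)
            continue; /* empty slot */
        total += item->value;
        toy_decref(item);
    }
    return total;
}
```

Both sums leave every element's count exactly as they found it. Dropping the toy_decref from toy_sum_new would leak one reference per element on each call, and adding one to toy_sum_borrowed would free elements the list still points to.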
When do I not need to call INCREF?
1. Local variables inside a function that are pointers to PyObject do not need to have the reference count of the object incremented. In theory, the count should go up by 1 when a variable starts pointing to the object and down by 1 when the variable goes out of scope. The two operations cancel each other out, so the object's final reference count is unchanged. The real purpose of the reference count is to prevent an object from being destroyed prematurely while some variable still points to it.
When do I need to call INCREF?
If there is any chance that DECREF will be called on an object while you are still using it, you must make sure the object is not in the unprotected state.
1) An unprotected reference can cause subtle bugs. A common case is to fetch an element from a list and operate on it without incrementing its reference count. PyList_GetItem returns a borrowed reference, so item is not protected. Some other operation may remove the object from the list (decrementing its reference count, possibly freeing it), which turns item into a dangling pointer.
void bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}
This function fetches element 0 of the list (without incrementing its reference count), replaces list[1] with the integer 0, and then prints item. It looks harmless, but it is not.
Let's trace what PyList_SetItem does. The list owns references to all of its elements, so when list[1] is replaced, the list must release its reference to the original element. Suppose the original list[1] is an instance of a user-defined class that implements __del__. If that instance's reference count is 1, releasing it destroys the instance and calls its __del__ method. Since __del__ is written by the Python user, it can run arbitrary Python code. Could it do something that invalidates the reference to list[0]? Yes: if __del__ executes del list[0] and the reference count of list[0] is also 1, that object is freed. The freed item is then passed to PyObject_Print(), and the behavior is undefined.
Once you know the cause, the fix is simple: temporarily increment the reference count.
void no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_INCREF(item); /* Protect item. */
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
This is a true story. An older version of Python contained variants of this bug, and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail...
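The protect-then-use pattern of no_bug can be checked in a toy model in plain C. This is a simplified sketch, not CPython code, and every name in it (ToyObject, ToyList, toy_list_set, toy_read_then_replace, toy_live_objects) is hypothetical. To keep it short, the slot that holds the borrowed item is overwritten directly instead of being deleted from inside a __del__ method; the effect on the borrowed pointer is the same.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyObject {
    int refcnt;
    long value;
} ToyObject;

#define TOY_LIST_SIZE 2
typedef struct ToyList {
    ToyObject *items[TOY_LIST_SIZE];
} ToyList;

/* Number of ToyObjects currently allocated. */
int toy_live_objects = 0;

ToyObject *toy_int_new(long value)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->value = value;
    toy_live_objects++;
    return op;
}

void toy_incref(ToyObject *op) { op->refcnt++; }

void toy_decref(ToyObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        free(op);
        toy_live_objects--;
    }
}

/* Modelled on PyList_SetItem: takes over new_item and releases
   the previous occupant of the slot. */
int toy_list_set(ToyList *list, int i, ToyObject *new_item)
{
    ToyObject *old;
    if (i < 0 || i >= TOY_LIST_SIZE) {
        if (new_item != NULL)
            toy_decref(new_item);
        return -1;
    }
    old = list->items[i];
    list->items[i] = new_item;
    if (old != NULL)
        toy_decref(old);
    return 0;
}

/* Reads slot 0, overwrites it, and still uses the old item safely.
   Returns the old value, or -1 if slot 0 is empty or allocation fails. */
long toy_read_then_replace(ToyList *list, long replacement)
{
    long seen;
    ToyObject *item = list->items[0];           /* borrowed, unprotected */
    ToyObject *fresh = toy_int_new(replacement);
    if (item == NULL || fresh == NULL) {
        if (fresh != NULL)
            toy_decref(fresh);
        return -1;
    }
    toy_incref(item);                 /* protect item */
    toy_list_set(list, 0, fresh);     /* the list drops its reference to item */
    seen = item->value;               /* safe: we still own a reference */
    toy_decref(item);                 /* release it; freed if we were the last owner */
    return seen;
}
```

Without the toy_incref/toy_decref pair, toy_list_set would free item before item->value is read, which is the toy equivalent of the BUG line above.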
2) Passing a PyObject to a function. A function can generally assume that the objects passed to it are already protected, so it does not need to call Py_INCREF on them. However, if the function wants the object to outlive the call (for example, by storing it somewhere), it must call Py_INCREF.
When you pass an object reference into another function, in general, the function borrows the reference from you -- if it needs to store it, it will use Py_INCREF() to become an independent owner.
PyDict_SetItem() is an example: when it stores the key and the value in the dictionary, it increments the reference count of both.
PyTuple_SetItem() and PyList_SetItem() differ from PyDict_SetItem(): they take over the object passed to them (they steal a reference).
PyTuple_SetItem is called as PyTuple_SetItem(atuple, i, item). If atuple[i] currently holds a PyObject, that object is DECREFed, and atuple[i] is then set to item. item is not INCREFed.
If PyTuple_SetItem fails to insert the element, it decrements the reference count of item. Similarly, PyTuple_GetItem does not increment the reference count of the item it returns.
PyObject *t;
PyObject *x;
x = PyInt_FromLong(1L);
PyTuple_SetItem(t, 0, x);
After x has been passed to PyTuple_SetItem, you must not call Py_DECREF on it, because PyTuple_SetItem() does not increment the reference count of x; it takes over the reference you held. If you decrement x's count yourself at this point, the element stored in tuple t is freed while the tuple still points to it.
When tuple t is DECREFed, all of its elements are DECREFed as well.
PyTuple_SetItem is designed this way because of a very common scenario: filling a tuple or list with newly created objects. For example, to create the tuple (1, 2, "three") with the Python C API you can write:
PyObject *t;
t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
Note: PyTuple_SetItem is the only way to set the elements of a tuple. Both PySequence_SetItem and PyObject_SetItem refuse to do it, because a tuple is an immutable data type.
Creating a list works much like creating a tuple, using PyList_New() and PyList_SetItem(). The difference is that a list can also be filled with PySequence_SetItem(). PySequence_SetItem, however, increments the reference count of the item passed in, so the caller must release its own reference afterwards:
PyObject *l, *x;
l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
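The two setter conventions can be put side by side in a toy model in plain C. This is a sketch with hypothetical names (ToyObject, ToyTuple, toy_tuple_set_steal, toy_tuple_set_share, toy_tuple_free, toy_live_objects), not the real API: toy_tuple_set_steal follows the PyTuple_SetItem convention of taking over the caller's reference, while toy_tuple_set_share follows the PySequence_SetItem convention of adding a reference of its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyObject {
    int refcnt;
    long value;
} ToyObject;

#define TOY_TUPLE_SIZE 3
typedef struct ToyTuple {
    ToyObject *items[TOY_TUPLE_SIZE];
} ToyTuple;

/* Number of ToyObjects currently allocated. */
int toy_live_objects = 0;

ToyObject *toy_int_new(long value)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->value = value;
    toy_live_objects++;
    return op;
}

void toy_incref(ToyObject *op) { op->refcnt++; }

void toy_decref(ToyObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        free(op);
        toy_live_objects--;
    }
}

ToyTuple *toy_tuple_new(void)
{
    int i;
    ToyTuple *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    for (i = 0; i < TOY_TUPLE_SIZE; i++)
        t->items[i] = NULL;
    return t;
}

/* Steals item: on success the tuple owns the caller's reference,
   on failure the reference is released. Either way the caller
   must not toy_decref item afterwards. */
int toy_tuple_set_steal(ToyTuple *t, int i, ToyObject *item)
{
    if (i < 0 || i >= TOY_TUPLE_SIZE) {
        if (item != NULL)
            toy_decref(item);
        return -1;
    }
    if (t->items[i] != NULL)
        toy_decref(t->items[i]);
    t->items[i] = item;
    return 0;
}

/* Shares item: the tuple takes its own reference, so the caller
   still owns one and must release it with toy_decref. */
int toy_tuple_set_share(ToyTuple *t, int i, ToyObject *item)
{
    if (i < 0 || i >= TOY_TUPLE_SIZE)
        return -1;
    toy_incref(item);
    return toy_tuple_set_steal(t, i, item);
}

/* Destroying the tuple releases its reference to every element. */
void toy_tuple_free(ToyTuple *t)
{
    int i;
    for (i = 0; i < TOY_TUPLE_SIZE; i++)
        if (t->items[i] != NULL)
            toy_decref(t->items[i]);
    free(t);
}
```

With the stealing setter a freshly created object can be passed straight in, as in toy_tuple_set_steal(t, 0, toy_int_new(1)), and nothing is left for the caller to release. With the sharing setter the caller has to keep the pointer in a variable and call toy_decref on it afterwards, which is why the stealing convention is convenient for filling new containers.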
Python favors minimalism. The code that creates and fills a tuple (or list) can be simplified to:
PyObject *t, *l;
t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
Two examples:
Example 1:
PyObject* MyFunction(void)
{
    PyObject* temporary_list = NULL;
    PyObject* return_this = NULL;

    temporary_list = PyList_New(1);          /* Note 1 */
    if (temporary_list == NULL)
        return NULL;

    return_this = PyList_New(1);             /* Note 1 */
    if (return_this == NULL) {
        Py_DECREF(temporary_list);           /* Note 2 */
        return NULL;
    }

    Py_DECREF(temporary_list);               /* Note 2 */
    return return_this;
}
Note 1: The object returned by PyList_New has a reference count of 1.
Note 2: temporary_list must not survive the function, so it has to be DECREFed before every return.
Example 2:
PyObject* MyFunction(void)
{
    PyObject* temporary = NULL;
    PyObject* return_this = NULL;
    PyObject* tup;
    PyObject* num;
    int err;

    tup = PyTuple_New(2);
    if (tup == NULL)
        return NULL;

    err = PyTuple_SetItem(tup, 0, PyInt_FromLong(222L));  /* Note 1 */
    if (err) {
        Py_DECREF(tup);
        return NULL;
    }
    err = PyTuple_SetItem(tup, 1, PyInt_FromLong(333L));  /* Note 1 */
    if (err) {
        Py_DECREF(tup);
        return NULL;
    }

    temporary = PyTuple_GetItem(tup, 0);                  /* Note 2 */
    if (temporary == NULL) {
        Py_DECREF(tup);
        return NULL;
    }

    return_this = PyTuple_GetItem(tup, 1);                /* Note 2 */
    if (return_this == NULL) {
        Py_DECREF(tup);
        /* Note 3 */
        return NULL;
    }

    /* return_this is borrowed from tup: protect it before tup is released. */
    Py_INCREF(return_this);
    /* Note 3 */
    Py_DECREF(tup);
    return return_this;
}
Note 1: If PyTuple_SetItem fails, or if the tuple's reference count drops to 0, the object created by PyInt_FromLong is DECREFed as well.
Note 2: PyTuple_GetItem does not increment the reference count of the object it returns.
Note 3: MyFunction is not responsible for the reference count of temporary and does not need to DECREF it. return_this is different: it is also a borrowed reference, but it is handed to the caller as a new reference, so it must be INCREFed before tup is DECREFed. Otherwise releasing tup would free the object that is about to be returned.