This article analyzes Python memory management by reading the Python source code.
Python memory management architecture (Objects/obmalloc.c):
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
    __________________________________________________________________
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |
The numbered layers, from bottom to top:
0. The interfaces provided by the C library functions (malloc and friends).
1. PyMem_*: thin wrappers around C's malloc, realloc, and free, providing the low-level control interface.
2. The PyObject_* family: the high-level memory control interface.
3. Management interfaces specific to individual object types.
PyMem_*
The PyMem_* family is the low-level memory allocation interface.
Python wraps C's malloc, realloc, and free in a thin layer. Why go to this trouble?
- Different C implementations behave differently for malloc(0), so PyMem_MALLOC(0) is converted to malloc(1).
- Mixing calls to the C library's malloc and free with Python's own allocation functions can cause subtle problems. The wrappers give extensions one consistent set of calls.
- Python provides both macros and functions, but the macros do not avoid this problem, so extensions should use the functions rather than the macros.
Source code:
Include/pymem.h

#define PyMem_MALLOC(n)     ((size_t)(n) > (size_t)PY_SSIZE_T_MAX ? NULL \
                             : malloc((n) ? (n) : 1))
#define PyMem_REALLOC(p, n) ((size_t)(n) > (size_t)PY_SSIZE_T_MAX ? NULL \
                             : realloc((p), (n) ? (n) : 1))
#define PyMem_FREE          free

Objects/object.c

/* Python's malloc wrappers (see pymem.h) */
void *
PyMem_Malloc(size_t nbytes)
{
    return PyMem_MALLOC(nbytes);
}
...
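To make the zero-size rule concrete, here is a minimal standalone sketch of the same guard in plain C. It does not use the Python headers: model_malloc, model_realloc, and MODEL_SSIZE_T_MAX are hypothetical names, and PTRDIFF_MAX is assumed to stand in for PY_SSIZE_T_MAX.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: PTRDIFF_MAX plays the role of PY_SSIZE_T_MAX in this model. */
#define MODEL_SSIZE_T_MAX ((size_t)PTRDIFF_MAX)

/* Same shape as PyMem_MALLOC: reject oversized requests, and turn a
 * zero-byte request into a one-byte request so that the result does
 * not depend on how the C library treats malloc(0). */
static void *model_malloc(size_t n)
{
    if (n > MODEL_SSIZE_T_MAX)
        return NULL;
    return malloc(n ? n : 1);
}

/* Same shape as PyMem_REALLOC. */
static void *model_realloc(void *p, size_t n)
{
    if (n > MODEL_SSIZE_T_MAX)
        return NULL;
    return realloc(p, n ? n : 1);
}
```

With this model, model_malloc(0) normally returns a real pointer that must later be passed to free, while model_malloc(SIZE_MAX) returns NULL without calling malloc at all.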
Beyond these thin wrappers around the C functions, Python also provides four type-aware macros:
PyMem_New and PyMem_NEW
PyMem_Resize and PyMem_RESIZE
They take a type and an element count, and compute the size in bytes from sizeof(type).
#define PyMem_New(type, n) \
  ( ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL : \
    ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )

#define PyMem_Resize(p, type, n) \
  ( (p) = ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL : \
    (type *) PyMem_Realloc((p), (n) * sizeof(type)) )

#define PyMem_Del  PyMem_Free
#define PyMem_DEL  PyMem_FREE
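The division in PyMem_New is an overflow check: it rejects any n for which n * sizeof(type) could exceed the size limit. A rough standalone sketch of the same idea follows. MODEL_NEW and model_new_array are hypothetical names, and PTRDIFF_MAX is again assumed to stand in for PY_SSIZE_T_MAX.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n elements of elem_size bytes each.
 * Dividing the limit by elem_size checks for overflow before the
 * multiplication is performed, as PyMem_New does. */
static void *model_new_array(size_t elem_size, size_t n)
{
    size_t nbytes;

    if (elem_size == 0 || n > (size_t)PTRDIFF_MAX / elem_size)
        return NULL;
    nbytes = n * elem_size;
    return malloc(nbytes ? nbytes : 1);   /* never call malloc(0) */
}

/* Type-aware front end: the caller names a type, not a byte count. */
#define MODEL_NEW(type, n) ((type *)model_new_array(sizeof(type), (n)))
```

A call such as MODEL_NEW(int, 4) yields an int * with room for four elements, and a huge count such as SIZE_MAX / 2 yields NULL instead of a silently truncated allocation.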
Several of the interfaces below likewise exist as both a function and a macro. The macro form is spelled in all uppercase after the underscore (for example, PyMem_Malloc versus PyMem_MALLOC).
PyObject_*
The PyObject_* family is the high-level object memory interface.
Note:
- Do not mix it with the PyMem_* family!
- Unless you specifically need to manage raw memory yourself, always use PyObject_*.
Source code:
Include/objimpl.h

#define PyObject_New(type, typeobj) \
    ( (type *) _PyObject_New(typeobj) )
#define PyObject_NewVar(type, typeobj, n) \
    ( (type *) _PyObject_NewVar((typeobj), (n)) )

Objects/object.c

PyObject *
_PyObject_New(PyTypeObject *tp)
{
    PyObject *op;
    op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp));
    if (op == NULL)
        return PyErr_NoMemory();
    return PyObject_INIT(op, tp);
}

PyVarObject *
_PyObject_NewVar(PyTypeObject *tp, Py_ssize_t nitems)
{
    PyVarObject *op;
    const size_t size = _PyObject_VAR_SIZE(tp, nitems);
    op = (PyVarObject *) PyObject_MALLOC(size);
    if (op == NULL)
        return (PyVarObject *)PyErr_NoMemory();
    return PyObject_INIT_VAR(op, tp, nitems);
}
Both functions do two things:
- Allocate memory: PyObject_MALLOC
- Initialize the object: PyObject_INIT and PyObject_INIT_VAR
The initialization is straightforward, but the allocation is a bit more involved...
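The allocate-then-initialize pattern can be sketched without the Python headers. Everything here is a simplified stand-in: model_type, model_object, and model_object_new are hypothetical names, and the object header is reduced to a reference count and a type pointer. The initialization step assumes, as in CPython, that a new object starts with a reference count of 1.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a type object: it only records how many
 * bytes one instance needs. */
typedef struct model_type {
    const char *name;
    size_t basicsize;
} model_type;

/* Hypothetical stand-in for the common object header. */
typedef struct model_object {
    ptrdiff_t refcnt;
    const model_type *type;
} model_object;

static model_object *model_object_new(const model_type *tp)
{
    /* Step 1: allocate the number of bytes the type asks for. */
    model_object *op = malloc(tp->basicsize);
    if (op == NULL)
        return NULL;

    /* Step 2: initialize the header (type pointer, first reference). */
    op->refcnt = 1;
    op->type = tp;
    return op;
}
```

The caller supplies a type whose basicsize is at least sizeof(model_object). Any extra bytes belong to the concrete object that embeds the header.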
PyObject_{Malloc, Free}
These three functions are very different from their PyMem_* counterparts, and considerably more complicated.
void * PyObject_Malloc(size_t nbytes)
void * PyObject_Realloc(void *p, size_t nbytes)
void PyObject_Free(void *p)
A running Python program creates and destroys small objects constantly. To avoid a large number of malloc and free calls, Python uses a memory pool.
- A series of arenas (each managing 256 KB) forms a linked list of memory regions.
- Each arena contains multiple pools (4 KB each).
- Every allocation and release of memory takes place inside a pool.
Memory block per request
Requests of 1 to 256 bytes are served from the memory pool. Requests of 0 bytes, or of 257 bytes or more, fall back to the general-purpose allocator behind the PyMem_Malloc described earlier.
The space actually allocated for each request is rounded up to a multiple of 8 bytes. For example, PyObject_Malloc(20) is given a 24-byte block, as the following table shows.
Request in bytes     Size of allocated block      Size class idx
----------------------------------------------------------------
       1-8                     8                       0
       9-16                   16                       1
      17-24                   24                       2
      25-32                   32                       3
      33-40                   40                       4
       ...                   ...                     ...
     241-248                 248                      30
     249-256                 256                      31

0, 257 and up: routed to the underlying allocator.
These parameters are controlled by macros:
#define ALIGNMENT               8               /* must be 2^N */

/* Return the number of bytes in size class I, as a uint. */
#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)

#define SMALL_REQUEST_THRESHOLD 256
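The rounding rule in the table can be reproduced with two shifts. The sketch below is a standalone model, not the allocator itself: is_small_request, size_class_index, index_to_size, and allocated_size are hypothetical helper names, and ALIGNMENT_SHIFT is assumed to be 3, that is, log2 of the ALIGNMENT value 8.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT               8
#define ALIGNMENT_SHIFT         3    /* assumption: log2(ALIGNMENT) */
#define SMALL_REQUEST_THRESHOLD 256

/* True when a request is served from the pools (1..256 bytes). */
static int is_small_request(size_t nbytes)
{
    return nbytes >= 1 && nbytes <= SMALL_REQUEST_THRESHOLD;
}

/* Map a request size to its size class index (0..31). */
static unsigned size_class_index(size_t nbytes)
{
    assert(is_small_request(nbytes));
    return (unsigned)(nbytes - 1) >> ALIGNMENT_SHIFT;
}

/* Inverse direction, same formula as INDEX2SIZE. */
static unsigned index_to_size(unsigned idx)
{
    return (idx + 1) << ALIGNMENT_SHIFT;
}

/* Bytes actually handed out for a small request. */
static unsigned allocated_size(size_t nbytes)
{
    return index_to_size(size_class_index(nbytes));
}
```

For a 20-byte request, size_class_index gives 2 and allocated_size gives 24, matching the third row of the table.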
Pool
Every requested block is allocated from a pool. A pool is 4 KB in size, as set by the following macros:
#define SYSTEM_PAGE_SIZE (4 * 1024)
#define POOL_SIZE SYSTEM_PAGE_SIZE /* must be 2^N */
The header of each pool is defined as follows:
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};
Note the member szidx, which corresponds to the "Size class idx" column of the table above. This has an important consequence: each pool hands out blocks of one fixed size only (for example, only 16-byte blocks, or only 24-byte blocks).
Serving all the block sizes in the table therefore requires multiple pools. When a pool for a given size is full, a new pool of that size is needed. Pools of the same size class are chained into a linked list.
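The roles of freeblock, nextoffset, and maxnextoffset in pool_header are easier to see in a toy model. The sketch below is a deliberate simplification and not the real allocator: model_pool, pool_init, pool_alloc, and pool_free are hypothetical names, and the header is kept outside the 4 KB of storage, whereas the real header lives inside the pool.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4096u   /* same value as POOL_SIZE in the article */
#define ALIGNMENT 8u

typedef struct model_pool {
    unsigned count;             /* number of allocated blocks          */
    unsigned char *freeblock;   /* head of the list of freed blocks    */
    unsigned szidx;             /* size class index of this pool       */
    unsigned nextoffset;        /* offset of the next never-used block */
    unsigned maxnextoffset;     /* largest valid nextoffset            */
    unsigned char data[POOL_SIZE];
} model_pool;

static void pool_init(model_pool *p, unsigned szidx)
{
    unsigned size = (szidx + 1) * ALIGNMENT;
    p->count = 0;
    p->freeblock = NULL;
    p->szidx = szidx;
    p->nextoffset = 0;
    p->maxnextoffset = POOL_SIZE - size;
}

/* Every block in a pool has the same size, so allocation is either
 * "pop the free list" or "carve the next untouched block". */
static void *pool_alloc(model_pool *p)
{
    unsigned size = (p->szidx + 1) * ALIGNMENT;
    unsigned char *bp;

    if (p->freeblock != NULL) {
        bp = p->freeblock;
        /* A freed block stores the address of the next free block. */
        memcpy(&p->freeblock, bp, sizeof p->freeblock);
    } else if (p->nextoffset <= p->maxnextoffset) {
        bp = p->data + p->nextoffset;
        p->nextoffset += size;
    } else {
        return NULL;            /* pool is full */
    }
    p->count++;
    return bp;
}

/* Push the block onto the free list, reusing its own bytes as the link. */
static void pool_free(model_pool *p, void *ptr)
{
    memcpy(ptr, &p->freeblock, sizeof p->freeblock);
    p->freeblock = ptr;
    p->count--;
}
```

Because the free list is threaded through the freed blocks themselves, the pool needs no extra bookkeeping memory, and the most recently freed block is the first one reused.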
Arena
Multiple pools are managed by a structure called an arena.
struct arena_object {
    uptr address;
    block* pool_address;
    uint nfreepools;
    uint ntotalpools;
    struct pool_header* freepools;
    struct arena_object* nextarena;
    struct arena_object* prevarena;
};
The amount of memory managed by each arena is set by the following macro:
#define ARENA_SIZE (256 << 10) /* 256KB */
A series of arenas forms a linked list.
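The two sizes fix how many pools fit into one arena, and the power-of-two requirement on POOL_SIZE makes it cheap to find the pool that contains a given address. The following is a small sketch of that arithmetic. max_pools_per_arena and pool_base are hypothetical helper names, and POOL_SIZE_MASK is an assumption introduced here for the masking step.

```c
#include <stdint.h>

#define POOL_SIZE      (4 * 1024)     /* value from the article */
#define ARENA_SIZE     (256 << 10)    /* value from the article */
#define POOL_SIZE_MASK ((uintptr_t)POOL_SIZE - 1)   /* assumption */

/* Upper bound on the number of pools one arena can hold. */
static unsigned max_pools_per_arena(void)
{
    return ARENA_SIZE / POOL_SIZE;
}

/* Round an address down to the start of its pool. Clearing the low
 * bits only works because POOL_SIZE is a power of two. */
static uintptr_t pool_base(uintptr_t addr)
{
    return addr & ~POOL_SIZE_MASK;
}
```

With the values above an arena holds at most 64 pools. An arena whose memory does not start on a pool boundary can yield fewer usable pools.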
Reference counting and garbage collection
In Python, the lifetime of most objects is governed by their reference count, which is how memory is managed dynamically.
But reference counting has a serious weakness: circular references!
To break such cycles, Python adds a garbage collector.
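The circular reference problem shows up even in a tiny reference-counting model. The sketch below is not CPython code: node, node_new, node_set_ref, node_decref, and the two demo functions are hypothetical, and each node owns at most one reference to another node. The cycle demo leaks its two nodes on purpose.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int refcnt;
    struct node *ref;    /* one owned reference to another node, or NULL */
} node;

static int live_nodes = 0;   /* nodes allocated and not yet freed */

static node *node_new(void)
{
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->refcnt = 1;           /* the caller holds the first reference */
    n->ref = NULL;
    live_nodes++;
    return n;
}

/* Drop one reference. At zero, free the node and release what it owned. */
static void node_decref(node *n)
{
    if (n != NULL && --n->refcnt == 0) {
        node *child = n->ref;
        free(n);
        live_nodes--;
        node_decref(child);
    }
}

/* Make n hold a counted reference to target. */
static void node_set_ref(node *n, node *target)
{
    target->refcnt++;
    node_decref(n->ref);
    n->ref = target;
}

/* a -> b with no cycle: returns how many nodes are left over. */
static int demo_chain(void)
{
    int before = live_nodes;
    node *a = node_new(), *b = node_new();
    node_set_ref(a, b);
    node_decref(b);          /* b is now kept alive only by a */
    node_decref(a);          /* frees a, which then frees b   */
    return live_nodes - before;
}

/* a -> b -> a: returns how many nodes are left over. */
static int demo_cycle(void)
{
    int before = live_nodes;
    node *a = node_new(), *b = node_new();
    node_set_ref(a, b);
    node_set_ref(b, a);
    node_decref(a);          /* count 2 -> 1, still referenced by b */
    node_decref(b);          /* count 2 -> 1, still referenced by a */
    return live_nodes - before;
}
```

demo_chain leaves nothing behind, while demo_cycle leaves both nodes allocated even though nothing outside the cycle can reach them. Reference counting alone can never reclaim this kind of garbage, so a separate collector has to find and break such cycles.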