Linux Memory Management: The Heap


Article source: "Green Freezing Point" on Blog Park (cnblogs)

In previous installments we roughly divided the 4 GB virtual address space of a Linux user process into several parts, covered the virtual memory management of the data segment, code segment, and other regions in the 3 GB user space, and analyzed how the stack is used. This time, let's analyze another important part of virtual memory: the heap. As the previous installment showed, the compiler and the operating system do most of the work of allocating and managing the stack. Whether stack space is allocated statically or dynamically by the user, it is released automatically when the function returns. The heap is more flexible than the stack: programmers can allocate and release it dynamically, which also means that using the heap requires more care from the programmer.

4.5 Heap memory management

When studying data structures, we learn that heaps and stacks are basic data structures. In memory management, however, although the heap area and the stack area are often mentioned together, they differ in many ways. The stack area really is a stack data structure: the hardware, the operating system, and the compiler cooperate to use it, and it is fundamental to how computation is carried out. In assembly language, what is often loosely called the "heap stack" actually refers to just the stack. The heap area, by contrast, is the memory allocated dynamically while the program runs, and it is usually managed by a library. It is called the heap because a heap data structure is commonly used to manage the allocated memory. That said, it can be managed with any data structure, even a simple linked list; a heap is chosen because it offers advantages in speed, space utilization, and adjustability.

4.5.1 Library functions for heap management

In ISO C, there are three functions that allocate memory dynamically, namely:
void *malloc (size_t size);
void *calloc (size_t nmemb, size_t size);
void *realloc (void *ptr, size_t size);
Of these three library functions, malloc is the most commonly used. malloc allocates size bytes and does not initialize them; it returns a pointer to the allocated space. calloc is similar, except that it allocates an array of nmemb elements of size bytes each, that is, nmemb*size bytes in total, and initializes every byte to 0.
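As a rough illustration of how calloc relates to malloc, the following sketch builds calloc-like behavior from malloc and memset. The name `my_calloc` is hypothetical and this is not how any particular C library implements calloc; real implementations may, for instance, skip the memset when they know the memory is already zero. The overflow check matters because nmemb*size can wrap around for large arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc-like helper: allocate nmemb * size bytes and zero them.
 * Returns NULL if the multiplication would overflow or malloc fails. */
void *my_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                 /* nmemb * size would wrap around */
    size_t total = nmemb * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);         /* calloc's contract: every byte is zero */
    return p;
}
```

Here `my_calloc(8, sizeof(int))` returns eight zeroed ints, while `my_calloc(SIZE_MAX, 2)` returns NULL instead of a block that is silently too small.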
realloc is a versatile function: it changes the size of the memory block pointed to by ptr to size bytes (ptr must have been returned by an earlier call to malloc, calloc, or realloc). If size is larger than the old block, the block is extended and the newly added bytes are not initialized. If size is smaller, the tail of the block is discarded. In either case, the contents up to the smaller of the old and new sizes are preserved. If ptr is NULL, realloc is equivalent to malloc(size). If size is 0, glibc treats the call as free(ptr), but the C standard leaves this case implementation-defined (and C23 makes it undefined), so portable code should call free() directly. The return value of realloc deserves special attention: the resized block may not be at the same address as the original, because realloc may have to move the block to satisfy the new size. The returned pointer must therefore replace the old one. Since realloc returns NULL on failure and leaves the original block untouched, assign through a temporary so the old pointer is not lost:
void *tmp = realloc(ptr, size);
if (tmp != NULL) ptr = tmp;   /* on failure, ptr is still valid and must eventually be freed */
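A common place where realloc's moving-block behavior matters is a growable array. The sketch below uses a hypothetical `struct int_vec` type: it doubles the capacity when the array is full, relies on realloc(NULL, n) behaving like malloc(n) for the first allocation, and only overwrites the stored pointer after realloc succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* A growable int array; the type and function names are illustrative. */
struct int_vec {
    int *data;
    size_t len, cap;
};

/* Append one value, doubling the capacity with realloc when full.
 * Returns 0 on success, -1 if realloc fails (v is then left unchanged). */
int vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, n) acts like malloc(n), so the first push works too */
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;              /* old block is still valid and owned by v */
        v->data = tmp;              /* the block may have moved */
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Doubling keeps the number of realloc calls logarithmic in the number of elements, so the occasional copy when the block moves stays cheap on average.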

free() releases memory previously allocated by malloc, calloc, or realloc:
void free (void *ptr);

4.5.2 System calls for heap management

The malloc family of functions is built on basic interfaces provided by Linux. The first pair is brk() and sbrk():
int brk (void *end_data_segment);
void *sbrk (intptr_t increment);
brk(): as its name suggests, brk() moves the "break", the limit the system sets on the process's data segment; in other words, it sets the boundary of the process's heap. As mentioned earlier, the heap grows from low virtual addresses toward high ones, and brk() sets its upper end, the heap top. It works like a lid that moves up and down as heap memory is allocated and released; the kernel treats memory below the lid as valid. A related function, sbrk(), is not a system call but a library function (glibc implements it on top of brk). sbrk(+n) or sbrk(-n) raises or lowers the current break by n bytes and returns the previous break; sbrk(0) simply returns the current break.
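The following sketch, assuming Linux with glibc, makes the lid picture concrete: it raises the break with sbrk(), checks that sbrk() returned the old break, writes to the newly valid bytes, and then lowers the break again. The function name is hypothetical. sbrk() is a legacy interface (it is no longer part of POSIX), and calling it directly in a program that also uses malloc can confuse the allocator, so treat this as a demonstration only.

```c
#define _DEFAULT_SOURCE          /* makes glibc declare sbrk() in <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Raise the program break by n bytes, use the new memory, then lower the
 * break again. Returns 1 if the break ends up where it started, 0 if the
 * sbrk() call that grows the heap failed. */
int grow_and_shrink_break(intptr_t n)
{
    char *old_brk = sbrk(0);                  /* current heap top */
    void *prev = sbrk(n);                     /* returns the previous break */
    if (prev == (void *)-1)
        return 0;
    assert((char *)prev == old_brk);
    assert((char *)sbrk(0) == old_brk + n);   /* the lid moved up by n bytes */
    memset(old_brk, 0xAB, (size_t)n);         /* bytes below the new break are valid */
    sbrk(-n);                                 /* move the lid back down */
    return (char *)sbrk(0) == old_brk;
}
```

Calling `grow_and_shrink_break(4096)` should return 1 on a typical Linux system.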

void *mmap (void *addr, size_t len, int prot, int flags, int fildes, off_t off);
int munmap (void *addr, size_t len);
mmap(): mmap() is more flexible and more general than brk(). It can map virtual addresses to a file, to shared memory, and so on, so that processes can communicate by reading and writing what looks like a file. Once mapped, the virtual addresses become valid. Heap allocators therefore often use mmap to add virtual address space to the process; since no file needs to be read or written, they normally use an anonymous mapping (MAP_ANONYMOUS). munmap() does the reverse and is used to release virtual memory mapped by mmap().
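As a minimal sketch of the anonymous-mapping technique described above (Linux assumed; the helper names are hypothetical), an allocator can obtain a large zero-filled region like this. With MAP_ANONYMOUS no file is involved, so the file descriptor is passed as -1 and the offset as 0.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS in <sys/mman.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Obtain len bytes of private, zero-filled memory not backed by any file,
 * the way an allocator might for a large request. Returns NULL on failure. */
void *anon_alloc(size_t len)
{
    assert(len > 0);                      /* mmap rejects a zero length */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from anon_alloc; len must match the mapping. */
void anon_free(void *p, size_t len)
{
    if (p != NULL)
        munmap(p, len);
}
```

Every byte of a fresh region from `anon_alloc(256 * 1024)` reads as 0, and the region is released with the same length it was mapped with.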

4.5.3 Internal management of the heap

Programmers mainly use dynamically allocated memory through malloc/free. There are many malloc implementations: glibc's ptmalloc, written by Wolfram Gloger on the basis of Doug Lea's dlmalloc, phkmalloc, the malloc on Solaris, and so on. You can, of course, also write a simple malloc yourself. Whatever the implementation, malloc consists of two parts: memory allocation and memory management.

4.5.3.1 Allocating heap memory

When malloc() allocates memory, it first calls brk() or mmap(), described above, to request a block of memory from the operating system. In effect, this tells the operating system that the corresponding virtual addresses are valid, so that when they are used the system allocates physical memory for them instead of raising a segmentation fault. For example:
...
int *l = sbrk(0);      /* current heap top (program break) */
int *k = l + 1023;     /* 1023 ints past the break */
printf("k=%d, at %p\n", *k, (void *)k);
...
Running this program fails with:
Segmentation fault

If the code is changed to:
...
int *l = sbrk(0);
sbrk(1);               /* grow the heap by 1 byte */
int *k = l + 1023;
printf("k=%d, at %p\n", *k, (void *)k);
...
the program runs normally and prints:
k=100, at 0x804affc

The first snippet crashes because it accesses memory that has not been allocated yet and lies beyond the current heap limit. The second snippet calls sbrk(1) to allocate memory dynamically, so the access succeeds. Note that although sbrk(1) nominally grows the heap by only 1 byte, the system allocates memory in whole pages, so the heap actually grows by 4 KB. With 4-byte ints, k = l + 1023 is 4092 bytes past l, the last int in that page, so the access is legal.
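The page arithmetic behind this can be checked with a small sketch. It assumes 4 KB pages, 4-byte ints, and a heap that starts at 0x804a000 (inferred from the printed address, since 0x804affc = 0x804a000 + 1023 * 4). The helper names are hypothetical; they only model the rule that the kernel maps the heap up to the break rounded up to a whole page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of page (page must be a power of two). */
uintptr_t page_round_up(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (addr + page - 1) & ~(page - 1);
}

/* Is [addr, addr + len) inside the heap, given that the kernel maps the heap
 * from heap_start up to the break rounded up to a whole page? */
int heap_access_ok(uintptr_t heap_start, uintptr_t brk, uintptr_t addr,
                   size_t len, uintptr_t page)
{
    return addr >= heap_start && addr + len <= page_round_up(brk, page);
}
```

Before sbrk(1) the break is 0x804a000, so no page is mapped above it and the 4-byte read at 0x804affc fails; after sbrk(1) the break is 0x804a001, which rounds up to 0x804b000, and the same read fits.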

brk() and mmap() are both used for memory allocation, but each has its strengths. Memory obtained with brk() is contiguous, so it is easy to merge and reuse and wastes less space on page alignment, but it can leave memory holes (see below); it suits smaller allocations. mmap() does not create holes the way brk() does, but its regions cannot be merged and reused in the same way; its cost depends on the platform, and the memory it returns is zero-initialized, so it suits large allocations. In dlmalloc, a request smaller than 128 KB is served from the heap, which is grown with brk() as needed; a request of 128 KB or more is served with mmap() (the 128 KB threshold varies by platform and is adjustable).
Let's look at an example:
...
int *heap_var = malloc(sizeof(int));      /* small allocation request */
int *large_var = malloc(256 * 1024);      /* large allocation request */
printf("Address of heap_var (heap): %p\n", (void *)heap_var);
printf("Address of large_var (heap): %p\n", (void *)large_var);
...
The output is:
Address of heap_var (heap): 0x804a008
Address of large_var (heap): 0xb7db2008

Tracing the program with strace shows that this code makes the following system calls:
brk(0x806b000) = 0x806b000
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7db2000
The trace clearly shows that the small allocation uses the brk() system call, while the large allocation uses mmap. The two addresses are also far apart, so the heap is often thought of as two parts: memory allocated with brk, usually at low addresses, and memory allocated with mmap, also called the memory-mapped region, usually at high addresses. Memory obtained through the two system calls can also be managed together; this depends on the implementation.
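The decision rule and the mmap length in the strace output can be modeled with simple arithmetic. In this sketch the 128 KB threshold is the default cited above, while the per-chunk overhead of 8 bytes is an assumed value (the real amount is an implementation detail); any small overhead gives the same page-rounded result. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MMAP_THRESHOLD ((size_t)128 * 1024)   /* default cited in the text */

/* Hypothetical policy: 1 if the request should get its own mmap() region,
 * 0 if it should be carved out of the brk()-managed heap. */
int use_mmap(size_t request)
{
    return request >= MMAP_THRESHOLD;
}

/* Length to pass to mmap(): the payload plus chunk bookkeeping, rounded up
 * to whole pages (page must be a power of two). */
size_t mmap_length(size_t request, size_t overhead, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (request + overhead + page - 1) & ~(page - 1);
}
```

For the 256 KB request, 262144 + 8 bytes rounds up to 65 pages of 4096 bytes, i.e. 266240, matching the length passed to mmap2 above. In glibc the threshold can be changed with mallopt(M_MMAP_THRESHOLD, ...), and recent versions also adjust it dynamically.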

4.5.3.2 Managing heap memory

The next step is managing the memory obtained with brk and mmap. Because brk() and mmap() are system calls, performing one on every malloc call would be expensive. Moreover, if each request is small while the system allocates memory in multiples of a fixed size (usually a 4 KB page), a lot of memory would be wasted. So malloc typically implements a memory heap to manage this memory, dividing it into chunks. Each time the user calls malloc, malloc first searches the memory heap; if it contains no suitable free chunk, malloc uses the brk() or mmap() system call to obtain a larger block of memory, adds it to the memory heap, and carves out a chunk of the appropriate size to return to the user. When the user releases a chunk with free, the memory is not necessarily returned to the system immediately; instead, the freed chunk is added to the memory heap as a free chunk and merged with adjacent free chunks so it can be reused by later allocations.
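To make the chunk-based memory heap concrete, here is a toy first-fit allocator. It is a sketch under simplifying assumptions, not how glibc works: it manages a fixed static arena instead of memory obtained with brk()/mmap(), keeps all chunks on one address-ordered list, splits chunks on allocation, and merges adjacent free chunks on free. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy first-fit allocator over a fixed arena. A real malloc would get this
 * memory from brk()/mmap(); every name here is hypothetical. */
#define ARENA_SIZE (64 * 1024)
#define ALIGNMENT 16
#define ALIGN_UP(n) (((n) + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1))

struct chunk {
    size_t size;            /* usable bytes following the header */
    int free;               /* 1 if the chunk is available */
    struct chunk *next;     /* next chunk in address order */
};

#define HDR ALIGN_UP(sizeof(struct chunk))   /* header size, kept aligned */

static _Alignas(ALIGNMENT) unsigned char arena[ARENA_SIZE];
static struct chunk *head;                    /* NULL until first use */

void *toy_malloc(size_t n)
{
    if (head == NULL) {                       /* start with one big free chunk */
        head = (struct chunk *)arena;
        head->size = ARENA_SIZE - HDR;
        head->free = 1;
        head->next = NULL;
    }
    if (n > ARENA_SIZE)
        return NULL;
    n = ALIGN_UP(n ? n : 1);
    for (struct chunk *c = head; c != NULL; c = c->next) {
        if (!c->free || c->size < n)
            continue;                         /* first fit: skip busy or small chunks */
        if (c->size >= n + HDR + ALIGNMENT) { /* split off the unused tail */
            struct chunk *rest = (struct chunk *)((unsigned char *)c + HDR + n);
            rest->size = c->size - n - HDR;
            rest->free = 1;
            rest->next = c->next;
            c->size = n;
            c->next = rest;
        }
        c->free = 0;
        return (unsigned char *)c + HDR;
    }
    return NULL;          /* a real malloc would now grow the heap via brk/mmap */
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    ((struct chunk *)((unsigned char *)p - HDR))->free = 1;
    for (struct chunk *c = head; c != NULL; c = c->next)
        while (c->free && c->next != NULL && c->next->free) {
            c->size += HDR + c->next->size;   /* merge with the free neighbor */
            c->next = c->next->next;
        }
}
```

Freeing a block and then requesting a smaller one hands back the same address, and once every block is freed the arena collapses back into a single free chunk, which is the reuse-and-merge behavior described above.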

Generally, when a chunk is freed, it is released with munmap if it was obtained with mmap. If it came from brk, malloc further checks whether the free space at the top of the heap exceeds 128 KB; if so, it shrinks the heap with brk(), and if not, the memory stays in the memory heap. The problem with brk() is that it can only release memory at the top of the heap: if a block below the top is freed while the block at the top is still in use, that memory cannot be returned to the system, which leaves a memory hole.
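The memory-hole problem can be modeled by looking only at the free space contiguous with the heap top, since that is all brk() can give back. The sketch below uses a hypothetical array-of-blocks model of the heap and the 128 KB trim threshold cited above; it illustrates the rule and is not allocator code.

```c
#include <assert.h>
#include <stddef.h>

#define TRIM_THRESHOLD ((size_t)128 * 1024)   /* value cited in the text */

/* A brk heap modeled as blocks in address order (index 0 = lowest address).
 * The struct and function names are hypothetical. */
struct block {
    size_t size;
    int in_use;
};

/* Bytes brk() could give back: only free blocks contiguous with the top. */
size_t free_bytes_at_top(const struct block *blocks, size_t count)
{
    size_t total = 0;
    for (size_t i = count; i > 0 && !blocks[i - 1].in_use; i--)
        total += blocks[i - 1].size;
    return total;
}

/* Shrink the heap only if the free run at the top exceeds the threshold. */
int should_trim(const struct block *blocks, size_t count)
{
    return free_bytes_at_top(blocks, count) > TRIM_THRESHOLD;
}
```

A free 200 KB block sitting below a busy top block yields nothing to trim, which is exactly the hole; once the top block is freed, the whole free run can be released.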

In malloc, each chunk carries a data structure used to manage it, the chunk head. How big is the chunk head? Let's look at what happens with malloc(0):
...
int *heap_var = malloc(0);
int *heap_var1 = malloc(0);
printf("Address of heap_var: %p\n", (void *)heap_var);
printf("Address of heap_var1: %p\n", (void *)heap_var1);
...
This code prints:
Address of heap_var: 0x804a008
Address of heap_var1: 0x804a018
The two pointers differ by 16 bytes, so even malloc(0) consumes 16 bytes here: the chunk head plus the minimum usable space, since 16 bytes is the smallest chunk glibc hands out on this 32-bit system, even when the requested size is 0. (The C99 standard makes the result of malloc(0) implementation-defined: it returns either a null pointer or a unique pointer that must not be dereferenced.) One important piece of information recorded in the chunk head is the size of the chunk. When malloc carves out a chunk, it stores the chunk's size in the chunk head; when the memory is released, free locates the chunk head from the pointer and thus knows how large the chunk is. That is why malloc needs to be told the size, while free needs only the pointer. Passing free() a pointer that was not obtained from malloc is undefined behavior; in practice the program may crash with a segmentation fault, or glibc may abort with a message such as ./chunk: free(): invalid pointer.
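The idea of recording the size in a header in front of the returned pointer can be sketched as a wrapper around malloc. This is an illustration with hypothetical names, not glibc's actual chunk layout; the union with max_align_t keeps the memory after the header aligned for any type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored just before the pointer handed to the caller. */
union size_header {
    size_t size;
    max_align_t align;      /* forces maximal alignment of what follows */
};

void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(union size_header))
        return NULL;                        /* header + n would overflow */
    union size_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;                            /* record the size at allocation time */
    return h + 1;                           /* user memory starts after the header */
}

/* Recover the size from the pointer alone, as free() does. */
size_t sized_size(const void *p)
{
    return ((const union size_header *)p - 1)->size;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((union size_header *)p - 1);   /* step back to the real block start */
}
```

With this layout, `sized_size(sized_malloc(100))` recovers 100 without the caller passing it again, just as free() finds the chunk size from the chunk head.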

4.5.4 Physical memory use of the heap

Like the stack, the heap is a virtual memory concept, and its use of physical memory follows the same deferred allocation strategy: the corresponding physical memory is allocated only when the virtual memory is actually used. For example:
...
int *large_var = malloc(4 * 1024 * 1024);   /* 4 MB, allocated but never touched */
free(large_var);
...

Viewing /proc/PID/statm: the first column is the process's total virtual memory size and the second is its resident set size, the physical memory it uses, both in pages (4 KB here). The columns are size, resident, shared, text, lib, data, and dt:
Before malloc: 342 78 63 1 0 27 0
After malloc: 1367 86 70 1 0 1052 0
After free: 342 85 70 1 0 27 0
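Because large_var is never used after malloc, the virtual size grows by more than 1000 pages (about 4 MB) while physical memory grows by only a few pages. After free, the virtual size drops back to 342 pages: the 4 MB request exceeds the 128 KB threshold, so the block was obtained with mmap() and free returns it with munmap().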

If the program changes to:
...
int *large_var = malloc(4 * 1024 * 1024);
memset(large_var, 0, 4 * 1024 * 1024);   /* touch every page */
free(large_var);
...

Checking /proc/PID/statm again gives:
Before malloc: 343 78 63 1 0 28 0
After malloc: 1368 1110 70 1 0 1053 0
After free: 343 85 70 1 0 28 0
Because memset writes to the allocated memory, not only does the virtual size grow by more than 1000 pages, the resident size (physical memory) grows by more than 1000 pages as well.
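The deferred allocation can be observed programmatically by reading /proc/self/statm before and after touching memory. This Linux-only sketch (function names hypothetical) uses mmap() directly instead of malloc() so the result does not depend on malloc's mmap threshold, and it writes one byte per page, which is enough to make the kernel back each page with physical memory.

```c
#define _DEFAULT_SOURCE              /* exposes MAP_ANONYMOUS in <sys/mman.h> */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read the first two fields of /proc/self/statm: total virtual size and
 * resident set size, both in pages. Returns 1 on success, 0 on failure. */
int read_statm(long *size, long *resident)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return 0;
    int ok = fscanf(f, "%ld %ld", size, resident) == 2;
    fclose(f);
    return ok;
}

/* Map len bytes, then write one byte per page. Reports how many pages the
 * resident set grew right after mmap() and after every page was touched. */
int measure_lazy(size_t len, long *after_map, long *after_touch)
{
    long vsz[3], rss[3];
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (!read_statm(&vsz[0], &rss[0]))
        return 0;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    int ok = read_statm(&vsz[1], &rss[1]);      /* virtual grew, resident not yet */
    for (size_t off = 0; off < len; off += page)
        p[off] = 1;                             /* first write faults in a page */
    ok = ok && read_statm(&vsz[2], &rss[2]);
    munmap(p, len);
    if (!ok)
        return 0;
    *after_map = rss[1] - rss[0];
    *after_touch = rss[2] - rss[0];
    return 1;
}
```

For a 4 MB region with 4 KB pages, expect `after_map` to stay near zero and `after_touch` to be close to 1024 pages, mirroring the statm figures above.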

4.5.5 Memory leaks
A very important issue in heap usage is the memory leak: memory obtained with malloc is no longer needed, but the program fails to release it with free. Because the virtual memory is not released, the corresponding physical memory is not released either, and accumulated leaks can eventually exhaust all of the system's memory. To combat memory leaks, techniques such as smart pointers and garbage collection have been extensively researched and used. The most effective approach, however, is to keep the problem in mind while writing the program and handle every malloc with care. Note also that a memory leak only matters while the process is running: when the process exits, the operating system reclaims all memory allocated to it.
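Tools such as Valgrind and AddressSanitizer's leak checker find leaks automatically, but the basic idea can be sketched with counting wrappers around malloc and free. The names are hypothetical and the counter is not thread-safe; a nonzero count at a point where everything should have been released indicates a leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks obtained from tracked_malloc and not yet freed. */
static long live_allocations;

void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocations++;     /* count only successful allocations */
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        live_allocations--;
        free(p);
    }
}

long tracked_live(void)
{
    return live_allocations;
}
```

Checking `tracked_live() == 0` at the end of a test or request handler is a cheap way to catch a forgotten free early.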

Summary:
1. Both the heap and the stack are ways of using and managing virtual memory.
2. The system calls brk() and mmap() allocate virtual address space dynamically: they make the corresponding virtual addresses valid, so that when those addresses are accessed the system allocates physical memory instead of raising an error.
3. The heap is essentially virtual memory requested dynamically. In principle this space can be managed in any way, but the "heap" data structure is the most common choice, so the allocated space is usually called the heap.
4. Unlike the stack, the heap is managed by user-space library code, and malloc/free are its entry points.
5. The size of each allocated block is recorded, so freeing a block requires only its address. This is why malloc needs a size but free does not.
6. Like the stack, the heap uses a deferred allocation policy for physical memory.
