This is the actual mapping distribution. Your results may differ depending on the kernel version and the C library in use. Recent kernels (2.6.x) tag the VMAs, but the tags cannot be fully relied on.
The heap is essentially free space not yet allocated to the program's mappings and stack, so it further reduces the available address space: 3 GB minus all the mapped parts.
How does the map for program A look when it cannot allocate more memory blocks? With a trivial change that pauses the program (see loop.c and loop-calloc.c) just before it exits, the final map is:
0009a000-0039d000 rwxp 0009a000 00:00 0          ----------> (allocated block)

The six virtual memory areas (VMAs) reflect the memory requests. A VMA is a group of memory pages with the same access permissions, and it can exist anywhere in user space.
Now you may wonder: why six regions instead of one large one? There are two reasons. First, it is often difficult to find such a large "hole" in the address space. Second, the program does not request all the memory at once. Therefore, the glibc allocator is free to plan on the available pages as needed.
Why do I say "available pages"? Memory is allocated in page-size granularity. This is not an OS limitation but a feature of the memory management unit (MMU). The page size varies; on the x86 platform it is generally 4 KB. You can query it with getpagesize() or sysconf() (with the _SC_PAGESIZE parameter). The libc allocator manages all the pages: it divides them into smaller blocks, assigns them to processes, frees them, and so on. For example, if a program uses 4097 bytes, it needs two pages, even though the allocator actually gives you somewhere between 4105 and 4109 bytes (the extra bytes are chunk bookkeeping).
With 256 MB of RAM and no swap partition, you have 65536 available pages. Right? Not quite. What you need to know is that some memory regions are occupied by kernel code and data, and some must be reserved for emergencies or high-priority allocations. dmesg reveals this information:

$ dmesg | grep -n kernel
36: Memory: 255716k/262080k available (2083k kernel code, 5772k reserved,
637k data, 172k init, 0k highmem)
171: Freeing unused kernel memory: 172k freed

The init parts of the kernel code and data are used only during initialization (172 kB here), so the kernel frees them afterward. The rest permanently occupies 2083 + 5772 + 637 = 8492 kB; in other words, 2123 pages are gone. Use more kernel features and modules, and more will be consumed.
Another kernel data structure is the page cache. The page cache stores the contents of blocks read from block devices. The more is cached, the less free memory there is; however, when system memory runs short, the kernel reclaims the memory occupied by the cache.
From the perspective of the kernel and the hardware, the following is very important:
There is no guarantee of physical contiguity for allocated memory; it is only virtually contiguous.
This illusion comes from the way addresses are translated. In a protected environment, users work with virtual addresses while the hardware works with physical addresses; the page directory and page tables translate between them. For example, two blocks starting at virtual addresses 0 and 4096 might actually map to physical addresses 1024 and 8192.
This makes allocation easier, because finding contiguous physical blocks is hard. The kernel grabs whatever blocks it can find rather than contiguous ones, and adjusts the page tables so that they appear virtually contiguous.
This has a price. Because the memory blocks are not physically contiguous, the CPU's L1 and L2 caches are sometimes underused: virtually contiguous memory can be scattered across different physical cache lines, which slows down sequential memory access.
Memory allocation consists of two steps: first expand the length of a memory region (VMA), then allocate pages as needed. This is demand paging. While expanding the VMA, the kernel only checks whether the request overlaps an existing VMA and whether the range fits inside user space. By default, it skips checking whether the request can actually be backed by physical memory.
Therefore, if your application can request and obtain 1 GB of memory when you have only 16 MB of RAM and 64 MB of swap, don't be surprised. This optimistic approach keeps everyone happy. The kernel provides parameters to tune this overcommit behavior.
There are two types of pages: anonymous pages and file-backed pages. A file-backed page results from mmap()-ing a file on disk; anonymous pages come from malloc() and are not associated with any file. When memory runs short, the kernel swaps out anonymous pages and simply discards file-backed pages (they can be re-read from the file). In other words, anonymous pages consume swap space. The exception is a file mmap()-ed with the MAP_PRIVATE flag: modifications then occur in memory only.
This is how swap can extend memory. Of course, accessing a swapped-out page requires bringing it back into RAM first.
Inside the allocator
The actual work is done by the glibc memory allocator. The allocator hands blocks to the program, carving them out of the heap it obtains from the kernel.
The allocator is the manager; the kernel is the worker. From this it follows that the greatest efficiency gains come from a good allocator, not from the kernel.
glibc uses an allocator named ptmalloc. Wolfram Gloger created it as a modified version of the original malloc library written by Doug Lea. The allocator manages allocated blocks in terms of "chunks". A chunk represents the memory block you actually requested, but not its size: there is an extra header inside the chunk besides the user data.
The allocator uses two functions to get a chunk of memory from the kernel:
brk(), which sets the end of the process's data segment;
mmap(), which creates a new memory mapping.
Of course, malloc() uses these functions only when no suitable chunk remains in the current pool.
The decision whether to use brk() or mmap() requires one simple check: if the request is equal to or larger than M_MMAP_THRESHOLD, the allocator uses mmap(); if it is smaller, the allocator calls brk(). By default, M_MMAP_THRESHOLD is 128 KB, but you may freely change it using mallopt().
In an OOM situation, it is interesting how ptmalloc releases memory. Blocks allocated with mmap() are completely released once unmapped; blocks allocated with brk() are only marked free but remain under the allocator's control. They can satisfy a later malloc() whose requested size is smaller than or equal to the free chunk. The allocator can merge multiple consecutive free chunks, or split one to meet a request.
This means a free chunk may sit idle because it cannot satisfy any request. Failure to merge free chunks also accelerates the onset of OOM, and is a sign of bad memory fragmentation.
Recovery
What can be done when OOM occurs? The kernel terminates one process. Why? This is the only way to stop further memory requests. The kernel cannot assume a process has some mechanism to terminate itself; killing is the only option.
How does the kernel know which process to kill? The answer is in the mm/oom_kill.c source code. The so-called OOM killer uses the badness() function to score every existing process; the one with the highest score is the victim. The scoring criteria:
VM size. This is not the number of allocated pages, but the total size of all VMAs owned by the process. The larger it is, the higher the score.
The VM size of its child processes also matters; the count is cumulative.
Processes with a priority below 0 (niced) get a higher score.
Superuser processes are assumed to be more important and therefore get a lower score.
Process runtime. The longer it has run, the lower the score.
Processes with direct hardware access are immune.
swapper, init, and other kernel threads are immune.
The process with the highest score wins the "election" and is then killed.
This mechanism is not perfect, but it is generally effective. Criteria 1 and 2 make it clear that it is the size of the VMAs that counts, not the actual number of pages. You might think VMA size could trigger false alarms, but it does not: the badness() call happens inside the page-allocation functions, when only a few free pages remain and reclaim has failed, so the VMA size is usually close to the number of pages the process really owns.
Why not count the actual number of pages? Because that would take more time and more locks, making a quick decision more expensive. So the OOM killer is not perfect and may kill the wrong process.
The kernel uses the SIGTERM signal to notify the target process to shut down.
How to Reduce OOM Risk
The simple rule: do not allocate more memory than is actually free. However, many factors influence the outcome, so the strategy should be more refined:
Reduce fragmentation through ordered allocation
No advanced allocator is required; you can reduce fragmentation through orderly allocation and release. Use a LIFO policy: free first what you allocated last.
For example, the following code:
void *a;
void *b;
void *c;
...
a = malloc(1024);
b = malloc(5678);
c = malloc(4096);
...
free(b);
b = malloc(12345);
can be changed to:
a = malloc(1024);
c = malloc(4096);
b = malloc(5678);
...
free(b);
b = malloc(12345);
In this way, no hole is left between chunks a and c. You can also consider using realloc() to resize blocks already produced by malloc().
Two example programs (fragmented1 and fragmented2) demonstrate this effect. At the end, each reports the number of bytes allocated by the system (kernel plus glibc allocator) and the amount actually used. For example, on kernel 2.6.11.1 with glibc 2.3.3.27 and no parameters, fragmented1 wastes 319858832 bytes (about 305 MB) while fragmented2 wastes 2089200 bytes (about 2 MB): 152 times less!
You can experiment further by passing various values as the parameter; the parameter is the size of each malloc() request.
Adjust the kernel's overcommit behavior
You can change the behavior of the Linux kernel through the /proc filesystem, as documented in Documentation/vm/overcommit-accounting in the Linux kernel source. You have three choices when tuning kernel overcommit, expressed as numbers in /proc/sys/vm/overcommit_memory:
0 means use the default heuristic to decide whether to overcommit.
1 means always overcommit. By now you should know how dangerous that is.
2 means never overcommit. The limit is adjustable through /proc/sys/vm/overcommit_ratio; the maximum commit becomes swap + overcommit_ratio * RAM.
Usually the default is enough, but mode 2 provides better protection. Correspondingly, mode 2 requires you to carefully estimate your program's needs: you certainly don't want your program refused execution because it requested more than allowed. Of course, this also avoids being killed.
Check for NULL after allocation, and audit for memory leaks
This is a simple rule, but easily neglected. Checking for NULL tells you whether the allocator could extend the memory area, although it does not guarantee that the needed pages can later be allocated. Usually you then need to delay or cancel the allocation, depending on the situation. Combined with overcommit mode 2, malloc() returns NULL when it cannot reserve free pages, letting you avoid the OOM killer.
A memory leak is unnecessary memory consumption: the application no longer tracks the leaked block, but the kernel does not reclaim it either, because from the kernel's point of view the program is still using it. valgrind can be used to track down this phenomenon.
Always watch memory allocation statistics
The Linux kernel provides /proc/meminfo as the source of memory status information; top, free, and vmstat all read it.
What you need to check is the free and the reclaimable memory. "Free" needs no explanation, but what counts as "reclaimable"? Buffers and the page cache: when memory runs short, the system can write them back to disk and reclaim the space.
$ cat /proc/meminfo
MemTotal: 255944 kB
MemFree: 3668 kB
Buffers: 13640 kB
Cached: 171788 kB
SwapCached: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 255944 kB
LowFree: 3668 kB
SwapTotal: 909676 kB
SwapFree: 909676 kB

Based on this output, the free virtual memory is MemFree + Buffers + Cached + SwapFree.
I failed to find any formalized C (glibc) function to report free (including reclaimable) memory. The closest I found are get_avphys_pages() or sysconf() (with the _SC_AVPHYS_PAGES parameter); they only report the amount of free memory, not free plus reclaimable.
This means that for accurate information you must parse /proc/meminfo and calculate it yourself. If you are lazy, refer to the procps source code; that package includes ps, top, and free.
Experiment with other memory allocators
Different allocators manage memory chunks in different ways. Hoard is one example, created by Emery Berger of the University of Massachusetts for high-performance memory allocation. It targets multi-threaded programs and introduces the concept of one heap per CPU.
Use a 64-bit platform
Users who need a larger user address space can consider 64-bit computing, where the kernel no longer splits virtual memory 3:1. It is also the natural fit for machines with more than 4 GB of RAM.
This has nothing to do with address extensions such as Intel's PAE, which lets a 32-bit processor address up to 64 GB of RAM. PAE addresses are physical and mean nothing to user space: a process still gets 3 GB of virtual addresses. The excess memory is accessible, but not all of it can be mapped into the address space at once, and the unmappable regions are unusable.
Consider packed structures
The packed attribute helps squeeze the size of structs, enums, and unions. It is a way to save bytes, especially for arrays of structs. Here is a declaration example:
struct test
{
    char a;
    long b;
} __attribute__ ((packed));

The catch with this trick is that it leaves fields unaligned, which costs extra CPU cycles. "Aligned" means that a variable's address is an integer multiple of its type's natural size. Access to unaligned data is slower, so weigh the saved bytes against access ordering and cache behavior.
Use ulimit in user processes
You can use ulimit -v to limit the address space a process may mmap(). Once the limit is reached, mmap(), and therefore malloc(), fails with NULL, so the OOM killer never starts. This is especially useful on multi-user systems, because it protects the innocent from one runaway process.