Linux VM Runtime Parameters: Overcommit-Related Parameters

Source: Internet
Author: User

I. Preface

I can finally step into the world of Linux kernel memory management, but where to start is a problem: faced with such a complex system, it is sometimes hard to know where to begin. Adhering to a "people-oriented" principle, I chose to look at the kernel's memory management from the userspace perspective. The first series of articles covers the VM runtime parameters. Run ls /proc/sys/vm to see all of them; this article introduces the overcommit-related ones.

The code for this article comes from the 4.0 kernel.

II. Background Knowledge

To understand these parameters, we first need to understand what committed virtual memory is. Engineers who use version control tools are familiar with the meaning of commit: submitting your own changes to the code repository. In this context, it refers to each process submitting its requests for virtual address space. Although we always say that each process has its own separate address space, in reality these address spaces consist of virtual addresses, as illusory as flowers in a mirror or the moon in the water. When a process needs memory (for example, by allocating it through brk), it obtains only virtual addresses from the kernel, not physical memory. Physical memory is allocated only when the process actually accesses the newly acquired virtual address: the access raises a page fault exception, and the handler allocates a page frame and sets up the page table entry. The system then returns to the instruction that faulted and re-executes the memory access as if nothing had happened. The allocation of virtual memory and the allocation of physical memory are therefore decoupled. Does this mean that a process can request virtual address space without limit? No. After all, virtual memory must be backed by physical memory; if far more virtual memory is handed out than the physical RAM can support, performance suffers. We call this situation overcommit.
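The split between reserving address space and getting page frames can be sketched from userspace. The following is only an illustration under assumptions: reserve_and_touch is a hypothetical helper (not from the kernel or from this article), and it uses mmap rather than brk to obtain the anonymous region.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve `reserve_bytes` of anonymous virtual memory, then write to the
 * first `touch_pages` pages of it. Returns the number of pages written,
 * or -1 if the reservation fails. Hypothetical helper for illustration.
 */
long reserve_and_touch(size_t reserve_bytes, size_t touch_pages)
{
    assert(reserve_bytes > 0);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    /* Only virtual address space is handed out here, no page frames yet. */
    unsigned char *p = mmap(NULL, reserve_bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long touched = 0;
    for (size_t i = 0; i < touch_pages; i++) {
        size_t off = i * (size_t)page_size;
        if (off >= reserve_bytes)
            break;
        p[off] = 1; /* the first write to a page raises a page fault */
        touched++;
    }

    munmap(p, reserve_bytes);
    return touched;
}
```

Calling reserve_and_touch(16u << 20, 8) reserves 16 MB of address space but writes to only 8 pages, so only those pages need page frames; the rest of the region remains a promise.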

III. Parameter Introduction

1. overcommit_memory. This parameter controls the kernel's overcommit policy. It can be set to the following values:

#define OVERCOMMIT_GUESS 0
#define OVERCOMMIT_ALWAYS 1
#define OVERCOMMIT_NEVER 2

OVERCOMMIT_ALWAYS means the kernel places no limit on overcommit: no matter how much address space a process asks to commit, go ahead and do what you like, but the consequences are your own responsibility. OVERCOMMIT_NEVER is the other extreme: never overcommit. OVERCOMMIT_GUESS, as its name suggests, means "the kernel guesses", which is rather playful (it is described further in the code analysis below). By the way, I don't really like the name of this parameter; something like vm_overcommit_policy would be more accurate. The Linux kernel has presumably kept the name for historical reasons.
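To check which policy a running system uses, the parameter can be read from /proc/sys/vm. The sketch below assumes a Linux system with procfs mounted; overcommit_mode_name and read_vm_param are hypothetical helper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Map an overcommit_memory value to the kernel's macro name. */
const char *overcommit_mode_name(long mode)
{
    switch (mode) {
    case 0:  return "OVERCOMMIT_GUESS";
    case 1:  return "OVERCOMMIT_ALWAYS";
    case 2:  return "OVERCOMMIT_NEVER";
    default: return "unknown";
    }
}

/*
 * Read one integer parameter from /proc/sys/vm/<name>.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_vm_param(const char *name, long *value)
{
    assert(name != NULL && value != NULL);

    char path[128];
    int n = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would typically do read_vm_param("overcommit_memory", &mode) and then print overcommit_mode_name(mode). The same reader works for the other integer parameters discussed in this article.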

2. overcommit_kbytes and overcommit_ratio

OVERCOMMIT_ALWAYS is very permissive and always allows overcommit, but OVERCOMMIT_NEVER is not: under this policy the system does not allow overcommit to occur. To detect overcommit there has to be a standard to judge against, and that standard can be seen in the vm_commit_limit function:

unsigned long vm_commit_limit(void)
{
    unsigned long allowed;

    if (sysctl_overcommit_kbytes)
        /* convert the unit from KB to pages */
        allowed = sysctl_overcommit_kbytes >> (PAGE_SHIFT - 10);
    else
        allowed = ((totalram_pages - hugetlb_total_pages())
                   * sysctl_overcommit_ratio / 100);
    allowed += total_swap_pages;

    return allowed;
}

The overcommit limit can be set in two ways. One is to define overcommit_kbytes directly, in which case the limit is overcommit_kbytes (converted to pages) + total_swap_pages. What is total_swap_pages? This requires a few words about the page frame reclaim mechanism.

When it comes to allocating virtual and physical memory, the Linux kernel is relatively lenient in handing out virtual address space (although there is the overcommit mechanism), but it is very stingy with physical memory requested on behalf of user space (creating a userspace process, malloc in a userspace program (that is, heap allocation), stack allocation for a userspace process, and so on). (By the way, the memory management module is generous with memory requested by the kernel itself, which makes a kernel engineer feel rather proud, hehe.) It puts up all kinds of obstacles and does not allocate physical memory until the last possible moment. The idea behind this is to use memory better: with limited physical memory, let as many userspace processes run as possible.

If physical addresses and virtual address space were mapped one to one, the number of processes that could be started would be limited, the amount of memory each process could request would be limited, and your program would often face memory allocation failures. To solve this, a smaller physical memory has to be mapped onto a larger virtual address space made up of all the individual user processes. How? The simplest way is to "rob Peter to pay Paul" (literally, to tear down the east wall to mend the west wall). Thanks to the locality that programs naturally exhibit, the kernel can get away with this, but robbing Peter (swapping out) is skilled work, and not every part of a process's virtual address space can be taken away so easily. The text segment of a program, for example, can be reclaimed because its contents are backed by the program file on disk; when they are needed again (paying Paul), they can be reloaded from that file.

Not all of a process's address space is file-backed, however. Segments such as the heap and the stack have no corresponding disk file; these are the legendary anonymous pages. If we create a swap file or swap device, anonymous pages can also be swapped out to disk and loaded back into memory when needed.

OK, back to the total_swap_pages variable. It is the total amount of space, in pages, to which the system can swap anonymous pages out to disk. If we create a 32 MB swap file or swap device, then total_swap_pages is (32 MB / page size).

The other way of setting the overcommit limit (used when overcommit_kbytes is 0) is based on the page frames that the system can use. totalram_pages is not simply the amount of physical memory in the system: many pages are excluded, such as the text and data segments of the Linux kernel itself and some pages reserved by the system. What totalram_pages finally represents is the total amount of memory that the system can manage and allocate. overcommit_ratio is a percentage: 50 means that 50% of totalram_pages can be used. On top of that, total_swap_pages is added, as described above.

There is one more detail, related to huge pages. Choosing between traditional 4K pages and huge pages is also a matter of balance. Normal pages manage memory flexibly and waste little, but they are not well suited to large regions of virtual memory: TLB entries are limited, and mapping a large region requires many page table entries, which leads to TLB misses and hurts performance. Huge pages have the opposite trade-offs. The kernel supports both mechanisms at the same time, but manages them separately. The parameters described in this section all concern normal pages, so hugetlb_total_pages is subtracted when calculating allowed.
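The arithmetic of vm_commit_limit is easy to try out in userspace. The following is a minimal model under assumptions: commit_limit_pages is a hypothetical function, and every value the kernel reads from its globals is passed in as a parameter (with huge pages counted in normal-page units).

```c
#include <assert.h>

/*
 * Userspace model of vm_commit_limit(). All inputs are explicit
 * parameters; page counts are in units of normal pages.
 */
unsigned long commit_limit_pages(unsigned long totalram_pages,
                                 unsigned long hugetlb_pages,
                                 unsigned long total_swap_pages,
                                 unsigned long overcommit_kbytes,
                                 unsigned long overcommit_ratio,
                                 unsigned int page_shift)
{
    unsigned long allowed;

    assert(page_shift >= 10); /* a page is at least 1 KB */
    assert(hugetlb_pages <= totalram_pages);

    if (overcommit_kbytes)
        /* a non-zero overcommit_kbytes takes precedence over the ratio */
        allowed = overcommit_kbytes >> (page_shift - 10);
    else
        allowed = (totalram_pages - hugetlb_pages) * overcommit_ratio / 100;

    /* swap space is added in both cases */
    return allowed + total_swap_pages;
}
```

For instance, with 1048576 usable 4 KB pages, no huge pages, a ratio of 50 and 8192 swap pages, the function returns 524288 + 8192 = 532480 pages. Setting overcommit_kbytes to 1048576 instead gives 262144 + 8192 = 270336 pages, and the ratio is ignored.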

3. admin_reserve_kbytes and user_reserve_kbytes

Always leave some leeway in whatever you do; don't push yourself into a desperate situation. Both of these parameters keep the memory management module from forcing itself into a corner.

Above, we mentioned the rob-Peter-to-pay-Paul mechanism, but in some cases it does not work well. For example, when process A accesses its own memory and a page fault occurs, the kernel scans for pages to reclaim, takes page frames away from other processes (B, C, D, ...) and gives them to process A so that A can keep running. Note that this reclaiming is not so simple and may require disk I/O (for example, flushing dirty page cache to disk). However, the system soon schedules process B, and B immediately needs the very pages that were just taken away. What then? B needs physical memory right away; if there is no free memory, the kernel can only start scanning again and look for new pages to reclaim. In the extreme case it may even take back the pages it has just given to A. At this point overall system performance drops significantly; sometimes, after the user clicks a button, it may take ages to respond.

In this situation the user will of course want to recover, for example by killing the process that consumes a lot of memory. That operation itself needs memory (a process has to be forked), so the system reserves user_reserve_kbytes of memory to let the user escape from the desperate situation.

On GNU/Linux systems that support multiple users, recovery may have to be done by root. This requires reserving a certain amount of memory so that root can log in, troubleshoot (using commands such as ps and top), find the runaway process and kill it. The amount of memory reserved for root's operations is defined by the admin_reserve_kbytes parameter.

IV. Code Analysis

When a userspace process requests memory (more precisely, virtual memory; userspace never deals with physical memory allocation directly, which is the kernel's domain), the kernel calls the __vm_enough_memory function to check whether the virtual memory can be allocated. The code is as follows:

int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
{
    ......
    if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)      /* (1) */
        return 0;

    if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
        free = global_page_state(NR_FREE_PAGES);
        free += global_page_state(NR_FILE_PAGES);
        free -= global_page_state(NR_SHMEM);

        free += get_nr_swap_pages();
        free += global_page_state(NR_SLAB_RECLAIMABLE);     /* (2) */

        if (free <= totalreserve_pages)                     /* (3) */
            goto error;
        else
            free -= totalreserve_pages;

        if (!cap_sys_admin)                                 /* (4) */
            free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);

        if (free > pages)                                   /* (5) */
            return 0;

        goto error;
    }

    allowed = vm_commit_limit();                            /* (6) */
    if (!cap_sys_admin)                                     /* see note (4) */
        allowed -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);

    if (mm) {                                               /* (7) */
        reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
        allowed -= min_t(long, mm->total_vm / 32, reserve);
    }

    if (percpu_counter_read_positive(&vm_committed_as) < allowed)   /* (8) */
        return 0;

    ......
}

(1) OVERCOMMIT_ALWAYS is total freedom: overcommit as much as you like. Returning 0 indicates that there are sufficient virtual memory resources.

(2) OVERCOMMIT_GUESS lets the kernel judge for itself according to the current situation, so it first enters an information-gathering phase. It looks at how many free page frames there are (counted by NR_FREE_PAGES, on the buddy system's free lists); these are prime resources that can be used without any overhead. NR_FILE_PAGES counts the page frames used by the page cache, which mostly result from userspace processes reading and writing files. This cache exists only to speed up the system, so once its contents are written back to disk, these pages are essentially free. There is one special case, NR_SHMEM, which is mainly used for inter-process shared memory; these shmem page frames cannot be considered free, so they are subtracted. The get_nr_swap_pages function returns the number of free "page frames" on the swap file or swap device. The disk space on a swap file or swap device essentially gives anonymous pages room to maneuver; the "page frames" here are not real page frames, so let's call them swap pages. The number of free swap pages is counted as well because page frames currently in use can be swapped out to them, although at a somewhat higher cost. As for NR_SLAB_RECLAIMABLE, it should also be counted as free: these slab objects have already been marked as reclaimable, so of course they count as free pages.

(3) Explaining totalreserve_pages would take too long, so we skip it here. In short, it is the number of page frames that must be kept back for the system to keep running, so it is subtracted from free. If the current number of free pages is less than or equal to totalreserve_pages, the VM request is of course rejected.

(4) For a normal process, admin_reserve_kbytes worth of free pages must also be kept back so that the root user can log in and perform recovery when a problem occurs.

(5) Now comes the crucial part: the number of pages in this request is compared with the current number of "free" pages (in quotes because it is not the number of truly free page frames). If, after enough page frames have been reserved, there are still enough pages to satisfy the request, the VM allocation is approved.

(6) From here on, OVERCOMMIT_NEVER is handled. vm_commit_limit provides the basic limit for judging overcommit, which is then adjusted according to the specific circumstances, for example by admin_reserve_kbytes.

(7) For a userspace process, we also want the user to be able to recover from a desperate situation, so some page frames are kept back. The number reserved depends on two factors: the total virtual memory of the process, and the user-defined runtime parameter user_reserve_kbytes. More detailed considerations can be found at https://lkml.org/lkml/2013/3/18/812 and are not repeated here.

(8) The allowed variable holds the upper limit for judging overcommit, and vm_committed_as records the amount of virtual memory that has been committed in the current system (including this request). If it reaches or exceeds the limit, this is judged to be overcommit and the request for virtual memory fails.
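Steps (1) to (8) can be replayed in userspace with made-up numbers. The following is only a sketch under assumptions, not the kernel's implementation: struct vm_snapshot and enough_memory are hypothetical names, all counters are supplied by the caller, and signed arithmetic is used throughout so that a reserve larger than the remaining pages simply goes negative (the kernel excerpt above works on its own counter types).

```c
#include <assert.h>
#include <stdbool.h>

enum { OVERCOMMIT_GUESS = 0, OVERCOMMIT_ALWAYS = 1, OVERCOMMIT_NEVER = 2 };

/* Snapshot of what the check looks at; page counts unless marked KB. */
struct vm_snapshot {
    int policy;                 /* one of the OVERCOMMIT_* values */
    long nr_free_pages;
    long nr_file_pages;
    long nr_shmem;
    long nr_swap_free;          /* what get_nr_swap_pages() would return */
    long nr_slab_reclaimable;
    long totalreserve_pages;
    long commit_limit;          /* what vm_commit_limit() would return */
    long committed_as;          /* already includes the current request */
    long admin_reserve_kbytes;  /* KB */
    long user_reserve_kbytes;   /* KB */
    unsigned int page_shift;    /* 12 for 4 KB pages */
};

/* Returns true where the kernel function would return 0 (request allowed). */
bool enough_memory(const struct vm_snapshot *s, long pages, bool has_mm,
                   long total_vm, bool cap_sys_admin)
{
    assert(s->page_shift >= 10);
    long admin_reserve = s->admin_reserve_kbytes >> (s->page_shift - 10);

    if (s->policy == OVERCOMMIT_ALWAYS)                  /* (1) */
        return true;

    if (s->policy == OVERCOMMIT_GUESS) {
        long free_pages = s->nr_free_pages + s->nr_file_pages   /* (2) */
                        - s->nr_shmem + s->nr_swap_free
                        + s->nr_slab_reclaimable;

        if (free_pages <= s->totalreserve_pages)         /* (3) */
            return false;
        free_pages -= s->totalreserve_pages;

        if (!cap_sys_admin)                              /* (4) */
            free_pages -= admin_reserve;

        return free_pages > pages;                       /* (5) */
    }

    long allowed = s->commit_limit;                      /* (6) */
    if (!cap_sys_admin)
        allowed -= admin_reserve;

    if (has_mm) {                                        /* (7) */
        long reserve = s->user_reserve_kbytes >> (s->page_shift - 10);
        long share = total_vm / 32;
        allowed -= share < reserve ? share : reserve;
    }

    return s->committed_as < allowed;                    /* (8) */
}
```

As a worked case for the OVERCOMMIT_NEVER branch: with a commit limit of 532480 pages, admin_reserve_kbytes of 400 (100 pages of 4 KB), user_reserve_kbytes of 131072 (32768 pages) and a process whose total_vm is 64000 pages, the user reserve is min(64000 / 32, 32768) = 2000 pages, so an unprivileged request succeeds only while committed_as is below 532480 - 100 - 2000 = 530380.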

V. References

1. Documentation/vm/overcommit-accounting

2. Documentation/sysctl/vm.txt
