Linux VM Runtime Parameters: OOM-related Parameters

I. Preface

This is the second article in the series on the runtime parameters of Linux virtual memory, and it covers the OOM-related parameters. To make those parameters easier to understand, chapter II briefly explains what OOM is; if the term is already familiar to you, you can go straight to chapter III, which describes the parameters themselves. Besides describing each parameter, we quote some kernel code; the code in this article comes from the 4.0 kernel, and if you are interested you can read the text alongside the code. To save space, the quoted code is trimmed. As usual, the last chapter lists the reference documents, all taken from the kernel's Documentation directory; that directory holds a large number of documents, and every one of them is worth savoring.

II. What is OOM

OOM is short for "out of memory". Although the Linux kernel has many memory-management tricks (reclaiming from caches, swapping out, and so on) to satisfy the virtual-memory needs of all kinds of user-space applications, when the system is sized like a small horse pulling a big cart, the kernel ends up running very slowly and, at some point, fails to allocate a page frame: it is out of memory. Dealing with this is first of all the system administrator's job, since the machine needs more memory; the kernel, however, cannot simply panic when it faces OOM, it has to react according to the OOM-related parameters.

III. The OOM parameters

1. panic_on_oom

When the kernel encounters an OOM, it has two options:

(1) Trigger a kernel panic (die for all to see).

(2) Face life bravely: pick one or a few of the most "suitable" processes, start the OOM killer, kill the chosen processes to free memory, and let the system live on.

The panic_on_oom parameter controls how the system reacts when it encounters an OOM. When the parameter equals 0, the system chooses to face life bravely and starts the OOM killer. When it equals 2, the kernel panics unconditionally, whatever the situation. Any other value means the reaction depends on the concrete situation: in some cases the kernel panics, in others it starts the OOM killer. In the kernel code, enum oom_constraint further describes the OOM state; a system can run into OOM under several kinds of constraint, defined in the kernel as follows:

enum oom_constraint {
        CONSTRAINT_NONE,
        CONSTRAINT_CPUSET,
        CONSTRAINT_MEMORY_POLICY,
        CONSTRAINT_MEMCG,
};

On a UMA system, oom_constraint is always CONSTRAINT_NONE, which says the OOM is not subject to any constraint: do not overthink it, the system simply does not have enough memory. On a NUMA system, however, additional constraints may push the system into an OOM state even though there is in fact plenty of memory elsewhere in the system. These constraints include:

(1) CONSTRAINT_CPUSET. cpusets is a kernel mechanism for assigning a set of CPUs and memory nodes to a specific group of processes. An OOM in this case only means that the memory nodes this process is allowed to allocate from are exhausted; the system has many memory nodes, and the other nodes may still have plenty of memory.

(2) CONSTRAINT_MEMORY_POLICY. The memory policy controls how allocations are distributed over the memory nodes of a NUMA system. Through the memory policy API, user-space (NUMA-aware) programs can set a policy for the whole system, for a specific process, or for a specific VMA of a specific process. An OOM may therefore be caused by a memory-policy constraint, and in that case panicking the whole system would look somewhat inappropriate.

(3) CONSTRAINT_MEMCG. memcg is the memory control group. cgroups are too complex to describe here; the memory subsystem of cgroups is the controller that distributes system memory, and, put simply, it confines the memory usage of a group of processes to a given range. When the group's usage exceeds its upper limit, the group hits OOM, and that OOM is of the CONSTRAINT_MEMCG type.

OK, with the basics covered, let us look at the kernel code. The sysctl_panic_on_oom variable in the kernel corresponds to /proc/sys/vm/panic_on_oom, and the main decision logic is as follows:

void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
                        int order, const nodemask_t *nodemask)
{
        if (likely(!sysctl_panic_on_oom))       /* 0 means start the OOM killer, so return directly */
                return;
        if (sysctl_panic_on_oom != 2) {         /* 2 forces a panic; any other value is negotiable */
                /*
                 * For an OOM constrained by cpuset, memory policy or memcg,
                 * we can consider not panicking and starting the OOM killer instead.
                 */
                if (constraint != CONSTRAINT_NONE)
                        return;
        }
        dump_header(NULL, gfp_mask, order, NULL, nodemask);
        panic("Out of memory: %s panic_on_oom is enabled\n",
              sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");  /* die for all to see */
}
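
As a small usage note (not from the original article): like other vm sysctls, panic_on_oom can be read and written through procfs, either with the sysctl command or directly from a program. The sketch below reads the current value and then writes a new one; the value 2 is only an example, and writing the file requires root.

/* Sketch: read and then set /proc/sys/vm/panic_on_oom (requires root). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/panic_on_oom", "r");
        int val;

        if (f && fscanf(f, "%d", &val) == 1)
                printf("panic_on_oom = %d\n", val);
        if (f)
                fclose(f);

        f = fopen("/proc/sys/vm/panic_on_oom", "w");
        if (!f) {
                perror("set panic_on_oom");
                return EXIT_FAILURE;
        }
        fprintf(f, "2\n");      /* 2 = always panic on OOM, 0 = always run the OOM killer */
        fclose(f);
        return 0;
}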

2. oom_kill_allocating_task

When the system decides to start the OOM killer and kill some processes, it faces a question: which process is the "right" one to kill? The system has the following options:

(1) Kill whoever triggered the OOM, i.e. the task that is currently trying to allocate memory.

(2) Kill whoever is the "worst", i.e. the task with the highest badness score.

The oom_kill_allocating_task parameter controls this choice: when the parameter equals 0 the kernel picks option (2), otherwise it picks option (1). The relevant code is in the __out_of_memory function:

static void __out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
                            int order, nodemask_t *nodemask, bool force_kill)
{
        ......
        check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);

        if (sysctl_oom_kill_allocating_task && current->mm &&
            !oom_unkillable_task(current, NULL, nodemask) &&
            current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
                get_task_struct(current);
                oom_kill_process(current, gfp_mask, order, 0, totalpages, NULL,
                                 nodemask, "Out of memory (oom_kill_allocating_task)");
                goto out;
        }
        ......
}

Of course, even here the kernel does not kill blindly: it still checks that current is a user-space process (kernel threads cannot be killed), that it is not an unkillable task (the init process, for example, cannot be killed), and that user space has not forbidden killing it through the oom_score_adj parameter. If everything checks out, oom_kill_process is called to kill the current (allocating) process.
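
To watch the two policies in action, one can deliberately drive a test machine into OOM with a memory hog and check in the kernel log which task gets killed, once with oom_kill_allocating_task at 0 and once at 1. The following is only a sketch of such a hog (not from the original article); never run it on a production system.

/* Sketch of a memory hog used to trigger an OOM on a test machine.
 * WARNING: this will push the machine into OOM on purpose. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (64UL * 1024 * 1024)      /* allocate and touch 64 MiB at a time */

int main(void)
{
        unsigned long total = 0;

        for (;;) {
                char *p = malloc(CHUNK);
                if (!p)
                        break;                  /* with overcommit, we are usually killed before this */
                memset(p, 0xab, CHUNK);         /* touch the pages so they are really allocated */
                total += CHUNK;
                printf("allocated %lu MiB\n", total >> 20);
        }
        return 0;
}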

3. oom_dump_tasks

When the system's memory hits an OOM state, whether the outcome is a panic or the OOM killer, the system administrator wants to leave some clues behind in order to find the root cause of the OOM. For that purpose the kernel can dump memory-related information for all user-space processes in the system, including: process identity, the total virtual memory used by the process, the physical memory the process actually uses (also called the RSS, resident set size, which covers not only the memory the program itself uses but also its share of shared libraries), the memory occupied by the process's page tables, and so on. Having this information helps in understanding the truth behind the phenomenon (the OOM).

When oom_dump_tasks is set to 0, the per-process memory information described above is not printed. On a large system with thousands of processes, printing the memory information of every task one by one can itself cause performance problems (remember, the system is already in OOM). When the parameter is set to a non-zero value, dump_tasks is called to print the memory state of every task in the system in the following three situations:

(1) When the OOM causes a kernel panic.

(2) When no suitable "bad" process can be found.

(3) When a suitable victim has been found and killed.
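
The dump produced by dump_tasks goes to the kernel log, so it can be read back after the event with dmesg or, programmatically, from the kernel ring buffer. A minimal sketch (not from the original article; it needs root or CAP_SYSLOG) that reads the whole ring buffer with klogctl so the task dump can be inspected:

/* Sketch: dump the kernel ring buffer, where the OOM killer's task dump ends up. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/klog.h>

int main(void)
{
        int len = klogctl(10, NULL, 0);         /* SYSLOG_ACTION_SIZE_BUFFER */
        char *buf;
        int n;

        if (len <= 0)
                return EXIT_FAILURE;
        buf = malloc(len);
        if (!buf)
                return EXIT_FAILURE;
        n = klogctl(3, buf, len);               /* SYSLOG_ACTION_READ_ALL */
        if (n > 0)
                fwrite(buf, 1, n, stdout);
        free(buf);
        return 0;
}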

4. oom_adj, oom_score_adj and oom_score

These parameters are tied to a specific process, so they live in the /proc/xxx/ directory (xxx is the process ID). Assuming we choose to kill a process when an OOM occurs, a natural question arises: which one? The kernel's algorithm is very simple: compute a score for each candidate (oom_score; note that this file is read-only) and pick the one with the highest score. How is the score calculated? See the oom_badness function in the kernel:

unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
                          const nodemask_t *nodemask, unsigned long totalpages)
{
        ......

        adj = (long)p->signal->oom_score_adj;
        if (adj == OOM_SCORE_ADJ_MIN) {                                  /* (1) */
                task_unlock(p);
                return 0;                                                /* (2) */
        }

        points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
                 atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);  /* (3) */
        task_unlock(p);

        if (has_capability_noaudit(p, CAP_SYS_ADMIN))                    /* (4) */
                points -= (points * 3) / 100;

        adj *= totalpages / 1000;                                        /* (5) */
        points += adj;

        return points > 0 ? points : 1;
}

(1) A task's score (oom_score) consists of two parts. One part is the system score, based mainly on the task's memory usage; the other part is the user score, i.e. oom_score_adj, and the task's final score takes both into account. If the user sets a task's oom_score_adj to OOM_SCORE_ADJ_MIN (-1000), the OOM killer is in effect forbidden to kill that process.

(2) Returning 0 here tells the OOM killer that this is a "good process", do not kill it. As we can see further down, when the score is actually computed the lowest possible value is 1.

(3) As mentioned earlier, the system score looks at physical memory consumption and is made of three parts: the RSS, the memory swapped out to a swap file or swap device, and the memory occupied by the page tables.

(4) A root process enjoys a 3% memory-usage privilege, so that share is subtracted from its score.

(5) The user adjusts the oom_score through oom_score_adj. Its range is -1000 to 1000: 0 means no adjustment, a negative value gives the actual score a discount, and a positive value punishes the task, i.e. increases its oom_score. The adjustment is scaled against the amount of memory that could be allocated at the time (if the allocation is unconstrained, that is all the allocatable memory in the system; if the system supports cpusets, it is the amount allowed by the cpuset). The totalpages argument passed to oom_badness is exactly this upper bound. The actual score (points) is then adjusted by oom_score_adj: for example, if oom_score_adj is set to -500, the score gets a 50% discount (with totalpages as the base), that is, half of the allocatable-memory upper bound is subtracted from the memory the task actually uses.
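
A quick worked example with invented numbers may help: assume an unconstrained allocation on a machine with 4 GiB of allocatable memory (totalpages = 1048576 pages of 4 KiB) and a task whose RSS, swap entries and page tables add up to 262144 pages (1 GiB).

/* Worked example of the badness arithmetic, with invented numbers. */
#include <stdio.h>

int main(void)
{
        long totalpages = 1048576;      /* assumed: 4 GiB of allocatable memory, 4 KiB pages */
        long points = 262144;           /* assumed: RSS + swap entries + page tables, in pages */
        long oom_score_adj = -500;      /* the user asks for a 50% discount */
        long adj = oom_score_adj * (totalpages / 1000);   /* -500 * 1048 = -524000 */

        points += adj;                  /* 262144 - 524000 = -261856 */
        printf("badness = %ld\n", points > 0 ? points : 1);   /* clamped to 1: practically never chosen */
        return 0;
}

If oom_score_adj were +500 instead, the same amount would be added rather than subtracted, and the task would become a very likely victim.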

With oom_score_adj and oom_score understood, the dust should have settled. oom_adj is an older interface parameter whose function is similar to oom_score_adj; it is kept for compatibility, and when you operate on it the kernel actually converts the value into an oom_score_adj. Interested readers can study it by themselves; it is not covered in detail here.
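
In day-to-day administration, the most common operation on these files is protecting a critical process from the OOM killer. Below is a minimal sketch (not from the original article; it needs appropriate privileges) that reads a process's current oom_score and then writes -1000 (OOM_SCORE_ADJ_MIN) to its oom_score_adj:

/* Sketch: read /proc/<pid>/oom_score and set oom_score_adj to -1000 for the given pid. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        char path[64];
        FILE *f;
        int pid, score;

        if (argc != 2)
                return EXIT_FAILURE;
        pid = atoi(argv[1]);

        snprintf(path, sizeof(path), "/proc/%d/oom_score", pid);
        f = fopen(path, "r");
        if (f && fscanf(f, "%d", &score) == 1)
                printf("pid %d oom_score = %d\n", pid, score);
        if (f)
                fclose(f);

        snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", pid);
        f = fopen(path, "w");
        if (!f) {
                perror("open oom_score_adj");
                return EXIT_FAILURE;
        }
        fprintf(f, "-1000\n");          /* OOM_SCORE_ADJ_MIN: never pick this task */
        fclose(f);
        return 0;
}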

IV. Reference documents

1. Documentation/vm/numa_memory_policy.txt

2. Documentation/sysctl/vm.txt

3. Documentation/cgroups/cpusets.txt

4. Documentation/cgroups/memory.txt

5. Documentation/filesystems/proc.txt
