OOM killer mechanism in Linux


Linux has an out-of-memory (OOM) killer: when the system runs out of memory, it steps in and kills selected processes to free some up. Most people who write code on Linux servers have run into it at some point. A typical scenario: one day the machine still answers ping, but you can no longer connect over ssh, because the sshd process has been killed by the OOM killer. After rebooting the machine and checking the system logs, you find the grim lines "Out of memory: Killed process xxx" repeated over and over. A bloody mess.

I was burned by the OOM killer myself a while ago. The setup was a simple distributed system: symmetric nodes A, B, and C consume the data produced by node X and insert the results into node Y. Under heavy load, A, B, and C could not keep up, data piled up in memory, memory was exhausted, and the OOM killer stepped in to clean up: sshd was killed. Worse, after rebooting I found that syslogd had been killed as well, and earlier than sshd, so only part of the logs survived. Painful. It also exposes a design problem in such a distributed system: with the design above, if one of A, B, or C suddenly dies, the load on the remaining nodes spikes and they go down too, one after another (a cascading failure).

Back to the topic: how does the OOM killer in Linux actually work? When is it triggered, and how does it choose which processes to kill?

When is it triggered?

Let's look at the first question: when does the OOM killer strike? Is it when malloc returns NULL? No. The malloc manpage has this to say:

By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available. This is a really bad bug. In
case it turns out that the system is out of memory, one or more processes
will be killed by the infamous OOM killer. In case Linux is
employed under circumstances where it would be less desirable to suddenly
lose some randomly picked processes, and moreover the kernel version
is sufficiently recent, one can switch off this overcommitting
behavior using a command like:

# echo 2 > /proc/sys/vm/overcommit_memory

This passage tells us that a non-NULL pointer returned by malloc on Linux does not necessarily mean the memory behind it is actually available. On Linux a program can request more memory than the system currently has free; this feature is called overcommit. It is an optimization: not every program uses the memory it requests right away, and by the time it does, the system may have reclaimed resources elsewhere. Unfortunately, if you eventually touch your overcommitted memory and the system has nothing left to back it with, the OOM killer jumps out.
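A minimal sketch of this behavior, assuming a 64-bit machine with less than 64 GiB of RAM plus swap (the size is an arbitrary choice for illustration; under policy 2 the malloc call itself would fail instead):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Ask for far more memory than the machine is likely to have
         * available (64 GiB here). Under the default overcommit policy
         * malloc() usually still returns a non-NULL pointer. */
        size_t size = (size_t)64 << 30;
        char *p = malloc(size);
        if (p == NULL) {
            printf("malloc failed immediately\n");
            return 1;
        }
        printf("malloc of %zu bytes succeeded, but the pages are not backed yet\n", size);

        /* Only when the pages are actually touched does the kernel have to
         * find physical memory. If it cannot, the OOM killer steps in and
         * kills this process (or another one) -- malloc never returned NULL. */
        memset(p, 1, size);

        printf("touched all pages without being killed\n");
        free(p);
        return 0;
    }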

Linux supports three overcommit policies (see the kernel documentation Documentation/vm/overcommit-accounting), configured via /proc/sys/vm/overcommit_memory. The possible values are 0, 1, and 2; the default is 0.

0: heuristic policy. Obviously excessive overcommits fail (for example, suddenly asking for terabytes of memory), while mild overcommits are allowed. In addition, root is allowed to overcommit slightly more than ordinary users.

1: always allow overcommit. This policy suits applications that cannot tolerate memory allocation failures, such as certain scientific computing programs.

2: never overcommit. In this mode the system will not hand out more memory than swap + RAM * ratio (the ratio comes from /proc/sys/vm/overcommit_ratio, 50% by default, and can be adjusted). Once that budget is used up, any further attempt to allocate memory returns an error, which usually also means no new program can be started.
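As a quick way to check the current configuration, here is a small sketch that simply reads the two /proc files mentioned above (nothing beyond standard C and those paths is assumed):

    #include <stdio.h>

    /* Read a single integer from a /proc file; returns -1 on error. */
    static long read_proc_value(const char *path)
    {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;
        long value = -1;
        if (fscanf(f, "%ld", &value) != 1)
            value = -1;
        fclose(f);
        return value;
    }

    int main(void)
    {
        long mode  = read_proc_value("/proc/sys/vm/overcommit_memory");
        long ratio = read_proc_value("/proc/sys/vm/overcommit_ratio");

        printf("overcommit_memory = %ld (0 = heuristic, 1 = always, 2 = never)\n", mode);
        printf("overcommit_ratio  = %ld%% (only used when overcommit_memory is 2)\n", ratio);
        return 0;
    }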

Supplement (to be verified): in the article "Memory Overcommit in Linux", the author mentions that the heuristic policy only takes effect when the SMACK or SELinux module is enabled; otherwise it behaves like "always allow".

How does it choose which process to kill?

So, as long as overcommit is enabled, the OOM killer may jump out at some point. What is its selection policy once it does? What we hope for, of course, is that the useless, memory-hogging process is the one that gets shot.

In Linux, this selection policy has kept evolving. As users, we can set a few values to influence the OOM killer's decision. Each process has an OOM weight in /proc/<pid>/oom_adj, ranging from -17 to +15; the higher the value, the more likely the process is to be killed (-17 exempts the process from OOM killing entirely).
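As a rough sketch of how a daemon might protect itself, the process below writes -17 to its own oom_adj file. Lowering the value typically requires root or CAP_SYS_RESOURCE, and newer kernels prefer /proc/<pid>/oom_score_adj (range -1000 to +1000) over the older oom_adj interface:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Lower this process's own OOM weight by writing to
         * /proc/self/oom_adj. -17 traditionally disables OOM killing
         * for the process; on newer kernels use oom_score_adj instead. */
        FILE *f = fopen("/proc/self/oom_adj", "w");
        if (f == NULL) {
            perror("fopen /proc/self/oom_adj");
            return 1;
        }
        if (fprintf(f, "-17\n") < 0 || fclose(f) != 0) {
            perror("write oom_adj");
            return 1;
        }

        printf("pid %d: oom_adj set to -17 (requires appropriate privileges)\n",
               (int)getpid());
        return 0;
    }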

The OOM killer ultimately uses /proc/<pid>/oom_score to decide which process to kill. This score factors in memory consumption, CPU time (utime + stime), and how long the process has been running (uptime minus start time): the more memory a process consumes, the higher its score; the longer it has been alive, the lower its score. In short, the general strategy is: lose the least amount of work, free the largest amount of memory, avoid hurting innocent processes that merely happen to use a lot of memory, and kill as few processes as possible.
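For illustration, a small sketch that lists the current oom_score of every process by scanning /proc (only standard POSIX calls and the /proc file described above are assumed); the process with the highest score is the most likely next victim:

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *proc = opendir("/proc");
        if (proc == NULL) {
            perror("opendir /proc");
            return 1;
        }

        struct dirent *ent;
        while ((ent = readdir(proc)) != NULL) {
            if (!isdigit((unsigned char)ent->d_name[0]))
                continue;                     /* skip non-PID entries */

            char path[64];
            snprintf(path, sizeof(path), "/proc/%s/oom_score", ent->d_name);

            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;                     /* process may have exited */

            long score;
            if (fscanf(f, "%ld", &score) == 1)
                printf("pid %s\toom_score %ld\n", ent->d_name, score);
            fclose(f);
        }

        closedir(proc);
        return 0;
    }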

In addition, Linux also charges half of the memory consumed by each child process to its parent, so a process that forks many children needs to be careful.

There are, of course, more details to the selection policy; see the articles "Taming the OOM killer" and "When Linux Runs Out of Memory".
