Detailed analysis of the OOM mechanism of the Linux kernel

Source: Internet
Author: User
Tags: system log

http://blog.chinaunix.net/uid-29242873-id-3942763.html

The Linux kernel has a mechanism called the OOM killer (out-of-memory killer). It watches for processes that consume too much memory, especially those that grab a large amount very quickly, and kills a process to keep the system from running out of memory entirely. A typical case: one day a machine suddenly cannot be reached over ssh or telnet but still answers ping, so the network is not at fault; the sshd process has been killed by the OOM killer (this kind of apparent hang comes up often). After restarting the machine, the system log /var/log/messages will contain an error message similar to "Out of memory: Kill process 1865 (sshd)".
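A quick way to confirm that the OOM killer was involved is to search the kernel log for its kill messages. The following is a minimal sketch, not part of the original article: oom_kills is a hypothetical helper name, and the pattern assumes the message contains "Out of memory" followed by "process PID (name)", as in the sample above. The exact wording differs between kernel versions, so the pattern may need adjusting.

```shell
# oom_kills: read kernel log lines on stdin and print "PID NAME" for every
# OOM-killer kill message found. Hypothetical helper; the message wording is
# an assumption based on the sample line above and varies by kernel version.
oom_kills() {
    grep -i 'out of memory' |
        sed -n 's/.*[Pp]rocess \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

It could be used as `oom_kills < /var/log/messages`, or as `dmesg | oom_kills` on a machine that has not been rebooted yet.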

To prevent important system processes from being killed when the OOM mechanism is triggered, set /proc/PID/oom_adj to -17, which disables OOM killing for that process (the setting lasts only for the life of the process). The kernel uses a specific algorithm to compute a score for each process and decides which one to kill based on it; each process's score can be read from /proc/PID/oom_score. In our operations work we generally protect sshd and some management agents.
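To see which processes the kernel currently considers the best candidates, the scores in /proc/PID/oom_score can be listed and sorted. This is a minimal sketch under a few assumptions: oom_top is a hypothetical helper name, the process name is read from /proc/PID/comm (which only exists on kernels from roughly 2.6.33 onward), and the optional second argument is there purely so the function can be pointed at a test directory instead of /proc.

```shell
# oom_top [N] [PROC_ROOT]: print the N processes with the highest oom_score
# as "SCORE PID NAME". Hypothetical helper; PROC_ROOT defaults to /proc and
# exists only so the function can be exercised against a test directory.
oom_top() {
    n=${1:-10}
    root=${2:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process may exit between the glob and the read; skip it then.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name='?'
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -k1,1nr | head -n "$n"
}
```

Running `oom_top 5` prints the five highest-scoring processes; a process with oom_adj set to -17 should show a score of 0.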

Protecting a process from being killed by the kernel can be done like this:

echo -17 > /proc/$PID/oom_adj
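On kernels from 2.6.36 onward, the same protection is also exposed through /proc/PID/oom_score_adj, which ranges from -1000 to 1000, with -1000 disabling OOM killing for the process; oom_adj is kept there for compatibility. The sketch below is not from the original article. It assumes a hypothetical helper named oom_protect that prefers the newer file and falls back to oom_adj, with an optional second argument (the proc root) that exists only for testing.

```shell
# oom_protect PID [PROC_ROOT]: exempt one process from the OOM killer.
# Prefers oom_score_adj (-1000 disables killing) and falls back to the older
# oom_adj (-17 disables killing). Hypothetical helper; PROC_ROOT defaults to
# /proc and exists only so the function can be tested without root.
oom_protect() {
    pid=$1
    root=${2:-/proc}
    if [ -w "$root/$pid/oom_score_adj" ]; then
        echo -1000 > "$root/$pid/oom_score_adj"
    elif [ -w "$root/$pid/oom_adj" ]; then
        echo -17 > "$root/$pid/oom_adj"
    else
        echo "oom_protect: cannot adjust PID $pid (no such process, or not root?)" >&2
        return 1
    fi
}
```

Lowering the value for another process requires root, so in practice this would be called as root, for example `oom_protect "$PID"`.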

To prevent sshd from being killed, you can do this:

pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

It is safer to run this as a scheduled task, so that the setting is reapplied if sshd is restarted:

# /etc/cron.d/oom_disable

*/1 * * * * root pgrep -f "/usr/sbin/sshd" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

So that the setting is also applied after a reboot, you can add it to /etc/rc.d/rc.local:

for PID in $(pidof sshd); do echo -17 > /proc/$PID/oom_adj; done

As for why -17 is used rather than some other value (the default is 0): this is defined by the Linux kernel, as the kernel source shows. Taking the linux-3.3.6 kernel source as an example (the path is given as linux-3.6.6/include/linux/oom.h), oom_adj can take values from -16 to 15, much like a nice value, where 15 is the maximum and -16 the minimum, and -17 disables OOM killing for the process. The oom_score is scaled by 2 to the power n, where n is the process's oom_adj value, so the higher the oom_score, the more likely the kernel is to kill the process.
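The power-of-2 rule can be pictured as a bit shift: a positive oom_adj shifts the base score left, a negative one shifts it right. The sketch below is only an illustration under stated assumptions. scale_badness is a hypothetical name, the shift mirrors how older 2.6-era kernels applied oom_adj, and newer kernels derive the score differently, so the numbers will not match /proc/PID/oom_score on a modern system.

```shell
# scale_badness POINTS OOM_ADJ: apply the 2^n scaling described above.
# Hypothetical helper that mirrors the shift used by older 2.6 kernels;
# -17 (OOM_DISABLE) is treated as "never kill" and yields 0 here.
scale_badness() {
    points=$1
    adj=$2
    if [ "$adj" -eq -17 ]; then
        echo 0
    elif [ "$adj" -gt 0 ]; then
        # A zero score is bumped to 1 so that the shift has an effect.
        if [ "$points" -eq 0 ]; then
            points=1
        fi
        echo $(( points << adj ))
    elif [ "$adj" -lt 0 ]; then
        echo $(( points >> (0 - adj) ))
    else
        echo "$points"
    fi
}

scale_badness 100 3     # 100 * 2^3
scale_badness 100 -2    # 100 / 2^2
```

Each step of oom_adj therefore doubles or halves the score, which is why 15 makes a process by far the most attractive target.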


Of course, you can also disable the OOM mechanism by modifying a kernel parameter:

# sysctl -w vm.panic_on_oom=1
vm.panic_on_oom = 1    // 1 turns OOM killing off (the kernel panics on OOM instead); the default, 0, leaves the OOM killer on

# sysctl -p

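The values of vm.panic_on_oom can be summarized in a small helper. This is a sketch, not part of the original article: describe_panic_on_oom is a hypothetical name, and the meanings are paraphrased from the kernel's sysctl documentation as I understand it.

```shell
# describe_panic_on_oom VALUE: explain a vm.panic_on_oom setting.
# Hypothetical helper; meanings paraphrased from the kernel sysctl docs.
describe_panic_on_oom() {
    case $1 in
        0) echo "0: OOM killer enabled, the kernel kills a process (default)" ;;
        1) echo "1: kernel panics on OOM, unless the OOM is confined to a cpuset or memory policy" ;;
        2) echo "2: kernel always panics on OOM" ;;
        *) echo "unknown value: $1" >&2; return 1 ;;
    esac
}
```

To describe the running system's setting, call it as `describe_panic_on_oom "$(cat /proc/sys/vm/panic_on_oom)"`.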
To verify the effect of the OOM mechanism, we might as well run a test.

First, look at the system's memory size: a little over 96 GB (the physical memory installed is somewhat more than the value shown).

Next, look at which processes are currently the largest. According to top, only two Java processes are running, at 4.6 GB each; beyond that, the Redis process uses 21 MB, the iSCSI service 32 MB, and GDM 25 MB, and the remaining processes use only a few MB each.

Now I write a C program called bigmem and have it allocate 85 GB of memory:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ (1 << 12)

int main(void) {
    unsigned long i;
    int gb = 85; /* amount of memory to allocate, in gigabytes */
    unsigned long pages = ((unsigned long)gb << 30) / PAGE_SZ;

    for (i = 0; i < pages; ++i) {
        void *m = malloc(PAGE_SZ);
        if (!m)
            break;
        memset(m, 0, 1); /* touch the page so it is really backed by RAM */
    }
    printf("Allocated %lu MB\n", (i * PAGE_SZ) >> 20);
    getchar(); /* keep the process alive until Enter is pressed */
    return 0;
}

The effect is obvious: after running it, top shows bigmem in first place. RES is the physical memory in use, and it has consumed the full 85 GB.
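The arithmetic behind the bigmem loop bound can be checked directly in the shell. The sketch below uses two hypothetical helper names, pages_for_gb and mb_for_pages, and assumes a shell with 64-bit arithmetic (which bash and dash provide on 64-bit Linux) and the 4 KiB page size that PAGE_SZ encodes.

```shell
# pages_for_gb GB: number of 4 KiB pages (PAGE_SZ = 1<<12) in GB gigabytes,
# i.e. the loop bound used by bigmem. Hypothetical helper name.
pages_for_gb() {
    echo $(( ($1 << 30) / (1 << 12) ))
}

# mb_for_pages PAGES: size in MB that bigmem would report for PAGES pages.
mb_for_pages() {
    echo $(( ($1 * (1 << 12)) >> 20 ))
}

pages=$(pages_for_gb 85)
echo "pages: $pages, reported MB: $(mb_for_pages "$pages")"
```

For 85 GB this gives 22282240 page-sized allocations and a reported size of 87040 MB, assuming every malloc call succeeds.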

Keep watching: once bigmem has held steady at 85 GB for a while, the kernel automatically kills the process; it was not killed while it was still growing. If you do not want it to be killed, run:

pgrep -f "bigmem" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

Comparing the behavior before and after running the command above makes the practical effect of the kernel's OOM mechanism clear.

If writing C code feels like too much trouble, there is a simpler way to trigger the OOM killer: set a process's oom_adj to 15 (the maximum), which makes that process the most likely to be selected, and then run the following command:

echo f > /proc/sysrq-trigger    # 'f' calls oom_kill to kill a memory-hog process

Below I trigger the OOM killer against mysqld.

Note that in this test the process was not actually killed; the run only simulated an OOM event, as the process list shows:

ps -ef | grep mysqld | grep -v grep

The MySQL process is still there.


Notes:

1. In kernels before 2.6.26 the OOM killer's selection algorithm was not accurate enough; the 2.6.32 kernel shipped with RHEL 6.x addresses this problem.

2. A child process inherits the oom_adj of its parent process.

3. The OOM killer is not a suitable way to deal with memory leaks.

4. Sometimes free shows plenty of available memory, yet an OOM is still triggered, because the process may be occupying a special memory address range.
