This section describes how the OOM killer works and how it is structured.
The out-of-memory (OOM) killer is Linux's last resort for securing memory. When both physical memory and swap are exhausted, it sends a signal to a process and forcibly terminates it.
This prevents the system from stalling in an endless loop of memory reclaim that can no longer free anything, and it also catches processes that are consuming excessive amounts of memory. This section describes the OOM killer of the 2.6 kernel.
Confirming the Run and the Log Output
During system verification or load testing, a running process is sometimes terminated, or an SSH connection suddenly drops and new login attempts fail to connect.
In such cases, check the logs: kernel messages like the following may have been output.
Pid: 4629, comm: stress Not tainted 2.6.26 #3
Call Trace:
 [<ffffffff80265a2c>] oom_kill_process+0x57/0x1dc
 [<ffffffff80238855>] __capable+0x9/0x1c
 [<ffffffff80265d39>] badness+0x16a/0x1a9
 [<ffffffff80265f59>] out_of_memory+0x1e1/0x24b
 [<ffffffff80268967>] __alloc_pages_internal+0x320/0x3c2
 [<ffffffff802726cb>] handle_mm_fault+0x225/0x708
 [<ffffffff8047514b>] do_page_fault+0x3b4/0x76f
 [<ffffffff80473259>] error_exit+0x0/0x51
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
...
Active:250206 inactive:251609 dirty:0 writeback:0 unstable:0
 free:3397 slab:2889 mapped:1 pagetables:2544 bounce:0
Node 0 DMA free:8024kB min:20kB low:24kB high:28kB active:8kB inactive:180kB present:7448kB pages_scanned:308 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
...
Node 0 DMA: 6*4kB 4*8kB 2*16kB 2*32kB 5*64kB 1*128kB 3*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8024kB
Node 0 DMA32: 1*4kB 13*8kB 1*16kB 6*32kB 2*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5564kB
29 total pagecache pages
Swap cache: add 1630129, delete 1630129, find 2279/2761
Free swap  = 0kB
Total swap = 2048248kB
Out of memory: kill process 2875 (sshd) score 94830592 or a child
Killed process 3082 (sshd)
The last lines report "Out of memory", which shows that the OOM killer ran. The reason SSH reconnection fails is that sshd was terminated by the OOM killer; you cannot log in again until sshd is restarted.
The OOM killer terminates processes in order to secure free memory. Next we explain how the victim process is selected.
Process Selection Method
When the OOM killer runs because memory is exhausted, it examines all processes, calculates a score for each one, and sends a signal to the process with the highest score.
Score Calculation Method
The OOM killer takes many factors into account. For each process it evaluates the following items 1 to 9 and then produces the score (a rough sketch of the whole calculation follows the list).
1. First, a score is computed from the virtual memory size of the process, which can be checked with the VSZ column of ps or VmSize in /proc/<pid>/status. The more virtual memory a process consumes, the higher its initial score; with 1 KB counting as one point, a process consuming 1 GB of virtual memory starts with a score of roughly 1,000,000.
2. If the process is in the middle of a swapoff system call, the score is set to the maximum value (the maximum of unsigned long). Turning swap off works directly against relieving the memory shortage, so such a process immediately becomes the OOM killer's target.
3. If the process is a parent, half of the virtual memory size of each of its child processes is added to its score.
4. The score is then adjusted according to the CPU time and run time of the process. A process that has run longer, or has done more work, is considered more important, and its score is lowered.
Concretely, the score is divided by the square root of the CPU time, measured in units of 10 seconds: a process that has used 90 seconds of CPU time counts as 9 units, so its score is divided by the square root of 9, which is 3. The score is then further divided by the fourth root (the square root of the square root) of the run time, measured in units of 1000 seconds: a process that has been running for 16,000 seconds counts as 16 units, and since the square root of 16 is 4 and the square root of 4 is 2, its score is divided by 2. The longer a process has been running, the more important it is considered.
TIPS: although the comments in the kernel source speak of 10-second and 1000-second units, the calculation is done with bit shifts, so the actual units are 8 seconds and 1024 seconds.
5. If the process has had its priority lowered with the nice command (a nice value of 1 to 19), its score is doubled.
6. Privileged processes are generally important, so their score is reduced to 1/4.
7. If the CAP_SYS_RAWIO capability has been granted, for example with capset, the score is also reduced to 1/4; a process that operates directly on hardware is regarded as important.
8. Under cgroups/cpusets, if the memory nodes the process is allowed to use do not overlap at all with those allowed to the process that triggered the OOM killer, the score is reduced to 1/8.
9. Finally, the score is adjusted by the oom_adj value in the proc file system.
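To make the flow of the calculation easier to follow, the following is a rough sketch in C of the scoring logic described in items 1 to 9. It is only an illustration under the assumptions above: the proc_info structure, the isqrt() helper, and the field names are invented here, and the real kernel badness() function differs in detail.

#include <limits.h>
#include <math.h>

/* Hypothetical, simplified process descriptor -- not a kernel structure. */
struct proc_info {
    unsigned long vm_size_kb;      /* virtual memory size (VSZ) in KB          */
    unsigned long cpu_time_sec;    /* consumed CPU time in seconds             */
    unsigned long run_time_sec;    /* elapsed time since start in seconds      */
    int nice;                      /* nice value                               */
    int in_swapoff;                /* currently executing swapoff?             */
    int is_privileged;             /* privileged (root) process?               */
    int has_cap_sys_rawio;         /* CAP_SYS_RAWIO granted?                   */
    int shares_mem_nodes;          /* allowed memory nodes overlap those of    */
                                   /* the process that triggered the OOM kill? */
    int oom_adj;                   /* /proc/<pid>/oom_adj value                */
    const struct proc_info *children;  /* first child                          */
    const struct proc_info *sibling;   /* next sibling                         */
};

static unsigned long isqrt(unsigned long x) { return (unsigned long)sqrt((double)x); }

unsigned long oom_score(const struct proc_info *p)
{
    /* 2. A process in the middle of swapoff is chosen immediately. */
    if (p->in_swapoff)
        return ULONG_MAX;

    /* 1. Start from the virtual memory size (1 KB = 1 point). */
    unsigned long points = p->vm_size_kb;

    /* 3. Add half the virtual memory of each child. */
    for (const struct proc_info *c = p->children; c; c = c->sibling)
        points += c->vm_size_kb / 2;

    /* 4. Divide by the square root of the CPU time and by the fourth root
     *    of the run time; the units are really 8 s and 1024 s (bit shifts). */
    unsigned long cpu_units = p->cpu_time_sec / 8;
    unsigned long run_units = p->run_time_sec / 1024;
    if (cpu_units)
        points /= isqrt(cpu_units);
    if (run_units)
        points /= isqrt(isqrt(run_units));

    /* 5. Niced-down (lower-priority) processes score double. */
    if (p->nice > 0)
        points *= 2;

    /* 6. Privileged processes and 7. CAP_SYS_RAWIO holders score 1/4 each. */
    if (p->is_privileged)
        points /= 4;
    if (p->has_cap_sys_rawio)
        points /= 4;

    /* 8. Completely disjoint allowed memory nodes: 1/8. */
    if (!p->shares_mem_nodes)
        points /= 8;

    /* 9. Finally apply oom_adj as a power of two (described in the next
     *    section; -17 disables OOM killing for the process altogether). */
    if (p->oom_adj > 0)
        points <<= p->oom_adj;
    else if (p->oom_adj < 0)
        points >>= -p->oom_adj;

    return points;
}

As the sketch suggests, a memory-hungry, freshly started, niced-down process ends up with the highest score, while long-running privileged daemons are pushed toward the bottom.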
Following these rules, every process is scored and SIGKILL is sent to the one with the highest score (up to Linux 2.6.10, SIGTERM was sent to processes with CAP_SYS_RAWIO set, and SIGKILL otherwise).
The score of each process can be checked in /proc/<pid>/oom_score.
However, the init process (PID 1) is never an OOM killer target. If the selected process has child processes, a signal is sent to one of them first.
After signalling the target process, the kernel goes through all threads in the system; processes that share the target's memory space, even if they belong to a different thread group (TGID), are sent the signal as well.
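The selection-to-signal flow just described can be sketched as follows. Again this is only an illustration: the task structure, the all_tasks list, and send_signal() are hypothetical stand-ins for the kernel's internals, not real interfaces.

#include <signal.h>
#include <stdio.h>

/* Hypothetical task descriptor for illustration -- not a kernel structure. */
struct task {
    int pid, tgid;
    const void *mm;           /* memory descriptor, shared by threads */
    struct task *children;    /* first child                          */
    struct task *next;        /* next entry in the global task list   */
};

/* Stand-in for the kernel's signal delivery; here it just logs the action. */
static void send_signal(const struct task *t, int sig)
{
    printf("sending signal %d to pid %d\n", sig, t->pid);
}

void oom_kill(struct task *victim, struct task *all_tasks)
{
    /* init (PID 1) is never a target. */
    if (victim->pid == 1)
        return;

    /* If the victim has children, one of them is signalled first. */
    if (victim->children)
        victim = victim->children;

    /* SIGKILL in current kernels (older ones sent SIGTERM to
     * processes holding CAP_SYS_RAWIO). */
    send_signal(victim, SIGKILL);

    /* Walk all tasks: those sharing the victim's memory space, even in a
     * different thread group, are signalled as well. */
    for (struct task *t = all_tasks; t; t = t->next)
        if (t->tgid != victim->tgid && t->mm == victim->mm)
            send_signal(t, SIGKILL);
}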
The OOM Killer and the proc File System
The proc file system entries related to the OOM killer are described below.
/proc/<pid>/oom_adj
Writing to /proc/<pid>/oom_adj adjusts the score. The value range is -16 to 15: positive values make the process more likely to be chosen by the OOM killer, negative values less likely. For example, specifying 3 multiplies the score by 2^3 (8 times), and specifying -5 divides it by 2^5 (to 1/32).
-17 is a special value: a process with oom_adj set to -17 never receives the OOM killer's signal (setting -17 is supported since Linux 2.6.12).
When the OOM killer may run, sshd, which is needed for remote login, can be excluded from its targets with the following commands.
# cat /proc/`cat /var/run/sshd.pid`/oom_score
15
# echo -17 > /proc/`cat /var/run/sshd.pid`/oom_adj
# tail /proc/`cat /var/run/sshd.pid`/oom_*
==> /proc/2278/oom_adj <==
-17
==> /proc/2278/oom_score <==
0    /* the score is now 0 */
From Linux 2.6.18, /proc/<pid>/oom_adj is described in Documentation/filesystems/proc.txt.
/proc/sys/vm/panic_on_oom
If /proc/sys/vm/panic_on_oom is set to 1, no process is signalled when the OOM killer would run; instead, the kernel panics.
# echo 1 > /proc/sys/vm/panic_on_oom
/proc/sys/vm/oom_kill_allocating_task
From Linux 2.6.24 the proc file system provides oom_kill_allocating_task. If it is set to a non-zero value, the signal is sent to the process that triggered the OOM killer, and the score calculation over all processes is skipped.
# echo 1 > /proc/sys/vm/oom_kill_allocating_task
This avoids walking through every process, but the signal is sent without any regard for process priority or root privileges.
/proc/sys/vm/oom_dump_tasks
From Linux 2.6.25, setting oom_dump_tasks to a non-zero value adds a list of processes to the OOM killer's output.
A configuration example:
# echo 1 > /proc/sys/vm/oom_dump_tasks
The list looks like the following and can be checked with dmesg or in syslog.
[ pid ]   uid  tgid  total_vm     rss  cpu  oom_adj  name
[    1]     0     1      2580       1    0        0  init
[  500]     0   500      3231       0    1      -17  udevd
[ 2736]     0  2736      1470       1    0        0  syslogd
[ 2741]     0  2741       944       0    0        0  klogd
[ 2765]    81  2765      5307       0    0        0  dbus-daemon
[ 2861]     0  2861       944       0    0        0  acpid
...
[ 3320]     0  3320    525842  241215    1        0  stress
/proc/<pid>/oom_score_adj
/proc/<pid>/oom_score_adj was introduced in Linux 2.6.36 and is intended to replace /proc/<pid>/oom_adj (see Documentation/feature-removal-schedule.txt). For the time being both exist: when /proc/<pid>/oom_adj is set, the kernel also stores a converted value in /proc/<pid>/oom_score_adj.
/proc/<pid>/oom_score_adj accepts values from -1000 to 1000. Setting it to -1000 excludes the process from forced termination by the OOM killer.
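The conversion between the two files appears to be a simple linear scaling of the -17 to 15 range onto -1000 to 1000. The small program below illustrates that assumption; the helper name oom_adj_to_score_adj is made up for this example and the kernel's exact rounding may differ.

#include <stdio.h>

/* Assumed linear mapping of the old oom_adj range (-17..15) onto the new
 * oom_score_adj range (-1000..1000); illustration only. */
static int oom_adj_to_score_adj(int oom_adj)
{
    return oom_adj * 1000 / 17;
}

int main(void)
{
    const int samples[] = { -17, -5, 0, 3, 15 };
    for (int i = 0; i < 5; i++)
        printf("oom_adj %3d  ->  oom_score_adj %5d\n",
               samples[i], oom_adj_to_score_adj(samples[i]));
    return 0;
}

Under this assumption an oom_adj of -17 maps to an oom_score_adj of -1000, which matches the special "disable" values of the two interfaces.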
In kernels 2.6.36 and later, the following message is output (only once) when /proc/<pid>/oom_adj is used.
# dmesg
...
udevd (60): /proc/60/oom_adj is deprecated, please use /proc/60/oom_score_adj instead.
...
RHEL5 Features
The OOM killer in RHEL5 runs more cautiously than in the upstream kernel. It counts how many times it has been invoked and only actually runs when it has been invoked more than a certain number of times within a certain period. The rules are as follows (a sketch of the counting logic follows the list).
1. If more than 5 seconds have passed since the previous invocation, the count is reset. This avoids killing processes because of a momentary spike in memory load.
2. Invocations within 1 second of the count being reset to 0 are not counted.
3. The OOM killer does not run while the count is below 10; only at the 10th invocation is memory considered genuinely short.
4. If less than 5 seconds have passed since the OOM killer last ran, it does not run again, so it runs at most once every 5 seconds. This prevents several processes from being killed in a row unnecessarily, and gives a process that has already received the OOM killer's signal time to terminate and release its memory.
5. Once the OOM killer runs, the count returns to 0.
In other words, the OOM killer actually runs only after it has been invoked at least 10 times in a row, with no more than 5 seconds between consecutive invocations.
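The interplay of these rules is easier to see in code. The following is a minimal sketch in C of the counting logic as described above, with invented variable and function names; it is not the actual RHEL5 source.

#include <stdbool.h>
#include <time.h>

/* Invented state variables and function name; times are in seconds. */
static time_t last_call;    /* time of the previous invocation        */
static time_t count_start;  /* time the current count was (re)started */
static time_t last_kill;    /* time the OOM killer last actually ran  */
static int    call_count;

bool oom_killer_may_run(time_t now)
{
    /* 1. More than 5 s since the previous invocation: start counting over. */
    if (now - last_call > 5) {
        call_count  = 0;
        count_start = now;
    }
    last_call = now;

    /* 2. Invocations within 1 s of the count restarting are not counted. */
    if (now - count_start >= 1)
        call_count++;

    /* 3. Fewer than 10 counted invocations: treat it as a transient spike. */
    if (call_count < 10)
        return false;

    /* 4. Never run more often than once every 5 s, so an already
     *    signalled process has time to exit and free its memory. */
    if (now - last_kill < 5)
        return false;

    /* 5. Once the OOM killer runs, the count goes back to 0. */
    call_count = 0;
    last_kill  = now;
    return true;
}

With this scheme a single burst of failed allocations is ignored, and even under sustained memory pressure at most one process is killed every 5 seconds.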
These restrictions existed in the upstream kernel up to Linux 2.6.10, which is why RHEL4, based on Linux 2.6.9, has them as well. They have since been removed from the current upstream kernel.
Running on RHEL4
Let us watch the OOM killer run on RHEL4 (Linux 2.6.9). In the following example, on a machine with 2 GB of memory and 2 GB of swap, the load-testing tool stress is used to deliberately consume memory.
stress is a tool for generating memory, CPU, and disk I/O load; it can load any one of the three, or several at the same time. If stress receives a signal while running, it reports this and exits.
# wget -t0 -c http://weather.ou.edu/~apw/projects/stress/stress-1.0.0.tar.gz
# tar zxvf stress-1.0.0.tar.gz
# cd stress-1.0.0
# ./configure; make; make install
# stress --vm 2 --vm-bytes 2g --vm-keep    /* two workers each consume 2 GB of memory */
stress: info: [17327] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [17327] (416) <-- worker 17328 got signal 15    /* received SIGTERM */
stress: WARN: [17327] (418) now reaping child worker processes
stress: FAIL: [17327] (452) failed run completed in 70s
The console output is shown below.
oom-killer: gfp_mask=0xd0
Mem-info:
...
Swap cache: add 524452, delete 524200, find 60/102, race 0+0
Free swap:       0kB        /* no swap space left */
524224 pages of RAM         /* 1 page is 4 KB, so 2 GB of memory */
10227 reserved pages        /* memory reserved by the kernel */
19212 pages shared
253 pages swap cached
Out of memory: Killed process 17328 (stress).    /* the process killed by the signal */
The upstream kernel offers no way to disable the OOM killer, but in RHEL4 it can be disabled through /proc/sys/vm/oom-kill.
# echo 0 > /proc/sys/vm/oom-kill
or
# /sbin/sysctl -w vm.oom-kill=0
When disabled, the OOM killer sends no signal, but it still outputs the memory information shown above.
Running on RHEL5
Confirming the OOM killer's run on RHEL5 (Linux 2.6.18) works the same way as on RHEL4.
# stress --vm 2 --vm-bytes 2g --vm-keep
stress: info: [11779] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [11779] (416) <-- worker 11780 got signal 9    /* SIGKILL */
stress: WARN: [11779] (418) now reaping child worker processes
stress: FAIL: [11779] (452) failed run completed in 46s
The console output is shown below. A backtrace has been added to the OOM killer's output, which makes debugging easier.
Call Trace:
 [<ffffffff800bf551>] out_of_memory+0x8e/0x321
 [<ffffffff8000f08c>] __alloc_pages+0x22b/0x2b4
...
 [<ffffffff800087fd>] __handle_mm_fault+0x208/0xe04
 [<ffffffff80065a6a>] do_page_fault+0x4b8/0x81d
 [<ffffffff800894ad>] default_wake_function+0x0/0xe
 [<ffffffff80039dda>] tty_ldisc_deref+0x68/0x7b
 [<ffffffff8005cde9>] error_exit+0x0/0x84
Mem-info:
...
Swap cache: add 512503, delete 512504, find 90/129, race 0+0
Free swap  = 0kB
Total swap = 2048276kB
Free swap:       0kB
524224 pages of RAM
42102 reserved pages
78 pages shared
0 pages swap cached
Out of memory: Killed process 11780 (stress).
Running on RHEL6
The score calculation in RHEL6.0 is essentially the same as in RHEL5. However, the RHEL6 series does not apply the cautious behaviour described under "RHEL5 Features"; it runs basically the same way as the upstream kernel.
Summary
This section described how the OOM killer works and the settings related to it. When a system misbehaves, check syslog and the like: if the OOM killer's output is there, you know the system ran short of memory.