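One of our company's products (running on Linux) had recently been hanging for no apparent reason. At first I suspected the driver of a virtual device, but the hangs continued after I changed the code. I then wrote a script that logged various system information every hour. When the machine hung again, the log showed that available memory had dropped steadily from a little over 900M at the start to about 100M in the last record. Was the hang caused by memory exhaustion? I couldn't be sure yet, so I decided to write a small program to see how the kernel behaves when memory runs out.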
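#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024 * 10) /* allocate 10M per iteration */

int main(void)
{
    char *p = NULL;
    int count = 1;

    while (1) {
        p = malloc(CHUNK);
        if (!p) {
            printf("malloc error!\n");
            return -1;
        }
        /* touch every page so the memory is really used; it is leaked on purpose */
        memset(p, 0, CHUNK);
        printf("malloc %dM memory\n", 10 * count++);
        usleep(500000); /* wait 0.5 s between allocations */
    }
}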
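I ran this program on two versions of Linux and got completely different results:
Platform 1: Red Hat Linux release 8.0 (2.4.18-14), 1G of physical memory. When the program had malloc'ed 2890M it was killed by the system's OOM killer: Out of Memory: Killed process 6448 (loop_malloc)
Platform 2: Red Hat Enterprise Linux WS release 3 (2.4.21-50.EL), 1G of physical memory. When the program had malloc'ed 1460M the system died: ssh and http were unreachable, although it still answered ping.
The product that hangs uses the kernel version of platform 2, so it looks more and more likely that the hangs are caused by memory exhaustion. What I don't understand is why the OOM killer had no effect on platform 2: was it never invoked, or did it fail to find a suitable process to kill? Answering that probably means reading the kernel source, and given my ignorance of the kernel, I had to ask my colleague Luobo (蘿蔔) for help.
From another angle, losing this much available memory in just 2 days strongly suggests a memory leak. The original script collected only limited information, so I changed it to also record the VSZ and RSS of every process. This time I did find one process whose VSZ and RSS were huge and kept growing, and a run under valgrind confirmed a memory leak. That makes the picture fairly clear: a process leaks memory -> memory runs out -> the system hangs. Below I have organized the material I collected over these few days.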
【About /proc/meminfo】
MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
MemFree: The sum of LowFree+HighFree
Buffers: Relatively temporary storage for raw disk blocks; it shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached
SwapCached: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)
Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
Inactive: Memory which has been less recently used. It is more eligible to be reclaimed for other purposes
HighTotal:
HighFree: Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or for the pagecache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
LowTotal:
LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures. Among many other things, it is where everything from the Slab is allocated. Bad things happen when you're out of lowmem.
SwapTotal: total amount of swap space available
SwapFree: amount of swap space that is currently unused
Slab: in-kernel data structures cache
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in 'vm.overcommit_memory'). The CommitLimit is calculated with the following formula: CommitLimit = ('vm.overcommit_ratio' / 100 * Physical RAM) + Swap, where 'vm.overcommit_ratio' is a percentage. For example, on a system with 1G of physical RAM and 7G of swap with a 'vm.overcommit_ratio' of 30 it would yield a CommitLimit of 7.3G. For more details, see the memory overcommit documentation in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G of memory, but only touches 300M of it will only show up as using 300M of memory even if it has the address space allocated for the entire 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'), allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
【About free】
On Linux we usually check memory with the free command:
[root@scs-2 tmp]# free
             total       used       free     shared    buffers     cached
Mem:       3266180    3250004      16176          0     110652    2668236
-/+ buffers/cache:     471116    2795064
Swap:      2048276      80160    1968116
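Note the difference between used/free on the Mem line and used/free on the -/+ buffers/cache line. The two differ in point of view. The Mem line is from the OS's point of view: to the OS, buffers and cached both count as used, so free memory is 16176KB and used memory is 3250004KB, which includes what the kernel (OS) uses, what applications (X, oracle, etc.) use, plus buffers and cached.
The -/+ buffers/cache line is from the application's point of view. To an application, buffers and cached count as available: they exist to speed up file access, and they are reclaimed quickly when an application needs the memory. So from the application's point of view, available memory = system free memory + buffers + cached. In the example above: 2795064 = 16176 + 110652 + 2668236
When we check a machine's free memory with the free command, the free value often looks very small. This is mainly because Linux takes the view that unused memory is wasted memory, so it caches and buffers as much data as it can for later use. In practice this memory can be handed over right away. So free memory = free + buffers + cached = total - used, where used is the value on the -/+ buffers/cache line.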
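【About /proc/pid/status】
We can use ps aux or top to see the virtual memory (VSZ) and physical memory (RSS) used by a process, or read the same information directly from the file /proc/pid/status.
VmSize(KB) Size of the task's virtual address space (total_vm-reserved_vm), where total_vm is the size of the process's address space and reserved_vm is the physical pages the process holds in reserved or special memory regions
VmLck(KB) Size of the physical memory the task has locked. Locked physical memory cannot be swapped out to disk (locked_vm)
VmRSS(KB) Size of the physical memory the application is currently using; this is the rss value reported by ps (rss)
VmData(KB) Size of the program's data segment (in virtual memory), which holds initialized data (total_vm-shared_vm-stack_vm)
VmStk(KB) Size of the task's user-mode stack (stack_vm)
VmExe(KB) Size of the executable virtual memory owned by the program, i.e. the code segment, not including libraries used by the task (end_code-start_code)
VmLib(KB) Size of the libraries mapped into the task's virtual address space (exec_lib)
VmPTE Size of all page tables of the process, in kB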
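【About the OOM killer】
The Out-of-Memory (OOM) Killer is a protection mechanism: when Linux runs short of memory, it kills less important processes so that the system does not run into more serious trouble.
Addressing is limited on 32-bit CPU architectures. The Linux kernel defines three zones: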
# DMA: 0x00000000 - 0x00FFFFFF (0 - 16 MB)
# LowMem: 0x01000000 - 0x37FFFFFF (16 - 896 MB) - size: 880MB
# HighMem: 0x38000000 - <hardware specific>
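When is the OOM killer triggered? From what I could find, there are roughly two cases:
1 When low memory is exhausted, even if high memory still has plenty of free space
2 When low memory is fragmented and no contiguous region can be allocated
The usual problem is that the OOM killer fires although high memory is plentiful, or fires because of fragmentation. Solutions:
1. Upgrade to a 64-bit version of Linux. This is the best solution.
2. On 32-bit Linux, the best solution is to use the hugemem kernel. Another option is to set /proc/sys/vm/lower_zone_protection to 250 or higher.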
# echo "250" > /proc/sys/vm/lower_zone_protection
To apply it at boot, add to /etc/sysctl.conf
vm.lower_zone_protection = 250
3. The weakest solution is to disable the oom-killer. This may cause the system to hang, so use it with care.
To turn the oom-killer off/on:
# echo "0" > /proc/sys/vm/oom-kill
# echo "1" > /proc/sys/vm/oom-kill
To make the setting take effect at boot, add to /etc/sysctl.conf
vm.oom-kill = 0
My problem, however, is that the OOM killer was not triggered when it should have been, so none of these methods help me -_-|
【About overcommit_memory】
The Linux kernel supports the following overcommit handling modes:
0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
The overcommit percentage is set via `vm.overcommit_ratio'.
The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.
------------------------------------------------------------------------------------------
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 0 > /proc/sys/vm/overcommit_ratio
-------------------------------------------------------------------------------------------
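Results of actual testing:
overcommit_memory == 2: once physical memory is used up, starting any program reports "out of memory";
overcommit_memory == 1: more physical memory is released from buffers; suitable for large scientific applications, but the oom-killer mechanism still works;
overcommit_memory == 0: the system default; less physical memory is released, and the oom-killer is clearly seen at work.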