About the OOM killer

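One of our company's products (running on Linux) had recently been hanging for no apparent reason. At first I suspected the driver for a virtual device, but the hangs continued after the code was changed. I then wrote a script that logged assorted system information every hour. After the next hang, the log showed that available memory had shrunk steadily from a little over 900 MB at the start to about 100 MB in the last record. Was the hang caused by memory exhaustion? I could not be sure yet, so I wrote a small program to see how the kernel behaves when memory runs out.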
  
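  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      char *p = NULL;
      int count = 1;

      /* Allocate 10 MB per iteration and never free it. */
      while (1) {
          p = (char *)malloc(1024*1024*10);
          if (!p) {
              printf("malloc error!\n");
              return -1;
          }
          /* Touch every page so the memory is really backed. */
          memset(p, 0, 1024*1024*10);
          printf("malloc %dM memory\n", 10*count++);
          usleep(500000);
      }
  }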
  
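  I ran this program on two versions of Linux and got completely different results:

  Platform 1: Red Hat Linux release 8.0 (2.4.18-14), 1 GB of physical memory. When the program had malloc'ed 2890 MB it was killed by the OOM killer: Out of Memory: Killed process 6448 (loop_malloc)

  Platform 2: Red Hat Enterprise Linux WS release 3 (2.4.21-50.EL), 1 GB of physical memory. When the program had malloc'ed 1460 MB the system hung: SSH and HTTP were unreachable, although the machine still answered ping.

  The product that hangs uses the platform 2 kernel, so memory exhaustion looked more and more likely as the cause. What I did not understand was why the OOM killer had no effect on platform 2: was it never invoked, or did it fail to find a suitable process to kill? Answering that would mean reading the kernel source, and since I know very little about the kernel I had to ask my friend Luobo (蘿蔔) for help.

  Looked at from another angle, available memory shrinking this much in just two days strongly suggested a memory leak. The original script collected too little information, so I extended it to record the VSZ and RSS of every process. Sure enough, one process had a very large VSZ and RSS that kept growing, and a run under Valgrind confirmed a memory leak. That made the picture fairly clear: a process leaks memory -> memory runs out -> the system hangs. The notes below collect what I gathered over these few days.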
  
【About /proc/meminfo】
  
  MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
  MemFree: The sum of LowFree+HighFree
  Buffers: Relatively temporary storage for raw disk blocks; it shouldn't get tremendously large (20MB or so)
  Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached
  SwapCached: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)
  Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
  Inactive: Memory which has been less recently used. It is more eligible to be reclaimed for other purposes
  HighTotal:
  HighFree: Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or for the pagecache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
  LowTotal:
  LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for the kernel's use for its own data structures. Among many other things, it is where everything from the Slab is allocated. Bad things happen when you're out of lowmem.
  SwapTotal: total amount of swap space available
  SwapFree: amount of swap space that is currently unused. Swap holds memory which has been evicted from RAM and is temporarily on the disk
  Slab: in-kernel data structures cache
  CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in 'vm.overcommit_memory'). The CommitLimit is calculated with the following formula, where 'vm.overcommit_ratio' is read as a percentage: CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap. For example, on a system with 1G of physical RAM and 7G of swap with a `vm.overcommit_ratio` of 30 it would yield a CommitLimit of 7.3G. For more details, see the memory overcommit documentation in vm/overcommit-accounting.
  Committed_AS: The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G of memory, but only touches 300M of it will only show up as using 300M of memory even if it has the address space allocated for the entire 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'), allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
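The CommitLimit formula and the strict-mode check described above can be sketched in C. This is only an illustration of the arithmetic, not the kernel's code: the function names are hypothetical, all sizes are in kB, and the ratio is assumed to be a percentage as in the 1G/7G example.

```c
#include <stdint.h>

/* Hypothetical helper: CommitLimit in kB.
 * ratio_percent stands for vm.overcommit_ratio (a percentage). */
uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t swap_kb,
                         unsigned ratio_percent)
{
    return ram_kb * ratio_percent / 100 + swap_kb;
}

/* Hypothetical helper: under strict accounting (vm.overcommit_memory = 2)
 * a request is refused when Committed_AS plus the request would exceed
 * CommitLimit. Returns 1 if the request would be refused, else 0. */
int would_exceed_limit(uint64_t committed_as_kb, uint64_t request_kb,
                       uint64_t limit_kb)
{
    return committed_as_kb + request_kb > limit_kb;
}
```

With 1 GB of RAM (1048576 kB), 7 GB of swap (7340032 kB) and a ratio of 30, commit_limit_kb gives 7654604 kB, which is the 7.3G quoted above.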
  
【About free】
  
  On Linux we usually check memory with the free command:
  [root@scs-2 tmp]# free
               total       used       free     shared    buffers     cached
  Mem:       3266180    3250004      16176          0     110652    2668236
  -/+ buffers/cache:     471116    2795064
  Swap:      2048276      80160    1968116
  
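  The difference between used/free on the second line (Mem) and used/free on the third line (-/+ buffers/cache) is one of perspective. The Mem line is the OS's view: to the OS, buffers and cached both count as used, so free memory is 16176 KB and used memory is 3250004 KB, which includes what the kernel uses + what applications (X, Oracle, etc.) use + buffers + cached.

  The third line is the application's view. To an application, buffers/cached are as good as free, because they exist to speed up file reads and are reclaimed quickly when an application needs memory. So from the application's point of view, available memory = free + buffers + cached. In the example above: 2795064 = 16176 + 110652 + 2668236

  When we check a machine's free memory with free, the free value is often very small. That is because Linux takes the view that unused memory is wasted memory, so it caches and buffers as much data as it can for later use. In practice that memory can still be handed out right away. So: free memory = free + buffers + cached = total - used, where used is the value on the -/+ buffers/cache line.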
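The relationship between the Mem row and the -/+ buffers/cache row is plain arithmetic, sketched here in C. The struct and function names are hypothetical; the values are the columns of the Mem row in kB.

```c
/* Hypothetical struct: the columns of the "Mem:" row, in kB. */
struct mem_row {
    unsigned long total, used, free, buffers, cached;
};

/* Memory available from an application's point of view:
 * free + buffers + cached. */
unsigned long app_free_kb(const struct mem_row *m)
{
    return m->free + m->buffers + m->cached;
}

/* Memory used from an application's point of view:
 * used - buffers - cached. */
unsigned long app_used_kb(const struct mem_row *m)
{
    return m->used - m->buffers - m->cached;
}
```

For the sample output (total 3266180, used 3250004, free 16176, buffers 110652, cached 2668236) these give 2795064 and 471116, matching the -/+ buffers/cache row.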
  
【About /proc/pid/status】
  
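  We can see the virtual memory (VSZ) and physical memory (RSS) used by a process with ps aux or top, or read the same information directly from /proc/pid/status.
  VmSize (KB): size of the task's virtual address space (total_vm - reserved_vm), where total_vm is the size of the process's address space and reserved_vm is the physical pages the process holds in reserved or special memory regions
  VmLck (KB): amount of physical memory the task has locked. Locked memory cannot be swapped out to disk (locked_vm)
  VmRSS (KB): amount of physical memory the application is currently using; this is the rss value reported by ps (rss)
  VmData (KB): size of the program's data segment (in virtual memory), which holds initialized data (total_vm - shared_vm - stack_vm)
  VmStk (KB): size of the task's user-mode stack (stack_vm)
  VmExe (KB): size of the executable virtual memory owned by the program, i.e. the code segment, not counting libraries used by the task (end_code - start_code)
  VmLib (KB): size of the libraries mapped into the task's virtual address space (exec_lib)
  VmPTE (KB): total size of the process's page tables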
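A monitoring tool that tracks VmSize and VmRSS needs to pull one field out of the status text. The sketch below is one way to do that in C, assuming the usual "Key:<whitespace>value kB" line layout. The function name is hypothetical, and the caller is expected to have already read /proc/pid/status into a string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look for "<key>:" at the start of a line in text
 * copied from /proc/<pid>/status and return its numeric value (kB),
 * or -1 if the key is absent or has no numeric value. */
long status_field_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            /* %ld skips the tabs/spaces that follow the colon. */
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;   /* step past the newline to the next line */
    }
    return -1;
}
```

Calling status_field_kb(buf, "VmRSS") at regular intervals and logging the result is enough to spot a process whose RSS keeps growing.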
  
【About the OOM killer】
  
  The Out-of-Memory (OOM) killer is a protection mechanism: when Linux runs short of memory, it kills less important processes to keep the system out of more serious trouble.
  
  Addressing is limited on 32-bit CPU architectures. The Linux kernel defines three zones:
  # DMA: 0x00000000 - 0x00FFFFFF (0 - 16 MB)
  # LowMem: 0x01000000 - 0x37FFFFFF (16 - 896 MB) - size: 880MB
  # HighMem: 0x38000000 - <hardware specific>
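The 16 MB and 896 MB boundaries can be expressed as a small classifier in C. This is an illustration of the zone layout listed above, not kernel code; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical labels for the three zones listed above. */
enum mem_zone { MEM_DMA, MEM_LOW, MEM_HIGH };

/* Classify a physical address using the 16 MB and 896 MB boundaries. */
enum mem_zone zone_for_phys_addr(uint64_t addr)
{
    if (addr < 16 * MB)
        return MEM_DMA;
    if (addr < 896 * MB)
        return MEM_LOW;
    return MEM_HIGH;
}
```

On a machine with 1 GB of RAM, only the memory above 896 MB falls into HighMem, while kernel data structures compete for the 880 MB of LowMem.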
  
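  When is the OOM killer triggered? From what I could find, there are roughly two cases:
  1 Low memory is exhausted, even if high memory still has plenty free
  2 Low memory is fragmented, so a contiguous region cannot be allocated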
  
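  The usual complaint is that the OOM killer fires even though high memory is large, or fires because of fragmentation. The fixes:
  1. Upgrade to a 64-bit version of Linux. This is the best solution.
  2. On 32-bit Linux, the best fix is to use the hugemem kernel. Another option is to set /proc/sys/vm/lower_zone_protection to 250 or higher.
  # echo "250" > /proc/sys/vm/lower_zone_protection
  To apply it at boot, add this to /etc/sysctl.conf:
  vm.lower_zone_protection = 250
  3. The weakest option is to disable the oom-killer. This can leave the system hung, so use it with care.
  To turn the oom-killer off/on:
  # echo "0" > /proc/sys/vm/oom-kill
  # echo "1" > /proc/sys/vm/oom-kill
  To make the setting take effect at boot, add this to /etc/sysctl.conf:
  vm.oom-kill = 0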
  
  My problem, however, was that the OOM killer did not fire when it should have, so none of these methods helped me -_-|
  
【About overcommit_memory】
  
  The Linux kernel supports the following overcommit handling modes:
  0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
  1 - Always overcommit. Appropriate for some scientific applications.
  2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
  
  The overcommit policy is set via the sysctl `vm.overcommit_memory'.
  The overcommit percentage is set via `vm.overcommit_ratio'.
  The current overcommit limit and amount committed are viewable in
  /proc/meminfo as CommitLimit and Committed_AS respectively.
  ------------------------------------------------------------------------------------------
  # echo 2 > /proc/sys/vm/overcommit_memory
  # echo 0 > /proc/sys/vm/overcommit_ratio
  -------------------------------------------------------------------------------------------
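  Test results:
  overcommit_memory == 2: once physical memory is used up, starting any program reports "out of memory";
  overcommit_memory == 1: more physical memory is released from buffers; suited to large scientific applications, but the oom-killer still operates;
  overcommit_memory == 0: the system default; less physical memory is released, so the oom-killer's activity is very noticeable.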
