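A swap-related bug in Oracle 11.2.0.3 databases on AIX 6.1
Yesterday I visited a customer in Nanjing to tune a newly launched production database. While going through alert.log I noticed that messages like the following had been logged every few hours, or at irregular intervals, for some time: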
Thu Aug 21 09:01:26 2014
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [8.42%] pct of memory swapped out [2.16%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
Thu Aug 21 14:56:27 2014
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [5.40%] pct of memory swapped out [8.63%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
......
Sat Oct 18 22:13:48 2014
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [7.76%] pct of memory swapped out [0.33%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
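To line these warnings up with OS-level measurements later, it can help to pull the timestamps and percentages out of alert.log first. The following is a minimal sketch in Python; the names SwapWarning and find_swap_warnings are made up for illustration, and the pattern assumes the message wording shown above.

```python
import re
from datetime import datetime
from typing import List, NamedTuple


class SwapWarning(NamedTuple):
    timestamp: datetime
    swapped_in_pct: float
    swapped_out_pct: float


# Matches the warning whether the timestamp and the message sit on separate
# lines (as in a real alert.log) or have been run together on one line.
_WARNING_RE = re.compile(
    r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*"
    r"WARNING: Heavy swapping observed on system in last 5 mins\.\s*"
    r"pct of memory swapped in \[([\d.]+)%\]\s*"
    r"pct of memory swapped out \[([\d.]+)%\]"
)


def find_swap_warnings(alert_log_text: str) -> List[SwapWarning]:
    """Return every heavy-swapping warning found in the given alert.log text."""
    warnings = []
    for ts, pct_in, pct_out in _WARNING_RE.findall(alert_log_text):
        when = datetime.strptime(ts, "%a %b %d %H:%M:%S %Y")
        warnings.append(SwapWarning(when, float(pct_in), float(pct_out)))
    return warnings
```

The resulting timestamps can then be compared with vmstat or nmon recordings covering the same five-minute windows.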
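The customer's environment is an IBM P570 running AIX 6.1 with a single-instance Oracle 11.2.0.3 database. The machine has 64 GB of physical memory, of which only 20 GB is allocated to the SGA, and the instance uses automatic memory management.
A search on MOS (My Oracle Support) showed that this is a bug on the AIX platform. The relevant note is [1508575.1].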
Affected database and platform:
Oracle Database - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
IBM AIX on POWER Systems (64-bit)
Symptoms:
There is new warning message in alert.log in 11.2.0.3 similar to
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [2.08%] pct of memory swapped out [0.12%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
On AIX platform this message can be seen even when there is no virtual memory swapping at all. -- physical memory is sufficient and the swap space is not used at all
You may compare the vmstat from AIX level with DBRM trace file entries to see the differences.
Cause:
The issue is caused by unpublished Bug:14731911.
Swap usage messages are based on statistics that do not reflect the actual usage.
The v$osstat does not reflect proper stats for the swap space paging.
Solution:
Apply Patch:11801934 on top of your IBM AIX on POWER Systems (64-bit) platform.
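P.S: Bug is port-specific. -- that is, the bug is specific to this platform port (AIX)
The issue is fixed in patchset 11.2.0.4 and release 12.1. -- it is said to be fixed in 12.1, but in practice 12.1 still shows the problem, accompanied by ORA-700 errors; see note [ID 1919850.1]
Here is the description of Bug 14731911: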
Type: B - Defect
Fixed in Product Version: (blank)
Severity: 2 - Severe Loss of Service
Product Version: 11.2.0.3
Status: 96 - Closed, Duplicate Bug
Platform: 212 - IBM AIX on POWER Systems (64-bit)
Created: 2012-10-8
Platform Version: 6.1
Updated: 2014-10-11
Base Bug: 11801934
Database Version: 11.2.0.3
Affects Platforms: Port-Specific
Product Source: Oracle
Knowledge, Patches and Bugs related to this bug
Product Line: Oracle Database Products
Family: Oracle Database Suite
Area: Oracle Database
Product: 5 - Oracle Database - Enterprise Edition
Hdr: 14731911 11.2.0.3 RDBMS 11.2.0.3 VOS PRODID-5 PORTID-212 11801934
Abstract: FALSE SWAP WARNING MESSAGES PRINTED TO ALERT.LOG ON AIX

*** 10/08/12 04:52 am ***

BUG TYPE CHOSEN
===============
Code

SubComponent: Virtual Operating System
======================================
DETAILED PROBLEM DESCRIPTION
============================
Oracle process seems to check wrong OS local statistic (which include also FILESYSTEM caching etc.)
Alert log shows
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [2.08%] pct of memory swapped out [0.12%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
but this is not reflected at OS level.

DIAGNOSTIC ANALYSIS
===================
1. nmon shows virtual memory swapping does not occur at all - see attached file -- nmon recorded no swap activity at all
2. Oracle Database Server is 11.2.0.3 and contains fix for 10220118
3. Server configuration
   real mem: 144GB
   lowest value of fre memory : 87,65 GB -- plenty of free memory left
4. DBRM seems to use a wrong OS statistics - trace file is attached

WORKAROUND?
===========
No

TECHNICAL IMPACT
================
Wrong diagnostic analyze. Message is bothering customer's DBA when in fact the warning message is misleading

RELATED ISSUES (bugs, forums, RFAs)
===================================
http://myforums.oracle.com/jive3/thread.jspa?threadID=1104581
10220118

HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
====================================================
Always

DOES THE ISSUE REPRODUCE INTERNALLY?
====================================
No

EXPLAIN WHY THE ISSUE WAS NOT TESTED INTERNALLY.
================================================
Unavailable Data Volume

IS A TESTCASE AVAILABLE?
========================
No

Link to IPS Package:
====================
not available
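DBRM (Database Resource Manager) is a background process in Oracle 11g. In 11.2.0.3 it writes a warning to alert.log when it believes the OS has been swapping heavily during the last 5 minutes. On AIX, because of Bug 14731911, this process falsely reports swap-in and swap-out activity. Swap space (traditionally sized at twice the physical memory) is normally used only when physical memory really runs short, and swapping is very expensive because it reads from and writes to physical disk.
Personally I do not think this bug does much harm: it only logs a WARNING in alert.log and has no further negative effect on the database. Whether to move to patch set 11.2.0.4 is therefore a judgment call; if you want a quiet alert.log, apply the patch. If the swapping is real and caused by memory pressure at the OS level, however, it is a different matter, because that does hurt the database severely. To tell a genuine memory shortage from a false alarm, rely on monitoring tools such as nmon, topas, and vmstat (on Linux, free works as well).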
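One way to check whether paging is really happening is to look at the pi and po columns of AIX vmstat, which count pages paged in from and out to paging space. The sketch below is only an illustration under that assumption: it takes captured vmstat text, locates pi and po from the header row, and reports whether any sample is non-zero. The function names are hypothetical, and output in which wide values run into neighbouring columns is not handled.

```python
from typing import List, Tuple


def paging_samples(vmstat_output: str) -> List[Tuple[int, int]]:
    """Return (pi, po) pairs, one per data row of captured AIX vmstat output."""
    pi_idx = po_idx = width = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "pi" in fields and "po" in fields:
            # Column header row: remember where pi and po are.
            pi_idx, po_idx = fields.index("pi"), fields.index("po")
            width = len(fields)
            continue
        if width is None or len(fields) != width:
            continue  # banner, separator, or malformed line
        if not (fields[pi_idx].isdigit() and fields[po_idx].isdigit()):
            continue
        samples.append((int(fields[pi_idx]), int(fields[po_idx])))
    return samples


def real_paging_seen(vmstat_output: str) -> bool:
    """True if any sample shows paging-space page-ins or page-outs."""
    return any(pi > 0 or po > 0 for pi, po in paging_samples(vmstat_output))
```

If real_paging_seen stays False for samples taken around the time of a warning, the warning is most likely the false alarm described in the bug.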
On AIX there is in fact a second bug involved, an unpublished base bug rather than a port-specific one:
AIX Platform
If your Platform is IBM-AIX then this is not the only possible reason for this alert log message.
For IBM AIX on POWER Systems (64-bit), there is also next known port-specific bug:
Bug 14731911 - FALSE SWAP WARNING MESSAGES PRINTED TO ALERT.LOG ON AIX
with unpublished base bug:
Bug 11801934 : WRONG PAGE-IN AND PAGE-OUT OS VM STATS IN AIX.
Under VMware, if this WARNING is not caused by a bug, it is very likely related to the kind of memory problems behind ORA-04031/ORA-04030, which is far more serious:
VMWare
Under VMWare, the messages may perhaps indicate a more serious issue, even when no memory related ORA-4031/ORA-4030 errors are reported.
Under circumstances, an instance in a virtual machine may be simply terminated by PMON due to error 471 without further errors in the alert log.
The OS logs may in such case report an out of memory condition like below:
[root@vmh ~]# grep Kill /var/log/messages*
/var/log/messages-20140629:Jun 27 18:29:06 vmh-msfc-dodp02 kernel: [1895074.304941] Out of memory: Kill process 42094 (oracle) score 391 or sacrifice child
/var/log/messages-20140629:Jun 27 18:29:06 vmh-msfc-dodp02 kernel: [1895074.305203] Killed process 42094, UID 303, (oracle) total-vm:189081588kB, anon-rss:27412kB, file-rss:109612
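Log entries like these can also be collected programmatically. The following is a rough sketch, assuming the "Killed process" line format shown above (kernel message formats differ between versions); OomKill and find_oom_kills are names invented for this example.

```python
import re
from typing import Iterable, List, NamedTuple


class OomKill(NamedTuple):
    pid: int
    command: str
    total_vm_kb: int
    anon_rss_kb: int


# Based on the sample line above; the "UID nnn," part is treated as optional.
_KILLED_RE = re.compile(
    r"Killed process (\d+),?\s+(?:UID \d+,\s+)?\(([^)]+)\)\s+"
    r"total-vm:(\d+)kB,\s+anon-rss:(\d+)kB"
)


def find_oom_kills(log_lines: Iterable[str], command: str = "oracle") -> List[OomKill]:
    """Return the OOM-killer victims whose command name equals `command`."""
    kills = []
    for line in log_lines:
        match = _KILLED_RE.search(line)
        if match and match.group(2) == command:
            pid, name, total_vm, anon_rss = match.groups()
            kills.append(OomKill(int(pid), name, int(total_vm), int(anon_rss)))
    return kills
```

Feeding it the lines of /var/log/messages gives the process IDs and memory figures of any oracle processes the kernel killed.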
The usual ways to deal with OS memory swapping are:
1. Check for processes that leak memory and fix the leaks.
2. Tune the SGA/PGA to reduce Oracle's memory footprint.
3. Use /proc/sys/vm/drop_caches to release some cache memory temporarily (Linux).
4. Adjust the system's VM management parameters, for example the following parameters in sysctl.conf on Linux:
vm.min_free_kbytes: Raising the value in /proc/sys/vm/min_free_kbytes will cause the system to start reclaiming memory at an earlier time than it would have before.
vm.vfs_cache_pressure: At the default value of vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.
vm.swappiness: default 60. Apparently /proc/sys/vm/swappiness on Red Hat Linux allows the admin to tune how aggressively the kernel swaps out processes' memory. Decreasing the swappiness setting may result in improved Directory performance as the kernel holds more of the server process in memory longer before swapping it out.
Set the following values to reduce the likelihood of OOM (Out Of Memory):
# Oracle-Validated setting for vm.min_free_kbytes is 51200 to avoid OOM killer
vm.min_free_kbytes = 51200
vm.swappiness = 40
vm.vfs_cache_pressure = 200
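Before changing anything, it may be worth comparing what a Linux host currently uses against these values. The sketch below is one possible way to do that; TARGET simply restates the three settings listed above, and the helper names (parse_sysctl, read_live, differences) are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Target values restated from the settings listed above.
TARGET = {
    "vm.min_free_kbytes": 51200,
    "vm.swappiness": 40,
    "vm.vfs_cache_pressure": 200,
}


def parse_sysctl(text: str) -> Dict[str, int]:
    """Parse `key = value` lines from sysctl.conf-style text (integers only)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, value = (part.strip() for part in line.split("=", 1))
        if value.isdigit():
            settings[key] = int(value)
    return settings


def read_live(keys: Iterable[str], root: Path = Path("/proc/sys")) -> Dict[str, int]:
    """Read current kernel values, e.g. vm.swappiness -> /proc/sys/vm/swappiness."""
    live = {}
    for key in keys:
        path = root.joinpath(*key.split("."))
        if path.is_file():
            live[key] = int(path.read_text().split()[0])
    return live


def differences(
    current: Dict[str, int], target: Dict[str, int] = TARGET
) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {key: (current value or None, target value)} for each mismatch."""
    return {k: (current.get(k), v) for k, v in target.items() if current.get(k) != v}
```

For example, differences(read_live(TARGET)) lists the parameters on the running system that differ from the values above, and an empty result means they already match.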