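The check_mk documentation does not explain how to change the interval of passive checks, but I noticed that Icinga has a check_interval variable that can be set on a service.
Add this variable to the CPU load definition in check_mk_objects.cfg: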
define service {
    use                  check_mk_passive_perf
    host_name            StaticFileServer
    service_description  CPU load
    check_command        check_mk-cpu.loads
    check_interval       0.05
}
The unit is minutes, so 0.05 stands for a 3-second interval (0.05 × 60 s = 3 s).
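The conversion above can be sketched in a few lines of Python. This is only an illustration, assuming that interval_length in icinga.cfg is left at its default of 60 seconds; the function names are made up for this example and are not part of Icinga or check_mk.

```python
# Assumption: interval_length in icinga.cfg is at its default of 60 seconds,
# so one interval "unit" equals one minute.
INTERVAL_LENGTH_SECONDS = 60


def interval_to_seconds(interval_units, interval_length=INTERVAL_LENGTH_SECONDS):
    """Convert a check interval given in time units to seconds."""
    return interval_units * interval_length


def seconds_to_interval(seconds, interval_length=INTERVAL_LENGTH_SECONDS):
    """Convert a desired interval in seconds to the value to put in the config."""
    return seconds / interval_length


if __name__ == "__main__":
    # 0.05 units is roughly 3 seconds; rounding hides floating-point noise.
    print(round(interval_to_seconds(0.05), 6))
    # The value to configure for a 3-second interval.
    print(seconds_to_interval(3))
```

If interval_length has been changed from its default, pass the actual value as the second argument.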
Then restart Icinga:
service icinga restart
The web page now shows an interval of 3 seconds.
To change the check interval for all services, edit the service named check_mk_default in conf.d/check_mk_templates.cfg:
# Template used by all other check_mk templates
define service {
    name                          check_mk_default
    register                      0
    active_checks_enabled         1
    passive_checks_enabled        1
    parallelize_check             1
    obsess_over_service           1
    check_freshness               0
    notifications_enabled         1
    event_handler_enabled         0
    flap_detection_enabled        1
    failure_prediction_enabled    1
    process_perf_data             0
    retain_status_information     1
    retain_nonstatus_information  1
    notification_interval         0
    is_volatile                   0
    normal_check_interval         0.05
    retry_check_interval          0.05
    max_check_attempts            1
    notification_options          u,c,w,r,f,s
    notification_period           24X7
    check_period                  24X7
}
Here normal_check_interval and retry_check_interval have been changed to 0.05 minutes.
Next, edit icinga.cfg:
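When several templates and services each set their own intervals, it helps to list what each definition actually contains. The following is a rough sketch, not a full Icinga config parser: parse_service_blocks is a hypothetical helper, it assumes one directive per line inside each block, and it does not resolve inheritance through use.

```python
import re


def parse_service_blocks(text):
    """Return one dict of directives per 'define service { ... }' block.

    Simplification: assumes one directive per line and no '}' inside values.
    """
    blocks = []
    for match in re.finditer(r"define\s+service\s*\{(.*?)\}", text, re.DOTALL):
        directives = {}
        for line in match.group(1).splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # directive name, then the rest as value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        blocks.append(directives)
    return blocks


# A shortened copy of the template above, used only as sample input.
SAMPLE = """
# Template used by all other check_mk templates
define service {
    name                   check_mk_default
    register               0
    normal_check_interval  0.05
    retry_check_interval   0.05
}
"""

if __name__ == "__main__":
    for block in parse_service_blocks(SAMPLE):
        label = block.get("name") or block.get("service_description", "?")
        print(label,
              block.get("normal_check_interval"),
              block.get("retry_check_interval"))
```

To use it on a real file, read the file contents into a string and pass that string instead of SAMPLE.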
command_check_interval=1s
external_command_buffer_slots=32768
Also enable logging of external commands and passive checks:
log_external_commands=1
log_passive_checks=1
After restarting, check the log.
Use grep to filter out the CPU load check entries for one server:
./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127677:[1369051262] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127724:[1369051267] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127770:[1369051272] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127832:[1369051278] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127878:[1369051283] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127909:[1369051287] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127955:[1369051292] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128002:[1369051297] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128048:[1369051302] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128125:[1369051309] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
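The entries are 4 to 7 seconds apart, averaging about 5 seconds. That is not exactly the configured 3 seconds, but it is far below the previous 1-minute interval, so the change has taken effect.
That was a global setting: every service check now runs at the 3s interval. Is it possible to change the interval of just one service? I tried putting the following settings into a single service while leaving the global setting at 1 minute: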
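The gaps between entries can be computed from the epoch timestamps in square brackets rather than read off by eye. Below is a minimal sketch; check_gaps is a name invented for this example, and the sample lines are the first three entries of the log above.

```python
import re

# Matches the "[<epoch seconds>] PASSIVE SERVICE CHECK:" part of a log line.
TIMESTAMP_RE = re.compile(r"\[(\d+)\] PASSIVE SERVICE CHECK:")


def check_gaps(lines):
    """Return the gaps in seconds between consecutive passive check entries."""
    timestamps = []
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            timestamps.append(int(match.group(1)))
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# The first three entries from the grep output above.
SAMPLE_LINES = [
    "./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
]

if __name__ == "__main__":
    gaps = check_gaps(SAMPLE_LINES)
    print(gaps)  # prints [4, 5]
    print("min", min(gaps), "max", max(gaps))
```

The same function works on the full grep output if the lines are read from a file and passed in as a list.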
normal_check_interval 0.05
retry_check_interval 0.05
The log still shows a 60-second interval, even though the web page already shows 3s:
| Service normal/retry check interval | 3s/3s |
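Conclusions:
1. So far I have only found a global way to change the interval. Setting it on a single service changes what the web page displays but not the actual check frequency.
2. The server's CPU load is currently very low, so the practical effect cannot be judged yet. A stress test is still needed to confirm it.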