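1.1.1. Dell Server Hardware Monitoring and an Introduction to OMSA, Dell's Systems Management Tool
This article describes how to monitor the hardware health of Dell servers with Nagios and OMSA. Nagios runs the checks through NRPE, which requires configuring the check_openmanage script and installing Dell's OMSA tools.
Deployment guide for monitoring Dell server hardware with OpenManage and Nagios: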
http://folk.uio.no/trondham/software/check_openmanage.html
1) What is OMSA
OMSA stands for Dell OpenManage Server Administrator.
Dell OpenManage Server Administrator (OMSA) provides a comprehensive, one-to-one systems management solution in two ways: from an integrated, web browser-based graphical user interface (GUI) and from a command line interface (CLI) through the operating system. Server Administrator is designed for system administrators to manage systems locally and remotely on a network. It allows system administrators to focus on managing their entire network by providing comprehensive one-to-one systems management.
2) Installing OMSA
Dell OpenManage yum repository:
http://linux.dell.com/repo/hardware/Linux_Repository_14.04.00/
Configure the OMSA yum repository:
Create the repository file /etc/yum.repos.d/dell-omsa-repository.repo:
or
run the following command to set it up automatically:
wget -q -O - http://linux.dell.com/repo/hardware/Linux_Repository_14.04.00/bootstrap.cgi | bash
Install OMSA:
yum install srvadmin-all
Enable the OMSA services at boot:
/opt/dell/srvadmin/sbin/srvadmin-services.sh enable
Start the OMSA services:
/opt/dell/srvadmin/sbin/srvadmin-services.sh start
Check the status of the OMSA services:
/opt/dell/srvadmin/sbin/srvadmin-services.sh status
dell_rbu(module) is running
ipmidriver is running
dsm_sa_datamgrd(pid 1331 1197) is running
dsm_sa_eventmgrd(pid 1381) is running
dsm_sa_snmpd(pid 1440) is running
dsm_om_shrsvcd(pid 1508) is running...
dsm_om_connsvcd(pid 1562) is running...
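The status output above can also be checked by a script instead of by eye. The following is a minimal sketch, assuming that every healthy line contains the text "is running" as in the sample output; the function name omsa_not_running is hypothetical and not part of OMSA.

```shell
# Read `srvadmin-services.sh status` output on stdin and print every
# non-blank line that does not report a running service.
# Exit status: 0 if all lines say "is running", 1 otherwise.
# Assumption: healthy lines always contain the text "is running".
omsa_not_running() {
    awk '
        NF == 0 { next }                   # ignore blank lines
        !/is running/ { print; bad = 1 }   # anything else is reported
        END { exit bad + 0 }
    '
}

# Usage on a host with OMSA installed:
#   /opt/dell/srvadmin/sbin/srvadmin-services.sh status | omsa_not_running
```

An empty result with exit status 0 means every listed service reported itself as running.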
Check which ports the OMSA services are listening on:
# netstat -npae | egrep -iv 'mysql|ssh|xinetd|udevd|crond|syslogd|upstart|auditd'
Note that the dsm_om_connsvc service listens on TCP port 1311 and provides HTTP access.
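3) Removing the web components from the OMSA package
This example only needs OMSA's hardware health monitoring and not the web management interface it provides. To avoid problems caused by a misconfigured system firewall or poorly managed web access, we remove the OMSA web components here.
Check which sockets the OMSA components are listening on: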
# netstat -npae | egrep -iv 'mysql|ssh|xinetd|udevd|crond|syslogd|upstart|auditd'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 :::1311 :::* LISTEN 0 656427 1563/dsm_om_connsvc
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 655523 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipea
unix 2 [ ACC ] STREAM LISTENING 655525 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipep
unix 2 [ ACC ] STREAM LISTENING 655527 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipeu
unix 2 [ ACC ] STREAM LISTENING 655770 1508/dsm_om_shrsvcd /opt/dell/srvadmin/var/lib/openmanage/shrsvc/dsm_om_shrsvc
unix 2 [ ACC ] STREAM LISTENING 655772 1508/dsm_om_shrsvcd /opt/dell/srvadmin/var/lib/openmanage/shrsvc/omintf5e4
unix 2 [ ] STREAM CONNECTED 656423 1563/dsm_om_connsvc
unix 2 [ ] STREAM CONNECTED 656015 1563/dsm_om_connsvc
unix 3 [ ] STREAM CONNECTED 655972 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipea
unix 3 [ ] STREAM CONNECTED 655971 1563/dsm_om_connsvc
unix 3 [ ] STREAM CONNECTED 655650 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipea
unix 3 [ ] STREAM CONNECTED 655649 1440/dsm_sa_snmpd
unix 3 [ ] STREAM CONNECTED 655589 1197/dsm_sa_datamgr /opt/dell/srvadmin/var/lib/openmanage/.ipc/dcsmilpipea
unix 3 [ ] STREAM CONNECTED 655588 1381/dsm_sa_eventmg
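The dsm_om_connsvc service of OMSA listens on TCP port 1311.
Check which program files the dsm_om_connsvc service uses:
lsof -p 1563 # 1563 is the PID of the dsm_om_connsvc service process
Find which RPM packages own the files used by dsm_om_connsvc: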
# rpm -qf /opt/dell/srvadmin/lib64/openmanage/apache-tomcat/lib/tomcat-api.jar
srvadmin-tomcat-7.4.0-4.97.1.el6.x86_64
# rpm -qf /opt/dell/srvadmin/lib64/openmanage/jre/lib/jce.jar
srvadmin-jre-7.4.0-4.98.1.el6.x86_64
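Rather than querying files one at a time, the two steps above (lsof, then rpm -qf) can be chained. This is a rough sketch under stated assumptions: the helper name lsof_regular_files is hypothetical, lsof is assumed to print its default nine-column layout with TYPE in column 5, and file paths are assumed to contain no spaces.

```shell
# Read `lsof -p PID` output on stdin and print the unique paths (NAME column)
# of regular files opened by the process.
# Assumptions: default lsof column layout (TYPE is column 5, NAME is last)
# and no spaces in file paths.
lsof_regular_files() {
    awk '$5 == "REG" { print $NF }' | sort -u
}

# Usage on a host with lsof and rpm (1563 is the PID from the example above):
#   lsof -p 1563 | lsof_regular_files | xargs rpm -qf | grep -v 'not owned' | sort -u
```

The resulting package list should include the srvadmin-tomcat and srvadmin-jre packages identified above, along with any system libraries the process has loaded.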
Stop the OMSA services:
/opt/dell/srvadmin/sbin/srvadmin-services.sh stop
Remove the srvadmin-tomcat and srvadmin-jre packages:
# rpm -e srvadmin-tomcat-7.4.0-4.97.1.el6.x86_64
error: Failed dependencies:
srvadmin-tomcat = 7.4.0 is needed by (installed) srvadmin-webserver-7.4.0-4.1.1.el6.x86_64
# rpm -e srvadmin-webserver-7.4.0-4.1.1.el6.x86_64
error: Failed dependencies:
srvadmin-webserver = 7.4.0 is needed by (installed) srvadmin-all-7.4.0-4.1.1.el6.x86_64
Because srvadmin-all depends on srvadmin-webserver, remove srvadmin-webserver with --nodeps first, then remove the other two packages:
# rpm -e --nodeps srvadmin-webserver-7.4.0-4.1.1.el6.x86_64
# rpm -e srvadmin-tomcat-7.4.0-4.97.1.el6.x86_64
# rpm -e srvadmin-jre-7.4.0-4.98.1.el6.x86_64
Delete the apache-tomcat directory:
# rm -rf /opt/dell/srvadmin/lib64/openmanage/apache-tomcat
Start the OMSA services:
# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
Starting Systems Management Device Drivers:
Starting dell_rbu: [ OK ]
Starting ipmi driver: Already started [ OK ]
Starting Systems Management Data Engine:
Starting dsm_sa_datamgrd: [ OK ]
Starting dsm_sa_eventmgrd: [ OK ]
Starting dsm_sa_snmpd: [ OK ]
Starting DSM SA Shared Services: [ OK ]
Test the check_openmanage script:
./check_openmanage -d
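The reported check items are the same as before the OMSA web components were removed.
Check which sockets the OMSA services listen on after the web components have been removed:
# netstat -npae | egrep -iv 'mysql|ssh|xinetd|udevd|crond|syslogd|upstart|auditd'
At this point the OMSA services listen only on Unix domain sockets. A Unix domain socket is used only for communication between processes on the same operating system, for example when the check_openmanage script calls the OMSA services to check the hardware health of the Dell server.
The OMSA web components are now removed.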
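To confirm from a script that no OMSA process still holds a TCP listening socket, the netstat output can be filtered as sketched below. The function name omsa_tcp_listeners is hypothetical, and the field positions assume the `netstat -npae` layout shown earlier in this section (State in column 6, PID/Program name last).

```shell
# Read `netstat -npae` output on stdin and print "port program" for every
# TCP socket in LISTEN state that belongs to a dsm_* process.
# Exit status: 1 if at least one such listener exists, 0 if none.
# Assumption: column layout matches the netstat sample in this article.
omsa_tcp_listeners() {
    awk '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $NF ~ /\/dsm_/ {
            n = split($4, addr, ":")   # local address, e.g. :::1311
            split($NF, prog, "/")      # PID/Program name, e.g. 1563/dsm_om_connsvc
            print addr[n], prog[2]
            found = 1
        }
        END { exit found + 0 }
    '
}

# Usage:
#   netstat -npae | omsa_tcp_listeners || echo "OMSA still listens on TCP"
```

Before the web components are removed this should print the port 1311 listener; afterwards it should print nothing and return 0.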
4) Installing the check_openmanage package
Download page:
http://folk.uio.no/trondham/software/check_openmanage.html#download
Download the check_openmanage package:
wget http://folk.uio.no/trondham/software/files/check_openmanage-3.7.11.tar.gz
Test the check_openmanage tool:
tar zxf check_openmanage-3.7.11.tar.gz
cd check_openmanage-3.7.11
./check_openmanage -d
./check_openmanage
If it reports "Storage Error", add the --no-storage option:
./check_openmanage --no-storage
# check_openmanage checks roughly 50 to 60 items on a Dell server
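When testing the plugin by hand it helps to see the state Nagios will derive from it. Nagios plugins signal their result through the exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The wrapper below is a small sketch of that convention; the function names nagios_state_name and run_plugin are hypothetical and not part of check_openmanage.

```shell
# Map a Nagios plugin exit status to its state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 and anything unexpected
    esac
}

# Run a plugin command, let its own output through, then print the state
# derived from its exit status and return that same status.
run_plugin() {
    "$@"
    rc=$?
    echo "state: $(nagios_state_name "$rc")"
    return "$rc"
}

# Usage:
#   run_plugin ./check_openmanage --no-storage
```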
5) Configuring NRPE
Edit /usr/local/nagios/etc/nrpe.cfg and add:
command[check_dell_openmanage]=/path/to/check_openmanage
or
command[check_dell_openmanage]=/path/to/check_openmanage --no-storage
Copy the check_openmanage script to the /usr/local/nagios/libexec/ directory:
cp check_openmanage-3.7.11/check_openmanage /usr/local/nagios/libexec/
Test the command:
check_nrpe -H IP -c check_dell_openmanage
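If the NRPE change is rolled out to many servers, it is convenient to add the command definition only when it is missing. The following is a minimal sketch; the function name nrpe_add_command is hypothetical, and the paths in the usage line are the ones used in this article. NRPE normally has to be restarted or reloaded before it picks up a changed configuration.

```shell
# Append an NRPE command definition to a config file unless a definition
# with the same command name already exists.
#   $1 = config file, $2 = command name, $3 = command line
nrpe_add_command() {
    cfg=$1
    name=$2
    cmdline=$3
    # Already defined: leave the file untouched.
    if grep -q "^command\[$name]=" "$cfg" 2>/dev/null; then
        return 0
    fi
    printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
}

# Usage:
#   nrpe_add_command /usr/local/nagios/etc/nrpe.cfg check_dell_openmanage \
#       /usr/local/nagios/libexec/check_openmanage
```

Running it twice leaves a single definition in the file, so it is safe to repeat.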
6) Notes
check_openmanage is a Perl script, so Perl must already be installed on the operating system.
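A deployment script can verify this prerequisite before copying the plugin. This is a small sketch; the helper name require_cmd is hypothetical.

```shell
# Return 0 if the named command is available on PATH; otherwise print a
# message to stderr and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Usage:
#   require_cmd perl || exit 1
```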
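7) What if the server has no internet access
If the server has no internet access, consider setting up an iptables NAT rule on a machine that does have it, so that yum on the internal-only server can reach the public repository, or deploy a yum repository inside the data center.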
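For the internal repository option, each server needs a .repo file that points at the mirror. The sketch below only generates such a file; the function name render_repo, the repository id, and the mirror URL are hypothetical placeholders. It sets gpgcheck=0 purely to keep the example short, and mirroring Dell's GPG key and enabling gpgcheck would be the safer choice.

```shell
# Print a yum .repo stanza for an internal mirror.
#   $1 = repository id, $2 = base URL of the mirror
# Assumption: gpgcheck is disabled only for brevity in this sketch.
render_repo() {
    printf '[%s]\nname=%s\nbaseurl=%s\nenabled=1\ngpgcheck=0\n' "$1" "$1" "$2"
}

# Usage (the mirror URL is a placeholder):
#   render_repo dell-omsa-local http://mirror.example.internal/dell/ \
#       > /etc/yum.repos.d/dell-omsa-local.repo
```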
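8) What other methods are available
If you do not use OMSA and check_openmanage to monitor hardware health, you can use ipmitool instead, but you will need to write your own scripts.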
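As a starting point for such a script, the sketch below turns sensor listing output into a Nagios-style exit status. It assumes the pipe-separated "name | reading | status" layout printed by `ipmitool sdr`, and it assumes the status codes ok, nc (non-critical), cr (critical) and nr (non-recoverable); verify both on your hardware. The function name ipmi_sdr_state is hypothetical.

```shell
# Read `ipmitool sdr` style lines ("name | reading | status") on stdin and
# exit with a Nagios status: 2 if any sensor is cr or nr, 1 if any is nc,
# 0 otherwise.
# Assumption: the status is the third pipe-separated field.
ipmi_sdr_state() {
    awk -F'|' '
        {
            status = $3
            gsub(/[ \t]/, "", status)   # trim surrounding whitespace
            if (status == "cr" || status == "nr") crit = 1
            else if (status == "nc") warn = 1
        }
        END {
            if (crit) exit 2
            if (warn) exit 1
            exit 0
        }
    '
}

# Usage on a host with ipmitool:
#   ipmitool sdr | ipmi_sdr_state; echo "exit status: $?"
```

A real plugin would also print a one-line summary for Nagios and decide how to treat sensors that return no reading.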
9) Where this approach applies
It is recommended for any environment that runs Dell servers.
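10) A note on the security of OMSA components
Dell OpenManage Server Administrator (OMSA) 7.1 and earlier contain an XSS vulnerability that allows remote attackers to inject web script or HTML. Dell has released patches that fix this issue; download them from the vendor's site. See the OMSA security patch links in the reference section at the end of this article.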
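11) Other systems management tools provided by Dell
Dell also provides management and configuration tools for Microsoft System Center, a plug-in for Oracle Enterprise Manager 12c, plug-ins that support HP and IBM, and other tools. For details, log in to the Dell website and go to Support --> Drivers & Downloads --> select the server type --> Systems Management.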
12) About the Dell plug-in for Oracle Enterprise Manager 12c
Dell OpenManage Plug-in v1.0 for Oracle Enterprise Manager 12c
Dell OpenManage Plug-in for Oracle Enterprise Manager provides a proactive approach to data center management that delivers features for monitoring Dell server, storage, and networking infrastructures in environment managed by Oracle Enterprise Manager (OEM). It also supports mapping of database workload to Dell hardware for quicker fault detection and console launch of Dell devices to perform troubleshooting, configuration, and management activities. It protects customer’s existing investment in OEM console and helps in ease of integration and management of Dell devices.
13) What is a Unix domain socket
A Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing within the same host operating system. While similar in functionality to named pipes, Unix domain sockets may be created as connection-mode (SOCK_STREAM or SOCK_SEQPACKET) or as connectionless (SOCK_DGRAM), while pipes are streams only. Processes using Unix domain sockets do not need to share a common ancestry. The API for Unix domain sockets is similar to that of an Internet socket, but it does not use an underlying network protocol for communication. The Unix domain socket facility is a standard component of POSIX operating systems.
Unix domain sockets use the file system as their address name space. They are referenced by processes as inodes in the file system. This allows two processes to open the same socket in order to communicate. However, communication occurs entirely within the operating system kernel.
In addition to sending data, processes may send file descriptors across a Unix domain socket connection using the sendmsg() and recvmsg() system calls.
14) Reference links for this article
Customizing check_openmanage thresholds:
http://dreamway.blog.51cto.com/1281816/1048274
Using the omreport command:
http://www.sxszjzx.com/~t096/manual/sc/Dosa/CLI/report.htm
Dell website:
http://www.dell.com/support/drivers/us/en/04/ProductSelector/Select/FamilySelection?CategoryPath=all-products%2Fesuprt_ser_stor_net%2Fesuprt_poweredge&Family=PowerEdge&DisplayCrumbs=Product%2520Type%40%2CServers%252C%2520Storage%252C%2520%2526%2520Networking%40%2CPowerEdge&rquery=na
Dell's yum repository for OMSA:
http://linux.dell.com/repo/hardware/Linux_Repository_14.04.00/
Deployment guide for monitoring Dell server hardware with OpenManage and Nagios:
http://folk.uio.no/trondham/software/check_openmanage.html
OMSA security patches:
http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-r710?driverId=5JDN0&osCode=WNET&fileId=3082293694
http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-r710?driverId=PCXMR&osCode=WNET&fileId=3082295344
http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-r710?driverId=JJMWP&osCode=WNET&fileId=3082295338
Dell plug-in for Oracle Enterprise Manager 12c:
http://www.dell.com/support/drivers/us/en/04/DriverDetails/Product/poweredge-r710?driverId=XKRM6&osCode=WS8R2&fileId=3356540401&languageCode=en&categoryId=SM
Other management tools provided by Dell:
http://www.dell.com/support/drivers/us/en/04/ProductSelector/Select/FamilySelection?CategoryPath=all-products%2Fesuprt_ser_stor_net%2Fesuprt_poweredge&Family=PowerEdge&DisplayCrumbs=Product%2520Type%40%2CServers%252C%2520Storage%252C%2520%2526%2520Networking%40%2CPowerEdge&rquery=na
About Unix domain sockets:
http://en.wikipedia.org/wiki/Unix_domain_socket