The company uses Nagios for monitoring: the Nagios client collects the required data and sends it to the Nagios server. The problem is that some of our machines are hosted in another data center, for example at China Eastern Airlines, where installing the client and accessing the Internet are not allowed. To detect server problems there anyway and notify the O&M staff by SMS or email, we worked with the developers to open up the SMS and email interfaces. The server status is now monitored by a script, and when a fault is detected the script sends the alarm.
Target analysis:
Resources to monitor:
1. Number of logged-in users
2. CPU load
3. Service status
4. Hard disk space (root partition, application partition, and backup partition)
5. Memory
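As a minimal sketch, the four host-level resources in this list can be read from standard Linux sources rather than from fixed field positions in uptime or df -h output, whose layout shifts with uptime length and long device names. It assumes a Linux host with /proc (the MemAvailable field needs kernel 3.14 or later) and POSIX df -P; the function names are hypothetical. Service status needs a network probe and is not covered here.

```shell
#!/bin/bash
# Sketch: read the monitored host resources from standard Linux sources.
# Assumptions: Linux with /proc (MemAvailable needs kernel 3.14+) and POSIX df -P.
# All function names are hypothetical.

# Number of login sessions (one line per session in "who" output)
login_users() {
    who | wc -l | tr -d ' '
}

# Integer part of the 1-minute load average
cpu_load_int() {
    awk '{ split($1, a, "."); print a[1] }' /proc/loadavg
}

# Use% of the filesystem holding the given mount point, without the % sign.
# -P keeps each filesystem on one line even when the device name is long.
disk_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Available memory as a whole-number percentage of total memory
mem_avail_pct() {
    awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
         END { printf "%.0f\n", a / t * 100 }' /proc/meminfo
}

echo "users=$(login_users) load=$(cpu_load_int) root=$(disk_use_pct /)% mem=$(mem_avail_pct)%"
```

Each helper prints a bare integer, so its result can be compared directly with [ ... -gt N ] as in the monitoring functions.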
SMS and email interfaces (uploaded as an attachment)
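The SMS interface is called over plain HTTP GET, so alarm text containing spaces, "&" or non-ASCII characters has to be URL-encoded before it goes into the query string. A minimal sketch of building that request follows. The host, path, parameter names (TranCode, content, title) and codes TM0311/TM0310 are copied from the script in this article; the helper names are hypothetical, and the character encoding the interface expects for Chinese text (UTF-8 or GBK) is not known from the article.

```shell
#!/bin/bash
# Sketch: build the SMS-interface request with URL-encoded parameters.
# Host, path, parameter names and codes come from the article's script;
# urlencode and sms_url are hypothetical helper names.

SMS_URL="http://172.20.36.118/app/tms.do"

# Percent-encode a string byte by byte; RFC 3986 unreserved characters pass through
urlencode() {
    local LC_ALL=C s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case "$c" in
            [A-Za-z0-9.~_-]) out+="$c" ;;
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# Build the GET URL: TM0311 for critical alarms, TM0310 (with a title) for warnings
sms_url() {
    local code="$1" content="$2" title="$3"
    local url="$SMS_URL?TranCode=$code&content=$(urlencode "$content")"
    if [ -n "$title" ]; then
        url+="&title=$(urlencode "$title")"
    fi
    printf '%s\n' "$url"
}

# Sending would then be, for example: curl -s "$(sms_url TM0311 "$Critical")"
sms_url TM0311 "CPU load is too high"
```

An alternative with the same effect is to let curl do the encoding with -G and --data-urlencode, which appends the encoded pairs to the URL as a query string.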
Script:
#!/bin/bash
# Status convention: 0 = critical (SMS), 1 = warning (SMS with title), 2 = OK

# Monitor logged-in users
UserMonitor() {
    # One line per login session; more robust than picking a field out of uptime
    LoginUser=$(who | wc -l)
    if [ "$LoginUser" -ge 2 ]; then
        Critical="The number of users logged on to the system exceeds 1: $LoginUser user(s). Please check who is logged in."
        status=0
    else
        echo "LoginUser OK"
        status=2
    fi
}

# Monitor memory
MemMonitor() {
    MemTotal=$(free -m | awk '/^Mem:/ {print $2}')
    # Free memory excluding buffers/cache: 4th field of the "-/+ buffers/cache:" line (older procps)
    MemFree=$(free -m | awk '/buffers\/cache:/ {print $4}')
    # Newer procps has no such line; use the "available" column (7th field of the Mem: line)
    [ -z "$MemFree" ] && MemFree=$(free -m | awk '/^Mem:/ {print $7}')
    MemFreeB=$(awk -v f="$MemFree" -v t="$MemTotal" 'BEGIN {printf "%.2f%%", f / t * 100}')
    MemFreeS=$(awk -v f="$MemFree" -v t="$MemTotal" 'BEGIN {printf "%.0f", f / t * 100}')
    if [ "$MemFreeS" -lt 10 ]; then
        Critical="Available system memory is below 10%; actual available memory: $MemFreeB. Please handle it."
        status=0
    elif [ "$MemFreeS" -lt 20 ]; then
        Warning="Available system memory is below 20%; actual available memory: $MemFreeB. For your reference."
        WarningT="Memory alarm"
        status=1
    else
        echo "Mem OK"
        status=2
    fi
}

# Monitor partition usage
DiskMonitorG() {
    # Root partition; -P keeps each filesystem on a single line
    DiskGB=$(df -hP / | awk 'NR==2 {print $5}')
    DiskGS=${DiskGB%\%}
    if [ "$DiskGS" -gt 90 ]; then
        Critical="Root partition usage exceeds 90%; $DiskGB is actually used. Please handle it."
        status=0
    elif [ "$DiskGS" -gt 80 ]; then
        Warning="Root partition usage exceeds 80%; $DiskGB is actually used. For your reference."
        WarningT="Root partition alarm"
        status=1
    else
        echo "DiskGB OK"
        status=2
    fi
}

DiskMonitorA() {
    # Application partition: the 4th line of "df -h" output on the original server
    ApplyB=$(df -h | awk 'NR==4 {print $5}')
    ApplyS=${ApplyB%\%}
    if [ "$ApplyS" -gt 90 ]; then
        Critical="Application partition usage exceeds 90%; $ApplyB is actually used. Please handle it."
        status=0
    elif [ "$ApplyS" -gt 80 ]; then
        Warning="Application partition usage exceeds 80%; $ApplyB is actually used. For your reference."
        WarningT="Application partition alarm"
        status=1
    else
        echo "Apply OK"
        status=2
    fi
}

# Monitor CPU load
CPULoad() {
    # Integer part of the 1-minute load average
    CPULoad1=$(awk '{split($1, a, "."); print a[1]}' /proc/loadavg)
    CPULoad2=$(uptime)
    if [ "$CPULoad1" -gt 5 ]; then
        Critical="The CPU load is too high, please handle it promptly: $CPULoad2"
        status=0
    elif [ "$CPULoad1" -gt 3 ]; then
        Warning="CPU load warning: $CPULoad2"
        WarningT="CPU load alarm"
        status=1
    else
        echo "CPU OK"
        status=2
    fi
}

# Monitor service status
ServerMonitor() {
    # Probe the TMS web service; alarm only after maxfails consecutive failures
    timeout=10
    maxfails=2
    fails=0
    while [ "$fails" -lt "$maxfails" ]; do
        if /usr/bin/wget --timeout="$timeout" --tries=1 -q -O /dev/null http://192.168.20.84/; then
            echo "Service OK"
            status=2
            return 0
        fi
        fails=$((fails + 1))
    done
    Critical="TMS application service fault, please handle it urgently!"
    echo "$Critical" | mutt -s "service down" hao.lulu@chinaebi.com
    status=0
}

# Send alert SMS / alert email
for n in UserMonitor MemMonitor DiskMonitorG DiskMonitorA CPULoad ServerMonitor; do
    $n
    if [ "$status" -eq 0 ]; then
        curl -s -G --data-urlencode "TranCode=TM0311" --data-urlencode "content=$Critical" \
            http://172.20.36.118/app/tms.do
    elif [ "$status" -eq 1 ]; then
        curl -s -G --data-urlencode "TranCode=TM0310" --data-urlencode "title=$WarningT" \
            --data-urlencode "content=$Warning" http://172.20.36.118/app/tms.do
    else
        echo "OK"
    fi
done
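The disk and CPU checks in the script all follow the same two-threshold pattern and the same status convention (0 = critical, 1 = warning, 2 = OK). A minimal sketch of factoring that pattern into one helper is shown below; the function name and its argument order are hypothetical.

```shell
#!/bin/bash
# Sketch: one helper for the "value above a threshold" checks.
# Status codes follow the article's convention: 0 = critical, 1 = warning, 2 = OK.
# classify_usage and its argument order are hypothetical.

# classify_usage VALUE WARN CRIT
# Prints 0 if VALUE > CRIT, 1 if VALUE > WARN, otherwise 2.
classify_usage() {
    local value=$1 warn=$2 crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo 0
    elif [ "$value" -gt "$warn" ]; then
        echo 1
    else
        echo 2
    fi
}

# Root partition at 85% with the 80%/90% thresholds -> prints 1 (warning)
classify_usage 85 80 90
```

Because the comparisons are "greater than CRIT" and then "greater than WARN", a value exactly at the critical threshold still lands in the warning band instead of falling through to OK. For the memory check, which alarms when free memory drops below 20% and 10%, the same helper works on the used percentage (100 minus the free percentage) with thresholds 80 and 90.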
This article is from the "Flying birds wings" blog. Please keep this source link when reposting: http://haolulu.blog.51cto.com/3164472/1244267