Shell script: monitor system resources and send SMS alerts

Source: Internet
Author: User

The company uses Nagios for monitoring: the Nagios client collects the required data and sends it to the Nagios server. The current problem is that some of our machines are in another data center, such as the one at China Eastern Airlines, where installing the client and accessing the Internet are not allowed. To still track the server status and notify the O&M staff by SMS or email when something needs handling, we agreed with the developers to open SMS and email interfaces. Scripts monitor the server status and, when a fault occurs, send alarms through these interfaces.


Target Analysis:


Required monitoring resources:

1. Number of login users

2. CPU load

3. Service detection

4. Hard disk space (root partition, application partition, and backup partition)

5. Memory Resources
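Most of the checks above come down to comparing an integer value against a warning threshold and a critical threshold. A minimal sketch of that comparison, assuming a hypothetical helper named `classify_usage` (it is not part of the original script) and the status codes used by the script in this article (0 = critical, 1 = warning, 2 = OK):

```shell
#!/bin/bash
# classify_usage VALUE WARN CRIT
# Hypothetical helper: prints 0 (critical), 1 (warning) or 2 (OK).
# VALUE, WARN and CRIT are integers, for example usage percentages.
classify_usage() {
    local value=$1 warn=$2 crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo 0
    elif [ "$value" -gt "$warn" ]; then
        echo 1
    else
        echo 2
    fi
}

# Example with the 80% / 90% disk thresholds from the script
classify_usage 95 80 90   # prints 0
classify_usage 85 80 90   # prints 1
classify_usage 42 80 90   # prints 2
```

Using a single "greater than the warning threshold" test in the second branch means a value exactly on the critical threshold is still reported as a warning instead of slipping through as OK.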

SMS and email interfaces:


Uploaded as an attachment.
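The script in this article calls the SMS interface over HTTP and passes the alert text in a `content` query parameter. Alert text that contains spaces or a `%` sign has to be URL-encoded before it is placed in the URL. A minimal sketch, assuming two hypothetical helpers, `urlencode` and `build_alert_url`, that are not part of the original script; the endpoint and the `TranCode` value are the ones the script uses, and the encoder only handles ASCII text:

```shell
#!/bin/bash
# urlencode STRING
# Hypothetical helper: percent-encodes every character except
# letters, digits and . ~ _ - (ASCII input only in this sketch).
urlencode() {
    local s=$1 out="" c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

# build_alert_url TRANCODE MESSAGE
# Hypothetical helper: builds the request URL for the SMS interface.
build_alert_url() {
    local code=$1 content=$2
    printf 'http://172.20.36.118/app/tms.do?TranCode=%s&content=%s\n' \
        "$code" "$(urlencode "$content")"
}

build_alert_url TM0311 "root partition 91%"
```

The resulting URL can be passed to `curl` as a single quoted argument. `curl -G --data-urlencode "content=..."` performs the same encoding and is likely the simpler choice when `curl` is available on the monitored host.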


Script:

#!/bin/bash

# Every check sets $status: 0 = critical, 1 = warning, 2 = OK

# Monitor the number of logged-in users
Usermonitor() {
    # Count sessions with who; the position of the user count in uptime output varies
    LoginUser=$(who | wc -l)
    if [ "$LoginUser" -ge 2 ]
    then
        Critical="The number of users logged on to the system exceeds 1: $LoginUser user(s). Check the number of operators."
        status=0
    else
        echo "loginuser OK"
        status=2
    fi
}

# Monitor memory
MemMonitor() {
    # Assumes the older free output that has a "-/+ buffers/cache:" line
    MemTotal=$(free -m | awk '/^Mem:/ {print $2}')
    MemFree=$(free -m | awk '/buffers\/cache/ {print $4}')
    MemFreeB=$(awk -v f="$MemFree" -v t="$MemTotal" 'BEGIN {printf "%.2f%%", f / t * 100}')
    MemFreeS=$(awk -v f="$MemFree" -v t="$MemTotal" 'BEGIN {printf "%.f", f / t * 100}')
    if [ "$MemFreeS" -lt 10 ]
    then
        Critical="The available system memory is less than 10%. The actual available memory is $MemFreeB. Please handle it."
        status=0
    elif [ "$MemFreeS" -lt 20 ]
    then
        Warning="The available system memory is less than 20%. The actual available memory is $MemFreeB. Please take note."
        WarningT="memory alarm"
        status=1
    else
        echo "Mem OK"
        status=2
    fi
}

# Monitor the partition space
DiskMonitorG() {
    # Root partition: second line of df output; -P keeps each file system on one line
    DiskGB=$(df -hP | awk 'NR==2 {print $5}')
    DiskGS=${DiskGB%\%}
    if [ "$DiskGS" -gt 90 ]
    then
        Critical="The root partition usage exceeds 90%. $DiskGB is actually used. Please handle it."
        status=0
    elif [ "$DiskGS" -gt 80 ]
    then
        Warning="The root partition usage exceeds 80%. $DiskGB is actually used. Please take note."
        WarningT="root partition alarm"
        status=1
    else
        echo "DiskGB OK"
        status=2
    fi
}

DiskMonitorA() {
    # Application partition: fourth line of df output on this server
    ApplyB=$(df -hP | awk 'NR==4 {print $5}')
    ApplyS=${ApplyB%\%}
    if [ "$ApplyS" -gt 90 ]
    then
        Critical="The application partition usage exceeds 90%. $ApplyB is actually used. Please handle it."
        status=0
    elif [ "$ApplyS" -gt 80 ]
    then
        Warning="The application partition usage exceeds 80%. $ApplyB is actually used. Please take note."
        WarningT="application partition alarm"
        status=1
    else
        echo "Apply OK"
        status=2
    fi
}

# Monitor CPU load
CPULoad() {
    # Integer part of the 1-minute load average
    CPULoad1=$(uptime | awk -F'load average: ' '{print int($2)}')
    CPULoad2=$(uptime)
    if [ "$CPULoad1" -gt 5 ]
    then
        Critical="The CPU load is too high. Please handle it promptly. $CPULoad2"
        status=0
    elif [ "$CPULoad1" -gt 3 ]
    then
        Warning="CPU load warning, $CPULoad2"
        WarningT="CPU load alarm"
        status=1
    else
        echo "cpu OK"
        status=2
    fi
}

# Monitor service status
ServerMonitor() {
    timeout=10
    maxfails=2
    fails=0
    while true
    do
        if /usr/bin/wget --timeout=$timeout --tries=1 http://192.168.20.84/ -q -O /dev/null
        then
            echo "service OK"
            status=2
            return 0
        fi
        fails=$((fails + 1))
        if [ "$fails" -ge "$maxfails" ]
        then
            Critical="TMS application service fault, please handle it urgently!"
            echo "$Critical" | mutt -s "service down" hao.lulu@chinaebi.com
            # Return instead of exit so that the loop below also sends the SMS
            status=0
            return 1
        fi
    done
}

# Send alert SMS and alert email
for n in Usermonitor MemMonitor DiskMonitorG DiskMonitorA CPULoad ServerMonitor
do
    $n
    if [ "$status" -eq 0 ]
    then
        # -G with --data-urlencode keeps spaces and % in the message from breaking the URL
        curl -G "http://172.20.36.118/app/tms.do" --data-urlencode "TranCode=TM0311" --data-urlencode "content=$Critical"
    elif [ "$status" -eq 1 ]
    then
        curl -G "http://172.20.36.118/app/tms.do" --data-urlencode "TranCode=TM0310" --data-urlencode "title=$WarningT" --data-urlencode "content=$Warning"
    else
        echo "OK"
    fi
done

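Reading the load average from `uptime` by field position is fragile, because the fields shift with the uptime format (for example, a machine that has been up for less than a day prints fewer fields). Keying on the `load average:` label avoids this, and the parsing can be checked against sample lines without touching the live system. A minimal sketch, assuming a hypothetical helper named `load_1min_int` and the Linux `uptime` output format:

```shell
#!/bin/bash
# load_1min_int
# Hypothetical helper: reads one line of uptime output on stdin and
# prints the integer part of the 1-minute load average.
load_1min_int() {
    awk -F'load average: ' '{print int($2)}'
}

# Sample lines in the two common uptime formats
echo " 12:00:00 up 10 days,  3:22,  2 users,  load average: 6.40, 0.01, 0.05" | load_1min_int   # prints 6
echo " 12:00:00 up 5 min,  1 user,  load average: 0.52, 0.30, 0.12" | load_1min_int             # prints 0
```

On a live host the same helper would be fed with `uptime | load_1min_int`.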

This article is from the "Flying birds wings" blog. Please keep this source link when reposting: http://haolulu.blog.51cto.com/3164472/1244267
