I. Monitoring the server's system load:
1. Use the uptime command to view the current load (the 1-minute, 5-minute, and 15-minute load averages)
# uptime
15:43:59 up 186 days, 20:04, 1 user, load average: 0.01, 0.02, 0.00
Rules of thumb for system load (for details, see http://blog.csdn.net/skyline_loafer/article/details/26940539):
(1) Watch mainly the 15-minute load average and treat it as the indicator of whether the machine is running normally.
(2) If the 15-minute load average divided by the number of CPU cores is greater than 1.0, the problem is persistent rather than a temporary spike.
(3) When the system load stays above 0.7, start investigating where the problem is and keep the situation from getting worse.
(4) When the system load stays above 1.0, you must find a way to bring the value down.
(5) When the system load reaches 5.0, the system has a serious problem: it has been unresponsive for a long time or is close to freezing.
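The rules of thumb above can be sketched as a small helper. This is only an illustration: the function name classify_load and the labels it prints are hypothetical, and it assumes that all three thresholds (0.7, 1.0, 5.0) are applied to the load divided by the number of cores.

```shell
#!/bin/bash
# Hypothetical helper: classify a 15-minute load average.
# $1 = 15-minute load average, $2 = number of CPU cores.
# Assumption: every threshold is compared against the per-core load.
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core >= 5.0)      print "critical"      # rule (5)
        else if (per_core > 1.0)  print "overloaded"    # rules (2) and (4)
        else if (per_core > 0.7)  print "investigate"   # rule (3)
        else                      print "ok"
    }'
}
```

For example, classify_load 3.2 4 evaluates a per-core load of 0.8 and prints "investigate".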
2. View the total number of CPU cores on the server
# grep -c 'model name' /proc/cpuinfo
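Counting "model name" lines works on typical x86 servers, but some other architectures may not print that field in /proc/cpuinfo. A minimal sketch of a fallback, assuming getconf is available (the function name count_cpu_cores is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print the number of online logical CPUs.
# Tries getconf first, then falls back to counting "processor"
# entries in /proc/cpuinfo.
count_cpu_cores() {
    local n
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
        || n=$(grep -c '^processor' /proc/cpuinfo)
    echo "$n"
}
```

In a script it could be used as cpu_num=$(count_cpu_cores).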
3. Extract the server's 1-minute, 5-minute, and 15-minute load averages
# uptime | awk '{print $8,$9,$10,$11,$12}' (Note: the awk program is enclosed in single quotes, not backticks, the key to the left of 1!)
load average: 0.01, 0.02, 0.00
4. View the 15-minute load average
# uptime | awk '{print $12}'
(Using '{print $12}' is not reliable: the number of fields in the uptime output varies, for example when the machine has been up for less than a day, so the 12th field may be empty. Use $NF instead, which prints the last field.)
# uptime | awk '{print $NF}'
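Because fixed field numbers shift with the length of the uptime line, all three values can instead be located by the "load average:" label. A minimal sketch, assuming English-language uptime output with "." as the decimal separator (the function name parse_load_averages is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: read one line of uptime output on stdin and
# print the 1-, 5- and 15-minute load averages separated by spaces.
parse_load_averages() {
    awk -F'load average: *' '{
        # $2 holds everything after the label, e.g. "0.01, 0.02, 0.00"
        n = split($2, v, ", *")
        if (n == 3) print v[1], v[2], v[3]
    }'
}

# Possible usage (forcing the C locale keeps the label in English):
#   LC_ALL=C uptime | parse_load_averages
```

On Linux the same three numbers are also the first three fields of /proc/loadavg, so cut -d ' ' -f 1-3 /proc/loadavg is another option.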
5. Write the system load monitoring script:
# vim /scripts/load-check.sh
- #!/bin/bash
- # Monitor Linux system load changes with the uptime command
- # Record the current system time (append to the file with >>)
- date >> /scripts/datetime-load.txt
- # Extract the server's 1-minute, 5-minute, and 15-minute load averages
- uptime | awk '{print $8,$9,$10,$11,$12}' >> /scripts/load.txt
- # Join the time lines and load lines above, line by line (overwrite the file with > each time)
- paste /scripts/datetime-load.txt /scripts/load.txt > /scripts/load_day.txt
# chmod a+x /scripts/load-check.sh
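The script above keeps the timestamps and the loads in two files and pastes them together on every run. As an alternative sketch, not the original author's method, each run could append one complete record to a single file. The function name log_load and its optional second argument are hypothetical; the sketch assumes a Linux system where /proc/loadavg exists.

```shell
#!/bin/bash
# Hypothetical helper: append "timestamp<TAB>load1 load5 load15" to a log.
# $1 = log file to append to
# $2 = file to read the loads from (assumed default: /proc/loadavg)
log_load() {
    local logfile=$1
    local src=${2:-/proc/loadavg}
    local stamp loads
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    # The first three space-separated fields are the load averages.
    loads=$(cut -d ' ' -f 1-3 "$src") || return 1
    printf '%s\t%s\n' "$stamp" "$loads" >> "$logfile"
}
```

Called as log_load /scripts/load_day.txt, it would write the time and the loads in one step, so the two columns cannot drift apart.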
6. Write the script that mails the system load result file:
# vim /scripts/sendmail-load.sh
- #!/bin/bash
- # Mail the load_day.txt file generated by the system load monitoring to the user
- # Get this server's IP address
- IP=`ifconfig eth0 | grep "inet addr" | cut -f 2 -d ":" | cut -f 1 -d " "`
- # Get the current date
- today=`date -d "0 day" +%Y-%m-%d`
- # Send the system load monitoring report (mutt needs "--" between the attachment and the recipient)
- echo "This is the system load monitoring report for server $IP on $today; please download the attachment." | mutt -s "System load monitoring report for server $IP on $today" -a /scripts/load_day.txt -- [email protected]
# chmod a+x /scripts/sendmail-load.sh
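The grep "inet addr" pattern matches the older ifconfig output format; newer net-tools versions print "inet 192.168.1.10 netmask ..." without "addr:". A minimal sketch that accepts both layouts (the function name extract_ipv4 is hypothetical, and the addresses in the comments are made-up examples):

```shell
#!/bin/bash
# Hypothetical helper: read ifconfig output on stdin and print the first
# IPv4 address found. Handles both layouts:
#   inet addr:192.168.1.10  Bcast:...     (older format)
#   inet 192.168.1.10  netmask ...        (newer format)
# "inet6" lines are skipped because the pattern requires "inet ".
extract_ipv4() {
    sed -n 's/^[[:space:]]*inet \(addr:\)\{0,1\}\([0-9][0-9.]*\).*/\2/p' \
        | head -n 1
}

# Possible usage, keeping the interface name from the script above:
#   IP=$(ifconfig eth0 | extract_ipv4)
```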
7. Write the system load alarm script:
# vim /scripts/load-warning.sh
- #!/bin/bash
- # Monitor Linux system load changes with the uptime command
- # Get this server's IP address
- IP=`ifconfig eth0 | grep "inet addr" | cut -f 2 -d ":" | cut -f 1 -d " "`
- # Get the total number of CPU cores
- cpu_num=`grep -c 'model name' /proc/cpuinfo`
- # Get the system's current 15-minute load average
- load_15=`uptime | awk '{print $NF}'`
- # Compute the 15-minute load average per core; when the result is below 1.0, print a leading 0 before the decimal point
- average_load=`echo "scale=2;a=$load_15/$cpu_num;if(length(a)==scale(a)) print 0;print a" | bc`
- # Take the integer part of the per-core load average above
- average_int=`echo $average_load | cut -f 1 -d "."`
- # Set the alarm value for the per-core 15-minute load average to 0.70 (i.e. alarm when usage exceeds 70%)
- load_warn=0.70
- # When the per-core 15-minute load average is 1.0 or more (integer part greater than 0), send an alarm email directly; when it is below 1.0, do a second comparison
- if (($average_int > 0)); then
- echo "Server $IP: the 15-minute per-core load average is $average_load, which exceeds the alert value of 1.0. Please handle it immediately!" | mutt -s "Server $IP system load critical alarm!" [email protected]
- else
- # Compare the current per-core 15-minute load average with the alarm value (expr prints 1 when it is greater than 0.70 and 0 otherwise)
- load_now=`expr $average_load \> $load_warn`
- # If the per-core 15-minute load average is greater than the alarm value 0.70 (result is 1), send an email to the administrator
- if (($load_now == 1)); then
- echo "Server $IP: the 15-minute per-core load average has reached $average_load, which exceeds the alert value of 0.70. Please handle it promptly." | mutt -s "Server $IP system load alarm" [email protected]
- fi
- fi
# chmod a+x /scripts/load-warning.sh
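In the script above, expr compares the two decimal values as strings, because expr only compares numerically when both operands are integers. That is enough here, since values of 1.0 and above are already handled by the integer check, but a numeric comparison is easier to reuse with other thresholds. A minimal sketch using awk in place of bc and expr (the function names per_core_load and load_exceeds are hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print load/cores with two decimals.
# printf always writes the leading 0 and rounds the last digit.
# $1 = load average, $2 = number of CPU cores.
per_core_load() {
    LC_ALL=C awk -v load="$1" -v cores="$2" \
        'BEGIN { printf "%.2f\n", load / cores }'
}

# Hypothetical helper: succeed (exit status 0) when $1 > $2, numerically.
load_exceeds() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Possible usage:
#   average_load=$(per_core_load "$load_15" "$cpu_num")
#   if load_exceeds "$average_load" 0.70; then ... send the mail ... fi
```

With this form a value such as 10.2 is correctly treated as greater than 9.5, which a string comparison would get wrong.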
8. Add the scheduled tasks: check the system load every 10 minutes and send an alarm email immediately if a threshold is exceeded; send a system load report every morning at 8:00
# crontab -e
- */10 * * * * /scripts/load-check.sh
- */10 * * * * /scripts/load-warning.sh
- 0 8 * * * /scripts/sendmail-load.sh
(Excerpted from the original: http://huangrs.blog.51cto.com/2677571/788379/)
Linux server system load monitoring: shell scripts