Using Monit to monitor processes and system status in Linux

Source: Internet
Author: User

In practice, though, much software is less stable than it should be, and upgrading machine hardware costs money. In a cluster environment with redundancy, simply restarting a failed service is therefore often the most realistic option.

This is not difficult in itself, and there are many ways to implement it, such as attaching actions or commands to Zabbix or Nagios alerts, or writing scripts that run as scheduled tasks.
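For comparison, the scheduled-script approach could look roughly like the sketch below. It is only an illustration: the function names `is_running` and `ensure_running` are hypothetical, and it assumes the service writes a pidfile and that the script runs with enough privileges to signal the process.

```shell
#!/bin/sh
# Hypothetical cron watchdog; function names are illustrative only.

# is_running PIDFILE: succeed if PIDFILE holds the PID of a live process.
# kill -0 sends no signal; it only checks that the PID can be signalled,
# so run this as root (or as the service owner) to avoid false negatives.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# ensure_running PIDFILE COMMAND...: run COMMAND only when the process is down.
ensure_running() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        echo "running"
    else
        echo "restarting"
        "$@"
    fi
}

# Example call, using the nginx paths that appear later in this article:
# ensure_running /webserver/nginx/run/nginx.pid /webserver/init.d/nginx start
```

A script like this has to be scheduled, logged, and rate-limited by hand for every service, which is the bookkeeping a dedicated tool takes over.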

This article, however, introduces a tool dedicated to this job: Monit.
Its main strengths are an easy-to-read configuration file, support for monitoring both processes and system state, and flexible choices of check methods, check cycles, alerts, and responses (restarting a service, executing a command, etc.).

System environment:
OS: CentOS 6.4 x86_64 Minimal

Configuration steps:
1. Install the EPEL repository

# yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. Install the Monit package

# yum install monit

3. Configure Monit's general parameters, including enabling the HTTP status interface, mail alerts, etc.

# vim /etc/monit.conf

###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All paths must be fully qualified, starting with '/'.
##
## Below you will find examples of some frequently used statements. For
## information about the control file and a complete list of statements and
## options, please have a look in the Monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon 120            # check services at 2-minute intervals
    with start delay 240  # optional: delay the first check by 4-minutes (by
                          # default Monit check immediately after Monit start)
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, Monit will use 'user' facility by default. If you want to log to
## a standalone log file instead, specify the full path to the log file
#
# set logfile syslog facility log_daemon
#
#
### Set the location of the Monit id file which stores the unique id for the
### Monit instance. The id is generated and stored on first Monit start. By
### default the file is placed in $HOME/.monit.id.
#
set idfile /var/run/monit/.monit.id
#
### Set the location of the Monit state file which saves monitoring states
### on each cycle. By default the file is placed in $HOME/.monit.state. If
### the state file is stored on a persistent filesystem, Monit will recover
### the monitoring state across reboots. If it is on temporary filesystem, the
### state will be lost on reboot which may be convenient in some situations.
#
set statefile /var/run/monit/.monit.state
#
## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using a comma separator. By default Monit uses port 25 - it is
## possible to override this with the PORT option.
#
set mailserver localhost
# set mailserver mail.bar.baz,               # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
#                localhost                   # fallback relay
#
#
## By default Monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the maximal queue
## size using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
set eventqueue
    basedir /var/run/monit  # set the base directory where events will be stored
#   slots 100               # optionally limit the queue size
#
#
## Send status and events to M/Monit (for more information about M/Monit
## see http://mmonit.com/).
#
# set mmonit http://monit:monit@192.168.1.10:8080/collector
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: monit@$HOST                                     # sender
## Subject: monit alert - Event: $EVENT Service: $SERVICE  # subject
##
## Event: $EVENT Service: $SERVICE           #
##                                           #
##     Date:        $DATE                    #
##     Action:      $ACTION                  #
##     Host:        $HOST                    # body
##     Description: $DESCRIPTION             #
##                                           #
## Your faithful employee,                   #
## Monit                                     #
## --8<--
##
## You can override this message format or parts of it, such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender, use:
#
set mail-format {
    from: monit@heylinux.com
    subject: [$SERVICE] $EVENT
    message:
[$SERVICE] $EVENT

    Date:        $DATE
    Action:      $ACTION
    Host:        heylinux.com
    Description: $DESCRIPTION

Your faithful employee,
Monit }
#
#
## You can set alert recipients whom will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter as in the second example below.
#
set alert guosuiyu@foxmail.com
# set alert sysadm@foo.bar                       # receive all alerts
# set alert manager@foo.bar only on { timeout }  # receive just service-
#                                                # timeout alert
#
#
## Monit has an embedded web server which can be used to view status of
## services monitored and manage services from a web interface. See the
## Monit Wiki if you want to enable SSL for the web server.
#
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
#   allow admin:monit      # require user 'admin' with password 'monit'
#   allow @monit           # allow users of group 'monit' to connect (rw)
#   allow @users readonly  # allow users of group 'users' to connect readonly
#
#
###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
#  check system myhost.mydomain.tld
#    if loadavg (1min) > 4 then alert
#    if loadavg (5min) > 2 then alert
#    if memory usage > 75% then alert
#    if cpu usage (user) > 70% then alert
#    if cpu usage (system) > 30% then alert
#    if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to alert recipients in the global section, customized alert can be sent to
## additional recipients by specifying a local alert handler. The service may
## be grouped using the GROUP option. More than one group can be specified by
## repeating the 'group name' statement.
#
#  check file apache_bin with path /usr/local/apache/bin/httpd
#    if failed checksum and
#       expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
#    if failed permission 755 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid root then unmonitor
#    alert security@foo.bar on {
#           checksum, permission, uid, gid, unmonitor
#        } with the mail-format { subject: Alarm! }
#    group server
#
#
## Check that a process is running, in this case Apache, and that it responds
## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,
## and number of children. If the process is not running, Monit will restart
## it by default. In case the service is restarted very often and the
## problem remains, it is possible to disable monitoring using the TIMEOUT
## statement. This service depends on another service (apache_bin) which
## is defined above.
#
#  check process apache with pidfile /usr/local/apache/logs/httpd.pid
#    start program = "/etc/init.d/httpd start" with timeout 60 seconds
#    stop program  = "/etc/init.d/httpd stop"
#    if cpu > 60% for 2 cycles then alert
#    if cpu > 80% for 5 cycles then restart
#    if totalmem > 200.0 MB for 5 cycles then restart
#    if children > 250 then restart
#    if loadavg(5min) greater than 10 for 8 cycles then stop
#    if failed host www.tildeslash.com port 80 protocol http
#       and request "/somefile.html"
#       then restart
#    if failed port 443 type tcpssl protocol http
#       with timeout 15 seconds
#       then restart
#    if 3 restarts within 5 cycles then timeout
#    depends on apache_bin
#    group server
#
#
## Check filesystem permissions, uid, gid, space and inode usage. Other services,
## such as databases, may depend on this resource and an automatically graceful
## stop may be cascaded to them before the filesystem will become full and data
## lost.
#
#  check filesystem datafs with path /dev/sdb1
#    start program  = "/bin/mount /data"
#    stop program  = "/bin/umount /data"
#    if failed permission 660 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid disk then unmonitor
#    if space usage > 80% for 5 times within 15 cycles then alert
#    if space usage > 99% then stop
#    if inode usage > 30000 then alert
#    if inode usage > 99% then stop
#    group server
#
#
## Check a file's timestamp. In this example, we test if a file is older
## than 15 minutes and assume something is wrong if it is not updated. Also,
## if the file size exceeds a given limit, execute a script
#
#  check file database with path /data/mydatabase.db
#    if failed permission 700 then alert
#    if failed uid data then alert
#    if failed gid data then alert
#    if timestamp > 15 minutes then alert
#    if size > 100 MB then exec "/my/cleanup/script" as uid dba and gid dba
#
#
## Check directory permission, uid and gid. An event is triggered if the
## directory does not belong to the user with uid 0 and gid 0. In addition,
## the permissions have to match the octal description of 755 (see chmod(1)).
#
#  check directory bin with path /bin
#    if failed permission 755 then unmonitor
#    if failed uid 0 then unmonitor
#    if failed gid 0 then unmonitor
#
#
## Check a remote host availability by issuing a ping test and check the
## content of a response from a web server. Up to three pings are sent and
## connection to a port and an application level network check is performed.
#
#  check host myserver with address 192.168.1.1
#    if failed icmp type echo count 3 with timeout 3 seconds then alert
#    if failed port 3306 protocol mysql with timeout 15 seconds then alert
#    if failed url http://user:password@www.foo.bar:8080/?querystring
#       and content == 'action="j_security_check"'
#       then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#

# Include all files from /etc/monit.d/
include /etc/monit.d/*

4. Example configurations for monitoring nginx, php-fpm, MySQL and root partition usage
The commented-out examples in the configuration file from step 3 are sufficient as a reference for most needs.
Here are the monitoring items I created on my own VPS:

# vim /etc/monit.d/nginx

check process nginx with pidfile /webserver/nginx/run/nginx.pid
  start program = "/webserver/init.d/nginx start" with timeout seconds
  stop program = "/webserver/init.d/nginx stop"
  if failed host heylinux.com port 80 protocol http
    with timeout seconds
    then restart
  if 3 restarts within 5 cycles then timeout
  group webserver

# vim /etc/monit.d/php-fpm

check process php-fpm with pidfile /webserver/php/logs/php-fpm.pid
  start program = "/webserver/init.d/php-fpm start" with timeout seconds
  stop program = "/webserver/init.d/php-fpm stop"
  if cpu > 80% for 5 cycles then restart
  if loadavg(5min) greater than 4 for 5 cycles then restart
  if 3 restarts within 5 cycles then timeout
  group webserver

# vim /etc/monit.d/mysql

check process mysql with pidfile /webserver/mysql/run/mysqld.pid
  start program = "/webserver/init.d/mysqld start" with timeout seconds
  stop program = "/webserver/init.d/mysqld stop"
  if failed port 3306 protocol mysql
    with timeout seconds
    then restart
  if 3 restarts within 5 cycles then timeout
  group webserver

# vim /etc/monit.d/rootfs

check filesystem rootfs with path /dev/xvde
  if space usage > 80% for 5 times within cycles then alert
  group os
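Before starting the daemon, it can be worth validating the files just written. The sketch below is a hedged example, not part of the original setup: the helper names `conf_mode_ok` and `validate_monit_conf` are hypothetical, `stat -c` assumes GNU coreutils (as on CentOS), and it relies on two Monit behaviours: `monit -t` runs a syntax check, and Monit refuses a main control file whose permissions are more open than 0700.

```shell
#!/bin/sh
# conf_mode_ok FILE: succeed if FILE grants no access to group or others,
# i.e. its octal mode ends in "00" (for example 600 or 700).
conf_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 1
    case $mode in
        *00) return 0 ;;
        *)   return 1 ;;
    esac
}

# validate_monit_conf FILE: check permissions first, then let Monit parse
# the control file (and everything it includes) without starting the daemon.
validate_monit_conf() {
    conf=$1
    if ! conf_mode_ok "$conf"; then
        echo "permissions too open: $conf" >&2
        return 1
    fi
    monit -t -c "$conf"
}

# Example call for the control file edited in step 3:
# validate_monit_conf /etc/monit.conf
```

Running the check after every edit catches typos in the included files under /etc/monit.d/ before they prevent Monit from starting.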

5. Start Monit

# mkdir /var/run/monit
# /etc/init.d/monit start

6. Simulate an nginx process failure to test Monit's response and alerting
Stop the nginx process:

# /webserver/init.d/nginx stop
# /webserver/init.d/nginx status

nginx is stopped

Watch the log output:

# tailf /var/log/monit

[CST Apr 01:11:55] error : skipping /var/run/monit/.monit.id - unknown data format
[CST Apr 01:11:55] error : aborting event /var/run/monit/.monit.state - invalid size 5
[CST Apr 01:41:56] error : 'nginx' process is not running
[CST Apr 01:41:56] info : 'nginx' trying to restart
[CST Apr 01:41:56] info : 'nginx' start: /webserver/init.d/nginx

Check whether nginx has been started again by Monit:

# /webserver/init.d/nginx status

nginx (pid 22419 22417) is running...
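Monit only notices the failure on its next check cycle, so the restart is not immediate. Rather than re-running the status command by hand, a small polling helper can wait for the service to come back. This is a sketch: `wait_until` is a hypothetical name, and the usage line assumes the init script's `status` action exits with 0 when nginx is running.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND...: re-run COMMAND once per second until
# it succeeds; fail if TIMEOUT_SECONDS elapse first.
wait_until() {
    remaining=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Example call: wait up to five minutes for Monit to bring nginx back.
# wait_until 300 /webserver/init.d/nginx status && echo "nginx is back"
```

The timeout should be longer than the check interval configured with `set daemon`, otherwise the helper may give up before Monit has had a chance to act.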

Check the alert e-mails received; there are two in total:

A message reporting the service failure

A message reporting the service recovery

View the status of all items monitored by Monit:

# monit status

The Monit daemon 5.1.1 uptime: 1h 8m

Filesystem 'rootfs'
  status                            accessible
  monitoring status                 monitored
  permission                        660
  uid                               0
  gid                               6
  filesystem flags                  0x1000
  block size                        4096 B
  blocks total                      2580302 [10079.3 MB]
  blocks free for non superuser     1800023 [7031.3 MB] [69.8%]
  blocks free total                 1931088 [7543.3 MB] [74.8%]
  inodes total                      655360
  inodes free                       607619 [92.7%]
  data collected                    Sat Apr 12 02:17:58 2014

Process 'php-fpm'
  status                            running
  monitoring status                 monitored
  pid                               13768
  parent pid                        1
  uptime                            6h 14m
  children                          5
  memory kilobytes                  3124
  memory kilobytes total            220032
  memory percent                    0.5%
  memory percent total              36.3%
  cpu percent                       0%
  cpu percent total                 5.8%
  data collected                    Sat Apr 12 02:17:58 2014

Process 'nginx'
  status                            running
  monitoring status                 monitored
  pid                               22417
  parent pid                        1
  uptime                            36m
  children                          1
  memory kilobytes                  1244
  memory kilobytes total            29256
  memory percent                    0.2%
  memory percent total              4.8%
  cpu percent                       0%
  cpu percent total                 0%
  port response time                0.144s to heylinux.com:80 [HTTP via TCP]
  data collected                    Sat Apr 12 02:17:58 2014

Process 'mysql'
  status                            running
  monitoring status                 monitored
  pid                               21502
  parent pid                        21026
  uptime                            1h 13m
  children                          0
  memory kilobytes                  44988
  memory kilobytes total            44988
  memory percent                    7.4%
  memory percent total              7.4%
  cpu percent                       0.2%
  cpu percent total                 0.2%
  port response time                0.001s to localhost:3306 [MYSQL via TCP]
  data collected                    Sat Apr 12 02:17:58 2014

System 'ec2-tokyo.localdomain'
  status                            running
  monitoring status                 monitored
  load average                      [0.12] [0.08] [0.03]
  cpu                               5.9%us 0.5%sy 0.4%wa
  memory usage                      314260 kB [51.9%]
  data collected                    Sat Apr 12 02:17:58 2014
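With many services the full report gets long. The following sketch filters it down to the services that need attention. It is an illustration only: `failed_services` is a hypothetical helper, and it assumes the layout shown above, where each block starts with a header such as Process 'nginx' and contains a line whose first word is "status".

```shell
#!/bin/sh
# failed_services: read `monit status` output on stdin and print
# "NAME: STATE" for every service whose state is not running/accessible.
failed_services() {
    awk -v q="'" '
        # Block header such as: Process <quote>nginx<quote>
        $0 ~ ("^[A-Za-z]+ " q ".*" q "$") {
            name = $2
            gsub(q, "", name)
            next
        }
        # The per-service state line; "monitoring status" has a different $1.
        tolower($1) == "status" {
            state = $0
            sub(/^[ \t]*[Ss]tatus[ \t]+/, "", state)
            low = tolower(state)
            if (low != "running" && low != "accessible")
                print name ": " state
        }
    '
}

# Example call:
# monit status | failed_services
```

For the report above the helper prints nothing, since every service is either running or accessible.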
