Java server-side monitoring scheme (Ganglia Nagios Java chapter)


Java server-side monitoring scheme (One. Overview)

I have stopped using Octopress; writing articles is simpler than before and the blog loads much faster, so I am taking the opportunity to summarize what I have learned over the past while. To start, I'm going to write a series of articles summarizing the work I did at my company last year on Java server-side monitoring.
Monitoring is a vital part of any server-side application. A running system is all too prone to failure: network, storage, system load, software bugs; a problem at any of these points can affect the stable operation of the whole system, so monitoring is essential. A complete monitoring solution helps us in two ways:
Continuously check the health of each service and notify the relevant people as soon as something goes wrong
Record the system's operating metrics so that operators fully understand how the system is running and can head off problems before they happen
The first aspect is really about raising an alert the moment a fault occurs, so that when the system breaks in the middle of the night, a text message can wake the poor ops colleague to restore service. For this, Nagios is basically the best solution. It has a wide variety of plug-ins, writing custom plug-ins is extremely simple, and it can easily monitor everything from the operating system to the application. Its configuration files are simple, easy to understand, and very powerful; once you are a little familiar with them you can easily configure it to your needs.
For the second aspect, Nagios is not particularly suitable. It checks at intervals whether a service is healthy; it does not record metrics over time, so it is inconvenient for observing how metrics change. Some plug-ins apparently can do this, but we chose a different solution: between Graphite and Ganglia, I finally chose Ganglia.
Ganglia is a system monitoring project initiated by the University of California, Berkeley, and designed for large-scale high-performance computing clusters. It builds on mature technologies such as RRD and XML, has low per-node overhead and a fairly reliable fault-tolerance mechanism, and is easy to extend with custom metrics. One of the better-known Ganglia users is Wikipedia; you can visit their Ganglia instance to see how the Wikipedia cluster is doing.
Ganglia + Nagios is the combination we chose for our monitoring system. One problem with it is that each has its own monitoring agent: Nagios needs NRPE and Ganglia needs gmond. Installing two monitoring agents on every machine is clearly not great: besides the installation and maintenance hassle, it also violates the principle that one system should do only one thing.
Fortunately, this problem was thought of long ago, and the fix is simple: build a bridge between the two so that Nagios uses Ganglia's data directly. There are several ways to do this, which I will detail in the Nagios part of this series.
Ganglia and Nagios can directly monitor system-level health such as CPU, load, disk, and memory, but for Java applications we need additional means to collect data. The standard solution for Java monitoring is JMX: many runtime parameters of the JVM itself, such as memory and GC, are exposed through JMX, and developers can easily write custom MBeans to expose application-specific parameters. With a tool such as jmxtrans, we can periodically read JMX data from a JVM instance and send it to a Ganglia or Graphite backend. I'll explain these in more detail in the Java and JMX parts.
I recently also came across a Java library called Metrics, which makes it easier for developers to write application monitoring code and directly supports Ganglia and Graphite backends. If JMX is not required, it would be a more concise alternative.
The following parts cover Ganglia, Nagios, and Java monitoring in more detail.

Two. Ganglia

1. Installation
Our servers run Red Hat Enterprise Linux 6.4 x86_64, and the installation and configuration described here are based on that release. Other distributions or architectures are similar.
Ganglia consists of three components: gmond, gmetad, and ganglia-web. gmond is the core process, responsible for sending and receiving metrics, and must be installed on every node involved. gmetad is a service that aggregates metric data; it is typically installed on a single machine, or on several if high availability is a concern. ganglia-web is a PHP-based web interface, typically installed on only one machine.
By default gmond works in multicast mode, so every gmond on the same network receives the metric data of all nodes in the cluster. That way, if any monitoring node goes down, you can switch to any other surviving node. Our cluster is small, so we did not use this approach; instead we use unicast, with one node receiving all the data, and install gmetad and ganglia-web only on that node. This is simpler, and in practice it has been stable enough.
As a result, depending on how you deploy, a node may not need gmetad, in which case the gmetad-related parts of the following steps can be ignored.
1.1 Installing Dependencies
First make sure the following packages are installed. They are all available in the official repositories (if you have not purchased a RHEL subscription, you can configure the CentOS repositories instead, which work just as well).

yum -y install apr-devel apr-util check-devel cairo-devel \
    pango-devel libxml2-devel rpmbuild glib2-devel \
    dbus-devel freetype-devel fontconfig-devel gcc-c++ \
    expat-devel python-devel libXrender-devel pcre-devel \
    perl-ExtUtils-MakeMaker
Next, install libconfuse. The library is in the repositories, but the version looked rather old, so I found an RPM through rpmfind and installed it directly. Note that both libconfuse and libconfuse-devel need to be installed, because they are also needed when compiling Ganglia.
Finally, install RRDtool. It is not in the repositories, so you have to compile it yourself. Download the latest source from its official website, unpack it, and build and install with the following steps.

./configure --prefix=/usr
make
sudo make install
1.2 Compiling Ganglia
Download the latest source from the Ganglia website, unpack it, and build and install with the following steps:

./configure --with-gmetad
make
sudo make install
With this configuration Ganglia is installed under /usr/local, with its configuration files in /usr/local/etc.
After installation, it is highly recommended to copy the service scripts of the two Ganglia components, gmond and gmetad, to /etc/init.d so that the services can be managed with service and chkconfig. The two scripts are gmond/gmond.init and gmetad/gmetad.init in the source directory. Before copying, check that the GMOND and GMETAD variables in the two scripts point to the correct executables: by default they point under /usr/sbin and must be changed to /usr/local/sbin.

sudo cp gmond/gmond.init /etc/init.d/gmond
sudo cp gmetad/gmetad.init /etc/init.d/gmetad
2. Configuration
Not every configuration item is covered here; only the key ones are listed. In practice, edit the default configuration file directly; it has detailed comments explaining each item.
2.1 gmond Configuration
The gmond configuration file is /usr/local/etc/gmond.conf.
Cluster configuration, mainly name and owner, can be set as you like. All gmond instances in the same cluster must use the same values.

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "My-cluster"
  owner = "Jerry"
  latlong = "unspecified"
  url = "unspecified"
}
Host configuration. This just gives each host a name; any name will do, but using the hostname directly is recommended.

/* The host section describes attributes of the host, like the location */
host {
  location = "Host1"
}
Multicast mode configuration
This is the default, so the configuration file needs essentially no changes, and all nodes are configured identically. The advantage of this mode is that every node's gmond holds the complete data, so gmetad can query any one of them to get the monitoring data for the whole cluster.
One parameter you may need to change is mcast_if, which specifies the network interface used for multicast. If the machine has more than one network card, set it to the interface on the internal network.

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
                      # This option tells gmond to use a source address
                      # that resolves to the machine's hostname.  Without
                      # this, the metrics may appear to come from any
                      # interface and the DNS names associated with
                      # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  mcast_if = em2
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  mcast_if = em2
  port = 8649
  bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
Unicast mode configuration
Receive channel configuration on the monitoring machine. We use UDP unicast, which is very simple. Part of our cluster is in another data center, so we listen on 0.0.0.0; if the entire cluster is on one internal network, binding only to the internal address is recommended. If you have a firewall, open the relevant port.

/* Alternative UDP channel */
udp_recv_channel {
  bind = 0.0.0.0
  port = 8648
}
Send channel configuration on the monitored nodes. Also very simple: host is the IP of the monitoring node and port is the listening port configured above. ttl is set to 1 because we send directly to the target node without forwarding through an intermediate gmond.

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  bind_hostname = yes # Highly recommended, soon to be default.
                      # This option tells gmond to use a source address
                      # that resolves to the machine's hostname.  Without
                      # this, the metrics may appear to come from any
                      # interface and the DNS names associated with
                      # those IPs will be used to create the RRDs.
  host = 192.168.221.9
  port = 8648
  ttl = 1
}
Next is the configuration of the metrics that are collected and sent by default; leave it as is. The default metrics include CPU, load, memory, disk, and network, which cover essentially all operating-system-level parameters. To add a custom metric, it must be configured in the following format:

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}
2.2 gmetad Configuration
The gmetad configuration file is /usr/local/etc/gmetad.conf. The most important item is data_source:

data_source "My-cluster" localhost:8648
If you use the default port 8649, the port can be omitted. If you have more than one cluster, specify multiple data_source entries, one per line.
Finally, the gridname item names the entire grid:

gridname "My Grid"
Leave the other settings at their default values.
2.3 ganglia-web Installation and Configuration
First download the ganglia-web source package from the official website and unpack it. If you use Apache as your web server, edit the relevant variables in the Makefile and run sudo make install.

# Location where gweb should be installed to (excluding conf, dwoo dirs).
GDESTDIR = /var/www/html/ganglia

# Gweb statedir (where conf dir and Dwoo templates dir are stored)
GWEB_STATEDIR = /var/lib/ganglia-web

# Gmetad rootdir (parent location of rrd folder)
GMETAD_ROOTDIR = /var/lib/ganglia

# User by which your webserver is running
APACHE_USER = apache
We use Nginx as the web server, in which case you can simply unpack the package into the target location and edit the relevant configuration files. Make sure Nginx and PHP are installed and configured on the system. The following is a sample Nginx configuration:

server {
    listen 80;
    server_name monitor.xxx.com;
    charset utf8;
    access_log logs/$host.access.log main;
    root /var/www/ganglia;
    index index.php;
    include mime.server.common;
    auth_digest_user_file /var/www/ganglia/users.htdigest;

    location ~ .*\.(php|php5)?$ {
        auth_digest 'XXX Monitor System';
        auth_digest_timeout 60s;
        auth_digest_expires 3600s;
        fastcgi_split_path_info ^(.+?\.php)(/.*)$;
        if (!-f $document_root$fastcgi_script_name) {
            return 404;
        }
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        include fastcgi.conf;
        fastcgi_param ganglia_secret yoursupersecret;
        if ($http_authorization ~ username="([^\"]+)") {
            set $htdigest_user $1;
        }
        fastcgi_param REMOTE_USER $htdigest_user;
    }
}
Adjust parameters such as the ganglia-web path and the PHP-FPM address to match your deployment.
Note that we enabled authentication using HTTP Digest authentication (which requires a third-party Nginx module). You can also use Nginx's built-in HTTP Basic authentication, but in that case HTTPS is recommended for better security.
The two fastcgi parameters ganglia_secret and REMOTE_USER are used for ganglia-web authentication; ganglia_secret is an arbitrary string of your choosing.
ganglia-web itself also needs to be configured for authentication to take effect. Add a conf.php file to the ganglia-web directory with the following content:

<?php

#
# 'readonly': No authentication is required.
#             All users may view all resources.
#             No edits are allowed.
# 'enabled':  Guest users may view public clusters.
#             Login is required to make changes.
#             An administrator must configure an
#             authentication scheme and ACL rules.
# 'disabled': Guest users may perform any actions,
#             including edits. No authentication is required.
$conf['auth_system'] = 'enabled';

$acl = GangliaAcl::getInstance();

$acl->addRole('admin', GangliaAcl::ADMIN);

?>
Here the admin user is configured as an administrator; you can assign other roles to other users as needed.
2.4 Start
Register the system services with chkconfig and start the gmond and gmetad services:

sudo chkconfig --add gmond
sudo chkconfig --add gmetad
sudo service gmond start
sudo service gmetad start
Once the services are running, you can view the Ganglia graphs through the web interface.
3. Extending
Ganglia is very extensible: the mechanisms it provides make it easy to create custom metrics that collect application data. It mainly offers the following ways:
Python extensions
The gmetric command
The Ganglia data format and protocol are completely open, so third-party applications can also send metric data to gmond in that format. Later parts of this series detail how JMX is integrated with Ganglia this way.
The Ganglia project also maintains a collection of third-party extensions for monitoring applications such as Redis and MySQL, which you can pick from as needed.
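As a rough illustration of the gmetric route, the sketch below builds a gmetric command line in Python and only prints it. The helper name build_gmetric_command, the metric name engine_queue_size, and its value are hypothetical and not from the original setup; the options used (--name, --value, --type, --units) are gmetric's standard long options.

```python
import shlex

# Value types accepted by gmetric's --type option.
GMETRIC_TYPES = {"string", "int8", "uint8", "int16", "uint16",
                 "int32", "uint32", "float", "double"}


def build_gmetric_command(name, value, value_type, units=""):
    """Return the argument list for one gmetric call (hypothetical helper)."""
    if value_type not in GMETRIC_TYPES:
        raise ValueError("unsupported gmetric type: %s" % value_type)
    cmd = ["gmetric",
           "--name=" + name,
           "--value=" + str(value),
           "--type=" + value_type]
    if units:
        cmd.append("--units=" + units)
    return cmd


if __name__ == "__main__":
    # Hypothetical application metric; the printed command is meant to be
    # run on a host where gmond is installed and configured.
    command = build_gmetric_command("engine_queue_size", 42, "uint32", "items")
    print(" ".join(shlex.quote(part) for part in command))
```

Passing the returned list to subprocess.run on a monitored node would submit the metric; a cron job or a small daemon could do this periodically.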
Below is a small fix for Ganglia's default disk monitoring, implemented with a Python extension.
3.1 A Small Improvement to Disk Monitoring
Ganglia's default disk metrics cover all disks in the system, summing total space and free space across them. So when one partition is nearly full while the others still have plenty of room, the problem is invisible in Ganglia.
Fortunately, this is easy to solve with a custom extension. I found a Ganglia multidisk extension online and made a small improvement of my own. The relevant files are in this gist.
Put multidisk.py under /usr/local/lib64/ganglia/python_modules (create the directory if it doesn't exist), put the multidisk.pyconf configuration file in /usr/local/etc/conf.d, and restart gmond.
Add partitions to the configuration file as needed. A metric name is the mount point with the leading / removed and any / in the middle replaced by _, followed by _disk_total or _disk_free. The root partition is a special case: its names are root_disk_free and root_disk_total. This is the small hack I made; the original script used device names (such as dev_sda1). But device names differ from machine to machine, which makes it awkward to monitor a given kind of partition uniformly.
The following example monitors the root partition and the /var partition.

modules {
  module {
    name = 'multidisk'
    language = 'python'
  }
}

collection_group {
  collect_every = 120
  time_threshold = 20

  metric {
    name = "root_disk_total"
    title = "Root Partition Total"
    value_threshold = 1.0
  }

  metric {
    name = "root_disk_free"
    title = "Root Partition Free"
    value_threshold = 1.0
  }

  metric {
    name = "var_disk_total"
    title = "Var Partition Total"
    value_threshold = 1.0
  }

  metric {
    name = "var_disk_free"
    title = "Var Partition Free"
    value_threshold = 1.0
  }

}
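The naming rule for the multidisk metrics described in this section can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the code of the actual multidisk.py; the function names and the /data/mysql mount point are hypothetical.

```python
def metric_prefix(mount_point):
    """Map a mount point to its metric name prefix (illustrative helper)."""
    if mount_point == "/":
        return "root"  # special case: the root partition
    # Drop the leading "/" and replace the remaining "/" with "_".
    return mount_point.strip("/").replace("/", "_")


def metric_names(mount_point):
    """Return the (total, free) metric names for one mount point."""
    prefix = metric_prefix(mount_point)
    return prefix + "_disk_total", prefix + "_disk_free"


if __name__ == "__main__":
    # "/data/mysql" is a made-up mount point used only for illustration.
    for mount_point in ("/", "/var", "/data/mysql"):
        print(mount_point, metric_names(mount_point))
```

The names produced for / and /var match the metric names used in the configuration example, which is a quick way to check a new entry before adding it to multidisk.pyconf.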
4. Summary
That covers the basics of installing, configuring, and using Ganglia. I have not said much about the graphical interface, but you will see it once you have it installed, or you can try the Wikipedia Ganglia instance.


Three. Nagios

The previous part introduced the core of our monitoring solution, Ganglia. This part continues with Nagios, the system used for alerting, and how to make Nagios use Ganglia as its data source for alerts.
Nagios, originally named NetSaint, is implemented and maintained by Ethan Galstad and other developers. It is cross-platform and runs on all mainstream Unix-like operating systems. Nagios provides a web interface based on PHP and CGI and comes with many plug-ins for monitoring network services and devices. Its basic job is to periodically check the services on the hosts the user has configured; if a problem is found, it raises an alert (at WARN or CRITICAL level) and notifies the administrator by email or text message.
A normal Nagios setup requires NRPE on each monitored machine to execute commands remotely and obtain monitoring data. But our setup already has Ganglia collecting the data, so we can omit NRPE entirely and get the data directly from Ganglia.
The following describes the installation and configuration of Nagios and its integration with Ganglia.
1. Installation and Running
Nagios is available in the EPEL repository for Red Hat, but the version there is older (3.5.x), so we chose to download the source and compile it ourselves. The latest Nagios 4.x source can be downloaded here: http://sourceforge.net/projects/nagios/files/nagios-4.x/.
1.1 Compiling and Installing nagios-core
First create the necessary users and groups:

sudo groupadd nagios
sudo useradd -g nagios nagios
sudo gpasswd -a nobody nagios
Our Nginx and fcgiwrap run as the user nobody, so we add it to the nagios group so that it can run the Nagios CGI programs. If you use Apache as the web server, add the Apache user to the nagios group instead.
After unpacking the source and entering the source directory, run the following commands to build and install:

./configure --with-command-group=nagios
make all
sudo make install
Nagios is installed under /usr/local/nagios, with its configuration files in the etc subdirectory, the CGI programs in sbin, and the PHP, HTML, and static resources in share.
Next, enable the Nagios service:

sudo chkconfig --add nagios
sudo chkconfig nagios on
1.2 Configuring Nginx
Nagios ships with an Apache configuration that you can install and enable directly. With Nginx, which we use, things are considerably more troublesome. First, Nginx does not support CGI directly, only FastCGI, so the fcgiwrap tool is needed. I won't describe its setup in detail here; there are tutorials online you can refer to.
Our Nagios and Ganglia run on the same server and share a domain name, so their configuration is kept together: monitor.foobar.com is the Ganglia address and monitor.foobar.com/nagios is the Nagios address.
The full configuration is as follows; the location blocks containing /nagios are the Nagios-related configuration:

server {
    listen 80;
    server_name monitor.foobar.com;
    charset utf8;
    access_log logs/$host.access.log main;
    root /usr/local/www/ganglia;
    index index.php index.html index.htm;
    include mime.server.common;
    auth_digest_user_file /usr/local/www/ganglia/users.htdigest;

    location ~ ^/nagios/(.*\.php)$ {
        set $phpfile $1;
        alias /usr/local/nagios/share/$phpfile;
        auth_digest 'Foobar Monitor System';
        auth_digest_timeout 60s;
        auth_digest_expires 3600s;
        if ($http_authorization ~ username="([^\"]+)") {
            set $htdigest_user $1;
        }
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $request_filename;
        fastcgi_param AUTH_USER $htdigest_user;
        fastcgi_param REMOTE_USER $htdigest_user;
        include fastcgi.conf;
    }

    location ~ .*\.(php|php5)?$ {
        auth_digest 'Foobar Monitor System';
        auth_digest_timeout 60s;
        auth_digest_expires 3600s;
        fastcgi_split_path_info ^(.+?\.php)(/.*)$;
        if (!-f $document_root$fastcgi_script_name) {
            return 404;
        }
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.php;
        include fastcgi.conf;
        fastcgi_param ganglia_secret super-secret;
        if ($http_authorization ~ username="([^\"]+)") {
            set $htdigest_user $1;
        }
        fastcgi_param REMOTE_USER $htdigest_user;
    }

    location /nagios {
        alias /usr/local/nagios/share;
        index index.php index.html index.htm;
    }

    location /nagios/cgi-bin/ {
        alias /usr/local/nagios/sbin/;
        auth_digest 'Foobar Monitor System';
        auth_digest_timeout 60s;
        auth_digest_expires 3600s;
        if ($http_authorization ~ username="([^\"]+)") {
            set $htdigest_user $1;
        }
        fastcgi_param SCRIPT_FILENAME $request_filename;
        fastcgi_param AUTH_USER $htdigest_user;
        fastcgi_param REMOTE_USER $htdigest_user;
        include fastcgi.conf;
        fastcgi_pass unix:/var/run/fcgiwrap.sock;
    }
}
If you give Nagios its own domain name so that its path is not /nagios, you also need to change url_html_path in /usr/local/nagios/etc/cgi.cfg to /, and config.inc.php in /usr/local/nagios/share needs a change as well:

$cfg['cgi_base_url']='/cgi-bin'; // Default is /nagios/cgi-bin
1.3 Running
Start the Nagios service:

sudo service nagios start
After configuring Nginx, restart it or reload its configuration, make sure fcgiwrap is running properly, and visit monitor.foobar.com/nagios to see Nagios in action.
2. Nagios Configuration
The Nagios configuration files are under /usr/local/nagios/etc. The main configuration file is nagios.cfg, which can include other configuration files through cfg_file items. These files are generally placed in the objects subdirectory and split by responsibility: hosts.cfg configures the hosts to monitor, contacts.cfg the contacts, commands.cfg the commands, and so on.

cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
The Nagios configuration provides a template mechanism: common configuration items can be extracted into a template and then inherited when defining actual objects. The default configuration defines a variety of templates, all in templates.cfg.
The configurations below are modifications of the default configuration and use some of the default templates. In addition, I disabled localhost.cfg in nagios.cfg: since we integrate with Ganglia later, Nagios does not need to monitor the local machine itself; everything is based on Ganglia data.

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
2.1 Host and Host Group Configuration
First come hosts and host groups, starting with the host configuration:

define host{
   use          linux-server          ; use the linux-server template
   host_name    mysql2.foobar.com     ; host name
   alias        MySQL Slave Server 1  ; alias
   address      192.168.1.9           ; IP address
   hostgroups   all-servers           ; groups the host belongs to; multiple groups can be given, comma-separated
  }
And then the host group configuration:

define hostgroup{
   hostgroup_name  mysql-slaves                         ; group name
   alias           MySQL slaves                         ; group alias
   members         mysql2.foobar.com,mysql3.foobar.com  ; group members
  }
As you can see, the relationship between hosts and groups can be specified either on the host side or on the hostgroup side. The former suits groups with many members, the latter groups with few. I defined a group called all-servers that contains all hosts, which is convenient for configuring services that every server needs monitored, such as load, disk, and memory.
2.2 Command Configuration
A command is what Nagios actually executes when it checks a service. Commands can be provided by Nagios's built-in plug-ins or by third-party plug-ins, or you can write your own as scripts. Here is the command we use to check MySQL slave status.

define command{
 command_name  check_mysql_slave     ; command name
 # command line
 command_line  /usr/local/nagios/plugins/check_mysql_slavestatus.sh -H $HOSTNAME$ -P 3306 -u xxx -p xxx -w $ARG1$ -c $ARG2$
}
The script is a third-party plug-in found online; it is just a shell script. You can see some variables in the command_line definition: $HOSTNAME$, $ARG1$, and $ARG2$. When Nagios actually runs the command, they are replaced with real values: $HOSTNAME$ with the host name, and $ARG1$ and $ARG2$ with the arguments given in the service definition. In this example, the two arguments set the alert thresholds for the MySQL slave's Seconds Behind Master: $ARG1$ is the WARN threshold and $ARG2$ is the CRITICAL threshold.
2.3 Service Configuration
The service is the core configuration: it defines which services are checked on a host or host group and which command is used to check them. Taking the MySQL slaves above as an example:

define service{
   use                    generic-service              ; template to use
   hostgroup_name         mysql-slaves                 ; host group to check
   service_description    MySQL Slave Status           ; service description
   check_command          check_mysql_slave!1200!2400  ; command
  }
Here we use hostgroup_name; you can also use host_name to specify individual hosts instead of a host group.
check_command specifies the command and its arguments, separated by !. In this example, a WARN-level alert is raised when the slave falls more than 1200 seconds behind the master, and a CRITICAL-level alert when it falls more than 2400 seconds behind.
That is most of the main Nagios configuration; as you can see, it is quite simple.
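To make the macro substitution concrete, here is a minimal Python sketch of how a check_command value such as check_mysql_slave!1200!2400 ends up as arguments in the command line. It is a simplification for illustration only: real Nagios supports many more macros, and the function name expand_command and the shortened command line are assumptions, not Nagios code.

```python
def expand_command(command_line, check_command, hostname):
    """Substitute $HOSTNAME$ and $ARGn$ macros (simplified illustration)."""
    # check_command looks like "name!arg1!arg2"; the first field is the
    # command name and the remaining fields become $ARG1$, $ARG2$, ...
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTNAME$", hostname)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, arg)
    return expanded


if __name__ == "__main__":
    # Shortened, hypothetical command line modeled on the definition above.
    template = "check_mysql_slavestatus.sh -H $HOSTNAME$ -w $ARG1$ -c $ARG2$"
    print(expand_command(template, "check_mysql_slave!1200!2400",
                         "mysql2.foobar.com"))
```

Each host in the mysql-slaves group gets its own expansion, which is why a single service definition can cover the whole group.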
2.4 Contacts Configuration
Contacts are configured in contacts.cfg, which is mainly used to set up users and user groups.

define contact{
   contact_name    jerry
   use             generic-contact
   alias           Jerry Peng
   email           jerry.peng@foobar.com
  }
The most important item is email, which determines whether you receive Nagios alert messages.
As for contact groups, we did not subdivide them and use only the default admins group. This is also the contact group defined in the default service template generic-service, so users in this group receive all alert messages.

define contactgroup{
   contactgroup_name       admins
   alias                   Nagios Administrators
   members                 jerry,drizzt
  }
If you have many services and want different administrators to be responsible for different ones, you can define more than one group and use the contact_groups item to specify the contact group when defining a service.
3. Integration with Ganglia
The core of our monitoring solution is Ganglia, so Nagios alerts are based on Ganglia data (though some kinds of monitoring are done directly by Nagios, such as the MySQL slave monitoring above). There are several ways to integrate the two; Ganglia's GitHub wiki has a summary.
We use the first option: the Ganglia Web Nagios script. It ships with Ganglia, so it is there once Ganglia is installed, and it is relatively simple and convenient. It consists of a PHP script in Ganglia Web plus a Bash script. The principle is simple: the Bash script calls the PHP script of the web system via curl, passing in the metric and host to check along with the alert threshold.
The command is defined as follows:

define command{
 command_name  check_ganglia_metric
 command_line  /bin/sh /usr/local/www/ganglia/nagios/check_ganglia_metric.sh host=$HOSTNAME$ metric_name=$ARG1$ operator=$ARG2$ critical_value=$ARG3$
}
The three arguments are the name of the metric to monitor, the operator (more/less), and the CRITICAL threshold.
Below are several service definitions that use this command.
Disk space monitoring for the root partition of all hosts, alerting when free space drops below 20 GB:

define service{
   use                    generic-service
   hostgroup_name         all-servers
   service_description    Root Partition Free Space
   check_command          check_ganglia_metric!root_disk_free!less!20
  }
One-minute load average check for all hosts, alerting when it is greater than 16 (our machines all have 16 or more cores):

define service{
   use                    generic-service
   hostgroup_name         all-servers
   service_description    Load one
   check_command          check_ganglia_metric!load_one!more!16
  }
A performance alert for a Java application, raised when the core engine's processing time exceeds two seconds. The metric is exposed to Ganglia via JMX and jmxtrans (later parts explain this in more detail):

define service{
   use                    generic-service
   host_name              engine1.foobar.com
   service_description    Engine Process Time
   check_command          check_ganglia_metric!engine1.processtime!more!2000
  }
As you can see, any metric in Ganglia can be used for alerting this way, which is very convenient.
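The comparison behind such a check can be sketched as follows. This is a simplified Python illustration, not the script bundled with Ganglia Web: it handles only the more and less operators mentioned above, assumes strict comparisons, and the function name check_metric is hypothetical. The exit codes are the standard Nagios plug-in codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN).

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plug-in exit codes


def check_metric(value, operator, critical_value):
    """Return (exit_code, message) for one metric value (illustrative)."""
    if operator == "more":
        breached = value > critical_value   # assumption: strict comparison
    elif operator == "less":
        breached = value < critical_value   # assumption: strict comparison
    else:
        return UNKNOWN, "UNKNOWN - unsupported operator %r" % operator
    if breached:
        return CRITICAL, "CRITICAL - value %s is %s than %s" % (
            value, operator, critical_value)
    return OK, "OK - value %s" % value


if __name__ == "__main__":
    # Mirrors the root_disk_free!less!20 service with a made-up reading.
    code, message = check_metric(15, "less", 20)
    print(code, message)
```

A real plug-in would print the message and exit with the code; Nagios uses the exit code to decide the service state.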
4. Summary
At this point our monitoring solution has basically taken shape: with it we can watch how any monitored parameter changes over time and set up alerts on it.
I have now written several parts and the Java-related material still hasn't appeared, which admittedly makes the title look a bit like clickbait. Don't worry: with these basic systems in place, the remaining Java-related work is simple. It comes down to recording some monitoring parameters of the application and feeding them into Ganglia. Next time I will cover this in detail and share our experience.


Java Chapter

The main topics are JMX, jmxtrans (which polls JMX and sends the data to Ganglia), and a simple way of recording performance parameters that I implemented myself.
1. JMX
JMX is essentially the standard solution for monitoring Java applications. The JVM's own performance metrics, such as memory usage, GC, and threads, all have corresponding JMX attributes. Defining a custom MBean is also very simple. There are two ways to do it: the first is to write a custom interface and a corresponding implementation class; the other is to implement the javax.management.DynamicMBean interface to define a dynamic MBean. We use the second way, so I will skip the first; interested readers can refer to the Java Tutorial and the articles on Javalobby.
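For readers who want a quick picture of the first way, here is a minimal sketch of a standard MBean. It is not part of our framework: the RequestStats names and the com.example.demo object name are hypothetical. The only hard rules are that the management interface is public and is named after the implementation class plus the suffix "MBean".

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class StandardMBeanDemo {

    /** Management interface; JMX requires the name to be the class name plus "MBean". */
    public interface RequestStatsMBean {
        long getRequestCount();   // exposed as the read-only attribute "RequestCount"
    }

    /** Hypothetical implementation class holding the actual counter. */
    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong requestCount = new AtomicLong();

        public void recordRequest() {
            requestCount.incrementAndGet();
        }

        @Override
        public long getRequestCount() {
            return requestCount.get();
        }
    }

    /** Registers the MBean, reads the attribute back through the MBean server, then unregisters it. */
    static long registerAndRead() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.demo:type=RequestStats"); // hypothetical name
            RequestStats stats = new RequestStats();
            stats.recordRequest();
            stats.recordRequest();
            server.registerMBean(stats, name);
            try {
                return (Long) server.getAttribute(name, "RequestCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("RequestCount via JMX: " + registerAndRead());
    }
}
```

The attribute set of a standard MBean is fixed by its interface at compile time, which is why a dynamic MBean is the more natural fit when the metrics are registered by name at runtime.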
Here is the MetricsMBean that we use internally, implemented as a DynamicMBean:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.InvalidAttributeValueException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanException;
import javax.management.MBeanInfo;
import javax.management.ReflectionException;

public class MetricsMBean implements DynamicMBean {

    private final Map<String, Metric> metrics;

    public MetricsMBean(Map<String, Metric> metrics) {
        this.metrics = new HashMap<>(metrics);
    }

    @Override
    public Object getAttribute(String attribute)
            throws AttributeNotFoundException,
                   MBeanException,
                   ReflectionException {
        Metric metric = metrics.get(attribute);
        if (metric == null) {
            throw new AttributeNotFoundException("Attribute " + attribute + " not found");
        }
        return metric.getValue();
    }

    @Override
    public void setAttribute(Attribute attribute)
            throws AttributeNotFoundException,
                   InvalidAttributeValueException,
                   MBeanException,
                   ReflectionException {
        // We only do monitoring, so there is no need to set attributes; just throw an exception
        throw new UnsupportedOperationException("Setting attribute is not supported");
    }

    @Override
    public AttributeList getAttributes(String[] attributes) {
        AttributeList attrList = new AttributeList();
        for (String attr : attributes) {
            Metric metric = metrics.get(attr);
            if (metric != null)
                attrList.add(new Attribute(attr, metric.getValue()));
        }
        return attrList;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        // We only do monitoring, so there is no need to set attributes; just throw an exception
        throw new UnsupportedOperationException("Setting attribute is not supported");
    }

    @Override
    public Object invoke(String actionName,
                         Object[] params,
                         String[] signature) throws MBeanException, ReflectionException {
        // Method invocation does not need to be implemented
        throw new UnsupportedOperationException("Invoking is not supported");
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        // Sort the names so the attributes show up in a stable order
        SortedSet<String> names = new TreeSet<>(metrics.keySet());
        List<MBeanAttributeInfo> attrInfos = new ArrayList<>(names.size());
        for (String name : names) {
            attrInfos.add(new MBeanAttributeInfo(name,
                    "long",
                    "Metric " + name,
                    true,     // readable
                    false,    // not writable
                    false));  // not an "is" getter
        }
        return new MBeanInfo(getClass().getName(),
                "Application Metrics",
                attrInfos.toArray(new MBeanAttributeInfo[attrInfos.size()]),
                null,
                null,
                null);
    }

}
Metric is an interface we designed for defining different kinds of monitoring metrics. The Metrics class below is the registry that collects them and creates the MBean:

import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

import javax.management.MBeanServer;
import javax.management.ObjectName;

// SLF4J is assumed here, based on the "{}" placeholder in the debug call
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Metrics {

    private static final Logger log = LoggerFactory.getLogger(Metrics.class);
    private static final Metrics instance = new Metrics();
    private Map<String, Metric> metrics = new HashMap<>();

    public static Metrics instance() {
        return instance;
    }

    private Metrics() {
    }

    public Metrics register(String name, Metric metric) {
        metrics.put(name, metric);
        return this;
    }

    public void createMBean() {
        MetricsMBean mbean = new MetricsMBean(metrics);
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // Object name of the form "<package>:type=MetricsMBean"
            final String name = MetricsMBean.class.getPackage().getName() +
                    ":type=" +
                    MetricsMBean.class.getSimpleName();
            log.debug("Registering MBean: {}", name);
            server.registerMBean(mbean, new ObjectName(name));
        } catch (Exception e) {
            log.warn("Error registering trafree metrics MBean", e);
        }
    }

}
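The Metric interface itself is not listed in this article. Judging from how it is used (the MBean calls getValue(), and the anonymous implementations further down return a long), it is presumably close to the sketch below. Treat the exact shape as an assumption; CounterMetric is a hypothetical implementation added only to show how small a metric can be.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Assumed shape of the Metric interface, inferred from its usage in the article. */
interface Metric {
    long getValue();
}

/** Hypothetical minimal implementation: a counter that only ever goes up. */
final class CounterMetric implements Metric {
    private final AtomicLong count = new AtomicLong();

    void increment() {
        count.incrementAndGet();
    }

    @Override
    public long getValue() {
        return count.get();
    }
}
```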
When the application starts, register the metrics and create the MBean:

// createMaxValueMetric and createCountMetric create the maximum and count
// metrics from the same underlying data; see the AverageMetric implementation below.
Metrics.instance()
    .register("searchAvgTime", MetricLoggers.searchTime)
    .register("searchMaxTime", MetricLoggers.searchTime.createMaxValueMetric())
    .register("searchCount", MetricLoggers.searchTime.createCountMetric())
    .createMBean();
The name given at registration is also the attribute name that finally shows up in JMX.
Of course, this is just our internal monitoring framework; what you should focus on is how to implement a custom MBean.
I have not yet shown an implementation of the Metric interface mentioned above. Here is one we commonly use internally, AverageMetric. It records a performance value and computes the average, the maximum, and the count per unit of time. For example, searchTime, used in the registration code above and defined in MetricLoggers, records the average time per search, the maximum search time, and the number of searches within one minute.

public class MetricLoggers {
    public static final AverageMetric searchTime = new AverageMetric();
}
Recording the time in the actual search function:

long startTime = System.currentTimeMillis();
doSearch(request);
long timeCost = System.currentTimeMillis() - startTime;

MetricLoggers.searchTime.log(timeCost);
This lets us monitor, through JMX, the average search time, the maximum search time, and the number of searches in our system over the past minute.
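If the same start/stop pattern appears in many places, it can be wrapped in a small helper. The sketch below is a hypothetical addition, not part of our framework: the Timed class is invented for illustration, it uses System.nanoTime() instead of currentTimeMillis() so that wall-clock adjustments do not distort the measurement, and the try/finally means the time is recorded even when the action throws.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical helper; not part of the article's framework. */
final class Timed {
    private Timed() {}

    /** Runs the action and reports its elapsed time in milliseconds to the sink, even if it throws. */
    static <T> T call(LongConsumer sink, Supplier<T> action) {
        long start = System.nanoTime();   // monotonic clock
        try {
            return action.get();
        } finally {
            sink.accept(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
    }

    public static void main(String[] args) {
        // With the article's classes the sink would be MetricLoggers.searchTime::log
        String result = call(ms -> System.out.println("took " + ms + " ms"), () -> "search result");
        System.out.println(result);
    }
}
```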
Below is the full implementation of the AverageMetric class. It is fairly long, so take your time. The basic idea is to use an AtomicReference and an immutable value object to handle concurrency with a non-blocking algorithm. In our tests, performance is good at low concurrency but not so good when many threads compete. Again, this implementation is for reference only.

public class TimeWindowSupport {
    final long timeWindow;

    TimeWindowSupport(long timeWindow) {
        this.timeWindow = timeWindow;
    }

    // Index of the time window that the current moment falls into
    long currentSlot() {
        return System.currentTimeMillis() / timeWindow;
    }
}


// In a separate source file, AverageMetric.java:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AverageMetric extends TimeWindowSupport implements Metric {

    final AtomicReference<Value> currentValue = new AtomicReference<Value>();
    private volatile Value lastValue = null;

    public AverageMetric(long timeWindow) {
        super(timeWindow);
    }

    public AverageMetric() {
        super(TimeUnit.MINUTES.toMillis(1));
    }

    public Value getLastValue() {
        long slot = currentSlot();
        while (true) {
            Value curValue = currentValue.get();
            if (curValue != null && slot != curValue.slot) {
                // The window has rolled over: publish the finished window and start an empty one
                if (currentValue.compareAndSet(curValue, Value.create(slot))) {
                    lastValue = curValue;
                    break;
                }
            } else {
                break;
            }
        }
        return lastValue;
    }

    public void log(long value) {
        long slot = currentSlot();
        while (true) {
            Value curValue = currentValue.get();
            if (curValue == null) {
                // First value ever recorded
                if (currentValue.compareAndSet(null, Value.create(slot, value)))
                    return;
            } else if (slot == curValue.slot) {
                // Same window: accumulate
                if (currentValue.compareAndSet(curValue, curValue.add(value)))
                    return;
            } else {
                // New window: the old value becomes the last completed window
                if (currentValue.compareAndSet(curValue, Value.create(slot, value))) {
                    lastValue = curValue;
                    return;
                }
            }
        }
    }

    /**
     * Based on the same data, create a count metric whose value is the number
     * of log calls that occurred in the past unit of time.
     *
     * @return the count metric
     */
    public Metric createCountMetric() {
        return new Metric() {
            @Override
            public long getValue() {
                Value val = getLastValue();
                if (val != null)
                    return (long) val.n;
                else
                    return 0L;
            }
        };
    }

    /**
     * Based on the same data, create a maximum metric whose value is the
     * largest value recorded in the past unit of time.
     *
     * @return the maximum metric
     */
    public Metric createMaxValueMetric() {
        return new Metric() {
            @Override
            public long getValue() {
                Value val = getLastValue();
                if (val != null)
                    return val.max;
                else
                    return 0L;
            }
        };
    }

    @Override
    public long getValue() {
        Value lastValue = getLastValue();
        long lastSlot = currentSlot() - 1;
        // Only report the average if the last completed window is the one just before now
        if (lastValue != null && lastValue.n != 0 && lastSlot == lastValue.slot)
            return lastValue.total / lastValue.n;
        else
            return 0L;
    }

    static class Value {
        final long slot;
        final int n;
        final long total;
        final long max;

        Value(long slot, int n, long total, long max) {
            this.slot = slot;
            this.n = n;
            this.total = total;
            this.max = max;
        }

        static Value create(long slot, long value) {
            return new Value(slot, 1, value, value);
        }

        static Value create(long slot) {
            return new Value(slot, 0, 0, 0);
        }

        Value add(long value) {
            return new Value(this.slot,
                    this.n + 1,
                    this.total + value,
                    (value > this.max) ? value : this.max);
        }
    }
}
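Stripped of the time-window handling, the concurrency idea in this class is a compare-and-set loop over an immutable snapshot. The following sketch isolates just that part; the CasStats and Snapshot names are hypothetical and it is a simplification for illustration, not a replacement for AverageMetric.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Simplified sketch of the CAS-on-immutable-snapshot idea, without the time window. */
final class CasStats {

    /** Immutable snapshot; every update creates a new instance. */
    static final class Snapshot {
        final int n;
        final long total;
        final long max;

        Snapshot(int n, long total, long max) {
            this.n = n;
            this.total = total;
            this.max = max;
        }

        Snapshot add(long value) {
            return new Snapshot(n + 1, total + value, Math.max(max, value));
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0, 0, Long.MIN_VALUE));

    void log(long value) {
        while (true) {
            Snapshot old = current.get();
            // If another thread replaced the snapshot in the meantime, the CAS fails and we retry
            if (current.compareAndSet(old, old.add(value))) {
                return;
            }
        }
    }

    Snapshot snapshot() {
        return current.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasStats stats = new CasStats();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 1; i <= 1000; i++) {
                    stats.log(i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        Snapshot s = stats.snapshot();
        System.out.println(s.n + " samples, total " + s.total + ", max " + s.max);
    }
}
```

Each failed compareAndSet throws away a freshly allocated snapshot and retries, which is one plausible reason the approach degrades under heavy contention.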
2. Jmxtrans
With JMX in place, we are still missing the last link: sending the monitoring data to the monitoring system we set up earlier. Our core system is Ganglia, so that is where the data goes. We chose jmxtrans for this. It is itself written in Java and uses JSON for its configuration files.
2.1 Installation
It provides deb, rpm, and standard zip packages, so installation is easy. Pick the one that matches your distribution.
2.2 Configuration
The jmxtrans configuration files live under /var/lib/jmxtrans and use the JSON format. Create one JSON file for each application you want to monitor, configured in the following format. I have added comments below for explanation, but note that the actual configuration file seems to fail to load if it contains such comments.

{
  "servers" : [ {
    "host" : "localhost",   // JMX IP
    "port" : "19008",       // JMX port
    // Alias, used by Ganglia to identify where the metrics come from; the local machine's IP and hostname will do
    "alias" : "192.168.221.29:fly2save02",
    "queries" : [
    {
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.GangliaWriter",
        "settings" : {
          "groupName" : "MyApp",   // metric group name in Ganglia
          "host" : "192.168.1.9",  // Ganglia IP
          "port" : 8648,           // Ganglia port
          "slope" : "BOTH",
          "units" : "bytes",       // metric units
          "tmax" : 60,
          "dmax" : 0,
          "sendMetadata" : 30
        }
      } ],
      "obj" : "java.lang:type=Memory",   // identifies the MBean to monitor
      "resultAlias" : "App",             // alias, used to avoid overly long names
      "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ]   // MBean attributes to monitor
    },
    // To monitor several MBeans you have to write several queries, and the
    // outputWriters part is repeated in each one, which is rather annoying.
    {
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.GangliaWriter",
        "settings" : {
          "groupName" : "MyApp",
          "host" : "192.168.1.9",
          "port" : 8648,
          "slope" : "BOTH",
          "tmax" : 60,
          "dmax" : 0,
          "sendMetadata" : 30
        }
      } ],
      "obj" : "com.trafree.metrics:type=MetricsMBean",   // our application's MBean
      "resultAlias" : "App"
      // Leaving attr unspecified means all attributes are monitored
    }
    ]
  } ]
}
For more detailed configuration, refer to the official wiki.
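Since the annotated listing above cannot be used as-is, the comments have to be removed before the file is saved. One way to do that is a small filter such as the sketch below. The JsonCommentStripper class is a hypothetical helper, not part of jmxtrans, and it only handles // line comments while leaving // sequences inside string values (for example in URLs) untouched.

```java
/** Hypothetical helper for removing the explanatory // comments before saving the file. */
final class JsonCommentStripper {
    private JsonCommentStripper() {}

    static String strip(String annotated) {
        StringBuilder out = new StringBuilder(annotated.length());
        boolean inString = false;
        for (int i = 0; i < annotated.length(); i++) {
            char c = annotated.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < annotated.length()) {
                    out.append(annotated.charAt(++i));   // copy the escaped character verbatim
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c == '/' && i + 1 < annotated.length() && annotated.charAt(i + 1) == '/') {
                // Skip to the end of the line, keeping the line break itself
                while (i + 1 < annotated.length() && annotated.charAt(i + 1) != '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String annotated = "{\n  \"host\": \"localhost\", // JMX IP\n  \"port\": \"19008\" // JMX port\n}";
        System.out.println(strip(annotated));
    }
}
```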
2.3 Run
First, the application must have JMX remote enabled. Add the following JVM parameters to the application:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=19008
-Dcom.sun.management.jmxremote.local.only=true
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
Our application and jmxtrans run on the same machine, so local.only is set to true to allow only local connections, and authentication and SSL are turned off. If your deployment differs, adjust the settings accordingly.
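Before pointing jmxtrans at the port, it can help to check by hand that the JVM really answers JMX queries. The sketch below is a hypothetical probe, not something from our setup: the JmxProbe class name is invented, and it reads the same java.lang:type=Memory / HeapMemoryUsage attribute that the configuration above asks for. Run without arguments it queries its own JVM; run with a host and port it connects through the standard RMI service URL.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical probe for checking that a JMX endpoint answers before configuring jmxtrans. */
final class JmxProbe {
    private JmxProbe() {}

    /** Standard RMI-based JMX service URL for a JVM started with jmxremote.port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Reads the "used" field of the HeapMemoryUsage attribute of java.lang:type=Memory. */
    static long heapUsed(MBeanServerConnection conn) {
        try {
            CompositeData usage = (CompositeData) conn.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read HeapMemoryUsage", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No target given: query this JVM's own platform MBean server
            System.out.println("local heap used: "
                    + heapUsed(ManagementFactory.getPlatformMBeanServer()));
            return;
        }
        // Example: java JmxProbe localhost 19008
        JMXServiceURL url = new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            System.out.println("remote heap used: "
                    + heapUsed(connector.getMBeanServerConnection()));
        }
    }
}
```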
Running jmxtrans is simple; just start the corresponding service (make sure java is on the PATH):
chkconfig --add jmxtrans
/etc/init.d/jmxtrans start
3. Summary and other solutions
At this point our complete monitoring scheme has basically taken shape. With Ganglia, Nagios, JMX, and jmxtrans, we can monitor everything from the OS to the application, set up alarms easily, and conveniently view historical trends.
The two screenshots below show the performance parameters of our core ticket search engine in Ganglia and Nagios:
Ganglia aggregate view, stacked to show the same metric across multiple instances

The status of these services as seen in Nagios; if one changes from OK to WARNING/CRITICAL, we receive an email immediately

That finally concludes this series of articles. Readers are welcome to leave their own thoughts and exchange ideas.
3.1 Other solutions
While researching all this, I also came across a number of other solutions. I mention them here for anyone interested in digging deeper (discussion welcome):
collectd is a good alternative to Ganglia. It seems more lightweight, performs well, and should suit small clusters better. It also integrates well with Nagios.
Metrics is a Java library that provides a variety of tools for recording system metrics. It is basically the best replacement for our own MetricsMBean: it is powerful, supports many common components such as Jetty, Ehcache, and Log4j, and can send data to Ganglia. Had I found it earlier, I might not have written the setup described above. It also has Clojure bindings, so if you have a Clojure application it is well worth considering.
