The perfect cluster monitoring combination: Ganglia and Nagios

Source: Internet
Author: User
Tags apache php rrd rrdtool system log

Ganglia is cluster monitoring software developed at UC Berkeley. It monitors and displays status information for the nodes in a cluster, such as CPU, memory, disk utilization, I/O load, and network traffic, and it presents the historical data as graphs through PHP pages.

Ganglia relies on a web server to display the state of the cluster and uses RRDtool to store data and generate graphs. XML parsing requires expat, and configuration file parsing requires libconfuse. The Apache httpd installation also needs PHP 4 or later, plus a few other dependencies.

Ganglia is one of the most commonly used monitoring tools in Linux environments. It excels at collecting data from nodes at low cost, as the user needs it. However, Ganglia is not good at notifying users of alerts and events. The latest versions already have some of this functionality, but Nagios is better at it: Nagios is software that specializes in alerting. By using the data collected by Ganglia as a data source for Nagios, and then letting Nagios send the alert notifications, you get a complete monitoring and management system.
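To make the combination concrete, here is a minimal sketch of a Nagios-style check that reads a metric out of the XML that Ganglia serves. It assumes the XML contains HOST elements with a NAME attribute and METRIC elements with NAME and VAL attributes; the function name check_metric and the threshold parameters are hypothetical, chosen for illustration only. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import xml.etree.ElementTree as ET

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_metric(xml_text, host, metric, warn, crit):
    """Return (status, message) for one metric of one host.

    xml_text is the XML document fetched from gmond or gmetad.
    The element and attribute names (HOST, METRIC, NAME, VAL) are
    assumptions about that XML layout.
    """
    root = ET.fromstring(xml_text)
    for host_el in root.iter("HOST"):
        if host_el.get("NAME") != host:
            continue
        for metric_el in host_el.iter("METRIC"):
            if metric_el.get("NAME") != metric:
                continue
            value = float(metric_el.get("VAL"))
            if value >= crit:
                return CRITICAL, f"CRITICAL - {metric}={value}"
            if value >= warn:
                return WARNING, f"WARNING - {metric}={value}"
            return OK, f"OK - {metric}={value}"
    return UNKNOWN, f"UNKNOWN - {metric} not found for {host}"
```

A real plugin would print the message and exit with the status code, so that Nagios can turn the result into a notification.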



------------------------------------------------------------------------------------------------------------------------------------------

Ganglia Installation Instructions



These dependencies can be installed on Red Hat with the following command:

yum -y install apr-devel apr-util check-devel cairo-devel pango-devel libxml2-devel rpmbuild glib2-devel dbus-devel freetype-devel fontconfig-devel gcc-c++ expat-devel python-devel libXrender-devel

libconfuse can be obtained with the following commands:

wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm

1.2 Installation and configuration steps
1.2.1 Installation
Here we compile and install from source. Download the latest version of Ganglia from the http://ganglia.info website and unpack it.

./configure --with-librrd=/rrd/path --with-gmetad --prefix=/usr/local/ganglia

make

make install

If the build stops because a package is missing, install the missing package and try again. After installation you need to set up the configuration file, which is usually placed in the /etc/ganglia directory and named gmetad.conf. The path is not strictly required, because gmetad can be told which configuration file to use at startup.

After installing Ganglia, you also need to install the Apache server with PHP module support; otherwise the final display pages will not render properly. Installing with yum install httpd php is recommended, because with a manual setup that is not configured correctly, Apache may not be properly associated with PHP.

After installation, write a simple PHP page and open it, for example at http://localhost/test.php, to test whether the installation succeeded.


1.2.2 Configuration
If you installed from source, Ganglia is installed in the /usr/local/ganglia directory, according to the earlier --prefix.

First, create a directory to store the Ganglia web pages:

mkdir -p /var/www/html/ganglia/

This directory stores the web pages that are later used to display data.

Because we compiled from source, gmetad and gmond have not been added as services, so execute the following commands.

cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad    # copy the gmetad service startup script
cp gmond/gmond.init /etc/rc.d/init.d/gmond       # copy the gmond service startup script
mkdir /etc/ganglia                               # create the configuration directory
gmond -t | tee /etc/ganglia/gmond.conf           # generate the gmond configuration file
cp gmetad/gmetad.conf /etc/ganglia/              # copy the gmetad configuration file
mkdir -p /var/lib/ganglia/rrds                   # create the RRD file storage directory
chown nobody:nobody /var/lib/ganglia/rrds        # owner and group are both nobody
chkconfig --add gmetad                           # hand the service over to chkconfig
chkconfig --add gmond                            # same as above

In the configuration file /etc/ganglia/gmetad.conf, usually only the following parameter needs to be modified:

data_source "ClusterName" host1 host2

Change the cluster name to your own. host1 and host2 are the data sources from which gmetad fetches the XML that describes the cluster. If no port is written, the default port 8649 is used, and by default gmetad connects to the host over TCP every 15 seconds to download the XML. A data source can therefore be port 8649 of a gmond or port 8651 of another gmetad, since both serve the cluster information in XML format.

host1 and host2 are alternatives (an OR relationship): if the download from host1 fails, gmetad tries host2, so they should be nodes of the same cluster that hold the same data. In multicast mode, every gmond node holds the monitoring data for all machines in the cluster, so it is not necessary to write all nodes into data_source. Listing at least two is recommended, so that when host1 goes down, gmetad automatically fetches the data from host2.
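The failover behaviour described above can be sketched roughly as follows. This is only an illustration of "try each data source in order and use the first one that answers", not gmetad's actual implementation; the function name fetch_cluster_xml is hypothetical, and the host names in the usage note are the placeholders from the data_source line.

```python
import socket


def fetch_cluster_xml(sources, timeout=5.0):
    """Return the XML served by the first reachable (host, port) source.

    A gmond or gmetad TCP port sends its XML and then closes the
    connection, so we simply read until end of stream.
    """
    last_error = None
    for host, port in sources:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                chunks = []
                while True:
                    data = conn.recv(65536)
                    if not data:
                        break
                    chunks.append(data)
                return b"".join(chunks).decode("utf-8", errors="replace")
        except OSError as exc:
            # This source is down or unreachable: fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no data source reachable: {last_error}")
```

For example, fetch_cluster_xml([("host1", 8649), ("host2", 8649)]) would return the XML from host2 when host1 refuses the connection.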

In addition, Gmetad has the following property settings:

RRD database storage definition

RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
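To read the RRAs line, it helps to work out how much history each archive keeps. In an entry such as RRA:AVERAGE:0.5:24:244, the fourth field is the number of base steps averaged into one row and the fifth is the number of rows. The sketch below assumes a base step of 15 seconds, matching the default polling interval mentioned earlier; if your RRD files use a different step, the results scale accordingly. The function name is hypothetical.

```python
def rra_retention_seconds(rra, step_seconds=15):
    """Return how many seconds of history one RRA definition covers.

    rra looks like "RRA:AVERAGE:0.5:24:244":
    consolidation function, xff, steps per row, number of rows.
    step_seconds is the base step of the RRD (assumed to be 15 here).
    """
    _tag, _function, _xff, steps, rows = rra.split(":")
    return int(steps) * int(rows) * step_seconds


if __name__ == "__main__":
    definitions = [
        "RRA:AVERAGE:0.5:1:244",
        "RRA:AVERAGE:0.5:24:244",
        "RRA:AVERAGE:0.5:168:244",
        "RRA:AVERAGE:0.5:672:244",
        "RRA:AVERAGE:0.5:5760:374",
    ]
    for definition in definitions:
        days = rra_retention_seconds(definition) / 86400
        print(f"{definition}: {days:.2f} days")
```

Under that assumption, the first archive covers 61 minutes at full resolution, and the last one keeps one averaged row per day for 374 days.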

Access control

trusted_hosts address1 address2 ... DN1 DN2 ...

all_trusted off/on

RRD files location (the directory where the RRD data is kept)

rrd_rootdir "/var/lib/ganglia/rrds"

Network

xml_port 8651 # you can telnet to this port to get the gmetad XML

interactive_port 8652 # port used by the PHP pages to exchange data



1.2.3 The configuration of the PHP page

In the /var/www/html/ganglia/ directory, find the file conf.php and check the following settings:

$gmetad_root = "/var/lib/ganglia"; # path of the RRD databases written by gmetad

$rrds = "$gmetad_root/rrds";

$ganglia_ip = "localhost"; # address of the gmetad server

$ganglia_port = 8652; # interactive port on which gmetad serves the monitoring data

By default, the web front end refreshes every 300 seconds (5 minutes). You can change the refresh interval by editing the config.php file, which contains all the Ganglia web parameters.


1.2.4 Ganglia Client Configuration
vi /etc/ganglia/gmond.conf

Three places mainly need to be modified: the cluster name, udp_send_channel, and udp_recv_channel. Note the difference between unicast and multicast mode. In multicast mode, every node that joins the multicast group receives data from all the other nodes in the group, so each node effectively holds a backup of the data. In unicast mode, data is sent point to point, only to a specific host, which is typically a central collection node.



cluster {
  name = "Cluster1"        # which cluster this node belongs to
  owner = "Chifeng"        # who owns this node
  latlong = "unspecified"  # coordinates on Earth (latitude, longitude)
  url = "unspecified"
}

udp_send_channel {         # channel used to send UDP packets
  # Multicast: work on the 239.2.11.71 channel. In unicast mode, write
  # host = host1 (the target host that accepts the data) instead; several
  # udp_send_channel sections can be configured in unicast mode.
  mcast_join = 239.2.11.71
  port = 8649              # destination port (the receivers listen on it)
  ttl = 1
}

udp_recv_channel {         # configuration for receiving UDP packets
  # Also works on the 239.2.11.71 channel. In unicast mode, write
  # host = <local IP> instead, which must be an IP of this machine.
  mcast_join = 239.2.11.71
  port = 8649              # listening port
  bind = 239.2.11.71       # bind address
}

tcp_accept_channel {
  # TCP listening port: remote clients can connect to port 8649 to get the
  # monitoring data, and gmetad uses this port to fetch the XML data.
  port = 8649
}
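When many nodes have to be configured, the difference between the two modes comes down to a few lines in udp_send_channel. The following sketch renders that section for either mode, using only the keys mentioned in the comments above (mcast_join, host, port, ttl). The function name and its parameters are hypothetical helpers for illustration, not part of Ganglia.

```python
def udp_send_channel(mode, target, port=8649, ttl=1):
    """Render one udp_send_channel section for gmond.conf.

    mode is "multicast" (target is the multicast group address) or
    "unicast" (target is the host that collects the data).
    """
    if mode == "multicast":
        lines = [
            f"  mcast_join = {target}",
            f"  port = {port}",
            f"  ttl = {ttl}",
        ]
    elif mode == "unicast":
        lines = [
            f"  host = {target}",
            f"  port = {port}",
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "udp_send_channel {\n" + "\n".join(lines) + "\n}\n"
```

Calling udp_send_channel("unicast", "host1") produces a section that sends to host1 on port 8649, and calling it once per collector covers the case of multiple send channels.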



There are other configuration items that usually do not need to be modified. Their meanings are as follows:

collection_group section:

collect_once: marks the group as static metrics, which are collected only once

collect_every: collection interval in seconds (only valid for non-static metrics)

time_threshold: maximum interval between data sends

metric section:

name: metric name (see "gmond -m")

value_threshold: metric variance threshold (send if exceeded)



An example is as follows:

collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
  }
}
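The interplay between value_threshold and time_threshold can be sketched as a small decision function. This is a simplified model of the descriptions above (send when the change exceeds the threshold, or when too much time has passed since the last send), not gmond's actual code; the function and parameter names are hypothetical.

```python
def should_send(last_sent_value, current_value, seconds_since_send,
                value_threshold, time_threshold):
    """Decide whether a freshly collected metric value should be sent.

    Simplified model: send if the maximum send interval has elapsed,
    or if the value moved by more than value_threshold since the
    last value that was sent.
    """
    if seconds_since_send >= time_threshold:
        return True
    return abs(current_value - last_sent_value) > value_threshold
```

With the example configuration (value_threshold 1.0, time_threshold 950), a proc_run value that moves from 3 to 3.5 would stay quiet until 950 seconds have passed, while a jump from 3 to 5 would be sent at the next collection.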


1.3 Command Set
Note: this command set lists the command-line commands that I used during installation and configuration. It can serve as the basis for automating the deployment, and you may want to turn these steps into a deployment script later.
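As a starting point for such automation, a minimal sketch is a runner that executes the commands in order and stops at the first failure, so that a broken configure step does not lead into a pointless make. The function name run_steps is hypothetical, and the commands passed to it would be the ones listed in this section.

```python
import subprocess


def run_steps(commands, cwd=None):
    """Run shell commands in order and stop at the first failure.

    Returns a list of (command, return code) pairs for the commands
    that were actually executed. Each command runs in its own shell,
    so a "cd" in one step does not carry over to the next; pass cwd
    or chain commands with ";" as the listings in this section do.
    """
    results = []
    for command in commands:
        completed = subprocess.run(command, shell=True, cwd=cwd)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            break
    return results
```

For example, run_steps(["./configure --prefix=/usr/local/apr", "make", "make install"], cwd="apr-1.3.2") would skip make if configure reports a missing dependency.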



Server side:

1) Install expat-2.0.1.tar.gz

tar xvzf expat-2.0.1.tar.gz
cd expat*; ./configure --prefix=/usr/local/apr; make; make install

2) Install confuse-2.6

./configure --prefix=/usr/local/confuse-2.6 CFLAGS=-fPIC --disable-nls; make; make install

3) Install apr

tar xvjf apr-1.3.2.tar.bz2
cd apr-1.3.2; ./configure --prefix=/usr/local/apr; make; make install

Install apr-util-1.3.2.tar.bz2

tar xvjf apr-util-1.3.2.tar.bz2
cd apr-util-1.3.2; ./configure --with-apr=/usr/local/apr --with-expat=/usr/local/expat
make; make install

Copy /usr/local/apr-1.3.2/include/apr-1/* into the /usr/local/apr-1.3.2/include/ directory, because by default Ganglia looks in /usr/local/apr/include to find the APR library files.

4) Install rrdtool-1.2.27.tar.gz

tar xvzf rrdtool-1.2.27.tar.gz
cd rrdtool-1.2.27; ./configure --prefix=/usr/local/rrdtool
make; make install

5) Copy the apr-1 binaries into /usr/local/bin/; without this copy the Ganglia build fails.

cp /usr/local/apr/bin/apr-1* /usr/local/bin/

The error is as follows:

checking for apr
checking for apr-1-config... no
configure: error: apr-1-config binary not found in path

6) Install Ganglia

./configure --with-librrd=/opt/rrdtool-1.4.4 --with-gmetad --prefix=/usr/local/ganglia --with-libconfuse=/usr/local/confuse-2.6

7) make; make install

8) Install the Apache server and PHP support

yum -y install httpd mysqld php-mysql php



Client:

wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm

scp apr-*.* 10.250.13.45:~/
scp libconfuse-*.* 10.250.13.45:~/
scp ganglia-*.gz 10.250.13.45:~/
scp ganglia-devel-*.rpm 10.250.13.45:~/
scp *.conf 10.250.13.45:~/

ssh 10.250.13.45
sudo su -
yum install expat



cd /home/admin
tar -xvf apr-1.4.*.gz
cd apr*
./configure --prefix=/usr/local/apr
make
make install

cd ..
tar -xvf apr-util-1.3.9.*
cd apr-util*
./configure --with-apr=/usr/local/apr
make
make install

cd ..
rpm -ivh libconfuse-2.5-4.el5.x86_64.rpm
rpm -ivh libconfuse-devel-2.5-4.el5.x86_64.rpm

tar -xvf ganglia-3.1.*.gz
cd ganglia*
cp /usr/local/apr/bin/apr-1* /usr/local/bin/
./configure --with-apr=/usr/local/apr
find / -name "libpython2.5*"
cp /usr/local/lib/libpython2.5.so /usr/lib/libpython2.5.so
make
make install

cd ..
rpm -ivh ganglia-devel-3.1.1-1.x86_64.rpm --nodeps
cd /etc
mkdir ganglia
cp /home/admin/*.conf /etc/ganglia/
cd /etc/ganglia
vi gmond.conf    # edit the UDP send and recv host
vi /usr/local/etc/gmond.conf



gmond --debug=10
ps -e | grep gmond
kill -9 ID
gmond

If necessary, modify gmond.conf again.

scp test 10.250.13.42:~/
scp test 10.250.13.43:~/
scp test 10.250.13.44:~/
scp test 10.250.13.45:~/

vi /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/usr/local/lib64/"
source /etc/profile

Problems and Solutions

1. Installation Issues

A library file is missing. This usually shows up during make, when ld cannot find the corresponding library, such as libpython2.5.so.

Workaround: use the find command to locate the file, then use ln -s to create a symbolic link to it. For example: find / -name "libpython*"

A dependency error occurs during installation. This usually happens at the configure step.

Workaround: first look for the library with find. If it is found, read the README to see whether a configure parameter lets you specify its path. If that does not work, consider copying the library to the default directory. If that fails as well, you can add the --nodeps parameter and then download the library. It is usually contained in the corresponding devel package, so search online for the package that contains the library and install it.

2. Configuration and Operation issues

Test whether gmond and gmetad are running successfully:

telnet localhost 8649

telnet localhost 8651

If there is no response:

Workaround: most likely the service is not started or is not using the default port. Run ps -e | grep gmond to check whether the service is running, and check the TCP port configured in gmond.conf.

If you cannot find the reason, start gmond in debug mode to see what is wrong:

gmond --debug=10

If there is a port binding error, for example a UDP port that is already bound, check whether the port is in use with lsof -i:port.

The configuration file may also be wrong. For example, I once set the host of udp_recv_channel to the same value as in udp_send_channel and got a port error; the host of udp_recv_channel must be a local IP (a single machine may have several IPs). If permission is denied, check the current user identity or switch to root.
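The telnet test can also be scripted, which is handy when checking several nodes. The following is a minimal sketch that only checks whether a TCP port accepts connections; the function name port_is_open is hypothetical, and the ports in the usage note are the defaults mentioned in this article.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    for name, port in (("gmond", 8649), ("gmetad", 8651)):
        state = "responding" if port_is_open("localhost", port) else "not responding"
        print(f"{name} on port {port}: {state}")
```

A port that is not responding leads back to the same checks as above: whether the service is running and which port it is configured to use.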

Test whether the PHP front end works:

http://localhost/ganglia

The PHP page is shown as plain text, or the browser prompts you to download the file

Workaround: the Apache PHP module is not installed or configured properly. Use yum install, or download the PHP module again, and configure it in Apache's conf file.

The page is displayed but no images are shown

First check whether SELinux is turned off.

Then check that the RRDtool path in the conf.php file is correct and that the file exists. The path must point to the RRDtool executable, not to its installation directory.

Then check that /var/lib/ganglia/rrds exists and is writable. chown nobody:nobody /var/lib/ganglia/rrds # make sure RRDtool can write here.

Check that the gmetad address and port in conf.php are correct.
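Two of the checks above, the RRDtool path and the rrds directory, are easy to script. The sketch below is a hypothetical helper, and it tests permissions for the user who runs it, which is not necessarily the user the web server or gmetad runs as. The rrdtool path in the usage note is only an example.

```python
import os


def check_frontend_paths(rrdtool_path, rrds_dir):
    """Return a list of problems found with the two paths (empty if none)."""
    problems = []
    if not os.path.isfile(rrdtool_path):
        # A directory here usually means the install dir was configured
        # instead of the executable itself.
        problems.append(f"{rrdtool_path} is not a file")
    elif not os.access(rrdtool_path, os.X_OK):
        problems.append(f"{rrdtool_path} is not executable")
    if not os.path.isdir(rrds_dir):
        problems.append(f"{rrds_dir} does not exist")
    elif not os.access(rrds_dir, os.W_OK):
        problems.append(f"{rrds_dir} is not writable")
    return problems
```

For example, check_frontend_paths("/usr/local/rrdtool/bin/rrdtool", "/var/lib/ganglia/rrds") returns an empty list when both paths look usable.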




------------------------------------------------------------------------------------------------------------------------------------------

Now let's look at how to install Nagios.

1. Download the installation packages

The latest Nagios Core and Nagios plugin installation packages can be downloaded from http://www.nagios.org/download.



2. Create a user



Switch to the root user

/usr/sbin/useradd nagios

passwd nagios

Create a user group named nagcmd, which is used to execute external commands from the web interface. Add both the nagios user and the apache user to this group.

/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd apache



3. Installing Nagios



tar xzf nagios-3.0.6.tar.gz
cd nagios-3.0.6
Run the Nagios configure script, using the user and group created earlier:

./configure --with-command-group=nagcmd

Compile the Nagios source code:

make all -j8

make install
make install-init
make install-config
make install-commandmode

make install-webconf    # install the Nagios web configuration file into Apache's conf.d directory

4. Installing the Nagios plugins

./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios

make -j8

make install

5. Basic Configuration

The configuration files are located in /usr/local/nagios/etc by default. A few simple changes are enough to try out Nagios.

Edit /usr/local/nagios/etc/objects/contacts.cfg and change the email address in the nagiosadmin contact definition to the address where you want to receive alerts. (Verify that your system can send mail to that recipient.)

6. Configuring the Web Interface

Create a nagiosadmin user to log in to the Nagios web interface. Make a note of the password you set; you will need it in a moment.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart the Apache service for the settings to take effect.
service httpd restart

7. Start Nagios

chkconfig --add nagios
chkconfig nagios on
service nagios start

You can now visit http://localhost/nagios and see Nagios running.





Some other software may be needed during the Nagios installation, such as Apache. I installed Nagios on the system where Ganglia was already installed, so if your system reports missing dependencies when installing Nagios, install them as prompted.

The most troublesome problems I encountered during the actual installation were:
1. Nagios would not start. The system log showed that /usr/local/nagios/var/rw/nagios.cmd could not be created. I found that there was no rw directory, so I created it, changed its owner to nagios:nagcmd, and set its permissions to drw-rw----. Unfortunately, the error message persisted. Finally I changed the permissions directly to drw-rw-rw-, which works, but I have not yet found the root cause.
2. A similar error. When viewing the log in Nagios, it reports that the permissions of /usr/local/nagios/var/archives are not set appropriately. The workaround is the same as above.
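One thing worth checking in cases like these: to create a file inside a directory, a user needs both the write bit and the execute (search) bit on that directory, and a mode such as drw-rw---- has no execute bit at all. This may not be the whole story here, since the mode that finally worked has no execute bit either, but it is a common cause of "could not create" errors. The sketch below is a hypothetical helper that inspects a mode for this; it does not account for root, which bypasses these checks.

```python
import stat


def can_create_in_dir(mode, who):
    """Return True if the permission bits let `who` create files in a directory.

    mode is a numeric mode such as 0o770 or os.stat(path).st_mode.
    who is "owner", "group", or "other".
    Creating a file needs both write and execute (search) permission
    on the directory.
    """
    bits = {
        "owner": (stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IWGRP, stat.S_IXGRP),
        "other": (stat.S_IWOTH, stat.S_IXOTH),
    }
    write_bit, search_bit = bits[who]
    return bool(mode & write_bit) and bool(mode & search_bit)
```

For example, can_create_in_dir(0o660, "group") is False, while can_create_in_dir(0o770, "group") is True.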

Overall, the installation process went fairly smoothly.
