Ganglia is cluster monitoring software developed at UC Berkeley. It monitors and displays status information for the nodes in a cluster, such as CPU, memory, and disk utilization, I/O load, and network traffic, and it presents historical data as graphs through PHP pages.
Ganglia relies on a web server to display the state of the cluster and uses RRDtool to store data and generate graphs. XML parsing requires expat, and configuration file parsing requires libconfuse. The Apache httpd installation also needs PHP 4 or later, plus a few other dependencies.
As monitoring software, Ganglia is most commonly used in Linux environments, and it is good at collecting data from nodes on demand at low cost. It is weaker at notifying users of alerts and events. Recent versions of Ganglia have some of this functionality, but Nagios is better at it: Nagios is software that specializes in alerting. By using the data collected by Ganglia as a data source for Nagios, and then using Nagios to send alert notifications, the two together form a complete monitoring and management system.
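One way the combination could look in practice is a small Nagios-style check that reads a metric out of the XML served by gmond. The following is only a minimal sketch: it assumes the XML contains METRIC elements with NAME and VAL attributes, the function names ganglia_metric_value and nagios_status are hypothetical, and the thresholds are illustrative. The exit codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical helper: print the VAL attribute of one metric from gmond-style XML on stdin.
ganglia_metric_value() {
    # $1 = metric name, for example load_one
    sed -n "s/.*<METRIC NAME=\"$1\" VAL=\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Hypothetical helper: map a value to a Nagios plugin exit status.
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN (no value found).
nagios_status() {
    # $1 = value, $2 = warning threshold, $3 = critical threshold
    [ -n "$1" ] || return 3
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 >= c + 0) exit 2
        if (v + 0 >= w + 0) exit 1
        exit 0
    }'
}
```

Assuming a tool such as nc is available, the XML could be fetched with something like nc localhost 8649 | ganglia_metric_value load_one, and the result passed to nagios_status together with the two thresholds.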
--------------------------------------------------------------------------------------------------------------- ---------------------------
Ganglia Installation Instructions
On Red Hat, these dependencies can be installed with the following command:
yum -y install apr-devel apr-util check-devel cairo-devel pango-devel libxml2-devel rpmbuild glib2-devel dbus-devel freetype-devel fontconfig-devel gcc-c++ expat-devel python-devel libXrender-devel
libconfuse can be downloaded with the following commands:
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
1.2 Installation and configuration steps
1.2.1 Installation
Here we build and install from source. Download the latest version of Ganglia from http://ganglia.info and unpack it.
./configure --with-librrd=/rrd/path --with-gmetad --prefix=/usr/local/ganglia
make
make install
If the build stops because some software is missing, install the missing package. After installation, Ganglia needs to be configured. The configuration file is usually placed in the /etc/ganglia directory and is named gmetad.conf. The path is not strictly required, because gmetad can be told which configuration file to use at startup.
After installing Ganglia, you also need to install the Apache server with PHP module support; otherwise the final pages will not display properly. Installing both with yum is recommended:
yum install httpd php
If they are installed another way and not configured correctly, Apache may not be properly associated with PHP.
After installation, write a simple PHP page and open http://localhost/test.php to test whether the installation succeeded.
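A test page for this check can be very small. The sketch below writes one that only calls phpinfo(); the function name write_php_test_page is hypothetical, and the default web root /var/www/html is an assumption based on the directory used later in this article.

```shell
# Hypothetical helper: write a minimal PHP test page into a web root.
# $1 = web root directory (assumed default: /var/www/html)
write_php_test_page() {
    docroot="${1:-/var/www/html}"
    printf '%s\n' '<?php phpinfo(); ?>' > "$docroot/test.php"
}
```

If the browser renders the PHP information table at http://localhost/test.php, Apache is handing the page to PHP; if it shows the raw text or offers a download, the PHP module is not active.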
1.2.2 Configuration
If you installed from source with the --prefix shown earlier, Ganglia is installed in the /usr/local/ganglia directory.
First, create a directory to hold the Ganglia web pages:
mkdir -p /var/www/html/ganglia/
This directory stores the web pages that are later used to display the data.
Because Ganglia was compiled from source, gmetad and gmond have not been registered as services. Execute the following commands from the source directory.
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad   # copy the gmetad service startup script
cp gmond/gmond.init /etc/rc.d/init.d/gmond   # copy the gmond service startup script
mkdir /etc/ganglia   # create the configuration directory
gmond -t | tee /etc/ganglia/gmond.conf   # generate the gmond configuration file
cp gmetad/gmetad.conf /etc/ganglia/   # copy the gmetad configuration file
mkdir -p /var/lib/ganglia/rrds   # create the RRD file storage directory
chown nobody:nobody /var/lib/ganglia/rrds   # owner and group are both nobody
chkconfig --add gmetad   # hand the service over to chkconfig
chkconfig --add gmond   # same as above
In the configuration file /etc/ganglia/gmetad.conf, usually only the following parameter needs to be changed:
data_source "ClusterName" host1 host2
Change the cluster name to your own. host1 and host2 are the data sources from which gmetad fetches the XML describing the cluster. If no port is written, the default port 8649 is used, and by default gmetad connects to the host over TCP every 15 seconds to download the XML. A data source can therefore be port 8649 of a gmond or port 8651 of another gmetad, since both serve cluster information in XML format.
host1 and host2 are alternatives: if the download from host1 fails, gmetad tries host2, so they should be nodes of the same cluster that hold the same data. In multicast mode, every gmond node holds the monitoring data for all machines in the cluster, so it is not necessary to list every node in data_source. Listing at least two is recommended, so that when host1 goes down gmetad automatically fetches the data from host2.
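The data_source line can also carry an explicit polling interval and explicit ports. As a rough sketch, the hypothetical helper below prints such a line; the host names and the 15-second interval are only illustrative values taken from the description above.

```shell
# Hypothetical helper: print a gmetad.conf data_source line.
# $1 = cluster name, $2 = polling interval in seconds, remaining args = host[:port]
make_data_source() {
    name="$1"
    interval="$2"
    shift 2
    printf 'data_source "%s" %s' "$name" "$interval"
    for h in "$@"; do
        printf ' %s' "$h"
    done
    printf '\n'
}
```

For example, make_data_source Cluster1 15 host1:8649 host2:8649 prints a line that polls host1 every 15 seconds and falls back to host2.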
In addition, gmetad has the following settings:
RRD database storage definition
RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
Access control
trusted_hosts address1 address2 ... DN1 DN2 ...
all_trusted off/on
RRD files location (the directory where the RRD data is kept)
rrd_rootdir "/var/lib/ganglia/rrds"
Network
xml_port 8651 # you can telnet to this port to get the gmetad XML
interactive_port 8652 # port used by the PHP pages to exchange data
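Each entry in the RRAs setting has the form RRA:AVERAGE:xff:steps:rows, so the history it covers is the base step times steps times rows. The sketch below does that arithmetic; it assumes the base RRD step equals the 15-second polling interval mentioned earlier, and the function name rra_retention_seconds is hypothetical.

```shell
# Hypothetical helper: seconds of history covered by one RRA definition.
# $1 = RRA definition such as RRA:AVERAGE:0.5:24:244
# $2 = base step in seconds (assumed default: 15)
rra_retention_seconds() {
    spec="$1"
    step="${2:-15}"
    per_row=$(printf '%s' "$spec" | cut -d: -f4)   # base steps averaged into one row
    rows=$(printf '%s' "$spec" | cut -d: -f5)      # number of rows kept
    echo $(( step * per_row * rows ))
}
```

Under that assumption the first archive keeps 3660 seconds (about an hour) at full resolution, and the last one keeps 32313600 seconds, which is 374 days with one averaged row per day.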
1.2.3 Configuration of the PHP pages
Find the following file under the /var/www/html/ganglia/ directory:
conf.php
$gmetad_root = "/var/lib/ganglia"; # path of the RRD database written by gmetad
$rrds = "$gmetad_root/rrds";
$ganglia_ip = "localhost"; # address of the gmetad server
$ganglia_port = 8652; # interactive port on which the gmetad server provides monitoring data
By default, the web front end refreshes every 300 seconds (5 minutes). You can change the refresh interval in the conf.php file, which holds all of the Ganglia web parameters.
1.2.4 Ganglia Client Configuration
vi /etc/ganglia/gmond.conf
Three places mainly need to be modified: the cluster name, udp_send_channel, and udp_recv_channel. Note the difference between unicast and multicast mode. In multicast mode, every node that joins the multicast group receives data from all the other nodes in the group, so each node effectively holds a backup copy. In unicast mode, data is sent point to point to a specific host only, which is typically a central collection node.
cluster {
name = "Cluster1" # which cluster this node belongs to
owner = "Chifeng" # who owns this node
latlong = "unspecified" # coordinates on Earth (latitude and longitude)
url = "unspecified"
}
udp_send_channel { # channel for sending UDP packets
mcast_join = 239.2.11.71 # multicast: works on the 239.2.11.71 group. In unicast mode, write host = host1 (the target host that receives the data) instead; several udp_send_channel blocks can be configured in unicast mode
port = 8649 # destination port
ttl = 1
}
udp_recv_channel { # configuration for receiving UDP packets
mcast_join = 239.2.11.71 # works on the same 239.2.11.71 group. In unicast mode, write host = localip, which must be an IP of the local machine
port = 8649 # listening port
bind = 239.2.11.71 # bind address
}
tcp_accept_channel {
port = 8649 # port listened on over TCP; remote clients can connect to port 8649 to get the monitoring data, and gmetad fetches its XML from this port
}
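For comparison with the multicast listing above, a unicast sender replaces mcast_join with host. The sketch below prints such a block; the function name unicast_send_channel is hypothetical and the collector address used in the example is made up.

```shell
# Hypothetical helper: print a unicast udp_send_channel block for gmond.conf.
# $1 = address of the central collection node, $2 = port (default 8649)
unicast_send_channel() {
    printf 'udp_send_channel {\n  host = %s\n  port = %s\n  ttl = 1\n}\n' "$1" "${2:-8649}"
}
```

Running unicast_send_channel 10.0.0.1 prints a block that sends to port 8649 on 10.0.0.1; one such block per collector would be needed if data should go to several hosts.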
Other configuration items are usually left unchanged. Their meanings are as follows:
collection_group section:
collect_once: specifies that the group contains static metrics, collected only once
collect_every: collection interval in seconds (only valid for non-static metrics)
time_threshold: maximum interval between data sends
metric section:
name: metric name (see "gmond -m")
value_threshold: metric variance threshold (send if exceeded)
An example:
collection_group {
collect_every = 80
time_threshold = 950
metric {
name = "proc_run"
value_threshold = "1.0"
}
metric {
name = "proc_total"
value_threshold = "1.0"
}
}
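The interplay of time_threshold and value_threshold can be pictured as a simple rule: send when too much time has passed, or when the value has moved by more than the threshold. The sketch below is a simplified model of that rule and not gmond's actual code; the function name should_send is hypothetical.

```shell
# Simplified model (an assumption, not gmond's implementation) of the send decision.
# $1 = seconds since the last send, $2 = time_threshold
# $3 = last sent value, $4 = current value, $5 = value_threshold
# Returns 0 (true) if the metric should be sent, 1 otherwise.
should_send() {
    awk -v dt="$1" -v tt="$2" -v prev="$3" -v cur="$4" -v vt="$5" 'BEGIN {
        diff = cur - prev
        if (diff < 0) diff = -diff
        if (dt + 0 >= tt + 0 || diff > vt + 0) exit 0
        exit 1
    }'
}
```

With the example values above, should_send 100 950 3 5 1.0 succeeds because the value changed by 2, while a change of only 0.5 after 100 seconds would not trigger a send until 950 seconds have elapsed.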
1.3 Command Set
Description: This is the set of command-line commands I used during installation and configuration. It can serve as a basis for automating the deployment, for example by turning the steps into a deployment script later.
Server side:
1) Install expat-2.0.1.tar.gz
tar xvzf expat-2.0.1.tar.gz
cd expat*; ./configure --prefix=/usr/local/expat; make; make install
2) Install confuse-2.6
./configure --prefix=/usr/local/confuse-2.6 CFLAGS=-fPIC --disable-nls; make; make install
3) Install APR
tar xvjf apr-1.3.2.tar.bz2
cd apr-1.3.2; ./configure --prefix=/usr/local/apr; make; make install
Install apr-util-1.3.2.tar.bz2
tar xvjf apr-util-1.3.2.tar.bz2
cd apr-util-1.3.2; ./configure --with-apr=/usr/local/apr --with-expat=/usr/local/expat
make; make install
cp /usr/local/apr-1.3.2/include/apr-1/* /usr/local/apr-1.3.2/include/
This copy is needed because by default the Ganglia build looks in /usr/local/apr/include for the APR library files.
4) Install rrdtool-1.2.27.tar.gz
tar xvzf rrdtool-1.2.27.tar.gz
cd rrdtool-1.2.27; ./configure --prefix=/usr/local/rrdtool
make; make install
5) cp /usr/local/apr/bin/apr-1* /usr/local/bin/
After this copy the build works; without it, configure fails.
The error is as follows:
checking for apr
checking for apr-1-config... no
configure: error: apr-1-config binary not found in path
6) Install Ganglia
./configure --with-librrd=/opt/rrdtool-1.4.4 --with-gmetad --prefix=/usr/local/ganglia --with-libconfuse=/usr/local/confuse-2.6
7) make; make install
8) Install the Apache server and PHP support
yum -y install httpd mysqld php-mysql php
Client:
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
wget http://download.fedora.redhat.co .... 5-4.el5.x86_64.rpm
scp apr-*.* 10.250.13.45:~/
scp libconfuse-*.* 10.250.13.45:~/
scp ganglia-*.gz 10.250.13.45:~/
scp ganglia-devel-*.rpm 10.250.13.45:~/
scp *.conf 10.250.13.45:~/
ssh 10.250.13.45
sudo su -
yum install expat
cd /home/admin
tar -xvf apr-1.4.*.gz
cd apr*
./configure --prefix=/usr/local/apr
make
make install
cd ..
tar -xvf apr-util-1.3.9.*
cd apr-util*
./configure --with-apr=/usr/local/apr
make
make install
cd ..
rpm -ivh libconfuse-2.5-4.el5.x86_64.rpm
rpm -ivh libconfuse-devel-2.5-4.el5.x86_64.rpm
tar -xvf ganglia-3.1.*.gz
cd ganglia*
cp /usr/local/apr/bin/apr-1* /usr/local/bin/
./configure --with-apr=/usr/local/apr
find / -name "libpython2.5*"
cp /usr/local/lib/libpython2.5.so /usr/lib/libpython2.5.so
make
make install
cd ..
rpm -ivh ganglia-devel-3.1.1-1.x86_64.rpm --nodeps
cd /etc
mkdir ganglia
cp /home/admin/*.conf /etc/ganglia/
cd /etc/ganglia
vi gmond.conf   # edit the UDP send and recv host
vi /usr/local/etc/gmond.conf
gmond --debug=10
ps -e | grep gmond
kill -9 ID
gmond
If necessary, modify gmond.conf again.
scp test 10.250.13.42:~/
scp test 10.250.13.43:~/
scp test 10.250.13.44:~/
scp test 10.250.13.45:~/
vi /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/usr/local/lib64/"
source /etc/profile
Problems and Solutions
1. Installation Issues
A library file is missing. This usually occurs during make, when ld cannot find a library such as libpython2.5.so.
Workaround: Use the find command to locate the files, then create symbolic links to them with ln -s. For example: find / -name "libpython*"
A dependency error occurs during installation. This usually happens at the configure step.
Workaround: First look for the dependency with find. If it is found, read the README to see whether a parameter lets you specify its path. If not, consider copying it to the default directory. If that does not work either, you can add the --nodeps parameter and then download the library, which is usually contained in its devel package; you need to search online for the package that contains the library and then install it.
2. Configuration and Operation Issues
Test whether gmond and gmetad are running successfully:
telnet localhost 8649
telnet localhost 8651
If there is no response
Workaround: The service is probably not started or is not using the default port. Run ps -e | grep gmond to find out whether the service is running, and check the TCP port configured in gmond.conf.
If you cannot find the reason, start gmond in debug mode to see why:
gmond --debug=10
If there is a port binding error, for example a UDP port that is already bound, check whether the port is in use with lsof -i:port.
The configuration file may also be wrong. For example, I once set the host of udp_recv_channel to the same value as in udp_send_channel, and a port error occurred; the host of udp_recv_channel must be a local IP (a single machine may have several IPs). If permission is denied, check the current user identity or switch to root.
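Once the ports respond, the XML they return can be checked a little further, for example by counting how many hosts are reported. The sketch below assumes the XML contains one HOST element per node; the function name count_ganglia_hosts is hypothetical.

```shell
# Hypothetical helper: count <HOST ...> elements in gmond or gmetad XML read from stdin.
count_ganglia_hosts() {
    grep -o '<HOST ' | wc -l | tr -d ' '
}
```

Assuming a tool such as nc is installed, nc localhost 8649 | count_ganglia_hosts should print the number of nodes that gmond currently knows about; a count lower than expected suggests that some nodes are not sending data.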
Test whether the PHP front end works:
http://localhost/ganglia
The PHP page is displayed as a plain file, or the browser prompts to download it
Workaround: The Apache PHP module is not installed or configured properly. Install it with yum, or download the PHP module again, and configure it in Apache's conf file.
The page shows no images
First check whether SELinux is turned off.
Then check that the RRDtool path in the conf.php file is correct and that the file exists. The path must point to the RRDtool executable, not to its installation directory.
Then check that /var/lib/ganglia/rrds exists and is writable: chown nobody:nobody /var/lib/ganglia/rrds # make sure RRDtool can write here
Finally, check that the gmetad path, address, and port in conf.php are correct.
--------------------------------------------------------------------------------------------------------------- --------------------------------------------------------
Now let's look at how to install Nagios.
1. Download the installation package
The latest Nagios Core and Nagios plugin installation packages can be downloaded from http://www.nagios.org/download.
2. Create a user
Switch to root user
/usr/sbin/useradd nagios
passwd nagios
Create a user group named nagcmd for executing external commands from the web interface. Add both the nagios user and the apache user to this group.
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd apache
3. Installing Nagios
tar xzf nagios-3.0.6.tar.gz
cd nagios-3.0.6
Run the Nagios configure script, passing the user group created earlier:
./configure --with-command-group=nagcmd
Compile the Nagios source code and install it:
make all -j8
make install
make install-init
make install-config
make install-commandmode
make install-webconf # install the Nagios web configuration file into Apache's conf.d directory
4. Installing the Nagios Plugins
./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios
make -j8
make install
5. Basic Configuration
The configuration files are located in /usr/local/nagios/etc by default. Only a few simple changes are needed to try out Nagios.
Edit /usr/local/nagios/etc/objects/contacts.cfg and change the email address in the nagiosadmin contact definition to the address at which you want to receive alerts. (Verify that your system can send mail to that recipient.)
6. Configuring the Web Interface
Create a nagiosadmin user for logging in to the Nagios web interface. Make a note of the password you set; you will need it in a moment.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart the Apache service for the settings to take effect.
service httpd restart
7. Start Nagios
chkconfig --add nagios
chkconfig nagios on
service nagios start
You can now visit http://localhost/nagios and see Nagios, haha.
Some other software may be needed during the Nagios installation, such as Apache. I installed Nagios on the system where Ganglia was already installed, so if your system reports missing dependencies when installing Nagios, follow the prompts :)
The most troublesome problems I ran into during the actual installation were:
1. Nagios would not start. The system log showed that /usr/local/nagios/var/rw/nagios.cmd could not be created. I found that there was no rw directory, so I created it and changed its owner to nagios:nagcmd with permissions drw-rw----. Sadly, the error message persisted. In the end I changed the permissions directly to drw-rw-rw-, which worked, but I have not found the root cause yet.
2. A similar error. When viewing the log in Nagios, I was told that the permissions on /usr/local/nagios/var/archives were not set appropriately. The workaround is the same as above.
Overall, the installation process went fairly smoothly. After starting up, it looks like this:
Perfect cluster monitoring combination ganglia and Nagios