1. Problem description
How the Ganglia components are installed (I did not install them myself; this is all I know):
The cluster has 4 machines in total, 192.168.121.34-37. gmetad, gweb and httpd are installed on 192.168.121.34, and gmond is installed on all four machines (34, 35, 36, 37).
Visiting the ganglia-web home page at http://192.168.121.34/ganglia-web/ produced the following error:
There was an error collecting ganglia data (127.0.0.1:8652): fsockopen error: Connection refused
Checking the gmetad status reported: gmetad dead but subsys locked
Checking gmond on 192.168.121.34, 35, 36 and 37 with service gmond status showed everything normal: gmond (pid 30260) is running...
One article attributes this error to the ownership of the /var/lib/ganglia/rrds/ directory: the owner should be nobody and the group should be root.
But my directory is already owned by nobody with group root, so that was not the cause here.
Running telnet 192.168.121.34 8652 from the 192.168.121.35 machine reported: Connection refused
As root, netstat -anp | grep 8652 returned nothing, so no process was listening on port 8652.
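The telnet and netstat checks above can also be scripted. The following is only a minimal sketch in Python; the function name port_is_open is made up for illustration, and the host and port are the ones from the error message.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With gmetad down as described above, port_is_open("192.168.121.34", 8652) would be expected to return False, which matches the "Connection refused" seen with telnet.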
Other articles suggest various changes to the configuration files; I did not try them.
Later I found that the rrds directory had grown too large: the partition it lives on was 97% full.
HDFS, which is installed on the same cluster, was also reporting insufficient disk space warnings.
After cleaning up the disk and restarting the gmetad service (service gmetad restart), it actually worked. That's weird.
At this point, the ganglia-web home page showed the various monitoring graphs again.
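Since the cause turned out to be a nearly full partition, a small check like the one below could catch it earlier. This is a sketch, not part of Ganglia: the function names and the 90% threshold are assumptions, and the percentage is computed as used/total, which can differ slightly from the Use% column that df prints.

```python
import shutil


def percent_used(path: str) -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """Return True if the filesystem holding `path` is at or above `threshold` percent."""
    return percent_used(path) >= threshold
```

On the gmetad machine this could be called as is_nearly_full("/var/lib/ganglia/rrds"), the directory mentioned above.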
2. Some Ganglia basics
① The Ganglia monitoring system consists of three main parts: gmond, gmetad, and the web interface (ganglia-web).
gmond is installed on every machine to be monitored and collects its monitoring metrics. It can act as a sender, a receiver, or both: it collects metrics such as CPU utilization and system load on its own machine, and it can send what it collects to gmond instances installed on other machines.
gmetad periodically polls each gmond and stores the metrics collected by each gmond in RRD (round-robin database) files.
ganglia-web needs to be installed on the same machine as gmetad because it needs access to gmetad's RRD files; it displays the metrics in those files as a web interface.
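② By default, gmond uses UDP port 8649 to exchange metrics, and gmetad uses TCP port 8649 to download metrics from each gmond. gmetad itself listens by default on TCP port 8651 (XML dump) and TCP port 8652 (interactive queries); ganglia-web reads from 8652, which is why the error above mentions 127.0.0.1:8652.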
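To illustrate the TCP side of this: a gmond with the default configuration writes its current state as XML to any client that connects to TCP port 8649 and then closes the connection. The sketch below reads that dump and lists the hosts in it. The function names are made up for illustration, and it assumes the gmond in question accepts connections from the machine running the script.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Connect to host:port and read until the peer closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means the peer closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes: bytes) -> list:
    """Return the NAME attribute of every HOST element in a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME") for host in root.iter("HOST")]
```

For example, host_names(fetch_xml("192.168.121.34")) should list the machines that this gmond knows about, which is a quick way to confirm that the gmond side is healthy even when gmetad is down.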
More of the basics are covered here: https://github.com/ganglia/monitor-core/wiki/Ganglia-Quick-Start
That page also describes how to set up Ganglia to monitor multiple clusters.
Ganglia introduction and solving the "fsockopen error: Connection refused" problem