Deploying Ganglia to monitor Hadoop and HBase
Performance problems come up regularly in Hadoop operations and maintenance, and they usually cannot be diagnosed from the web pages and logs alone; detailed metrics are needed. Ganglia is one of the more practical monitoring tools for collecting them.
Many guides to deploying Ganglia can be found through Baidu. This article combines that shared experience with the problems encountered during my own installation.
1. Prepare two machines
Server
192.168.0.11 (gmetad, web, gmond-master)
Client
192.168.0.12 (gmond)
2. Software packages to install on the server
- Install the epel repository package: yum install -y epel-release (fixes the problem that some packages cannot be installed by yum)
- Install gmetad: yum install -y ganglia-gmetad ganglia-devel
- Install gmond: yum install -y ganglia-gmond-python
- Install rrdtool: yum install -y rrdtool-devel
- Install the httpd server: yum install -y httpd
- Install ganglia-web and php: yum install -y ganglia-web php
- Install other dependent packages: yum install -y apr-devel zlib-devel libconfuse-devel expat-devel pcre-devel
3. Software packages to install on the monitored nodes
- Install the epel repository package: yum install -y epel-release (fixes the problem that some packages cannot be installed by yum)
- Install gmond: yum install -y ganglia-gmond-python
4. Installation directories
- Ganglia configuration directory: /etc/ganglia
- RRD database directory: /var/lib/ganglia/rrds
- httpd main site directory: /var/www/html
- ganglia-web installation directory: /usr/share/ganglia
- ganglia-web httpd configuration file: /etc/httpd/conf.d/ganglia.conf
5. Disable SELinux
# vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled
Restart the machine for the change to take effect.
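The edit in step 5 can also be scripted. The following is only a minimal sketch and not part of the original procedure: the function name set_selinux_mode is made up for illustration, and the config path is passed as an argument so the function can be tried on a copy of the file first.

```shell
# Hypothetical helper: rewrite the SELINUX= line in the given config file.
# Usage: set_selinux_mode /etc/selinux/config disabled
set_selinux_mode() {
    conf_file=$1
    mode=$2
    tmp_file=$(mktemp) || return 1
    # Replace only the line that starts with SELINUX=; SELINUXTYPE= is left alone.
    sed "s/^SELINUX=.*/SELINUX=${mode}/" "$conf_file" > "$tmp_file" || return 1
    # Write back through cat so the original file's owner and mode are kept.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}
```

As with the manual edit, a reboot is still needed afterwards.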
6. Disable the firewall
# chkconfig iptables off
# chkconfig iptables --list
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
7. Configure /etc/ganglia/gmetad.conf
Modify data_source:
# Address and port (the tcp_accept_channel) of the gmond that gmetad collects data from
data_source "testcluster" 192.168.0.11:8650
8. Configure gmond
Edit /etc/ganglia/gmond.conf:
cluster {
  name = "testcluster" # Set the cluster name
  # owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
# Address and port of the target gmond that metrics are sent to (unicast)
udp_send_channel {
  host = 192.168.0.11
  port = 8649
  ttl = 1
}
# UDP port on which metrics are received
udp_recv_channel {
  port = 8649
}
# TCP port on which gmetad requests the collected data
tcp_accept_channel {
  port = 8650
  gzip_output = no
}
9. Configure the web front end
Using a symbolic link:
# ln -s /usr/share/ganglia /var/www/ganglia
You can also copy the contents of /usr/share/ganglia to /var/www/ganglia directly.
10. Modify /etc/httpd/conf.d/ganglia.conf:
Alias /ganglia /usr/share/ganglia
<Location /ganglia>
  Order deny,allow
  Allow from all
</Location>
11. Start the services
# service gmetad start
# service gmond start
# service httpd restart
At this point the Ganglia server is deployed.
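Before opening the web page, it can help to confirm that gmond answers on its tcp_accept_channel. gmond writes the cluster state as XML to any client that connects to that port, with one <HOST NAME="..."> element per reporting node. The following is a minimal sketch, assuming nc is installed on the machine; the function name list_gmond_hosts is made up for illustration.

```shell
# Hypothetical helper: read gmond XML on stdin, print one host name per line.
list_gmond_hosts() {
    # Keep only lines with a <HOST ...> opening tag and extract its NAME attribute.
    sed -n 's/^.*<HOST [^>]*NAME="\([^"]*\)".*$/\1/p'
}

# Example against the server from step 1 (not run here):
#   nc 192.168.0.11 8650 | list_gmond_hosts
```

An empty result at this stage would suggest that no gmond has reported to the server yet.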
Configure the client:
12. Only gmond needs to be configured on the client (install it first with yum install -y ganglia-gmond-python)
Edit /etc/ganglia/gmond.conf:
cluster {
  name = "testcluster" # Set the cluster name
  # owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
# Address and port of the target gmond that metrics are sent to (unicast)
udp_send_channel {
  host = 192.168.0.11
  port = 8649
  ttl = 1
}
# UDP port on which metrics are received
udp_recv_channel {
  port = 8649
}
# TCP port on which gmetad requests the collected data
tcp_accept_channel {
  port = 8650
  gzip_output = no
}
13. Integrate HDFS and YARN with Ganglia
Modify hadoop-metrics2.properties:
# For Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# Default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
# For the host and port, refer to the definition in gmond.conf.
namenode.sink.ganglia.servers=192.168.0.11:8649
datanode.sink.ganglia.servers=192.168.0.11:8649
resourcemanager.sink.ganglia.servers=192.168.0.11:8649
nodemanager.sink.ganglia.servers=192.168.0.11:8649
mrappmaster.sink.ganglia.servers=192.168.0.11:8649
jobhistoryserver.sink.ganglia.servers=192.168.0.11:8649
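The six servers lines in step 13 differ only in the daemon prefix, so they can be generated rather than typed. The following is a minimal sketch; the function name ganglia_sink_lines is made up for illustration, and the address is the gmond server used throughout this article.

```shell
# Hypothetical helper: print one "<daemon>.sink.ganglia.servers" line per daemon.
# Usage: ganglia_sink_lines HOST:PORT daemon...
ganglia_sink_lines() {
    server=$1
    shift
    for daemon in "$@"; do
        printf '%s.sink.ganglia.servers=%s\n' "$daemon" "$server"
    done
}

# Prints the six per-daemon lines used in this step.
ganglia_sink_lines 192.168.0.11:8649 \
    namenode datanode resourcemanager nodemanager mrappmaster jobhistoryserver
```

The output can be appended to hadoop-metrics2.properties after reviewing it.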
14. Integrate HBase with Ganglia
Modify hadoop-metrics2-hbase.properties:
*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# Default sampling period
*.period=10
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}
hbase.sink.ganglia.record.filter.exclude=*Regions*
hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.tagsForPrefix.jvm=ProcessName
*.sink.ganglia.period=20
# For the host and port, see the definition in gmond.conf.
hbase.sink.ganglia.servers=192.168.0.11:8649
15. Copy the configuration files to every machine to be monitored.
Copy hadoop-metrics2.properties to the $HADOOP_HOME/etc/hadoop/ directory
Copy hadoop-metrics2-hbase.properties to the $HBASE_HOME/conf directory
Restart Hadoop and HBase for the changes to take effect.
16. Start gmond on the monitored nodes
# service gmond start
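With more than a couple of nodes, the copy in step 15 is easier as a loop. The following is a minimal dry-run sketch: the function name print_copy_commands is made up for illustration, and it only prints scp commands instead of running them. It assumes both properties files are in the current directory and that $HADOOP_HOME and $HBASE_HOME have the same values on every node, because the variables are expanded on the machine that eventually runs the printed commands.

```shell
# Hypothetical helper: print the scp commands for each node (dry run).
# Nothing is copied; pipe the output to "sh" once it looks right.
print_copy_commands() {
    for node in "$@"; do
        printf 'scp hadoop-metrics2.properties %s:$HADOOP_HOME/etc/hadoop/\n' "$node"
        printf 'scp hadoop-metrics2-hbase.properties %s:$HBASE_HOME/conf/\n' "$node"
    done
}

# Example with the client from step 1:
print_copy_commands 192.168.0.12
```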
Problem summary:
1. The clients are sending data: the overall CPU load and other cluster-level information is visible.
2. However, the pages for the individual nodes are empty and show "no matching metrics detected or rrds not readable".
3. Look at the RRD directories.
# cd /var/lib/ganglia/rrds
# ll
drwxr-xr-x 5 ganglia 4096 Jan 17 azcluster
drwxr-xr-x 2 ganglia 36864 Jan 17 __SummaryInfo__
4. The host directory names are all in lower case.
# ll
drwxr-xr-x 2 ganglia 32768 Jan 17 azcbetadnl05.envazure.com
drwxr-xr-x 2 ganglia 4096 Jan 17 azcbetaldapl01.envazure.com
drwxr-xr-x 2 ganglia 36864 Jan 17 __SummaryInfo__
5. The data itself has all arrived.
# ls azcbetadnl05.envazure.com/ | more
boottime.rrd
bytes_in.rrd
bytes_out.rrd
cpu_aidle.rrd
disk_free_absolute_data1.rrd
disk_free_absolute_data2.rrd
disk_free_absolute_data3.rrd
disk_free_absolute_data4.rrd
disk_free_absolute_data5.rrd
disk_free_absolute_dev_shm.rrd
disk_free_absolute_mnt_resource.rrd
......
6. Cause: the per-node directories under /var/lib/ganglia/rrds are created in lower case. If a node's hostname contains upper-case letters, its data cannot be found.
Solution: edit gmetad.conf and set case_sensitive_hostnames to 1.
# ls /etc/ganglia/
drwxr-xr-x 2 root 4096 Jan 17 08:36 conf.d
-rw-r-- 1 root 171 Oct 12 2015 conf.php
-rw-r-- 1 root 9834 Jan 17 08:44 gmetad.conf
-rw-r-- 1 root 8756 Jan 17 08:45 gmond.conf
# vi gmetad.conf
# In earlier versions of gmetad, hostnames were handled in a case
# sensitive manner
# If your hostname directories have been renamed to lower case,
# set this option to 0 to disable backward compatibility.
# From version 3.2, backwards compatibility will be disabled by default.
# default: 1 (for gmetad < 3.2)
# default: 0 (for gmetad >= 3.2)
case_sensitive_hostnames 1 # when set to 1, upper case is not converted to lower case
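Whether a node is affected by this problem depends only on whether its host name contains upper-case letters, which is easy to check up front. The following is a minimal sketch; the function name has_uppercase is made up for illustration, and the host name in the example is one of those shown in this article.

```shell
# Hypothetical helper: succeed if the given host name contains an upper-case letter.
has_uppercase() {
    case $1 in
        *[[:upper:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "affected" because the name contains upper-case letters.
has_uppercase "AZcbetadnL05.envazure.com" && echo "affected"
```

On a monitored node, the same check could be run against the output of the hostname command.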
7. After the modification, check the RRD directory. Nothing has changed yet:
# cd /var/lib/ganglia/rrds/azcluster
# ls -al
drwxr-xr-x 2 ganglia 32768 Jan 17 azcbetadnl05.envazure.com
drwxr-xr-x 2 ganglia 4096 Jan 17 azcbetaldapl01.envazure.com
drwxr-xr-x 2 ganglia 36864 Jan 17 __SummaryInfo__
8. Restart gmetad for the configuration to take effect.
# service gmetad restart
Shutting down GANGLIA gmetad: [OK]
Starting GANGLIA gmetad: [OK]
9. The directories with the upper-case host names have now been created (marked with <).
# ls -al
drwxr-xr-x 2 ganglia 32768 Jan 18 azcbetadnl05.envazure.com
drwxr-xr-x 2 ganglia 4096 Jan 18 AZcbetadnL05.envazure.com <
drwxr-xr-x 2 ganglia 4096 Jan 17 azcbetaldapl01.envazure.com
drwxr-xr-x 2 ganglia 4096 Jan 18 AZcbetaLDAPL01.envazure.com <
drwxr-xr-x 2 ganglia 36864 Jan 18 __SummaryInfo__
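After the restart, each affected node has two directories whose names differ only in case. On a larger cluster these pairs can be listed instead of spotted by eye. The following is a minimal sketch; the function name find_case_duplicates is made up for illustration, and it reads names on standard input so that it does not depend on any particular directory.

```shell
# Hypothetical helper: read names on stdin, print (in lower case) every name
# that occurs more than once when case is ignored.
find_case_duplicates() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example (not run here):
#   ls /var/lib/ganglia/rrds/azcluster | find_case_duplicates
```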
10. The data is now arriving in the new directories.
# ls -l AZcbetaLDAPL01.envazure.com
-rw- 1 ganglia 630760 Jan 18 boottime.rrd
-rw- 1 ganglia 630760 Jan 18 bytes_in.rrd
-rw- 1 ganglia 630760 Jan 18 bytes_out.rrd
-rw- 1 ganglia 630760 Jan 18 cpu_aidle.rrd
11. Check the web page again. The per-node metrics are now displayed normally.
Permanent link to this article: https://www.bkjia.com/Linux/2018-03/151488.htm