Collecting HBase metrics with Ganglia

Source: Internet
Author: User

Ganglia is an open-source monitoring project started at UC Berkeley and designed to scale to thousands of nodes. Each machine runs a gmond daemon that collects metric data (such as processor speed and memory usage) from the operating system and sends it to a designated host. Hosts that receive the metric data can display it and pass a summarized form up the hierarchy. This hierarchical structure is what lets Ganglia scale well. gmond imposes very little system load, so it can run on every machine in the cluster without affecting user performance.


Ganglia is mainly used to monitor system performance: CPU, memory, disk utilization, I/O load, and network traffic. Its graphs make it easy to see the working status of each node, which helps when adjusting and allocating system resources and improving overall system performance.


Both Hadoop and HBase support Ganglia out of the box, which shows how integral Ganglia is to the Hadoop ecosystem.


This article describes how to use Ganglia to collect HBase metrics and focuses on two problems:

(1) HBase emits too many metrics. How can they be filtered?

(2) How can changes to the metrics configuration (hadoop-metrics2-hbase.properties) be applied without restarting Hadoop or HBase?


1. HBase metrics configuration

Taking hbase-0.98 as an example, the file to configure is hadoop-metrics2-hbase.properties:


# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details

#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
*.period=10

# Below are some examples of sinks that could be used
# to monitor different hbase daemons.

# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics

# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics

# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics

# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-one.metrics

# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=172.18.144.198:8648


Ganglia 3.1 and later require this sink class: org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
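
Before restarting anything, it can help to check which settings in the properties file are actually active rather than commented out. The following is a minimal sketch of such a check; the helper name active_props and the sample file contents are illustrative assumptions, not part of HBase or Hadoop.

```shell
#!/bin/bash
# Print the active (non-blank, uncommented) lines of a properties file.
# active_props is a hypothetical helper name used only for this sketch.
active_props() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demo on a throwaway sample file that mimics the configuration above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# hbase.sink.file-all.filename=all.metrics
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31

hbase.sink.ganglia.servers=172.18.144.198:8648
EOF
active_props "$sample"
rm -f "$sample"
```

Run against the real hadoop-metrics2-hbase.properties, this should list only the settings the metrics system will read.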


Restart HBase and gmond, and many metrics appear in the Ganglia web interface. There are too many of them, however. Metrics are emitted at the region level: every region of every table produces its own set of metrics. The Ganglia RRD database shows this:

-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.Namespace_default_table_o_m_ocs_ordersplitamount_region_f15998ced89264146b3ec3888db625f6_metric_scanNext_max.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.Namespace_default_table_o_m_forest_maps_region_cb5490455403d92ff2e6acd17c2b3877_metric_get_95th_percentile.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.namespace_default_table_o_s_peking_orders_region_bc87e3d9b61ee8c14956914d407ad11c_metric_mutateCount.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.namespace_default_table_o_m_ocs_ordersplitamount_region_5ca4ba7d83369781b41c939d260fdcdd_metric_mutateCount.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.namespace_default_table_o_m_ocs_orderamount_region_3ecd8c8440eae5a1191ef9e6e523ea9f_metric_appendCount.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.Namespace_default_table_o_m_ocs_orderamount20140722_region_985afc00551d4fb68ceaf9188f5b9d12_metric_get_75th_percentile.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.namespace_default_table_o_m_ocs_orderamount20140722_region_9672e9b9ea759fc1aee838e4ae228fa9_metric_memStoreSize.rrd
-rw-rw-rw- 1 hadoop root 12216 Jul 23 16:40 regionserver.Regions.Namespace_default_table_o_m_chat_analysis_session20140722_region_d709618a5b60e44a03befe57ca480ef9_metric_get_num_ops.rrd
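
To get a sense of how much of the RRD directory is taken up by region-level metrics, the files can simply be counted. The sketch below assumes the naming pattern visible in the listing above (`_region_..._metric_...`); the helper names and the demo file names are made up for illustration, and the real RRD directory depends on how gmetad is configured.

```shell
#!/bin/bash
# Count region-level RRD files versus all RRD files in one directory.
# count_region_rrds and count_all_rrds are hypothetical helper names.
count_region_rrds() {
  find "$1" -maxdepth 1 -type f -name '*_region_*_metric_*.rrd' | wc -l | tr -d ' '
}

count_all_rrds() {
  find "$1" -maxdepth 1 -type f -name '*.rrd' | wc -l | tr -d ' '
}

# Demo on a temporary directory with made-up file names.
dir=$(mktemp -d)
touch "$dir/regionserver.Regions.Namespace_default_table_t1_region_abc_metric_get_num_ops.rrd"
touch "$dir/regionserver.Server.readRequestCount.rrd"
echo "region-level: $(count_region_rrds "$dir") of $(count_all_rrds "$dir")"
rm -rf "$dir"
```

Pointing both functions at a region server's RRD directory before and after filtering gives a quick measure of how many metrics the filter removes.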



The HBase reference guide (http://hbase.apache.org/book/hbase_metrics.html) includes a warning for Ganglia users:



Warning to Ganglia users: by default, HBase will emit a lot of metrics per RegionServer, which may swamp your installation. Options include either increasing Ganglia server capacity, or configuring HBase to emit fewer metrics.


Since HBase exposes this many metrics by default, we have to find a way to filter them and keep only the ones we need. The next section describes how.


2. Metric filtering

Fortunately, the Hadoop metrics system provides a filtering function. The monitoring plan for HBase metrics is:

(1) Master: remove the Assignment, Balancer, and FileSystem.MetaHlog metrics, as well as every percentile, max, median, and min; keep only the mean values.

(2) RegionServer: remove the WAL-related metrics, as well as every percentile, max, median, and min; keep only the mean values.

(3) Region: there are too many of these table-level metrics, so remove them all.


*.source.filter.class=org.apache.hadoop.metrics2.filter.RegexFilter
#*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}
*.period=10

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10

hbase.sink.ganglia.metric.filter.exclude=^(.*table.*)|(\\w+metric)|(\\w+region\\w+)|(Balancer\\w+)|(\\w+Assign\\w+)|(\\w+percentile)|(\\w+max)|(\\w+median)|(\\w+min)|(MetaHlog\\w+)|(\\w+WAL\\w+)$
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=172.18.144.198:8648


The filter is a regular expression. After making the change, restart HBase and gmond. The Ganglia web console now shows far fewer metrics, which makes it much cleaner and more focused.
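
The exclude pattern can be sanity-checked offline against a few metric names before restarting anything. The sketch below is an approximation, not Hadoop's own filter code. It assumes the filter requires the whole metric name to match one of the alternatives (as Java's Matcher.matches() would), it rewrites \w as the POSIX class [[:alnum:]_] so that grep -E can evaluate it, and the helper name is_excluded is hypothetical.

```shell
#!/bin/bash
# Approximate the exclude pattern from the configuration in POSIX ERE.
# W stands in for Java's \w+; grep -x supplies the whole-name anchoring,
# so the leading ^ and trailing $ of the original pattern are omitted.
W='[[:alnum:]_]+'
exclude_re="(.*table.*)|(${W}metric)|(${W}region${W})|(Balancer${W})|(${W}Assign${W})|(${W}percentile)|(${W}max)|(${W}median)|(${W}min)|(MetaHlog${W})|(${W}WAL${W})"

# Return 0 when the whole metric name matches the exclude pattern.
# is_excluded is a hypothetical helper name for this sketch.
is_excluded() {
  printf '%s\n' "$1" | grep -Eqx -- "$exclude_re"
}

# Try a few names; the short ones are illustrative, the long one follows
# the naming seen in the RRD listing.
for name in get_mean get_max get_95th_percentile memStoreSize \
    Namespace_default_table_t1_region_abc_metric_get_num_ops; do
  if is_excluded "$name"; then
    echo "drop $name"
  else
    echo "keep $name"
  fi
done
```

If the real filter turns out to use substring matching rather than whole-name matching, the pattern would exclude more names than this check suggests, so treat the result as a lower bound.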

3. Hadoop metrics system management

Note that the configuration changes in the previous two sections take effect only after HBase is restarted, because the metrics system starts together with HBase. If HBase is already serving traffic, a hard stop affects the service, and graceful_stop involves temporarily migrating regions, so neither is ideal. It is best not to restart at all.
The following describes how to start and stop the Hadoop metrics system on its own.
A great deal of engineering effort has gone into the Hadoop ecosystem, and this case is covered: the metrics system exposes MBeans and can be controlled through JMX. First, JMX has to be enabled.
hbase-env.sh:
 
export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

Enabling JMX does require restarting HBase; there is no way around that. However, it only has to be done once, ideally when the cluster is first set up. After that, the metrics system can be stopped and started on its own whenever the metrics configuration changes.
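
The port assignments in hbase-env.sh can be captured in a small lookup so that scripts do not hard-code them in several places. This is a sketch only: the function names jmx_port_for and jmx_url_for and the host name rs1.example.com are illustrative assumptions, and the URL shape is the standard JMX RMI connector form.

```shell
#!/bin/bash
# Map an HBase daemon name to the JMX port assigned in hbase-env.sh.
# jmx_port_for and jmx_url_for are hypothetical helper names.
jmx_port_for() {
  case "$1" in
    master)       echo 10101 ;;
    regionserver) echo 10102 ;;
    thrift)       echo 10103 ;;
    zookeeper)    echo 10104 ;;
    rest)         echo 10105 ;;
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}

# Build the JMX service URL for a host and daemon.
jmx_url_for() {
  local port
  port=$(jmx_port_for "$2") || return 1
  echo "service:jmx:rmi:///jndi/rmi://$1:${port}/jmxrmi"
}

# rs1.example.com is a placeholder host name.
jmx_url_for rs1.example.com regionserver
```

The resulting URL is what a JMX client such as jconsole needs in order to connect to the daemon.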
 


jconsole shows that MetricsSystem provides four operations, including start and stop. In a test environment you can connect with jconsole or similar tools, but that is not practical in production. Consider how our Hadoop cluster combines Ganglia and Nagios: Ganglia collects data and displays charts, while Nagios pulls data from Ganglia and handles monitoring and alerting. This is a very good combination, and it would be even better to integrate the JMX operations into Nagios.
Nagios does have a JMX plug-in, but Nagios is a checking and alerting mechanism and does not perform operations, so it is hard to put the start and stop of the metrics system into Nagios. The only option is a separate route: jmxtoolkit plus shell scripts.

jmxtoolkit is fairly simple, and plenty of information about it is available online. A few points need attention when using it.


git clone https://github.com/larsgeorge/jmxtoolkit.git
cd jmxtoolkit
vim build.xml    # change the hbase version to 0.98.1-hadoop2
ant

After ant finishes compiling, build/hbase-0.98.1-hadoop2-jmxtoolkit.jar is the jar we want. Next, look at the configuration file, build/conf/hbase-0.98.1-hadoop2-jmx.properties.

I have not looked into how jmxtoolkit generates this file, but the metric names in it are all wrong. In any case, we only need to operate the metrics system, so edit the file down to the following:


[hbaseMetricsSystem]
@object=Hadoop:service=HBase,name=MetricsSystem,sub=Control
@url=service:jmx:rmi:///jndi/rmi://${HOSTNAME}:${PORT}/jmxrmi
@user=${USER|controlRole}
@password=${PASSWORD|password}
*stop=VOID
*start=VOID

Save the file under a new name, build/hbase-0.98-metrics.properties.

Note that @object is the ObjectName of the metrics system as shown in jconsole, while *stop and *start are the operations provided by the MBean.


Test:

java -cp hbase-0.98.1-hadoop2-jmxtoolkit.jar org.apache.hadoop.hbase.jmxtoolkit.JMXToolkit -f hbase-0.98-metrics.properties -o hbaseMetricsSystem -q stop
/etc/init.d/gmond restart

You can see that the modified metrics configuration has taken effect.


Finally, write a simple shell script and deploy it to the production environment.

#!/bin/bash
# manage the metrics system of hbase individually.
# ./hbase-metrics-system.sh start
# ./hbase-metrics-system.sh stop

config_file_template=hbase-0.98-metrics.properties
config_file_product=hbase-0.98-metrics-product.properties

#for i in "${hbase_hostnames[@]}"; do
for i in $(cat "$HBASE_HOME/conf/regionservers"); do
  # fill in host and region server JMX port for this host
  cp "${config_file_template}" "${config_file_product}"
  sed -i "s/\${HOSTNAME}/$i/g" "${config_file_product}"
  sed -i "s/\${PORT}/10102/g" "${config_file_product}"
  # invoke the start or stop operation on the MetricsSystem MBean
  java -cp hbase-0.98.1-hadoop2-jmxtoolkit.jar org.apache.hadoop.hbase.jmxtoolkit.JMXToolkit -f "${config_file_product}" -o hbaseMetricsSystem -q "$1"
  echo "$1 metrics system of $i"
  /bin/rm "${config_file_product}"
done
echo "done"

The solution is rather crude, but it solves the problem for now.
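
One way the script could be hardened is to validate the action argument and to render the per-host configuration to standard output instead of copying and editing a shared file. The following is a sketch under those assumptions; render_config and valid_action are hypothetical helper names, and rs1.example.com is a placeholder host.

```shell
#!/bin/bash
# Render the jmxtoolkit template for one host: replace the ${HOSTNAME}
# and ${PORT} placeholders and write the result to stdout.
# Arguments: template file, host name, JMX port.
render_config() {
  sed -e "s/\${HOSTNAME}/$2/g" -e "s/\${PORT}/$3/g" "$1"
}

# Accept only the two operations the properties file declares.
valid_action() {
  case "$1" in
    start|stop) return 0 ;;
    *) return 1 ;;
  esac
}

# Demo with a one-line template.
tmpl=$(mktemp)
echo '@url=service:jmx:rmi:///jndi/rmi://${HOSTNAME}:${PORT}/jmxrmi' > "$tmpl"
render_config "$tmpl" rs1.example.com 10102
valid_action restart || echo "usage: $0 start|stop"
rm -f "$tmpl"
```

In the loop, the rendered output could be redirected to a per-host temporary file created with mktemp, which avoids clashes if two runs overlap.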







