Ganglia monitoring custom metric practices

Ganglia is an open-source monitoring system from UC Berkeley designed for distributed clusters. It monitors at two levels: the resource level, which covers CPU, memory, disk, I/O, and network load, and the business level, where you can easily add custom metrics. That makes it suitable for tracking service performance, load, and error rates, such as the QPS or HTTP status error rate of a web service. Integrated with Nagios, it can also trigger an alarm when a metric exceeds a threshold.
Compared with Zabbix, Ganglia's collection agent (gmond) imposes little overhead on the monitored host and does not affect the performance of the services running there.


Ganglia consists of three main modules:
Gmond: deployed on each monitored machine; periodically collects data and announces it by broadcast or unicast.
Gmetad: deployed on the server; periodically pulls the data collected by gmond from the hosts listed in the configured data_source.
Ganglia-web: presents the monitoring data on a web page.
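As a concrete illustration (the cluster name, hosts, and interval below are placeholders, not values from this article), a gmetad data_source entry names a cluster, an optional polling interval in seconds, and one or more gmond hosts to poll:

```
# /etc/ganglia/gmetad.conf (illustrative values)
# data_source "cluster name" [polling interval] host1[:port] host2[:port] ...
data_source "test_cluster" 15 node1.example.com:8649 node2.example.com:8649
```

Gmetad tries the listed hosts in order, so giving two or more gmond addresses per cluster provides redundancy for collection.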

This article does not cover the installation of Ganglia in detail; see: http://www.it165.net/admin/html/201302/770.html

This document describes how to develop custom metrics to monitor the indicators you care about.

There are three main approaches:

1. Use gmetric directly

The machine where gmond is installed also gets /usr/bin/gmetric. This command-line tool broadcasts a metric's name, value, and related attributes, for example:

 

/usr/bin/gmetric -c /etc/ganglia/gmond.conf --name=test --type=int32 --units=sec --value=2    
For more information about gmetric options, see: http://manpages.ubuntu.com/manpages/hardy/man1/gmetric.1.html
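For scripting, the same broadcast can be driven from Python by shelling out to gmetric. The helper below is not part of Ganglia; it is a minimal sketch that builds the gmetric argument list (so the command can be inspected or tested) and only invokes the binary when it is actually installed:

```python
import os
import subprocess

GMETRIC = "/usr/bin/gmetric"
GMOND_CONF = "/etc/ganglia/gmond.conf"

def gmetric_args(name, value, metric_type="int32", units=""):
    """Build the argv for a gmetric call announcing one metric sample."""
    return [GMETRIC, "-c", GMOND_CONF,
            "--name=%s" % name,
            "--type=%s" % metric_type,
            "--units=%s" % units,
            "--value=%s" % value]

def send_metric(name, value, metric_type="int32", units=""):
    """Broadcast one sample via gmetric; no-op on hosts without gmond installed."""
    args = gmetric_args(name, value, metric_type, units)
    if os.path.exists(GMETRIC):
        subprocess.check_call(args)
    return args

# Equivalent of the command line above:
send_metric("test", 2, metric_type="int32", units="sec")
```

Returning the argument list from send_metric makes the wrapper easy to unit-test without a running gmond.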
 
 

In addition to invoking gmetric from the command line, you can use bindings for common languages such as Go, Java, and Python. The bindings are available on GitHub; you only need to import them:

Go: https://github.com/ganglia/ganglia_contrib/tree/master/ganglia-go

Ruby: https://github.com/igrigorik/gmetric/blob/master/lib/gmetric.rb

Java: https://github.com/ganglia/ganglia_contrib/tree/master/gmetric-java

Python: https://github.com/ganglia/ganglia_contrib/tree/master/gmetric-python

2. Use a third-party tool based on gmetric

 

This section uses ganglia-logtailer as an example: https://github.com/ganglia/ganglia_contrib/tree/master/ganglia-logtailer

 

This tool builds on the logtail (Debian) / logcheck (CentOS) package to tail logs periodically. It then parses the log with the class you specify via --classname, computes custom metrics from the fields that interest you, and broadcasts them with gmetric.
 

For example, you can modify NginxLogtailer.py to match the nginx log format of your service, as follows:


 

# -*- coding: utf-8 -*-
###
###  This plugin for logtailer will crunch nginx logs and produce these metrics:
###    * hits per second
###    * GETs per second
###    * average query processing time
###    * ninetieth percentile query processing time
###    * number of HTTP 200, 300, 400, and 500 responses per second
###
###  Note that this plugin depends on a certain nginx log format, documented in
###  __init__.

import time
import threading
import re

# local dependencies
from ganglia_logtailer_helper import GangliaMetricObject
from ganglia_logtailer_helper import LogtailerParsingException, LogtailerStateException

class NginxLogtailer(object):
    # only used in daemon mode
    period = 30

    def __init__(self):
        '''This function should initialize any data structures or variables
        needed for the internal state of the line parser.'''
        self.reset_state()
        self.lock = threading.RLock()
        # this is what will match the nginx lines
        # log_format ganglia-logtailer
        #     '$host '
        #     '$server_addr '
        #     '$remote_addr '
        #     '- '
        #     '"$time_iso8601" '
        #     '$status '
        #     '$body_bytes_sent '
        #     '$request_time '
        #     '"$http_referer" '
        #     '"$request" '
        #     '"$http_user_agent" '
        #     '$pid';
        # NOTE: nginx 0.7 doesn't support $time_iso8601, use $time_local instead
        # original apache log format string:
        # %v %A %a %u %{%Y-%m-%dT%H:%M:%S}t %c %s %>s %B %D \"%{Referer}i\" \"%r\" \"%{User-Agent}i\" %P
        # host.com 127.0.0.1 127.0.0.1 - "2008-05-08T07:34:44" - 200 200 371 103918 - "-" "GET /path HTTP/1.0" "-" 23794
        # match keys: server_name, local_ip, remote_ip, date, status, size,
        #             req_time, referrer, request, user_agent, pid
        self.reg = re.compile('^(?P<server_name>[^ ]+) (?P<local_ip>[^ ]+) (?P<remote_ip>[^ ]+) \[(?P<date>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>[^ ]+) (?P<size>[^ ]+) "(?P<req_time>[^"]+)" "(?P<referrer>[^"]+)" "(?P<user_agent>[^"]+)" "(?P<pid>[^"]+)"')
        # assume we're in daemon mode unless set_check_duration gets called
        self.dur_override = False

    # example function for parse line
    # takes one argument (text) line to be parsed
    # returns nothing
    def parse_line(self, line):
        '''This function should digest the contents of one line at a time,
        updating the internal state variables.'''
        self.lock.acquire()
        try:
            regMatch = self.reg.match(line)
            if regMatch:
                linebits = regMatch.groupdict()
                if '-' == linebits['request'] or 'file2get' in linebits['request']:
                    self.lock.release()
                    return
                self.num_hits += 1
                # capture GETs
                if 'GET' in linebits['request']:
                    self.num_gets += 1
                # capture HTTP response code
                rescode = float(linebits['status'])
                if (rescode >= 200) and (rescode < 300):
                    self.num_two += 1
                elif (rescode >= 300) and (rescode < 400):
                    self.num_three += 1
                elif (rescode >= 400) and (rescode < 500):
                    self.num_four += 1
                elif (rescode >= 500) and (rescode < 600):
                    self.num_five += 1
                # capture request duration
                dur = float(linebits['req_time'])
                self.req_time += dur
                # store for 90th % calculation
                self.ninetieth.append(dur)
            else:
                raise LogtailerParsingException, "regmatch failed to match"
        except Exception, e:
            self.lock.release()
            raise LogtailerParsingException, "regmatch or contents failed with %s" % e
        self.lock.release()

    # example function for deep copy
    # takes no arguments
    # returns one object
    def deep_copy(self):
        '''This function should return a copy of the data structure used to
        maintain state.  This copy should be different from the object that
        is currently being modified so that the other thread can deal with
        it without fear of it changing out from under it.  The format of
        this object is internal to the plugin.'''
        myret = dict(num_hits=self.num_hits,
                     num_gets=self.num_gets,
                     req_time=self.req_time,
                     num_two=self.num_two,
                     num_three=self.num_three,
                     num_four=self.num_four,
                     num_five=self.num_five,
                     ninetieth=self.ninetieth)
        return myret

    # example function for reset_state
    # takes no arguments
    # returns nothing
    def reset_state(self):
        '''This function resets the internal data structure to 0 (saving
        whatever state it needs).  This function should be called
        immediately after deep copy with a lock in place so the internal
        data structures can't be modified in between the two calls.  If the
        time between calls to get_state is necessary to calculate metrics,
        reset_state should store now() each time it's called, and get_state
        will use the time since that now() to do its calculations.'''
        self.num_hits = 0
        self.num_gets = 0
        self.req_time = 0
        self.num_two = 0
        self.num_three = 0
        self.num_four = 0
        self.num_five = 0
        self.ninetieth = list()
        self.last_reset_time = time.time()

    # example for keeping track of runtimes
    # takes no arguments
    # returns float number of seconds for this run
    def set_check_duration(self, dur):
        '''This function is only used if logtailer is in cron mode.  If it
        is invoked, get_check_duration should use this value instead of
        calculating it.'''
        self.duration = dur
        self.dur_override = True

    def get_check_duration(self):
        '''This function should return the time since the last check.  If
        called from cron mode, this must be set using set_check_duration().
        If in daemon mode, it should be calculated internally.'''
        if self.dur_override:
            duration = self.duration
        else:
            cur_time = time.time()
            duration = cur_time - self.last_reset_time
            # the duration should be within 10% of period
            acceptable_duration_min = self.period - (self.period / 10.0)
            acceptable_duration_max = self.period + (self.period / 10.0)
            if (duration < acceptable_duration_min or duration > acceptable_duration_max):
                raise LogtailerStateException, "time calculation problem - duration (%s) > 10%% away from period (%s)" % (duration, self.period)
        return duration

    # example function for get_state
    # takes no arguments
    # returns a dictionary of (metric => metric_object) pairs
    def get_state(self):
        '''This function should acquire a lock, call deep copy, get the
        current time if necessary, call reset_state, then do its
        calculations.  It should return a list of metric objects.'''
        # get the data to work with
        self.lock.acquire()
        try:
            mydata = self.deep_copy()
            check_time = self.get_check_duration()
            self.reset_state()
            self.lock.release()
        except LogtailerStateException, e:
            # if something went wrong with deep_copy or the duration, reset and continue
            self.reset_state()
            self.lock.release()
            raise e
        # crunch data to how you want to report it
        hits_per_second = mydata['num_hits'] / check_time
        gets_per_second = mydata['num_gets'] / check_time
        if mydata['num_hits'] != 0:
            avg_req_time = mydata['req_time'] / mydata['num_hits']
        else:
            avg_req_time = 0
        two_per_second = mydata['num_two'] / check_time
        three_per_second = mydata['num_three'] / check_time
        four_per_second = mydata['num_four'] / check_time
        five_per_second = mydata['num_five'] / check_time
        # calculate 90th % request time
        ninetieth_list = mydata['ninetieth']
        ninetieth_list.sort()
        num_entries = len(ninetieth_list)
        if num_entries != 0:
            ninetieth_element = ninetieth_list[int(num_entries * 0.9)]
        else:
            ninetieth_element = 0
        # package up the data you want to submit
        hps_metric = GangliaMetricObject('nginx_hits', hits_per_second, units='hps')
        gps_metric = GangliaMetricObject('nginx_gets', gets_per_second, units='hps')
        avgdur_metric = GangliaMetricObject('nginx_avg_dur', avg_req_time, units='sec')
        ninetieth_metric = GangliaMetricObject('nginx_90th_dur', ninetieth_element, units='sec')
        twops_metric = GangliaMetricObject('nginx_200', two_per_second, units='hps')
        threeps_metric = GangliaMetricObject('nginx_300', three_per_second, units='hps')
        fourps_metric = GangliaMetricObject('nginx_400', four_per_second, units='hps')
        fiveps_metric = GangliaMetricObject('nginx_500', five_per_second, units='hps')
        # return a list of metric objects
        return [hps_metric, gps_metric, avgdur_metric, ninetieth_metric,
                twops_metric, threeps_metric, fourps_metric, fiveps_metric]
 

After ganglia-logtailer is deployed on the monitored machine, create a crond entry such as the following:

*/1 * * * * root /usr/local/bin/ganglia-logtailer --classname NginxLogtailer --log_file /usr/local/nginx-video/logs/access.log --mode cron --gmetric_options '-C test_cluster -g nginx_status'

Reload the crond service. After one minute, you can see the corresponding metric information on the ganglia web page:
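Stripped of the logtailer plumbing, the per-interval arithmetic that the plugin performs boils down to counting status-code classes and dividing by the check duration. The regex and log lines below are simplified placeholders for illustration, not the exact format used above:

```python
import re

# Match the quoted request and the 3-digit status code in an access-log line
# (simplified pattern, not the full NginxLogtailer regex).
LINE_RE = re.compile(r'"(?P<request>[^"]+)" (?P<status>\d{3}) ')

def status_rates(lines, interval_sec):
    """Count status-code classes in a batch of lines, return per-second rates."""
    counts = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        key = "%dxx" % (int(m.group("status")) // 100)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / float(interval_sec) for k, v in counts.items()}

logs = [
    '1.2.3.4 - - [13/Jul/2015:11:56:10 +0800] "GET / HTTP/1.1" 200 612 "-"',
    '1.2.3.4 - - [13/Jul/2015:11:56:11 +0800] "GET /missing HTTP/1.1" 404 169 "-"',
]
print(status_rates(logs, 30))  # one 2xx and one 4xx over a 30 s window
```

This is the same calculation get_state() performs for nginx_200 through nginx_500, just without the locking and state reset needed in a long-running collector.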

(Screenshot: custom metrics displayed on the ganglia web page — image2015-7-13 11:56:10.png)

For the deployment method of ganglia-logtailer, see: https://github.com/ganglia/ganglia_contrib/tree/master/ganglia-logtailer

3. Write your own module in a supported language. This article uses Python as an example.

Ganglia allows you to write your own Python modules. The following is a brief introduction based on the GitHub documentation:

Writing a Python module is very simple. You just need to follow a template and put the resulting Python module (.py) in /usr/lib(64)/ganglia/python_modules.

A corresponding Python configuration file (.pyconf) needs to reside in /etc/ganglia/conf.d/.

For example, here is a sample Python module that checks the machine temperature:

 

acpi_file = "/proc/acpi/thermal_zone/THRM/temperature"

def temp_handler(name):
    try:
        f = open(acpi_file, 'r')
    except IOError:
        return 0
    for l in f:
        line = l.split()
    return int(line[1])

def metric_init(params):
    global descriptors, acpi_file
    if 'acpi_file' in params:
        acpi_file = params['acpi_file']
    d1 = {'name': 'temp',
          'call_back': temp_handler,
          'time_max': 90,
          'value_type': 'uint',
          'units': 'C',
          'slope': 'both',
          'format': '%u',
          'description': 'Temperature of host',
          'groups': 'health'}
    descriptors = [d1]
    return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

# This code is for debugging and unit testing
if __name__ == '__main__':
    metric_init({})
    for d in descriptors:
        v = d['call_back'](d['name'])
        print 'value for %s is %u' % (d['name'], v)

Along with the module file, you also need to write a corresponding configuration file (e.g. /etc/ganglia/conf.d/temp.pyconf) in the following format:

 

 

modules {
  module {
    name = "temp"
    language = "python"
    # The following params are examples only
    #  They are not actually used by the temp module
    param RandomMax {
      value = 600
    }
    param ConstantValue {
      value = 112
    }
  }
}

collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "temp"
    title = "Temperature"
    value_threshold = 70
  }
}

 

With these two files in place, the module is successfully added.

For more modules contributed by users, see https://github.com/ganglia/gmond_python_modules

Such modules already exist for monitoring common services such as elasticsearch, filecheck, nginx_status, and MySQL; you usually only need minor modifications to meet your needs.


Other useful user-contributed tools:
Ganglia-alert: reads gmetad data and sends alerts, https://github.com/ganglia/ganglia_contrib/tree/master/ganglia-alert
Ganglia-docker: runs ganglia inside Docker, https://github.com/ganglia/ganglia_contrib/tree/master/docker
Gmetad-health-check: monitors the gmetad service and restarts it if it goes down, https://github.com/ganglia/ganglia_contrib/tree/master/gmetad_health_checker
Chef-ganglia: deploys ganglia with Chef, https://github.com/ganglia/chef-ganglia
Ansible-ganglia: automates ganglia deployment with Ansible, https://github.com/remysaissy/ansible-ganglia
Ganglia-nagios: integrates ganglia with Nagios, https://github.com/ganglia/ganglios
Ganglia-api: exposes a REST API that returns the data collected by gmetad in a specific format, https://github.com/guardian/ganglia-api

If you have any questions, please leave a message.
