Using Ganglia to Monitor MongoDB Clusters
A few days ago, I published a blog post on monitoring a Storm cluster with Ganglia. This article introduces how to use Ganglia to monitor MongoDB clusters, so that all of our monitoring is unified under Ganglia.
1. Ganglia's Extension Mechanism
To monitor MongoDB clusters with Ganglia, you first need to understand Ganglia's extension mechanism. Ganglia provides two ways to extend its monitoring capabilities:
1) Feed in metrics (in-band) through the gmetric command.
This is the common approach: a cron job periodically runs Ganglia's gmetric command to push data into gmond for unified monitoring. It is simple and works well for a small number of custom metrics, but for large-scale custom monitoring it becomes difficult to manage the monitoring data in a unified way.
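For illustration, the cron-driven approach boils down to assembling a gmetric command line. The sketch below builds such a command in Python so the flags are explicit; the metric name and value are made up for the example, and a real cron job would execute the returned command with subprocess:

```python
import subprocess

def build_gmetric_cmd(name, value, metric_type='uint32', units=''):
    """Build a gmetric command line that pushes one custom metric to gmond.

    The metric name and value here are illustrative; gmetric itself
    must be installed and on PATH for the command to actually run.
    """
    return [
        'gmetric',
        '--name', name,            # metric name as it will appear in the web UI
        '--value', str(value),     # current sample value
        '--type', metric_type,     # gmetric value type, e.g. uint32, float, string
        '--units', units,          # display units
    ]

# Example: a cron job could push the current MongoDB connection count.
cmd = build_gmetric_cmd('mongodb_connections', 42, units='connections')
print(cmd)
# In a real cron job: subprocess.call(cmd)
```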
2) Attach additional monitoring scripts through the C or Python interface.
Starting with ganglia 3.1.x, gmond exposes C and Python interfaces through which you can write custom data-collection modules and plug them directly into gmond to monitor user-defined applications.
2. Monitoring MongoDB with a Python Script
We monitor the MongoDB cluster with a Python script, since extending through Python is the most convenient route: whenever we need to collect additional monitoring data, we simply add it to the corresponding .py script. This approach is convenient, highly extensible, and easy to port.
2.1 Environment Configuration
To extend Ganglia monitoring with a Python script, first check whether the file modpython.so exists. This file is the dynamic library through which Ganglia calls Python; to develop Ganglia plug-ins via the Python interface, this module must be compiled and installed. modpython.so is stored in the lib/ganglia/ (or lib64/ganglia/) directory under the Ganglia installation directory. If it exists, you can go on to write the scripts below; if it does not, you need to recompile and install gmond, passing the "--with-python" option at configure time.
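As a quick sanity check, a small script along these lines can look for modpython.so; the candidate directories below are assumptions based on the /usr/local/ganglia prefix used throughout this article and should be adjusted to your own installation:

```python
import os

# Candidate locations for modpython.so; adjust the install prefix
# (/usr/local/ganglia here) to match your own Ganglia installation.
CANDIDATE_DIRS = [
    '/usr/local/ganglia/lib/ganglia',
    '/usr/local/ganglia/lib64/ganglia',
]

def find_modpython(dirs=CANDIDATE_DIRS):
    """Return the path of modpython.so if found in any candidate dir, else None."""
    for d in dirs:
        path = os.path.join(d, 'modpython.so')
        if os.path.isfile(path):
            return path
    return None

if __name__ == '__main__':
    found = find_modpython()
    if found:
        print('modpython.so found at %s' % found)
    else:
        print('modpython.so not found; rebuild gmond with --with-python')
```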
2.2 Writing the Monitoring Scripts
Open the gmond.conf file under etc/ in the Ganglia installation directory. In the client monitoring section you can see include ("/usr/local/ganglia/etc/conf.d/*.conf"), which tells the gmond service to scan the monitoring configuration files under that directory. We therefore place our monitoring configuration script in the etc/conf.d/ directory and name it xx.conf; our configuration script for monitoring MongoDB will be named mongodb.pyconf (picked up through the *.pyconf include shown below).
1) View the modpython.conf file
modpython.conf is located in the etc/conf.d/ directory. Its content is as follows:
modules {
    module {
        name = "python_module"    # main module name
        path = "modpython.so"     # dynamic library ganglia needs to run python extension scripts
        params = "/usr/local/ganglia/lib64/ganglia/python_modules"    # where the python scripts are stored
    }
}
include ("/usr/local/ganglia/etc/conf.d/*.pyconf")    # where the extension configuration scripts are stored
Therefore, to extend Ganglia to monitor MongoDB via Python, we place the configuration script and the .py script in these two directories respectively, then restart the Ganglia service to complete the MongoDB monitoring setup. The following sections describe how to write the scripts.
2) Create the mongodb.pyconf script
Note that you need root permission to create and edit this script, which is stored in the conf.d directory. The MongoDB metrics it collects are listed below.
modules {
    module {
        name = "mongodb"        # module name; must match the name of the python script stored under "/usr/local/ganglia/lib64/ganglia/python_modules"
        language = "python"     # declares that the module is written in python
        # parameter list: all params are passed as a dict (map) to the metric_init(params) function of the python script
        param server_status {
            value = "mongo --host <host> --port 27017 --quiet --eval 'printjson(db.serverStatus())'"
        }
        param rs_status {
            value = "mongo --host <host> --port 27017 --quiet --eval 'printjson(rs.status())'"
        }
    }
}

# list of metrics to be collected; a single module can export any number of metrics
collection_group {
    collect_every = 30
    time_threshold = 90         # maximum sending interval
    metric {
        name = "mongodb_opcounters_insert"      # metric name in the module
        title = "Inserts"                       # title displayed in the web interface
    }
    metric {
        name = "mongodb_opcounters_query"
        title = "Queries"
    }
    metric {
        name = "mongodb_opcounters_update"
        title = "Updates"
    }
    metric {
        name = "mongodb_opcounters_delete"
        title = "Deletes"
    }
    metric {
        name = "mongodb_opcounters_getmore"
        title = "Getmores"
    }
    metric {
        name = "mongodb_opcounters_command"
        title = "Commands"
    }
    metric {
        name = "mongodb_backgroundFlushing_flushes"
        title = "Flushes"
    }
    metric {
        name = "mongodb_mem_mapped"
        title = "Memory-mapped Data"
    }
    metric {
        name = "mongodb_mem_virtual"
        title = "Process Virtual Size"
    }
    metric {
        name = "mongodb_mem_resident"
        title = "Process Resident Size"
    }
    metric {
        name = "mongodb_extra_info_page_faults"
        title = "Page Faults"
    }
    metric {
        name = "mongodb_globalLock_ratio"
        title = "Global Write Lock Ratio"
    }
    metric {
        name = "mongodb_indexCounters_btree_miss_ratio"
        title = "BTree Page Miss Ratio"
    }
    metric {
        name = "mongodb_globalLock_currentQueue_total"
        title = "Total Operations Waiting for Lock"
    }
    metric {
        name = "mongodb_globalLock_currentQueue_readers"
        title = "Readers Waiting for Lock"
    }
    metric {
        name = "mongodb_globalLock_currentQueue_writers"
        title = "Writers Waiting for Lock"
    }
    metric {
        name = "mongodb_globalLock_activeClients_total"
        title = "Total Active Clients"
    }
    metric {
        name = "mongodb_globalLock_activeClients_readers"
        title = "Active Readers"
    }
    metric {
        name = "mongodb_globalLock_activeClients_writers"
        title = "Active Writers"
    }
    metric {
        name = "mongodb_connections_current"
        title = "Open Connections"
    }
    metric {
        name = "mongodb_connections_current_ratio"
        title = "Percentage of Connections Used"
    }
    metric {
        name = "mongodb_slave_delay"
        title = "Replica Set Slave Delay"
    }
    metric {
        name = "mongodb_asserts_total"
        title = "Asserts per Second"
    }
}
As you can see, this configuration file uses the same syntax as gmond.conf. For more information, refer to the gmond.conf documentation.
3) Create the mongodb.py script
The mongodb.py file is stored in the lib64/ganglia/python_modules directory. You will find many Python scripts already in this directory, for example scripts that monitor disks, memory, network, mysql, redis, and so on. You can refer to them while writing mongodb.py. Opening a few of them, you can see that each script has a metric_init(params) function; as mentioned above, the parameters defined in mongodb.pyconf are passed to this metric_init function.
#!/usr/bin/env python

import json
import os
import re
import socket
import string
import time
import copy

NAME_PREFIX = 'mongodb_'
PARAMS = {
    # <host> is a placeholder; the real command lines are passed in
    # from mongodb.pyconf via metric_init(params)
    'server_status': 'mongo --host <host> --port 27017 --quiet --eval "printjson(db.serverStatus())"',
    'rs_status': 'mongo --host <host> --port 27017 --quiet --eval "printjson(rs.status())"'
}
METRICS = {
    'time': 0,
    'data': {}
}
LAST_METRICS = copy.deepcopy(METRICS)
METRICS_CACHE_TTL = 3
def flatten(d, pre='', sep='_'):
    """Flatten a dict (i.e. dict['a']['b']['c'] => dict['a_b_c'])"""
    new_d = {}
    for k, v in d.items():
        if type(v) == dict:
            new_d.update(flatten(d[k], '%s%s%s' % (pre, k, sep)))
        else:
            new_d['%s%s' % (pre, k)] = v
    return new_d
def get_metrics():
    """Return all metrics"""
    global METRICS, LAST_METRICS

    if (time.time() - METRICS['time']) > METRICS_CACHE_TTL:
        metrics = {}
        for status_type in PARAMS.keys():
            # get raw metric data
            o = os.popen(PARAMS[status_type])

            # clean up
            metrics_str = ''.join(o.readlines()).strip()  # convert to string
            metrics_str = re.sub('\w+\((.*)\)', r"\1", metrics_str)  # remove functions

            # convert to flattened dict
            try:
                if status_type == 'server_status':
                    metrics.update(flatten(json.loads(metrics_str)))
                else:
                    metrics.update(flatten(json.loads(metrics_str), pre='%s_' % status_type))
            except ValueError:
                metrics = {}

        # update cache
        LAST_METRICS = copy.deepcopy(METRICS)
        METRICS = {
            'time': time.time(),
            'data': metrics
        }

    return [METRICS, LAST_METRICS]
def get_value(name):
    """Return a value for the requested metric"""
    # get metrics
    metrics = get_metrics()[0]

    # get value
    name = name[len(NAME_PREFIX):]  # remove prefix from name
    try:
        result = metrics['data'][name]
    except StandardError:
        result = 0

    return result
def get_rate(name):
    """Return change over time for the requested metric"""
    # get metrics
    [curr_metrics, last_metrics] = get_metrics()

    # get rate
    name = name[len(NAME_PREFIX):]  # remove prefix from name
    try:
        rate = float(curr_metrics['data'][name] - last_metrics['data'][name]) / \
               float(curr_metrics['time'] - last_metrics['time'])
        if rate < 0:
            rate = float(0)
    except StandardError:
        rate = float(0)

    return rate
def get_opcounter_rate(name):
    """Return change over time for an opcounter metric"""
    master_rate = get_rate(name)
    repl_rate = get_rate(name.replace('opcounters_', 'opcountersRepl_'))
    return master_rate + repl_rate
def get_globalLock_ratio(name):
    """Return the global lock ratio"""
    try:
        result = get_rate(NAME_PREFIX + 'globalLock_lockTime') / \
                 get_rate(NAME_PREFIX + 'globalLock_totalTime') * 100
    except ZeroDivisionError:
        result = 0

    return result
def get_indexCounters_btree_miss_ratio(name):
    """Return the btree miss ratio"""
    try:
        result = get_rate(NAME_PREFIX + 'indexCounters_btree_misses') / \
                 get_rate(NAME_PREFIX + 'indexCounters_btree_accesses') * 100
    except ZeroDivisionError:
        result = 0

    return result
def get_connections_current_ratio(name):
    """Return the percentage of connections used"""
    try:
        result = float(get_value(NAME_PREFIX + 'connections_current')) / \
                 float(get_value(NAME_PREFIX + 'connections_available')) * 100
    except ZeroDivisionError:
        result = 0

    return result
def get_slave_delay(name):
    """Return the replica set slave delay"""
    # get metrics
    metrics = get_metrics()[0]

    # no point checking my optime if I'm not replicating
    if 'rs_status_myState' not in metrics['data'] or metrics['data']['rs_status_myState'] != 2:
        result = 0

    # compare my optime with the master's
    else:
        master = {}
        slave = {}
        try:
            for member in metrics['data']['rs_status_members']:
                if member['state'] == 1:
                    master = member
                if member['name'].split(':')[0] == socket.getfqdn():
                    slave = member
            result = max(0, master['optime']['t'] - slave['optime']['t']) / 1000
        except KeyError:
            result = 0

    return result
def get_asserts_total_rate(name):
    """Return the total number of asserts per second"""
    return float(reduce(lambda memo, obj: memo + get_rate('%sasserts_%s' % (NAME_PREFIX, obj)),
                        ['regular', 'warning', 'msg', 'user', 'rollovers'], 0))
def metric_init(lparams):
    """Initialize metric descriptors"""
    global PARAMS

    # set parameters
    for key in lparams:
        PARAMS[key] = lparams[key]

    # define descriptors
    time_max = 60
    groups = 'mongodb'
    descriptors = [
        {
            'name': NAME_PREFIX + 'opcounters_insert',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Inserts/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Inserts',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'opcounters_query',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Queries/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Queries',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'opcounters_update',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Updates/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Updates',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'opcounters_delete',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Deletes/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Deletes',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'opcounters_getmore',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Getmores/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Getmores',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'opcounters_command',
            'call_back': get_opcounter_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Commands/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Commands',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'backgroundFlushing_flushes',
            'call_back': get_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Flushes/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Flushes',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'mem_mapped',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'MB',
            'slope': 'both',
            'format': '%u',
            'description': 'Memory-mapped Data',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'mem_virtual',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'MB',
            'slope': 'both',
            'format': '%u',
            'description': 'Process Virtual Size',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'mem_resident',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'MB',
            'slope': 'both',
            'format': '%u',
            'description': 'Process Resident Size',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'extra_info_page_faults',
            'call_back': get_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Faults/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Page Faults',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_ratio',
            'call_back': get_globalLock_ratio,
            'time_max': time_max,
            'value_type': 'float',
            'units': '%',
            'slope': 'both',
            'format': '%f',
            'description': 'Global Write Lock Ratio',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'indexCounters_btree_miss_ratio',
            'call_back': get_indexCounters_btree_miss_ratio,
            'time_max': time_max,
            'value_type': 'float',
            'units': '%',
            'slope': 'both',
            'format': '%f',
            'description': 'BTree Page Miss Ratio',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_currentQueue_total',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Operations',
            'slope': 'both',
            'format': '%u',
            'description': 'Total Operations Waiting for Lock',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_currentQueue_readers',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Operations',
            'slope': 'both',
            'format': '%u',
            'description': 'Readers Waiting for Lock',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_currentQueue_writers',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Operations',
            'slope': 'both',
            'format': '%u',
            'description': 'Writers Waiting for Lock',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_activeClients_total',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Clients',
            'slope': 'both',
            'format': '%u',
            'description': 'Total Active Clients',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_activeClients_readers',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Clients',
            'slope': 'both',
            'format': '%u',
            'description': 'Active Readers',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'globalLock_activeClients_writers',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Clients',
            'slope': 'both',
            'format': '%u',
            'description': 'Active Writers',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'connections_current',
            'call_back': get_value,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Connections',
            'slope': 'both',
            'format': '%u',
            'description': 'Open Connections',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'connections_current_ratio',
            'call_back': get_connections_current_ratio,
            'time_max': time_max,
            'value_type': 'float',
            'units': '%',
            'slope': 'both',
            'format': '%f',
            'description': 'Percentage of Connections Used',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'slave_delay',
            'call_back': get_slave_delay,
            'time_max': time_max,
            'value_type': 'uint',
            'units': 'Seconds',
            'slope': 'both',
            'format': '%u',
            'description': 'Replica Set Slave Delay',
            'groups': groups
        },
        {
            'name': NAME_PREFIX + 'asserts_total',
            'call_back': get_asserts_total_rate,
            'time_max': time_max,
            'value_type': 'float',
            'units': 'Asserts/Sec',
            'slope': 'both',
            'format': '%f',
            'description': 'Asserts per Second',
            'groups': groups
        }
    ]
    return descriptors

def metric_cleanup():
    """Cleanup"""
    pass
# the following code is for debugging and testing
if __name__ == '__main__':
    descriptors = metric_init(PARAMS)
    while True:
        for d in descriptors:
            print '%s = %s' % (d['name'], d['format']) % d['call_back'](d['name'])
        print ''
        time.sleep(METRICS_CACHE_TTL)
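To see what the flatten() helper in the script above actually produces, here is a small standalone demonstration; the nested dict mimics a made-up fragment of db.serverStatus() output, not real server data:

```python
# Standalone demonstration of the flatten() helper used by mongodb.py.
# The nested dict is a made-up fragment resembling db.serverStatus() output.

def flatten(d, pre='', sep='_'):
    """Flatten a dict (i.e. dict['a']['b']['c'] => dict['a_b_c'])"""
    new_d = {}
    for k, v in d.items():
        if type(v) == dict:
            new_d.update(flatten(d[k], '%s%s%s' % (pre, k, sep)))
        else:
            new_d['%s%s' % (pre, k)] = v
    return new_d

server_status = {
    'opcounters': {'insert': 10, 'query': 20},
    'mem': {'resident': 512},
}

print(flatten(server_status))
# e.g. {'opcounters_insert': 10, 'opcounters_query': 20, 'mem_resident': 512}
```

This flattening is why the metric names in mongodb.pyconf look like mongodb_opcounters_insert: nested serverStatus keys are joined with underscores and prefixed.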
Two functions must be implemented in a Python extension script: metric_init(params) and metric_cleanup().
The metric_init() function is called when the module is initialized; it must return a metric description dictionary or a list of such dictionaries. mongodb.py returns a list of dictionaries.
A metric dictionary is defined as follows:
d = {'name': '<your_metric_name>',      # must match the name in the .pyconf file
     'call_back': <call_back_function>,
     'time_max': int(<your_time_max>),
     'value_type': '<string | uint | float | double>',
     'units': '<your_units>',
     'slope': '<zero | positive | negative | both>',
     'format': '<your_format>',
     'description': '<your_description>'
    }
The metric_cleanup() function is called when the module is unloaded; it returns nothing.
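Putting this contract together, a minimal gmond Python module can be sketched as follows; the single constant metric is a made-up example, unrelated to MongoDB, just to show the required shape:

```python
# Minimal gmond python module skeleton: one made-up metric that always returns 42.

def answer_callback(name):
    """Called by gmond each time the metric is collected."""
    return 42

def metric_init(params):
    """Called once when the module loads; must return a list of metric descriptors."""
    return [{
        'name': 'example_answer',      # must match the metric name in the .pyconf file
        'call_back': answer_callback,
        'time_max': 60,
        'value_type': 'uint',
        'units': 'answers',
        'slope': 'both',
        'format': '%u',
        'description': 'A constant example metric',
        'groups': 'example',
    }]

def metric_cleanup():
    """Called when the module is unloaded; returns nothing."""
    pass
```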
4) View the monitoring statistics on the web front-end
After the scripts are written and in place, restart the gmond service.
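To confirm the new metrics are actually flowing before checking the web front-end, you can read gmond's XML dump from its TCP port (8649 by default) and look for the mongodb_* metric names. The fetch helper below is a sketch assuming the default host and port; the parsing helper works on any gmond XML text:

```python
import re
import socket

def fetch_gmond_xml(host='127.0.0.1', port=8649):
    """Read the full XML dump that gmond serves on its tcp_accept_channel."""
    s = socket.create_connection((host, port))
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b''.join(chunks).decode('utf-8', 'replace')

def metric_names(xml_text, prefix='mongodb_'):
    """Extract METRIC NAME attributes starting with the given prefix."""
    return sorted(set(re.findall(r'<METRIC NAME="(%s[^"]+)"' % prefix, xml_text)))

# Parsing demo on a hand-written XML fragment (not real gmond output):
sample = ('<METRIC NAME="mongodb_opcounters_insert" VAL="1.0"/>'
          '<METRIC NAME="mongodb_mem_mapped" VAL="80"/>')
print(metric_names(sample))
# ['mongodb_mem_mapped', 'mongodb_opcounters_insert']
```

If the mongodb_* names appear in the dump, gmetad will pick them up and the graphs will show on the web interface.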