Architecture design and implementation of an enterprise operation and maintenance monitoring platform (Ganglia)

Last Update:2016-04-06 Source: Internet

Author: User

Tags rrd rrdtool snmp

First, choosing among Cacti/Nagios/Zabbix/Centreon/Ganglia

1, Cacti

Cacti is a graphical analysis tool for network traffic monitoring, built on PHP, MySQL, SNMP, and RRDtool.

Put simply, Cacti is a PHP program. It collects information from remote network devices over the SNMP protocol (in practice by running the snmpget and snmpwalk commands from the Net-SNMP package), draws graphs with RRDtool, and displays them through PHP pages. We use it to show the status or performance trend of a monitored object over time.

2, Nagios

Nagios is a free, open source network monitoring tool that effectively monitors the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and more. When a system or service becomes abnormal, it immediately notifies the site operators by email or SMS, and it sends another email or SMS notification once the status returns to normal.
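The alarm behavior described above (one notification when a service becomes abnormal, another when it recovers) can be illustrated with a small sketch. This is not Nagios code or its actual state machine; the function name `notifications` and the status strings are assumptions made for the illustration.

```python
def notifications(statuses):
    """Yield (check_index, message) only when the state flips between OK and not-OK."""
    previous_ok = True  # assumption: the service starts out healthy
    for index, status in enumerate(statuses):
        ok = (status == "OK")
        if previous_ok and not ok:
            yield index, "PROBLEM: " + status   # first abnormal result: alert once
        elif not previous_ok and ok:
            yield index, "RECOVERY"             # back to normal: notify again
        previous_ok = ok


# Five consecutive check results for one hypothetical service.
checks = ["OK", "OK", "CRITICAL", "CRITICAL", "OK"]
events = list(notifications(checks))
print(events)
```

Repeated abnormal results produce no further messages in this sketch, which is the point of notifying only on state changes.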

3, Zabbix

Zabbix is an enterprise-class open source solution that provides distributed system monitoring and network monitoring through a web interface. Zabbix can monitor a wide range of network parameters to help keep server systems running safely, and it provides a flexible notification mechanism that lets system administrators quickly locate and resolve problems.

Zabbix consists of two parts: Zabbix server and the optional Zabbix agent component. Zabbix server monitors remote servers and network status and collects data via SNMP, Zabbix agent, ping, port monitoring, and other methods. It can run on Linux, Solaris, HP-UX, AIX, FreeBSD, OpenBSD, OS X, and other platforms.

4, Ganglia

Ganglia is a scalable, distributed monitoring system designed for HPC (high performance computing) clusters. It monitors and displays state information for the nodes in a cluster. The gmond daemon running on each node collects CPU, memory, and disk utilization, I/O load, network traffic, and similar data; this data is aggregated by the gmetad daemon and stored with RRDtool; finally, the historical data is presented as graphs through PHP pages.

The Ganglia monitoring system is composed of three parts: gmond, gmetad, and the web frontend.

5, Centreon

Centreon is a powerful, distributed IT monitoring system that monitors networks, operating systems, and applications through third-party components. First, it is open source and free to use. Second, it uses Nagios as its underlying monitoring software: Nagios periodically writes monitoring data to a database through the NDOUtils module, and Centreon reads that data from the database in real time and displays it in its web interface. Finally, Nagios can be managed and configured through Centreon; in other words, Centreon is a management and configuration tool for Nagios, providing a web configuration interface that simplifies Nagios's many tedious configuration tasks.

6, Comparison chart

[Figure: comparison chart of Cacti, Nagios, Zabbix, Ganglia, and Centreon]

Second, design of a unified operation and maintenance monitoring platform

To build an intelligent operation and maintenance monitoring platform, we must focus on two aspects: operation monitoring and fault alarms. The network, hardware, software, database, and other resources of every business system are brought into one unified platform. By eliminating the differences between management software and data collection methods, the platform provides unified management, standardization, processing, and display of the different data sources, along with unified user login and unified access control, and ultimately achieves standardized, automated, and intelligent large-scale operation and maintenance management.

From bottom to top, the design of the intelligent operation and maintenance monitoring platform can be divided into six layers and three modules, as shown in the figure:

[Figure: six-layer, three-module structure of the monitoring platform]

The implementation topology of the operation and maintenance monitoring platform is shown in the figure:

[Figure: implementation topology of the monitoring platform]

Third, Ganglia installation

1, Ganglia's common architectures

The Ganglia monitoring system is composed of three parts, gmond, gmetad, and the web frontend, as shown in the figure:

[Figure: Ganglia architecture with gmond, gmetad, and the web frontend]

Ganglia also supports several monitoring architectures, which follow from how gmetad works. Gmetad can periodically collect data from multiple gmond nodes, which is Ganglia's two-tier architecture. Gmetad can also collect data from other gmetad instances, which forms Ganglia's three-tier architecture. This variety of architectures reflects the flexibility and extensibility of Ganglia as a distributed monitoring system.
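The two-tier and three-tier arrangements can be pictured with a toy model. This is only a conceptual sketch, not Ganglia code: the classes `Gmond` and `Gmetad`, the `poll` method, and the cluster, host, and grid names are all invented for illustration, and cluster names are assumed to be unique.

```python
class Gmond:
    """Stands in for one cluster: a gmond node reports every host in its cluster."""

    def __init__(self, cluster, host_loads):
        self.cluster = cluster
        self.host_loads = dict(host_loads)

    def poll(self):
        return {self.cluster: dict(self.host_loads)}


class Gmetad:
    """Polls its sources (Gmond or other Gmetad objects) and merges what they report."""

    def __init__(self, grid, sources):
        self.grid = grid
        self.sources = list(sources)

    def poll(self):
        merged = {}
        for source in self.sources:
            merged.update(source.poll())  # assumes cluster names do not collide
        return merged


# Two tiers: one gmetad polling two gmond clusters.
room_a = Gmetad("room-a", [Gmond("web", {"web1": 0.4, "web2": 0.7}),
                           Gmond("db", {"db1": 1.2})])

# Three tiers: a top-level gmetad polling another gmetad plus one more cluster.
top = Gmetad("company", [room_a, Gmond("hadoop", {"dn1": 2.5})])
print(sorted(top.poll()))
```

Because a `Gmetad` exposes the same `poll` interface as a `Gmond`, further tiers can be stacked without changing either class.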


2, Installing Ganglia with yum

The default yum repositories in CentOS do not contain Ganglia, so we must install an additional yum repository. Download the Extra Packages for Enterprise Linux (EPEL) release package from the address below and install it:

    wget http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
    rpm -ivh epel-release-5-4.noarch.rpm

Once the yum repository is installed, Ganglia can be installed directly with yum. The installation has two parts, gmetad and gmond: gmetad is installed on the monitoring management side and gmond on every client host to be monitored. The corresponding yum package names are ganglia-gmetad and ganglia-gmond.

The following describes the process of installing Ganglia with yum. These operations are performed on the monitoring management side. First, list the available Ganglia packages with the yum command:

    yum list ganglia*

Installing gmetad requires RRDtool support. With yum, the packages that gmetad depends on are found and installed automatically, which is the advantage of installing with yum.

Finally, install the gmond service on all client hosts that need to be monitored:

    yum -y install ganglia-gmond.x86_64

This completes the installation of the Ganglia monitoring system. The default configuration files installed by yum are located in /etc/ganglia.

3, Ganglia configuration on the monitoring management side

The configuration file on the monitoring management side is gmetad.conf:

    data_source "Cluster1" cloud0 cloud2
    gridname "Iiveygrid"
    xml_port 8651
    interactive_port 8652
    rrd_rootdir "/var/lib/ganglia/rrds"

data_source: This parameter defines the name of a cluster and the nodes in it. Cluster1 is the name of this cluster, and cloud0 and cloud2 are the nodes from which data is collected. The nodes listed after Cluster1 can be given as IP addresses or hostnames. Because of multicast mode, every gmond node holds the monitoring data for all nodes in Cluster1, so it is not necessary to list every node in data_source. It is recommended to list at least two, however, so that when cloud0 fails, gmetad automatically collects data from cloud2, which keeps the Ganglia monitoring system highly available.

The data_source line above defines a single server cluster, Cluster1. To monitor multiple application systems, or host groups that serve different purposes, define multiple server clusters as follows:

    data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
    data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
    data_source "another source" 1.3.4.7:8655 1.3.4.8

Defining multiple data_source lines monitors multiple server clusters. Each cluster node can be given as a hostname or an IP address, optionally followed by a port; if no port is given, the default is 8649. A polling interval can also be set: in "10 localhost" and "50 1.3.4.7:8655" above, data is collected every 10 seconds and every 50 seconds respectively.

gridname: This parameter defines the name of the grid. A grid consists of multiple server clusters, each of which is defined by a data_source option.

xml_port: This parameter defines the port on which gmetad serves the aggregated data; if not specified, the default is 8651. Connecting to this port with telnet returns all the client data collected by the monitoring management side.

interactive_port: This parameter defines the port from which the web frontend obtains data; it must be specified when configuring the Ganglia web monitoring interface.

rrd_rootdir: This parameter defines the storage path of the RRD databases. After gmetad collects monitoring data, it updates the corresponding RRD database under this path.
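To make the data_source syntax and the failover behavior concrete, here is a minimal Python sketch. It is not gmetad's own parser: the helper names `parse_data_source` and `poll_first_available` and the injected `fetch` callable are assumptions for illustration. The default port 8649 comes from the text above; a missing interval is reported as None rather than guessed.

```python
import shlex

DEFAULT_GMOND_PORT = 8649  # default port stated in the text above


def parse_data_source(line):
    """Parse one data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex keeps the quoted cluster name as one token
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = None  # None means no polling interval was given on this line
    if rest[0].isdigit():
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        hosts.append((host, int(port)) if sep else (item, DEFAULT_GMOND_PORT))
    return name, interval, hosts


def poll_first_available(hosts, fetch):
    """Try each (host, port) in order and return the first successful result."""
    last_error = None
    for host, port in hosts:
        try:
            return fetch(host, port)
        except OSError as exc:  # connection refused, timeout, and similar
            last_error = exc
    raise RuntimeError("no data source reachable: %s" % last_error)


def fake_fetch(host, port):
    """Stand-in for a real network read; pretends that cloud0 is down."""
    if host == "cloud0":
        raise ConnectionRefusedError("cloud0 is down")
    return "xml from %s:%d" % (host, port)


grid = parse_data_source(
    'data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651')
cluster = parse_data_source('data_source "Cluster1" cloud0 cloud2')
answer = poll_first_available(cluster[2], fake_fetch)
print(grid)
print(answer)
```

With cloud0 unreachable, the second listed node answers, which is why the text recommends listing at least two nodes per cluster.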

4, Ganglia client configuration

After the Ganglia monitoring client gmond is installed, its configuration file is gmond.conf, located in the etc directory under the Ganglia installation path. This configuration file is slightly more complex:

    globals {
      daemonize = yes                      # run in the background as a daemon
      setuid = yes                         # whether to switch to a run-as user; must be set to false on Windows
      user = nobody                        # the user to run as; must already exist on the operating system; default is nobody
      debug_level = 0                      # debug level; the default 0 means no log output, larger values mean more log output
      max_udp_msg_len = 1472
      mute = no                            # if set to yes, this node no longer sends any data it collects to the network
      deaf = no                            # if set to yes, this node no longer receives packets sent by other nodes
      allow_extra_data = yes               # whether to send extended data
      host_dmax = 0 /*secs */              # whether to delete a node: 0 means never; a nonzero integer is the no-response time after which Ganglia refreshes the cluster node information and deletes the node
      cleanup_threshold = 300 /*secs */    # how often gmond cleans up expired data
      gexec = no                           # whether to use gexec to indicate that the host is available; not enabled here
      send_metadata_interval = 0 /*secs */ # in unicast mode, how often a node announces itself so that newly added nodes learn of it; 0 means only once, at gmond startup
    }
    cluster {
      name = "Cluster1"                    # cluster name; identifies which cluster this node belongs to and must match a name in data_source on the monitoring server
      owner = "Junfeng"                    # owner of the node, that is, the node administrator
      latlong = "unspecified"              # coordinates of the node (longitude, latitude); generally not needed
      url = "unspecified"                  # URL of the node; generally not needed
    }
    host {
      location = "unspecified"             # physical location of the node; generally not needed
    }
    udp_send_channel {                     # channel for sending UDP packets
      mcast_join = 239.2.11.71             # multicast address to send to; 239.2.11.71 is a class D address. In unicast mode, write host = host1 instead; multiple udp_send_channel sections can be configured in unicast mode
      port = 8649                          # port
      ttl = 1
    }
    udp_recv_channel {                     # configuration for receiving UDP packets
      mcast_join = 239.2.11.71             # multicast address to receive on, the same class D address
      port = 8649                          # listening port
      bind = 239.2.11.71                   # bind address
    }
    tcp_accept_channel {
      port = 8649                          # TCP listening port; a remote host can connect to port 8649 to obtain monitoring data
    }

All clients in a cluster use the same configuration. After one client is configured, copy its configuration file to all other client hosts in the cluster to complete the client configuration.
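Since tcp_accept_channel lets a remote host connect to port 8649 and read the monitoring data, the exchange can be sketched in Python. This is a hedged illustration rather than part of Ganglia: `fetch_xml` and `parse_metrics` are hypothetical helper names, and `SAMPLE` is a hand-written, heavily trimmed stand-in for a real dump, which carries many more attributes and metrics.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read everything the daemon writes on the TCP port until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)  # bytes, so the XML parser can honor the declared encoding


def parse_metrics(xml_data):
    """Return {host_name: {metric_name: value_string}} from a Ganglia XML dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result


# Illustrative sample only; the host name, IP, and values are made up.
SAMPLE = """<GANGLIA_XML VERSION="3.7.1" SOURCE="gmond">
 <CLUSTER NAME="Cluster1">
  <HOST NAME="cloud0" IP="10.0.0.1">
   <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=""/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

parsed = parse_metrics(SAMPLE)
print(parsed)
```

Against a live node, `parse_metrics(fetch_xml("cloud0"))` would be the equivalent of connecting to port 8649 with telnet and reading the output by eye.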

5, Ganglia web configuration

The Ganglia web monitoring interface is PHP-based, so a PHP environment must be installed.

There are two ways to install the Ganglia web monitoring interface: one is to install it with yum; the other is to download the ganglia-web program from http://sourceforge.net/projects/ganglia/files/ and place it in the Apache web root directory. Here we downloaded version ganglia-web-3.7.1.

Configuring the Ganglia web interface is relatively simple; only a few PHP files need to be modified. The first is conf_default.php. It can be renamed to conf.php: Ganglia web looks for conf.php first and, if it cannot find it, falls back to conf_default.php.

    $conf['gweb_confdir'] = "/var/www/html/ganglia";                      # root directory of ganglia web
    $conf['gmetad_root'] = "/opt/app/ganglia";                            # installation directory of the ganglia program
    $conf['rrds'] = "${conf['gmetad_root']}/rrds";                        # path where ganglia web reads the RRD databases, here /opt/app/ganglia/rrds
    $conf['dwoo_compiled_dir'] = "${conf['gweb_confdir']}/dwoo/compiled"; # needs "777" permission
    $conf['dwoo_cache_dir'] = "${conf['gweb_confdir']}/dwoo/cache";       # needs "777" permission
    $conf['rrdtool'] = "/opt/rrdtool/bin/rrdtool";                        # path to rrdtool
    $conf['graphdir'] = $conf['gweb_root'] . '/graph.d';                  # directory of graph templates
    $conf['ganglia_ip'] = "127.0.0.1";                                    # address of the server running gmetad
    $conf['ganglia_port'] = 8652;                                         # interactive port on which gmetad serves monitoring data

Note that the paths given by $conf['dwoo_compiled_dir'] and $conf['dwoo_cache_dir'] may not exist by default, so the compiled and cache directories must be created manually and given "777" permission under Linux. In addition, the RRD database directory /opt/app/ganglia/rrds must be writable by rrdtool, so run the following authorization command:

    chown -R nobody:nobody /opt/app/ganglia/rrds

This allows RRDtool to read the RRD databases properly and display the data through the web interface. The configuration of ganglia-web is relatively simple: a configuration error produces a message, and troubleshooting from that message generally leads to a solution.
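The manual step of creating the dwoo directories and granting them "777" permission could be scripted roughly as follows. This is a sketch under assumptions: `ensure_dwoo_dirs` is a hypothetical helper, and the demonstration runs against a temporary directory standing in for /var/www/html/ganglia so that it does not need root.

```python
import os
import stat
import tempfile


def ensure_dwoo_dirs(gweb_confdir, mode=0o777):
    """Create dwoo/compiled and dwoo/cache under gweb_confdir and set their mode."""
    created = []
    for sub in ("compiled", "cache"):
        path = os.path.join(gweb_confdir, "dwoo", sub)
        os.makedirs(path, exist_ok=True)
        os.chmod(path, mode)  # explicit chmod: the mode passed to makedirs is filtered by the umask
        created.append(path)
    return created


demo_root = tempfile.mkdtemp()  # stand-in for the real gweb_confdir
paths = ensure_dwoo_dirs(demo_root)
modes = [oct(stat.S_IMODE(os.stat(p).st_mode)) for p in paths]
print(modes)
```

On a real server the call would be `ensure_dwoo_dirs("/var/www/html/ganglia")`, run as a user allowed to write there.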

Fourth, extending Ganglia's monitoring functions

1, Extending Ganglia monitoring through the gmetric interface

gmetric is Ganglia's command-line tool. It can send data directly to the gmond node responsible for collecting data, or to all gmond nodes.

After Ganglia is installed, the gmetric command is available in the bin directory. Here is an example of how gmetric is used:

    /opt/app/ganglia/bin/gmetric -n disk_used -v 40 -t int32 -u '% test' -d 50 -S '8.8.8.8:cloud1'

where:

-n is the name of the metric to monitor.
-v is the value of the metric being written.
-t is the data type of the value.
-u is the unit of the value.
-d is the lifetime of the metric, in seconds.
-c specifies the location of the Ganglia configuration file.
-S is the spoofed client information: 8.8.8.8 is the spoofed client address and cloud1 is the hostname of the monitored host.
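When metrics are produced by a script, the gmetric call above can be wrapped in Python. The sketch below only assembles the same flags as the example; `gmetric_args` and `send_metric` are hypothetical helper names, and the binary path is the one used in this article, so it may differ on other installations.

```python
import subprocess

GMETRIC = "/opt/app/ganglia/bin/gmetric"  # path from the example above; adjust to the local install


def gmetric_args(name, value, value_type, units="", dmax=0, spoof=None):
    """Build the argument list for one gmetric invocation."""
    args = [GMETRIC, "-n", name, "-v", str(value), "-t", value_type]
    if units:
        args += ["-u", units]
    if dmax:
        args += ["-d", str(dmax)]
    if spoof:
        args += ["-S", spoof]  # "ip:hostname" of the host being reported on
    return args


def send_metric(**kwargs):
    """Run gmetric; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(gmetric_args(**kwargs), check=True)


args = gmetric_args("disk_used", 40, "int32",
                    units="% test", dmax=50, spoof="8.8.8.8:cloud1")
print(args)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with values such as '% test'.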

2, Python extension plugins

Ready-to-use extensions:

https://github.com/ganglia/gmond_python_modules
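For orientation, modules in that repository follow a common shape: a `metric_init` function that returns a list of metric descriptors, a callback per metric, and a `metric_cleanup` function. The sketch below follows that shape as an illustration only; the metric name disk_used, the "path" parameter, and the chosen descriptor values are assumptions, and loading it into gmond additionally requires Python module support and a matching configuration entry, which are not shown here.

```python
import os

PATH = "/"  # hypothetical default; replaced by a "path" parameter if one is passed


def disk_used_percent(name):
    """Callback: receives the metric name and returns the current value."""
    st = os.statvfs(PATH)
    if st.f_blocks == 0:
        return 0
    return int(round(100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks))


def metric_init(params):
    """Called once when the module is loaded; returns the metric descriptors."""
    global PATH
    PATH = params.get("path", PATH)
    return [{
        "name": "disk_used",
        "call_back": disk_used_percent,
        "time_max": 90,
        "value_type": "uint",
        "units": "%",
        "slope": "both",
        "format": "%u",
        "description": "Used disk space on " + PATH,
        "groups": "disk",
    }]


def metric_cleanup():
    """Called once when the module is unloaded."""
    pass


# Stand-alone check, useful for debugging a module outside gmond.
descriptors = metric_init({})
for d in descriptors:
    print(d["name"], d["call_back"](d["name"]))
```

Running the file directly before deploying it is a cheap way to confirm that the callback returns a value of the declared type.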

Fifth, Ganglia's advantages and precautions

1, It can easily monitor tens of thousands of servers, with data delay within 10 seconds.

2, Its distributed architecture is easy to scale out and well suited to deployment across multiple data centers.

3, It integrates seamlessly with Centreon, combining monitoring and alarms.

4, Disk I/O for data storage can become a bottleneck, so high-performance disks are needed.

This article is from the "Love Linux" blog; please keep this source attribution: http://ixdba.blog.51cto.com/2895551/1761003

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
