How ganglia works

Last Update:2014-07-10 Source: Internet

Author: User

Tags rrd rrdtool


The Ganglia project was started at the University of California, Berkeley, and has become widely used cluster monitoring software. It monitors and displays status information for the nodes in a cluster, such as CPU, memory, and disk utilization, I/O load, and network traffic. Historical data can also be displayed as curves on PHP pages. It also scales well and lets users add the status information they want to monitor.

Ganglia Overall Structure

Ganglia consists of the following programs, which exchange monitoring data in XDR (External Data Representation, a compact binary format) or in XML. Nodes in the cluster run gmond to collect and publish node state information. gmetad then periodically polls the information collected by gmond and stores it in RRD databases, which the web server can query and display.
gmetad periodically collects each cluster's data from its data sources and writes it to the RRD databases. It can be thought of as the server.
gmond collects local monitoring data, sends it to other machines, and receives monitoring data from other machines. gmond instances communicate with each other over UDP, and the data is transmitted in XDR format. The collected data is read by gmetad: gmond listens on port 8649 by default and, when it receives a request from gmetad, sends the data as XML. It can be thought of as the client.
The web front end is a web-based monitoring interface, usually installed on the same node as gmetad. (Whether it can run on a different node still needs to be confirmed; the PHP configuration file appears to allow the gmetad address and port to be configured.) It retrieves data from gmetad, reads the RRD databases, generates images, and displays them.
As shown in the figure, gmetad periodically polls data from gmond nodes or from other gmetad nodes. One gmetad can be configured with multiple data sources, and each data source can list multiple hosts as backups: if one fails, the data can be retrieved from another host.
In multicast mode, gmond nodes send data to each other through multicast. gmond has UDP send and receive channels and a TCP receive channel. The UDP channels are used to send data to and receive data from other gmond nodes; the TCP channel is used to export XML, mainly in response to requests from gmetad. gmetad has only TCP channels: on one side it sends requests to its data sources, and on the other it publishes its own XML on a TCP port, 8651 by default. gmetad can therefore obtain XML data from gmond or from other gmetad instances.
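The polling and failover behavior described above can be sketched in Python. This is a minimal illustration, not gmetad's actual implementation: the function names are hypothetical, and it assumes only that gmond writes its XML dump to any client that connects to its TCP port (8649 by default) and that the dump uses HOST and METRIC elements with NAME and VAL attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond writes on its TCP channel (as bytes)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def poll_data_source(hosts, fetch=fetch_xml):
    """Try each host of one data source in order; return the first XML obtained."""
    last_error = None
    for host in hosts:
        try:
            return fetch(host)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            last_error = exc
    raise ConnectionError("no host in the data source answered") from last_error


def parse_metrics(xml_bytes):
    """Return {(host_name, metric_name): value_string} from a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    metrics = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            metrics[(host.get("NAME"), metric.get("NAME"))] = metric.get("VAL")
    return metrics
```

A caller would pass the backup hosts of one data source as a list, for example `parse_metrics(poll_data_source(["node1", "node2"]))`, where the host names are placeholders.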
The following figure shows the internal modules of the gmond node:

Figure 2 gmond node module structure

As shown in Figure 2, gmond consists of three modules. The collect and publish module periodically calls internal commands to obtain metric data and publishes it to other gmond nodes over the UDP channel. The listen threads receive the UDP data sent by other gmond nodes and store it in memory. The XML export thread publishes the data in XML format, for example to gmetad.
The following describes the data flow in a Ganglia system in unicast mode.

Figure 3 Data flow between cluster nodes in unicast mode

As shown in Figure 3, multiple gmond nodes send data over UDP to the gmond on the unicast target host. gmetad then requests the XML from the gmond on the target host and stores the data in the rrdtool database. In unicast mode, the components inside the box in the figure usually run on the same node in the cluster. This node collects and stores the status information of every monitored node.
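The unicast flow can be illustrated with plain UDP sockets on the loopback interface. This is only a sketch of the pattern (many senders, one in-memory collector): real gmond encodes its packets in XDR, whereas the JSON payload, the function names, and the sample values here are assumptions made for readability.

```python
import json
import socket


def open_collector(host="127.0.0.1", port=0):
    """Bind the UDP receive channel of the central (unicast target) node."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # port 0 lets the OS pick a free port for the demo
    sock.settimeout(2.0)
    return sock


def send_metric(target, node, name, value):
    """Send one metric sample to the collector, as a monitored node would."""
    payload = json.dumps({"node": node, "name": name, "value": value})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), target)


def receive_into(sock, state, count):
    """Receive `count` datagrams and keep the latest value per (node, metric)."""
    for _ in range(count):
        data, _addr = sock.recvfrom(65535)
        sample = json.loads(data.decode("utf-8"))
        state[(sample["node"], sample["name"])] = sample["value"]
    return state


if __name__ == "__main__":
    collector = open_collector()
    target = collector.getsockname()
    for node in ("node1", "node2", "node3"):
        send_metric(target, node, "load_one", 0.5)
    print(receive_into(collector, {}, 3))
    collector.close()
```

The collector keeps only the most recent value per node and metric in memory, which mirrors the role of the listen threads; exporting that state as XML over TCP is left out of the sketch.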

2.2 Custom metrics
There are two ways to add custom metrics to Ganglia: run gmetric from the command line, or use the C and Python extension modules that Ganglia supports for custom modules.
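The command-line route can be wrapped in a few lines of Python. This is a sketch under assumptions: it uses gmetric's --name, --value, --type, and --units options, expects the gmetric binary to be on the PATH, and the metric name and value in the usage note are made up.

```python
import shutil
import subprocess


def gmetric_command(name, value, value_type="float", units=""):
    """Build the gmetric argument list for one custom metric."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", value_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish(name, value, value_type="float", units=""):
    """Run gmetric if it is installed; return True when the command was run."""
    if shutil.which("gmetric") is None:
        return False  # gmetric is not available on this machine
    subprocess.run(gmetric_command(name, value, value_type, units), check=True)
    return True
```

A script run periodically (for example from cron) could call `publish("queue_depth", 42, "uint32", "jobs")` to report a hypothetical queue length.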
2.3 Advantages and possible problems
2.3.1 Advantages
- Automatic data collection
Ganglia automatically collects the information of every node in the cluster, and each node collects its own data independently. Its communication is well designed and optimized. Custom information is added to the monitoring system by periodically sending it to gmond, after which Ganglia's monitoring mechanism collects and displays it. For details about how Ganglia works, refer to section 2.1.
- Graphical interface
Data is displayed as graphs. After logging on to the web server, you can view status curves for the whole cluster and for individual nodes. The view also has a basic sorting mechanism that sorts by value in descending or ascending order. You can view status curves for periods such as the past hour, day, week, or year.
- rrdtool database stores historical data
Because RRD is used to store the data, we can view not only the current status but also its history, and plot metrics as curves over time. Writing logs to separate files, by contrast, makes it difficult to keep and conveniently view history, and the log files may grow large. rrdtool has the following advantages:
1) In addition to storing data, it has tools for creating graphs.
2) Its database file size is fixed. New data is written after the existing data, and when the end is reached, writing starts again from the beginning of the file; this is what "round robin" means.
3) A general database stores only the data itself, while RRD can also store the change relative to the previous data.
4) A general database is updated only when data is provided, while RRD expects an update at each preset time interval. A timestamp is stored with each update.
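The fixed-size, wrap-around behavior in point 2 can be shown with a small in-memory ring buffer. This is a conceptual sketch only: the class name is hypothetical and it does not reproduce rrdtool's file format or its consolidation of samples.

```python
class RoundRobinArchive:
    """Fixed number of slots; the newest sample overwrites the oldest one."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.next_index = 0  # position where the next sample is written

    def update(self, timestamp, value):
        """Store one (timestamp, value) sample, wrapping around when full."""
        self.slots[self.next_index] = (timestamp, value)
        self.next_index = (self.next_index + 1) % len(self.slots)

    def history(self):
        """Return the stored samples ordered from oldest to newest."""
        ordered = self.slots[self.next_index:] + self.slots[:self.next_index]
        return [sample for sample in ordered if sample is not None]


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=3)
    for t in range(5):
        archive.update(t, t * 10)
    print(archive.history())
```

However many updates arrive, the structure never grows beyond the number of slots chosen at creation, which is the property that keeps an RRD file at a fixed size.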
2.3.2 Possible problems and bottlenecks
- Overhead estimation: network, I/O, CPU
The overhead of running the gmond process on a node is very small: it usually takes about 1 MB of memory and less than 1% CPU. gmond keeps data only in memory, so its I/O overhead can be ignored, and the network load of sending unicast data to another node is also low. For nodes that run only gmond, the overhead is therefore very small. In unicast mode, the main cost is the network traffic of the UDP data that the gmond process on each node sends to the central node. The communication between gmond and gmetad and the web service also run on this central node. The main bottleneck is therefore the central node, whose network, I/O, and CPU are under heavy load.
For the network, the central node receives UDP packets from all other nodes. If each node sends 10 packets per second, 500 nodes send 5000 packets per second; at 200 bytes per packet, that is about 1 MB per second. The CPU time needed to process 5000 packets per second increases accordingly.
For memory, each state value kept in memory takes about 300 bytes. If a job has 10,000 instances and each instance has about 10 states to monitor, it consumes 10000*10*300 bytes = 30 MB of memory, and the corresponding XML file is about 10 MB.
For I/O, gmetad retrieves XML data from gmond every 15 seconds by default. If gmond and gmetad are on the same node, this amounts to a local I/O request. After gmetad obtains the XML, it also has to parse it; with the default settings that means parsing an XML file on the order of 10 MB every 15 seconds, which puts heavy load on the CPU. gmetad also writes the data to the RRD databases, and the node handles requests from web clients and reads the RRD databases. The node's I/O, CPU, and network are therefore all under heavy load, so it should be a machine that is otherwise lightly loaded and reasonably powerful.
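The network and memory estimates above are simple products, and writing them as functions makes it easy to redo them for another cluster size. The function names are hypothetical; the inputs are the figures quoted in the text.

```python
def network_load(nodes, packets_per_node_per_s, bytes_per_packet):
    """Return (packets per second, bytes per second) arriving at the central node."""
    packets = nodes * packets_per_node_per_s
    return packets, packets * bytes_per_packet


def state_memory_bytes(instances, states_per_instance, bytes_per_state):
    """Return the memory needed to keep every monitored state value."""
    return instances * states_per_instance * bytes_per_state


if __name__ == "__main__":
    packets, bandwidth = network_load(500, 10, 200)
    print(packets, "packets/s,", bandwidth / 1e6, "MB/s")
    print(state_memory_bytes(10000, 10, 300) / 1e6, "MB of memory")
```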
- gmetad RRD write bottleneck
Note that the gmetad daemon uses rrdtool to store the RRD data in subdirectories under /var/lib/ganglia/rrds/. If there are more than 100 cluster nodes, you may need to place this directory on a RAM file system, because the disk I/O on this database will be very high. Because of the way RRD stores data, there is one file for each metric. If multiple sampling frequencies are configured, a separate file is also kept for each sampling frequency. Saving metric values to the RRD store therefore means I/O on a large number of small files. Assume that the cluster has 300 nodes and each node has 50 metrics: gmetad then records 15000 metrics. If these metrics are updated once per second, that means 15000 random writes per second, which an ordinary hard disk generally cannot sustain.
One possible solution is to divide the nodes in the cluster into several subsets and configure a central collection node for each subset, although this makes deployment and viewing the results less convenient. rrdcached can also be used to relieve the pressure that rrdtool writes put on gmetad: it caches the writes and applies them in batches. You can also lower the sampling frequency of metrics and reduce the number of metrics to minimize the volume of write requests. If the machine has multiple disks, spread the RRD data across them. Finally, the RRD directory can be mounted as tmpfs, as mentioned above.
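The write-rate estimate, and the effect of lowering the sampling frequency or the number of metrics, can be checked with a one-line calculation. The function name is hypothetical, and the 15-second interval and the reduced metric count are example inputs only, not recommended settings.

```python
def rrd_writes_per_second(nodes, metrics_per_node, update_interval_s):
    """Estimate RRD updates per second, assuming one RRD file per metric."""
    return nodes * metrics_per_node / update_interval_s


if __name__ == "__main__":
    print(rrd_writes_per_second(300, 50, 1))   # the case described in the text
    print(rrd_writes_per_second(300, 50, 15))  # same cluster, slower sampling
    print(rrd_writes_per_second(300, 20, 15))  # fewer metrics per node as well
```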
- Services, ports, and dependent libraries used
The gmond process uses UDP for unicast, on port 8649 by default. gmetad also listens on TCP ports 8651 and 8652. These ports must be open inside the cluster, and all of them are configurable. In addition, Apache needs a port to serve the web interface; this port is accessed from outside the cluster and defaults to 80.
- The same metric from different processes on the same host may be confused
Ganglia distinguishes state parameters by host + metric_name, so it cannot tell apart state variables with the same name that come from different processes on the same host. A single state may be reported by several processes, but only one name is visible for it, so when several processes report values under the same name at the same time, they cannot be distinguished. To tell them apart, you need a naming scheme that makes the names unique per process.
After a process exits, the custom metrics belonging to it do not disappear, which means we can still view its history even though the program is no longer running. On the other hand, this raises a new problem: with the metric naming scheme we adopt, metrics accumulate, the XML grows larger and larger, and the load on the central node for parsing it keeps increasing with nothing to keep it in check. For now, a feasible approach is to modify the gmetad configuration file to reduce the data retention time.
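One possible naming scheme is to prefix each metric with the process name and PID. The source does not say which scheme it uses, so the format and the function name below are assumptions for illustration only.

```python
import os


def process_metric_name(base_name, process_name, pid=None):
    """Build a metric name that is unique per process on one host."""
    if pid is None:
        pid = os.getpid()  # default to the calling process
    return "%s_%s_%s" % (process_name, pid, base_name)


if __name__ == "__main__":
    print(process_metric_name("queue_depth", "worker"))
```

Because every new PID produces a new metric name, a scheme like this is also what makes metrics pile up after processes exit, which is the trade-off described above.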

========================================================== ========================================================== ========================================================== ====================================

Ganglia configuration file:
globals section: global configuration of gmond, which generally does not need to be modified.
cluster section: an important section. At least the name variable must be defined; all nodes with the same name are considered to belong to the same cluster. The other variables describe the cluster.
host section: has only one variable, location, which describes this node.
udp_send_channel section: also an important section; you can define several of them. If there is only one cluster on your LAN, the defaults work well. By default, Ganglia uses multicast to send monitoring data, and the mcast_join variable specifies the multicast group. Not all environments are suitable for multicast, however (mine is not). Fortunately, Ganglia also supports unicast: the host variable specifies the gmond server that receives the monitoring data. Note that mcast_join and host cannot both appear in the same channel. port specifies the port number. ttl is the hop count, usually 1; if the data needs to pass through other gmond nodes, increase the hop count.
udp_recv_channel section: the counterpart of udp_send_channel; you can also define several of them. If you use unicast, make sure that the IP address given by the bind variable is reachable from all your other gmond nodes. In practice, you can use 0.0.0.0.
tcp_accept_channel section: specifies a port from which monitoring data can be read in XML format over TCP.
modules and collection_group:
The modules section contains the configuration loaded for each module. It should contain one or more module subsections. Each module subsection contains the name of the metric module, the language it is written in, and any other parameters:
name: matches the file name of the module you created (without the .py extension).
language: unless you wrote your module in C/C++, you must explicitly declare the language the module uses. Declaring 'python' as the language tells gmond to look for your module file in the python_modules directory.
param: each param subsection has a name and a value. They form a name/value pair and are passed as parameters to the module's metric_init() function. The parameter is a dictionary in which 'name' is the key and 'value' is the value, so you can pass custom settings to your module this way.
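A minimal Python metric module that receives such parameters might look like the sketch below. The metric_init, metric_cleanup, and descriptor keys follow the Ganglia Python module interface; the metric name "temp", the RandomMin and RandomMax parameters, and the random value standing in for a real sensor reading are assumptions for illustration.

```python
import random

_params = {}  # name/value pairs received from the param subsections


def read_temp(name):
    """Callback invoked by gmond; returns the current value of metric `name`."""
    low = int(_params.get("RandomMin", 20))
    high = int(_params.get("RandomMax", 80))
    return random.randint(low, high)  # stand-in for reading a real sensor


def metric_init(params):
    """Called once by gmond with the param name/value pairs as a dictionary."""
    _params.update(params)
    descriptor = {
        "name": "temp",
        "call_back": read_temp,
        "time_max": 90,
        "value_type": "uint",
        "units": "C",
        "slope": "both",
        "format": "%u",
        "description": "Example temperature metric",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Called once when gmond shuts down; nothing to release here."""
    pass


if __name__ == "__main__":
    for d in metric_init({"RandomMin": "30", "RandomMax": "60"}):
        print(d["name"], d["call_back"](d["name"]))
```

The block at the bottom lets the module be run directly for a quick check outside gmond; the parameter values are passed as strings here, so the callback converts them with int().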
collection_group
The remaining parts of the configuration file share the same format: collection_group and metric. Reading the gmond.conf help document is well worth it, but here we briefly introduce the collection_group directives used in the example.
collect_every or collect_once: collect_every tells gmond how often (in seconds) to collect the metrics defined in the collection_group. In this example, the 'temp' metric is collected every 10 seconds. You can instead set collect_once = yes to make gmond collect static metrics once, at gmond startup. This is useful for things that do not change at run time (such as the number of CPUs).
time_threshold: the maximum interval (in seconds) between reports of the metric data to Ganglia. In this example, the temp module reports at least once every 50 seconds. This setting is overridden when a collected metric value exceeds the value_threshold defined for the metric.
metric: this is where you define the settings of a specific metric.
name: the name of the specific metric. It is also defined in the descriptor dictionary in your module.
title: an optional friendly metric name that is displayed in the Ganglia front end.
value_threshold: if the collected metric value (in the unit defined in your descriptor) exceeds the value defined here, it is reported to Ganglia regardless of the time_threshold defined in the collection_group.
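For reference, a configuration fragment matching the settings described above is reproduced here as a Python string, with a small helper that reads values back so the fragment can be checked in a script. The fragment is a reconstruction, not the original example: collect_every = 10 and time_threshold = 50 come from the text, while the title and the value_threshold of 70 are assumed values, and the helper name is hypothetical.

```python
import re

# Hypothetical .pyconf fragment for a Python metric module called "temp".
TEMP_PYCONF = """\
modules {
  module {
    name = "temp"
    language = "python"
  }
}

collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "temp"
    title = "Temperature"
    value_threshold = 70
  }
}
"""


def read_int_setting(conf_text, key):
    """Return the first integer assigned to `key` in the fragment, or None."""
    pattern = r"^\s*" + re.escape(key) + r"\s*=\s*(\d+)\s*$"
    match = re.search(pattern, conf_text, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(TEMP_PYCONF)
    print("collect_every:", read_int_setting(TEMP_PYCONF, "collect_every"))
```

The helper is a plain regular-expression lookup for this fragment only; it is not a parser for the full gmond.conf syntax.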

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
