Cluster tools Chukwa and Ganglia

Source: Internet
Author: User
Tags: memory usage, rrdtool, cpu usage, mysql database, log4j


As we all know, Hadoop runs in a distributed cluster environment, and a cluster is usually shared by many users or groups. At any time, many users may be accessing the NameNode (NN) or JobTracker (JT), using the distributed file system or running MapReduce jobs, and relying on the machines in the cluster for their storage and computing work. As Hadoop usage grows, it becomes difficult for cluster operators to objectively analyze the current state and trends of the cluster. For example, operators may not notice that the NN is running out of memory until it overflows. It is therefore necessary to use data to assess the current health of Hadoop.

Chukwa works with the logs output by the processes in the cluster. Processes such as the NN, DN, JT, and TT all produce log information, because their code calls the interface provided by log4j to record logs. Where the logs are physically stored is set in the log4j.properties configuration file: they can be written to a local file or to a database. Chukwa takes over the handling of these log records, so that logging and log collection are done by the Chukwa programs. Chukwa consists of several components. The agent collects the logs of each process and sends them to the collector. The collector gathers the data sent by the agents and saves it to HDFS. The MR job uses MapReduce to analyze the data. Dumptool downloads the results and saves them to a MySQL database. HICC displays the data. More information: http://incubator.apache.org/chukwa/
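The flow from agent to collector to analysis can be illustrated with a small Python sketch. This is not Chukwa code: the `Agent`, `Collector`, and `count_levels` names are hypothetical, an in-memory list stands in for HDFS, and the sample log lines are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Stands in for a collector; `sink` is a hypothetical substitute for HDFS."""
    sink: list = field(default_factory=list)

    def receive(self, source, lines):
        # Tag every line with the process it came from before storing it.
        self.sink.extend((source, line) for line in lines)


@dataclass
class Agent:
    """Stands in for an agent that forwards one process's log lines."""
    source: str
    collector: Collector

    def ship(self, lines):
        self.collector.receive(self.source, list(lines))


def count_levels(sink):
    """Toy analysis step: count stored lines per (source, log level)."""
    counts = Counter()
    for source, line in sink:
        level = line.split()[0]  # assumes each line starts with its log level
        counts[(source, level)] += 1
    return counts


collector = Collector()
# Invented sample log lines, one agent per process.
Agent("NN", collector).ship(["INFO block report received", "WARN heap usage high"])
Agent("DN", collector).ship(["INFO heartbeat sent"])

summary = count_levels(collector.sink)
print(summary[("NN", "WARN")])
```

In this sketch the analysis runs in a single process; in the real pipeline that role is played by the MR job, and the results would go on to the database and the display component.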


Ganglia leans more toward low-level monitoring of the operating system. It mainly collects the CPU usage, memory usage, disk I/O, network I/O, disk capacity, and so on of each machine in the cluster. It is rather like the Windows Task Manager, except that it manages the machines of a distributed cluster. Like Chukwa, it is made up of several components: a data acquisition component collects the information at regular intervals and sends it to a collector; the collector gathers the data and saves it to a database; and finally RRDtool is used to display the data as graphs. Ganglia is also more versatile: in addition to collecting a fixed set of machine metrics, it provides plug-ins that can be attached to other processes, such as Java programs, to collect information about those processes as well.

More information: http://ganglia.info/

http://www.javabloger.com/article/j2ee-linux-ganglia-rrdtool-java-mysql-1.html
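The periodic collection that Ganglia performs, together with the round-robin storage that gives RRDtool its name, can be sketched in Python. This is a simplified illustration, not Ganglia or RRDtool code: `RoundRobinArchive` and `collect` are hypothetical names, the readings are invented, and the 15-second interval is an assumption. Real RRDtool archives also consolidate samples, which this sketch omits.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, each new sample pushes out the oldest one."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        # Average of the samples currently retained, or None if empty.
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


def collect(metrics, archives, timestamp):
    """One collection pass: store each reported metric in its archive."""
    for name, value in metrics.items():
        archives[name].update(timestamp, value)


archives = {
    "cpu_user": RoundRobinArchive(slots=3),
    "mem_free": RoundRobinArchive(slots=3),
}

# Invented readings from one machine, one dict per collection interval.
readings = [
    {"cpu_user": 10.0, "mem_free": 512.0},
    {"cpu_user": 20.0, "mem_free": 480.0},
    {"cpu_user": 30.0, "mem_free": 450.0},
    {"cpu_user": 40.0, "mem_free": 400.0},
]
for i, metrics in enumerate(readings):
    collect(metrics, archives, timestamp=i * 15)  # assumed 15-second interval

print(list(archives["cpu_user"].samples))
print(archives["cpu_user"].average())
```

Because each archive has a fixed number of slots, storage stays constant no matter how long monitoring runs; only the most recent samples are kept.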


For an in-depth understanding of the current state of the platform and of how the machines in the cluster are running, Chukwa and Ganglia are undoubtedly good tools. They provide accurate data for knowing the current operating state, inferring current bottlenecks, optimizing the related applications, and making decisions for the future.
