Summary of Hadoop monitoring methods


I have been working with Hadoop for about a year and a half and have accumulated some operations experience along the way. I have always wanted to build a Hadoop monitoring system, and a recent project in our laboratory finally gave me the chance to look into it, so here I summarize the Hadoop monitoring methods I have found.

The HDFS and JobTracker monitoring pages that ship with Hadoop are, in my opinion, the easiest to use: simple and straightforward. But if you want to develop your own monitoring system, how do you obtain the current state of the Hadoop cluster?

Web Crawl
The first idea is to crawl the web pages themselves: scrape the 50030 and 50070 pages and extract the monitoring data from them. I have to say this approach is really crude; unless it is the last resort, I would be embarrassed to use it.

Hadoop JMX interface
After a lot of searching I found an excellent write-up (link: http://slaytanic.blog.51cto.com/2057708/1179108). Replace http://namenode:50070/dfshealth.jsp with http://namenode:50070/jmx and you get the data returned in JSON format by Hadoop's built-in JMX interface; the information is very comprehensive. You can also append a query parameter to the link to request the monitoring information of a specific bean. For example, http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo returns only the NameNodeInfo information; by changing the value after qry= you can specify exactly what you want, and the value of the qry parameter is the content of the name field in the JSON output.
In the same way, you can get:
JobTracker information: http://namenode:50030/jmx
DataNode information: http://datanode:50075/jmx
TaskTracker information: http://datanode:50060/jmx
These links provide essentially all the information you might want to monitor, but I did not find the job list I wanted, including running, succeeded, and failed jobs.
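
As an illustration, here is a minimal sketch of reading the JMX endpoint from Java with plain HttpURLConnection; the host name and query string simply follow the example above, and a JSON library of your choice would be needed to actually parse the returned text.

// needs java.net.URL, java.net.HttpURLConnection, java.io.BufferedReader, java.io.InputStreamReader
URL url = new URL("http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
StringBuilder json = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    json.append(line);
}
reader.close();
conn.disconnect();
System.out.println(json.toString()); // JSON text whose "beans" array contains the NameNodeInfo bean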

Hadoop API
I remembered that earlier versions of the Hadoop API used the JobClient class to submit jobs, so with a try-and-see attitude I dug through the Hadoop API for half a day, and it really paid off.
Straight to the useful part:

Configuration conf = new Configuration();
InetSocketAddress inetSocket = new InetSocketAddress(MonitorUtil.getHostnameOfNamenode(), 9001);
JobClient jobClient = new JobClient(inetSocket, conf);
JobStatus[] jobsStatus = jobClient.getAllJobs();
This gives you an array of JobStatus; take any element, for example:
JobStatus jobStatus = jobsStatus[0];
JobID jobID = jobStatus.getJobID(); // get the JobID from the JobStatus
RunningJob runningJob = jobClient.getJob(jobID); // get the RunningJob object from the JobID
runningJob.getJobState(); // gets the job state; there are five states: JobStatus.FAILED, JobStatus.KILLED, JobStatus.PREP, JobStatus.RUNNING, JobStatus.SUCCEEDED
jobStatus.getUsername(); // gets the name of the user who ran the job
runningJob.getJobName(); // gets the job name
jobStatus.getStartTime(); // gets the start time of the job, in UTC milliseconds
runningJob.mapProgress(); // gets the fraction of the map phase completed, from 0 to 1
runningJob.reduceProgress(); // gets the fraction of the reduce phase completed
runningJob.getFailureInfo(); // gets the failure information
runningJob.getCounters(); // gets the job's counters; their contents match the counter values you see on the job monitoring page
The counters are a little more troublesome. For example, to get the value of HDFS_BYTES_READ:
runningJob.getCounters().getGroup("FileSystemCounters").getCounter("HDFS_BYTES_READ");
Here FileSystemCounters is the name of the counter group, and the corresponding group is obtained by passing that name to getGroup(). These group names are not the same as the group names you see on the 50030 page; the correspondence is:
org.apache.hadoop.mapred.JobInProgress$Counter    Job Counters
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter    File Output Format Counters
FileSystemCounters    FileSystemCounters
org.apache.hadoop.mapreduce.lib.input.FileInputFormat$Counter    File Input Format Counters
org.apache.hadoop.mapred.Task$Counter    Map-Reduce Framework

The left column is the name to pass to getGroup(); the right column is the name of the group as shown on the monitoring page.

Once you have the group, you can get the corresponding counter value by the counter's name.
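
For example, a counter that the 50030 page lists under "Map-Reduce Framework", such as MAP_INPUT_RECORDS, would be read through the internal group name from the table above; this is just a sketch reusing the runningJob object obtained earlier.

// "Map-Reduce Framework" on the page corresponds to the internal group org.apache.hadoop.mapred.Task$Counter
long mapInputRecords = runningJob.getCounters()
        .getGroup("org.apache.hadoop.mapred.Task$Counter")
        .getCounter("MAP_INPUT_RECORDS");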

The information available here is quite comprehensive, but it lacks the job's run time and end time. For a running job you can get the elapsed time by subtracting the start time from the current time, but I have not yet found a way to get the end time. If you know one, please tell me, thank you.
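
A minimal sketch of that elapsed-time calculation, reusing the jobStatus object from above:

// elapsed time of a running job, in milliseconds: current time minus start time
long elapsedMillis = System.currentTimeMillis() - jobStatus.getStartTime();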

I also saw a blog post online (http://blog.sina.com.cn/s/blog_4a1f59bf0100nv03.html) mentioning that the Cluster class provides a richer API. This should require Hadoop 2.0 or later; since the laboratory cannot upgrade Hadoop, I have not tested it.
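
For reference, a rough sketch of what that might look like on Hadoop 2.x using org.apache.hadoop.mapreduce.Cluster; I could not test it, so treat it only as an assumption about the newer API.

// Hadoop 2.x: Cluster exposes job status directly (untested here)
// note: this JobStatus is org.apache.hadoop.mapreduce.JobStatus, not the mapred one used above
Configuration conf = new Configuration();
Cluster cluster = new Cluster(conf);
for (JobStatus status : cluster.getAllJobStatuses()) {
    System.out.println(status.getJobID() + " " + status.getState());
}
cluster.close();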


