Summary of Hadoop monitoring methods


I have been working with Hadoop for about a year and a half and have accumulated some operations experience along the way. I have always wanted to build a Hadoop monitoring system, and a recent project in our laboratory finally gave me the chance to look into it, so here I summarize the Hadoop monitoring methods I have found.

The HDFS and JobTracker monitoring pages that ship with Hadoop are, in my opinion, the easiest to use: simple and straightforward. But if you want to develop your own monitoring system, how do you obtain the current state of the Hadoop cluster?

Web Crawl
The first idea is to crawl the web pages themselves: scrape the 50030 and 50070 pages and extract the monitoring data from them. I have to say this approach is really crude; unless it is the last resort, I would be embarrassed to use it.

Hadoop JMX interface
After a lot of searching I found an excellent write-up (link: http://slaytanic.blog.51cto.com/2057708/1179108). Replace http://namenode:50070/dfshealth.jsp with http://namenode:50070/jmx and you get the data returned in JSON format by Hadoop's built-in JMX interface; the information is very comprehensive. You can also append a query parameter to the link to request the monitoring information of a specific bean. For example, http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo returns only the NameNodeInfo information; by changing the value after qry= you can specify exactly what you want, and the value of the qry parameter is the content of the name field in the JSON output.
In the same way, you can get:
JobTracker information: http://namenode:50030/jmx
DataNode information: http://datanode:50075/jmx
TaskTracker information: http://datanode:50060/jmx
These links provide essentially all the information you might want to monitor, but I did not find the job list I wanted, including running, succeeded, and failed jobs.
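
As an illustration, here is a minimal sketch of reading the JMX endpoint from Java with plain HttpURLConnection; the host name and query string simply follow the example above, and a JSON library of your choice would be needed to actually parse the returned text.

// needs java.net.URL, java.net.HttpURLConnection, java.io.BufferedReader, java.io.InputStreamReader
URL url = new URL("http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
StringBuilder json = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    json.append(line);
}
reader.close();
conn.disconnect();
System.out.println(json.toString()); // JSON text whose "beans" array contains the NameNodeInfo bean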

Hadoop API
I remembered that earlier versions of the Hadoop API used the JobClient class to submit jobs, so with a try-and-see attitude I dug through the Hadoop API for half a day, and it really paid off.
Straight to the useful part:

Configuration conf = new Configuration();
InetSocketAddress inetSocket = new InetSocketAddress(MonitorUtil.getHostnameOfNamenode(), 9001);
JobClient jobClient = new JobClient(inetSocket, conf);
JobStatus[] jobsStatus = jobClient.getAllJobs();
This gives you an array of JobStatus; take any element, for example:
JobStatus jobStatus = jobsStatus[0];
JobID jobID = jobStatus.getJobID(); // get the JobID from the JobStatus
RunningJob runningJob = jobClient.getJob(jobID); // get the RunningJob object from the JobID
runningJob.getJobState(); // gets the job state; there are five states: JobStatus.FAILED, JobStatus.KILLED, JobStatus.PREP, JobStatus.RUNNING, JobStatus.SUCCEEDED
jobStatus.getUsername(); // gets the name of the user who ran the job
runningJob.getJobName(); // gets the job name
jobStatus.getStartTime(); // gets the start time of the job, in UTC milliseconds
runningJob.mapProgress(); // gets the fraction of the map phase completed, from 0 to 1
runningJob.reduceProgress(); // gets the fraction of the reduce phase completed
runningJob.getFailureInfo(); // gets the failure information
runningJob.getCounters(); // gets the job's counters; their contents match the counter values you see on the job monitoring page
The counters are a little more troublesome. For example, to get the value of HDFS_BYTES_READ:
runningJob.getCounters().getGroup("FileSystemCounters").getCounter("HDFS_BYTES_READ");
Here FileSystemCounters is the name of the counter group, and the corresponding group is obtained by passing that name to getGroup(). These group names are not the same as the group names you see on the 50030 page; the correspondence is:
org.apache.hadoop.mapred.JobInProgress$Counter    Job Counters
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter    File Output Format Counters
FileSystemCounters    FileSystemCounters
org.apache.hadoop.mapreduce.lib.input.FileInputFormat$Counter    File Input Format Counters
org.apache.hadoop.mapred.Task$Counter    Map-Reduce Framework

The left column is the name to pass to getGroup(); the right column is the name of the group as shown on the monitoring page.

Once you have the group, you can get the corresponding counter value by the counter's name.
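
For example, a counter that the 50030 page lists under "Map-Reduce Framework", such as MAP_INPUT_RECORDS, would be read through the internal group name from the table above; this is just a sketch reusing the runningJob object obtained earlier.

// "Map-Reduce Framework" on the page corresponds to the internal group org.apache.hadoop.mapred.Task$Counter
long mapInputRecords = runningJob.getCounters()
        .getGroup("org.apache.hadoop.mapred.Task$Counter")
        .getCounter("MAP_INPUT_RECORDS");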

The information available here is quite comprehensive, but it lacks the job's run time and end time. For a running job you can get the elapsed time by subtracting the start time from the current time, but I have not yet found a way to get the end time. If you know one, please tell me, thank you.
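
A minimal sketch of that elapsed-time calculation, reusing the jobStatus object from above:

// elapsed time of a running job, in milliseconds: current time minus start time
long elapsedMillis = System.currentTimeMillis() - jobStatus.getStartTime();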

I also saw a blog post online (http://blog.sina.com.cn/s/blog_4a1f59bf0100nv03.html) mentioning that the Cluster class provides a richer API. This should require Hadoop 2.0 or later; since the laboratory cannot upgrade Hadoop, I have not tested it.
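
For reference, a rough sketch of what that might look like on Hadoop 2.x using org.apache.hadoop.mapreduce.Cluster; I could not test it, so treat it only as an assumption about the newer API.

// Hadoop 2.x: Cluster exposes job status directly (untested here)
// note: this JobStatus is org.apache.hadoop.mapreduce.JobStatus, not the mapred one used above
Configuration conf = new Configuration();
Cluster cluster = new Cluster(conf);
for (JobStatus status : cluster.getAllJobStatuses()) {
    System.out.println(status.getJobID() + " " + status.getState());
}
cluster.close();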


