Monitoring of Hadoop production clusters - counters


You can insert counters into your Hadoop job to analyze its overall operation. Define different counters in your program to accumulate the number of occurrences of particular events. For counters of the same name coming from all tasks of a job, Hadoop automatically sums them so that they reflect the job as a whole. The values of these counters are displayed in the JobTracker web user interface together with Hadoop's built-in counters.

A typical application of counters is to track different categories of input records, especially "bad" records. For example, suppose we are given the patent data set apat63_99.txt, whose format looks like this (only a subset is shown):

"Patent", "Gyear", "Gdate", "appyear", "Country", "Postate", "ASSIGNEE", "Asscode", "CLAIMS", "NClass", "CAT", "subcat", " Cmade "," creceive "," ratiocit "," General "," ORIGINAL "," Fwdaplag "," Bckgtlag "," Selfctub "," Selfctlb "," SECDUPBD "," SECDLWBD "3070801,1963,1096," be "," ",, 1,,269,6,69,,1,,0,,,,,,,3070802,1963,1096,, "US", "TX",, 1,,2,6,63,,0,,,,,,,,,3070803,1963,1096,, "US", "IL",, 1,,2,6,63,,9,,0.3704,,,,,,,3070804,1963,1096,, "US", "OH",, 1,,2,6,63,,3,,0.6667,,,,,,,3070805,1963,1096,, "US", "CA",, 1,,2,6,63,,1,,0,,,,,,,3070806,1963,1096,, "US", "PA",, 1,,2,6,63,,0,,,,,,,,,3070807,1963,1096,, "US", "OH",, 1,,623,3,39,,3,,0.4444,,,,,,,3070808,1963,1096,, "US", "IA",, 1,,623,3,39,,4,,0.375,,,,,,,3070809,1963,1096,, "US", "AZ",, 1,,4,6,65,,0,,,,,,,,,3070810,1963,1096,, "US", "IL",, 1,,4,6,65,,3,,0.4444,,,,,,,3070811,1963,1096,, "US", "CA",, 1,,4,6,65,,8,,0,,,,,,,3070812,1963,1096,, "US", "LA",, 1,,4,6,65,,3,,0.4444,,,,,,,3070813,1963,1096,, "US", "NY",, 1,,5,6,65,,2,,0,,,,,,,3070814,1963,1096,, "US", "MN",, 2,,267,5,59,,2,,0.5,,,,,,,3070815,1963,1096,, "US", "CO",, 1,,7,5,59,,1,,0,,,,,,,3070816,1963,1096,, "US", "OK", 1,,114,5,55,,4,,0,,,,,,,3070817,1963,1096,, "US", "RI",, 2,,114,5,55,,5,,0.64,,,,,,,3070818,1963,1096,, "US", "in",, 1,,441,6,69,,4,,0.625,,,,,,,3070819,1963,1096,, "US", "TN",, 4,,12,6,63,,0,,,,,,,,,3070820,1963,1096,, "GB", "",, 2,,12,6,63,,0,,,,,,,,,3070821,1963,1096,, "US", "IL",, 2,,15,6,69,,1,,0,,,,,,,3070822,1963,1096,, "US", "NY",, 2,,401,1,12,,4,,0.375,,,,,,,3070823,1963,1096,, "US", "MI",, 1,,401,1,12,,8,,0.6563,,,,,,,3070824,1963,1096,, "US", "IL",, 1,,401,1,12,,5,,0.48,,,,,,,3070825,1963,1096,, "US", "IL",, 1,,401,1,12,,7,,0.6531,,,,,,,3070826,1963,1096,, "US", "IA",, 1,,401,1,12,,1,,0,,,,,,,3070827,1963,1096,, "US", "CA",, 4,,401,1,12,,2,,0.5,,,,,,,3070828,1963,1096,, "US", "CT",, 2,,16,5,59,,4,,0.625,,,,,,,3070829,1963,1096,, "FR", "",, 3,,16,5,59,,5,,0.48,,,,,,,3070830,1963,1096,, "US", "NH",, 2,,16,5,59,,0,,,,,,,,,3070831,1963,1096,, "US", "CT",, 2,,16,5,59,,0,,,,,,,,,3070832,1963,1096,, "US", "LA",, 2,,452,6,61,,1,,0,,,,,,,3070833,1963,1096,, "US", "LA",, 1,,452,6,61,,5,,0,,,,,,,3070834,1963,1096,, "US", "FL",, 1,,452,6,61,,3,,0.4444,,,,,,,3070835,1963,1096,, "US", "IL",, 2,,264,5,51,,5,,0.64,,,,,,,3070836,1963,1096,, "US", "OK", 2,,264,5,51,,24,,0.7569,,,,,,,3070837,1963,1096,, "CH", "",, 3,,264,5,51,,7,,0.6122,,,,,,,3070838,1963,1096,, "CH", "",, 5,,425,5,51,,5,,0.48,,,,,,,3070839,1963,1096,, "US", "TN",, 2,,425,5,51,,8,,0.4063,,,,,,,3070840,1963,1096,, "GB", "",, 3,,425,5,51,,6,,0.7778,,,,,,,3070841,1963,1096,, "US", "OH",, 2,,264,5,51,,6,,0.8333,,,,,,,3070842,1963,1096,, "US", "TX",, 1,,425,5,51,,1,,0,,,,,,,3070843,1963,1096,, "US", "NY",, 2,,425,5,51,,1,,0,,,,,,,3070844,1963,1096,, "US", "OH",, 2,,425,5,51,,2,,0,,,,,,,3070845,1963,1096,, "US", "IL",, 1,,52,6,69,,3,,0,,,,,,,3070846,1963,1096,, "US", "NY",, 2,,425,5,51,,9,,0.7407,,,,,,,

We want to calculate the average number of patent claims for each country, but the claims field is empty in many records. Our program ignores these records, and it is useful to know how many of them were ignored. Besides satisfying our curiosity, these counts help us understand how the program behaves and verify that it is working correctly.
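
To make the record layout concrete, the following stand-alone snippet (illustrative only, not part of the MapReduce job; the class name is made up) splits the first data row shown above. The country is field 4, the number of claims is field 8, and the -1 limit passed to split() preserves the trailing empty fields:

// Hypothetical stand-alone demo: how one record of apat63_99.txt splits into fields.
public class FieldSplitDemo {
    public static void main(String[] args) {
        // First data row from the sample above; its CLAIMS field (index 8) is empty.
        String record = "3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,";

        // The -1 limit keeps trailing empty strings, so every record yields the
        // same 23 fields even when the last columns are missing.
        String[] fields = record.split(",", -1);

        System.out.println("fields  : " + fields.length);     // 23
        System.out.println("country : " + fields[4]);         // "BE" (quotes are kept)
        System.out.println("claims  : [" + fields[8] + "]");  // [] -> counted as MISSING
    }
}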

Counters are incremented through the Reporter.incrCounter() method. A Reporter object is passed to the map() and reduce() methods; call incrCounter() with the counter's name and the amount to add as parameters. Each kind of event gets its own named counter. The first time incrCounter() is called with a new counter name, the counter is created and then incremented by the given amount.

The Reporter.incrCounter() method has two signatures:

 public void incrCounter(Enum<?> key, long amount)
 public void incrCounter(String group, String counter, long amount)
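
The first form names the counter with a Java enum (the enum's class name becomes the counter group), while the second gives the group and counter names explicitly as strings. As a minimal, self-contained illustration of both forms (the mapper, enum, and group names below are invented for demonstration and are not part of the patent example):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Minimal sketch: counts empty input lines using both incrCounter() signatures.
public class CounterDemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Enum-based counters are grouped under the enum's class name in the UI.
    static enum LineCounters { EMPTY }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        if (value.toString().trim().isEmpty()) {
            // Enum-based signature: incrCounter(Enum<?> key, long amount)
            reporter.incrCounter(LineCounters.EMPTY, 1);
            // String-based signature: incrCounter(String group, String counter, long amount)
            reporter.incrCounter("Line Quality", "Empty lines", 1);
        } else {
            output.collect(new Text("line"), value);
        }
    }
}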

The following code calculates the average number of patent claims per country and uses counters to track the records it skips:

package hadoop.in.action;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AverageByAttribute {

    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // Custom counters: records with a missing claims field and records
        // whose claims field is a quoted (non-numeric) value.
        static enum ClaimsCounters { MISSING, QUOTED };

        private Text k = new Text();
        private Text v = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String[] fields = value.toString().split(",", -1);
            String country = fields[4];
            String numClaims = fields[8];

            if (numClaims.length() == 0) {
                reporter.incrCounter(ClaimsCounters.MISSING, 1);
            } else if (numClaims.startsWith("\"")) {
                reporter.incrCounter(ClaimsCounters.QUOTED, 1);
            } else {
                k.set(country);
                v.set(numClaims + ",1");
                output.collect(k, v);
            }
        }
    }

    public static class CombineClass extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private Text v = new Text();

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            int count = 0;
            double sum = 0;
            while (values.hasNext()) {
                String[] fields = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            // Emit one partial (sum, count) pair per key after the loop, so the
            // reducer does not double-count running totals.
            v.set(sum + "," + count);
            output.collect(key, v);
        }
    }

    public static class ReduceClass extends MapReduceBase
            implements Reducer<Text, Text, Text, DoubleWritable> {

        private DoubleWritable v = new DoubleWritable();

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            int count = 0;
            double sum = 0;
            while (values.hasNext()) {
                String[] fields = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            v.set(sum / count);
            output.collect(key, v);
        }
    }

    public static void run() throws IOException {
        Configuration configuration = new Configuration();
        JobConf jobConf = new JobConf(configuration, AverageByAttribute.class);

        String input = "hdfs://localhost:9000/user/hadoop/input/apat63_99.txt";
        String output = "hdfs://localhost:9000/user/hadoop/output";
        // HdfsDAO hdfsDAO = new HdfsDAO(configuration);
        // hdfsDAO.rmr(output);

        FileInputFormat.setInputPaths(jobConf, new Path(input));
        FileOutputFormat.setOutputPath(jobConf, new Path(output));

        jobConf.setInputFormat(TextInputFormat.class);
        jobConf.setOutputFormat(TextOutputFormat.class);
        jobConf.setMapOutputKeyClass(Text.class);
        jobConf.setMapOutputValueClass(Text.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(DoubleWritable.class);

        jobConf.setMapperClass(MapClass.class);
        jobConf.setCombinerClass(CombineClass.class);
        jobConf.setReducerClass(ReduceClass.class);

        // runJob() blocks until the job finishes; poll the handle as a safeguard.
        RunningJob runningJob = JobClient.runJob(jobConf);
        while (!runningJob.isComplete()) {
            runningJob.waitForCompletion();
        }
    }

    public static void main(String[] args) throws IOException {
        run();
    }
}

After the program runs, the user-defined counters appear in the JobTracker web user interface together with Hadoop's built-in counters:

[Figure: JobTracker web UI showing the job's user-defined and built-in counters]
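
The counter totals do not have to be read from the web UI. Because run() keeps the RunningJob handle returned by JobClient.runJob(), the aggregated values can also be fetched programmatically once the job completes. The sketch below is illustrative only: the PrintCounters class name is invented, and it is assumed to live in the same hadoop.in.action package so it can see the ClaimsCounters enum.

package hadoop.in.action;

import java.io.IOException;

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical helper: submits the configured job and prints the custom counter
// totals after completion, as an alternative to checking the JobTracker UI.
public class PrintCounters {

    public static void runAndPrint(JobConf jobConf) throws IOException {
        RunningJob runningJob = JobClient.runJob(jobConf); // blocks until the job finishes

        // getCounters() returns the job-wide totals, i.e. the per-task values
        // already summed by the framework.
        Counters counters = runningJob.getCounters();
        long missing = counters.getCounter(AverageByAttribute.MapClass.ClaimsCounters.MISSING);
        long quoted  = counters.getCounter(AverageByAttribute.MapClass.ClaimsCounters.QUOTED);

        System.out.println("Records with missing claims: " + missing);
        System.out.println("Records with quoted claims : " + quoted);
    }
}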
