You can add counters to a Hadoop job to analyze its overall operation. Define counters in your program to tally the number of occurrences of particular events. Hadoop automatically sums same-named counters across all tasks in a job to give a job-wide total. These counter values are displayed in the JobTracker web UI alongside Hadoop's built-in counters.
A typical application of counters is to track different types of input records, especially "bad" records. For example, suppose we have a data set in the following format (only a subset is shown):
```
"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
3070805,1963,1096,,"US","CA",,1,,2,6,63,,1,,0,,,,,,,
3070806,1963,1096,,"US","PA",,1,,2,6,63,,0,,,,,,,,,
3070807,1963,1096,,"US","OH",,1,,623,3,39,,3,,0.4444,,,,,,,
3070808,1963,1096,,"US","IA",,1,,623,3,39,,4,,0.375,,,,,,,
3070809,1963,1096,,"US","AZ",,1,,4,6,65,,0,,,,,,,,,
3070810,1963,1096,,"US","IL",,1,,4,6,65,,3,,0.4444,,,,,,,
3070811,1963,1096,,"US","CA",,1,,4,6,65,,8,,0,,,,,,,
3070812,1963,1096,,"US","LA",,1,,4,6,65,,3,,0.4444,,,,,,,
3070813,1963,1096,,"US","NY",,1,,5,6,65,,2,,0,,,,,,,
3070814,1963,1096,,"US","MN",,2,,267,5,59,,2,,0.5,,,,,,,
3070815,1963,1096,,"US","CO",,1,,7,5,59,,1,,0,,,,,,,
3070816,1963,1096,,"US","OK",,1,,114,5,55,,4,,0,,,,,,,
3070817,1963,1096,,"US","RI",,2,,114,5,55,,5,,0.64,,,,,,,
3070818,1963,1096,,"US","IN",,1,,441,6,69,,4,,0.625,,,,,,,
3070819,1963,1096,,"US","TN",,4,,12,6,63,,0,,,,,,,,,
3070820,1963,1096,,"GB","",,2,,12,6,63,,0,,,,,,,,,
3070821,1963,1096,,"US","IL",,2,,15,6,69,,1,,0,,,,,,,
3070822,1963,1096,,"US","NY",,2,,401,1,12,,4,,0.375,,,,,,,
3070823,1963,1096,,"US","MI",,1,,401,1,12,,8,,0.6563,,,,,,,
3070824,1963,1096,,"US","IL",,1,,401,1,12,,5,,0.48,,,,,,,
3070825,1963,1096,,"US","IL",,1,,401,1,12,,7,,0.6531,,,,,,,
3070826,1963,1096,,"US","IA",,1,,401,1,12,,1,,0,,,,,,,
3070827,1963,1096,,"US","CA",,4,,401,1,12,,2,,0.5,,,,,,,
3070828,1963,1096,,"US","CT",,2,,16,5,59,,4,,0.625,,,,,,,
3070829,1963,1096,,"FR","",,3,,16,5,59,,5,,0.48,,,,,,,
3070830,1963,1096,,"US","NH",,2,,16,5,59,,0,,,,,,,,,
3070831,1963,1096,,"US","CT",,2,,16,5,59,,0,,,,,,,,,
3070832,1963,1096,,"US","LA",,2,,452,6,61,,1,,0,,,,,,,
3070833,1963,1096,,"US","LA",,1,,452,6,61,,5,,0,,,,,,,
3070834,1963,1096,,"US","FL",,1,,452,6,61,,3,,0.4444,,,,,,,
3070835,1963,1096,,"US","IL",,2,,264,5,51,,5,,0.64,,,,,,,
3070836,1963,1096,,"US","OK",,2,,264,5,51,,24,,0.7569,,,,,,,
3070837,1963,1096,,"CH","",,3,,264,5,51,,7,,0.6122,,,,,,,
3070838,1963,1096,,"CH","",,5,,425,5,51,,5,,0.48,,,,,,,
3070839,1963,1096,,"US","TN",,2,,425,5,51,,8,,0.4063,,,,,,,
3070840,1963,1096,,"GB","",,3,,425,5,51,,6,,0.7778,,,,,,,
3070841,1963,1096,,"US","OH",,2,,264,5,51,,6,,0.8333,,,,,,,
3070842,1963,1096,,"US","TX",,1,,425,5,51,,1,,0,,,,,,,
3070843,1963,1096,,"US","NY",,2,,425,5,51,,1,,0,,,,,,,
3070844,1963,1096,,"US","OH",,2,,425,5,51,,2,,0,,,,,,,
3070845,1963,1096,,"US","IL",,1,,52,6,69,,3,,0,,,,,,,
3070846,1963,1096,,"US","NY",,2,,425,5,51,,9,,0.7407,,,,,,,
```
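Note that every record ends with a run of empty fields, which is why the mapper below splits with a limit of -1: Java's default `String.split(",")` silently drops trailing empty strings, so the record would appear to have fewer than its 23 fields. A small standalone sketch of the difference (this demo class is not part of the job):

```java
public class SplitDemo {
    public static void main(String[] args) {
        // One record from the patent data set; CLAIMS (field index 8) is empty.
        String record = "3070806,1963,1096,,\"US\",\"PA\",,1,,2,6,63,,0,,,,,,,,,";

        String[] fields = record.split(",", -1); // limit -1 keeps trailing empty fields
        String[] dropped = record.split(",");    // default: trailing empties removed

        System.out.println(fields.length);   // 23, matching the header
        System.out.println(dropped.length);  // 14: the nine trailing empties are gone

        String country = fields[4];          // "US" (still quoted in the raw CSV)
        String numClaims = fields[8];
        System.out.println(numClaims.isEmpty()); // true -> this record would be counted as MISSING
    }
}
```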
We want to calculate the average number of claims per patent for each country, but many records have no claims value. Our program ignores these records, and it is useful to know how many were ignored. Beyond satisfying our curiosity, this count helps us understand the program's operation and check its correctness.
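To let a combiner merge partial results without corrupting the average (an average of averages would be wrong), the map output pairs each claims value with a count of 1, and partial (sum, count) pairs are merged before the final division. A minimal plain-Java sketch of that merging logic, independent of Hadoop (the class and method names here are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class AverageSketch {
    // Merge partial "sum,count" strings the same way the combiner and reducer do.
    static double average(List<String> sumCountPairs) {
        double sum = 0;
        int count = 0;
        for (String pair : sumCountPairs) {
            String[] f = pair.split(",");
            sum += Double.parseDouble(f[0]);  // partial sum of claims
            count += Integer.parseInt(f[1]);  // partial count of patents
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two partial results, e.g. from two map tasks:
        // 10 claims over 4 patents, plus 5 claims over 1 patent -> 15 / 5 = 3.0
        System.out.println(average(Arrays.asList("10.0,4", "5.0,1"))); // 3.0
    }
}
```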
Counters are incremented with the Reporter.incrCounter() method. A Reporter object is passed to the map() and reduce() methods; call incrCounter() with the counter's name and the increment amount as parameters. Each distinct event gets its own named counter. The first time incrCounter() is called with a new counter name, the counter is created with an initial value of zero and then incremented.
The Reporter.incrCounter() method has two overloaded signatures:

```java
public void incrCounter(Enum<?> key, long amount);
public void incrCounter(String group, String counter, long amount);
```
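The first form names the counter with an enum constant (the counter group is derived from the enum class); the second builds the group and counter names dynamically from strings. To make the semantics concrete, here is a toy simulation of the bookkeeping (this is not Hadoop's implementation, just an illustration of how named counters initialize at zero and accumulate):

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSim {
    enum ClaimsCounters { MISSING, QUOTED }

    // group name -> counter name -> value
    private final Map<String, Map<String, Long>> counters = new HashMap<>();

    // Enum form: the group is the enum's class name.
    void incrCounter(Enum<?> key, long amount) {
        incrCounter(key.getDeclaringClass().getName(), key.name(), amount);
    }

    // String form: group and counter names chosen at runtime.
    void incrCounter(String group, String counter, long amount) {
        counters.computeIfAbsent(group, g -> new HashMap<>())
                .merge(counter, amount, Long::sum); // new names start at 0, then add
    }

    long get(String group, String counter) {
        return counters.getOrDefault(group, Map.of()).getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterSim reporter = new CounterSim();
        reporter.incrCounter(ClaimsCounters.MISSING, 1); // created at 0, incremented to 1
        reporter.incrCounter(ClaimsCounters.MISSING, 1); // now 2
        reporter.incrCounter("BadRecords", "TRUNCATED", 3); // dynamic group/name
        System.out.println(reporter.get(ClaimsCounters.class.getName(), "MISSING")); // 2
        System.out.println(reporter.get("BadRecords", "TRUNCATED")); // 3
    }
}
```

Hadoop performs the same kind of aggregation one level higher: it sums each task's counter values into the job-wide totals shown in the web UI.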
The following code calculates the average number of patent claims per country, using counters to track the skipped records:
```java
package hadoop.in.action;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AverageByAttribute {

    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // Custom counters: MISSING counts records with an empty CLAIMS field,
        // QUOTED counts records whose CLAIMS field is a quoted string (e.g. the header row).
        static enum ClaimsCounters { MISSING, QUOTED }

        private Text k = new Text();
        private Text v = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // split with limit -1 so trailing empty fields are kept
            String[] fields = value.toString().split(",", -1);
            String country = fields[4];
            String numClaims = fields[8];
            if (numClaims.length() == 0) {
                reporter.incrCounter(ClaimsCounters.MISSING, 1);
            } else if (numClaims.startsWith("\"")) {
                reporter.incrCounter(ClaimsCounters.QUOTED, 1);
            } else {
                k.set(country);
                v.set(numClaims + ",1"); // pair the value with a count of 1
                output.collect(k, v);
            }
        }
    }

    public static class CombineClass extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private Text v = new Text();

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            double sum = 0;
            int count = 0;
            while (values.hasNext()) {
                String[] fields = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            // emit one partial (sum, count) pair per key
            v.set(sum + "," + count);
            output.collect(key, v);
        }
    }

    public static class ReduceClass extends MapReduceBase
            implements Reducer<Text, Text, Text, DoubleWritable> {

        private DoubleWritable v = new DoubleWritable();

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            double sum = 0;
            int count = 0;
            while (values.hasNext()) {
                String[] fields = values.next().toString().split(",");
                sum += Double.parseDouble(fields[0]);
                count += Integer.parseInt(fields[1]);
            }
            v.set(sum / count);
            output.collect(key, v);
        }
    }

    public static void run() throws IOException {
        Configuration configuration = new Configuration();
        JobConf jobConf = new JobConf(configuration, AverageByAttribute.class);

        String input = "hdfs://localhost:9000/user/hadoop/input/apat63_99.txt";
        String output = "hdfs://localhost:9000/user/hadoop/output";
        // HdfsDAO hdfsDAO = new HdfsDAO(configuration);
        // hdfsDAO.rmr(output);

        FileInputFormat.setInputPaths(jobConf, new Path(input));
        FileOutputFormat.setOutputPath(jobConf, new Path(output));

        jobConf.setInputFormat(TextInputFormat.class);
        jobConf.setOutputFormat(TextOutputFormat.class);
        jobConf.setMapOutputKeyClass(Text.class);
        jobConf.setMapOutputValueClass(Text.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(DoubleWritable.class);
        jobConf.setMapperClass(MapClass.class);
        jobConf.setCombinerClass(CombineClass.class);
        jobConf.setReducerClass(ReduceClass.class);

        // runJob() submits the job and blocks until it completes
        RunningJob runningJob = JobClient.runJob(jobConf);
    }

    public static void main(String[] args) throws IOException {
        run();
    }
}
```
After the job runs, the user-defined counters appear alongside Hadoop's internal counters in the JobTracker web UI:
Monitoring of Hadoop production clusters-counters