Hadoop Learning Notes - 7. Counters and Custom Counters

One, Counters in Hadoop

Counter: a counter records the execution progress and status of a job. Its role can be understood as something like a log: we can place a counter at some point in the program to record changes in data or in progress, and counters are often more convenient to analyze than log output.

For example, we have a file that contains the following:

hello you
hello me

When this file is processed by the WordCount program, the job prints a counter log at the end of its run (an annotated copy of it is shown further below).

As the log shows, there are 19 counters in total, divided into four groups: File Output Format Counters, FileSystemCounters, File Input Format Counters, and Map-Reduce Framework.

The group File Input Format Counters contains the counter Bytes Read, which indicates that the map tasks read 19 bytes from the input file, spaces and newline characters included ("hello you" plus a newline is 10 bytes, "hello me" plus a newline is 9 bytes):

hello you
hello me

The group File Output Format Counters contains the counter Bytes Written, which indicates that the output file the job writes to HDFS also contains 19 bytes, tab and newline characters included:

hello	2
me	1
you	1

A detailed, annotated copy of the whole counter log is shown below:

Counters: 19                                    // 19 counters in total, grouped into the 4 groups below
  File Output Format Counters                   // file output format counter group
    Bytes Written=19                            // bytes written by reduce to HDFS, 19 in total
  FileSystemCounters                            // file system counter group
    FILE_BYTES_READ=481
    HDFS_BYTES_READ=38
    FILE_BYTES_WRITTEN=81316
    HDFS_BYTES_WRITTEN=19
  File Input Format Counters                    // file input format counter group
    Bytes Read=19                               // bytes read by map from HDFS
  Map-Reduce Framework                          // MapReduce framework counter group
    Map output materialized bytes=49
    Map input records=2                         // lines read by map: "hello you" and "hello me"
    Reduce shuffle bytes=0                      // bytes copied to the reduce side during shuffle
    Spilled Records=8
    Map output bytes=35
    Total committed heap usage (bytes)=266469376
    SPLIT_RAW_BYTES=105
    Combine input records=0                     // records fed into the combiner
    Reduce input records=4                      // records received by reduce from the map side
    Reduce input groups=3                       // number of distinct keys (k2) received by reduce: hello, me, you
    Combine output records=0                    // records emitted by the combiner
    Reduce output records=3                     // records written by reduce
    Map output records=4                        // records emitted by map, 4 in total (2 words per line x 2 lines)
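Besides appearing in the job's log, these built-in counters can also be read programmatically from the driver once the job has finished. Below is a minimal sketch, assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; the helper class name CounterUtils and its method name are only illustrative:

import java.io.IOException;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterUtils {
    // Prints a few built-in counters of a job that has already completed.
    public static void printBuiltInCounters(Job job) throws IOException {
        Counters counters = job.getCounters();
        long mapInputRecords     = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
        long mapOutputRecords    = counters.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
        long reduceOutputRecords = counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();
        System.out.println("Map input records     = " + mapInputRecords);     // 2 for the sample file
        System.out.println("Map output records    = " + mapOutputRecords);    // 4 for the sample file
        System.out.println("Reduce output records = " + reduceOutputRecords); // 3 for the sample file
    }
}

A natural place to call such a helper is in the driver's main method, right after job.waitForCompletion(true) returns.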
Two, User-defined Counters

The counters above are the standard counters built into Hadoop. Since different scenarios have different counting requirements, we can also define and use our own counters.

2.1 Sensitive Word Recording-Preparation

Now suppose we need to count the sensitive words in a file, that is, record how many times a sensitive word appears in it. Let's take the following file as an example:

Hello world!
Hello Hadoop!

The content is very simple. Here we designate "Hello" as the sensitive word; it clearly appears twice, so two occurrences of the sensitive word need to be recorded.

2.2 Sensitive Word Recording-Program

Building on the WordCount program, we rewrite the map method of the Mapper class to count occurrences of "Hello", as shown in the following code:

// Requires imports: org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
// org.apache.hadoop.mapreduce.Counter, org.apache.hadoop.mapreduce.Mapper
public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    /**
     * KEYIN    (k1): the starting offset of each line
     * VALUEIN  (v1): the text content of each line
     * KEYOUT   (k2): each word in the line
     * VALUEOUT (v2): the number of occurrences of each word in the line, fixed value 1
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        // Get (or create) the custom counter: group "Sensitive Words:", counter "Hello"
        Counter sensitiveCounter = context.getCounter("Sensitive Words:", "Hello");
        String line = value.toString();
        // Here we treat "Hello" as the sensitive word; the counter is incremented
        // once for every input line that contains it
        if (line.contains("Hello")) {
            sensitiveCounter.increment(1L);
        }
        String[] splitted = line.split(" ");
        for (String word : splitted) {
            context.write(new Text(word), new LongWritable(1L));
        }
    }
}

In the map method we first obtain a Counter object from the Mapper's Context via context.getCounter. It takes two parameters: the first is the name of the counter group, and the second is the name of the counter itself.

The String.contains method is then used to check whether the current line contains the sensitive word "Hello". If it does, we enter the if block and call the increment method of the Counter object to add 1 to it.
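Besides the (group name, counter name) string form used above, getCounter can also take a Java enum: each enum constant becomes a counter and the enum's class name becomes the counter group, which avoids typos in hard-coded strings. A minimal sketch of this variant is shown below (the class name MyEnumMapper and the enum SensitiveWords are only illustrative; imports are the same as for MyMapper above):

// Enum-based variant of the sensitive-word counter
public static class MyEnumMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Each enum constant becomes one counter; the enum's class name becomes the group name
    enum SensitiveWords { HELLO }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        String line = value.toString();
        if (line.contains("Hello")) {
            context.getCounter(SensitiveWords.HELLO).increment(1L);  // same effect as the string form
        }
        for (String word : line.split(" ")) {
            context.write(new Text(word), new LongWritable(1L));
        }
    }
}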

2.3 Sensitive Word Recording-Results

Looking at the console log after the job finishes, we can see a new counter group roughly like this:

Sensitive Words:
  Hello=2

We can clearly see that the number of counters has grown from the original 19 to 20; the extra one is our custom sensitive-word counter. Because "Hello" appears in only two lines of the file, the counter shows Hello=2.
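The value of the custom counter can also be retrieved in the driver after the job completes, for example to raise an alert when too many sensitive words are found. Below is a minimal sketch, assuming the same group and counter names used in MyMapper; the method name getSensitiveHits is only illustrative (it would live in the driver class, with org.apache.hadoop.mapreduce.Counters and Job imported):

// Returns how many input lines contained the sensitive word, read from the finished job
public static long getSensitiveHits(Job job) throws java.io.IOException {
    Counters counters = job.getCounters();
    // findCounter(group, name) returns 0 if no map task ever incremented the counter
    return counters.findCounter("Sensitive Words:", "Hello").getValue();  // 2 for the sample file
}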

Resources

(1) sunddenly, "Hadoop Diary Day 17 - Counters, Map Combiners and Partitions": http://www.cnblogs.com/sunddenly/p/4009568.html

(2) Chao Wu, "Counters in Hadoop": http://www.superwu.cn/2013/08/14/460

(3) dajuezhao, "Custom Counters in Hadoop": http://blog.csdn.net/dajuezhao/article/details/5788705

(4) Wan Chunmei and Xie Zhenglan, "Hadoop Application Development Practical Explanation (Revised Edition)": http://item.jd.com/11508248.html

Zhou Xurong

Source: http://edisonchou.cnblogs.com/

The copyright of this article belongs to the author and the blog park. Reprints are welcome, but this notice must be retained unless the author consents otherwise, and a clearly visible link to the original article must be provided on the page.
