Preface
The project needs statistics on the compute resources used by each business group, such as CPU, memory, I/O read/write, and network traffic. To that end, I read the Hadoop source code to examine its default counters.
MapReduce counters expose detailed data about a MapReduce job at runtime. Counters are organized into groups; a group collects all counters that belong to the same logical scope.
CPU
How should we measure the computing workload of MapReduce tasks? Wall-clock running time is misleading: a job may spend most of its time stuck in the last reduce, or be slowed by resource contention, which inflates its running time. Counting map and reduce tasks is also inaccurate, because some tasks process very little data and finish quickly.
CPU time is a better measure of a task's computing workload. Hadoop provides the counter "Map-Reduce Framework: CPU time spent (ms)", the CPU time consumed while the task runs. How is it collected? While each task runs, Hadoop reads the user CPU time and kernel CPU time of the corresponding process from /proc/<pid>/stat; their sum is the CPU time.
Appendix: the call path for collecting CPU time: org.apache.hadoop.mapred.Task.updateResourceCounters --> org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues (obtains CPU and memory resources) --> org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree.
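The /proc/<pid>/stat parsing described above can be sketched in plain Java. This is a simplified illustration, not Hadoop's ProcfsBasedProcessTree code; the class name and sample line are mine. The result is in clock ticks, so Hadoop-style milliseconds require multiplying by 1000 and dividing by the system's ticks-per-second (from `getconf CLK_TCK`, typically 100).

```java
// A simplified sketch of deriving CPU time from /proc/<pid>/stat.
// NOT Hadoop's actual ProcfsBasedProcessTree code; class name is made up.
public class CpuTimeFromProcStat {

    /** Returns utime + stime, in clock ticks, parsed from one /proc/<pid>/stat line. */
    public static long cpuTicks(String statLine) {
        // Field 2 (comm) is wrapped in parentheses and may itself contain
        // spaces, so parse from the last ')' onward.
        int close = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(close + 2).split("\\s+");
        // rest[0] is field 3 (state); utime is field 14 -> rest[11],
        // stime is field 15 -> rest[12].
        return Long.parseLong(rest[11]) + Long.parseLong(rest[12]);
    }

    public static void main(String[] args) throws Exception {
        // On Linux, /proc/self/stat describes the current JVM process.
        String line = new String(java.nio.file.Files.readAllBytes(
                java.nio.file.Paths.get("/proc/self/stat")));
        System.out.println("CPU ticks so far: " + cpuTicks(line));
    }
}
```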
Memory
Hadoop's default counters report memory usage through the following parameters:
"Map-Reduce Framework: Physical memory (bytes) snapshot": each task reads a memory snapshot of its process from /proc/<pid>/stat; this is the process's current physical memory usage.
"Map-Reduce Framework: Virtual memory (bytes) snapshot": each task reads a virtual memory snapshot of its process from /proc/<pid>/stat; this is the process's current virtual memory usage.
"Map-Reduce Framework: Total committed heap usage (bytes)": each task's JVM calls Runtime.getRuntime().totalMemory() to get the current size of its heap.
Attachment: the source for memory collection is org.apache.hadoop.mapred.Task.updateResourceCounters.
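The heap counter boils down to a single standard JVM call. A minimal standalone illustration (the class name is mine, not Hadoop's):

```java
// Illustrates the JVM call behind "Map-Reduce Framework: Total committed heap
// usage (bytes)". Hadoop simply records the value of totalMemory().
public class HeapSnapshot {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();        // heap currently committed to the JVM
        long used = committed - rt.freeMemory();  // portion of that heap in use
        long max = rt.maxMemory();                // ceiling set by -Xmx
        System.out.println("committed=" + committed + " used=" + used + " max=" + max);
    }
}
```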
I/O Read/Write
Hadoop reads and writes files through org.apache.hadoop.fs.FileSystem.open. An HDFS file has a URL starting with hdfs://, while a local file has a URL starting with file://. Each task's file read/write activity can therefore be obtained from FileSystem.getAllStatistics(), and Hadoop records all FileSystem I/O in the FileSystemCounters group. Its counters break down as follows:
"FileSystemCounters: HDFS_BYTES_READ": during job execution, only the map side reads data from HDFS. This data is not limited to the source file contents; it also includes each map's split metadata, so this value should be slightly larger than FileInputFormatCounters.BYTES_READ.
"FileSystemCounters: HDFS_BYTES_WRITTEN": the total data written to HDFS during job execution. Results are written to HDFS after reduce finishes (in a map-only job with no reduce, results are written to HDFS after map finishes).
"FileSystemCounters: FILE_BYTES_READ": the total file data read from the local disk. Both the map and reduce sides sort, and sorting reads and writes local files.
"FileSystemCounters: FILE_BYTES_WRITTEN": the total file data written to the local disk. Both the map and reduce sides sort, and sorting reads and writes local files; in addition, during the reduce-side shuffle, the data pulled from the map side is also written to local disk files.
Attachment: FileSystemCounters-related code: org.apache.hadoop.mapred.Task.updateResourceCounters --> org.apache.hadoop.mapred.Task.FileSystemStatisticUpdater.updateCounters.
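The hdfs:// versus file:// distinction above is what routes bytes into the HDFS_* versus FILE_* counters. A toy model of that routing by URI scheme (the class and method are hypothetical, not Hadoop API):

```java
import java.net.URI;

// A toy model of how a path's scheme decides which counter family its bytes
// land in. Hypothetical helper; Hadoop does this inside FileSystem statistics.
public class SchemeCheck {
    public static String counterGroupFor(String path) {
        String scheme = URI.create(path).getScheme();
        if ("hdfs".equals(scheme)) return "HDFS_BYTES_*";
        // A missing scheme is treated as a local path here.
        if (scheme == null || "file".equals(scheme)) return "FILE_BYTES_*";
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(counterGroupFor("hdfs://namenode:8020/data/in")); // HDFS_BYTES_*
        System.out.println(counterGroupFor("file:///tmp/part-00000"));       // FILE_BYTES_*
    }
}
```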
FileSystemCounters gives a complete account of I/O reads and writes, but Hadoop also has some finer-grained I/O counters:
"File Input Format Counters: Bytes Read": the size of the input split source files that the map side reads from HDFS during job execution, excluding the maps' split metadata. This value is therefore slightly smaller than, but very close to, "FileSystemCounters: HDFS_BYTES_READ". If a map's input is a compressed file, this value is only the compressed size, before decompression. (Attachment: the code is at org.apache.hadoop.mapred.MapTask.TrackedRecordReader.fileInputByteCounter.)
"Map-Reduce Framework: Map input bytes": the size of the input split source files that the map side reads from HDFS during job execution. If the source file is compressed, this value is the decompressed size. (Attachment: the code is at org.apache.hadoop.mapred.MapTask.TrackedRecordReader.inputByteCounter.)
"File Output Format Counters: Bytes Written": a job may have both map and reduce phases, or only map; either way, its results are generally written to HDFS when it finishes, and this value is the size of the result files. If the output is compressed, this value is only the compressed size, before decompression. (Attachment: the code is at org.apache.hadoop.mapred.MapTask.DirectMapOutputCollector.fileOutputByteCounter and org.apache.hadoop.mapred.ReduceTask.NewTrackingRecordWriter.fileOutputByteCounter.)
However, these finer-grained counters do not count the file reads and writes performed during map- and reduce-side sorting. To measure the I/O of a job's tasks, I therefore think FileSystemCounters is the best choice.
Total I/O read/write can be obtained by summing the four FileSystemCounters above, with the following caveats:
"FileSystemCounters: HDFS_BYTES_WRITTEN" counts only a single replica's worth of HDFS writes. Because the HDFS block replication factor is configurable, the physical I/O is really "FileSystemCounters: HDFS_BYTES_WRITTEN" * the replication factor.
Map and reduce code is user-defined. If that code bypasses the Hadoop framework and does not open files through org.apache.hadoop.fs.FileSystem, its I/O cannot be counted.
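Given those caveats, the summation can be written out as a small helper. This is a hypothetical sketch, not part of Hadoop; the counter values are supplied by the caller (for example, read off the job's counter page):

```java
// A rough physical-I/O estimate from the four FileSystemCounters values.
// HDFS writes are multiplied by the replication factor, since each block is
// persisted once per replica. Hypothetical helper, not part of Hadoop.
public class IoEstimate {
    public static long totalIoBytes(long hdfsRead, long hdfsWritten,
                                    long fileRead, long fileWritten,
                                    int replication) {
        return hdfsRead + hdfsWritten * (long) replication + fileRead + fileWritten;
    }

    public static void main(String[] args) {
        // e.g. 10 GiB HDFS read, 2 GiB HDFS written at replication 3,
        // plus 2 GiB read and 2 GiB written locally during sorting.
        System.out.println(totalIoBytes(10L << 30, 2L << 30, 2L << 30, 2L << 30, 3));
    }
}
```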
Network Traffic
Hadoop tasks generate network traffic in three phases: map input pulls data from HDFS, the reduce-side shuffle pulls data from the map side, and reduce writes its results to HDFS when it finishes (in a map-only job, map writes the results to HDFS).
The traffic exchanged between the job and HDFS can be obtained from the two counters analyzed under I/O read/write: "FileSystemCounters: HDFS_BYTES_READ" and "FileSystemCounters: HDFS_BYTES_WRITTEN".
The counter for the traffic generated when reduce pulls data from the map side during the shuffle is:
"Map-Reduce Framework: Reduce shuffle bytes": the cumulative size of the intermediate map output pulled by the reduce side. If the intermediate map output is compressed, this value is only the compressed size, before decompression. (Attachment: the code is at org.apache.hadoop.mapred.ReduceTask.reduceShuffleBytes.)
Network traffic can be estimated by summing these three counters, with the following caveats:
"FileSystemCounters: HDFS_BYTES_READ" and "FileSystemCounters: HDFS_BYTES_WRITTEN" do not account for HDFS's locality optimization: when a client reads or writes a block stored on its own node, Hadoop goes through the local file system directly rather than over the network.
"FileSystemCounters: HDFS_BYTES_WRITTEN" counts only a single replica's worth of HDFS writes. Because the HDFS block replication factor is configurable, the network traffic also needs to be "FileSystemCounters: HDFS_BYTES_WRITTEN" * the replication factor.
Map and reduce code is user-defined, and user code may bypass the Hadoop framework and perform its own network communication; that traffic cannot be counted.
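Putting the three counters and the replication caveat together, the estimate can be sketched like the I/O one (again a hypothetical helper; it cannot subtract local, non-network HDFS reads or add traffic from code that bypasses the framework):

```java
// A rough network-traffic estimate from the three counters discussed above.
// Hypothetical helper: local HDFS reads/writes are still counted as if they
// crossed the network, and framework-bypassing traffic is invisible to it.
public class NetworkEstimate {
    public static long networkBytes(long hdfsRead, long hdfsWritten,
                                    long shuffleBytes, int replication) {
        return hdfsRead + hdfsWritten * (long) replication + shuffleBytes;
    }

    public static void main(String[] args) {
        System.out.println(networkBytes(100, 10, 50, 3)); // prints 180
    }
}
```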
Reference: http://www.cnblogs.com/xuxm2007/archive/2012/06/15/2551030.html