Document directory
- 1. System.out and System.err
- 2. Counters
- 3. Set the status
- 4. Use the output file to output debugging information
- Reference
Reprinted: Hadoop program printing and debugging

1. System.out and System.err

main

When System.out (standard output) and System.err (standard error) are used in the main function, the output is directed to the terminal of the node that runs the program. That is, in a fully distributed Hadoop deployment, the output goes to the terminal of the node where the job was launched.
Mapper

For each input split, a Mapper object is run as a task in a map slot on some machine in the cluster. The Mapper object's map and configure functions are invoked by the framework via RPC, so System.out and System.err output from a Mapper is not directed to the terminal of the node that launched the program. In this case, there are only two ways to view mapper output:

On the jobtracker web page (bound to port 50030 by default), enter the job details page by jobid, then the map details page, where you can see the list of map tasks executed by this job. Open any one of them and view the task logs: the stdout and stderr logs of that map task contain the System.out and System.err output.

Log in to the node that executed a map task and look in the <mapred.local.dir>/userlogs/job_<jobid>/attempt_<taskid>/ directory: the stdout and stderr files there contain the System.out and System.err output. This method requires login access to the node host, which is not very convenient.
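As a sketch of what this looks like in code (not from the source; it assumes the old org.apache.hadoop.mapred API, which is consistent with the Reporter calls used later in this article, and DebugMapper with its record logic is purely illustrative):

```java
// Illustrative sketch, old mapred API. The printed lines appear in the
// task attempt's stderr log (viewable via the jobtracker web UI or under
// <mapred.local.dir>/userlogs/...), NOT on the submitting node's terminal.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DebugMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Goes to the task's stderr log, not the launcher's terminal.
        System.err.println("processing record at offset " + key.get());
        output.collect(value, new IntWritable(1));
    }
}
```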
Reducer

Reducer behaves the same as Mapper in this respect.
Eclipse

When debugging in a local pseudo-distributed environment built with eclipse, the System.out and System.err output is directed to the eclipse console, and not only the output from main. The eclipse Hadoop plug-in presumably uses some mechanism for this that has not been analyzed yet.
2. Counters

For large distributed jobs, counters are often easier to use than logs because:
- Obtaining a counter value is more convenient than collecting log output.
- Counting the occurrences of a specific event from a counter value is much easier than analyzing a pile of log files.

Counters are maintained by their associated tasks and periodically sent to the tasktracker, and from there to the jobtracker, so they can be aggregated globally. Each report of a task's partial counter value is a complete value, not an increment over the previous report, which avoids errors caused by lost messages. In addition, when a task fails, the jobtracker subtracts that task's local counter values from the global counters; only when a task finishes successfully are its local counter values considered complete and reliable.
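The reasoning above can be simulated in a few lines of plain Java (a minimal sketch, not Hadoop code; the class and task ids are invented for illustration). Because each report carries the full value, a lost report is healed by the next one, and a failed task's contribution can simply be dropped:

```java
// Sketch: why tasks report the FULL counter value each time instead of
// an increment. A lost report is self-healing; a failed task's partial
// count can be removed cleanly.
import java.util.HashMap;
import java.util.Map;

public class CounterAggregation {
    // Latest full counter value reported by each task attempt.
    private final Map<String, Long> latestReport = new HashMap<>();

    // A task reports its complete local counter value (not a delta).
    public void report(String taskId, long fullValue) {
        latestReport.put(taskId, fullValue);
    }

    // A failed task's contribution is dropped from the global sum.
    public void taskFailed(String taskId) {
        latestReport.remove(taskId);
    }

    // Global value = sum of the latest full report from each live task.
    public long globalValue() {
        return latestReport.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        CounterAggregation global = new CounterAggregation();
        global.report("task1", 3);  // task1 has counted 3 so far
        // Suppose task1's next report (value 5) is lost in transit...
        global.report("task1", 7);  // ...the following full report self-heals
        global.report("task2", 4);
        System.out.println(global.globalValue()); // prints 11
        global.taskFailed("task2"); // failed task's partial count is removed
        System.out.println(global.globalValue()); // prints 7
    }
}
```

Had the tasks reported increments instead, the lost report of 5 would have permanently corrupted the global total.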
Hadoop maintains several built-in counters that describe job metrics.
User-defined counters

You can also define your own counters in a program and increment them in the mapper and reducer. A group of counters can be defined with an enum type: the enum type name is the group name, and the enum fields are the counters in the group, each field name being a counter name.
For example, the following globally defined enum declares a Temperature group with two counters, MISSING and MALFORMED:

enum Temperature { MISSING, MALFORMED }
Then, in the mapper or reducer, the counter can be incremented through Reporter's

public void incrCounter(Enum<?> key, long amount)

method:

reporter.incrCounter(Temperature.MISSING, 1);
reporter.incrCounter(Temperature.MALFORMED, 1);
Dynamic counters

Dynamic counters do not need to be defined in advance with an enum type; they are created on the fly during execution. Simply use Reporter's

public void incrCounter(String group, String counter, long amount)

method.
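For instance (the group and counter names here are arbitrary, not part of Hadoop), a dynamic counter call inside a map or reduce function would look like:

```java
// The "TemperatureQuality" group and "UNKNOWN" counter are created on
// first use; no enum declaration is required anywhere in the program.
reporter.incrCounter("TemperatureQuality", "UNKNOWN", 1);
```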
Counter value acquisition

While a Hadoop job is running, the mapper and reducer can obtain the current counters through Reporter:

public Counter getCounter(Enum<?> name);
public Counter getCounter(String group, String name);
After a Hadoop job finishes, the terminal prints the value of each of the program's counters. In addition, there are two other ways to get counter values:

View the counter values of each job and each task on the jobtracker web page.

Run the following command:

hadoop job -counter <job-id> <group-name> <counter-name>
3. Set the status

In the map and reduce functions, reporter.setStatus("Error") can be used to set the status of each task, which can then be viewed on the jobtracker web page.
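A minimal sketch of this inside a map function (the condition, message, and isMalformed helper are hypothetical, not from the source):

```java
// Set the task's status string when a bad record is seen; the string
// shows up in this task's row on the jobtracker web page.
if (isMalformed(value)) {  // isMalformed is a hypothetical helper
    reporter.setStatus("Detected malformed record at offset " + key.get());
}
```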
4. Use the output file to output debugging information

Another method is to have the Reducer use the key to distinguish normal output from debugging output, writing the debugging information into the job's output files. For details, refer to reference 3.
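A sketch of that idea in an old-API reduce function (the "DEBUG~" key prefix and the zero-sum condition are an arbitrary convention for illustration, not part of Hadoop):

```java
// Emit debugging records alongside normal output, distinguished by a
// key prefix so they can be grepped out of the job's output files.
public void reduce(Text key, Iterator<IntWritable> values,
                   OutputCollector<Text, IntWritable> output,
                   Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
        sum += values.next().get();
    }
    if (sum == 0) {
        // Suspicious case: tag the record as diagnostic output.
        output.collect(new Text("DEBUG~" + key.toString()),
                       new IntWritable(sum));
    } else {
        output.collect(key, new IntWritable(sum));  // normal output
    }
}
```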
Reference

- Hadoop: The Definitive Guide
- Hadoop MapReduce process
- Hadoop debugging information output