Hadoop offers several ways to debug Hadoop Streaming programs so you can quickly locate problems. The first is to run the Hadoop Streaming program on the development machine (recommended during development).
Add mapred.job.tracker=local to the jobconf; the input and output data still come from HDFS.
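As an illustration, a streaming job could be pointed at the local job runner like this (the jar path, input/output paths, and mapper/reducer names are placeholders, not from the original text):

```shell
# Run a streaming job with the local job runner; input/output still live on HDFS.
# The streaming jar location varies by Hadoop version -- adjust the path as needed.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -jobconf mapred.job.tracker=local \
    -input   /user/test/input \
    -output  /user/test/output \
    -mapper  ./mapper.py \
    -reducer ./reducer.py
```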
With this setting, Hadoop Streaming runs the program locally. The second method is to preserve the error scene on the node (recommended when running large amounts of data).
By setting the jobconf parameter keep.failed.task.files=true, the failed task's working files are kept on the node when the program errors out. Through the web GUI you can trace which node the failed task ran on, then log into that node and examine the core file under <local>/tasktracker/<taskid>/work/. The third method is to debug the program by collecting information with a debug script (recommended during development).
Write a debug script; through it you can save any artifacts of the failed run, such as the stack information from the core file, so that you can determine where the program went wrong.
The script is called in the following way:
$script $stdout $stderr $syslog $jobconf <program name>
(Note: the official documentation at http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms says the program name is passed as the 5th parameter, but in my Hadoop 0.19 test environment this parameter came through empty.)
Script Example:
core=`find . -name 'core*'`
cp $core /home/admin/
gdb -quiet ./a.out -c $core -x ./pipes-default-gdb-commands.txt
pipes-default-gdb-commands.txt contains the gdb commands to execute:
info threads
backtrace
quit
(Note: for the above script to work, the program must be allowed to produce a core file; you can add the following code snippet to the program.)
struct rlimit limit;
limit.rlim_cur = 65535;
limit.rlim_max = 65535;
if (setrlimit(RLIMIT_CORE, &limit) != 0) {
    printf("setrlimit() failed with errno=%s\n", strerror(errno));
    exit(1);
}
Then, in the jobconf, assign the debug script to the variable "mapred.map.task.debug.script" or "mapred.reduce.task.debug.script". This way, when a core dump occurs during Hadoop Streaming execution, you can see gdb's output through the JobTracker GUI.
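Putting the pieces together, a job submission might look like the sketch below (the jar path, input/output paths, and the debug_script.sh file name are placeholders; the script and binary are shipped to the nodes with -file):

```shell
# Hypothetical streaming job that keeps failed-task files and attaches
# a map-side debug script; gdb output appears in the JobTracker GUI
# after a core dump.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -jobconf keep.failed.task.files=true \
    -jobconf mapred.map.task.debug.script=./debug_script.sh \
    -input   /user/test/input \
    -output  /user/test/output \
    -mapper  ./a.out \
    -file    ./debug_script.sh \
    -file    ./a.out
```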