When running MapReduce jobs, beginners often encounter errors, and they usually paste the messages printed on the terminal directly into a search engine for help.
With Hadoop, when an error occurs you should first check the logs; they generally contain a detailed indication of the cause. Hadoop MapReduce logs fall into two categories: service logs and job logs. The details are as follows:
1. Hadoop 1.x
In Hadoop 1.x, the MapReduce service logs include the JobTracker log and the TaskTracker logs. Their locations are as follows:
JobTracker: on the node where the JobTracker is installed, the default location is
${HADOOP_HOME}/logs/*-jobtracker-*.log. A new log file is generated each day; older logs carry a date suffix, while the current day's file ends in ".log".
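The daily-rotation naming scheme above can be sketched as follows. This is a minimal simulation in a temporary directory; the user and hostname embedded in the file names (here "hadoop" and "master") and the date are illustrative assumptions, not values from the article.

```shell
# Simulate the JobTracker log layout in ${HADOOP_HOME}/logs (Hadoop 1.x).
# The names "hadoop", "master", and the date suffix are made-up examples.
HADOOP_HOME=$(mktemp -d)
mkdir -p "$HADOOP_HOME/logs"
# Older, rotated logs carry a date suffix after ".log":
touch "$HADOOP_HOME/logs/hadoop-hadoop-jobtracker-master.log.2013-05-20"
# The current day's log ends in ".log" with no suffix:
touch "$HADOOP_HOME/logs/hadoop-hadoop-jobtracker-master.log"
# Globbing for *-jobtracker-*.log picks out only the current day's file:
current=$(ls "$HADOOP_HOME"/logs/*-jobtracker-*.log)
echo "current log: $current"
```

The same pattern applies to the TaskTracker logs, with "tasktracker" in place of "jobtracker".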
TaskTracker: on each node where a TaskTracker is installed, the default location is
${HADOOP_HOME}/logs/*-tasktracker-*.log. A new log file is generated each day; older logs carry a date suffix, while the current day's file ends in ".log".
Job logs include the JobHistory log and the task logs.
The JobHistory log records a job's execution: the job's start and end times, each task's start and end times, and various counters. Users can parse a great deal of information about a job's run from this log, which makes it very valuable. By default it is stored in the ${HADOOP_HOME}/logs/history directory on the JobTracker node; the location can be changed with the hadoop.job.history.location parameter.
Each task's logs are stored on the node where the task ran, under ${HADOOP_HOME}/userlogs/<job-id>/<attempt-id>. Each task produces three log files: stdout, stderr, and syslog. stdout holds output printed to standard output, for example by System.out.println; note that such output is not displayed on the terminal but saved in this file. syslog holds output printed through log4j; it usually contains the most useful information and is the most important reference log when debugging errors.
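The per-task layout above can be sketched as follows. The job and attempt IDs are made-up examples for illustration; real IDs are assigned by the JobTracker.

```shell
# Sketch of the per-task log layout in Hadoop 1.x (illustrative IDs).
HADOOP_HOME=$(mktemp -d)
attempt_dir="$HADOOP_HOME/userlogs/job_201305200001_0001/attempt_201305200001_0001_m_000000_0"
mkdir -p "$attempt_dir"
# Each attempt directory holds three files:
#   stdout - standard output (e.g. System.out.println)
#   stderr - standard error
#   syslog - log4j output, usually the most useful for debugging
printf 'record emitted by System.out.println\n' > "$attempt_dir/stdout"
: > "$attempt_dir/stderr"
printf 'INFO mapred.MapTask: starting map task\n' > "$attempt_dir/syslog"
# A quick error scan across the syslog (prints the match count):
grep -ciE 'error|exception' "$attempt_dir/syslog" || true
```

Scanning syslog with grep as in the last line is often the fastest way to find the actual cause of a failed task, rather than searching the terminal message.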
2. Hadoop 2.x
In Hadoop 2.x, the YARN service logs include the ResourceManager log and the NodeManager logs. Their locations are as follows:
ResourceManager: the log is stored as yarn-*-resourcemanager-*.log in the logs directory under the Hadoop installation directory.
NodeManager: on each NodeManager node, the log is stored as yarn-*-nodemanager-*.log in the logs directory under the Hadoop installation directory.
Application logs include the JobHistory log and the container logs. The JobHistory log records an application's run: the application's start and end times, each task's start and end times, and various counters.
Container logs include the ApplicationMaster log and ordinary task logs, both stored under an application_xxx directory inside the userlogs directory under the Hadoop installation directory. The ApplicationMaster log directory is named container_xxx_000001; ordinary task log directories are named container_xxx_000002, container_xxx_000003, and so on. As in Hadoop 1.x, each directory contains three log files: stdout, stderr, and syslog, with the same meanings as before.
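The container layout above can be sketched as follows. The application and container IDs are illustrative assumptions, and the sketch places userlogs under ${HADOOP_HOME}/logs, which matches the common default but may differ depending on the yarn.nodemanager.log-dirs setting.

```shell
# Sketch of the container log layout in Hadoop 2.x (illustrative IDs).
# container_..._000001 belongs to the ApplicationMaster; higher-numbered
# containers belong to ordinary tasks.
HADOOP_HOME=$(mktemp -d)
app_dir="$HADOOP_HOME/logs/userlogs/application_1398888888888_0001"
for c in 000001 000002 000003; do
  d="$app_dir/container_1398888888888_0001_01_$c"
  mkdir -p "$d"
  # Each container directory holds the same three files as in 1.x:
  touch "$d/stdout" "$d/stderr" "$d/syslog"
done
# The ApplicationMaster's syslog, usually the first place to look:
am_syslog="$app_dir/container_1398888888888_0001_01_000001/syslog"
ls "$app_dir"
```

When YARN log aggregation is enabled, the command `yarn logs -applicationId <app-id>` can also retrieve all container logs of a finished application in one place, instead of visiting each NodeManager node.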
3. Summary
Hadoop logs are the most important channel for locating problems. Beginners often do not realize this, or, even when they do, cannot find where the logs are stored. I hope this article helps them.