Log Files in the Hadoop Cluster

Hadoop produces a variety of log files. The logs on the master record the most comprehensive information, and the tasktracker and datanode daemons on the slaves also write their error information back to the master. The logs on the slaves mainly record information about the tasks that ran there.

By default, Hadoop logs are stored in the HADOOP_INSTALL/logs directory. It is recommended to point them at a dedicated path instead, commonly /var/log/hadoop, by adding the following line to hadoop-env.sh:
export HADOOP_LOG_DIR=/var/log/hadoop
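As a minimal sketch (assuming the daemons run as a user named hadoop and that HADOOP_INSTALL points at the installation directory; both names are placeholders), the new log directory can be created, handed to that user, and the setting appended to hadoop-env.sh before the daemons are restarted:
sudo mkdir -p /var/log/hadoop
sudo chown hadoop:hadoop /var/log/hadoop
echo 'export HADOOP_LOG_DIR=/var/log/hadoop' >> $HADOOP_INSTALL/conf/hadoop-env.sh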

I. Logs on the master server

1. Four kinds of daemon logs are stored on the master server. Note that some of the log information from the tasktrackers and datanodes is also stored on the master, so that the specific server can be located when a problem occurs.

2. There are two kinds of log files on the master, suffixed with .log and .out respectively. Each daemon (jobtracker, namenode, tasktracker, datanode) generates both files, and new files are generated every day.

3. The .log files are written through log4j. Most application log messages go to this file, so checking it is the first step of fault diagnosis. [This is the most important log file.]
The .out files record standard output and standard error. Because most messages are written to the .log file via log4j, the .out file is usually small or empty. Only the latest five .out files are kept.

4. The names of both kinds of log files contain the user name, the daemon name, and the local host name, as in the example below.
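For illustration (the user name hadoop and host name master1 are hypothetical), the namenode on the master would produce a pair of files named along the lines of:
hadoop-hadoop-namenode-master1.log
hadoop-hadoop-namenode-master1.out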


II. Logs on the slave server

(1) Tasktracker logs
Each tasktracker child process generates the following four log files, which record the log output of each task:
1. syslog: the log recorded through log4j

2. stdout: the file that captures data the task sends to standard output

3. stderr: the file that captures data the task sends to standard error

4. log.index

(1) Each tasktracker records the logs of all tasks it runs. The default directory is $HADOOP_LOG_DIR/userlogs, and each job gets a separate subdirectory, as shown below:
[jediael@slave1 userlogs]$ pwd
/mnt/jediael/hadoop-1.2.1/logs/userlogs
[jediael@slave1 userlogs]$ ls
job_201502271057_0243 job_201502271057_0245 job_201502271057_0247 job_201502271057_0250 job_201502271057_0253
job_201502271057_0244 job_201502271057_0246 job_201502271057_0249 job_201502271057_01_job_201502271057_0255

(2) Entering one of these job directories shows the following content:
[jediael@slave1 job_201502271057_0243]$ ll
total 16
lrwxrwxrwx 1 jediael 95 Feb 28 -> /mnt/tmphadoop/mapred/local/userlogs/job_201502271057_0243/large
lrwxrwxrwx 1 jediael 95 Feb 28 -> /mnt/tmphadoop/mapred/local/userlogs/job_201502271057_0243/large
lrwxrwxrwx 1 jediael 95 Feb 28 -> /mnt/tmphadoop/mapred/local/userlogs/job_201502271057_0243/large
-rw-r----- 1 jediael 502 Feb 28 job-acls.xml
It can be seen that this tasktracker ran three tasks of job_201502271057_0243. Each task's log directory is just a symbolic link pointing into the tmphadoop directory.

(3) Go to the actual directory and find the following four log files:
[jediael@slave1 userlogs]$ cd /mnt/tmphadoop/mapred/local/userlogs/job_201502271057_0243/attempt_201502271057_0243_m_000000_0
[jediael@slave1 attempt_201502271057_0243_m_000000_0]$ ll
total 36
-rw-r----- 1 jediael 154 Feb 28 log.index
-rw-r----- 1 jediael 0 Feb 28 stderr
-rw-r----- 1 jediael 0 Feb 28 stdout
-rw-r----- 1 jediael 30248 Feb 28 syslog
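When a task misbehaves, the syslog file in its attempt directory is usually where the stack trace ends up. One illustrative way to pull out the relevant lines (not part of the original article) is:
grep -i -A 5 'exception' syslog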

(2) Datanode logs

III. Audit logs
The audit log records every HDFS request; it is disabled by default. When enabled, it is generally written into the namenode's log.
It is controlled by the following option in the log4j.properties file:
# All audit events are logged at INFO level
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=WARN
Because audit events are emitted at the INFO level, you can enable auditing by changing WARN to INFO, as shown below.
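For reference, the same property with auditing enabled would read:
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO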

IV. MapReduce job history logs
Logs of completed jobs are recorded under HADOOP_LOG_DIR/history; see the sketch below for one way to inspect them.
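As an illustrative sketch (the output directory /user/jediael/wcoutput is hypothetical), Hadoop 1.x can also print the history of a completed job from the history data kept alongside its output directory:
hadoop job -history /user/jediael/wcoutput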
