A simple analysis of Hadoop logs


I. Overview
This article is based on an analysis of Hadoop 0.19.1 with some Alibaba optimizations. It does not cover JobTracker or NameNode metadata; it mainly describes the logs a task generates during the compute phase, and some problems with those logs.
II. Brief Introduction to Logs
Once all the daemon processes are up (for simplicity we use pseudo-distributed mode, with everything on one machine), the logs directory looks roughly like this:
[Shell]
[dragon.caol@hd19-vm1 logs]$ tree
.
|-- hadoop-dragon.caol-datanode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-datanode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-jobtracker-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-jobtracker-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-namenode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-namenode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-secondarynamenode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-secondarynamenode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-tasktracker-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-tasktracker-hd19-vm1.yunti.yh.aliyun.com.out
|-- history
`-- userlogs
    `-- toBeDeleted

3 directories, 10 files
Note that logging is configured by the log4j.properties file, but some of its configuration items are filled in by system properties that are set when the daemon is launched from the shell scripts.
For example:
[Shell]
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_HOME"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger.appender=${HADOOP_ROOT_LOGGER_APPENDER:-console}"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger.level=${HADOOP_ROOT_LOGGER_LEVEL:-info}"
After the JobTracker is started, its JVM parameters look like this:
[Shell]
5537 JobTracker -Xmx1000m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Xdebug -Xrunjdwp:transport=dt_socket,address=1314,server=y,suspend=n -Dhadoop.log.dir=/home/dragon.caol/hadoop-0.19.1-dc/bin/../logs -Dhadoop.log.file=hadoop-dragon.caol-jobtracker-hd19-vm1.yunti.yh.aliyun.com.log -Dhadoop.home.dir=/home/dragon.caol/hadoop-0.19.1-dc/bin/.. -Dhadoop.id.str=dragon.caol -Dhadoop.root.logger=INFO,RFA -Dhadoop.root.logger.appender=RFA -Dhadoop.root.logger.level=info -Djava.library.path=/home/dragon.caol/hadoop-0.19.1-dc/bin/../lib/native/Linux-amd64-64
The log4j configuration is then assembled from these system properties.
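For reference, here is an abridged fragment in the spirit of the stock conf/log4j.properties shipped with Hadoop of this era, showing how those -D properties are consumed (the exact keys in a particular build may differ, so treat your own file as authoritative):
[Properties]
# Defaults, overridden by -Dhadoop.root.logger, -Dhadoop.log.dir, -Dhadoop.log.file
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log

log4j.rootLogger=${hadoop.root.logger}, EventCounter

# Daily rolling file appender writing to ${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

# Task-side appender discussed in section III
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}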
After a simple job has run, the logs directory looks roughly like this:
[Shell]
[dragon.caol@hd19-vm1 logs]$ tree
.
|-- hadoop-dragon.caol-datanode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-datanode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-jobtracker-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-jobtracker-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-namenode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-namenode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-secondarynamenode-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-secondarynamenode-hd19-vm1.yunti.yh.aliyun.com.out
|-- hadoop-dragon.caol-tasktracker-hd19-vm1.yunti.yh.aliyun.com.log
|-- hadoop-dragon.caol-tasktracker-hd19-vm1.yunti.yh.aliyun.com.out
|-- history
|   |-- h1_1348474254849_job_201209241610_0001_conf.xml
|   `-- h1_1348474254849_job_201209241610_0001_dragon.caol_word+count
|-- history.idx
|-- job_201209241610_0001_conf.xml
`-- userlogs
    |-- job_201209241610_0001
    |   |-- attempt_201209241610_0001_m_000000_0
    |   |   |-- log.index
    |   |   |-- stderr
    |   |   |-- stdout
    |   |   `-- syslog
    |   |-- attempt_201209241610_0001_m_000001_0
    |   |   |-- log.index
    |   |   |-- stderr
    |   |   |-- stdout
    |   |   `-- syslog
    |   |-- attempt_201209241610_0001_m_000002_0
    |   |   |-- log.index
    |   |   |-- stderr
    |   |   |-- stdout
    |   |   `-- syslog
    |   `-- attempt_201209241610_0001_r_000000_0
    |       |-- log.index
    |       |-- stderr
    |       |-- stdout
    |       `-- syslog
    `-- toBeDeleted

8 directories, 30 files
Compared with the previous listing, history and userlogs now contain sub-entries, and history.idx and job_201209241610_0001_conf.xml have appeared. These files are generated by the framework at different stages; some are index files used mainly for web-UI queries, and they are deleted after a retention period.
III. User Logs (userlogs)
We focus mainly on the userlogs directory, because most of its content is generated by user code. If the user's code writes to System.out, the output appears in the corresponding attempt_YYYYMMDDHHMM_000x_(m|r)_000000_x/stdout file, System.err output appears in .../stderr, and log4j output goes to syslog.
Why do the logs end up in these files? For System.out/err, the shell script simply redirects the standard output and standard error streams to those files when it invokes java. The log4j side is determined by the aforementioned log4j.properties together with TaskRunner: log4j uses org.apache.hadoop.mapred.TaskLogAppender, implemented by Hadoop. TaskLogAppender imposes a restriction here: if mapred.userlog.limit.kb is greater than 0, it adopts a tail -c strategy, i.e. a FIFO queue of bounded size; when the task ends, whatever logs remain in the queue are flushed to disk.
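To make the tail -c behavior concrete, here is a minimal, self-contained sketch of the strategy. This is only an illustration, not the actual TaskLogAppender source: the class and method names are invented, bytes are approximated by character counts, and System.out stands in for the syslog file.
[Java]
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a bounded "tail" log buffer: when a size limit is set,
 * lines are held in a FIFO queue, the oldest lines are evicted as new
 * ones arrive, and whatever survives is flushed when the task ends.
 */
public class TailBuffer {
    private final long maxBytes;                      // <= 0 means "no limit"
    private final Deque<String> queue = new ArrayDeque<>();
    private long queuedBytes = 0;

    public TailBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Append one log line, evicting the oldest lines beyond the limit. */
    public void append(String line) {
        if (maxBytes <= 0) {
            System.out.println(line);                 // unlimited: write through
            return;
        }
        queue.addLast(line);
        queuedBytes += line.length();                 // chars as a byte proxy
        while (queuedBytes > maxBytes && !queue.isEmpty()) {
            queuedBytes -= queue.removeFirst().length(); // drop oldest (FIFO)
        }
    }

    /** Called when the task finishes: flush the retained tail to the log. */
    public void close() {
        for (String line : queue) {
            System.out.println(line);
        }
        queue.clear();
    }
}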

Currently, stdout/err output is not restricted at all. I adopted a restriction of my own, with the code below. The basic idea is to re-wrap System.out/err with a LimitOutputSteam decorator that counts the bytes written and enforces a size cap; once the cap is exceeded, an exception is thrown so the user notices. In fact, using System.out/err to print logs is not recommended in the first place.
[Java]
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Limits written data: if the written size reaches the limit, the stream
 * writes a log message and throws an IOException.
 *
 * @author dragon.caol
 * @date 2012-09-21
 */
public class LimitOutputSteam extends FilterOutputStream {
  private long remaining;
  private String msg;
  private boolean limit = false;
  private boolean isRestriction = false;

  public LimitOutputSteam(OutputStream out, long maxSize) {
    super(out);
    if (maxSize > 0) {
      this.isRestriction = true;
      this.remaining = maxSize;
      this.msg = "\nException: Written to stdout or stderr can't exceed "
          + ((float) maxSize / 1024) + "kb.\n";
    }
  }

  @Override
  public void write(int b) throws IOException {
    if (isRestriction) {
      if (!limit) {
        if (remaining-- > 0) {
          super.write(b);
        } else {
          byte[] bytes = msg.getBytes();
          for (int i = 0; i < bytes.length; i++) {
            super.write(bytes[i]);
          }
          this.limit = true;
          throw new LimitOutputSteamException(msg);
        }
      }
    } else {
      super.write(b);
    }
  }

  /** Minimal exception type, added here so the listing is self-contained. */
  public static class LimitOutputSteamException extends IOException {
    public LimitOutputSteamException(String message) {
      super(message);
    }
  }
}
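A quick usage sketch, assuming the class above is on the classpath. The demo class and the 1 KB cap are made up for illustration; in the framework the cap would come from a configuration property:
[Java]
import java.io.IOException;

public class LimitStdoutDemo {
  public static void main(String[] args) {
    // Wrap System.out with a hypothetical 1 KB cap.
    LimitOutputSteam out = new LimitOutputSteam(System.out, 1024);
    try {
      for (int i = 0; i < 10000; i++) {
        out.write('x'); // throws once the 1024-byte cap is exceeded
      }
    } catch (IOException e) {
      // This is the signal the user perceives.
      System.err.println(e.getMessage());
    }
  }
}
Note that if the wrapper is instead installed via System.setOut(new PrintStream(...)), PrintStream swallows the IOException internally and only records it through checkError(), so callers would have to poll checkError() rather than catch the exception.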
