Log aggregation is a centralized log management feature provided by YARN. It uploads the logs of completed containers/tasks to HDFS, which reduces the load on the NodeManagers and provides a centralized storage and analysis mechanism. By default, container/task logs stay on each NodeManager; enabling log aggregation requires additional configuration.
Parameter configuration (yarn-site.xml)
1.yarn.log-aggregation-enable
Parameter description: Whether to enable log aggregation. When enabled, logs are aggregated and saved to HDFS.
Default value: false
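As a minimal sketch, the feature could be switched on with a property like the one below, placed inside the <configuration> element of yarn-site.xml. Only the parameter name comes from the text above; the surrounding file layout is assumed to be a standard Hadoop configuration file.

```xml
<!-- yarn-site.xml fragment: enable log aggregation to HDFS -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```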
2.yarn.log-aggregation.retain-seconds
Parameter description: How long aggregated logs are kept on HDFS, in seconds.
Default value: -1 (aggregated logs are never deleted). For example, set it to 86400 to keep logs for 24 hours.
3.yarn.log-aggregation.retain-check-interval-seconds
Parameter description: The interval, in seconds, between runs of the task that deletes aggregated logs from HDFS. Each run deletes the logs that meet the condition (logs older than the time set by parameter 2). If set to 0 or a negative value, the interval is computed as 1/10 of the value of parameter 2; with the example value above (86400), that is 8640 s.
Default value: -1
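The two retention parameters are usually set together. The fragment below is a sketch using the 24-hour example value from the text. The check interval is given a negative value, so it would be derived as 1/10 of the retention time (86400 / 10 = 8640 s).

```xml
<!-- Keep aggregated logs on HDFS for 24 hours (86400 s) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>86400</value>
</property>
<!-- 0 or negative: the check interval is computed as 1/10 of the
     retention time above, i.e. 8640 s in this example -->
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>-1</value>
</property>
```

Setting the interval to an explicit positive number of seconds instead would override the 1/10 rule.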
4.yarn.nodemanager.log.retain-seconds
Parameter description: How long log files are kept on the local node, in seconds. This parameter takes effect only when log aggregation is not enabled.
Default value: 10800
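For a cluster that leaves log aggregation disabled, local retention could be adjusted along these lines. The value 21600 (6 hours) is only an illustrative assumption, not a recommendation.

```xml
<!-- Only takes effect when yarn.log-aggregation-enable is false -->
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <!-- illustrative value: 6 hours instead of the 10800 s (3 hours) default -->
  <value>21600</value>
</property>
```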
5.yarn.nodemanager.remote-app-log-dir
Parameter description: The HDFS directory to which logs are transferred when an application finishes running (valid when log aggregation is enabled). Change it to the directory where the logs should be saved.
Default value:/tmp/logs
6.yarn.nodemanager.remote-app-log-dir-suffix
Parameter description: Name of the subdirectory under the remote log directory (valid when log aggregation is enabled).
Default value: logs. Logs are transferred to the directory ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}
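To illustrate how the two directory parameters combine, here is a sketch with a hypothetical base directory /app-logs (an assumed path, chosen only for the example). Following the layout described above, logs for user alice would end up under /app-logs/alice/logs.

```xml
<!-- Base HDFS directory for aggregated logs; /app-logs is a hypothetical path -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
</property>
<!-- Subdirectory name appended after the user name:
     ${yarn.nodemanager.remote-app-log-dir}/${user}/logs -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
```

The exact directory layout may differ between Hadoop versions, so it is worth checking the resulting path on HDFS after the first application finishes.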
Refer to Dong's blog: http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-configurations-log-aggregation/