The healthStatus object in TaskTracker stores the current node's health state. Its class is org.apache.hadoop.mapred.TaskTrackerStatus.TaskTrackerHealthStatus, defined as follows:
static class TaskTrackerHealthStatus implements Writable {
  private boolean isNodeHealthy; // whether the node is healthy
  private String healthReport;
  private long lastReported; // time of the most recent report
  ...
}
The healthStatus object is a field of the TaskTrackerStatus instance status, which is sent to the JobTracker along with other node information such as memory capacity. The fields of healthStatus are computed by the NodeHealthCheckerService thread, which lets an administrator configure a health monitoring script to detect node health. The only requirement is that when the script finds the node unhealthy, it must print a line beginning with "ERROR" to standard output. The NodeHealthCheckerService thread runs the script periodically and checks its output; if any line of the output begins with "ERROR", the node is marked unhealthy, the JobTracker adds it to the blacklist, and no further tasks are assigned to it.
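The rule described above (any output line beginning with "ERROR" marks the node unhealthy) can be approximated in shell to check a health script locally before deploying it. This is only a sketch of the rule as stated in the text, not the NodeHealthCheckerService implementation; the function name classify_health_output is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop): classifies a health script's
# standard output according to the rule described in the text.
# Reads the output on stdin and prints "unhealthy" if any line begins
# with ERROR, otherwise "healthy".
classify_health_output() {
  if grep -q '^ERROR'; then
    echo "unhealthy"
  else
    echo "healthy"
  fi
}

# Sample outputs piped through the helper.
printf 'OK\n' | classify_health_output
printf 'checking disks\nERROR, disk full\n' | classify_health_output
```

A real script's output can be checked the same way by piping it into the function, for example `./health_check.sh | classify_health_output`, where health_check.sh stands for whatever script is configured.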
Benefits of this mechanism:
1. It can act as feedback on node load: when the script detects that the network, I/O, file system, or another resource is busy, the node can be marked unhealthy so that fewer tasks are assigned to it.
2. It supports temporary manual maintenance of a TaskTracker: when a TaskTracker fails, it can temporarily stop receiving new tasks, and after maintenance it can be set back to a state in which it receives tasks.
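The maintenance use case in point 2 could be handled by a health script that reports an error while a marker file exists. The following is a minimal sketch under that assumption; the flag file path and the function name check_maintenance are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical flag file: an administrator creates it to take the node out
# of scheduling and removes it once maintenance is finished.
MAINTENANCE_FLAG="${MAINTENANCE_FLAG:-/tmp/tasktracker.maintenance}"

# Prints a line beginning with ERROR while the flag file exists,
# otherwise prints OK.
check_maintenance() {
  if [ -e "$MAINTENANCE_FLAG" ]; then
    echo "ERROR, node under maintenance (flag file $MAINTENANCE_FLAG exists)"
  else
    echo "OK"
  fi
}

check_maintenance
```

With a script like this, `touch` on the flag file would make the node report itself unhealthy at the next check, and removing the file would make it report OK again.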
Configuration parameters:
Parameter name | Parameter meaning
mapred.healthChecker.script.path | Absolute path of the health check script. The NodeHealthCheckerService thread runs this script to determine whether the node is healthy; if the value is empty, the thread is not started.
mapred.healthChecker.interval | Interval at which the NodeHealthCheckerService thread runs the monitoring script, in milliseconds.
mapred.healthChecker.script.timeout | If the monitoring script does not respond within this amount of time, the node is marked unhealthy.
mapred.healthChecker.script.args | Arguments passed to the monitoring script; multiple arguments are separated by commas.
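As an illustration, these parameters might be set in mapred-site.xml roughly as follows. The property names come from the table above; every value here (the script path, the interval, the timeout, and the arguments) is a placeholder assumption and should be adjusted for the actual cluster.

```xml
<configuration>
  <!-- Placeholder path: replace with the absolute path of your script -->
  <property>
    <name>mapred.healthChecker.script.path</name>
    <value>/path/to/health_check.sh</value>
  </property>
  <!-- Illustrative value: run the script every 60 seconds (milliseconds) -->
  <property>
    <name>mapred.healthChecker.interval</name>
    <value>60000</value>
  </property>
  <!-- Illustrative value: treat the node as unhealthy if the script
       has not responded within 10 minutes (milliseconds) -->
  <property>
    <name>mapred.healthChecker.script.timeout</name>
    <value>600000</value>
  </property>
  <!-- Hypothetical arguments, comma-separated -->
  <property>
    <name>mapred.healthChecker.script.args</name>
    <value>arg1,arg2</value>
  </property>
</configuration>
```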
Example: print a line beginning with "ERROR" when the node's free memory is less than 10% of its total memory.
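#!/bin/bash
# Minimum fraction of total memory that must be free
MEMORY_RATIO=0.1
# Free and total memory in kB, read from /proc/meminfo
freeMem=$(grep MemFree /proc/meminfo | awk '{print $2}')
totalMem=$(grep MemTotal /proc/meminfo | awk '{print $2}')
# Threshold in kB: totalMem * MEMORY_RATIO, truncated to an integer
limitMem=$(awk -v total="$totalMem" -v ratio="$MEMORY_RATIO" 'BEGIN {print int(total * ratio)}')
if [ "$freeMem" -lt "$limitMem" ]; then
  echo "ERROR, totalMem=$totalMem, freeMem=$freeMem, limitMem=$limitMem"
else
  echo "OK"
fi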
[Hadoop] TaskTracker Source Analysis (TaskTracker node health monitoring)