I set up CDH and ran the WordCount example program. The console kept displaying "map 0% reduce 0%", and the web UI showed the job state as RUNNING, but no map task ever executed. It looked like a resource allocation problem, so the next step was to check the task log.
2014-07-04 17:30:37,492 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
2014-07-04 17:30:37,492 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 2
2014-07-04 17:30:38,496 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2014-07-04 17:30:38,496 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 0
There are no errors in the log, but these messages keep printing. The likely cause is that the ResourceManager does not have enough resources available to allocate.
In YARN, resources consist of memory and CPU. Resource management is handled by the ResourceManager and the NodeManagers: the ResourceManager manages and schedules the resources of all nodes, while each NodeManager allocates and isolates resources for processes on its own node. The ResourceManager assigns resources on a NodeManager to each task. The important parameters are described in detail below.
yarn.nodemanager.resource.memory-mb
Memory available on each node for NodeManager allocation, in MB. The default is 8192 MB (8 GB). The problem I encountered was that this was set too small, only 1 GB.
yarn.scheduler.minimum-allocation-mb
The minimum memory a single task can request. The default is 1024 MB, which is deliberately somewhat large to avoid the waste of many tiny allocations. Because my local resources were limited, the default minimum was too large for the available memory, which is what caused the failure; I lowered it to 512 MB per task.
yarn.scheduler.maximum-allocation-mb
The maximum memory a single task can request. The default is 8192 MB. For Spark tasks, increase this value.
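Assuming these three properties are set in yarn-site.xml (the values below are illustrative, matching the small-node scenario above), the configuration might look like:

```xml
<!-- yarn-site.xml (example values; adjust to your node's actual memory) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- total memory this NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value> <!-- lowered from the 1024 MB default for a small node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value> <!-- a single container request may not exceed this -->
</property>
```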
mapreduce.map.memory.mb
The physical memory limit of each map task; it should be greater than or equal to yarn.scheduler.minimum-allocation-mb.
mapreduce.reduce.memory.mb
The physical memory limit of each reduce task.
mapreduce.map.java.opts
The JVM heap size of each map task process.
mapreduce.reduce.java.opts
The JVM heap size of each reduce task process.
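As a sketch of how these four parameters fit together in mapred-site.xml (example values; the common practice is to keep the JVM heap in java.opts below the container limit in memory.mb, leaving headroom for non-heap memory):

```xml
<!-- mapred-site.xml (example values) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- container limit; >= yarn.scheduler.minimum-allocation-mb -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx820m</value> <!-- heap kept below the 1024 MB container limit -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1640m</value>
</property>
```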
The number of map and reduce tasks each node can run concurrently is obtained by dividing yarn.nodemanager.resource.memory-mb by mapreduce.map.memory.mb and by mapreduce.reduce.memory.mb, respectively.
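The division above can be sketched as a quick calculation (this assumes memory is the only constraint and ignores CPU vcores and other overhead):

```python
def max_concurrent_tasks(node_memory_mb: int, task_memory_mb: int) -> int:
    """Rough estimate of how many containers of a given size fit on one node:
    yarn.nodemanager.resource.memory-mb // mapreduce.{map,reduce}.memory.mb."""
    return node_memory_mb // task_memory_mb

# Example: 8192 MB node memory, 1024 MB per map task, 2048 MB per reduce task
print(max_concurrent_tasks(8192, 1024))  # 8 concurrent map tasks
print(max_concurrent_tasks(8192, 2048))  # 4 concurrent reduce tasks
```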
Some of these parameters are explained further at http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-memory-cpu-scheduling/