Today I wrote a MapReduce job whose purpose is to read data from several database tables, filter it in Java according to the business rules, and write the results out to HDFS. When I submitted the job from Eclipse to debug it, the reduce stage kept throwing a Java heap space exception, which is obviously a heap overflow. Looking more closely at the business code, the reducer reads from the database, and a few of the tables return around 500,000 rows each; since that did not seem like a huge amount, the queries were not paginated. After being read, the rows are packed into Map collections and stay in memory for the whole time the business logic runs. The reduce memory configured in the original mapred-site.xml was fairly small, so simply increasing it was enough.
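To make the memory pressure concrete, here is a minimal sketch of that kind of pattern (not the actual job code; the connection URL, table, and column names are placeholders): a reducer that loads an entire table into a HashMap via plain JDBC in setup(), so every row stays on the heap until the task ends.

import java.io.IOException;
import java.sql.*;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FilterReducer extends Reducer<Text, Text, Text, NullWritable> {

    // ~500,000 rows end up living here for the lifetime of the reduce task
    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Placeholder JDBC URL and credentials; the real job read several tables this way
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://db-host/biz", "user", "pass");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT k, v FROM some_table")) { // no paging
            while (rs.next()) {
                lookup.put(rs.getString("k"), rs.getString("v"));
            }
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Business filtering against the in-memory table; with a small reduce heap
        // the HashMap above alone is enough to trigger "Java heap space"
        if (lookup.containsKey(key.toString())) {
            context.write(key, NullWritable.get());
        }
    }
}

With several tables held in memory like this and a small reduce heap, the OOM is inevitable. After bumping the reduce memory, the relevant mapred-site.xml entries look like this: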
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>215</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx215m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
</property>
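If you would rather not change mapred-site.xml for the whole cluster, the same keys can be overridden per job from the driver. This is only a sketch: the class and job names are placeholders, and -Xmx800m is just an illustrative heap size that leaves headroom inside a 1024 MB container.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Container size requested from YARN for each reduce task, in MB
        conf.set("mapreduce.reduce.memory.mb", "1024");
        // JVM heap inside that container; keep it below memory.mb to leave headroom
        conf.set("mapreduce.reduce.java.opts", "-Xmx800m");

        Job job = Job.getInstance(conf, "db-filter-job"); // conf must be populated before getInstance
        // ... set the jar, mapper, reducer, input and output paths as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}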
Several important memory-control parameters in Hadoop 2.2:
YARN
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-pmem-ratio
yarn.nodemanager.resource.memory-mb
MapReduce
Map memory
mapreduce.map.java.opts
mapreduce.map.memory.mb
Reduce memory
mapreduce.reduce.java.opts
mapreduce.reduce.memory.mb
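Roughly speaking, memory.mb is the container size a task asks YARN for, the matching java.opts is the heap of the JVM running inside that container (so it must be smaller), the two yarn.scheduler values bound how small or large a single container request may be, and yarn.nodemanager.resource.memory-mb is the total memory one NodeManager can hand out. As a purely illustrative example: with yarn.nodemanager.resource.memory-mb = 8192 and mapreduce.reduce.memory.mb = 1024, a node can host at most 8192 / 1024 = 8 such reduce containers at once, each running its JVM with something like -Xmx800m.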
If an exception like the following occurs:
Container [pid=17645,containerID=container_1415210272486_0013_01_000004] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1415210272486_0013_01_000004:
you can adjust yarn.nodemanager.vmem-pmem-ratio (the default is 2.1), or try increasing the number of reduce tasks the program runs. This ratio controls how much virtual memory a container may use: when the virtual memory YARN measures for a container grows beyond 2.1 times the mapreduce.map.memory.mb or mapreduce.reduce.memory.mb configured in mapred-site.xml, the exception shown above is thrown. Both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb default to 1024 MB, so once the virtual memory YARN computes for the running container exceeds 1024 × 2.1, the NodeManager daemon kills the container and the whole MR job fails. Simply raising the ratio avoids this problem; how much to raise it depends on your specific situation.
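To see where the numbers in the log come from: with mapreduce.reduce.memory.mb at its default of 1024 MB and the default vmem-pmem-ratio of 2.1, the virtual memory cap is 1024 MB × 2.1 ≈ 2150 MB, which is the "2.1 GB virtual memory" limit shown in the exception above. Raising yarn.nodemanager.vmem-pmem-ratio (set in yarn-site.xml on the NodeManagers) to, say, 3 would lift that cap to roughly 3 GB; the value 3 here is only an illustration.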