In the previous phase, the JobTracker (JT, CDH4.2.0) ran out of memory twice, which caused errors in the ETL process. The cluster had only just been taken over and most of its parameters were still at their defaults, so I changed the CMS-related JVM parameters of the JT and, at the same time, reduced the retireJob interval and cache size to see whether that would help. Three days later the alarms started again: the Old gen kept growing and was never released, which suggested a memory leak. I analyzed memory dumps (the heap size is 10 GB), and after two dumps found that the FileSystem$Cache portion kept increasing:
[Figure: j1.png, http://www.bkjia.com/uploads/allimg/131228/1510094291-0.png]
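The symptom described above (Old gen occupancy that stays high and is never released) can be checked from the output of `jstat -gcutil`. The following is only a minimal sketch: the function names, the 80% threshold, and the window of 5 samples are assumptions for illustration, not values from the original setup.

```python
import subprocess


def read_jstat(pid):
    """Run `jstat -gcutil <pid>` and return its text output."""
    return subprocess.check_output(["jstat", "-gcutil", str(pid)]).decode()


def old_gen_percent(jstat_output):
    """Return the Old gen occupancy (column 'O') from `jstat -gcutil` output."""
    rows = [ln.split() for ln in jstat_output.strip().splitlines() if ln.strip()]
    header, values = rows[0], rows[-1]
    # Look the column up by name instead of position, since the column
    # layout differs between JVM versions.
    return float(values[header.index("O")])


def old_gen_stuck(samples, threshold=80.0, window=5):
    """True if even the lowest of the last `window` samples is above the threshold.

    threshold and window are hypothetical values chosen for illustration.
    """
    recent = samples[-window:]
    return len(recent) == window and min(recent) >= threshold
```

Looking at the minimum of recent samples rather than a single reading is a deliberate choice here: a healthy heap also shows high occupancy right before a collection, while a leaking one stays high after it.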
A Google search turned up the related bug: https://issues.apache.org/jira/browse/mapreduce-5351
Solutions:
1. Monitor the JT and restart it when a certain threshold is reached.
2. Modify the following parameters and write a script to clean up the job files manually.
<property>
<name>keep.failed.task.files</name>
<value>true</value>
</property>
<property>
<name>keep.task.files.pattern</name>
<value>a regular expression that will never be matched</value>
</property>
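The manual cleanup script mentioned in method 2 could look roughly like the sketch below. It rests on assumptions that are not from the original post: that the leftover job directories are visible under a local path, that their names start with `job_`, and that age by modification time is an acceptable criterion. If the files live on HDFS, the same selection logic would have to be applied through the Hadoop file system tools instead.

```python
import os
import shutil
import time


def stale_job_dirs(root, max_age_seconds, now=None):
    """List job_* directories under root that are older than max_age_seconds.

    The "job_" prefix and the mtime-based age check are assumptions.
    """
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if name.startswith("job_") and os.path.isdir(path):
            if now - os.path.getmtime(path) > max_age_seconds:
                stale.append(path)
    return stale


def clean_job_dirs(root, max_age_seconds, dry_run=True):
    """Delete stale job directories; with dry_run=True only report them."""
    targets = stale_job_dirs(root, max_age_seconds)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Running it first with `dry_run=True` and reviewing the returned list is a cautious way to confirm the selection before anything is deleted.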
However, method 2 has a bug in a Kerberos environment: https://issues.apache.org/jira/browse/mapreduce-5047. In the end, a solution was found.
This article is from the "MIKE's old blog" blog; please be sure to keep this source: http://boylook.blog.51cto.com/7934327/1298929