Problem:
I wrote an application that runs on YARN and found that the NodeManager (NM) eventually runs out of memory. Raising the NodeManager heap from 1 GB to 2 GB did not prevent the OOM.
I enabled JMX monitoring on the NM with these JVM options:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=50079 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
Then I connected with JConsole.
Most of the memory was held in the old generation, and triggering a GC had no effect, which means the objects were still being referenced.
I added a crontab entry to write a snapshot of the in-memory object data every half hour:
*/30 * * * * jmap -histo 18114 > /home/yarn/log/`date`.log
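Comparing snapshots showed that almost all of the growth was in the class [C (char[]). In each histogram row the columns are rank, instance count, total bytes, and class name. The number of [C instances barely changed while their total size roughly doubled, so the likely cause is a StringBuilder that is appended to continuously.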
:/home/yarn/log# diff <earlier-snapshot>.log <later-snapshot>.log
4,10c4,10
<    1:     33014    312864176  [C
<    2:     90784     11817968  <constMethodKlass>
<    3:     90784     11632640  <methodKlass>
<    4:      7756      8908000  <constantPoolKlass>
<    5:      7756      5780072  <instanceKlassKlass>
<    6:      6493      4882976  <constantPoolCacheKlass>
<    7:     21287      3447608  [B
---
>    1:     33219    622929048  [C
>    2:     90897     11830824  <constMethodKlass>
>    3:     90897     11647104  <methodKlass>
>    4:      7807      8948632  <constantPoolKlass>
>    5:      7807      5810392  <instanceKlassKlass>
>    6:      6543      4913344  <constantPoolCacheKlass>
>    7:     21346      3470040  [B
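Reading a raw diff of two histograms gets tedious as the snapshots pile up. The comparison can be sketched as a small Java program that parses two `jmap -histo` style snapshots and ranks classes by byte growth. The class name `HistogramDiff` and its methods are hypothetical, not part of any JDK or Hadoop tool, and the parser assumes the four-column row layout shown in the diff.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks classes by growth in bytes between two snapshots.
class HistogramDiff {

    /** Parses rows such as "   1:  33014  312864176  [C" into class name -> bytes. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> bytesByClass = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows have at least four columns and a rank ending in ':'.
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue; // header, separator, or total line
            }
            try {
                bytesByClass.put(cols[3], Long.parseLong(cols[2]));
            } catch (NumberFormatException e) {
                // Not a histogram row after all; skip it.
            }
        }
        return bytesByClass;
    }

    /** Returns (class, byte delta) pairs, largest growth first. */
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before,
                                                Map<String, Long> after) {
        List<Map.Entry<String, Long>> deltas = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            deltas.add(new AbstractMap.SimpleEntry<>(e.getKey(), delta));
        }
        deltas.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return deltas;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HistogramDiff <earlier snapshot> <later snapshot>
        Map<String, Long> before = parse(Files.readAllLines(Paths.get(args[0])));
        Map<String, Long> after = parse(Files.readAllLines(Paths.get(args[1])));
        for (Map.Entry<String, Long> e : growth(before, after)) {
            System.out.println(e.getValue() + " bytes  " + e.getKey());
        }
    }
}
```

Fed the two snapshots above, a tool like this would put [C at the top of the list, since its size grew from 312864176 to 622929048 bytes while every other class changed by a few tens of kilobytes at most.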
Next I took a heap dump and opened it with the JDK tools:
jmap -dump:live,format=b,file=[file] [pid]
jhat -J-Xmx1024m [file]
The output was hard to interpret.
In the end I analyzed the dump graphically with MAT (Eclipse Memory Analyzer), https://eclipse.org/mat/
MAT traced the retained memory to code in Shell.java.
It turned out that when the NodeManager launches a command, it keeps appending the command's standard error output to a variable. That variable is not released while the program is running, so the GC cannot reclaim the space. Once the cause was found, the fix was simple: when launching a command, redirect both standard output and standard error to a file so that the NodeManager never receives them, as follows:
// Add log redirect params
vargs.add("1>>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/" + VoidBoxConfiguration.VOIDBOX_PROXY_LOG_FILENAME);
vargs.add("2>>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/" + VoidBoxConfiguration.VOIDBOX_PROXY_LOG_FILENAME);
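The same idea can be illustrated outside YARN with the standard `java.lang.ProcessBuilder` API: the child's output goes straight to a file, so the parent JVM never buffers it. This is only a sketch. The class `RedirectToFile`, the temporary log file, and the sample `sh` command are assumptions for illustration, and it merges stderr into stdout rather than using two separate `>>` redirects.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;

// Hypothetical helper: runs a command whose output is appended to a log file.
class RedirectToFile {

    /** Runs the command with stdout and stderr appended to logFile; returns the exit code. */
    static int run(File logFile, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                  // merge stderr into stdout
        pb.redirectOutput(Redirect.appendTo(logFile)); // append, like ">>" in a shell
        Process p = pb.start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("proxy", ".log"); // stand-in for the real log path
        int code = run(log, "sh", "-c", "echo out; echo err 1>&2");
        System.out.println("exit=" + code + ", logged bytes=" + log.length());
    }
}
```

Because the operating system writes the child's output directly to the file, the parent's heap usage stays flat no matter how much the child prints.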
Still, this is a latent bug in the NodeManager that should be fixed: its own stability should not depend on how the programs it launches behave.
One suggestion is to rotate errMsg so that it cannot grow without bound.
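One way to picture the suggested rotation is a buffer that keeps only the most recent error output. The sketch below is not Hadoop's Shell.java and not a proposed patch. `BoundedErrorBuffer` is a hypothetical class, and the cap of 64 characters in the example is an arbitrary assumption.

```java
// Hypothetical bounded buffer: keeps only the last maxChars characters of error output.
class BoundedErrorBuffer {
    private final int maxChars;
    private final StringBuilder buf = new StringBuilder();

    BoundedErrorBuffer(int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        this.maxChars = maxChars;
    }

    /** Appends a line, then drops the oldest characters once the cap is exceeded. */
    synchronized void appendLine(String line) {
        buf.append(line).append('\n');
        int overflow = buf.length() - maxChars;
        if (overflow > 0) {
            buf.delete(0, overflow);
        }
    }

    /** Returns the retained tail of the error output. */
    synchronized String tail() {
        return buf.toString();
    }

    synchronized int length() {
        return buf.length();
    }

    public static void main(String[] args) {
        BoundedErrorBuffer errMsg = new BoundedErrorBuffer(64);
        for (int i = 0; i < 1000; i++) {
            errMsg.appendLine("error line " + i);
        }
        // Only the most recent output survives; older lines have been discarded.
        System.out.print(errMsg.tail());
    }
}
```

With a cap like this, a child process that writes to stderr for days costs the parent a fixed amount of heap, and the most recent lines, which are usually the ones needed for an error report, are still available.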
NodeManager out of heap memory [the whole bug-fixing process]