Problem
Our internally customized version of Dubbo runs into a memory overflow when it processes packets larger than 10K.
The reason is that, when we customized the Dubbo HTTP protocol, we used the HttpRequestDecoder from the JBoss Netty package to parse HTTP protocol content.
When this decoder is fed large amounts of non-HTTP content, it can cause a memory overflow.
One service hit this problem and started running Full GCs.
Reproduce the problem
Reproduce the problem based on the description:
- Specify Dubbo version
- Dubbo request, non-HTTP request
- Message body greater than 10K
JVM heap configuration, obtained with jmap -heap pid:
Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 2147483648 (2048.0MB)
   NewSize                  = 44564480 (42.5MB)
   MaxNewSize               = 715653120 (682.5MB)
   OldSize                  = 89653248 (85.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)
Construct a provider and a consumer, with the consumer sending a request body of roughly 100K. Add JVM parameters to print GC details, start Tomcat (the JVM), and observe GC activity when initiating the calls:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
The first call did not trigger a Full GC. Suspecting the concurrency was too low (after all, there is a certain amount of concurrency in production), the consumer side was changed to call from multiple threads, starting with 1, then 10, 100, and 1000 threads. At 1000 threads the provider side hit Full GC.
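To make the load concrete, the consumer-side calls were conceptually just a batch of threads each invoking the service once with the ~100K payload. The sketch below assumes a hypothetical DemoService interface and a lookupDemoService() helper standing in for the real Dubbo reference configuration, which is not shown in this article.

import java.util.concurrent.CountDownLatch;

public class ConcurrentCallTest {

    public static void main(String[] args) throws InterruptedException {
        // In the real test this would be injected via a Dubbo <dubbo:reference>.
        final DemoService demoService = lookupDemoService();

        int threadCount = 1000;                           // 1, 10, 100, 1000 were tried in turn
        final String payload = buildPayload(100 * 1024);  // ~100K request body
        final CountDownLatch latch = new CountDownLatch(threadCount);

        for (int i = 0; i < threadCount; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        demoService.echo(payload);        // each thread issues one Dubbo call
                    } finally {
                        latch.countDown();
                    }
                }
            }).start();
        }
        latch.await();
    }

    private static String buildPayload(int size) {
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    // Placeholder for obtaining the Dubbo reference; wiring omitted in this sketch.
    private static DemoService lookupDemoService() {
        throw new UnsupportedOperationException("wire up via Dubbo reference config");
    }

    // Hypothetical interface used only for this sketch.
    interface DemoService {
        String echo(String payload);
    }
}

The provider's GC log during the 1000-thread run: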
2018-01-19T21:01:23.758-0800: [GC (Allocation Failure) [PSYoungGen: 454144K->696K(468992K)] 1394833K->1383056K(1867264K), 0.1343399 secs] [Times: user=0.26 sys=0.01, real=0.14 secs]
2018-01-19T21:01:23.892-0800: [Full GC (Ergonomics) [PSYoungGen: 696K->0K(468992K)] [ParOldGen: 1382359K->940805K(1398272K)] 1383056K->940805K(1867264K), [Metaspace: 52098K->52098K(1097728K)], 0.2305879 secs] [Times: user=0.54 sys=0.01, real=0.23 secs]
2018-01-19T21:01:24.629-0800: [Full GC (Ergonomics) [PSYoungGen: 464621K->0K(468992K)] [ParOldGen: 1382601K->941472K(1398272K)] 1847223K->941472K(1867264K), [Metaspace: 52098K->52098K(1097728K)], 0.2063340 secs] [Times: user=0.59 sys=0.00, real=0.21 secs]
2018-01-19T21:01:25.305-0800: [Full GC (Ergonomics) [PSYoungGen: 454693K->0K(468992K)] [ParOldGen: 1383395K->499265K(1398272K)] 1838088K->499265K(1867264K), [Metaspace: 52098K->52098K(1097728K)], 0.1478104 secs] [Times: user=0.34 sys=0.01, real=0.15 secs]
2018-01-19T21:01:25.945-0800: [Full GC (Ergonomics) [PSYoungGen: 457504K->0K(468992K)] [ParOldGen: 1383424K->499950K(1398272K)] 1840928K->499950K(1867264K), [Metaspace: 52098K->52098K(1097728K)], 0.1390671 secs] [Times: user=0.36 sys=0.00, real=0.14 secs]
2018-01-19T21:01:26.585-0800: [Full GC (Ergonomics) [PSYoungGen: 456673K->0K(468992K)] [ParOldGen: 1384488K->499639K(1398272K)] 1841162K->499639K(1867264K), [Metaspace: 52098K->52098K(1097728K)], 0.1344279 secs] [Times: user=0.32 sys=0.01, real=0.14 secs]
jstat -gc -t pid 1s
Timestamp  S0C     S1C     S0U  S1U  EC        EU        OC         OU         MC       MU       CCSC    CCSU    YGC   YGCT    FGC  FGCT     GCT
1438.8     1024.0  5120.0  0.0  0.0  443392.0  430848.7  1398272.0  478878.3   53464.0  52117.3  6144.0  5945.4  1733  97.928  949  166.820  264.748
1440.0     1024.0  5120.0  0.0  0.0  443392.0  435843.6  1398272.0  1321492.8  53464.0  52117.3  6144.0  5945.4  1733  97.928  950  166.962  264.889
1441.0     1024.0  5120.0  0.0  0.0  443392.0  433460.4  1398272.0  900800.9   53464.0  52117.3  6144.0  5945.4  1733  97.928  952  167.226  265.153
1441.9     1024.0  5120.0  0.0  0.0  443392.0  0.0       1398272.0  479419.0   53464.0  52117.3  6144.0  5945.4  1733  97.928  954  167.491  265.419
1443.0     1024.0  5120.0  0.0  0.0  443392.0  438270.0  1398272.0  1324328.4  53464.0  52117.3  6144.0  5945.4  1733  97.928  955  167.632  265.560
1444.0     1024.0  5120.0  0.0  0.0  443392.0  437239.3  1398272.0  902696.2   53464.0  52117.3  6144.0  5945.4  1733  97.928  957  167.902  265.830
1445.0     1024.0  5120.0  0.0  0.0  443392.0  440249.7  1398272.0  1326030.9  53464.0  52117.3  6144.0  5945.4  1733  97.928  959  168.046  265.974
1446.0     1024.0  5120.0  0.0  0.0  443392.0  434997.3  1398272.0  903830.7   53464.0  52117.3  6144.0  5945.4  1733  97.928  960  168.341  266.269
1447.0     1024.0  5120.0  0.0  0.0  443392.0  423591.6  1398272.0  480931.8   53464.0  52117.3  6144.0  5945.4  1733  97.928  962  168.610  266.537
Cause of the problem
Since Full GC keeps occurring, take a heap dump to see how memory is currently being used in the JVM:
jmap -dump:format=b,file=dump.bin pid
Get the dump file and load it into MAT for analysis. In the Dominator Tree view, find the objects that occupy the most memory; here it is a byte array. Next, find out which class the byte array belongs to: right-click the byte array and choose "Path to GC Roots" > "with all references".
The result shows that org.jboss.netty.handler.codec.frame.FrameDecoder#cumulation is very large. FrameDecoder is a superclass of HttpRequestDecoder. The next step is to debug the source code and see why this field grows so big.
Find all references to this field and see where the byte array is written (the cumulation field holds a byte array). It is mainly these two methods:
// This method only combines the existing cumulation and the new input into one CompositeChannelBuffer
org.jboss.netty.handler.codec.frame.FrameDecoder#appendToCumulation
// This method reassigns cumulation and writes the input into it; this is when the byte array grows
org.jboss.netty.handler.codec.frame.FrameDecoder#updateCumulation
Both of these methods are called from org.jboss.netty.handler.codec.replay.ReplayingDecoder#messageReceived. ReplayingDecoder is a superclass of HttpRequestDecoder, and messageReceived is invoked whenever a request is received, so data is written into cumulation on every incoming request.
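To make the accumulation path easier to follow, here is a heavily simplified, self-contained paraphrase of the behaviour described above. It mimics the FrameDecoder logic with a plain byte array instead of a ChannelBuffer; it is not the JBoss Netty source code.

// Self-contained illustration of the accumulation pattern described above.
public class CumulationSketch {

    private byte[] cumulation; // grows on every messageReceived until the channel closes

    // Called for every inbound buffer (analogous to ReplayingDecoder#messageReceived).
    public void messageReceived(byte[] input) {
        byte[] combined;
        if (cumulation == null) {
            combined = input;
        } else {
            // appendToCumulation: existing data and new input are combined into one buffer
            combined = new byte[cumulation.length + input.length];
            System.arraycopy(cumulation, 0, combined, 0, cumulation.length);
            System.arraycopy(input, 0, combined, cumulation.length, input.length);
        }

        int consumed = tryDecode(combined);

        // updateCumulation: whatever the decoder did not consume is kept for next time.
        // If the decoder never consumes anything (e.g. the bytes are not valid HTTP),
        // cumulation keeps growing until channelClosed/channelDisconnected clears it.
        int remaining = combined.length - consumed;
        cumulation = new byte[remaining];
        System.arraycopy(combined, consumed, cumulation, 0, remaining);
    }

    // For a non-HTTP Dubbo frame the HTTP decoder cannot parse anything, so nothing is consumed.
    private int tryDecode(byte[] buffer) {
        return 0;
    }

    // cleanup: only invoked when the connection is disconnected or closed.
    public void channelClosed() {
        cumulation = null;
    }
}

The key point is the last step of messageReceived: unconsumed bytes stay in cumulation, and nothing except a connection close ever resets it.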
So the problem is now basically clear: data keeps being written into a byte array, so the byte array keeps growing; apparently it is never cleared. The question now is:
Why is cumulation not cleared?
Find all the places where cumulation is cleared:
org.jboss.netty.handler.codec.replay.ReplayingDecoder#cleanup
Then look at who calls this method:
// Called when the connection is disconnected
org.jboss.netty.handler.codec.frame.FrameDecoder#channelDisconnected
// Called when the connection is closed
org.jboss.netty.handler.codec.frame.FrameDecoder#channelClosed
So cumulation is only cleared when the connection is disconnected or closed. With that conclusion, go back to the conditions at the beginning of the problem:
- Dubbo request, non-HTTP request
- Large number of concurrent requests
When a Dubbo request is received, even though it is not HTTP, the HttpRequestDecoder handler is still executed and writes the data into cumulation. Because the Dubbo protocol uses long-lived connections (kept alive by Netty), the contents of every Dubbo request keep being appended to cumulation and are only cleared when the connection is closed or disconnected. With high concurrency this produces many large cumulation objects, and when there is not enough heap space the result is Full GC.
HTTP connections are generally short-lived, or at least subject to a timeout (server-side or client-side), so cumulation is cleared in time. That is why this problem rarely shows up with the HTTP protocol but is easy to trigger with the Dubbo protocol.
Solve the problem
Since non-HTTP requests do not need to be processed by HttpRequestDecoder at all, create a new class HttpRequestWrapDecoder that extends HttpRequestDecoder and overrides the messageReceived method. The override checks whether the incoming message is an HTTP request: if it is, processing continues as before; if not, the method returns without doing anything and lets the next handler in the pipeline handle it. A sketch of this idea follows.
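A minimal sketch in terms of the JBoss Netty 3.x API. The "does this look like HTTP" check below (matching a leading HTTP method name) is an assumption for illustration; the internal implementation may detect HTTP versus Dubbo traffic differently.

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.handler.codec.http.HttpRequestDecoder;

// Sketch of the wrapper decoder described above; not the exact internal implementation.
public class HttpRequestWrapDecoder extends HttpRequestDecoder {

    private static final String[] HTTP_METHODS =
            {"GET ", "POST ", "PUT ", "DELETE ", "HEAD ", "OPTIONS ", "PATCH ", "TRACE "};

    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) throws Exception {
        Object msg = e.getMessage();
        if (msg instanceof ChannelBuffer && !looksLikeHttp((ChannelBuffer) msg)) {
            // Not an HTTP request: skip HTTP decoding entirely so nothing is written
            // into cumulation, and let the next handler in the pipeline process it.
            ctx.sendUpstream(e);
            return;
        }
        super.messageReceived(ctx, e);
    }

    // Assumed detection logic: an HTTP request line starts with a method name.
    private boolean looksLikeHttp(ChannelBuffer buffer) {
        for (String method : HTTP_METHODS) {
            if (startsWith(buffer, method)) {
                return true;
            }
        }
        return false;
    }

    // Peek at the first bytes without moving the reader index.
    private boolean startsWith(ChannelBuffer buffer, String prefix) {
        if (buffer.readableBytes() < prefix.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (buffer.getByte(buffer.readerIndex() + i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}

Because non-HTTP buffers never reach the superclass, nothing is written into cumulation for Dubbo traffic, and the next handler in the pipeline receives the original message unchanged.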