From the output above you can see the problem: the connection information on the server and on the client does not match. The server holds a lot of ESTABLISHED connections that are in fact useless. At first I found this very strange. I could not find the cause, so all I could do was check the logs.
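To make that comparison less manual, here is a minimal sketch that diffs the two views. It assumes Linux `netstat -ant` style output (local address in the fourth column, remote address in the fifth, state in the sixth) and assumes there is no address translation between client and server. The function names and the port 41414 in the usage note are hypothetical, chosen only for illustration.

```python
def established_pairs(netstat_output):
    """Parse `netstat -ant`-style text into (local, remote) pairs
    for sockets in the ESTABLISHED state."""
    pairs = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, remote, state
        if (len(fields) >= 6
                and fields[0].startswith("tcp")
                and fields[5] == "ESTABLISHED"):
            pairs.add((fields[3], fields[4]))
    return pairs


def stale_on_server(server_netstat, client_netstat, server_port):
    """Return client endpoints that the server still lists as ESTABLISHED
    but that the client itself no longer has open."""
    suffix = ":" + str(server_port)
    # Remote endpoints as seen by the server on its listening port.
    server_peers = {remote
                    for local, remote in established_pairs(server_netstat)
                    if local.endswith(suffix)}
    # Local endpoints the client actually holds towards that port.
    client_locals = {local
                     for local, remote in established_pairs(client_netstat)
                     if remote.endswith(suffix)}
    return server_peers - client_locals
```

Calling `stale_on_server(server_text, client_text, 41414)` with the captured output of both machines returns the endpoints that exist only on the server side, which are the useless connections described above.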
The logs showed that an exception had occurred, but oddly, just before the exception there was this message: RPC sink {} closing RPC client: {}
It comes from destroyConnection, which destroys a connection. Why would it destroy the connection? Judging from the Flume source, Flume itself should not have such a basic bug. Why would it tear down its own perfectly good connection? So I spent a few days looking for the cause inside Flume itself and could not find it.
Finally I asked a colleague in ops, and it turned out that the firewall has a time limit of 2 hours: a connection that stays idle for 2 hours is disconnected.
Going back through the logs, I found that in basically every case the exception occurred more than 2 hours after data was last sent. Sure enough, this was the problem.
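The same check can be sketched in a few lines of code. This is only an illustration and assumes you have already extracted two lists of timestamps from the logs: the times of successful sends and the times of the failures. The function name and the constant are hypothetical, and the 2-hour value is the firewall limit mentioned above.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Idle limit reported for the firewall in this case.
FIREWALL_IDLE_LIMIT = timedelta(hours=2)


def failures_after_idle(send_times, failure_times, limit=FIREWALL_IDLE_LIMIT):
    """Return (failure_time, idle_gap) pairs for every failure that happened
    more than `limit` after the most recent earlier send."""
    sends = sorted(send_times)
    flagged = []
    for failure in sorted(failure_times):
        # Index of the first send that is not earlier than the failure.
        idx = bisect_left(sends, failure)
        if idx == 0:
            continue  # no earlier send to compare against
        gap = failure - sends[idx - 1]
        if gap > limit:
            flagged.append((failure, gap))
    return flagged
```

If nearly every failure ends up in the returned list, the idle timeout is the likely cause; failures with short gaps point to something else.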
So take note: in the short term this problem may not affect Flume's data transfer, because when there is data to send and no usable connection, a new connection is created automatically. But over a long period the leftover connections keep piling up, and the impact on system performance grows. The fix is therefore to extend the firewall timeout, for example to 24 hours. It is hardly possible for your app to go a full 24 hours without sending any data!
[Flume] Solving the "RPC sink XX closing RPC client: NettyAvroRpcClient {xx} ... Failed to send events" problem