From what the original author told us, EMQ is stable overall, but it cannot reach 99.99% stability. The main weakness is that the memory used by connections is not protected, i.e. there is no safeguard limiting it.
Pub/sub is stable when traffic is balanced. It can crash, though rarely, when a large volume of messages across a cluster is published to only a few subscriptions.
EMQ schedules CPU fairly across MQTT sessions. When many publishers send messages to a single subscription, that subscriber's session does not receive any extra CPU. Messages pile up faster than they can be delivered, memory overflows, and the node goes down.
Crashes can be triggered by network fluctuations, a large number of messages published to a small number of subscriptions, insufficient capacity, cluster split-brain, abnormal subscribe/publish behavior, and similar conditions.
Therefore, clients should implement reconnection backoff to prevent connection storms: when the server crashes and restarts, the whole client population must not try to reconnect at the same moment.
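A minimal sketch of the client-side backoff described above, using exponential backoff with full jitter (a common, general-purpose strategy; the source does not say which scheme EMQ clients should use). The helpers `backoff_delays` and `reconnect`, the `connect()` callable, and the values `base=1.0` and `cap=60.0` seconds are all hypothetical illustrations, not EMQ settings or the API of any MQTT client library.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield reconnect delays in seconds: exponential backoff with full jitter.

    Hypothetical helper; base and cap are illustrative, not EMQ defaults.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base)
    while True:
        # Full jitter: pick uniformly in [0, ceiling] so clients that lost
        # the broker at the same moment spread their reconnects out.
        yield rng.uniform(0, ceiling)
        # Double the upper bound after each failure, but never exceed cap.
        ceiling = min(cap, ceiling * 2)


def reconnect(connect: Callable[[], bool], max_attempts: int = 10,
              sleep: Callable[[float], None] = time.sleep,
              delays: Optional[Iterator[float]] = None) -> bool:
    """Call the hypothetical connect() until it returns True or attempts run out."""
    delays = delays or backoff_delays()
    for attempt in range(max_attempts):
        if connect():
            return True
        # Wait only between attempts, not after the final failure.
        if attempt < max_attempts - 1:
            sleep(next(delays))
    return False
```

Randomizing the whole wait, rather than adding a small jitter to a fixed exponential delay, spreads the reconnect wave most evenly after a broker restart, at the cost of some clients occasionally waiting longer than necessary.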
EMQ v2.3.11: source code maturity