Our company's development and test environments run Erlang 19.0.3 and RabbitMQ 3.6.10. They ran stably in a cluster for nearly a year without any problems.
To stay consistent with them, the production environment uses the same versions, and after several months of operation a problem appeared. The symptoms were as follows:
Within a few days, three queues ended up with no consumers. The RabbitMQ logs showed:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'problem queue' in vhost '/' due to timeout
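The symptom above (queues left with zero consumers) can be spotted from the command line before clients start failing. The following is a minimal sketch, assuming `rabbitmqctl` is run on a broker node and prints tab-separated `name` and `consumers` columns; the helper name `queues_without_consumers` is made up for illustration.

```shell
# Read "name<TAB>consumers" rows on stdin and print the names of queues
# that currently have zero consumers. Lines without exactly two
# tab-separated fields (for example a "Listing queues" banner) are skipped.
queues_without_consumers() {
    awk -F '\t' 'NF == 2 && $2 == 0 { print $1 }'
}

# Usage on a broker node (-q suppresses informational output):
#   rabbitmqctl -q list_queues -p / name consumers | queues_without_consumers
```

A queue that legitimately has no consumers will also be listed, so the output is only a starting point for checking the logs.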
According to https://bugzilla.redhat.com/show_bug.cgi?id=1418668, this bug was fixed in 3.6.3, yet we still hit the problem on 3.6.10.
At this point the queue could not be deleted from the management page either; it failed with a similar error, a queue.delete timeout. Following the feedback in https://github.com/rabbitmq/rabbitmq-server/issues/1333, we deleted the queue with the command
rabbitmqctl eval 'rabbit_amqqueue:internal_delete({resource,<<"vhost name">>,queue,<<"problem queue">>}).'
After shutting down the application and deleting the queue this way, service could be restored.
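Typing the Erlang expression by hand is error-prone because of the nested quoting. The sketch below only assembles the expression quoted above from a vhost and a queue name; the helper name `internal_delete_expr` and the queue name `my_queue` are hypothetical, and the `internal_delete` call is taken from this post as used on 3.6.x, so it may differ in other RabbitMQ versions.

```shell
# Print the Erlang expression to pass to `rabbitmqctl eval`.
# $1 = vhost, $2 = queue name. Names containing a double quote are not handled.
internal_delete_expr() {
    printf 'rabbit_amqqueue:internal_delete({resource,<<"%s">>,queue,<<"%s">>}).\n' "$1" "$2"
}

# Usage (my_queue is a placeholder):
#   rabbitmqctl eval "$(internal_delete_expr / my_queue)"
```

Printing the expression first and reading it before passing it to `rabbitmqctl eval` is a cheap safeguard, since the call bypasses the normal delete path.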
Finally, we dug further into the error log:
{gen_server2,call,[<0.26274.8>,{init,new},infinity]}},
[{gen_server2,call,3,[{file,"src/gen_server2.erl"},{line,327}]},
 {rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1335}]},
 {rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,459}]},
 {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1048}]},
 {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}

=ERROR REPORT==== 2-May-2018::20:26:08 ===
Restarting crashed queue 'problem queue' in vhost '/'.
At this point we suspected a bug in Erlang itself.
We checked the official RabbitMQ website.
The page http://www.rabbitmq.com/which-erlang.html describes the Erlang versions RabbitMQ requires and points out bugs that have been fixed.
Two of the fixes described there clearly matched our problem, so we upgraded both Erlang and RabbitMQ.
To ensure a smooth upgrade without downtime, we only upgraded Erlang to 19.3.6.8 and RabbitMQ to 3.6.14. Otherwise the whole cluster would have to be stopped.
The upgrade itself is simple: stop the rabbitmq-server service, then uninstall Erlang, which also uninstalls RabbitMQ. Then install the new Erlang and RabbitMQ; once the service is started, the node is automatically back in the cluster.
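The per-node steps above could be sketched roughly as follows. This is only an illustration under assumptions the post does not confirm: an RPM-based host using `yum` and `service`, and packages named `erlang` and `rabbitmq-server`. The helpers `run` and `upgrade_node` are made up, and by default the script only prints the commands instead of running them.

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Upgrade one cluster node at a time, following the steps described above.
upgrade_node() {
    run service rabbitmq-server stop
    # Assumption: removing erlang also removes rabbitmq-server via dependencies.
    run yum remove -y erlang
    # Assumption: package names and version pinning as provided by your repos.
    run yum install -y erlang-19.3.6.8 rabbitmq-server-3.6.14
    run service rabbitmq-server start
    # Confirm the node shows up among the running cluster nodes again.
    run rabbitmqctl cluster_status
}

# Usage: review the printed plan first, then run for real as root:
#   upgrade_node
#   DRY_RUN=0 upgrade_node
```

Upgrading one node at a time and checking `rabbitmqctl cluster_status` before moving on keeps the rest of the cluster serving traffic.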
In short, this was a case of an Erlang bug leaving RabbitMQ queues without consumers.