Sporadic error starting host with RabbitMQ in cluster mode

I’ve encountered a RabbitMQ cluster error with autoheal failing: Redirecting to Google Groups

Based on the comment, I’ve updated to RabbitMQ 3.8.2 Erlang 22.2
Now all 3 nodes are active and running and green in the RabbitMQ management console.
All services on node1 works perfectly, but on node2 and node3 they refuse to start randomly with the errors:

020-02-07 22:06:44.3955 NServiceBus.ReceiveComponent Receiver Main failed to initialize. System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell.GetValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.ModelRpc(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_QueuePurge(String queue, Boolean nowait)
   at NServiceBus.Transport.RabbitMQ.QueuePurger.Purge(String queue)
   at NServiceBus.Transport.RabbitMQ.MessagePump.Init(Func`2 onMessage, Func`2 onError, CriticalError criticalError, PushSettings settings)
   at NServiceBus.TransportReceiver.Init()
   at NServiceBus.ReceiveComponent.<Initialize>d__2.MoveNext()

or this:

2020-02-07 22:24:09.8529 #4 Failed to start. System.TimeoutException: The operation has timed out.
   at RabbitMQ.Util.BlockingCell.GetValue(TimeSpan timeout)
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.QueueDeclare(String queue, Boolean passive, Boolean durable, Boolean exclusive, Boolean autoDelete, IDictionary`2 arguments)
   at RabbitMQ.Client.Impl.ModelBase.QueueDeclare(String queue, Boolean durable, Boolean exclusive, Boolean autoDelete, IDictionary`2 arguments)
   at NServiceBus.Transport.RabbitMQ.DelayInfrastructure.Build(IModel channel)
   at NServiceBus.Transport.RabbitMQ.QueueCreator.CreateQueueIfNecessary(QueueBindings queueBindings, String identity)
   at NServiceBus.InitializableEndpoint.<Initialize>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at NServiceBus.Endpoint.<Start>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at VerMac.JamLogic.Server.TaskManager.Host.<Start>d__8.MoveNext() in C:\TeamCity\buildAgent\work\5b1345fe6501ed74\Sources\JamLogic\Server\TaskManager\Host.cs:line 41

I don’t understand, the service is running on the same computer, why can’t it connect?

I also got a lot of those error in my log files:

2020-02-07 21:59:26.2380 RingBufferExtensions Failed to buffer metrics data for $ProcessingTime after $10 attempts.

And I sometimes have this error in the rabbitmq server logs:
2020-02-07 22:36:37.249 [error] <0.13385.0> Supervisor {<0.13385.0>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.13380.0>, <0.13386.0>, <0.13380.0>, <<"[::1]:50453 -> [::1]:5672">>, rabbit_framing_amqp_0_9_1, {user,<<"guest">>,[administrator],[{rabbit_auth_backend_internal,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.13381.0>, <0.13387.0>) at <0.13388.0> exit with reason killed in context shutdown_error

I also have another service that started successfully with no errors, but does not receive any message and the queue keeps growing. I’m positive the names are set correctly.

It all seems so random, sometimes some start, sometimes they don’t. It seems that if I wait for around 1hour and then try again then sometimes it will start. I can’t understand anything of what’s going on or how to troubleshoot my issues. Please help!