A solution using NServiceBus 7 and NServiceBus.RabbitMQ works perfectly on a single-host system.
We added redundant servers for high availability (all on the same local network). RabbitMQ is set up in cluster mode, with queues replicated using HA policies. Every NServiceBus service connects to the RabbitMQ node running on its own host (localhost).
So far the tests pass from a logical and functional point of view, but there’s some strange sporadic behavior when starting multiple services at the same time, as if RabbitMQ is not always responding (even though it stays up and the management web page shows all nodes as green).
This happens on the await Endpoint.Start(configuration); call:
Error 1 (timeout which shouldn’t happen because the service is alive):
2020-01-23 13:36:41.8380 #5 Failed to start. System.TimeoutException: The operation has timed out.
at RabbitMQ.Util.BlockingCell.GetValue(TimeSpan timeout)
at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
at RabbitMQ.Client.Impl.ModelBase.QueueDeclare(String queue, Boolean passive, Boolean durable, Boolean exclusive, Boolean autoDelete, IDictionary`2 arguments)
at RabbitMQ.Client.Impl.ModelBase.QueueDeclare(String queue, Boolean durable, Boolean exclusive, Boolean autoDelete, IDictionary`2 arguments)
at NServiceBus.Transport.RabbitMQ.ConventionalRoutingTopology.Initialize(IModel channel, IEnumerable`1 receivingAddresses, IEnumerable`1 sendingAddresses)
at NServiceBus.Transport.RabbitMQ.QueueCreator.CreateQueueIfNecessary(QueueBindings queueBindings, String identity)
at NServiceBus.InitializableEndpoint.d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NServiceBus.Endpoint.d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Server.TaskManager.Host.d__8.MoveNext() in C:\TeamCity\buildAgent\work\5b1345fe6501ed74\Sources\JamLogic\Server\TaskManager\Host.cs:line 60
Error 2:
2020-01-23 18:17:49.6237 Server.ProgramScheduler.Host Failed to start. System.Exception: Channel has been closed: AMQP close-reason, initiated by Application, code=200, text="Connection close forced", classId=0, methodId=0, cause=
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at NServiceBus.SerializeMessageConnector.d__1.MoveNext()
Both seem to be errors coming from RabbitMQ rather than from NServiceBus directly, but given my setup I would expect RabbitMQ to be available at this point, so I wonder why it returns these errors. The errors hit roughly one service out of ten, at random, when they all start simultaneously. And of course, simply retrying the start works.
I’m wondering whether there’s a way to make the connection more robust so that the services always start correctly, and whether any of you have an idea why RabbitMQ throws these errors.
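Since retrying the start succeeds, one possible stopgap is to wrap the start call in a small retry loop. The following is only a minimal sketch of that idea, not a fix for the underlying cause. StartupRetry and RetryAsync are hypothetical names, and the attempt count and delay are arbitrary values picked for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of NServiceBus or the RabbitMQ client).
public static class StartupRetry
{
    // Runs `operation` until it succeeds or `maxAttempts` is reached,
    // waiting `delay` between attempts. The final failure is rethrown as-is.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan delay)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            // The filter lets the exception from the last attempt propagate.
            catch (Exception ex) when (attempt < maxAttempts)
            {
                Console.Error.WriteLine(
                    $"Start attempt {attempt} of {maxAttempts} failed: {ex.Message}");
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}
```

It could then be called as await StartupRetry.RetryAsync(() => Endpoint.Start(BuildConfiguration()), 5, TimeSpan.FromSeconds(2)); where BuildConfiguration is a hypothetical method returning a fresh EndpointConfiguration on each attempt. Building a new configuration per attempt is an assumption made here to avoid reusing an instance that a failed start may have left in a partially initialized state.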