Timeout Errors Occasionally When Sending Using Azure Service Bus Transport

pstewart · May 11, 2020, 6:47pm

We are currently using:
NServiceBus 7.2.3
NServiceBus.Transport.AzureServiceBus 1.4.0

And are occasionally seeing errors that look like:

ServiceBusTimeoutException: The operation did not complete within the allocated time 00:00:59.9999102 for object message

Microsoft.Azure.ServiceBus.Core.MessageSender+d__58.MoveNext():884
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw():17
Microsoft.Azure.ServiceBus.RetryPolicy+d__19.MoveNext():449
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw():17
Microsoft.Azure.ServiceBus.RetryPolicy+d__19.MoveNext():701
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw():17
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task):46

Microsoft.Azure.ServiceBus.Core.MessageSender+d__45.MoveNext():591
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw():17
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task):46

NServiceBus.SerializeMessageConnector+d__1.MoveNext():565
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw():17
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task):46

PingPost.Apps.PingApp+d__14.MoveNext() in d:\build-dir\PING-PROD-JOB1\PingPost\PingPost\Apps\PingApp.cs:92

In this instance we have a web application with a class called “PingApp.cs” which, once it has finished processing an HTTP call, publishes a message based on the result:

            await _messageSession.Publish(new PingFailed
            {
                // object initialized here
            });

We’ve run into this problem several times in a few apps and are wondering what to do about it. It happens very rarely, but when it does, data often goes missing. I believe errors like these are considered transient and are chalked up to transport issues. At one point I added in custom retry logic but within that one message session it was still unable to send it, even after retrying for a few minutes.

A lot of times we are taking HTTP requests from third-parties and need to process things then and there in the scope of the request and use NSB to handle any outcomes on our end.

I noticed in the latest library for NServiceBus.Transport.AzureServiceBus (1.5.0) there is the ability to add custom retry logic. Should I be attempting to use this? Is that retry logic for failed sends to Azure, or custom retry logic for message processing? The docs aren’t really up to date so I’m not sure what is advised for these kinds of problems.

Thanks,
Phill

SeanFeldman · May 11, 2020, 9:04pm

Hi Phill,

From the Microsoft documentation it’s not very clear whether retrying on ServiceBusTimeoutException will or will not help.

Retry might help in some cases; add retry logic to code.

As the send operation is taking place in user code, outside the context of an incoming message, retries would be ASB SDK retries provided by the default RetryPolicy shipped with the SDK. The timeout of 60 seconds is the operation timeout: the SDK could not complete everything required to send the message to the broker within that time. When that happens, do you get a server error with a TrackingId? If you do, opening a support case with Azure could provide some information about the nature of the error or what caused it.

At one point I added in custom retry logic but within that one message session it was still unable to send it, even after retrying for a few minutes.

Was that a custom retry policy or was it a retry mechanism around _messageSession.Publish(...)?

A few more questions, if you don’t mind:

Is your web application running in Azure data centre or on-premises?
When a timeout exception is logged, does the web app recover or does every subsequent message fail?
What ASB tier are you using and in what region?

I noticed in the latest library for NServiceBus.Transport.AzureServiceBus (1.5.0) there is the ability to add custom retry logic. Should I be attempting to use this? Is that retry logic for failed sends to Azure, or custom retry logic for message processing? The docs aren’t really up to date so I’m not sure what is advised for these kinds of problems.

As documented here, you should override the default RetryPolicy if the defaults are not working for you.
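As a rough illustration of overriding the default, the sketch below passes an SDK RetryExponential policy to the transport. It assumes the CustomRetryPolicy setting mentioned above (transport version 1.5.0 or later); the endpoint name, the environment variable name, and the backoff values are placeholders for illustration, not recommendations.

```csharp
using System;
using Microsoft.Azure.ServiceBus;
using NServiceBus;

public static class TransportSetup
{
    public static EndpointConfiguration Create()
    {
        // "PingPost.Web" is a hypothetical endpoint name.
        var endpointConfiguration = new EndpointConfiguration("PingPost.Web");

        var transport = endpointConfiguration.UseTransport<AzureServiceBusTransport>();

        // Hypothetical environment variable holding the namespace connection string.
        transport.ConnectionString(
            Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString"));

        // Replace the SDK's default retry policy. The values are illustrative only.
        transport.CustomRetryPolicy(new RetryExponential(
            TimeSpan.FromSeconds(1),   // minimum backoff between attempts
            TimeSpan.FromSeconds(10),  // maximum backoff between attempts
            3));                       // maximum number of retries

        return endpointConfiguration;
    }
}
```

This policy governs the SDK's own retries of operations such as sends. It is separate from NServiceBus recoverability, which applies to the processing of incoming messages.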

pstewart · May 12, 2020, 2:41pm

Hi Sean,

Thanks for your reply.

When that happens, do you get a server error with a TrackingId?

From what I can tell I’m not getting a server error with a tracking ID. The error is getting caught by Raygun, which we have logging all uncaught exceptions, so that’s all I have to go on right now.

Was that a custom retry policy or was it a retry mechanism around _messageSession.Publish(...)?

It was a retry mechanism around _messageSession.Publish(...)
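For readers following along, a retry mechanism of this kind might look roughly like the sketch below. It is not the code used in the application; SendRetry and its parameters are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; not part of NServiceBus or the Azure Service Bus SDK.
public static class SendRetry
{
    public static async Task ExecuteAsync(
        Func<Task> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delayBetweenAttempts)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await operation().ConfigureAwait(false);
                return;
            }
            // Retry only transient failures, and only while attempts remain.
            // On the last attempt the filter is false, so the exception propagates.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(delayBetweenAttempts).ConfigureAwait(false);
            }
        }
    }
}

// Usage around the publish call (requires the Microsoft.Azure.ServiceBus namespace):
//
// await SendRetry.ExecuteAsync(
//     () => _messageSession.Publish(new PingFailed()),
//     ex => ex is ServiceBusTimeoutException,
//     maxAttempts: 3,
//     delayBetweenAttempts: TimeSpan.FromSeconds(2));
```

Each attempt already passes through the SDK's own RetryPolicy and can run up to the 60-second operation timeout, so a wrapper like this can easily outlast the incoming HTTP request.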

Is your web application running in Azure data centre or on-premises?

Our web apps are hosted in AWS. We just use Azure Service Bus for our transport with NSB.

When a timeout exception is logged, does the web app recover or does every subsequent message fail?

It only fails for that ONE request. The sender generally times out, as they do not expect our operation to take more than a few seconds.

What ASB tier are you using and in what region?

Standard Tier - East US

Thanks,
Phill

SeanFeldman · May 13, 2020, 9:07pm

I would still advise opening a support case with Azure to see what was happening with the namespace. Microsoft owns the infrastructure and they’d be able to at least suggest something. If you don’t get much traction, you can always raise a support case with Particular and provide this thread as context.