With NSB 8 I'm getting ServiceBusException: Cannot allocate more handles. The maximum number of handles is 255

I’m running NSB 8 on the Azure Functions v4 host (.NET 6 isolated) using Azure Service Bus with the latest NuGet versions of NSB.

I have 3 scheduled functions that run every 15min to send a couple of messages.

After upgrading to NSB 8 on Saturday 1 PM local time I’m experiencing strange behavior. Incoming+outgoing message counts, normally below 1k-2k, are now growing to 44k within 6h, and then it crashes with the error below and restarts.

Failed to receive a message on pump 'Processor-Main-AV.Spotler-5f94b9c4-64d8-4854-8ba7-5448d81a5210' listening on 'AV.Spotler' connected to 'av-spotlerautomate.servicebus.windows.net' due to 'Receive'. Exception: Azure.Messaging.ServiceBus.ServiceBusException: Cannot allocate more handles. The maximum number of handles is 255. (QuotaExceeded). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
   at Azure.Messaging.ServiceBus.Amqp.AmqpConnectionScope.CreateReceivingLinkAsync(String entityPath, String identifier, AmqpConnection connection, Uri endpoint, TimeSpan timeout, UInt32 prefetchCount, ServiceBusReceiveMode receiveMode, String sessionId, Boolean isSessionReceiver, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpConnectionScope.OpenReceiverLinkAsync(String identifier, String entityPath, TimeSpan timeout, UInt32 prefetchCount, ServiceBusReceiveMode receiveMode, String sessionId, Boolean isSessionReceiver, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.OpenReceiverLinkAsync(TimeSpan timeout, UInt32 prefetchCount, ServiceBusReceiveMode receiveMode, String identifier, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1.OnCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsyncInternal(Int32 maxMessages, Nullable`1 maxWaitTime, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsyncInternal(Int32 maxMessages, Nullable`1 maxWaitTime, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.<>c.<<ReceiveMessagesAsync>b__41_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsync(Int32 maxMessages, Nullable`1 maxWaitTime, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver.ReceiveMessagesAsync(Int32 maxMessages, Nullable`1 maxWaitTime, Boolean isProcessor, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ReceiverManager.ReceiveAndProcessMessagesAsync(CancellationToken cancellationToken)


AzureFunctions_FunctionName	CheckForUpdatedContactsFunction

Azure.Messaging.ServiceBus.ServiceBusException:
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<RenewMessageLockInternalAsync>d__56.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<>c+<<RenewMessageLockAsync>b__55_0>d.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy+<RunOperation>d__21`2.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy+<RunOperation>d__21`2.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<RenewMessageLockAsync>d__55.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver+<RenewMessageLockAsync>d__62.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver+<RenewMessageLockAsync>d__61.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ReceiverManager+<RenewMessageLockAsync>d__18.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)

This is 7 days of requests and messages, where you can see it going up at the end after I upgraded to NSB 8.

Here is the detail from the last 12 hours, where it’s building up and then fails with that exception.

The number of dependencies seems to be building up. I am also using the new OpenTelemetry in NSB 8.

I’m still looking into the scheduled functions to see what could cause this, but they haven’t been changed for months.

Any idea or direction what could be the cause of this?

@remyvd I’ll try to reproduce. Can you share:

  • Which package version of Azure.Messaging.ServiceBus you are using
  • The transaction mode, if not using the default
  • Something about the workload

Azure.Messaging.ServiceBus 7.11.11 (implicitly installed)

Transaction mode is default.

I have an NServiceBus Azure Function Host and added a couple of timer functions that run every 15min and send a CheckUpdate message (to the same endpoint, but not using SendLocal).

CheckUpdate is handled to check for updated contacts. This is usually a very low number and for each Contact a ContactUpdated event is published.

It can happen that there are more than 100 contact updates (I got an error for this), so I have changed the code to use options.RequireImmediateDispatch(). I don’t know if this is even related to the issue, but I will monitor this to see if there is any improvement.

Discussing this today with @SzymonPobiega, we had a hunch that somehow the endpoint is started multiple times, which could also explain the duplicate diagnostics entries. We are still puzzled, though, how that could happen.

Is there any way, @remyvd, you could try to verify whether endpoint creation happens multiple times instead of just once, as the code assumes and guards against with a semaphore?

I am going to see if I can find out whether endpoint creation is happening multiple times.

Yes, all the code paths that could generate a lot of messages are guarded with a semaphore.
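The once-only endpoint creation being discussed typically looks something like the sketch below. This is a generic illustration of a semaphore-plus-double-check guard using the public `Endpoint.Start` API; the surrounding method and fields are illustrative, not the actual NServiceBus functions host source.

```csharp
// Generic sketch of once-only async initialization guarded by a semaphore.
// The GetEndpoint method and static fields are hypothetical names.
static readonly SemaphoreSlim semaphore = new SemaphoreSlim(1, 1);
static IEndpointInstance endpoint;

static async Task<IEndpointInstance> GetEndpoint(EndpointConfiguration configuration)
{
    if (endpoint != null)
    {
        return endpoint;
    }

    await semaphore.WaitAsync();
    try
    {
        // Double-check after acquiring the semaphore so only the first
        // caller actually starts the endpoint.
        endpoint ??= await Endpoint.Start(configuration);
        return endpoint;
    }
    finally
    {
        semaphore.Release();
    }
}
```

If a guard like this were bypassed or raced, each extra endpoint would open its own AMQP receivers and links, which would match the symptoms of connection and handle build-up described in this thread.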

The Azure Function (not on a Consumption plan) has been showing high memory spikes since the upgrade to 8.0. The memory builds up every time until it crashes and restarts.

See also the requests and messages on Service Bus over the last 12 hours. Something is generating a lot of incoming messages, which didn’t happen before the upgrade to 8.0.

The big change I made when upgrading to 8.0 was moving to your built-in OpenTelemetry support. Before, I used my own version, AutomateValue/NServiceBus.Extensions.Diagnostics.ApplicationInsights, which extends Jimmy Bogard’s NServiceBus.Extensions.Diagnostics.

I’m thinking about disabling OpenTelemetry to see if this is causing the issue.

Btw, the code is in a private GitHub repository. I’m OK with giving you guys read access, if I know how to do that (I probably need to add a user licence).

Thanks for the detailed information. It allowed us to progress with the investigation. So far, however, we have failed to reproduce it. Could you grant me read access to the code for the scheduled trigger? My GitHub handle is szymonpobiega.

Szymon

I have given you read access to the GitHub repo within AutomateValue (github.com). The /src/AV.Spotler.FunctionHost is the Azure function, but the same code can also be run as AV.Spotler.WorkerHost.

Most of the configuration of the endpoint is done by ConfigureEndpointDefaults(), which is located in AV.Endpoint.

Thanks! I’ve looked at the code and tried to create a reproduction. So far I have not been able to reproduce the problem on my side, but we are working on it continuously.

I investigated more, and it seems to be related to when there is a ‘high’ number of updated contacts. Normally this is below 100, but I see strange things happening with one customer that has 670 contacts.

The logic is:

  1. Endpoint1: A scheduled Azure Function sends a CheckForUpdatedContacts msg every 15 min.
  2. Endpoint1: FindOrCheckForUpdatedContactsHandler processes the msg:
    a. calls a webservice to find updated contacts
    b. publishes ContactUpdated for each contact
  3. Endpoint1: Saga SyncContactWorkflow is started by ContactUpdated:
    a. Sends ‘FindCustomerOrLeadRequest’ to Endpoint2
    b. Receives ‘FindCustomerOrLeadResponse’ from Endpoint2
    c. Sends ‘UpdateCustomerOrLeadRequest’ to Endpoint2
    d. Receives ‘UpdateCustomerOrLeadResponse’ from Endpoint2
    e. Completes and publishes SyncContactCompleted

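For reference, step 2 of the flow above can be sketched roughly like this. This is a minimal, illustrative handler; the real code contains more logic, and `IContactService` with its `FindUpdatedContactsAsync` method is a hypothetical placeholder:

```csharp
// Illustrative sketch of step 2: the handler fetches updated contacts from a
// webservice and publishes a ContactUpdated event per contact. Each published
// event starts an instance of the SyncContactWorkflow saga (step 3).
public class FindOrCheckForUpdatedContactsHandler : IHandleMessages<CheckForUpdatedContacts>
{
    readonly IContactService contactService; // hypothetical service abstraction

    public FindOrCheckForUpdatedContactsHandler(IContactService contactService) =>
        this.contactService = contactService;

    public async Task Handle(CheckForUpdatedContacts message, IMessageHandlerContext context)
    {
        // 2a. Call the webservice to find updated contacts.
        var updatedContacts = await contactService.FindUpdatedContactsAsync();

        // 2b. Publish a ContactUpdated event for each contact.
        foreach (var contact in updatedContacts)
        {
            await context.Publish(new ContactUpdated { ContactId = contact.Id });
        }
    }
}
```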
Big dependency tree
I see a big dependency tree being built up in Application Insights because of the publish in step 2, which generates a lot of traces from step 3 for each event:

Each publish event in step 2 should start a new conversation, but I now have this:

        var options = new PublishOptions();
        // options.StartNewConversation(); // doesn't work with publish
        options.SetHeader(Headers.CorrelationId, Guid.NewGuid().ToString());
        options.RequireImmediateDispatch();

I should have added options.SetHeader(Headers.ConversationId, Guid.NewGuid().ToString()); instead, but looking at the tracing in Application Insights I don’t see the CorrelationId changing for each publish event; they are all the same. How do I start a new conversation for published events?

ServiceBusException
Then I see the duration to publish each event of step 2b building up from 300-400ms:
image

To 4-7sec:
image

Then after x publish events, I see the following error from ServiceBus being generated:
image

Azure.Messaging.ServiceBus.ServiceBusException: The operation was canceled. (ServiceCommunicationProblem). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
 ---> System.OperationCanceledException: The operation was canceled.
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.SendingAmqpLink.<>c.<SendMessageAsync>b__12_1(IAsyncResult r)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(AmqpMessage batchMessage, TimeSpan timeout, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(AmqpMessage batchMessage, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(IReadOnlyCollection`1 messages, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<>c.<<SendBatchAsync>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusSender.SendMessagesAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at NServiceBus.Transport.AzureServiceBus.MessageDispatcher.DispatchBatchForDestination(String destination, ServiceBusClient client, Transaction transaction, Queue`1 messagesToSend, CancellationToken cancellationToken) in /_/src/Transport/Sending/MessageDispatcher.cs:line 189
   at NServiceBus.Transport.AzureServiceBus.MessageDispatcher.Dispatch(TransportOperations outgoingMessages, TransportTransaction transaction, CancellationToken cancellationToken) in /_/src/Transport/Sending/MessageDispatcher.cs:line 92
   at NServiceBus.TransportReceiveToPhysicalMessageConnector.Invoke(ITransportReceiveContext context, Func`2 next) in /_/src/NServiceBus.Core/Pipeline/Incoming/TransportReceiveToPhysicalMessageConnector.cs:line 68
   at NServiceBus.RetryAcknowledgementBehavior.Invoke(ITransportReceiveContext context, Func`2 next) in /_/src/NServiceBus.Core/ServicePlatform/Retries/RetryAcknowledgementBehavior.cs:line 46
   at NServiceBus.TracingExtensions.<>c__DisplayClass0_0`1.<<Invoke>g__TracePipelineStatus|0>d.MoveNext() in /_/src/NServiceBus.Core/OpenTelemetry/Tracing/TracingExtensions.cs:line 28

Then at the end of the telemetry I see the exception:

Azure.Messaging.ServiceBus.ServiceBusException:
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<RenewMessageLockInternalAsync>d__56.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<>c+<<RenewMessageLockAsync>b__55_0>d.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy+<RunOperation>d__21`2.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy+<RunOperation>d__21`2.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<RenewMessageLockAsync>d__55.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver+<RenewMessageLockAsync>d__62.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver+<RenewMessageLockAsync>d__61.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.ReceiverManager+<RenewMessageLockAsync>d__18.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)

This customer with 670 updated contacts every x hours is a new customer onboarded around the same time we deployed NSB 8, so I can’t exclude 100% that this wouldn’t happen with NSB 7. But numbers like these also occurred before NSB 8, and that wasn’t an issue (at least not noticeably).

I have a feeling it is related to the telemetry, but that is just a guess. I haven’t disabled it yet.

Can we improve this process or are there things we do wrong, that we can quickly fix?

Ok, setting the ConversationId directly using the headers is not allowed :slight_smile: . See the error below.

But there isn’t a StartNewConversation() on PublishOptions. Is there a reason for this?

System.Exception: Cannot set the NServiceBus.ConversationId header to '76ac9948-b68a-426e-ba6e-469a1bb5f80d' as it cannot override the incoming header value ('17ddb68f-d00d-4cfa-97df-af6a000844da'). To start a new conversation use sendOptions.StartNewConversation().
   at NServiceBus.AttachCausationHeadersBehavior.SetConversationIdHeader(IOutgoingLogicalMessageContext context, IncomingMessage incomingMessage) in /_/src/NServiceBus.Core/Causation/AttachCausationHeadersBehavior.cs:line 60
   at NServiceBus.AttachCausationHeadersBehavior.Invoke(IOutgoingLogicalMessageContext context, Func`2 next) in /_/src/NServiceBus.Core/Causation/AttachCausationHeadersBehavior.cs:line 23
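For sends, the error message above points at the supported API. A minimal sketch of what that looks like (the message type is taken from this thread; the surrounding handler context is assumed):

```csharp
// Starting a new conversation is supported on SendOptions, per the error
// message above; an equivalent is not available on PublishOptions.
var sendOptions = new SendOptions();
sendOptions.StartNewConversation();

await context.Send(new CheckForUpdatedContacts(), sendOptions);
```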

@SzymonPobiega @danielmarbach I changed a lot in our code to see if I could fix it. We disabled OpenTelemetry. We made things simpler (got rid of a saga) and solved some exceptions on our end. I also increased our hardware for the moment, so we have higher costs. But the issue still remains.

We noticed that Azure Service Bus connections are building up and that after an hour Azure Service Bus starts to throttle requests, which is very strange because our number of requests is normally very low.

I think I have pinpointed the issue to the scheduled Azure functions that we have. When we disable these functions (but the main generated NServiceBus function is still enabled), we see our environment return to normal behavior.

Our Azure Function memory has not been spiking since we disabled them on Dec 20 around 10 am. See image:

Our Service Bus requests/messages are normal and no longer show the build-up.

I am still investigating our scheduled Azure functions, mostly CheckForUpdatedContactsFunction & CheckForBouncedContactsFunction, but the code itself looks fine to me.

Hi

Thanks for the updates. So far we have not been able to reproduce it, but we have some follow-up questions that can direct our efforts better:

  • When you disable the timer trigger, does it mean that the messages stop flowing? We are especially interested in whether the CheckForUpdatedContacts handler gets executed when this trigger is disabled
  • Do you have the timer trigger logs stored? We are interested in how the NullReferenceException you mentioned in the other thread looks in the log files. We are also interested in whether there are exceptions that would indicate failed endpoint startup attempts
  • Can you also share (over a support case) the NServiceBus-specific logs?
  • What is the scalability policy for the function app?

Thanks,
Szymon

Also, would it be possible to also give read access rights to the code to my colleagues Mike and Andreas?

When you disable the timer trigger, does it mean that the messages stop flowing? We are especially interested in whether the CheckForUpdatedContacts handler gets executed when this trigger is disabled

Yes, the number of requests/connections goes down and there is no more throttling. The CheckForUpdatedContacts handler is not being executed anymore.

If we send a CheckForUpdatedContacts message another way, it seems to work OK.

Keep in mind that both the scheduled Azure Function and the handler itself are running on the same endpoint. I don’t know if the issue could be that they run in the same Scope/Transaction? My assumption is no.

p.s. We saw some issues with customers doing an initial load of their customers, where the number of updated contacts would rise above 50k. The CheckForUpdatedContacts handler was sending >50k events and would exceed the 10 min duration limit of the function and the 5 min message lock of Azure Service Bus. We now log warnings when above 10k updated contacts and stop processing when 25k contacts are reached so that the handler can finish OK (instead of retrying the message).
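The guard described in the p.s. can be sketched as follows. The thresholds come from the post; `contactService`, `logger`, `context`, and the surrounding handler are illustrative placeholders:

```csharp
// Thresholds described in the post above.
const int WarnThreshold = 10_000;
const int StopThreshold = 25_000;

var updatedContacts = await contactService.FindUpdatedContactsAsync();

if (updatedContacts.Count > WarnThreshold)
{
    logger.LogWarning(
        "Found {Count} updated contacts, above the warning threshold of {Threshold}.",
        updatedContacts.Count, WarnThreshold);
}

// Publish at most 25k events so the handler completes within the 10 min
// function timeout and the 5 min Service Bus message lock, instead of
// timing out and causing the incoming message to be retried.
foreach (var contact in updatedContacts.Take(StopThreshold))
{
    await context.Publish(new ContactUpdated { ContactId = contact.Id });
}
```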

===
Do you have the timer trigger logs stored? We are interested in how the NullReferenceException you mentioned in the other thread looks in the log files. We are also interested in whether there are exceptions that would indicate failed endpoint startup attempts

I have added them to this post.
function_logs.zip (12.3 KB)

What I don’t understand is that this timer log also shows logging from the CheckForUpdatedContacts handler. It looks like it is running inside the scheduled function, in the same process? Should I run the scheduled Azure functions in a separate endpoint?

What I see after the NullReferenceException and the RpcException is also a ServiceBusException. But keep in mind, I also see this when I have disabled the scheduled functions that run daily or more frequently.
image

This ServiceBusException is:

The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. For more information please see https://aka.ms/ServiceBusExceptions . Reference:9a874b69-5f77-446f-beeb-ade92fee53d3, TrackingId:f87b59da-90b3-45d5-bdc3-9f976aa72011_B8, SystemTracker:av-spotlerautomate:Queue:av.spotler, Timestamp:2022-12-23T06:07:11 (MessageLockLost)

===
Can you also share (over a support case) the NServiceBus-specific logs
Yes, I will try to do that.

===
What is the scalability policy for the function app?
I have 2 endpoints, 1 Azure Function app and 1 web app, running on the same App Service Plan. The default is to run 1 instance, but it will scale up to 3 instances when there are a lot of messages. But I already see this behavior with one instance.

p.s. We have a staging slot that is stopped. It is only started on deployment with all functions disabled. We then deploy to this and then execute a slot swap. After this we stop the staging slot again.

==
Also, would it be possible to also give read access rights to the code to my colleagues Mike and Andreas?

No sorry, that would cost me 2 additional licences.

Hi

I think you lost me here. You said this

and then

My understanding was that there is a single function app that has the timer triggers and also processes the messages that are generated by these triggers and two non-function endpoints that are processing other messages. These workers are doing fine, right?

Sorry, I think I now understand what you meant. There are two separate log files because each trigger has its own log file and you were wondering why the entries related to the handler are in the log files that should only log the executions of the timer triggers.

I am pretty sure this is the reason:

FunctionsLoggerFactory.Instance.SetCurrentLogger(functionContext.GetLogger("NServiceBus"));

This code is executed both in the context of handling a message and in the context of sending a message so what is happening is the triggers “fight” for NSB logging, resulting in the log statements being scattered between the two files.

We are now looking at the logs side by side. It looks like the NREs are coming from the handler, and we suspect that they might be related to the parallel sends in the handler.

Hi!

We have another question. We were looking at this fragment of the logs:

2022-12-23T06:10:01.428 [Warning] A duplicate diagnostic entry was renamed from Azure Service Bus transport to Azure Service Bus transport-20.
2022-12-23T06:10:01.551 [Information] Executed 'Functions.CheckForUpdatedContactsFunction' (Succeeded, Id=e37a2cb2-b839-49e8-acf0-a8fd11774e12, Duration=1560ms)
2022-12-23T06:40:00.027 [Error] Timeout value of 00:10:00 exceeded by function 'Functions.CheckForUpdatedContactsFunction' (Id: '599ee642-96ff-4b58-b2e0-a6e8bedfe8d2'). Initiating cancellation.

The last line hints at a timer trigger execution that started around 6:30 but there is no indication of it in either of the log files. Could there be another log file missing?

@SzymonPobiega Sorry for the late response. Holiday time :slight_smile:

Yes, that’s what I meant about the logging. I saw messages in the function log from other functions (but the same function app). I was confused by that.

I am not sure this is connected. The strange thing is that the handlers don’t do a lot most of the time. They are checking for updates, but most of the time there are no updates. When I turn on the timer functions in the middle of the night (= zero updates from customers), the build-up of open connections still happens. Once I disable the timer functions, all the connections drop and everything returns to normal.

BTW, what I sometimes see happening in the Azure Functions is this error (after a couple of RpcExceptions), and I don’t understand why. It also happens when the timer functions are off.

Azure.Messaging.ServiceBus.ServiceBusException:
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<RenewMessageLockInternalAsync>d__56.MoveNext (Azure.Messaging.ServiceBus, Version=7.8.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver+<>.......

"The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue. For more information please see Azure Service Bus - messaging exceptions - Azure Service Bus | Microsoft Learn . Reference:5500395d-88b7-4409-b3f1-9b9db1baa4d0, TrackingId:bf88ad02-dabe-457b-ae15-3db1f133ab54_B8, SystemTracker:av-spotlerautomate:Queue:av.spotler, Timestamp:2023-01-05T03:49:51 (MessageLockLost) "

I created a support ticket just now, with all the function logs of today added.

Hi

We’ve been able to identify a likely cause of the issue. On line 57, in the ApplyDefaultConfiguration method, there is a call to UseTransport. While it looks innocent, it seems to cause the problem, as I was able to reproduce it using our minimal functions sample.

The root cause is that, in order for the functions to work, the functions setup replaces the transport you configure with a shim serverless transport that does not pump messages (like regular transports do) but instead offers an API through which we pass in the messages we get from the functions host via the trigger.

At the point in time when line 57 is executed, the transport of NServiceBus is already set to that shim, and line 57 replaces it with a regular transport. This causes the shim never to be started, so the messages passed from the host are not being processed (resulting in lock renewal problems), while the process pumps messages like a regular (non-serverless) endpoint.

We are fairly sure that removing the code on line 57 should fix the issue. We are also actively working on reviewing the APIs of our serverless packages (both for Functions and for AWS) to remove glitches such as this one.
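A rough before/after sketch of the configuration issue described above, assuming the typical ServiceBusTriggeredEndpointConfiguration setup from the NServiceBus functions package (the connection string variable and surrounding context are illustrative):

```csharp
// In an Azure Functions host, the NServiceBus functions integration already
// installs a serverless shim transport that receives messages from the
// functions trigger instead of running its own message pump.
var endpointConfiguration = new ServiceBusTriggeredEndpointConfiguration("AV.Spotler");

// PROBLEM (the "line 57" call): explicitly configuring the transport again
// replaces the serverless shim with a regular, message-pumping transport:
//
//   endpointConfiguration.AdvancedConfiguration
//       .UseTransport(new AzureServiceBusTransport(connectionString));
//
// The shim is then never started, so messages handed over by the functions
// host are not processed (causing lock renewal failures), while the regular
// transport opens its own receivers and pumps messages like a non-serverless
// endpoint.

// FIX: omit the UseTransport call entirely and let the functions integration
// own the transport configuration.
```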

Sorry for the problems, and thank you for your patience and all the information you shared with us. Without it we would not have been able to identify the issue.

Sz


That is great news to hear! Thanks for finding the possible cause.

I have implemented the workaround (not calling UseTransport in an Azure Function) and deployed this. I enabled the timer functions again, and I will monitor this. I will let you know if it was effective.

============
Update 2h later => So far, so good. No connections building up, and very stable!
