Endpoint message processing slows down over time

Greetings,

We have an endpoint (a console application running as a Windows service) that uses MSMQ. The endpoint processes messages more slowly over time, both in my local (dev) environment and on a UA server when deployed. When the endpoint is first started, there are roughly 9,000+ messages in the queue and it begins processing them quickly. At around the 700-1,000 messages processed mark it really begins to slow down, which causes transaction timeouts and other exceptions. If the service is restarted, it speeds back up again.

CPU usage sits around 50% while the endpoint is running, and memory consumption keeps rising: it starts at about 150 MB and reaches roughly 500 MB by the time everything has slowed down.

As a workaround, raising the transaction scope timeout allows the messages to be processed without timing out (the transport change is sketched at the end of this post), but after about 8 hours the endpoint had only processed roughly 2,500 messages. Is there anything specific that would cause an endpoint to process messages more slowly over time?

We are using:
NServiceBus version 7.4.7.
.NET Framework 4.7.2 console application, running as a Windows service.
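
For reference, the timeout workaround is roughly the sketch below, assuming the MSMQ transport's TransactionScopeOptions setting; the endpoint name, error queue, timeout value, and isolation level are placeholders rather than our actual configuration:

```csharp
using System;
using System.Threading.Tasks;
using System.Transactions;
using NServiceBus;

class Program
{
    static async Task Main()
    {
        // Placeholder endpoint and error queue names.
        var endpointConfiguration = new EndpointConfiguration("Sample.Endpoint");
        endpointConfiguration.SendFailedMessagesTo("error");

        // Raise the transaction scope timeout used by the MSMQ transport so a
        // slow handler doesn't abort the receive transaction. The five-minute
        // value is illustrative only.
        var transport = endpointConfiguration.UseTransport<MsmqTransport>();
        transport.TransactionScopeOptions(
            timeout: TimeSpan.FromMinutes(5),
            isolationLevel: IsolationLevel.ReadCommitted);

        var endpointInstance = await Endpoint.Start(endpointConfiguration);
        // ... keep running as a Windows service until stopped ...
        await endpointInstance.Stop();
    }
}
```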

Message processing shouldn’t slow down over time. This often happens due to a resource not being freed.

2,500 messages in 8 hours is on average roughly one message every 11 seconds, which shouldn’t be causing any throughput issues for NServiceBus.

  1. What persistence are you using?
  2. Are you using sagas?
  3. What kind of workload is this primarily doing?
  4. If it is doing any storage I/O, does monitoring indicate that perhaps not all connections are being released over time?
  5. Do you have any custom storage connection handling? If so, validate that it releases connections correctly in both the happy path and error flows.
  6. Maybe there is a resource/memory leak. You could use a tool like dotMemory to validate whether any resources aren’t being released correctly.
  7. Do any handlers invoke code that (in)directly holds on to in-memory data, like a cache? Maybe that resource is growing significantly enough to cause a slowdown.

What you could try is limiting the maximum concurrency and validating whether that makes processing more stable. Maybe the default is flooding your storage:
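
A minimal sketch of that change (the endpoint configuration variable is whatever you already build during startup, and the limit of 1 is just an example value):

```csharp
using NServiceBus;

// During endpoint setup, before calling Endpoint.Start:
var endpointConfiguration = new EndpointConfiguration("Sample.Endpoint");

// Cap how many messages are handled in parallel. The default scales with the
// number of logical processors, which can overwhelm storage that every handler hits.
endpointConfiguration.LimitMessageProcessingConcurrencyTo(1);
```

If a limit of 1 stabilizes processing, raise it gradually until you find the point where your storage starts to struggle.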

– Ramon


Ramon,

Thanks for the response!

I’ll first say that we recently moved from running NServiceBus.Host.exe as a service to running a console application as a service, and we upgraded NServiceBus from 2.x to 7.x, so we are seeing very different results. When running the old version through Host.exe, throughput was much faster and resource usage didn’t seem to grow over time, which leads me to believe it could be how we have set up our IoC container (Autofac) to work with the new version of NServiceBus.

  1. I am using MSMQ for persistence.
  2. No.
  3. It is performing quite a bit of work. It’s only 1 handler, but it performs many database calls + calls domain event logic throughout.
  4. It appears the transaction scopes set up by NServiceBus are being closed properly, although I am not well versed in debugging these operations. We are using SQL Server 2016.
  5. We use a custom UnitOfWork whose overridden Dispose() method just calls SaveChanges() on the dependency-injected database context (roughly the shape sketched after this list). I’m not sure whether it is getting disposed properly, since the contexts could be different inside the domain events. I’m sorry for my lack of knowledge surrounding Autofac and lifetime scoping; I was handed this legacy project to upgrade, so I’m trying to learn as I go.
  6. A memory leak is possible; I’ll try to monitor it through a tool as you’ve mentioned. Through the Diagnostic Tools window in VS I can see the memory usage growing over time. Over the 8+ hours it ran last night, it grew from 80 MB of memory to around 470 MB.
  7. Not to my knowledge.
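
To make point 5 concrete, the unit of work is roughly shaped like the sketch below (simplified, with placeholder type names, and assuming EF6 on .NET Framework):

```csharp
using System;
using System.Data.Entity; // assuming EF6; the real context has many DbSets

// Placeholder standing in for our actual DbContext.
public class AppDbContext : DbContext
{
}

// The custom unit of work: Dispose() just flushes the injected context.
// Note that the context itself is never disposed here, so its lifetime
// (and its SQL connection) is entirely up to the Autofac scope that owns it.
public class UnitOfWork : IDisposable
{
    private readonly AppDbContext context;

    public UnitOfWork(AppDbContext context)
    {
        this.context = context;
    }

    public void Dispose()
    {
        context.SaveChanges();
    }
}
```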

We do set the max concurrency manually to 1 to ensure all DB connections are freed for the next handler invocation to use.

Ramon,

We discovered the issue. There was a problem with how we were resolving some of the DbContexts from our Autofac container: they, along with some other objects, weren’t being disposed properly. After a little refactoring and some changes to the lifetime scoping of those objects, the messages are being processed much more quickly and the memory leak seems to have disappeared!
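
For anyone who hits the same thing, the fix boiled down to something like the sketch below (same placeholder type names as in my earlier sketch, via the NServiceBus.Autofac adapter; the real registrations are more involved):

```csharp
using Autofac;
using NServiceBus;

// Inside our endpoint bootstrap code.
var endpointConfiguration = new EndpointConfiguration("Sample.Endpoint");

// Register the DbContext (and anything else that owns a SQL connection) per
// lifetime scope instead of effectively resolving it from the root container.
// NServiceBus resolves handlers from a child scope per message, so these
// instances are now disposed when message handling completes instead of
// living for the lifetime of the endpoint.
var builder = new ContainerBuilder();
builder.RegisterType<AppDbContext>().AsSelf().InstancePerLifetimeScope();
builder.RegisterType<UnitOfWork>().AsSelf().InstancePerLifetimeScope();
var container = builder.Build();

// Hand the existing container to NServiceBus.
endpointConfiguration.UseContainer<AutofacBuilder>(
    customizations => customizations.ExistingLifetimeScope(container));
```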

Sorry this turned out to be more of an Autofac/memory leak issue! Thanks for the advice, though; dotMemory did help us identify the leak!