Endpoint stops processing messages for no obvious reason

Hello,

Now and then, some of our endpoints (not a specific endpoint) stop processing messages for no obvious reason (no errors). Today it happened again, after a few weeks of not occurring. When checking the issue, we saw the following:

  • The endpoint in Kubernetes still has the status “Running”
  • RabbitMQ shows there are NO unacked messages, so all messages are still on the queue.
  • The last log lines (before it stopped processing) state that it picked up messages, so we would expect there to be unacked messages. We log this via a pipeline behavior (LogIncomingMessageBehavior); a simplified sketch of that behavior is shown below.
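
Roughly, the behavior looks like this (a simplified sketch, not our exact code; the actual logging details differ):

    using System;
    using System.Threading.Tasks;
    using NServiceBus.Logging;
    using NServiceBus.Pipeline;

    // Sketch of an incoming-message logging behavior: it logs before and after
    // the rest of the incoming pipeline (including the handler) runs.
    public class LogIncomingMessageBehavior : Behavior<IIncomingLogicalMessageContext>
    {
        static readonly ILog log = LogManager.GetLogger<LogIncomingMessageBehavior>();

        public override async Task Invoke(IIncomingLogicalMessageContext context, Func<Task> next)
        {
            // This is the "picked up message" line we see as the last log entry.
            log.InfoFormat("Picked up message {0} ({1})", context.MessageId, context.Message.MessageType);

            await next();

            log.InfoFormat("Finished processing message {0}", context.MessageId);
        }
    }

It is registered on the pipeline in the endpoint configuration, e.g. endpointConfiguration.Pipeline.Register(new LogIncomingMessageBehavior(), "Logs incoming messages").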

After restarting the pod, it just grabs all the messages from the queue and successfully processes them.

I have checked a couple of topics here that look similar, but they didn’t provide a solution for us.
Our endpoints run in Kubernetes (Linux, .NET Core 3.1).

We are hoping someone recognizes this issue, or can give us more ideas on where to look for the cause.

Hi Michael,

Can you share which versions of NServiceBus and NServiceBus.RabbitMQ you’re using?
If possible, could you verify if there’s anything that stands out in the NServiceBus logs?

You can always choose to open a support case as well.

Kind regards,
Laila

Hello Laila,

The versions we use are:

  • NServiceBus - 7.2.3
  • NServiceBus.RabbitMQ - 6.0.0

In our NServiceBus logs (we use the NLog extension) there are no specific log lines that stand out.

My first thought was to open the issue here, to see if more people are having this issue.
If that isn’t the case, we can move this to a support case, if that is fine?

Regards,

Michael

Hi Michael,

Sure, that’s fine.
Do you have any long-running handlers you’re aware of?

If so, it might be related to some behavior we noticed some time ago, as described here.
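
If you’re not sure, one way to check is a behavior around the handler invocation stage that logs how long each handler takes. A rough sketch (class name and threshold are just examples, adjust to your own logging):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;
    using NServiceBus.Logging;
    using NServiceBus.Pipeline;

    // Rough sketch: warn when a handler takes longer than a given threshold.
    public class HandlerDurationBehavior : Behavior<IInvokeHandlerContext>
    {
        static readonly ILog log = LogManager.GetLogger<HandlerDurationBehavior>();
        static readonly TimeSpan warnAfter = TimeSpan.FromMinutes(5);

        public override async Task Invoke(IInvokeHandlerContext context, Func<Task> next)
        {
            var stopwatch = Stopwatch.StartNew();
            await next();
            stopwatch.Stop();

            if (stopwatch.Elapsed > warnAfter)
            {
                log.WarnFormat("Handler {0} took {1} for message {2}",
                    context.MessageHandler.HandlerType.FullName,
                    stopwatch.Elapsed,
                    context.MessageId);
            }
        }
    }

Register it with endpointConfiguration.Pipeline.Register(new HandlerDurationBehavior(), "Logs slow handlers") and it will at least show which handlers run long.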

Kind regards,
Laila

Hi Laila,

In these cases there are no long-running handlers (more than 30 minutes) that I know of. But I will have a look at the RabbitMQ logs to see if they indicate it ran into a timeout (like described in the link).

I will let you know once I find something there.

Thank you for the hint.

I think I found the cause of the problem:

So it looks like it is the timeout after all, which means the handler takes longer than we expect. But the first line, “exceeded the limit of 50 messages/sec”, is also something we need to check.

Hi Michael,

Thanks for the feedback, that seems to align with the same issue. I will link this discussion to the public issue to connect the dots. Once it’s fixed, our team will let you know. Feel free to subscribe to the GitHub issue to stay in the loop as well.

For the log message “exceeded the limit of 50 messages/sec”: Lager is the logging framework used by RabbitMQ. It looks like the logger is dropping some of its messages, so it might be worth figuring out why that is and what exactly is being logged there.

Laila

Thank you for the help, Laila!
We will dive into the logging message limit to see what is going on there.
