Missing Queue Retry Behavior?

Phil_Sandler · March 31, 2021, 10:21pm

Hello All,

We are using:
• NServiceBus 7.1.6
• NServiceBus.SqlServer 4.1

In an effort to clean up data for an old, non-existent endpoint, I foolishly deleted some queues (tables) manually without removing their subscriptions from the publisher. We’ve now gotten everything cleaned up, so no real harm except to my ego.

That said, I want to better understand the behavior that occurred when we saw these in the logs:

NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message 'xxxxxxxxxxxxxxxxxxxxxxxxxx' because of an exception:
NServiceBus.Unicast.Queuing.QueueNotFoundException: Failed to send message to [QUEUENAME] ---> System.Data.SqlClient.SqlException: Invalid object name '[QUEUENAME]'.

This was a published event with multiple subscribers, and only one of the queues was missing. Would the “valid” subscribers have received the message multiple times? In other words, would the retry attempt to republish the message to all subscribers, or just to the subscriber whose queue was invalid?

Thanks,

Phil

andreasohlund · April 1, 2021, 6:01am

All transports make sure that each subscriber gets a copy of the published event in its input queue, so in this case only the subscriber whose queue you deleted was affected.

This important detail is buried in Publish-Subscribe • NServiceBus • Particular Docs and Publish-Subscribe • NServiceBus • Particular Docs, I’ll try to make it more obvious.

Does this help?

andreasohlund · April 1, 2021, 6:04am

I’ve raised Clarify that events are copied into each subscribers input queue by andreasohlund · Pull Request #5290 · Particular/docs.particular.net · GitHub to make this clearer.

andreasohlund · April 1, 2021, 8:04am

Just to clarify: messages already published would be lost from the queue that you deleted. However, since SQLT (the SQL Server transport) does the publishing in a transaction, the publish operation would fail until you remove the subscription for the removed endpoint from the subscription table; see SQL Server Native Publish Subscribe • SQL Server Transport • Particular Docs.
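
For anyone cleaning up after a similar mistake, here is a minimal sketch of how leftover subscriptions could be found and removed. It is only an illustration: it uses an in-memory SQLite database in place of SQL Server, and the `subscriptions` table layout, the column names, and both helper functions are hypothetical, not the SQL Server transport's real schema or API.

```python
import sqlite3


def find_stale_subscriptions(conn):
    """Return (topic, endpoint, queue_table) rows whose queue table is gone."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    rows = conn.execute(
        "SELECT topic, endpoint, queue_table FROM subscriptions"
    ).fetchall()
    return [row for row in rows if row[2] not in existing]


def remove_stale_subscriptions(conn):
    """Delete subscriptions that point at a missing queue; return what was removed."""
    stale = find_stale_subscriptions(conn)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "DELETE FROM subscriptions WHERE topic = ? AND endpoint = ?",
            [(topic, endpoint) for topic, endpoint, _ in stale],
        )
    return stale


# Hypothetical setup: two subscribers, but only one queue table still exists.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (topic TEXT, endpoint TEXT, queue_table TEXT)"
)
conn.execute("CREATE TABLE billing (body TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [
        ("OrderPlaced", "Billing", "billing"),
        ("OrderPlaced", "OldEndpoint", "old_endpoint"),
    ],
)
conn.commit()

removed = remove_stale_subscriptions(conn)
remaining = conn.execute("SELECT endpoint FROM subscriptions").fetchall()
```

In this sketch only the `OldEndpoint` row is removed, because its queue table was never created; the `Billing` subscription is left alone.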

Phil_Sandler · April 1, 2021, 3:08pm

Hey Andreas,

So to summarize: a publisher identifies its subscribers, and then independently tries to put a copy of the event into each input queue, and if any of those discrete attempts fail, it begins retrying that discrete attempt completely isolated from the logical publish. So . . . none of the subscribers should receive a second copy of the event in their input queue when one input queue is missing or unavailable. Is that correct?

The outcome we saw seemed to indicate that a subscriber received multiple copies of the same event. However, that was not based on any direct observation of the queues or logs; it was based on unexplained (business) data, and may have had nothing to do with NSB.

One additional item I wanted to note, since you touched on it in your latest reply: we are currently using Backward Compatibility subscriptions: SQL Server Transport Upgrade Version 4 to 5 • SQL Server Transport • Particular Docs. So each of our publishers has its own subscription table. I don’t know if that would have any impact one way or another.

Again, the problem has been mitigated, so it’s just a matter of trying to make sense of what happened.

Thanks for your help!

Phil

andreasohlund · April 2, 2021, 5:57am

“So to summarize: a publisher identifies its subscribers, and then independently tries to put a copy of the event into each input queue, and if any of those discrete attempts fail, it begins retrying that discrete attempt completely isolated from the logical publish.”

I would rephrase it slightly to:

A publisher identifies its subscribers and then tries to put a copy of the event into each input queue. This happens in a transaction, so if any of the queues are missing, the entire operation will roll back and the publish operation will throw. Once the operation succeeds as a whole, each subscriber endpoint will independently process its copy of the event, retry in case of failure, etc.

“So . . . none of the subscribers should receive a second copy of the event in their input queue when one input queue is missing or unavailable. Is that correct?”

Based on the above, that is correct: the publish operation is atomic, so either all subscribers get the event or, in the case of a failure, none of them do.
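
To make the all-or-nothing behavior concrete, here is a rough sketch of a transactional publish. It is an analogy only: an in-memory SQLite database stands in for SQL Server, each queue is a plain table, and `publish`, `depth`, and the queue names are made up for illustration; none of this is NServiceBus code.

```python
import sqlite3


def publish(conn, subscriber_queues, body):
    """Insert one copy of the event per subscriber queue, all or nothing."""
    try:
        with conn:  # commits on success, rolls back if any insert raises
            for queue in subscriber_queues:
                # Table names cannot be bound as parameters; the names used
                # here are hard-coded and trusted.
                conn.execute(f'INSERT INTO "{queue}" (body) VALUES (?)', (body,))
        return True
    except sqlite3.OperationalError:
        # Raised by SQLite when a table is missing ("no such table").
        return False


def depth(conn, queue):
    """Number of messages currently sitting in a queue table."""
    return conn.execute(f'SELECT COUNT(*) FROM "{queue}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
for name in ("billing", "shipping"):
    conn.execute(f'CREATE TABLE "{name}" (body TEXT)')
conn.commit()

# Three attempts while one subscriber's queue is missing, mimicking retries.
with_missing = ["billing", "missing_queue", "shipping"]
attempts = [publish(conn, with_missing, "OrderPlaced") for _ in range(3)]
depth_after_failures = depth(conn, "billing")

# Once the stale subscriber is gone, the publish succeeds as a whole.
ok = publish(conn, ["billing", "shipping"], "OrderPlaced")
```

In the sketch, each failed attempt rolls back the insert that had already gone into `billing`, so the retries leave no extra copies behind; the valid queues only receive the event once the whole publish succeeds.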

“One additional item I wanted to note, since you touched on it in your latest reply: we are currently using Backward Compatibility subscriptions: SQL Server Transport Upgrade Version 4 to 5 • SQL Server Transport • Particular Docs. So each of our publishers has its own subscription table. I don’t know if that would have any impact one way or another.”

That should not affect this, since it doesn’t matter whether the publisher gets the list of subscribers from its own table or a shared one.

Hope this helps!

Phil_Sandler · April 5, 2021, 4:50pm

I think that answers everything.

Thanks for your detailed replies!

Phil