Is there a way to use retry policies, but bypass error queue behavior?

I’m considering a full-duplex scenario where we use async request-response messaging. The client sends a message asking for some work to be done, and we 1) use NServiceBus retry policies to try multiple times if it fails, but 2) if it exhausts the retries, respond with a message indicating failure to the client, instead of dropping the message on the error queue.

There may be some categories of exceptions where we would use the error queue, but is there a way for the handler to know that it’s processing a message that has already been tried x times? This sample seems to do something similar, but relies on an in-memory dictionary to track the retries. I’d prefer something like adding the retry count to a custom header. How might that work?

Or is there a solution I’m not thinking of?

You could also use immediate retries but immediate retries rely on local state. If your endpoint is scaled out the immediate retry count can be unreliable.

I advise checking the header values at the start of the handler, before calling any APIs or data stores, to make sure consistency isn’t affected, and then calling Reply with your response indicating failure. The reply will then result in the original message being sent to the audit queue as "successfully processed". I personally don’t really like that approach.
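To make the "check the header first, then reply" idea concrete, here is a minimal, transport-agnostic sketch in Python rather than NServiceBus code. Everything in it is an assumption for illustration: the `retry-count` header name, the `MAX_RETRIES` budget, and the `do_work` and `reply` callbacks are hypothetical stand-ins, not NServiceBus APIs.

```python
from dataclasses import dataclass, field

RETRY_HEADER = "retry-count"  # hypothetical header name, not an NServiceBus constant
MAX_RETRIES = 3               # assumed retry budget


@dataclass
class Message:
    body: str
    headers: dict = field(default_factory=dict)


def handle(message, do_work, reply):
    """Inspect the retry header before touching any external state."""
    attempts = int(message.headers.get(RETRY_HEADER, 0))
    if attempts >= MAX_RETRIES:
        # Retries exhausted: answer the client instead of failing again.
        # Returning normally means the message counts as processed.
        reply({"status": "failed", "attempts": attempts})
        return
    # May raise; the retry infrastructure would then reschedule the message.
    do_work(message.body)
    reply({"status": "ok", "attempts": attempts})
```

Because the check runs before `do_work`, the failure reply is sent without any partial side effects from the final attempt.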

An alternative would be to have a compensating action at the caller like “if we don’t get a response back within 1 minute, we assume failure and do X”. It would be very advisable to use such compensating actions anyway, because what if the other system is down or simply ceases to exist?
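A caller-side timeout like that could be tracked roughly as follows. This is a plain Python sketch under stated assumptions, not NServiceBus code: the `PendingRequests` class and its method names are hypothetical, and the clock is injectable only so the behavior can be checked without waiting.

```python
import time

TIMEOUT_SECONDS = 60.0  # the "1 minute" mentioned above


class PendingRequests:
    """Tracks outstanding requests and reports the ones that timed out."""

    def __init__(self, timeout=TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._deadlines = {}  # request id -> deadline

    def sent(self, request_id):
        self._deadlines[request_id] = self._clock() + self._timeout

    def response_received(self, request_id):
        # Returns False when the request is unknown or already expired,
        # so a late reply can be ignored or logged.
        return self._deadlines.pop(request_id, None) is not None

    def expire(self):
        # Remove and return every request whose deadline has passed;
        # the caller runs its compensating action ("do X") for each.
        now = self._clock()
        expired = [rid for rid, d in self._deadlines.items() if d <= now]
        for rid in expired:
            del self._deadlines[rid]
        return expired
```

The caller would invoke `expire()` periodically and treat each returned id as a failure, which also covers the case where the processing endpoint never answers at all.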

Lastly, why do you do request/response over messaging in the first place? If this is query-like behavior, I would consider having the client query the storage directly.

Can you elaborate more on the type of request/response and the coupling between the request sender and the processor?

– Ramon

You could also use immediate retries but immediate retries rely on local state. If your endpoint is scaled out the immediate retry count can be unreliable.

Yes, that’s the problem I’m trying to address. One solution would be to keep that state in the message itself…trying to figure out how that might work.

Request/response is messaging, but no, it doesn’t fall into the commands vs. events hierarchy that NSB is best for. This is a bit of a niche solution in our code as well, but it isn’t a query.

If you’re curious, it’s intended as something like a splitter pattern, where we have a large message that our endpoint decomposes, “orchestrates” the processing of each piece, and then returns a response indicating success or failure (perhaps partial failure) to the client. Long-term a saga might work best. We are still early in our journey with NSB, so that is part of this as well.
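As a rough illustration of the bookkeeping such an orchestrator needs, the Python sketch below records the outcome of each piece and derives the overall response. It is not saga code; `SplitJob` and the status strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SplitJob:
    expected: int                                # number of pieces in the large message
    results: dict = field(default_factory=dict)  # piece id -> True/False

    def record(self, piece_id, succeeded):
        self.results[piece_id] = succeeded

    def is_complete(self):
        return len(self.results) == self.expected

    def outcome(self):
        # No final answer until every piece has reported.
        if not self.is_complete():
            return "pending"
        failures = sum(1 for ok in self.results.values() if not ok)
        if failures == 0:
            return "success"
        if failures == self.expected:
            return "failure"
        return "partial_failure"
```

The response to the client would be sent once `outcome()` stops returning `"pending"`; this is the kind of per-request state a saga would persist.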

You can’t. With immediate retries, the message is not removed from or acknowledged to the transport, which means the message cannot be altered. That is why this state is tracked in memory.

If it is not a query, then why do you want to return a response upon failure if there is no user actually awaiting the response?

What kind of response is it? Is it a transient technical issue or a functional error? If it is the latter, then in most scenarios it should not go through the regular recovery model.

– Ramon

This definitely sounds like the sweet spot for sagas.

– Udi