I started working on implementing a mitigation strategy for the ASB outages and I was looking at what you recommended: FailOverNamespacePartitioning. But while doing so, I found the following open issue:
Support FailOverNamespacePartitioning and SendsAtomicWithReceive together
We currently use SendsAtomicWithReceive, which I consider a very valuable feature. I don’t think it’s a good idea to give it up, with the significant impact that would have, just to mitigate an issue that lasts a few minutes every second day.
So, what are my options here?
There are two places where ASB outages affect us:
- Sending messages from a handler context
- Sending messages from web requests
Of these scenarios, the second is far more important, because it means either that the error surfaces to the customer or that we lose an operation. For example, our current Place Order implementation is half synchronous, half asynchronous: the order is created synchronously in the API process and then the OrderPlacedEvent is published. If ASB is unavailable at that moment, the order can be placed without the event ever being published, which has bad consequences. Having 100% (or close to it) ASB availability is therefore very desirable for messages sent from web requests.
Regarding messages sent from a handler context: if ASB goes down between peeking the incoming message and sending the outgoing messages from the handler, the whole incoming message will be retried when ASB comes back online, and it will eventually succeed. Therefore I don’t see this case as too critical.
So, I guess the question is: can I use FailOverNamespacePartitioning on the WebAPI (and other send-only) endpoints, but not in the backend services, so that the backend keeps SendsAtomicWithReceive?
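
To make the question concrete, the split I have in mind would look roughly like the sketch below. It is only an illustration and assumes the NamespacePartitioning API of the NServiceBus Azure Service Bus transport. The endpoint names, namespace aliases and connection strings are placeholders, and topology and other required settings are left out.

```csharp
using NServiceBus;

// Send-only endpoint (e.g. the WebAPI): fail over to a secondary namespace.
// "Sales.WebApi" is a hypothetical endpoint name.
var webApi = new EndpointConfiguration("Sales.WebApi");
webApi.SendOnly();

var webTransport = webApi.UseTransport<AzureServiceBusTransport>();
var partitioning = webTransport.NamespacePartitioning();
partitioning.UseStrategy<FailOverNamespacePartitioning>();
// Assumption: the first namespace registered is treated as the primary.
partitioning.AddNamespace("primary", "<primary namespace connection string>");
partitioning.AddNamespace("secondary", "<secondary namespace connection string>");

// Backend endpoint: single namespace, so SendsAtomicWithReceive stays available.
// "Sales.Backend" is a hypothetical endpoint name.
var backend = new EndpointConfiguration("Sales.Backend");
var backendTransport = backend.UseTransport<AzureServiceBusTransport>();
backendTransport.ConnectionString("<primary namespace connection string>");
backendTransport.Transactions(TransportTransactionMode.SendsAtomicWithReceive);
```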