Alternative to FailOverNamespacePartitioning when using SendsAtomicWithReceive

fcastells · May 11, 2018, 11:49am

I started working on implementing a mitigation strategy for the ASB outages and I was looking at what you recommended: FailOverNamespacePartitioning. But while doing so, I found the following open issue:
Support FailOverNamespacePartitioning and SendsAtomicWithReceive together

We are currently using SendsAtomicWithReceive which I believe is a very valuable feature and I don’t think it’s a good idea to lose that one, which has a huge impact, just to mitigate an issue that happens a few minutes every second day.

So, what are my options here?

There are two places where ASB outages affect us:

1. Sending messages from a handler context
2. Sending messages from web requests

Of these scenarios, number 2 is far more important, because it means either that the error ends up in the customer’s face or that we lose an operation. For example, our current Place Order implementation is half synchronous, half asynchronous: the order is created synchronously in the API process and then the OrderPlacedEvent is published. It is currently possible that the order is placed but the event is never published, which has bad consequences. Having 100% (or close to it) ASB availability is very desirable for messages sent from web requests.

Regarding messages sent from a handler context: if ASB goes down between peeking the message and sending the messages from the handler, the whole message will be retried when ASB comes back online and it will eventually succeed. Therefore I don’t see this as too critical.

So, I guess the question is: can I use FailOverNamespacePartitioning on the WebAPI (and other send-only) endpoints, and not in the backend services, so that those keep SendsAtomicWithReceive?

SeanFeldman · May 11, 2018, 3:28pm

Hi Fransesc,

You’re raising a good point about reliable message dispatching outside of a message context. Before going into the discussion, let’s review the differences between the two options you’ve listed.

Sending from within a handler context

This also happens in the context of an incoming message. When combined with ASB’s SendsAtomicWithReceive transport transaction mode, as you’ve correctly pointed out, there’s a transport transaction that spans the incoming and outgoing messages. Therefore the message is reliably going to be retried whenever anything fails. There’s a caveat though. When FailOverNamespacePartitioning is used and the primary (receiving) namespace is failing, there’s no way to guarantee that transaction, as the Azure Service Bus service does not support cross-namespace transactions (not at the time this post is written). In that case, all we can do is rely on the retry logic built into the ASB client. If that fails as well, the incoming message will be reverted, but there will be a chance of ghost messages. That’s why having idempotent handlers is always a good idea.
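
To make the idempotency point concrete, here is a minimal, language-neutral sketch in Python. It is not NServiceBus code: `handle_once`, `processed_ids` and the in-memory set are hypothetical stand-ins (a real handler would need a durable store). It only illustrates that a retried or duplicated message ID causes the side effect once.

```python
# Hypothetical sketch: de-duplicate by message ID so that a retried message
# (or a duplicate left behind by a failed dispatch) takes effect only once.
processed_ids = set()   # assumption: a durable store in a real system
ratings = []            # stands in for the handler's side effect


def handle_once(message_id, payload):
    """Apply the payload unless this message ID was already handled."""
    if message_id in processed_ids:
        return False            # duplicate: skip the side effect
    ratings.append(payload)     # the side effect
    processed_ids.add(message_id)
    return True


handle_once("msg-1", {"rating": 5})
handle_once("msg-1", {"rating": 5})   # redelivery of the same message
print(len(ratings))
```

In a real system the side effect and the record of the handled ID would also have to be stored together, otherwise a crash between the two steps reintroduces the duplicate.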

Sending from outside a handler context

Messages sent out will still go through the same dispatcher as in the first scenario, but outside of a handler context. Therefore, SendsAtomicWithReceive cannot be applied, as there’s no “transfer queue”, as Azure Service Bus calls it (a receiving queue that is also used to dispatch outgoing messages). That leaves you with the retries built into the ASB client and the additional retry logic the ASB transport implements to handle throttling. If a call fails, you’re in charge of deciding what to do.

Yes. It shouldn’t result in a lost operation. Instead, I would try to present an error to the user with an option to retry. In the real world, failures happen. A phone company cannot always connect you when you dial; as a phone user, you face that error and retry. When you’re trying to load a page and there’s a problem, the browser shows you an error and you can retry. What I’m trying to say is that this is not a problem that can be solved in a generic way that applies to all use cases. It really depends on the details of your scenario. We could store the outgoing operations on disk and, in case of failure, read them back and retry. But is the disk going to be there? If you’re on an Azure Cloud Service or Service Fabric, the disk might be gone. You could store them in a database, but databases can become unavailable. You could save them to the browser’s web storage, but that’s more of a UI/client-side concern that each developer chooses how to handle.

Have I answered your question? Wait, what was it?

You can use the strategy to fail over to another namespace, but you’ll have to handle send/publish operation failure gracefully for the reasons described above. You will not be able to use SendsAtomicWithReceive with send-only endpoints as there will be no incoming messages.

fcastells · May 11, 2018, 4:21pm

Hi Sean,
yes, I think the 2 scenarios are clear and we were discussing internally how to make our web api operations idempotent, because now if we place an order and fail sending a message, the order gets “stuck” because side effects don’t happen. Rather, I would:
1.- try a backup namespace
2.- fail the whole request and not place the order.

Regarding your comment:
“It shouldn’t result in a lost operation. Instead, I would try to present an error to user”

Well, that’s a UX decision that depends on the scenario. In our case, when the user submits a rating, we don’t want to bother her if it fails, but also, we want to avoid/reduce failures because we are very interested in the ratings.

So, going to your answer to my specific question:

I don’t understand what I should do. I know SendsAtomicWithReceive doesn’t make sense outside the handler context, but currently the endpoint configuration is the same for both types of endpoints. I can easily change that, and this will allow me to configure the namespace partitioning strategy in the send-only endpoints. But shouldn’t I configure the same strategy in the receiving endpoints so that they can consume the messages from the fallback namespace? Or how can I consume those messages without this strategy?
Also, what does “handle send/publish operation failure gracefully” mean? Isn’t this what FailOverNamespacePartitioning does for me?

SeanFeldman · May 11, 2018, 4:48pm

When the FailOverNamespacePartitioning strategy is used, the primary namespace is used by default to send. Receives happen on all namespaces. Both sending and receiving endpoints should be configured to use this strategy, with the same namespaces for primary and secondary.

Not quite. With a send-only endpoint there is no incoming context, therefore you cannot rely on the transport transaction being rolled back (which is what you get with SendsAtomicWithReceive). For that reason, when you call messageSession.Send(msg) in your web controller, for example, the operation will be attempted with the primary namespace, then with the alternative namespace (assuming you’re using multiple namespaces). Eventually, if the dispatch fails, the exception will propagate into your controller, and it should be handled by your code.
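
The flow described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the transport's actual implementation: `send_with_failover`, `DispatchError`, and the use of `ConnectionError` to represent an outage are all assumptions made for the example.

```python
class DispatchError(Exception):
    """Hypothetical error raised when no namespace accepted the message."""


def send_with_failover(message, senders):
    """Try each (name, send) pair in order; raise if all of them fail."""
    errors = []
    for name, send in senders:
        try:
            send(message)
            return name                   # the namespace that accepted the message
        except ConnectionError as exc:    # assumption: outages surface like this
            errors.append((name, exc))
    raise DispatchError(errors)


def namespace_down(message):
    raise ConnectionError("namespace unavailable")


accepted = []


def namespace_up(message):
    accepted.append(message)


# Primary is down, secondary is up: the send fails over.
used = send_with_failover(
    "OrderPlacedEvent", [("primary", namespace_down), ("secondary", namespace_up)]
)

# Both are down: the exception reaches the calling (controller) code.
try:
    send_with_failover(
        "OrderPlacedEvent", [("primary", namespace_down), ("secondary", namespace_down)]
    )
    outcome = "sent"
except DispatchError:
    outcome = "show error, offer retry"
```

What goes into the `except` branch (retry, fail the request, tell the user) is the part that stays with your code.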

The topic of reliable sends from a web context is something we’re discussing internally. At the moment I don’t have an answer other than “handle exception and retry send operation”.

I understand the concern. At the same time, as a user, I’d rather see a message telling me my response (which I worked hard on) could not be sent. And I’d be happy to retry if I were presented with such an option.

fcastells · May 11, 2018, 5:05pm

OK, if I understand you correctly, the FailOverNamespacePartitioning would work fine in the web context: try the first namespace and if it fails, try the fallback one. If that fails too, I will have to deal with it, which is fine with me.

Then in the backend, messages will be handled from both namespaces, which is what I want, but with FailOverNamespacePartitioning, SendsAtomicWithReceive won’t work if the receive is on one namespace and the send is on another. That must really be an edge case, since if I can receive messages I can likely send them too (unless we reach the limits of the namespace, but that’s not our case now). But from what I’ve read, FailOverNamespacePartitioning won’t allow me to start the endpoint together with SendsAtomicWithReceive.
So, is there a way to say “ok, I accept that in some edge case the sends won’t be atomic with receive”? Even if it means creating a custom strategy based on FailOverNamespacePartitioning?

SeanFeldman · May 11, 2018, 9:19pm

I wish it would be that simple

With FailOverNamespacePartitioning we cannot guarantee atomic operations if an incoming message is not from the same namespace that is used for sending messages out. If you explicitly request the SendsAtomicWithReceive transport transaction mode together with FailOverNamespacePartitioning, the endpoint will throw, forcing you to downgrade the transport transaction mode to ReceiveOnly. In this scenario, dispatch will fail over to the secondary namespace, but you won’t be able to use SendsAtomicWithReceive. For a send-only endpoint that’s not a problem. For a backend endpoint that receives and sends, it would be a challenge if you want to take advantage of the atomic operations.

What you’re after,

is not possible. The transport is making an assumption - atomic operations are possible only when the following criteria are fully met:

There’s only one namespace in use
SBMP protocol is used (not AMQP)
ReceiveMode is set to PeekLock (not ReceiveAndDelete)

When these criteria are not met, the transport transaction mode is downgraded.

The first requirement is due to the ASB service not being able to support a transaction between two namespaces.
The second requirement is a Microsoft ASB client issue (it can’t support transactions with the AMQP protocol).
The third requirement exists because the message has to remain on the server for the transaction to take place.

An NServiceBus endpoint cannot change its transaction mode at runtime. Hence the transaction mode is locked when the endpoint starts. If the requested transaction mode is higher than what can be supported, the endpoint will throw. If no explicit transaction mode was requested, the endpoint will downgrade the transaction mode to respect the settings the user has provided.
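
As a rough illustration of the rules above, here is a small sketch in Python. It is not the transport's code: `Mode`, `supported_mode` and `resolve_mode` are hypothetical names, and the sketch assumes every unmet criterion downgrades to ReceiveOnly (mentioned above only for the multi-namespace case; the real transport may pick a different mode in other cases).

```python
from enum import IntEnum


class Mode(IntEnum):
    # Hypothetical mirror of the transport transaction modes, ordered low to high.
    NONE = 0
    RECEIVE_ONLY = 1
    SENDS_ATOMIC_WITH_RECEIVE = 2


def supported_mode(namespace_count, protocol, receive_mode):
    """Highest mode the settings allow, per the three criteria listed above."""
    atomic_ok = (
        namespace_count == 1
        and protocol == "SBMP"
        and receive_mode == "PeekLock"
    )
    # Assumption: any unmet criterion downgrades to ReceiveOnly.
    return Mode.SENDS_ATOMIC_WITH_RECEIVE if atomic_ok else Mode.RECEIVE_ONLY


def resolve_mode(requested, namespace_count, protocol, receive_mode):
    """Decide the mode once, at startup; it cannot change afterwards."""
    supported = supported_mode(namespace_count, protocol, receive_mode)
    if requested is None:
        return supported        # nothing requested explicitly: downgrade silently
    if requested > supported:
        raise ValueError(
            f"{requested.name} requested, but settings only support {supported.name}"
        )
    return requested


# Two namespaces (fail-over) and no explicit request: silently downgraded.
print(resolve_mode(None, 2, "SBMP", "PeekLock").name)
```

Explicitly requesting `Mode.SENDS_ATOMIC_WITH_RECEIVE` with two namespaces raises in this sketch, which corresponds to the endpoint throwing at startup.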

fcastells · May 12, 2018, 8:48am

OK, I understand. So, a simpler request (I think):

Use the Failover outside handler context (or on send only endpoints)
On the receiving endpoints, don’t use it, but consume the messages from both namespaces

That would solve our critical problem (failing on web requests).

SeanFeldman · May 15, 2018, 5:11am

This would not be possible with a single endpoint and current ASB transport. For the same reason as above, you’d be using more than a single namespace which in turn will disable SendsAtomicWithReceive.

Also, depending on what you do when you receive a message, you could try a multi-hosting approach, where the same endpoint is hosted on two different namespaces and consumes messages from both. Still, that is without atomic transport operations.

fcastells · May 15, 2018, 9:39am

OK, that’s unfortunate. I don’t like either of the solutions (disabling SendsAtomicWithReceive or multi-hosting), especially because it seems the FailOverNamespacePartitioning strategy does what I want for send-only endpoints and almost what I want for the receiving endpoints.

Is there really no way of creating a custom strategy that does what I need? Receive from 2 namespaces and send to the same one the message was received from. I will try to develop this myself, but some directions would be appreciated.

Thanks

SeanFeldman · May 15, 2018, 7:40pm

I wish magic existed, but it doesn’t. In this case it’s not only about the NServiceBus transport implementation, but about what the underlying service can and cannot support. If you want SendsAtomicWithReceive while working with multiple namespaces, you’re asking for atomic messaging operations that span more than a single namespace. Azure Service Bus cannot perform cross-namespace operations atomically. When and if that support is available, we could evaluate this again. As of today, only a single namespace can be used with SendsAtomicWithReceive.

fcastells · May 25, 2018, 9:36am

(I wrote this message a while back, but it seems it was not published)

Well, I am not really asking for this. From my point of view, a single message comes only from a single namespace and I only need SendsAtomicWithReceive within that namespace, because I don’t want to send messages to namespace B if I’m handling a message from namespace A. I think this would work well enough for us.

SeanFeldman · May 25, 2018, 3:02pm

a single message comes only from a single namespace and I only need SendsAtomicWithReceive within that namespace

That’s the default behaviour when a single namespace is used indeed.