NServiceBus.AzureServiceBus.Transport outage protection

mrentier · March 31, 2020, 3:26pm

Hi,

With the new NServiceBus.AzureServiceBus.Transport, some features, like high availability, were dropped. I understand that Premium has these features built in, but let’s say, for argument’s sake, that Premium doesn’t make a whole lot of sense given our volume, yet we do need outage protection.

Microsoft’s guidance in “Insulate Azure Service Bus applications against outages and disasters” (Microsoft Learn) outlines an approach using the Standard tier. The biggest issue to tackle is message deduplication. Is it possible to implement this approach on top of the current NServiceBus transport? It appears you can set up a multi-host scenario with endpoints listening to multiple namespaces. The biggest issue then is deduplication across those endpoints.
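To make the deduplication problem concrete: in an active/active setup, a sender might send every message to both namespaces, and the receiver must process only one copy. The following is a minimal sketch, assuming every message carries a stable ID assigned once by the sender, and using an in-memory set as a stand-in for deduplication storage. `Message`, `Deduplicator` and `receive_from_namespaces` are hypothetical names, not NServiceBus or Azure SDK APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str  # stable ID assigned once by the sender (assumption)
    body: str


class Deduplicator:
    """Hypothetical in-memory dedup store; a real one must be durable and shared."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def try_accept(self, message: Message) -> bool:
        # True the first time an ID is seen, False for any later copy.
        if message.message_id in self._seen:
            return False
        self._seen.add(message.message_id)
        return True


def receive_from_namespaces(primary: list[Message], secondary: list[Message],
                            dedup: Deduplicator) -> list[str]:
    """Drain the copies received from both namespaces; return the bodies processed."""
    processed = []
    for message in primary + secondary:
        if dedup.try_accept(message):
            processed.append(message.body)
    return processed


# The same message arrives through both namespaces but is processed once.
order = Message("42", "OrderPlaced")
print(receive_from_namespaces([order], [order], Deduplicator()))  # ['OrderPlaced']
```

The set only works inside one process. With several endpoint instances, the "seen" record has to live in shared, durable storage and be updated atomically with the handler's work, which is where the difficulty lies.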

SeanFeldman · March 31, 2020, 10:05pm

Hi Marco,

When the new transport was designed, every feature of the legacy ASB transport was weighed for and against. None of those decisions were taken lightly, and we followed the updated guidelines from the Microsoft Service Bus team to better position the transport.

You’ve done some analysis and concluded that the biggest issue is de-duplication. It’s far from the biggest issue. Reinventing the wheel to solve a problem the Premium tier has already solved raises more interesting problems. Let me list a few.

High Availability

The article you’ve pointed to was not rewritten when the Premium tier was introduced. It was updated with a Premium tier section, leaving the Standard tier part as-is. I don’t know whether that was intentional or just overlooked, but not mentioning it omits the fact that Microsoft itself has been recommending the Premium tier for production for quite a while now. The Standard tier is intended for testing, QA, or light production workloads that can accept the drawbacks of the tier (throttling, decreased performance, latency, etc.).

When it comes to HA, the Standard tier has never offered any. The recommendation in the document dates from the days when workloads could only run on that tier. It’s more of a workaround than a solution. Here’s why.

Assume your endpoint runs in the SendsAtomicWithReceive transport transaction mode. We’ve received a message and lost the connection just before dispatching an outgoing message from the handler. We can’t send that message to another (fail-over) namespace: if we do, we violate the transactional guarantee, because the outgoing send and the completion of the incoming message no longer happen atomically within one namespace. We then fail to complete the incoming message, which means we’ll repeat the processing and send a duplicate once the failing namespace becomes available again. This is exactly what happened with the legacy ASB transport, and customers didn’t even know it was happening.
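The failure sequence described above can be simulated in a few lines. This is a toy sketch, assuming a handler that sends one outgoing message per incoming message and two in-memory stand-ins for namespaces. `Namespace` and `handle` are hypothetical and do not model any Azure SDK or NServiceBus API. Only the send fails over, the completion on the primary fails, and the redelivered message produces a second copy of the outgoing message.

```python
from __future__ import annotations


class Namespace:
    """Hypothetical in-memory stand-in for a Service Bus namespace."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.sent: list[str] = []

    def send(self, body: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} unavailable")
        self.sent.append(body)

    def complete(self, message_id: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} unavailable")


def handle(message_id: str, primary: Namespace, secondary: Namespace) -> bool:
    """Process one message received from `primary`; return True if it was completed."""
    outgoing = f"reply-to-{message_id}"
    try:
        primary.send(outgoing)
    except ConnectionError:
        # Failing over only the send breaks send-atomic-with-receive:
        # the send and the completion now happen on different namespaces.
        secondary.send(outgoing)
    try:
        primary.complete(message_id)
        return True
    except ConnectionError:
        return False  # incoming message stays on the primary and will be redelivered


primary, secondary = Namespace("primary"), Namespace("secondary")
primary.available = False       # connection lost inside the handler
completed = handle("m1", primary, secondary)
primary.available = True        # namespace recovers, message is redelivered
completed_retry = handle("m1", primary, secondary)
print(len(primary.sent) + len(secondary.sent))  # 2 copies of one outgoing message
```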

As for de-duplication, we can’t do that either, at least not without an additional persistence mechanism or the Outbox feature, and the Outbox isn’t always an option (for example, when you’re using Azure Storage Persistence).

Premium is too much

You’re correct. I also think the way the Premium tier is marketed leaves a certain segment of customers with no option but the Standard tier. Does that mean Particular should build it? Or should we rather give Microsoft feedback asking for something in between: not as expensive as Premium, but not as stripped of premium features as Standard?

At the same time, we don’t want to compete with the broker. The broker can implement features we can’t compete with, and frankly shouldn’t try to, e.g. cross-namespace forwarding or Geo-DR that covers the data plane.

Conclusion

We’re listening to our customers and carefully considering feature requests. Every feature goes through a rigorous cost-benefit analysis before it is approved. Multi-namespace support is no exception.

mrentier · March 31, 2020, 11:33pm

Hi,

Certainly, I can think of several other scenarios besides transactions as well. Azure Service Bus can implement guarantees that you cannot provide from the outside.

This is really a scenario where I am not overly worried about transactions. I’d be willing to accept some functional impact from a fail-over (potentially even a few lost messages). It is mostly about not having to scramble to reconfigure each application to point to another Service Bus namespace during an outage.
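What you describe amounts to a sender that switches between a primary and a secondary namespace with no reconfiguration, accepting possible losses or duplicates. Here is a minimal sketch of that policy, assuming a hypothetical `send` callable per namespace (no Azure SDK or NServiceBus API is used). It addresses only the “no scrambling to reconfigure” part. It does nothing about the transactional and receive-side issues discussed earlier in the thread.

```python
from __future__ import annotations

from typing import Callable, Optional

SendFn = Callable[[str], None]


class FailoverSender:
    """Hypothetical sender that tries namespaces in order and sticks with the one that worked."""

    def __init__(self, senders: dict[str, SendFn]) -> None:
        self._order = list(senders)  # e.g. ["primary", "secondary"]
        self._senders = senders
        self.active = self._order[0]

    def send(self, body: str) -> str:
        """Send via the active namespace, failing over to the others; return the one used."""
        candidates = [self.active] + [n for n in self._order if n != self.active]
        last_error: Optional[Exception] = None
        for name in candidates:
            try:
                self._senders[name](body)
                self.active = name  # no automatic fail-back in this sketch
                return name
            except ConnectionError as error:
                last_error = error
        raise RuntimeError("all namespaces unavailable") from last_error


# Usage: the primary is down, so the message goes to the secondary without reconfiguration.
delivered: list[str] = []


def primary_send(body: str) -> None:
    raise ConnectionError("primary namespace unreachable")


def secondary_send(body: str) -> None:
    delivered.append(body)


sender = FailoverSender({"primary": primary_send, "secondary": secondary_send})
print(sender.send("OrderPlaced"))  # secondary
```

Sticking with the namespace that last worked avoids retrying a dead primary on every send. The trade-off is that something (a timer or an operator) has to decide when to fail back.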

The Premium tier is really overpriced for an environment that handles a few hundred messages a day at most. Even unpredictable performance is acceptable, within reason. But having no support for geo-redundancy is a really difficult pill to swallow with anything cloud-based.

I’d guess AWS SQS/SNS is worth considering for lower volumes, since they do use availability zones.

SeanFeldman · April 1, 2020, 2:26am

Now that you’ve provided more information, I can make some recommendations. As you’ve pointed out, your throughput barely justifies choosing Service Bus.

Perhaps a transport such as Azure Storage Queues would be more suitable? It will give you the availability you’re looking for at a fraction of the cost. And, unlike SQS, it’s an Azure transport. All NServiceBus features will continue to work, and no major changes would be required. A few small things might need minor adjustments, but those are straightforward and very manageable:

Messages larger than 64 KB would require the Azure data bus.
Pub/sub would be handled with Azure Storage persistence.

SendsAtomicWithReceive will not be available, but in your scenario, it’s not needed.
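The Azure data bus mentioned above follows the claim-check pattern. A large payload is stored in blob storage, and only a reference travels on the queue. This is a minimal sketch of that pattern, assuming the 64 KB limit quoted above and a dict as a stand-in for blob storage. `send_with_claim_check`, `receive_with_claim_check` and the JSON envelope are hypothetical and are not the NServiceBus data bus API.

```python
from __future__ import annotations

import json
import uuid

MAX_QUEUE_MESSAGE_BYTES = 64 * 1024  # limit quoted in the thread for Storage Queues


def send_with_claim_check(payload: str, blob_store: dict[str, str]) -> str:
    """Return the queue message body, offloading payloads that would not fit."""
    inline = json.dumps({"inline": payload})
    if len(inline.encode("utf-8")) <= MAX_QUEUE_MESSAGE_BYTES:
        return inline
    key = str(uuid.uuid4())
    blob_store[key] = payload  # stand-in for uploading to blob storage
    return json.dumps({"claim_check": key})


def receive_with_claim_check(body: str, blob_store: dict[str, str]) -> str:
    """Resolve a queue message body back into the original payload."""
    message = json.loads(body)
    if "claim_check" in message:
        return blob_store[message["claim_check"]]  # stand-in for downloading the blob
    return message["inline"]


blobs: dict[str, str] = {}
small = send_with_claim_check("hello", blobs)
large = send_with_claim_check("x" * 100_000, blobs)
print(len(blobs))  # 1
print(receive_with_claim_check(large, blobs) == "x" * 100_000)  # True
```

The size check is applied to the full envelope rather than the raw payload, so the wrapper itself can’t push a message over the limit.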

mrentier · April 1, 2020, 4:29am

I have considered Storage Queues. We are currently on MSMQ and looking for a replacement, and Storage Queues would make perfect sense. My biggest issue is that pub/sub is our main use case. A transport that does not natively support pub/sub (like our current MSMQ transport) is a major inconvenience when 99.99% of the time you need pub/sub.

SeanFeldman · April 1, 2020, 5:41am

Considering you’re migrating from the MSMQ transport, where pub/sub was also storage-based and worked smoothly, it is interesting to see your hesitation to switch to the Azure Storage Queues (ASQ) transport.
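Storage-based pub/sub works the same way on MSMQ and ASQ. Subscribers register interest in an event type, and the publisher looks up the subscriber queues in persistence and sends each one its own copy. Here is a minimal sketch of that idea, with an in-memory dict as a stand-in for subscription storage and lists as stand-ins for queues. `SubscriptionStore` and `publish` are hypothetical names, and this is not the NServiceBus implementation.

```python
from __future__ import annotations

from collections import defaultdict


class SubscriptionStore:
    """Hypothetical stand-in for subscription persistence (e.g. a storage table)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, event_type: str, queue: str) -> None:
        self._subscribers[event_type].add(queue)

    def subscribers_of(self, event_type: str) -> list[str]:
        return sorted(self._subscribers.get(event_type, set()))


def publish(event_type: str, body: str, store: SubscriptionStore,
            queues: dict[str, list[str]]) -> int:
    """Send one copy of the event to every subscribed queue; return the copies sent."""
    targets = store.subscribers_of(event_type)
    for queue in targets:
        queues.setdefault(queue, []).append(body)  # one point-to-point send per subscriber
    return len(targets)


store = SubscriptionStore()
store.subscribe("OrderPlaced", "billing")
store.subscribe("OrderPlaced", "shipping")
queues: dict[str, list[str]] = {}
print(publish("OrderPlaced", "order 42", store, queues))  # 2
print(sorted(queues))  # ['billing', 'shipping']
```

From the handler’s point of view, publishing looks the same as with native pub/sub. The difference is that the subscription list lives in the persistence you choose, so that choice matters when moving to ASQ.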

I’m assuming you’ve had a successful past experience with storage-based subscriptions, since you haven’t indicated otherwise. And given the original question about the ASB transport, it sounds as if you’re moving towards Azure. If so, SQS would be a strange choice of transport.

Personally, I would recommend putting together a validation spike to get an idea of what ASQ would look like. You can take an existing Request/Reply sample and convert it to Pub/Sub. Alternatively, take the Pub/Sub sample and convert it to use the ASQ transport with Azure Storage Persistence (ASP).

And it doesn’t stop there. You can go with the RabbitMQ transport, which has native pub/sub support. You can evaluate the SQL Server transport, which implements pub/sub internally. There are options; you just need to match them against your requirements and decide what you really want and need.

Hope that helps.