Highly Available Scaled Out Subscriber Endpoints with MSMQ

rossbeehler · August 8, 2018, 3:22pm

We have a requirement to create highly available subscriber endpoints that utilize the MSMQ transport. We’ve done this well in the command sending scenario using an instance mapping file and a custom distribution strategy that checks that a particular endpoint instance’s Windows Service is running before sending it a message. However, that mechanism isn’t available (as we’ve experienced) when a publisher sends a message to a particular subscriber since the subscriptions specify physical endpoint instances, and not just logical endpoints. Also, the autosubscribe behavior of the subscriber always sends the subscription request pointing back to the physical endpoint instance that sent it, not a logical endpoint.
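For readers unfamiliar with the command-side mechanism described above, here is a minimal sketch of the idea in Python. It is not NServiceBus code: the class name, the `is_running` callback (standing in for the Windows Service status check), the instance names, and the fall-back behaviour when no instance looks alive are all assumptions made for illustration.

```python
from itertools import count
from typing import Callable, Sequence


class AvailabilityAwareRoundRobin:
    """Round-robin over endpoint instances, skipping any whose health check fails.

    Hypothetical illustration only; not the NServiceBus distribution strategy API.
    """

    def __init__(self, is_running: Callable[[str], bool]):
        # is_running stands in for "is this instance's Windows Service up?"
        self._is_running = is_running
        self._counter = count()

    def select_destination(self, instances: Sequence[str]) -> str:
        if not instances:
            raise ValueError("no instances mapped for this endpoint")
        candidates = [i for i in instances if self._is_running(i)]
        if not candidates:
            # Assumption: if nothing looks alive, queue the message anyway
            # rather than failing the send.
            candidates = list(instances)
        return candidates[next(self._counter) % len(candidates)]


# Two hypothetical instances of one logical endpoint; only machine-a is up.
running = {"Sales@machine-a": True, "Sales@machine-b": False}
strategy = AvailabilityAwareRoundRobin(lambda instance: running[instance])
picks = [strategy.select_destination(list(running)) for _ in range(3)]
```

With only one instance reporting as running, every command goes to that instance. Published events do not pass through a selection step like this, because each subscription already names a specific physical instance.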

Are we possibly missing something, or is this functionality not supported?

If it isn’t supported, what is Particular’s recommendation for supporting highly available, scaled-out subscriber endpoints with MSMQ?

Thanks!
Ross Beehler

SzymonPobiega · August 9, 2018, 5:07am

Hi

What are your exact availability requirements? Are they defined in terms of the maximum time between sending a message and successfully processing it? The possible problems I see with the approach to commands you described are:

The fact that the Windows Service is up when a message is sent does not guarantee it will still be up later, when the message is picked up from the queue, so you can still end up in a scenario where messages are waiting in the queue.
If you check the availability of the receiver for each outgoing message, that can seriously affect the sending throughput.
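The first point can be shown with a small simulation, sketched here in Python with in-memory queues rather than MSMQ. The `Instance` class, `send_if_running` function, and the message name are hypothetical and exist only to show the gap between the check at send time and the moment of processing.

```python
from collections import deque


class Instance:
    """Hypothetical endpoint instance with a local input queue."""

    def __init__(self, name: str):
        self.name = name
        self.running = True
        self.queue: deque = deque()

    def process_all(self) -> list:
        # Messages are only consumed while the service is running.
        processed = []
        while self.running and self.queue:
            processed.append(self.queue.popleft())
        return processed


def send_if_running(instance: Instance, message: str) -> bool:
    # The availability check happens at send time only.
    if instance.running:
        instance.queue.append(message)
        return True
    return False


machine_a = Instance("machine-a")
accepted = send_if_running(machine_a, "order-1")  # check passes, message is queued
machine_a.running = False                         # service stops after the check
processed = machine_a.process_all()               # nothing is processed
stranded = list(machine_a.queue)                  # the message sits in the local queue
```

The send succeeds because the check passed, yet the message stays in that instance's queue until the instance comes back.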

With MSMQ being distributed, the biggest issue when talking about high availability is handling messages that have been delivered to a processing machine that subsequently failed (either the endpoint died or the whole machine did). You can’t easily receive (steal) these messages remotely, so they are “locked” until that machine/endpoint is back up again.

You can counter this by deploying your subscribers to a Windows failover cluster (as described here).

Is switching to a different transport an option?

Szymon