RabbitMQ Transport, blocked connections handling

dagapitov · January 5, 2018, 8:38pm

Hi All,

We are using NServiceBus with RabbitMQ transport:

NSB 6.4.0
NSB.RabbitMQ 4.4.0
RabbitMQ 3.6.10

Endpoint is configured as SendOnly.

When the RabbitMQ server raises a memory alarm and blocks all connections from sending/publishing messages, a call to the NSB EndPoint.Send method hangs without any error or log message.

I cannot find any reference to this behavior in the NSB documentation or here. Ideally, we want to cancel the current send operation after some timeout and prevent future sends until the connection is unblocked.

Is there any way to define a timeout for the send operation via NSB settings, or to cancel the pending send operation/Task?
Is it possible to subscribe to the blocked/unblocked connection events (see "Blocked Connection Notifications" in the RabbitMQ documentation) provided by the underlying RabbitMQ.Client .NET library?

Thank you in advance

bording · January 5, 2018, 10:50pm

We don’t currently expose the Blocked/Unblocked event handlers. The main reason for this is that it isn’t clear what benefit there would be in doing so. What logic would you put in those handlers? If you need to send a message but you’ve been told the connection was blocked, what would you do with that message?

There is no way to cancel a send operation or define a timeout for that send. Adding a per-message timeout value would add a lot of overhead for each message. The vast majority of messages sent would have no need for it, so it would be purely cost for no benefit.

When the memory alarm was cleared, what behavior did you see for the messages that were attempted to be sent during the alarm? Once the memory alarm is cleared, those sends should receive the publisher confirm message they are waiting for, and then complete successfully.

dagapitov · January 6, 2018, 7:59am

Thank you for the fast reply.

When the memory alarm is cleared, I see the messages sent successfully and appearing in the queue. It works fine when the connection is blocked for a short time.

In our case, we are trying to send a message to the queue in the scope of a Web API request and waiting for the send task to complete. That does not work well when the RabbitMQ connection is blocked for a long time.

Another source of confusion was that we don’t see any warning/error in the log saying that messages are not being sent because of a blocked connection, and there is no way to subscribe to the blocked/unblocked events to find out.

bording · January 8, 2018, 8:46pm

I’m not currently aware of a reason why the length of time that the connection was blocked because of a memory alarm would make a difference. Once the alarm has cleared, I would expect any pending messages to be sent.

Since you mentioned sending a message from a Web API request, is there a timeout value on the API request that is being exceeded when your broker is in alarm for too long? That could possibly explain why you aren’t seeing messages sent in that scenario.

If you did have access to the underlying client’s blocked/unblocked events, what code would you want to execute in those event handlers? Just adding log entries to indicate that they happened? We could consider exposing them, or perhaps just adding log entries ourselves to indicate they’ve happened.

However, a larger concern that I have is that you’re seeing memory alarms in the first place. That is not normal behavior for the broker and indicates you’ve got a larger problem to solve. Once you get your memory alarm problem under control, then this whole scenario disappears.

dagapitov · January 9, 2018, 5:30pm

In our current implementation we assumed that a message is either successfully sent to the queue or fails because of connectivity issues, so the web request sending the message completes with success or failure. But in fact, when the connection is blocked, the originating web request is pending for a potentially infinite time, as is the client application. I agree that we had an infrastructure/environment issue with the broker in our test environment that caused the client application to hang. So the question is: what is the recommended approach if we want to guarantee that either the message is registered in the system or the original operation (the web request) fails?

Right now I see the following options:

1. Keep the existing approach (wait for the message to be sent or to fail) and rely on broker HA and external monitoring. A web request can be blocked indefinitely, but the risk is minimal.
2. Put an external timeout around EndPoint.Send, so the web request is unblocked with a failed result. The handler should be ready to ignore such timed-out messages, since they may still be delivered later.
3. Do not wait for the message to be sent in the web request (for example, send it from another thread). This can lose messages for successful web requests, so the pending message needs to be registered outside of the broker with the ability to resend it.
4. A combination of the 2nd and 3rd options, where only timed-out messages are stored externally for a future resend.

For the blocked/unblocked events: the initial idea was to treat the messaging system as inaccessible and fail the original web request when the connection is already known to be blocked via the events.
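As a rough illustration of the 2nd option (and of the "store for resend" hook in the combined option), here is a minimal, language-neutral sketch in Python's asyncio rather than the NServiceBus API. The names `send_with_timeout` and `on_timeout` are hypothetical, and the send is represented by any awaitable. The key assumption it mirrors from the thread is that the underlying send is not cancelled when the timeout fires: it can still complete once the broker unblocks the connection, which is why the receiving handler has to tolerate late messages.

```python
import asyncio


async def send_with_timeout(send_coro, timeout_seconds, on_timeout=None):
    """Wait up to timeout_seconds for a send; return True if it completed.

    The pending send is deliberately left running on timeout, because a
    send blocked by a broker alarm may still go through later.
    """
    task = asyncio.ensure_future(send_coro)
    done, _pending = await asyncio.wait({task}, timeout=timeout_seconds)
    if task in done:
        task.result()  # re-raises the send's exception, if it failed
        return True
    if on_timeout is not None:
        # Hypothetical hook: e.g. record the message externally for resend.
        on_timeout(task)
    return False
```

A caller in a web request would return a failure response when this returns False, while keeping in mind that the message may still arrive later.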

bording · January 9, 2018, 8:52pm

In our current implementation we assumed that a message is either successfully sent to the queue or fails because of connectivity issues, so the web request sending the message completes with success or failure.

For actual connectivity problems, where the connection to the broker is lost, this will be true. The connection will be attempting to automatically reconnect, and any messages sent during this time will result in an exception being thrown.

So the question is: what is the recommended approach if we want to guarantee that either the message is registered in the system or the original operation (the web request) fails?

The only time you’ll see the behavior you’re describing is when the broker has gone into a memory or disk alarm. It then pauses publishing connections by blocking them until the conditions causing the alarms are resolved.

As long as you’ve allocated enough resources for your broker, you should not generally be seeing blocked connections. You should definitely consider setting up monitoring for your broker so that you can detect any alarm conditions that might occur.
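One possible shape for such a monitoring check, sketched in Python under some assumptions: the RabbitMQ management plugin is enabled, and the JSON returned by its `GET /api/nodes` endpoint includes the boolean `mem_alarm` and `disk_free_alarm` fields for each node. The function name `nodes_in_alarm` is hypothetical, and fetching the JSON over HTTP (with credentials) is left out.

```python
import json


def nodes_in_alarm(nodes_json):
    """Return {node_name: [alarm_flag, ...]} for nodes reporting an alarm.

    nodes_json is assumed to be the body of the management API's
    /api/nodes response: a JSON array with one object per node.
    """
    alarms = {}
    for node in json.loads(nodes_json):
        active = [
            flag for flag in ("mem_alarm", "disk_free_alarm") if node.get(flag)
        ]
        if active:
            alarms[node.get("name", "<unknown>")] = active
    return alarms
```

An external monitor could poll this periodically and raise an alert whenever the returned dictionary is non-empty.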

Based on that, I’m not sure there’s anything else at the application level that needs to change.

For the blocked/unblocked events: the initial idea was to treat the messaging system as inaccessible and fail the original web request when the connection is already known to be blocked via the events.

I’m not sure that this would be reliable enough to be worth doing. Since the connection could be blocked at any time, it would be possible for your check to pass and start sending a message, but then the connection is blocked before the message is sent, and you’d still end up waiting for the connection to be unblocked.
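The race described here can be shown with a small simulation. This is only a sketch in Python's asyncio with a made-up `FakeConnection`; it does not use RabbitMQ.Client or NServiceBus. The hypothetical `before_publish` hook stands in for the broker raising an alarm in the gap between the "is it blocked?" check and the actual publish.

```python
import asyncio


class FakeConnection:
    """Stand-in for a broker connection that can be blocked and unblocked."""

    def __init__(self):
        self.blocked = False
        self._unblocked = asyncio.Event()
        self._unblocked.set()

    def block(self):
        self.blocked = True
        self._unblocked.clear()

    def unblock(self):
        self.blocked = False
        self._unblocked.set()

    async def publish(self, message):
        # Like a blocked publisher, wait until the connection is unblocked.
        await self._unblocked.wait()
        return message


async def guarded_send(conn, message, before_publish=None):
    """Fail fast if the connection is known to be blocked, else publish."""
    if conn.blocked:
        raise RuntimeError("connection is blocked")
    if before_publish is not None:
        before_publish()  # the alarm fires after the check has passed
    return await conn.publish(message)
```

Calling `guarded_send(conn, "msg", before_publish=conn.block)` passes the check and then waits anyway until `conn.unblock()` is called, so the check narrows the window but cannot close it.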

dagapitov · January 10, 2018, 2:25pm

Thank you.

I still think it would be beneficial for debugging/troubleshooting to have the ‘connection blocked/unblocked’ events at least logged within NSB as warnings.

bording · January 10, 2018, 6:28pm

I still think it would be beneficial for debugging/troubleshooting to have the ‘connection blocked/unblocked’ events at least logged within NSB as warnings.

I can see how that might be useful, so I’ve opened issue #468 in Particular/NServiceBus.RabbitMQ on GitHub, "Log blocked connections using blocked/unblocked events", for us to consider adding it.