We have a financial processing system built on NServiceBus (NSB) on top of RabbitMQ. We are seeing occasional message latency spikes, and I am not sure where to look for the cause. According to the RabbitMQ management plugin, we process at most 4 requests per second (RPS). The problematic messages only do some database work and then send a reply back to the requesting server.
1% of the messages take more than 5s to return, 10% take between 100ms and 5s, and the remaining 89% behave normally, returning in an average of 22ms. A few stragglers even take over 10s, which causes timeouts for the clients.
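To make the distribution above easy to reproduce from raw measurements, here is a minimal sketch that sorts round-trip samples (in milliseconds) into the same bands. The function name `latency_buckets` and the sample data are made up for illustration; the band edges are the ones quoted in the question.

```python
from __future__ import annotations

from typing import Iterable


def latency_buckets(samples_ms: Iterable[float]) -> dict[str, float]:
    """Return the fraction of round-trip samples (in ms) falling in each band."""
    samples = list(samples_ms)
    if not samples:
        raise ValueError("need at least one sample")
    counts = {"under_100ms": 0, "100ms_to_5s": 0, "over_5s": 0, "over_10s": 0}
    for ms in samples:
        if ms < 100:
            counts["under_100ms"] += 1
        elif ms <= 5000:
            counts["100ms_to_5s"] += 1
        else:
            counts["over_5s"] += 1
        # "over_10s" is a subset of "over_5s": these are the client timeouts.
        if ms > 10_000:
            counts["over_10s"] += 1
    total = len(samples)
    return {band: n / total for band, n in counts.items()}


# Made-up samples shaped like the distribution described above.
samples = [22.0] * 89 + [800.0] * 10 + [6000.0]
print(latency_buckets(samples))
```

Note that `over_10s` overlaps `over_5s`, so the four fractions are not meant to sum to 1.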
I have confirmed that the slow messages are delayed in the transport, not during the database work. I built a ping tester that uses the same NSB configuration but does no database activity, and it shows the same problem, although the delays are shorter overall.
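For readers unfamiliar with the ping-tester approach, here is a minimal sketch of the request/reply timing loop. It is not NServiceBus code: in-process `queue.Queue` objects stand in for the broker, and `responder`, `run_pings`, and the 10-second wait (an assumed stand-in for the client timeout) are hypothetical. The point is only the measurement pattern: stamp the send time, have the target echo immediately with no other work, and time the full round trip on the sender's clock.

```python
from __future__ import annotations

import queue
import threading
import time


def responder(requests: queue.Queue, replies: queue.Queue) -> None:
    """Echo each ping back immediately; None is the shutdown signal."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        ping_id, sent_at = msg
        # No database work: reply with the original send time plus our own stamp.
        replies.put((ping_id, sent_at, time.perf_counter()))


def run_pings(count: int, interval_s: float = 0.0) -> list[float]:
    """Send `count` pings one at a time and return round-trip times in ms."""
    requests: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()
    worker = threading.Thread(target=responder, args=(requests, replies), daemon=True)
    worker.start()
    round_trips_ms: list[float] = []
    for ping_id in range(count):
        requests.put((ping_id, time.perf_counter()))
        # Raises queue.Empty if no reply arrives within the (assumed) timeout.
        reply_id, sent_at, _handled_at = replies.get(timeout=10)
        if reply_id != ping_id:
            raise RuntimeError(f"expected reply {ping_id}, got {reply_id}")
        round_trips_ms.append((time.perf_counter() - sent_at) * 1000)
        time.sleep(interval_s)
    requests.put(None)
    worker.join()
    return round_trips_ms


times = run_pings(20)
print(f"max round trip: {max(times):.3f} ms")
```

Against an in-memory queue the round trips should stay tiny; swapping the queues for a real broker connection is what exposes transport delays like the ones described here.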
Here is an example of the ping tester output. The first number (in pink) is the round-trip ping time; the second is roughly the one-way time from source to target.
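The one-way figure can only be "roughly" right because it compares timestamps taken on two different machines, while the round trip is measured on a single clock. The sketch below shows how the two numbers relate and how splitting a slow round trip into legs indicates which direction was delayed. The `PingSample` record layout and all timestamps are hypothetical, not the actual tester's format.

```python
from dataclasses import dataclass


@dataclass
class PingSample:
    """Timestamps in milliseconds for one ping (hypothetical record layout)."""
    sent_ms: float      # requester clock: ping sent
    received_ms: float  # target clock: handler started processing the ping
    replied_ms: float   # requester clock: reply received

    def round_trip_ms(self) -> float:
        # Both stamps come from the requester's clock, so clock offset between
        # the machines does not affect this value.
        return self.replied_ms - self.sent_ms

    def outbound_ms(self) -> float:
        # Mixes the two machines' clocks, so any clock offset is included.
        return self.received_ms - self.sent_ms

    def return_ms(self) -> float:
        # Also mixes clocks; outbound_ms() + return_ms() == round_trip_ms().
        return self.replied_ms - self.received_ms


normal = PingSample(sent_ms=0.0, received_ms=2.0, replied_ms=3.0)
slow = PingSample(sent_ms=0.0, received_ms=998.0, replied_ms=1000.0)
for s in (normal, slow):
    print(s.round_trip_ms(), s.outbound_ms(), s.return_ms())
```

In the made-up `slow` sample almost all of the 1000ms is spent before the handler sees the message, which would point at the outbound path (publish, broker queueing, or consumer delivery) rather than the reply.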
Why would this occasionally take 1000ms instead of the usual 3ms?
I have confirmed that CPU and memory are never a bottleneck, and there are no CPU spikes when a message is delayed.
The behavior seems to be the same regardless of which Erlang/RabbitMQ versions are installed. Our test system currently runs the latest versions.
Endpoint concurrency is set to 25 for the ping test and 100 for the production system.
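As a rough sanity check on whether these concurrency limits could be reached, Little's law (mean messages in progress = arrival rate × mean time in system) can be applied to the figures from the question. This is a back-of-the-envelope sketch: `average_in_flight` is a hypothetical helper, and the law only describes long-run averages, so it says nothing about short bursts or tail latency.

```python
def average_in_flight(arrivals_per_s: float, avg_latency_s: float) -> float:
    """Little's law, L = lambda * W: mean number of messages in progress."""
    return arrivals_per_s * avg_latency_s


# Figures from the question: at most 4 RPS, ~22ms typical round trip.
typical = average_in_flight(4, 0.022)
# Pessimistic bound: pretend every message took 5s.
pessimistic = average_in_flight(4, 5.0)
print(f"typical: {typical:.3f} in flight, pessimistic: {pessimistic:.1f} in flight")
```

At the typical latency fewer than one message is in flight on average, and even the pessimistic 20 stays below both the ping test's 25 and production's 100, which suggests the concurrency setting alone is unlikely to explain the delays.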
Thanks very much for any assistance!