Would it be possible to do alerting based on a queue size exceeding a specified limit? I have alerting around heartbeats and failed messages and was wondering if queue size would be a possibility.
Unfortunately, our Platform does not have any out-of-the-box solution for this at this time.
It may be possible to do it, but since Queue Length is a metric only generated and consumed by ServiceControl monitoring it would require you to pull together several capabilities to emit that metric elsewhere, which isn’t trivial.
I would be willing to help you with it. If you would like for me to reach out let me know and we can schedule some time to go over your needs.
Thanks Bob. I will reach out sometime next week. Thanks.
In my class I explain why queue length is an overly basic metric to rely on.
Explained briefly: the greater the throughput of an endpoint, both in terms of incoming load and the amount of processing pulling messages off the queue, the higher its average queue length will be, which is just fine. You wouldn’t want to get alerts about that.
A better metric to look at is the TIME messages are waiting in the queue.
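To make the point concrete, here is a minimal sketch based on Little's law (average queue length = arrival rate × average wait time), assuming a steady-state queue. The function name and the numbers are hypothetical and only for illustration; nothing here is part of the platform.

```python
def average_queue_length(arrival_rate_per_sec: float, avg_wait_sec: float) -> float:
    """Little's law for a steady-state queue: L = lambda * W."""
    return arrival_rate_per_sec * avg_wait_sec


# Two hypothetical endpoints whose messages wait the same half second in the queue.
low_volume = average_queue_length(arrival_rate_per_sec=10, avg_wait_sec=0.5)
high_volume = average_queue_length(arrival_rate_per_sec=1000, avg_wait_sec=0.5)

print(low_volume, high_volume)
```

Both endpoints are equally healthy from the point of view of a message waiting in the queue, yet the busier one holds a hundred times more messages on average. A fixed queue-length limit would flag the busy endpoint for no good reason, while a limit on wait time treats both the same.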
An even better metric includes the queue wait time plus the network time as well as the processing time. This is called Critical Time in the platform and is available out of the box.
A great way to get alerts on this is to integrate with a tool like New Relic, Datadog, Prometheus, or any other tool via our metrics extension API.
Samples are available in the "Logging and Metrics Samples" section of the NServiceBus documentation on Particular Docs.
These tools will allow you to define any alerting thresholds.
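As a rough illustration of what such a threshold boils down to, here is a small sketch that computes critical time from two timestamps and checks a batch against a limit. It is not the metrics extension API or any monitoring tool's rule syntax; `MessageTiming`, `breaches_threshold`, and the 10-second limit are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MessageTiming:
    """Hypothetical record: when a message was sent and when its processing finished."""
    sent_at: datetime
    processed_at: datetime

    @property
    def critical_time(self) -> timedelta:
        # One number covering queue wait, network transfer, and processing.
        return self.processed_at - self.sent_at


def breaches_threshold(timings, threshold: timedelta) -> bool:
    """Return True when the average critical time of the batch exceeds the threshold."""
    if not timings:
        # Nothing was processed, so there is no critical time to evaluate.
        return False
    total = sum((t.critical_time for t in timings), timedelta())
    return total / len(timings) > threshold


start = datetime(2024, 1, 1, 12, 0, 0)
fast = [MessageTiming(start, start + timedelta(seconds=1))]
slow = [MessageTiming(start, start + timedelta(seconds=30))]
limit = timedelta(seconds=10)  # assumed alerting threshold
```

Note the empty case: if no messages are being processed at all, a check like this has nothing to measure and stays silent.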
On top of that, I would always recommend also setting up an alert based on queue size, as @boblangley offered to help with, especially if your endpoints are not deployed in a fail-over / high-availability setup. This acts as a fail-safe in case no endpoint instances are running, but I would recommend setting up monitoring for critical applications/services too.