Service Insights doesn't show any messages that were discarded due to custom recoverability policy

seankearney · January 20, 2023, 10:33pm

Service Insights (SI) is part of the Particular platform designed to provide a

Complete under-the-hood visualization of a system’s behaviour

Unfortunately, we’ve run into a situation where the tool fails to deliver that visibility, creating a needle-in-a-haystack situation when trying to troubleshoot what happened. Here’s a scenario to provide context.

There are two endpoints: Sync and Transcoder. The Sync endpoint raises an event when a file has been handled, and the Transcoder endpoint processes the media file in response. The issue is that some files cannot be transcoded at all (DRM problems, incorrect file types with a wrong extension, etc.), so some files will fail. In that case, Transcoder throws an unrecoverable exception to avoid unnecessary retries.

A custom recoverability policy is defined for the Transcoder endpoint that instructs it to discard failed messages caused by unrecoverable exceptions. There’s no point in sending those messages to the error queue, since they cannot be processed no matter what. So the NSB framework does the right thing: an event raised by Sync fails processing in Transcoder, and the message is discarded because the exception is unrecoverable and the policy says to discard the failed message.

Except the real world poses a challenge. SI, when used to understand what happened, is absolutely blind and clueless to what took place. From SC’s perspective, it never saw the event from Sync to Transcoder because it never hit the audit or the error queue. But, as prod-ops, you’d really want to know that information. It feels like there’s a gap that needs to be filled to allow troubleshooting in production: being able to see that a message was published, received, attempted for processing, failed as unrecoverable, and eventually discarded.
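For readers unfamiliar with this kind of setup, a policy along these lines might look roughly like the sketch below. The post doesn't show the actual configuration, so this is only an illustration: the `TranscodingUnrecoverableException` type, the `TranscoderRecoverability` class, and the discard reason text are hypothetical names, and the sketch assumes an NServiceBus version that provides `RecoverabilityAction.Discard`.

```csharp
using System;
using NServiceBus;

// Hypothetical exception type thrown by the Transcoder handler
// when a file can never be transcoded (DRM, wrong file type, etc.).
public class TranscodingUnrecoverableException : Exception
{
    public TranscodingUnrecoverableException(string message) : base(message) { }
}

public static class TranscoderRecoverability
{
    // Hypothetical helper: call with the Transcoder endpoint's configuration.
    public static void Configure(EndpointConfiguration endpointConfiguration)
    {
        var recoverability = endpointConfiguration.Recoverability();

        recoverability.CustomPolicy((config, context) =>
        {
            // Discard messages that failed with the unrecoverable exception.
            // A discarded message goes to neither the error queue nor the
            // audit queue, which is why it is not visible afterwards.
            if (context.Exception is TranscodingUnrecoverableException)
            {
                return RecoverabilityAction.Discard("File cannot be transcoded.");
            }

            // Everything else falls back to the default retry behaviour.
            return DefaultRecoverabilityPolicy.Invoke(config, context);
        });
    }
}
```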

It would be great to see SI providing the necessary visibility.

DavidBoike · January 23, 2023, 3:06pm

Hi @seankearney,

That’s an interesting use case for sure. I’m curious, rather than using an UnrecoverableException in the message handler and then dealing with the fallout in the custom recoverability policy, why not catch the exception in the handler, and then log and return from the message handler so that the message can just be consumed naturally and then audited?

The reason why I ask is that it seems as if you’re using the custom recoverability policy to do part of your business logic rather than handle infrastructural tasks alone.
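A minimal sketch of the catch-and-log approach described above, assuming auditing is enabled on the endpoint. The `FileSynced` event, the `TranscodingUnrecoverableException` type, and the `Transcode` method are hypothetical stand-ins, not names from the actual system:

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;
using NServiceBus.Logging;

// Hypothetical event published by the Sync endpoint.
public class FileSynced : IEvent
{
    public string FilePath { get; set; }
}

// Hypothetical exception for files that can never be transcoded.
public class TranscodingUnrecoverableException : Exception
{
    public TranscodingUnrecoverableException(string message) : base(message) { }
}

public class FileSyncedHandler : IHandleMessages<FileSynced>
{
    static readonly ILog log = LogManager.GetLogger<FileSyncedHandler>();

    public async Task Handle(FileSynced message, IMessageHandlerContext context)
    {
        try
        {
            await Transcode(message.FilePath);
        }
        catch (TranscodingUnrecoverableException ex)
        {
            // Log and return normally: the message counts as successfully
            // processed, so it is audited rather than retried or discarded.
            log.Warn($"Skipping file that cannot be transcoded: {message.FilePath}", ex);
        }
    }

    // Placeholder for the real transcoding work.
    static Task Transcode(string filePath)
    {
        return Task.CompletedTask;
    }
}
```

Because the handler completes without throwing, recoverability never gets involved for these files; other exceptions still propagate and go through the normal retry and error queue path.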

seankearney · January 24, 2023, 12:17am

David,
Thanks for the reply.

After some conversations internally, as well as with Particular support, we have decided to go down ~~this~~ a similar route.

Thanks,
Sean

seankearney · January 24, 2023, 7:30pm

The documentation changes in "Clearer xmldoc on RecoverabilityActions" (Particular/NServiceBus pull request #6674, by DavidBoike) are certainly helpful to future folks. Thanks.