Environment.FailFast in OnCriticalErrorAsync does not restart Topshelf service on error (SQL Server down)

We recently upgraded to NServiceBus 8.2.X and noticed that when our service hits a critical error (in our case, our SQL Server goes down), the recoverability mechanism configured in Topshelf is not triggered.
Example:

x.EnableServiceRecovery(recovery =>
{
    recovery.OnCrashOnly();
    recovery.RestartService(delayInMinutes: 0); //Restart immediately
    recovery.RestartService(delayInMinutes: 1); //1 min delay
    recovery.RestartService(delayInMinutes: 5); // Corresponds to 'Subsequent failures': restart the service no matter how many times it crashes
    recovery.SetResetPeriod(1); // Reset the failure count after 1 day
});

Our OnCriticalErrorAsync is configured as recommended by NServiceBus:

    private static async Task OnCriticalErrorAsync(ICriticalErrorContext context, CancellationToken cancellationToken)
    {
        var fatalMessage =
            $"The following critical error was encountered:{Environment.NewLine}{context.Error}{Environment.NewLine}Process is shutting down. StackTrace: {Environment.NewLine}{context.Exception.StackTrace}";

        try
        {
            await context.Stop(cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            Environment.FailFast(fatalMessage, context.Exception);
        }
    }

We believe Environment.FailFast attempts a graceful shutdown and the correct exit code is not sent, so Topshelf does not attempt to restart the service.

I also tried throwing an exception instead of calling Environment.FailFast, but that does not seem to work either.

Any idea how we can solve this?

This is a Topshelf limitation. You can try using Environment.Exit(1) instead, but I would recommend moving away from Topshelf.

The reason FailFast is recommended is that it is guaranteed to stop the process, whereas with Exit, background threads could keep the process alive, which is unwanted in situations where we just want to abort.
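
For reference, a minimal sketch of the Environment.Exit(1) variant suggested above. It reuses the handler from the question and only swaps the last call; the class name CriticalErrorHandling is made up for illustration. Whether Topshelf's OnCrashOnly recovery actually reacts to this exit code has not been confirmed in this thread, so treat it as something to try rather than a verified fix.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical container class; only the handler body matters here.
internal static class CriticalErrorHandling
{
    public static async Task OnCriticalErrorAsync(ICriticalErrorContext context, CancellationToken cancellationToken)
    {
        try
        {
            // Give the endpoint a chance to stop cleanly first.
            await context.Stop(cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            // Exit with a non-zero code instead of calling Environment.FailFast.
            // Assumption: the non-zero exit code is what lets the service
            // recovery settings kick in; this is untested here.
            Environment.Exit(1);
        }
    }
}
```

Keep in mind the trade-off described above: Exit does not carry the same termination guarantee as FailFast, so it is worth checking that the process really goes away when SQL Server is down.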
