Hey folks, looking for some quick advice here. In a small project for a friend we’ve hit a weird situation where the NSB message pump appears to just stop: no errors, no exceptions, no logs. The app itself keeps running fine otherwise; you can attach a debugger and see its timer-based logic still going. But at random points, messages in the queue simply stop being processed. Restarting the app “solves” it, and the message queue then drains. Details as follows:
I’m helping some folks upgrade a well-baked, stable .NET Framework service that uses NSB to .NET 8. Normally this should be a quick deal (and at first it appeared so), but we’ve noticed the queue issue described above. Let me outline the packages, the moving parts, and what I did.
First of all, this is a .NET 8 console app targeting Windows (they’re an MS shop, so fair enough). It uses the Microsoft.Extensions.Hosting.WindowsServices 8.0 package as the base with the typical IHostBuilder.UseWindowsService() call, followed by NServiceBus.Extensions.Hosting 2.0.1 with the typical IHostBuilder.UseNServiceBus() call. Nothing fancy or out of the ordinary. This is fundamentally a very simple application.
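For reference, the wiring described above has roughly the following shape. This is a minimal sketch rather than the actual code: the endpoint name and connection string are placeholders, and persistence, recoverability, and critical error settings are left out.

```csharp
using Microsoft.Extensions.Hosting;
using NServiceBus;

// Placeholders for illustration only; not the real values.
const string endpointName = "Instance2";
const string connectionString = "Server=.;Database=nsb;Integrated Security=true;TrustServerCertificate=true";

var host = Host.CreateDefaultBuilder(args)
    // Lets the console app run under the Windows Service Control Manager.
    .UseWindowsService()
    // Starts and stops the NServiceBus endpoint together with the host.
    .UseNServiceBus(_ =>
    {
        var endpointConfiguration = new EndpointConfiguration(endpointName);
        endpointConfiguration.UseTransport(new SqlServerTransport(connectionString));
        endpointConfiguration.UseSerialization<NewtonsoftJsonSerializer>();
        // Persistence, recoverability and critical error settings omitted for brevity.
        return endpointConfiguration;
    })
    .Build();

await host.RunAsync();
```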
The rest of the NSB-related packages are:
-NServiceBus 8.2.4
-NServiceBus.Newtonsoft.Json 3.0.1
-NServiceBus.Persistence.Sql 7.0.6
-NServiceBus.SqlServer 7.0.9
Other important bits are:
-EntityFramework 6.5.1
-Microsoft.EntityFramework.SqlServer 6.5.1
(If you ask “Why EF and not EF Core?”: mainly because I wanted to focus on migrating the host first, rather than migrating their internal handler/entity logic at the same time.)
In the process I migrated handlers one or two at a time from the “instance 1” endpoint to the “instance 2” endpoint, slowly strangling the old implementation until (hopefully) all that’s left is the new migrated application and the old .NET Framework instance gets decommissioned.
So, immediate things to note:
Right as the app starts up, we’re making sure we’ve opted in to DTC support, and that EF uses the provider built on Microsoft.Data.SqlClient 6.0.1, so we’ve got that covered:
// Used to instruct .NET 8 that yes, we do want DTC
TransactionManager.ImplicitDistributedTransactions = true;
//Override to use new EF provider - must happen before any DB code
System.Data.Entity.DbConfiguration.SetConfiguration(new MicrosoftSqlDbConfiguration());
At first I figured “Oh, we failed to log and restart the service on a critical error”, but I’ve made sure that failed messages are being logged and that the critical error action is set. I also extended this to log recoverability events as well, like this:
// Recoverability settings
{
    var recoverability = endpointConfiguration.Recoverability();
    recoverability.Immediate(settings => settings.OnMessageBeingRetried((retry, ct) =>
        Logger.LogEventAsync(retry.Exception, Severity.Medium, $"Message {retry.MessageId} will be retried immediately.", ct)));
    recoverability.Delayed(settings => settings.OnMessageBeingRetried((retry, ct) =>
        Logger.LogEventAsync(retry.Exception, Severity.Medium, $"Message {retry.MessageId} will be retried after a delay.", ct)));
    recoverability.Failed(settings => settings.OnMessageSentToErrorQueue((failed, ct) =>
        Logger.LogEventAsync(failed.Exception, Severity.High, $"Message {failed.MessageId} will be sent to the error queue.", ct)));
    endpointConfiguration.DefineCriticalErrorAction(OnCriticalError);
    endpointConfiguration.Recoverability().Failed(s =>
    {
        s.OnMessageSentToErrorQueue(OnMessageSentToErrorQueue);
    });
}
The implementation of OnCriticalError is:
private static async Task OnCriticalError(ICriticalErrorContext context, CancellationToken cancellation)
{
    if (Debugger.IsAttached) Debugger.Break();
    var exception = context.Exception;
    var errorMessage = context.Error;
    var fatalMessage = $"Fatal Bus exception: {errorMessage}";
    await Logger.LogEventAsync(exception, Severity.Fatal, fatalMessage, cancellation);
    try
    {
        // Give the endpoint a chance to stop gracefully first.
        await context.Stop(cancellation).ConfigureAwait(false);
    }
    finally
    {
        // Terminate the process whether or not the stop succeeded.
        Environment.FailFast(fatalMessage, exception);
    }
}
Again, this is pretty standard stuff.
We never see a log about failures. We never get errors in the log. Nothing. Some number of messages get processed and then, suddenly, nothing.
I am 100% certain I am missing something, but for the life of me I can’t see what. Any ideas?
Have I missed another bit of logging or tracing that could uncover what is going on here?
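To make that concrete, the kind of extra tracing I have in mind is raising NServiceBus log output to Debug, along these lines. This is an untested sketch: it assumes NServiceBus.Extensions.Hosting forwards NServiceBus log output to Microsoft.Extensions.Logging under categories starting with "NServiceBus", and that whatever log providers are registered will actually write Debug entries somewhere visible.

```csharp
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

var hostBuilder = Host.CreateDefaultBuilder(args)
    .ConfigureLogging(logging =>
    {
        // Leave other categories alone, but let NServiceBus categories through at Debug.
        logging.AddFilter("NServiceBus", LogLevel.Debug);
    });

// The UseWindowsService() / UseNServiceBus(...) calls would continue from hostBuilder as before.
```

If that assumption holds, this should show whether the pump is still polling the queue table when messages stop flowing.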
Thanks in advance!