Endpoint is missing when NServiceBus creates it

We are seeing some pretty weird issues when starting NServiceBus. Specifically, it seems some of the queues NServiceBus is supposed to create are missing.
The specific exception we get is this:

2020-08-04 17:50:52.487 FATAL Receiver Main failed to start.
RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text='NOT_FOUND - no queue 'batch.event2job-instance-1' in vhost 'rhdh2'', classId=60, methodId=20
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.BasicConsume(String queue, Boolean autoAck, String consumerTag, Boolean noLocal, Boolean exclusive, IDictionary`2 arguments, IBasicConsumer consumer)
   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicConsume(String queue, Boolean autoAck, String consumerTag, Boolean noLocal, Boolean exclusive, IDictionary`2 arguments, IBasicConsumer consumer)
   at RabbitMQ.Client.IModelExensions.BasicConsume(IModel model, String queue, Boolean autoAck, String consumerTag, IBasicConsumer consumer)
   at NServiceBus.Transport.RabbitMQ.MessagePump.Start(PushRuntimeSettings limitations) in /_/src/NServiceBus.Transport.RabbitMQ/Receiving/MessagePump.cs:line 113
   at NServiceBus.TransportReceiver.Start()
   at NServiceBus.ReceiveComponent.Start()

04-08-2020 17:50:52 Critical: Application startup exception {} 
   at Microsoft.AspNetCore.Hosting.HostingLoggerExtensions.ApplicationError(ILogger logger, EventId eventId, String message, Exception exception)
   at Microsoft.AspNetCore.Hosting.HostingLoggerExtensions.ApplicationError(ILogger logger, Exception exception)
   at Microsoft.AspNetCore.Hosting.GenericWebHostService.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Event2Job.Program.Main(String[] args)
   at Event2Job.Program.<Main>(String[] args)



Unhandled exception. RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text='NOT_FOUND - no queue 'batch.event2job-instance-1' in vhost 'rhdh2'', classId=60, methodId=20
   at RabbitMQ.Client.Impl.SimpleBlockingRpcContinuation.GetReply(TimeSpan timeout)
   at RabbitMQ.Client.Impl.ModelBase.BasicConsume(String queue, Boolean autoAck, String consumerTag, Boolean noLocal, Boolean exclusive, IDictionary`2 arguments, IBasicConsumer consumer)
   at RabbitMQ.Client.Impl.AutorecoveringModel.BasicConsume(String queue, Boolean autoAck, String consumerTag, Boolean noLocal, Boolean exclusive, IDictionary`2 arguments, IBasicConsumer consumer)
   at RabbitMQ.Client.IModelExensions.BasicConsume(IModel model, String queue, Boolean autoAck, String consumerTag, IBasicConsumer consumer)
   at NServiceBus.Transport.RabbitMQ.MessagePump.Start(PushRuntimeSettings limitations) in /_/src/NServiceBus.Transport.RabbitMQ/Receiving/MessagePump.cs:line 113
   at NServiceBus.TransportReceiver.Start()
   at NServiceBus.ReceiveComponent.Start()
   at NServiceBus.StartableEndpoint.Start()
   at NServiceBus.HostingComponent.Start(IStartableEndpoint startableEndpoint)
   at NServiceBus.ExternallyManagedContainerHost.Start(IBuilder externalBuilder)
   at Core.Messaging.ServiceBus.Queue.EndpointFactory.StartNServiceBus(IServiceProvider services) in C:\BuildAgent\work\9e2bcdf0585101e4\Libs\Core.Messaging\ServiceBus.Queue\EndpointFactory.cs:line 181
   at ServiceBase.Setup.BaseStartup`1.PreConfigureAsync(IApplicationBuilder app) in C:\BuildAgent\work\9e2bcdf0585101e4\Libs\ServiceBase\Setup\BaseStartup.cs:line 244
   at ServiceBase.Setup.BaseStartup`1.Configure(IApplicationBuilder app) in C:\BuildAgent\work\9e2bcdf0585101e4\Libs\ServiceBase\Setup\BaseStartup.cs:line 140
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.AspNetCore.Hosting.ConfigureBuilder.Invoke(Object instance, IApplicationBuilder builder)
   at Microsoft.AspNetCore.Hosting.ConfigureBuilder.<>c__DisplayClass4_0.<Build>b__0(IApplicationBuilder builder)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.<>c__DisplayClass13_0.<UseStartup>b__2(IApplicationBuilder app)
   at Microsoft.AspNetCore.Mvc.Filters.MiddlewareFilterBuilderStartupFilter.<>c__DisplayClass0_0.<Configure>g__MiddlewareFilterBuilder|0(IApplicationBuilder builder)
   at Microsoft.AspNetCore.Server.IIS.Core.IISServerSetupFilter.<>c__DisplayClass2_0.<Configure>b__0(IApplicationBuilder app)
   at Microsoft.AspNetCore.HostFilteringStartupFilter.<>c__DisplayClass0_0.<Configure>b__0(IApplicationBuilder app)
   at Microsoft.AspNetCore.Hosting.GenericWebHostService.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Event2Job.Program.Main(String[] args) in C:\BuildAgent\work\9e2bcdf0585101e4\Apps\event2job\Program.cs:line 11
   at Event2Job.Program.<Main>(String[] args)

Looking in RabbitMQ, that queue is indeed missing.

Is there a known issue here? And should we just retry the start operation? Manually restarting the service makes the queue appear, so it apparently does get created properly.

Hi @Rasmus_Hansen,

Do you have installers enabled in your endpoint configuration (endpointConfiguration.EnableInstallers();)?

@Tim Yep

    var endpointConfiguration = new EndpointConfiguration(name);
    endpointConfiguration.EnableInstallers();
    endpointConfiguration.MakeInstanceUniquelyAddressable(instanceId);
    endpointConfiguration.EnableCallbacks();

> Is there something known here? And should we just retry the start operation? Manually restarting the service makes the queue appear, so it does get created properly (Apparently)

Are you saying that whether the queue gets created depends on whether you start the endpoint manually (what would this look like?) or as part of ASP.NET?

NServiceBus does not delete queues, so it seems more likely that something else has removed the queue after the endpoint had been started?

> Are you saying it makes a difference whether you start the endpoint manually (what would this look like?) or as part of ASP.NET whether the queue gets created or not?

We do not use your dotnet core setup library, since we need to add some dynamically registered middleware.

We use the following code to actually start the endpoint:

```
var lifecycle = services.GetRequiredService<IHostApplicationLifetime>();
var startableEndpoint = services.GetRequiredService<IStartableEndpointWithExternallyManagedContainer>();
var options = services.GetRequiredService<ActualEndpointOptions>();
var logger = services.GetRequiredService<ILogger<EndpointFactory>>();

IEndpointInstance endpoint = null;
// Use less than or equal, so we can easily know if it was the last time
// Also we don't really care that much if we retry an extra time
for (var i = 0; i <= MaxRetryCount; i++)
{
    try
    {
        logger.LogInformation("Starting NServiceBus");
        endpoint = await startableEndpoint.Start(new ServiceProviderAdapter(services));
        logger.LogInformation("Started NServiceBus");
        break;
    }
    catch (BrokerUnreachableException exception)
    {
        // The retry count needs a named placeholder in the template to show up in the log
        logger.LogWarning(exception, "Failed to connect to RabbitMQ, waiting for retry... (retryCount: {RetryCount})", i);
        await Task.Delay(TimeSpan.FromMinutes(1));
        logger.LogInformation("Waiting finished, retrying... (retryCount: {RetryCount})", i);

        if (i == MaxRetryCount)
        {
            logger.LogCritical(exception, "Failed to start NServiceBus, giving up on life");
        }
    }
}
```

So I can just change it to catch a general exception, which I assume would work around the issue thanks to the retrying.
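As a rough illustration of that idea, here is a minimal sketch of a generic retry helper that only retries exceptions the caller marks as retryable, instead of catching everything. `StartupRetry` and `StartWithRetry` are hypothetical names, not part of NServiceBus, and the sketch does not verify whether a startable endpoint can safely be started again after a failed attempt.

```csharp
using System;
using System.Threading.Tasks;

public static class StartupRetry
{
    // Hypothetical helper (not an NServiceBus API): runs an async start
    // operation and retries it while isRetryable(exception) returns true
    // and fewer than maxRetryCount retries have been used.
    public static async Task<T> StartWithRetry<T>(
        Func<Task<T>> start,
        Func<Exception, bool> isRetryable,
        int maxRetryCount,
        TimeSpan delay)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await start();
            }
            catch (Exception exception) when (attempt < maxRetryCount && isRetryable(exception))
            {
                // Only retryable failures with retries left land here;
                // anything else propagates to the caller unchanged.
                await Task.Delay(delay);
            }
        }
    }
}

// Possible usage with the code above (assumes the RabbitMQ client exception types):
// var endpoint = await StartupRetry.StartWithRetry(
//     () => startableEndpoint.Start(new ServiceProviderAdapter(services)),
//     ex => ex is BrokerUnreachableException || ex is OperationInterruptedException,
//     MaxRetryCount,
//     TimeSpan.FromMinutes(1));
```

Using a predicate keeps the retry narrow: unexpected exceptions such as configuration errors still fail fast rather than being retried for several minutes.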

> NServiceBus does not delete queues, so it seems more likely that something else has removed the queue after the endpoint had been started?

That's what is making me wonder. It seems more like the queue doesn't get properly created, since it happened on a system where nothing was running (a local VM).

How does the NServiceBus.Extensions.Hosting package prevent you from registering middleware?

> That’s what is making me wonder, it seems more like the queue doesn’t get properly created, since it happened on a system where nothing was running (A local VM).

With installers enabled, the endpoint shouldn’t be able to start without successfully creating the queues.
It would be useful to have access to a repro sample or the full source code of your project. As you might not want to share this on this public forum, you can also open a support case and provide that information via private channels for further investigation.

> How does the NServiceBus.Extensions.Hosting package prevent you from registering middleware?

We provide some hooks for when we are setting up the endpoint, where packages can register themselves during startup. They are then allowed to do things like adding middleware or setting up some internal routing they need to talk to their origin service.

Furthermore, back when the package came out, we were still hosting two endpoints in each service (because there is no way to send a message to all instances of a given service otherwise). This is no longer relevant, since we have since then gotten a separate solution for this.

Lastly, the ExternallyManagedContainerHost configures the message session in a very strange way: if you try to use it before it is completely ready, it stays broken until the service is restarted (where the same problem will likely repeat). So we manage that outside of NSB with a wrapper, such that it cannot even be accessed before it has actually been started. The problem here is specifically that Lazy<T>, if it encounters an exception when trying to get the value, “caches” that exception, so you can’t even keep retrying until it’s available.
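The Lazy<T> behaviour described here can be reproduced in isolation. The following is a minimal sketch using only the .NET base library; `LazyCachingDemo` is a hypothetical name, and the sketch says nothing about how the NServiceBus host is actually implemented. It counts how often the value factory runs when its first two calls throw and `Value` is read three times.

```csharp
using System;
using System.Threading;

public static class LazyCachingDemo
{
    // Reads Lazy<string>.Value three times with a factory that throws on its
    // first two invocations, and returns how often the factory actually ran.
    public static int CountFactoryCalls(LazyThreadSafetyMode mode)
    {
        var calls = 0;
        var lazy = new Lazy<string>(() =>
        {
            calls++;
            if (calls < 3)
            {
                throw new InvalidOperationException("not ready yet");
            }
            return "ready";
        }, mode);

        for (var i = 0; i < 3; i++)
        {
            try
            {
                _ = lazy.Value;
            }
            catch (InvalidOperationException)
            {
                // Ignored: only the number of factory invocations matters here.
            }
        }

        return calls;
    }
}
```

With `LazyThreadSafetyMode.ExecutionAndPublication` (the default) the factory runs once and the same exception is rethrown on every later read. With `LazyThreadSafetyMode.PublicationOnly` exceptions are not cached, so the factory is invoked again on each read until it succeeds.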

> With installers enabled, the endpoint shouldn’t be able to start without successfully creating the queues.
> It would be useful to have access to a repro sample or the full source code of your project. As you might not want to share this on this public forum, you can also open a support case and provide that information via private channels for further investigation.

I have not been able to reproduce this consistently since then; I just happened to get the logs for this once :confused:. So sadly I cannot send a minimal case, and trying to ship the entire solution is probably not going to work, since it depends on some internal NuGet packages.

So would it be safe to just retry in the above code, if the OperationInterruptedException happens again?

Have you considered using a Feature for this? Features are pulled in automatically and can extend the pipeline and more during startup. See the "Features" page in the NServiceBus section of the Particular Docs.

> Lastly the ExternallyManagedContainerHost configures the message session in a very strange way, such that if you try to use it before it is completely ready, it stays broken until the service is restarted

AFAIK this is being fixed with the next release.

> So would it be safe to just retry in the above code, if the OperationInterruptedException happens again?

I can’t guarantee this, since to my knowledge it isn’t explicitly verified via tests, but I’m not aware of an obvious problem with this approach.

By the way, seeing that you are using IStartableEndpointWithExternallyManagedContainer and a ServiceProvider adapter, is there a reason you’re not using the types provided by NServiceBus.Extensions.DependencyInjection that allow you to use ServiceCollection and ServiceProvider directly via EndpointWithExternallyManagedServiceProvider?

> Have you considered using a Feature for this? Features are pulled in automatically and can extend the pipeline and more during startup.

Even in external assemblies? That would make it easy then.

> AFAIK this is being fixed with the next release.

That would be very nice. Do you have a GitHub ticket I can subscribe to?

> By the way, seeing that you are using IStartableEndpointWithExternallyManagedContainer and a ServiceProvider adapter, is there a reason you’re not using the types provided by NServiceBus.Extensions.DependencyInjection that allow you to use ServiceCollection and ServiceProvider directly via EndpointWithExternallyManagedServiceProvider?

Those types are from NServiceBus.Extensions.Hosting, whose sources I imported into our project so I could access the internal classes and bootstrap NSB myself. I wrote this code before NServiceBus.Extensions.DependencyInjection was a thing. Back then the only alternative for the .NET Core DI was a third-party package, which created a separate DI instance, thus causing singletons to not be singletons. But it might be worth looking into that new package instead.

> Even in external assemblies? That would make it easy then.

NServiceBus by default scans all assemblies deployed to the same folder for features. You only have to make sure the external assembly is deployed correctly.
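To make the Feature suggestion concrete, here is a minimal sketch of a feature that an external assembly could ship, assuming the NServiceBus 7 `Feature` and `Behavior<T>` APIs. `PackageStartupFeature` and `LogIncomingBehavior` are hypothetical names used for illustration only, and the behavior is just a stand-in for whatever middleware a package needs to add.

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;
using NServiceBus.Features;
using NServiceBus.Pipeline;

// Hypothetical feature living in an external package assembly.
public class PackageStartupFeature : Feature
{
    public PackageStartupFeature()
    {
        // Activate the feature without the host having to opt in explicitly.
        EnableByDefault();
    }

    protected override void Setup(FeatureConfigurationContext context)
    {
        // Add a pipeline behavior (middleware) while the endpoint is being set up.
        context.Pipeline.Register(new LogIncomingBehavior(), "Logs the type of each incoming message");
    }
}

// Hypothetical behavior: logs the message type, then continues the pipeline.
public class LogIncomingBehavior : Behavior<IIncomingLogicalMessageContext>
{
    public override async Task Invoke(IIncomingLogicalMessageContext context, Func<Task> next)
    {
        Console.WriteLine($"Handling {context.Message.MessageType.Name}");
        await next();
    }
}
```

The sketch relies on assembly scanning finding the feature, so it only takes effect if the assembly containing it is deployed next to the endpoint.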

> That would be very nice. Do you have a GitHub ticket I can subscribe to?

Here’s the PR that should address this: "Improved message session management and better exception thrown when used too early" by andreasohlund (Pull Request #85 in Particular/NServiceBus.Extensions.Hosting on GitHub).

> Those types are from NServiceBus.Extensions.Hosting, whose sources I imported into our project so I could access the internal classes

That’s basically what NServiceBus.Extensions.DependencyInjection is for. It provides public adapter classes and allows you to manage the DI container yourself (Extensions.Hosting assumes the DI container is managed by the Microsoft Generic Host).
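For reference, a minimal sketch of what bootstrapping with that package might look like, assuming the `EndpointWithExternallyManagedServiceProvider` API from NServiceBus.Extensions.DependencyInjection. The `Bootstrap` class and its `Start` method are hypothetical, and the transport configuration is deliberately omitted, so the endpoint would not start as written without it.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using NServiceBus;

public static class Bootstrap
{
    // Hypothetical helper: creates and starts an endpoint against a
    // service collection that the application manages itself.
    public static async Task<IEndpointInstance> Start(string name, IServiceCollection serviceCollection)
    {
        var endpointConfiguration = new EndpointConfiguration(name);
        endpointConfiguration.EnableInstallers();
        // Transport (e.g. RabbitMQ) configuration is omitted in this sketch.

        // Lets NServiceBus add its own registrations to the shared collection.
        var startableEndpoint = EndpointWithExternallyManagedServiceProvider.Create(
            endpointConfiguration, serviceCollection);

        // The application builds the provider, so singletons are shared
        // between NServiceBus and the rest of the service.
        var serviceProvider = serviceCollection.BuildServiceProvider();

        return await startableEndpoint.Start(serviceProvider);
    }
}
```

Because the application owns the provider, no custom adapter class or imported internal source is needed for this part of the startup.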