Request/Reply that hangs

jgauffin · February 20, 2020, 9:54am

I’ve started getting a weird problem that I can’t figure out.

I’ve been making our client more asynchronous, and during that work the endpoint instance stopped returning a reply.

I can see that the microservice sends the reply, and if I stop the client the messages go to the error queue (but not before). They get the typical “No handler could be found” message.

NSB config:

var endpointName = $"{_clientEndpointNameDefault}.Control2";
var endpointConfiguration = new EndpointConfiguration(endpointName);
endpointConfiguration.DisableFeature<AutoSubscribe>();
endpointConfiguration.UsePersistence<InMemoryPersistence>();
endpointConfiguration.MakeInstanceUniquelyAddressable("Callbacks");
endpointConfiguration.EnableCallbacks();
endpointConfiguration.Recoverability().Delayed(x => x.NumberOfRetries(0));
endpointConfiguration.Recoverability().Immediate(x => x.NumberOfRetries(0));

var settings = JsonSettingsFactory.Create();
var serialization = endpointConfiguration.UseSerialization<NewtonsoftSerializer>();
serialization.Settings(settings);

var transport = endpointConfiguration.UseTransport<RabbitMQTransport>();
transport.ConnectionString(GetQueueConnectionString());
transport.UseConventionalRoutingTopology();

And the simple call:

var options = new SendOptions();
options.SetDestination(RoutingModuleName);
var query = new GetRouteTable();
var routes = await endpoint.Request<GetRouteTableResult>(query, options).ConfigureAwait(false);

From the receiving microservice log:

2020-02-20 09:44:55,767 DEBUG@20 [(null)] [LogInboundMessagesHandler:0] [218a976d-d781-482b-85da-ab6700a0a7c1] EnclosedMessageTypes: GetRouteTable;LAIV.Foundation.Contracts;v1.0.6.0, ReplyToAddress: LAIV.Client.TRV-PC84655.Control2-Callbacks, CorrelationId: 218a976d-d781-482b-85da-ab6700a0a7c1, OriginatingMachine: TRV-PC84655, OriginatingEndpoint: LAIV.Client.TRV-PC84655.Control2, NServiceBus.Version: 7.2.0, TimeSent: 2020-02-20 09:44:55:801963 Z
2020-02-20 09:44:55,767 DEBUG@20 [(null)] [LogOutboundMessagesHandler:0] [d2d83dce-76db-430d-b3f5-ab6700a0a7db] EnclosedMessageTypes: GetRouteTableResult;LAIV.Foundation.Contracts;v1.0.6.0, ReplyToAddress: Routing, CorrelationId: 218a976d-d781-482b-85da-ab6700a0a7c1, OriginatingMachine: TRV27124, OriginatingEndpoint: Routing

Update:

If I add a simple handler:

    public class Handler : IHandleMessages<GetRouteTableResult>
    {
        public Task Handle(GetRouteTableResult message, IMessageHandlerContext context)
        {
            Console.WriteLine("Got it!");
            return Task.CompletedTask;
        }
    }

It will receive the message (the correlation id in the request and reply matches), so the message is delivered to the endpoint.

Update 2

I’ve also tried:

endpoint.Request<GetRouteTableResult>(query, options)
                .ContinueWith(x => { Console.WriteLine("gotit!"); });
await Task.Delay(5000).ConfigureAwait(false);

The delay completes, but the ContinueWith callback never runs.

Can you help me figure out why it doesn’t return when the reply arrives? Is there any way that I can diagnose why the reply isn’t used by Request<T>?
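While diagnosing, one way to turn the silent hang into an explicit failure is to put a timeout around the awaited request. The sketch below is plain .NET with no NServiceBus types; `RequestTimeouts` and `WithTimeout` are hypothetical helper names. It races the request task against `Task.Delay` and throws a `TimeoutException` if the reply never completes the task. (The Callbacks documentation also describes cancelling a request through a `CancellationToken`; check it for your version.)

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RequestTimeouts
{
    // Hypothetical helper: awaits `task`, but throws TimeoutException if it has
    // not completed within `timeout`. The original task itself keeps running.
    public static async Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            var delay = Task.Delay(timeout, cts.Token);
            var winner = await Task.WhenAny(task, delay).ConfigureAwait(false);
            if (winner != task)
            {
                throw new TimeoutException(
                    $"No reply within {timeout.TotalSeconds:0.#} s; the callback was never completed.");
            }

            // The request finished first: cancel the pending delay to release its timer.
            cts.Cancel();

            // Await (rather than .Result) so a faulted request rethrows its original exception.
            return await task.ConfigureAwait(false);
        }
    }
}
```

In the client this would read `await endpoint.Request<GetRouteTableResult>(query, options).WithTimeout(TimeSpan.FromSeconds(10)).ConfigureAwait(false)`. A timeout does not explain why the reply is not matched, but it makes the failure visible and gives a place to log the request’s correlation id.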

danielmarbach · February 24, 2020, 7:21am

Hi Jonas,

That’s weird. Looking at the log, we can see that you get the request with the correct reply-to address. Am I right in assuming that the return object is not an enum but a rich, object-based message? If it were an enum, you would have to enable callback support on the replier side as well:

endpointConfiguration.EnableCallbacks(makesRequests: false);

I tried to reproduce your case by slightly modifying the Callback Usage sample (Callbacks Samples, Particular Docs) to send only object callbacks and not use the EnableCallbacks call on the receiver side. It seems to work fine.

Here is the repro I used (Dropbox link; the file has since been deleted).

Are your response messages properly marked as messages, either by implementing IMessage or by matching the message convention?

Would it be possible to send us a minimal repro at support at particular dot net so that we can have a look at it?

Thanks
Daniel

jgauffin · February 25, 2020, 3:40pm

I’ve finally found the root cause.

In my other thread, I’ve slimmed down which assemblies NSB can load during startup.

The issue was that NServiceBus.Callbacks.dll was excluded.

Today the documentation says:

During the scanning process, the core assembly for NServiceBus (NServiceBus.Core.dll) is automatically included since it is required for endpoints to properly function.

You might want to add that the Callbacks assembly must be included for requests to function properly. It was a bit hard to find the cause, since the request just hangs and the log only prints “No handler can be found for [TheResultMessage]”.

To reproduce it, take any of your samples and do the following:

endpointConfiguration.AssemblyScanner().ExcludeAssemblies("NServiceBus.Callbacks.dll");

(In our code, we loop through all AppDomain assemblies and exclude all that are not handler assemblies.)
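A sketch of such an exclusion loop that never excludes NServiceBus’s own assemblies (Core, Callbacks, transport, serializer), assuming handler assemblies are identified by a hand-maintained allow-list of file names. `ScanFilter`, `BuildExclusions` and the allow-list are hypothetical; only `AssemblyScanner().ExcludeAssemblies(...)` is the NServiceBus call used in the repro line above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ScanFilter
{
    // Returns the file names to pass to AssemblyScanner().ExcludeAssemblies(...).
    // Everything is excluded except the listed handler assemblies and any
    // NServiceBus* assembly, since those can contain features (such as the
    // callback support) that are only activated when the assembly is scanned.
    // Create `handlerAssemblies` with StringComparer.OrdinalIgnoreCase to
    // match file names case-insensitively.
    public static string[] BuildExclusions(
        IEnumerable<string> dllPaths,
        ISet<string> handlerAssemblies)
    {
        return dllPaths
            .Select(path => Path.GetFileName(path))
            .Where(name => !handlerAssemblies.Contains(name))
            .Where(name => !name.StartsWith("NServiceBus", StringComparison.OrdinalIgnoreCase))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

Usage, assuming the DLLs are read from the bin folder: `var dlls = Directory.GetFiles(AppContext.BaseDirectory, "*.dll");` followed by `endpointConfiguration.AssemblyScanner().ExcludeAssemblies(ScanFilter.BuildExclusions(dlls, handlerAssemblies));`. Keeping the whole `NServiceBus*` prefix is a deliberately coarse rule: it would have kept NServiceBus.Callbacks.dll without anyone having to know it was needed.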

Personally, I think the whole assembly process is backwards. Why not just have an AddHandlerAssembly method in the assembly scanner (meaning that no assemblies would be added automatically)? Most devs know which assemblies to include, but it’s harder to understand which ones must be excluded. It would also improve startup time: everyone who wants to exclude assemblies must scan through all assemblies (or files) to exclude everything that doesn’t contain handlers, while with AddHandlerAssembly no scanning is required at all (and you typically want to include far fewer assemblies than you must exclude, especially in .NET Core projects).

danielmarbach · February 27, 2020, 12:23pm

Hi Jonas

By default, any downstream package that we provide can contain features that get activated through assembly scanning. In general, we try to enforce references by having extension methods that enable the features being used. For example, the callbacks package enables the features it uses when you call endpointConfiguration.EnableCallbacks(). That means your code must have a reference to the callbacks assembly.

As you rightfully pointed out, assembly scanning is an individual component that you can control. So if you choose to exclude things by default (for performance optimizations) and then exclude an assembly that contains features, the core code will no longer activate them, and those features become unusable.
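Why does excluding the assembly fail silently rather than at compile time? Below is a deliberately simplified model of the mechanism described above: features are only discovered among scanned types, while the extension method merely records that a feature should be on. All types here (`Feature`, `CallbackFeature`, `EndpointConfig`, `CallbackConfigExtensions`) are hypothetical stand-ins, not NServiceBus’s actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical model; not NServiceBus's real feature activation code.
public abstract class Feature
{
    public abstract string Name { get; }
}

public sealed class CallbackFeature : Feature
{
    public override string Name => "Callbacks";
}

public sealed class EndpointConfig
{
    // Names of features the user asked for (e.g. via EnableCallbacks()).
    public HashSet<string> EnabledFeatures { get; } = new HashSet<string>();

    // Only features that are both enabled AND found among the scanned types start.
    public IReadOnlyList<string> Activate(IEnumerable<Type> scannedTypes)
    {
        return scannedTypes
            .Where(t => typeof(Feature).IsAssignableFrom(t) && !t.IsAbstract)
            .Select(t => (Feature)Activator.CreateInstance(t))
            .Where(f => EnabledFeatures.Contains(f.Name))
            .Select(f => f.Name)
            .ToList();
    }
}

public static class CallbackConfigExtensions
{
    // Compiles and runs even if the scanner later skips this assembly:
    // it only flips a setting, it does not register the feature type itself.
    public static void EnableCallbacks(this EndpointConfig config) =>
        config.EnabledFeatures.Add("Callbacks");
}
```

With `typeof(CallbackFeature)` among the scanned types, `Activate` returns `"Callbacks"`; with it excluded, `EnableCallbacks()` still succeeds but nothing is activated. In this model the reply would then fall through to ordinary handler lookup, which would be consistent with the hang plus “No handler could be found” seen in this thread.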

“Personally I think that the whole assembly process is backwards. Why not just have a AddHandlerAssembly method in the assembly scanner (which means that no assemblies should be automatically added)”

We are always trying to find the right balance between making common cases easy and allowing more advanced scenarios. As with any black magic such as assembly scanning, some people are in favor of those approaches and some are not. What we have seen, though, is that most of our customers like that we use scanning to find handlers and wire them up automagically. That doesn’t mean we couldn’t think about a no-scanning mode where handler and feature assemblies have to be registered explicitly.

I raised that as a feature request: Add explicit handler assembly registration mode instead of relying on scanning all assemblies (Issue #5595, Particular/NServiceBus on GitHub).

Regards
Daniel

jgauffin · March 4, 2020, 8:40am

I’m at the client, so I don’t have my GitHub account here.

With my assembly scanning exclusion: [startup profiling screenshot not preserved]

Default assembly scanning: [startup profiling screenshot not preserved]

The difference is 9 seconds. It’s AssemblyName.GetAssemblyName() that takes most of the time.
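To reproduce a measurement like this outside the endpoint, the following sketch times `AssemblyName.GetAssemblyName` over every DLL in a folder (for example the bin folder) using `Stopwatch`. `ScanTiming` and `MeasureGetAssemblyName` are hypothetical names; the sketch only isolates the call the profile above pointed at and does not reproduce NServiceBus’s full scanning logic.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;

public static class ScanTiming
{
    // Calls AssemblyName.GetAssemblyName for every *.dll in `directory` and returns
    // how many managed assemblies were read plus the total elapsed time.
    public static (int Count, TimeSpan Elapsed) MeasureGetAssemblyName(string directory)
    {
        var files = Directory.GetFiles(directory, "*.dll");
        var stopwatch = Stopwatch.StartNew();
        var count = 0;
        foreach (var file in files)
        {
            try
            {
                AssemblyName.GetAssemblyName(file);
                count++;
            }
            catch (BadImageFormatException)
            {
                // Native DLLs (no .NET metadata) end up here; skip them.
            }
            catch (FileLoadException)
            {
                // Files that cannot be opened or loaded are skipped as well.
            }
        }
        stopwatch.Stop();
        return (count, stopwatch.Elapsed);
    }
}
```

For example, `var (count, elapsed) = ScanTiming.MeasureGetAssemblyName(AppContext.BaseDirectory);` run once with the full bin folder and once with only the handler and NServiceBus assemblies present gives a rough idea of how much of the startup difference this one call accounts for.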

danielmarbach · March 4, 2020, 10:06am

Hi Jonas,

Is the endpoint startup time an actual problem for you? How many times do you restart the endpoint? Is it something that bothers you in production or in your integration tests?

Regards
Daniel

jgauffin · March 4, 2020, 10:13am

Yes, it’s a problem. We have a non-functional requirement regarding startup time that cannot be changed. 4 seconds is almost OK; 13 is not.

I’ve used reflection a lot in different projects, and I don’t really understand why you do it the way you do.

However, I can improve other parts of the startup time so that we meet our requirement. This post is just to point out that your assembly handling is slow and could be improved considerably.

It’s not a requirement from me; the startup time is OK now that I’ve excluded almost all assemblies in the bin folder. So this is my last post on this issue.

danielmarbach · March 4, 2020, 10:29am

We are always open to suggestions, hints, or even pull requests if you see obvious things that could be improved based on your experience.

Regards
Daniel