Is there a connection retry for the transport?

I’m running into an issue when starting a compose file in Docker: the RabbitMQ container starts right away but isn’t ready for a couple of seconds, so when my NSB applications start up, they crash fairly quickly because they can’t connect to the transport. If I run the docker-compose file again, everything is fine because Rabbit is ready by then. There’s a workaround in the Docker community where you add a shell script that keeps pinging an endpoint before starting. This is super kludgy imo. The recommended practice is that the application retries the connection on its own.

I’m assuming NSB has this functionality but I haven’t been able to locate it. The only docs I’ve come across so far talk about retries for the messages themselves, which is something different. So, is there a way to set the retry policy for the connection to the transport?

Hi Jeremy,

This is the behavior if the endpoint loses the connection to the transport while it is already running. In your scenario the endpoint is still starting, and if a connection cannot be made, startup fails.

If this is required, I would suggest self-hosting your endpoint and wrapping your start in a loop with an incremental delay. I quickly created the following extension method that does this:

using System;
using System.Threading.Tasks;
using NServiceBus;

static class StartableEndpointExtensions
{
    // Starts the endpoint, retrying with exponential back-off when startup fails.
    public static async Task<IEndpointInstance> StartWithRetry(
        this IStartableEndpoint startableEndpoint,
        int maxAttempts = 10
        )
    {
        var attempts = 0;
        while (true)
        {
            try
            {
                return await startableEndpoint.Start()
                    .ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Give up once maxAttempts retries have been used.
                if (attempts >= maxAttempts) throw;
                var delay = 100 * (int)Math.Pow(2, attempts++); // Exponential back-off: 100, 200, 400, ... ms
                await Console.Out.WriteLineAsync($"Startup failed, retry attempt {attempts} reason: {ex.Message}")
                    .ConfigureAwait(false);
                await Task.Delay(delay)
                    .ConfigureAwait(false);
            }
        }
    }
}

Instead of calling the Start method directly, use this extension:

var startableEndpoint = await Endpoint.Create(endpointConfiguration)
    .ConfigureAwait(false);
var endpointInstance = await startableEndpoint.StartWithRetry()
    .ConfigureAwait(false);

Just copy the extension method to your project and tweak it to your needs. The current implementation does at most 10 retries with an increasing delay (100, 200, 400, 800 ms, etc.). I suggest that you do not use catch(Exception) but instead catch only a specific list of exceptions that you want to retry.
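One way to follow the advice about not catching every exception is a C# exception filter. The sketch below is not NServiceBus API: RetryHelper, RetryAsync, isTransient, and baseDelayMs are hypothetical names, and which exception types count as transient for your transport is something you would have to determine yourself.

```csharp
using System;
using System.Threading.Tasks;

static class RetryHelper
{
    // Hypothetical helper, not part of NServiceBus.
    // Retries `action` up to `maxRetries` times, but only for exceptions
    // that `isTransient` accepts; any other exception propagates immediately.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> action,
        Func<Exception, bool> isTransient,
        int maxRetries = 10,
        int baseDelayMs = 100)
    {
        var retries = 0;
        while (true)
        {
            try
            {
                return await action().ConfigureAwait(false);
            }
            catch (Exception ex) when (retries < maxRetries && isTransient(ex))
            {
                // Exponential back-off: 100, 200, 400, ... ms with the defaults.
                // Assumes maxRetries stays small enough that the shift does not overflow.
                var delay = baseDelayMs * (1 << retries);
                retries++;
                await Console.Out.WriteLineAsync(
                        $"Startup failed, retry {retries} of {maxRetries}: {ex.Message}")
                    .ConfigureAwait(false);
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}
```

Pass whatever start call you are retrying as action, and a predicate such as ex => ex is TimeoutException as isTransient. That exception type is only a placeholder; check what your transport actually throws when the broker is unreachable.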

This doesn’t seem to work because the connection exception is thrown during Endpoint.Create(endpointConfiguration). I will modify the extension to work with the configuration and see if that works.

If I change it to the following, it doesn’t work because on each subsequent attempt, it throws a different exception stating that the settings have been locked for modification. I’m not sure how to rearrange this to work.

public static async Task<IEndpointInstance> StartWithRetry(
    this EndpointConfiguration configuration,
    int maxAttempts = 10)
{
    var attempts = 0;
    while (true)
    {
        try
        {
            return await Endpoint.Start(configuration).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            if (attempts > maxAttempts) throw;
            var delay = 100 * (int) Math.Pow(2, attempts++);
            await Console.Out.WriteLineAsync($"Startup failed, retry attempt {attempts} reason: {ex.Message}")
                .ConfigureAwait(false);

            await Task.Delay(delay).ConfigureAwait(false);
        }
    }
}

If you want to use Endpoint.Start(configuration), then the endpoint configuration must also be created inside the retry method, so that each attempt gets a fresh configuration.

I purposefully used the Create and Start APIs because I assumed the queues and database schemas would already exist, in which case .EnableInstallers() does not need to be called. Calling it is probably the reason you are currently getting exceptions.

Either:

  • Remove .EnableInstallers() when you initialize the EndpointConfiguration
  • Add the creation of EndpointConfiguration to this method
  • Pass a Func<EndpointConfiguration> to this method so that it creates a new endpoint configuration on each attempt

Implementation that uses a func:

static class EndpointHelper
{
    // Builds a fresh EndpointConfiguration for every attempt, because a
    // configuration cannot be reused once a start has been attempted with it.
    public static async Task<IEndpointInstance> CreateWithRetry(
        Func<EndpointConfiguration> createEndpointConfiguration,
        int maxAttempts = 10
        )
    {
        var attempts = 0;
        while (true)
        {
            try
            {
                var endpointConfiguration = createEndpointConfiguration();
                return await Endpoint.Start(endpointConfiguration)
                    .ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Give up once maxAttempts retries have been used.
                if (attempts >= maxAttempts) throw;
                var delay = 100 * (int)Math.Pow(2, attempts++); // Exponential back-off in ms
                await Console.Out.WriteLineAsync($"Startup failed, retry attempt {attempts} reason: {ex.Message}")
                    .ConfigureAwait(false);

                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}

You could use it like this:

var endpointInstance = await EndpointHelper.CreateWithRetry(() => CreateEndpointConfiguration())
    .ConfigureAwait(false);

static EndpointConfiguration CreateEndpointConfiguration()
{
    var cfg = new EndpointConfiguration("MyEndpoint");
    cfg.EnableInstallers();
    cfg.UseTransport<LearningTransport>();
    cfg.UsePersistence<LearningPersistence>();
    //etc...
    return cfg;
}

Hi Jeffrey,

I validated the behavior with RabbitMQ. If you do not call endpointConfiguration.EnableInstallers() and then call Endpoint.Create(endpointConfiguration), Create will not try to connect to RabbitMQ.

However, this does mean that you have to make sure that any queues or schemas in your database(s) are already created. We recommend using .EnableInstallers() only during a specific installation/upgrade stage and not when you just start the endpoint. This separates the installation from the actual message processing, which is often required by IT operations.

In general, the installation part needs administrative privileges. These are not required for processing messages, which you want to run with least privilege.
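As a rough sketch of that split, the endpoint could enable installers only when it is started with an explicit switch. The "--install" switch and the EndpointConfigurationFactory name are hypothetical, made up for illustration; the configuration calls are the same ones used in the example above.

```csharp
using System;
using System.Linq;
using NServiceBus;

static class EndpointConfigurationFactory
{
    // "--install" is a hypothetical command-line switch, not an NServiceBus feature.
    // The installation/upgrade stage passes it (and runs with elevated privileges);
    // the normal message-processing start does not.
    public static EndpointConfiguration Create(string[] args)
    {
        var cfg = new EndpointConfiguration("MyEndpoint");
        cfg.UseTransport<LearningTransport>();
        cfg.UsePersistence<LearningPersistence>();

        if (args.Contains("--install"))
        {
            // Only create queues and schemas when explicitly asked to.
            cfg.EnableInstallers();
        }

        return cfg;
    }
}
```

The same binary can then be run once with the switch during deployment and without it afterwards, so no code has to be removed between the two stages.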

Hope this additional info helps you understand the current behavior of these methods.

Regards,
Ramon

I ended up writing a helper method similar to what you described. Using the factory delegate, it simply creates a new endpoint configuration on each attempt until it succeeds. In this use case, I need EnableInstallers because this is intended for use in a Docker cluster. That is, each time we stand up a new stack, it will be creating new queues (e.g. local sandbox environments, QA environments, feature test environments, etc.). Thanks!

I did want to mention that .EnableInstallers() impacts startup performance a bit because it needs to inspect the infrastructure. However, that makes sense in your case, as it will always be used to initialize a completely fresh sandbox.

Just keep in mind that your endpoint will run with elevated privileges and could therefore also drop databases, tables, queues, etc. That is probably less suitable for production environments if you want a locked-down execution environment.

What is the recommended practice for installing, then? I can’t imagine the intent is to call EnableInstallers(), deploy to production, let it run, and then remove the code and redeploy to production.