Is there a way to pause an endpoint?

I’m used to hosting processes that handle queue messages in Azure WebJobs, where it is easy to go to the Azure portal and stop a WebJob.

Is there a similar way to do this with NSB endpoints hosted in ServiceFabric?

No, there isn’t any Pause feature yet.
One option is to stop and restart the endpoint using the same configuration. Be aware that there is a bug that prevents an EndpointConfiguration instance from being reused under certain circumstances.

I know that @ramonsmits has some suggestions on this topic.

.m

The workaround for the EndpointConfiguration reuse bug is outlined here:

Cheers,

Andreas

OK. I read that thread and it makes sense in itself, but I’m not sure how I’d use this to pause and continue the processing of messages. What mechanism could I use to trigger the stop of the endpoint, and what to start it again?

There is an explicit API to stop and tear down the endpoint:

await endpointInstance.Stop().ConfigureAwait(false);

It comes down to just completely recreating the instance.
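To make "stop, then recreate" concrete, here is a minimal sketch. RestartableEndpoint is a hypothetical helper, not an NServiceBus type. The two delegates are stand-ins: in a real host, the start delegate would build a fresh EndpointConfiguration and call Endpoint.Start, and the stop delegate would call endpointInstance.Stop(). Building a new configuration on every start also sidesteps the reuse problem mentioned earlier.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of NServiceBus): keeps the process alive
// while the wrapped endpoint instance is stopped and later recreated.
public sealed class RestartableEndpoint<TInstance> where TInstance : class
{
    private readonly Func<Task<TInstance>> start;   // assumed: creates and starts a new instance
    private readonly Func<TInstance, Task> stop;    // assumed: gracefully stops an instance
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private volatile TInstance instance;

    public RestartableEndpoint(Func<Task<TInstance>> start, Func<TInstance, Task> stop)
    {
        this.start = start;
        this.stop = stop;
    }

    public bool IsRunning => instance != null;

    // Starts a brand-new instance unless one is already running.
    public async Task Resume()
    {
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (instance == null)
            {
                instance = await start().ConfigureAwait(false);
            }
        }
        finally
        {
            gate.Release();
        }
    }

    // Stops the running instance, if any, and forgets it.
    public async Task Pause()
    {
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (instance != null)
            {
                await stop(instance).ConfigureAwait(false);
                instance = null;
            }
        }
        finally
        {
            gate.Release();
        }
    }
}
```

The semaphore only serializes concurrent Pause and Resume calls, so a second Resume while the endpoint is running does nothing.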

What I read in your question is that you would like to see a web dashboard where you can stop, start, or restart an instance. We do not have tools to control your endpoint instances. In most scenarios, we advise hosting only a single endpoint instance in a process and gracefully stopping the endpoint, which also stops the process. Starting would then just be launching the process again. That is the easiest approach and works well with any sort of host process or appdomain, like a Windows Service, an IIS AppPool, etc. I’m not familiar enough with ServiceFabric to know how that is controlled there.

I see. The problem with Service Fabric is that as soon as it sees that the process is not running, it will start another one. That’s why I think the process should stay alive and well, but the endpoint should stop. So, if I can start and stop the endpoint instance, I might be able to figure out how to do it, although it’d be great to know if someone has already achieved this.

Regarding the dashboard, I don’t particularly need a ready-made dashboard, because this will have to be embedded in our ops dashboard. That’s better than having several places from which to control the system.

Hi Francesc,

There is nothing like a Pause out of the box in Service Fabric. You can pause a node, which means:

Pause - effectively “pauses” the node: Services on it will continue to run, but no services should move in or out of the node unless they fail on their own, or unless moving a service to the node is necessary to prevent outage or inconsistency.

But that is probably not what you want since it affects the whole cluster.

If you require endpoints to stop and then restart, you have to build a management service that dynamically deploys services (starts the endpoint) and removes the service when it is no longer needed (stops the endpoint). When the service is undeployed and removed, all the state associated with that service goes as well.

Regards
Daniel

Francesc,

To clarify, what process are you talking about?

When you use Service Fabric SDK with stateful or stateless services those will be co-hosted inside the same “process” and share a common x64 address space. So any static data is shared. That is by design.

Endpoint Communication listeners are designed to deal with things starting, stopping and restarting or being made primary. Anything that needs to hook into that lifecycle should be implemented as a communication listener as shown in our hosting guidance.

Does that help?

Regards
Daniel

Hi Daniel, I think we need a lighter approach. I probably haven’t given enough thought to this, but there is a difference between pausing a whole service and pausing just parts of it. For example, we could pause a single ICommunicationListener, and the service could have several (remoting, plus messaging).

I’m just thinking that I could have the NSB communication listener periodically check a flag somewhere (I don’t know where yet), and if that flag is set, stop the endpoint and restart it when the flag is cleared.
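A rough sketch of that polling idea, under stated assumptions: PauseFlagPoller and all three delegates are hypothetical. isPauseRequested stands for whatever ends up storing the flag, and pause and resume stand for stopping and recreating the endpoint. The loop assumes the endpoint is running when polling begins.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical polling loop: reads a flag at a fixed interval and calls
// pause/resume only when the flag changes state.
public static class PauseFlagPoller
{
    public static async Task Run(
        Func<Task<bool>> isPauseRequested,  // assumed: reads the flag from some central store
        Func<Task> pause,                   // assumed: stops the endpoint
        Func<Task> resume,                  // assumed: recreates and starts the endpoint
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        var paused = false; // the endpoint is assumed to be running initially

        while (!cancellationToken.IsCancellationRequested)
        {
            var requested = await isPauseRequested().ConfigureAwait(false);

            if (requested && !paused)
            {
                await pause().ConfigureAwait(false);
                paused = true;
            }
            else if (!requested && paused)
            {
                await resume().ConfigureAwait(false);
                paused = false;
            }

            try
            {
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break; // the host is shutting down; stop polling
            }
        }
    }
}
```

Because every instance runs its own loop against the same flag, this would cover any number of instances without addressing them individually, at the cost of a delay of up to one polling interval.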

Exactly, Daniel, what I believe I need is a way to tell the NSB communication listener to stop the endpoint or restart it.

Can you elaborate why you think you need this? I’m curious to understand the business reason behind it.

In the past I’ve found myself stopping webjobs (to prevent them from processing messages) for at least two reasons:

  1. There is a bug or issue that makes processing the messages harmful. Then I want to stop the processing while we fix the issue.
  2. I want to debug some handlers running in a hosted environment (develop, staging or even production), not in my own local environment. The problem is that the services running in the environment consume all the messages faster than I can debug them. Then I simply stop the hosted service and run it locally so I can pick up the messages one by one in my VS.

An alternative that I think might be simpler to implement is for the service to check a setting at startup that specifies whether a listener should be created. If I configure a service from a central location not to start the NSB listener, then I just need to set this setting and restart all instances of the given service (assuming there is an easy way to do this in SF). This would avoid having to monitor that setting all the time and to implement something that restarts an endpoint.

Hi Francesc

Would this help?

It uses Service Fabric Remoting to start and stop the endpoint in the cluster

Have a great day

Daniel

Looks interesting. Much nicer than polling a setting or restarting the service. But what happens when there are multiple instances of the service? This will only stop one, right? Either the stop method should stop the other instances, or we should enumerate all the instances, create a proxy for each (I’m just assuming that this is possible) and stop them one by one from our Ops application.

Hi Francesc,

Well, in general with Stateless Services, when you have multiple instances the remoting Service Proxy will initiate the communication with one of the available instances, chosen at random. That is by design.

If you want to address a specific instance, it is usually recommended to turn that service into a stateful service. Then you can address an individual partition by providing the partition key to the service client, like so

ServiceProxy.Create<IStartableAndStoppableService>(serviceUri, new ServicePartitionKey("partitionKey"))

You can query the available partitions by using the partition list available via the query manager on the fabric client.

If turning it into a stateful service is not an option, you could try to uniquely name the communication listener and then specify the listener name when you create the proxy, as shown below

ServiceProxy.Create<IStartableAndStoppableService>(serviceUri, listenerName: "name")

One way to do it is to abuse the instance id

new ServiceInstanceListener(c => communicationListener, $"NServiceBus-{Context.InstanceId}")

and then you can query for the deployed instances with the fabric client

var fabricClient = new FabricClient();
var queryManager = fabricClient.QueryManager;
// Walk every node in the cluster and look at what is deployed on it.
var nodes = await queryManager.GetNodeListAsync();
foreach (var node in nodes)
{
    var replicas = await queryManager.GetDeployedReplicaListAsync(node.NodeName, new Uri("fabric:/Application1"));
    // Keep only the stateless instances of the service type we want to control.
    var instances = replicas.OfType<DeployedStatelessServiceInstance>()
        .Where(c => c.ServiceTypeName == "Stateless1Type");

    foreach (var serviceInstance in instances)
    {
        // The listener name matches the one registered with the instance id above.
        var anotherProxy = ServiceProxy.Create<IStartableAndStoppableService>(serviceUri, listenerName: $"NServiceBus-{serviceInstance.InstanceId}");
        await anotherProxy.Start();
    }
}

But be aware of the following caveats with the instance id:

InstanceId is generally only meant for internal consumption, and it identifies the entire service package. That means if you have multiple code packages (executables) in your service package, they will all get the same ID.

So this is a bit hacky at best. But it should show you that all the info you require is available and queryable, and that things get easier when you use the right service type for the right job :wink:

Regards
Daniel

Hi Daniel,
thanks for this.

Yes, it looks a bit hacky. But I don’t think being able to pause the endpoints is a good reason to make the services stateful… That would be hacky too, IMO.

Maybe my original idea of polling a setting would be better than this in the end, because it would work automatically with any number of instances.