What is the best way to delete inactive endpoints automatically?
Both from Heartbeats in ServicePulse and the Endpoint list in ServiceInsight (is it the same source, maybe?).
We have a retention policy (24 h) in our short-retention environment. It would be nice if endpoints were cleaned up as well; the messages are cleaned up automatically.
I was thinking about calling Delete on the ServiceControl API; is there any simpler way to achieve this?
The reason they become inactive is that they run in a container as part of a pull request; every container has a unique computer name, and the container is automatically stopped after 4 hours.
Each branch consumes an isolated queue (using a postfix in the name).
One option would be to subscribe to ServiceControl integration events and use the HeartbeatStopped event to determine when to invoke the delete API in ServiceControl.
@mauroservienti wouldn't deleting a perceived inactive endpoint based on a single HeartbeatStopped event be risky? If heartbeats fail within the heartbeat grace period, and that period is exceeded while the endpoint is still alive, the endpoint would be removed while still running.
That's correct; in a rush I didn't provide more details. The endpoint that subscribes to ServiceControl events should have a saga with a timeout that allows it to invoke the (undocumented) HTTP API after a user-defined period of time.
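The saga-with-timeout idea above can be sketched language-agnostically: remember when a heartbeat stopped, cancel the pending delete if the heartbeat is restored, and only invoke the delete API once a user-defined grace period has passed. This is a minimal Python sketch, not an NServiceBus saga; the event names mirror the ServiceControl integration events, while the class name, the `delete_endpoint` callback, and the polling approach are assumptions for illustration.

```python
import time


class DelayedEndpointCleanup:
    """Sketch of the delayed-delete logic: delete an endpoint only after
    its heartbeat has been missing for a full grace period."""

    def __init__(self, grace_period_seconds, delete_endpoint):
        self.grace_period = grace_period_seconds
        self.delete_endpoint = delete_endpoint  # e.g. issues the HTTP DELETE
        self.stopped_at = {}                    # host_id -> timestamp

    def on_heartbeat_stopped(self, host_id, now=None):
        # Start the "timeout": remember when the heartbeat went missing.
        self.stopped_at[host_id] = now if now is not None else time.time()

    def on_heartbeat_restored(self, host_id):
        # The endpoint came back within the grace period; cancel the delete.
        self.stopped_at.pop(host_id, None)

    def poll(self, now=None):
        # Called periodically; deletes endpoints whose heartbeat has been
        # missing longer than the user-defined grace period.
        now = now if now is not None else time.time()
        expired = [h for h, t in self.stopped_at.items()
                   if now - t >= self.grace_period]
        for host_id in expired:
            self.delete_endpoint(host_id)
            del self.stopped_at[host_id]
        return expired
```

In NServiceBus terms, the equivalent would be a saga started by HeartbeatStopped that requests a timeout and marks itself complete if a HeartbeatRestored arrives first.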
ServicePulse issues a DELETE request to the ServiceControl API at endpoints/&lt;endpoint-id&gt;; see the source code here:
The endpoint ID is not its name; it's the endpoint instance's unique identifier, and it's available, for example, in the HeartbeatStopped ServiceControl integration event as HostId.
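Put together, the call could look like the following Python sketch. The default ServiceControl API base URL shown in the comment and the success status code are assumptions; verify both (and the exact route) against your ServiceControl version, since the API is undocumented.

```python
import urllib.request


def endpoint_delete_url(base_url, host_id):
    # Builds e.g. http://localhost:33333/api/endpoints/<endpoint-id>,
    # where <endpoint-id> is the HostId from the HeartbeatStopped event.
    return f"{base_url.rstrip('/')}/endpoints/{host_id}"


def delete_endpoint(base_url, host_id):
    # Issues the same DELETE request ServicePulse sends.
    request = urllib.request.Request(
        endpoint_delete_url(base_url, host_id), method="DELETE")
    with urllib.request.urlopen(request) as response:
        return response.status
```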
I have the same issue, but with instances rather than endpoints.
Every time it auto-scales from 1 to 10 instances and back to 1, I need to do some cleanup of inactive endpoints and instances.
But I see in the post by @ramonsmits that there is already a GitHub issue for that.
A retention period to remove dead endpoints/instances for known scaling endpoints sounds good, but I think it is the wrong fix.
I think it would be better to be able to configure a minimum instance count in a policy and only report when the number of instances drops below it. I only want to be alerted when that happens.
I don’t want to clean up dead instances in bulk or after a retention period.