I’m trying to configure heartbeats generically so that every endpoint sends them, with critical endpoints using a fast heartbeat interval (say, 30 seconds) and less critical endpoints using a slower one (say, 2 minutes).
But configuring the heartbeat this way doesn’t seem to have any real effect, because whether an endpoint is considered down is determined by a single setting in ServiceControl: HeartbeatGracePeriod. The documentation has the following note:
When monitoring multiple endpoints, ensure that heartbeat grace period is larger than any individual heartbeat interval set by the endpoints.
So I will need to set this value to, let’s say, 2 minutes and 10 seconds. Doesn’t this mean that I am actually tuning the alarms for the less critical endpoints and ignoring the critical ones?
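To make the concern concrete, here is a rough Python sketch of the arithmetic. It assumes an endpoint is flagged as down once no heartbeat has arrived for the full grace period, and it ignores network latency; the function name and the numbers are mine, not anything from ServiceControl.

```python
def detection_delay_range(interval_s: int, grace_s: int) -> tuple[int, int]:
    """Return (best, worst) seconds between an endpoint dying and being flagged.

    Assumption: the monitor flags an endpoint once grace_s seconds have
    passed since its last heartbeat, regardless of the endpoint's interval.
    """
    # Best case: the endpoint dies just before its next heartbeat was due,
    # so interval_s of the grace period has already elapsed.
    best = grace_s - interval_s
    # Worst case: the endpoint dies right after sending a heartbeat,
    # so the full grace period has to elapse.
    worst = grace_s
    return best, worst


GRACE_S = 130  # 2 minutes and 10 seconds, shared by all endpoints

critical = detection_delay_range(30, GRACE_S)     # 30-second heartbeat
background = detection_delay_range(120, GRACE_S)  # 2-minute heartbeat
print(critical, background)
```

Under these assumptions the critical endpoint is flagged somewhere between 100 and 130 seconds after it dies, even though it sends a heartbeat every 30 seconds.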
An alternative implementation would be for each heartbeat message to state when the next heartbeat is expected. This way every endpoint would trigger the alarm at the right time, and the interval could even be scheduled (for example, faster heartbeats during certain periods of the day), although I’m not sure whether that would be useful.
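A minimal Python sketch of that idea, just to show the logic I have in mind. Everything here is hypothetical: the Heartbeat type, the next_expected_in field, the HeartbeatMonitor class and the 10-second tolerance are illustrative names and values, not part of the real heartbeat message or of ServiceControl.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Heartbeat:
    endpoint: str
    sent_at: datetime
    # Hypothetical field: how long until this endpoint promises its next heartbeat.
    next_expected_in: timedelta


class HeartbeatMonitor:
    """Tracks one deadline per endpoint instead of one global grace period."""

    def __init__(self, tolerance: timedelta = timedelta(seconds=10)):
        self.tolerance = tolerance  # slack for delivery delays (assumed value)
        self.deadlines: dict[str, datetime] = {}

    def record(self, hb: Heartbeat) -> None:
        # Each heartbeat moves the endpoint's own deadline forward.
        self.deadlines[hb.endpoint] = hb.sent_at + hb.next_expected_in + self.tolerance

    def down_endpoints(self, now: datetime) -> list[str]:
        # An endpoint is considered down once its own deadline has passed.
        return sorted(name for name, deadline in self.deadlines.items() if now > deadline)


start = datetime(2024, 1, 1, 12, 0, 0)
monitor = HeartbeatMonitor()
monitor.record(Heartbeat("critical", start, timedelta(seconds=30)))
monitor.record(Heartbeat("background", start, timedelta(minutes=2)))

# 45 seconds after the last heartbeats, only the critical endpoint has missed its deadline.
print(monitor.down_endpoints(start + timedelta(seconds=45)))  # ['critical']
```

Because the expected time travels with each message, an endpoint could also change its own interval over the day without any change on the monitoring side.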