Using external monitoring to flag errors in endpoints (Nagios/Xymon/etc.)

Anton_Sigurjonsson · August 22, 2019, 4:45pm

Hi,

We need to monitor our instance(s) with an external monitoring tool and raise a flag whenever something needs the OP team's attention.

Is it possible to use something as simple as a PowerShell script, or any code for that matter, to read (every x minutes) the status we can currently see on the Pulse website? We are looking for the failed messages / message groups, preferably broken down by endpoint.

We will then feed this information into our external monitoring tools, so that we get a flag when there are errors to review and can keep a long-term status history.

I didn’t find any relevant information on how to query this data.

I know we have the performance counters, but they only show what’s actually being processed.

danielmarbach · August 27, 2019, 12:04pm

Hi

We have a bunch of samples that show how you can use the live data and feed it into a monitoring tool of your liking. For example

Another option would be to write it to the trace and then use that to feed your monitoring tool.

Does that help?

Regards
Daniel

Anton_Sigurjonsson · August 27, 2019, 5:10pm

OK. Thank you for the documentation; it did not pop up during my search. It might be a more extensive solution than my current problem needs.

I was poking around in ServicePulse and found something closer to what I was looking for: I can make an HTTP request and get JSON back for different groups of messages.

In PowerShell, I run:
$obj = Invoke-RestMethod -Uri 'http://localhost:33333/api/errors?status=unresolved'
Write-Host "Total unresolved messages: $($obj.Count)"

Wrapping this into a monitoring script, I can monitor the total number of failed (unresolved) messages, and someone will look into the issue when it is above zero. We aim for zero unresolved messages in Pulse.
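As a rough sketch of what such a wrapper could look like, the snippet below maps the unresolved count to the exit codes that Nagios-style plugins conventionally use (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names `Get-CheckResult` and `Invoke-UnresolvedCheck` and the thresholds are hypothetical, chosen only for illustration. It also assumes the endpoint returns every unresolved message in a single response; if the API pages its results, the count would be capped at the page size.

```powershell
# Hypothetical helper: maps an unresolved-message count to a Nagios-style
# status. The thresholds are illustrative assumptions, not values from the thread.
function Get-CheckResult {
    param(
        [int]$Count,
        [int]$WarningAt = 1,
        [int]$CriticalAt = 10
    )
    if ($Count -ge $CriticalAt)    { $code = 2; $label = 'CRITICAL' }
    elseif ($Count -ge $WarningAt) { $code = 1; $label = 'WARNING' }
    else                           { $code = 0; $label = 'OK' }

    [pscustomobject]@{
        ExitCode = $code
        Message  = "$label - $Count unresolved failed message(s)"
    }
}

# Hypothetical wrapper around the request shown above. Any failure to reach
# or parse the API is reported as UNKNOWN (3) instead of a false OK.
function Invoke-UnresolvedCheck {
    param([string]$BaseUri = 'http://localhost:33333/api')
    try {
        $response = Invoke-RestMethod -Uri "$BaseUri/errors?status=unresolved"
        # Normalise to an array so .Count is reliable for 0, 1, or many items.
        $errors = @($response)
        Get-CheckResult -Count $errors.Count
    }
    catch {
        [pscustomobject]@{
            ExitCode = 3
            Message  = "UNKNOWN - $($_.Exception.Message)"
        }
    }
}
```

A scheduled check could then call `Invoke-UnresolvedCheck`, print the `Message` property, and `exit` with the `ExitCode` property so the monitoring tool picks up the state.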

I could also run a quick loop and group them by message type etc. for slightly more advanced monitoring.

Does anyone see a downside to monitoring this way for failed (unresolved) messages on a given node?
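One way the grouping could be sketched is with `Group-Object`, shown here against sample data rather than a live response. The helper name `Get-FailureSummary` is hypothetical, and the field name `message_type` is an assumption about the JSON shape; it should be checked against what the instance actually returns.

```powershell
# Hypothetical helper: counts failed messages per value of one property.
# 'message_type' is an assumed field name; inspect the JSON your instance
# returns and adjust it.
function Get-FailureSummary {
    param(
        [object[]]$Errors,
        [string]$Property = 'message_type'
    )
    $Errors |
        Group-Object -Property $Property |
        Sort-Object -Property Count -Descending |
        ForEach-Object {
            [pscustomobject]@{ Name = $_.Name; Count = $_.Count }
        }
}

# Illustrative sample data standing in for the API response.
$sample = @(
    [pscustomobject]@{ message_type = 'Sales.PlaceOrder' },
    [pscustomobject]@{ message_type = 'Billing.BillOrder' },
    [pscustomobject]@{ message_type = 'Sales.PlaceOrder' }
)

# Emits one row per message type, most frequent first.
Get-FailureSummary -Errors $sample
```

Each row could be reported to the monitoring tool as its own metric. Grouping per endpoint would follow the same pattern, provided the response carries a field that identifies the endpoint.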

danielmarbach · August 30, 2019, 12:44pm

Anton,

Just as an FYI: the APIs that we expose are not officially supported. If you rely on an API being there and returning data in a certain shape, that might break between releases.

Regards
Daniel

MarcS · September 6, 2019, 8:05am

We are looking for something similar. We have a number (10+) of ServiceControl instances, each owned by a different team. During the day, these teams monitor their own ServicePulse instance to see what is going on in their system.

At night we have only one operator, who needs to monitor more than 100 different applications and systems on different platforms (mainframe, databases, servers in multiple data centers, …). It is impossible for them to also open these ServicePulse instances to check whether there are any failed messages that need urgent attention.

These operators currently use a monitoring system (I don’t know which) that can make web calls and act on the result returned. At the moment they are also querying the different ServiceControl APIs, but they would prefer an officially supported solution.