ServiceControl Retried Failed Message Twice?

MarkBricker · February 28, 2023, 9:43pm

We recently had a message go into the error queue in production. We used ServicePulse to retry the message and it processed successfully. About 90 minutes later we noticed the same message was processed by the NServiceBus endpoint, resulting in a customer being contacted twice about the same matter.

After doing a fair bit of investigation, we found that the IIS logs for ServicePulse (we host ServicePulse and associated apps in IIS) indicated that the processing of the message the 2nd time originated from ServicePulse, likely from the same client machine that requested the first (successful) retry.

I have a couple of questions:

Is it possible for ServicePulse / ServiceControl to successfully retry a failed message that has already been successfully retried?
Any idea what action in ServicePulse would result in this? Would refreshing the browser window at just the right time/place result in a duplicate form or AJAX post?

Thanks,
Mark

dan.kent · March 2, 2023, 8:34pm

Hi Mark,

I don’t want to say it is impossible, but it is highly unlikely that ServiceControl will retry a message that has already been successfully retried. Once a message is retried, it is removed from the error queue and ServiceControl no longer has any knowledge of it.

One possible scenario: the message failed, was retried, and failed again, which would put it back in the error queue. Depending on the logic in the handler and when the failure happened, it is possible some of the code was executed before it failed.

You mentioned the 2nd processing of the message originated from ServicePulse. Are you able to see where the 1st processing of the message originated?

MarkBricker · March 3, 2023, 12:12am

Dan,

Thanks for getting back to me. The sequence of events was this:

Sunday morning about 5:54 a.m. one of our web services sent a command to one of our endpoints.
Almost immediately, the endpoint picked up the command, but our database server was having issues, so the command was retried multiple times until it ended up in the error queue.
Monday morning about 10:23 a.m. one of my fellow architects used ServicePulse to resubmit the message. I was present for this and I also think I see a record of this in our ServicePulse IIS log (note my previous times were local and this is GMT):
2023-02-27 19:23:01 XXXX.XX.XXX.XXX POST /api/errors/retry
By all accounts, the message processed successfully which caused one of our customers to receive a notification.
Sometime later Monday our customer service team was contacted by a customer saying they had received two notifications from us of the same event.
Various logs indicate the same exact message, with the same message ID, was processed again at 11:53 AM, including the ServicePulse IIS log:
2023-02-27 18:53:23 XXXX.XX.XXX.XXX POST /api/errors/retry

Thanks for your help on this,
Mark

dan.kent · March 7, 2023, 6:44pm

Is it possible that the message failed late in the pipeline (after the handlers had completed), so that the customer notification had already been sent from the handler?

MarkBricker · March 7, 2023, 7:17pm

I hope not, as we try to build our systems so that if they fail, we don’t do partial work, but let me investigate that possibility and get back to you.

-Mark

ramonsmits · March 8, 2023, 12:02pm

@MarkBricker Just to mention a potential improvement regardless of the error retry behavior via ServicePulse. Non-transactional API invocations are usually best dealt with in isolation. Sending emails is such a type of invocation.

If your system is currently sending out an email as part of other tasks, then we recommend offloading this to a separate handler. This way such an “email message” will only be sent to the queue once all invoked handlers have completed successfully. This further minimizes duplicate API invocations.
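To illustrate the idea, here is a minimal, framework-agnostic sketch in Python. It is not NServiceBus code; `HandlerContext`, `process`, and `update_order` are hypothetical names used only to show that an outgoing “email message” is buffered while handlers run and is only dispatched when every handler has completed without an exception.

```python
class HandlerContext:
    """Buffers outgoing messages while handlers run (hypothetical helper)."""

    def __init__(self):
        self.pending = []

    def send(self, message):
        # Nothing leaves the process yet; the message is only buffered.
        self.pending.append(message)


def process(incoming, handlers, dispatch):
    """Run every handler; dispatch buffered messages only if all succeed."""
    context = HandlerContext()
    for handler in handlers:
        # An exception here propagates and the buffer is never dispatched.
        handler(incoming, context)
    for message in context.pending:
        dispatch(message)
    return len(context.pending)


def update_order(incoming, context):
    # Business work would happen here; the email is requested, not sent.
    context.send({"type": "SendNotificationEmail", "order": incoming["order"]})
```

With this shape, a failure in any handler means the request to send the email never reaches the queue, and the separate email handler only ever deals with the single non-transactional call.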

Second, when sending emails it’s recommended to set the Message-ID SMTP header to a deterministic value. If you use an “email message”, you could use the message identifier for the id part and extend it with your system’s FQDN, similar to the following:

<01000186c105c49a-6b215eef-ca24-4588-a8fd-9dc478325b2e-000000@email.amazonses.com>

This way the receiving email application can also deduplicate on its side, which should prevent duplicates from showing up in the mail client.
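As a rough sketch of how such a header could be derived, assuming the email is built with Python's standard `email` package: the function names, the GUID, and the `notifications.example.com` domain below are made up for illustration and are not taken from your system.

```python
from email.message import EmailMessage


def deterministic_message_id(message_id: str, fqdn: str) -> str:
    """Build a Message-ID from the queue message identifier and the system FQDN."""
    return f"<{message_id}@{fqdn}>"


def build_notification(message_id, fqdn, sender, recipient, subject, body):
    """Create an email whose Message-ID depends only on the queue message."""
    email = EmailMessage()
    # The same queue message always yields the same Message-ID header.
    email["Message-ID"] = deterministic_message_id(message_id, fqdn)
    email["From"] = sender
    email["To"] = recipient
    email["Subject"] = subject
    email.set_content(body)
    return email


# Hypothetical values for illustration only.
first = build_notification(
    "6b215eef-ca24-4588-a8fd-9dc478325b2e", "notifications.example.com",
    "noreply@example.com", "customer@example.com", "Your order", "Details...")
second = build_notification(
    "6b215eef-ca24-4588-a8fd-9dc478325b2e", "notifications.example.com",
    "noreply@example.com", "customer@example.com", "Your order", "Details...")
```

Because both emails are built from the same message identifier, they carry an identical Message-ID. Whether a duplicate is actually suppressed still depends on the receiving mail system.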

If your transport is configured in “receive only” transaction mode, it is also highly recommended to use the Outbox feature to guarantee consistency.
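For intuition only, the general outbox pattern can be sketched as below. This is a heavily simplified in-memory illustration, not how the NServiceBus Outbox is implemented or configured; `process_with_outbox` and the dictionary-based store are hypothetical. In a real system the record would be written in the same database transaction as the business data.

```python
def process_with_outbox(message_id, handler, store, dispatch):
    """Run the handler at most once per message ID, then dispatch its output."""
    record = store.get(message_id)
    if record is None:
        # First time this ID is seen: run the business logic and remember
        # which outgoing messages it produced.
        record = {"outgoing": handler(), "dispatched": False}
        store[message_id] = record
    if not record["dispatched"]:
        # Either the first attempt, or a retry after a crash before dispatch.
        for message in record["outgoing"]:
            dispatch(message)
        record["dispatched"] = True


store, sent, calls = {}, [], []

def handler():
    calls.append("ran")
    return [{"type": "SendNotificationEmail"}]

# The same message ID arrives twice, for example after a retry.
process_with_outbox("msg-1", handler, store, sent.append)
process_with_outbox("msg-1", handler, store, sent.append)
```

In this sketch the second delivery of `msg-1` neither re-runs the handler nor dispatches the email request again.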

– Ramon