ServiceControl - A message import has failed. A log file has been written

servicecontrol
nservicebus

(Ron) #1

First, a quick walkthrough of what I'm trying to accomplish.
I've set up an NServiceBus endpoint that calls a WCF service on another server. We've purposely broken the reference to the service to help our support team practice replaying messages through ServicePulse. The expected outcome is that all 12 messages will exhaust first-level retries (FLR) after 3 attempts and end up in ServicePulse for our team to replay.

What is happening is different. Across several attempts we are seeing anywhere from 0 to 3 or 4 messages showing up in ServicePulse, never the full 12. I've verified connectivity and that ServiceControl is working as expected.

While troubleshooting this issue, I signed into our ServiceControl/ServicePulse server and noticed something in the event logs. For each message that didn't show up in ServicePulse, there is an Error event (source: ServiceControl) in the event logs. Each one states:
A message import has failed. A log file has been written to C:\ProgramData\Particular\ServiceControl\Particular.ServiceControl\Logs\FailedImports\Error\REPLACEFILENAMEHERE.txt

When I open the corresponding log file, here is what it states:
Exception:
The RegEx engine has timed out while trying to match a pattern to an input string. This can occur for many reasons, including very large inputs or excessive backtracking caused by nested quantifiers, back-references and other factors.
StackTrace:

   at System.Text.RegularExpressions.RegexRunner.DoCheckTimeout()
   at Go91(RegexRunner )
   at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick, TimeSpan timeout)
   at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
   at System.Text.RegularExpressions.MatchCollection.GetMatch(Int32 i)
   at System.Text.RegularExpressions.MatchEnumerator.MoveNext()
   at System.Linq.Enumerable.<CastIterator>d__94`1.MoveNext()
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at ServiceControl.Recoverability.ExceptionTypeAndStackTraceFailureClassifier.ClassifyFailure(ClassifiableMessageDetails failure) in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl\Recoverability\Grouping\Groupers\ExceptionTypeAndStackTraceFailureClassifier.cs:line 21
   at ServiceControl.Recoverability.ClassifyFailedMessageEnricher.<Enrich>d__2.MoveNext() in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl\Recoverability\Grouping\ClassifyFailedMessageEnricher.cs:line 24
   at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
   at ServiceControl.Operations.FailedMessageFactory.GetGroups(String messageType, FailureDetails failureDetails, ProcessingAttempt processingAttempt) in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl\Operations\FailedMessageFactory.cs:line 26
   at ServiceControl.Operations.ErrorQueueImport.InnerHandle(TransportMessage message) in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl\Operations\ErrorQueueImport.cs:line 151
   at ServiceControl.Operations.ErrorQueueImport.Handle(TransportMessage message) in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl\Operations\ErrorQueueImport.cs:line 81
   at NServiceBus.Satellites.SatelliteLauncher.HandleMessageReceived(Object sender, TransportMessageReceivedEventArgs e, ISatellite satellite) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Satellites\SatelliteLauncher.cs:line 89
   at NServiceBus.Unicast.Transport.TransportReceiver.OnTransportMessageReceived(TransportMessage msg) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Unicast\Transport\TransportReceiver.cs:line 413
   at NServiceBus.Unicast.Transport.TransportReceiver.ProcessMessage(TransportMessage message) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Unicast\Transport\TransportReceiver.cs:line 344
   at NServiceBus.Unicast.Transport.TransportReceiver.TryProcess(TransportMessage message) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Unicast\Transport\TransportReceiver.cs:line 230
   at NServiceBus.Transports.Msmq.MsmqDequeueStrategy.Action(TransactionSettings transactionSettings, TransactionOptions transactionOptions, MsmqUnitOfWork unitOfWork, MessageQueue receiveQueue, MessageQueue errorQueue, CircuitBreaker circuitBreaker, CriticalError criticalError, AutoResetEvent peekResetEvent, TimeSpan receiveTimeout, SemaphoreSlim throttlingSemaphore, Func`2 tryProcessMessage, Action`2 endProcessMessage) in C:\BuildAgent\work\3206e2123f54fce4\src\NServiceBus.Core\Transports\Msmq\MsmqDequeueStrategy.cs:line 229
Source:
System
TargetSite:
Void DoCheckTimeout()
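The "excessive backtracking caused by nested quantifiers" that the exception mentions is easy to reproduce in isolation. A minimal sketch, in Python for illustration only (ServiceControl's classifier is .NET, not Python), showing how matching time explodes on a non-matching input:

```python
import re
import time

# A pattern with a nested quantifier: (a+)+ followed by an end anchor.
# On a string of a's that ends in 'b' the match must fail, but the
# engine tries every way of splitting the a's between the inner and
# outer quantifier -- roughly 2^n attempts before giving up.
pattern = re.compile(r"(a+)+$")

for n in (16, 18, 20):
    text = "a" * n + "b"
    start = time.perf_counter()
    assert pattern.match(text) is None  # never matches, but takes a while
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  {elapsed:.3f}s")  # time roughly doubles per step
```

.NET guards against this with a match timeout (the `TimeSpan timeout` parameter visible in the `RegexRunner.Scan` frame of the stack trace above), which is why the import fails with a timeout exception instead of hanging.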

I find this issue odd because it is hit or miss how many messages actually end up in ServicePulse. Also, when I replay the failed ones, some come back (as expected) and some disappear, with an Error event showing up in the event logs.

Any help would be greatly appreciated.
NSB version 5.2.14

Thanks,
Ron


(Ramon Smits) #2

Hi Ron,

It seems that our stack trace parser has an issue. We have fixed a similar issue in the past. Which version of ServiceControl are you running? The current version is 1.43.1.

Run the Platform Installer to update to the latest versions, or download the latest releases here: https://particular.net/downloads

If you are running the latest version, would it be possible for you to share one of the failed messages with us?

Regards,
Ramon


(Ron) #3

Sorry for the late response. It's been a while since I could get back to this issue.

We tried the upgrade with no success, so I took a step back. I pulled down the code we currently have running in production (which doesn't have the issues in this topic), ran it through the same scenario in our development environment, and got the expected results: all 12 messages showed up in ServicePulse, no problem.

This had me confused, as no code had changed from the message standpoint. That's when I remembered: I had changed our messages over to NuGet packages. In doing so, I failed to account for the references those packages have to other NuGet packages. Long story short, my endpoints were a couple of major versions higher on Newtonsoft.Json than the NuGet packages I made for my messages. The references ended up being a few layers deep, so it took some digging to figure this out. Once I updated all packages in all sources to use the same version, my problem went away. This doesn't explain exactly why they worked "some of the time", but at this point I'm going to move forward, as I've already surpassed the time allotted for this project. More power to anyone who is willing to dig further.

For those interested: the message (a command) referenced a common library (also a NuGet package) that referenced Newtonsoft.Json 6.0.8. When I updated the common library to a newer version, it bumped to Newtonsoft.Json 9.0.1. So I had an endpoint referencing 9.0.1 sending a message that referenced 6.0.8.
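For anyone hitting the same mismatch: besides aligning package versions as described above, on .NET Framework the usual mitigation is an assembly binding redirect in the endpoint's app.config. A sketch for this specific 6.0.8 → 9.0.1 case (Newtonsoft.Json 9.0.1 ships with assembly version 9.0.0.0; the exact versions in your redirect should match what is actually deployed):

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Redirect every older Newtonsoft.Json reference to the one
             version actually deployed with the endpoint. -->
        <assemblyIdentity name="Newtonsoft.Json"
                          publicKeyToken="30ad4fe6b2a6aeed"
                          culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-9.0.0.0"
                         newVersion="9.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```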

I will say, though, one scary thing about the whole situation is the lost messages. Those that ServiceControl did not process properly were never found. If I hadn't had other logging/alerting measures in place, no one would have known any better until the system became corrupted by the data those messages should have provided.

Thanks for helping me push forward, ramonsmits! Hopefully, if anyone else runs across this issue, this post can save them a few hours or days of headache!

–Ron


(Ramon Smits) #4

Thanks for that followup.

It seems your messages are not POCOs. In general, the recommendation is that messages are simple interfaces or classes, placed in their own assembly that doesn't reference any other assemblies. You can even remove the dependency on NServiceBus itself by using unobtrusive mode.
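To make that recommendation concrete: a message type should be nothing more than a data container with no references beyond the platform. Sketched in Python for brevity (in NServiceBus this would be a plain C# class or interface in a dedicated messages assembly; the `SubmitOrder` name and fields are made up for illustration):

```python
from dataclasses import dataclass

# A "POCO-style" message: only data, no behaviour, and no dependencies
# on logging, serialization, or other infrastructure libraries.
@dataclass(frozen=True)
class SubmitOrder:
    order_id: str
    amount: float
    # Masking attributes for logging belong in the endpoint's logging
    # pipeline, not on the message contract itself.

msg = SubmitOrder(order_id="42", amount=19.99)
print(msg)
```

Keeping the contract this bare is what prevents the kind of transitive dependency drift (e.g. on Newtonsoft.Json) described earlier in the thread.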

I do want to get back to your remark about lost messages. You didn't actually lose any messages; I will respond a bit later on how you can recover them.

Regards,
Ramon


(Ron) #5

Yeah, after posting my last reply I had a discussion with another developer here who mentioned the same thing. The reason the other libraries were brought in was to use an attribute on one of the properties that needed to be masked when logging. Looking at that situation now, it makes sense that that functionality should live somewhere else.

Thanks again! Looking forward to your response on how to recover the lost messages!

Thanks
Ron


(Ramon Smits) #6

Hi Ron, we've made improvements regarding those import errors. See the following ServiceControl release:

See the following guidance on how to import the failed audits again:

Regards,
Ramon


(Ron) #7

Thanks for getting back to me!