Audit Dead Letter Queue ASB

servicecontrol

#1

Hi,
We have a process that generates over a million messages over the course of an hour or two every morning. My audit queue starts to fill up and messages get processed, but eventually messages start failing, get retried, and then end up in the audit queue's DLQ with the reason "MaxDeliveryCountExceeded". The ReplyTo address on these messages is the queue that does the processing, and the target is the audit queue.

I watched the server running ServiceControl during this time. The ServiceControl process uses between 25% and 100% CPU, generally around 50%. It's an m4.xlarge AWS EC2 instance (4 vCPUs, 16 GB RAM).

Do I just need a bigger ServiceControl machine, or is there something else to look at? Let me know if you need any other information.
Thanks,
Nate


(Daniel Marbach) #2

Hi Nate,

Which version of ServiceControl are you using?

What you can try is to configure the batch size to something lower than the default of 1000. For example, you could set it to 1 to get started. Depending on the hardware, ServiceControl in its current configuration can generally handle hundreds of messages per second, but the default prefetch of 1000 messages can be harmful under high load: the lock timeout starts counting as soon as the messages arrive on the client, even though they have not been handled yet. This can lead to lost message locks and therefore to messages exceeding the maximum delivery count.
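The effect described above can be put as a rough back-of-the-envelope model. It assumes, for illustration only, that ServiceControl handles prefetched messages one after another at a steady rate r, that a prefetched message's lock is not renewed before its handling starts, and that handling time itself is negligible. With batch size B and lock duration L, the k-th prefetched message starts being handled about k/r seconds after it was locked, so its lock has already expired when k/r > L. The values of r and L below are hypothetical, not measurements from this system.

```latex
\[
  N_{\text{expired}} \approx \max\bigl(0,\; B - rL\bigr)
\]
\[
  B = 1000,\quad r = 20\ \text{msg/s},\quad L = 30\ \text{s}
  \;\Rightarrow\; N_{\text{expired}} \approx 1000 - 20 \cdot 30 = 400
\]
\[
  B = 1 \;\Rightarrow\; N_{\text{expired}} = \max\bigl(0,\; 1 - 600\bigr) = 0
\]
```

Each expired lock makes the message available again with its delivery count incremented, so under sustained load the same messages can keep missing their lock window until they reach the maximum delivery count and are dead-lettered with MaxDeliveryCountExceeded.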

See BatchSize at https://docs.particular.net/transports/azure-service-bus/configuration/azureservicebusqueueconfig. Once this sorts out the problem, you can start experimenting with different BatchSize, MaxDeliveryCount, and lock renewal settings.
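As a starting point for that experiment, the three settings could sit side by side on the AzureServiceBusQueueConfig element. This is a sketch, not a recommended configuration: the attribute names MaxDeliveryCount and LockDuration (assumed to be in milliseconds) are assumed to match the AzureServiceBusQueueConfig page linked above and should be checked there, and the values 10 and 60000 are hypothetical. Lock duration and maximum delivery count are properties of the Azure Service Bus queue itself, so changing them in config may not alter an audit queue that already exists.

```xml
<configuration>
  <configSections>
    <section name="AzureServiceBusQueueConfig"
             type="NServiceBus.Config.AzureServiceBusQueueConfig, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
  </configSections>
  <!-- appSettings and connectionStrings omitted here; the NServiceBus/Transport
       connection string stays in the connectionStrings section -->
  <!-- BatchSize="1": receive one message per batch instead of the default 1000,
       so messages do not sit locked in a client-side buffer.
       MaxDeliveryCount and LockDuration: hypothetical values to tune
       while watching the audit queue's dead-letter count. -->
  <AzureServiceBusQueueConfig BatchSize="1"
                              MaxDeliveryCount="10"
                              LockDuration="60000" />
</configuration>
```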

We are currently working on the next version of ServiceControl, which should ship with better defaults, better throughput, and better resource usage. Would you be interested in joining a beta program?

Daniel


#3

Hi.
It's 1.47.5. Sorry, I meant to include that.
Thanks,
Nate


#4

You are suggesting that I change the app.config for ServiceControl, correct? When I change it per the docs, it fails to start. The docs don't specify whether I should leave the connection string in the connectionStrings section, but it seemed redundant, so I took it out and set it on the element instead.

App Config

<configuration>
  <configSections>
    <section name="AzureServiceBusQueueConfig"
             type="NServiceBus.Config.AzureServiceBusQueueConfig, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
  </configSections>
  <AzureServiceBusQueueConfig ConnectionString="Endpoint=sb://xxxx.servicebus.windows.net/;SharedAccessKeyName=xxxxxx;SharedAccessKey=xxxx="
                              BatchSize="1" />
  <appSettings>
    <add key="ServiceControl/DBPath" value="d:\ServiceControl\Particular.ServiceControl.PROD\DB" />
    <add key="ServiceControl/TransportType" value="NServiceBus.AzureServiceBusTransport, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
    <add key="Raven/Esent/MaxVerPages" value="2048" />
    <add key="ServiceControl/HostName" value="xxxx.io" />
    <add key="ServiceControl/Port" value="33333" />
    <add key="ServiceControl/LogPath" value="d:\ServiceControl\Particular.ServiceControl.PROD\Logs" />
    <add key="ServiceControl/ForwardAuditMessages" value="False" />
    <add key="ServiceControl/ForwardErrorMessages" value="False" />
    <add key="ServiceControl/AuditRetentionPeriod" value="7.00:12:56.3580000" />
    <add key="ServiceControl/ErrorRetentionPeriod" value="10.13:59:36.9970000" />
    <add key="ServiceBus/AuditQueue" value="prod.audit" />
    <add key="ServiceBus/ErrorQueue" value="prod.error" />
  </appSettings>
  <connectionStrings>
  </connectionStrings>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>

Event Log Error
Log Name: Application
Source: Particular.ServiceControl.PROD
Date: 7/25/2018 5:39:34 AM
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ProdNServiceBusMonitor
Description:
Service cannot be started. Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(ServiceBus.Management.Infrastructure.Settings.Settings)' on type 'CheckDeadLetterQueue'. ---> Object reference not set to an instance of an object. (See inner exception for details.) ---> System.NullReferenceException: Object reference not set to an instance of an object.
at ServiceControl.ASB.DLQMonitor.CheckDeadLetterQueue..ctor(Settings settings) in C:\BuildAgent\work\7189a56f9f44affc\src\ServiceControl.ASB.DLQMonitor\CheckDeadLetterQueue.cs:line 19
at lambda_method(Closure , Object[] )
at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
--- End of inner exception stack trace ---
at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
at Autofac.Core.Resolving.InstanceLookup.Activate(IEnume…


(Daniel Marbach) #5

Hi Nate,

Yes, sorry, I should have been more specific. I only realized it after I hit the submit button and was just about to post a follow-up. Sorry for leaving you hanging!

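Correct, the ConnectionString attribute can be removed from the AzureServiceBusQueueConfig element, but keep the connection string itself in the connectionStrings section as NServiceBus/Transport. Place the AzureServiceBusQueueConfig element after the appSettings section.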

Here is what my config looks like:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="AzureServiceBusQueueConfig"
             type="NServiceBus.Config.AzureServiceBusQueueConfig, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
  </configSections>
  <appSettings>
...
  </appSettings>
  <connectionStrings>
    <add name="NServiceBus/Transport" connectionString="yeahforsure" />
  </connectionStrings>
  <AzureServiceBusQueueConfig BatchSize="1" />
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
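Applied to the configuration posted in #4, the changed parts might look roughly like this. It is a sketch that assumes, based on the NullReferenceException thrown from the CheckDeadLetterQueue constructor in #4, that the dead-letter queue monitor reads the transport connection string from the NServiceBus/Transport entry in connectionStrings. The connection string stays redacted as in the original post, and the remaining appSettings keys are abbreviated.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="AzureServiceBusQueueConfig"
             type="NServiceBus.Config.AzureServiceBusQueueConfig, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
  </configSections>
  <appSettings>
    <!-- The other ServiceControl/* and ServiceBus/* keys from #4 stay as they are -->
    <add key="ServiceControl/TransportType" value="NServiceBus.AzureServiceBusTransport, NServiceBus.Azure.Transports.WindowsAzureServiceBus" />
    <add key="ServiceBus/AuditQueue" value="prod.audit" />
    <add key="ServiceBus/ErrorQueue" value="prod.error" />
  </appSettings>
  <connectionStrings>
    <!-- The connection string moves back here. The empty connectionStrings
         section in #4 is the likely cause of the NullReferenceException. -->
    <add name="NServiceBus/Transport"
         connectionString="Endpoint=sb://xxxx.servicebus.windows.net/;SharedAccessKeyName=xxxxxx;SharedAccessKey=xxxx=" />
  </connectionStrings>
  <!-- Only the tuning attribute remains on the element -->
  <AzureServiceBusQueueConfig BatchSize="1" />
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
```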

An additional note: if you can, I would suggest planning an upgrade to version 2.1.3 of ServiceControl. It slightly increases throughput, but above all it makes the underlying storage layer much more stable and robust.

https://docs.particular.net/servicecontrol/upgrades/1to2

We can also help you with the upgrade and discuss the different options. Just reach out to support at particular dot net when the time comes.

Regards

Daniel


#6

That worked better. I'll let you know if I still have issues with the audit DLQ.
I'll look into upgrading ServiceControl. I'm also going to start looking into moving to NSB7.
Thanks.