Data not being there in time after successful handle

Konstantinos_Konstan · February 2, 2018, 11:29am

Hi

I have encountered a problem which is described below:
When a handler finishes processing a message, it stores some data in the database, the raises a message or an event and a local handler or an external endpoint is picking up that message to do something.

What I have noticed though, is that in a minority of cases, perhaps 5-10% of the times, the second handler does not find the data updated or inserted into the database by the time it handles the message or event. A retry fixes the situation as the data is now available.

This has been observed in two different projects which are described below:
Project 1-
NServiceBus" version=“5.2.14"
NServiceBus.Autofac” version=“5.0.0"
NServiceBus.CommonLogging” version=“2.1.0"
NServiceBus.NHibernate” version=“6.2.7"
Autofac” version=“3.5.2"
FluentNHibernate” version=“2.0.3.0”

MSMQ + MSDTC + SQL Server 2008 + Windows 10

Project 2

NServiceBus" version=“6.2.1"
NServiceBus.Autofac” version=“6.0.1"
NServiceBus.NHibernate” version=“7.2.1"
NServiceBus.RabbitMQ” version=“4.3.1"
RabbitMQ.Client” version=“4.1.3"
Autofac” version=“4.6.2"
FluentNHibernate” version=“2.0.3.0”

RabbitMQ + Outbox + SQL Server 2008 + Windows 10

My understanding on how nservicebus works is that when a message is published or sent and before something else picks it up, that the changes in the database from the first handled message will be persisted and available. Is that not always the case? If not so, what are the conditions that this may happen?

Thanks,
Kostas

andreasohlund · February 8, 2018, 1:04pm

This behavior will depend on a few things:

Since batched dispatch isn’t available in v5 those endpoints would have to rely on the outgoing message operations being enlisted in the incoming receive transaction. This depends on the transport transaction mode you are using and if the transport supports it. (Rabbit has no support for this)
In v6 and above batched dispatch would make sure that this is true unless you are running in TransactionMode.None

Does the above apply to your endpoints?

Konstantinos_Konstan · February 12, 2018, 3:01pm

Hey Andreas,

Thanks for the reply.

We are running with v6 on the latest project and Rabbit. The transactionMode is set as ReceiveOnly currently. I tried None but the issue is still manifesting.

We have an service test endpoint subscribing to an event that is raised as part of a handler which also changes the business data. Once the test framework receives that event it’s trying to immediately post a rest api request which is invalid unless the business data is updated from the first handler. It looks like the data is not always updated in time though(maybe 5-10 times out of a 100 repeats). Everything runs locally for this in Dev which includes the RabbitMQ instance, the nsb endpoint and the service test’s framework nsb endpoint

Here’s how we configure the RabbitMQ aspect of this:

        var transport = endpointConfiguration.UseTransport<RabbitMQTransport>();
        transport.ConnectionStringName(rabbitConnectionStringName);
        transport.Transactions(TransportTransactionMode.None);
        routingConfigurator.Configure(transport);
        endpointConfiguration.UseSerialization<JsonSerializer>();
        endpointConfiguration.EnableOutbox();

andreasohlund · February 12, 2018, 3:22pm

transport.Transactions(TransportTransactionMode.None);

This is not correct and would cause the behavior you’re seeing since the operations would be executed immediately (but if read you correctly that was just a test and you are having the same issues with .ReceiveOnly)

With the outbox enbled its important to make sure that your “business data operations” enlist in the same storage transaction as the outbox data.

Can you verify that you’re following the instructions outlined below?

Konstantinos_Konstan · February 12, 2018, 5:01pm

That’s correct, we are using ReceiveOnly and None was a test.

In the link that you posted, under nsb6 it instructs to use NHibernateStorageContext.Session(passed from the constructor) instead of the IMessageHandlerContext.SynchronizedStorageSession.Session()(Which is passed in the Handle method in the nsb7 documentation). The former, is marked as obsolete and I can’t use it in case it makes a difference. For the record, we are using the nsb7 method which is the IMessageHandlerContext.SynchronizedStorageSession.Session()

I can’t spot anything else Outbox specific in terms of the configuration in the wiki… here’s our nhibernate configuration in case it helps figure out the issue:

    public static EndpointConfiguration EnableNHibernatePersistence(
        this EndpointConfiguration endpointConfiguration)
    {
        var nhibernateConfig = new NHibernate.Cfg.Configuration();
        nhibernateConfig.SetProperty(Environment.ConnectionProvider,
        "NHibernate.Connection.DriverConnectionProvider");
        nhibernateConfig.SetProperty(Environment.ConnectionDriver, "NHibernate.Driver.Sql2008ClientDriver");
        nhibernateConfig.SetProperty(Environment.Dialect, "NHibernate.Dialect.MsSql2008Dialect");
        nhibernateConfig.SetProperty(Environment.DefaultSchema, ConfigurationManager.AppSettings["NSBDefaultSchema"]);
        nhibernateConfig.SetProperty(Environment.ConnectionStringName, NHibernateConnectionStringAppConfigVariable);
        Fluently.Configure(nhibernateConfig)
            .Mappings(m => m.FluentMappings.AddFromAssembly(Assembly.Load("MyFluentNHibernateMappings")))
            .BuildConfiguration();
        var persistence = endpointConfiguration.UsePersistence<NHibernatePersistence>();
        persistence.UseConfiguration(nhibernateConfig);

        return endpointConfiguration;
    }

andreasohlund · February 13, 2018, 10:00am

I have a hunch that the database operations are not being committed properly, can you either turn on more NH logging or do a trace against sqlserver to make sure?

andreasohlund · February 13, 2018, 12:57pm

I need to correct my self here, this is not true, batched dispatch is always in use regardless of transaction mode as stated in our doco.

Sorry for the confusion!

Konstantinos_Konstan · February 15, 2018, 3:43pm

I couldn’t find anything interesting via the sql server profiler to be honest. Not sure how to go troubleshooting this frankly.

One interesting fact though is that when I switch nihbernate’s logging level in log4net.config from WARN to DEBUG, the issue happens 100% of the time. It’s as if, whatever is asking for NHibernate to commit fully isn’t waiting for the entire work to finish before firing over the event to RabbitMQ that the service test framework picks up and tries to do something.

Andreas if you have any specific hints on what would be a good starting point in investigating this via the logs I’d appreciate it but I am out of ideas at this point

Thanks

andreasohlund · February 15, 2018, 4:00pm

Again just a hunch but could there be some awaits missing in your code?

Can you do a review or install NuGet Gallery | Particular.CodeRules 0.8.0 ?

Konstantinos_Konstan · February 16, 2018, 5:47pm

Hey Andreas

I installed it, we had a few missing ConfigureAwaits(so easy to miss )
Unfortunately the issue still persisted

I have uploaded a sample that reproduces the problem. You may need to run it a few times before you see a repro.

In order to run the sample it requires local mssql and rabbit.It includes a readme.
You can find the source here: https://drive.google.com/open?id=1jg1qjHzZ3_QX01jUZC4-ZKbThpy3kv_L

I added an entity that closely resembles what we have in this project. It’s obfuscated to a certain degree for the usual reasons.

Let me know if you manage to reproduce the issue. Here’s a picture of how a repro will look like 2018-02-16_1743

andreasohlund · February 19, 2018, 8:24am

Hi, I was able to repro the issue but it seems to be related to how you are storing the “canceled” status.

If I just put a boolean flag in the root Order entity and set it to true when canceled it works as expected

public virtual bool WasCanceled { get; set; } it works ok.

My hunch is that there is some NHibernate caching at play here causing the latest status not to be loaded from the join table properly?

Perhaps fiddle with the lazy/eager loading?

Also the orderby desc on dates looks dangerous to, could there be duplicate timestamps?

                  OrderStatuses
                        .OrderByDescending(s => s.OccurredAtUtc)
                        .FirstOrDefault();

Hope this helps!

Konstantinos_Konstan · February 19, 2018, 10:19am

Thanks Andreas

We are storing milliseconds in the OccuredAtUtc column and have some other means to prevent duplicates.

Unfortunately there are requirements to keep statuses as a separate insert only table and as such I can’t move it to the parent table.

In relation to caching, is NSB re-using session across different handlers? We are not using level 2 caching and as such(with my limited nhib understanding…) level 1 caching only occurs when the session is shared. The mapping as you can see from the sample is already configured with Not.LazyLoad which enforces eager loading

andreasohlund · February 19, 2018, 1:38pm

Yes all handlers for a given message instance will use the same NHibernate session.

Since you have only one handler per message type this shouldn’t be an issue?

The mapping as you can see from the sample is already configured with Not.LazyLoad which enforces eager

I’m not an NH expert but are there some setting around have NH is writing changes in child collections to the database?

andreasohlund · February 19, 2018, 1:54pm

I still think something is off with the order by on the timestamp, If I change IsActive to

public virtual bool IsActive => OrderStatuses.Count < 2;

all works as expected?

andreasohlund · February 19, 2018, 1:55pm

public virtual bool IsActive => OrderStatuses.All(s => s.OrderStatusType != OrderStatusType.Cancelled);

works as well?

ramonsmits · February 19, 2018, 1:58pm

Hi Konstantinos,

I verified the transaction commits via the SQL profiler and they are committed in time.

I just ran your repro. I get continuous duplicate timestamps on the status fields. You are not storing milliseconds, that requires a different NHibernate mapping. The following stores milliseconds:

Map(x => x.OccurredAtUtc).CustomType<NHibernate.Type.TimestampType>().Not.Update();

Still, you are vulnerable to duplicate. The only way to solve this in your model is to make a unique index on your composite status key on OrderStatus columns OrderId and OccurredAtUtc.

If order is important then I would advise you to use a mapping configuration that makes that ordering explicit like a list:

I’ve never mapped this via code but I see that mapping has a .AsList, I tried to following and that seems to work:

.AsList(x=>x.Column("i"))

Probably that resolves your issue. Only thing that needs to change in your code is that you do not do any ordering and just use .LastOrDefault().

Regards,
Ramon

Konstantinos_Konstan · February 19, 2018, 2:26pm

hey Ramon, Andreas

Thanks for taking the time to go through this and provide feedback. It’s embarrassing but I did forget to include the fluent nhib extension that forces the storing of milliseconds in the sample(sorry for wasting your time Andreas). Ramon, that index sounds like a good idea!

Let me get back into testing this and will update you.

Many thanks,

andreasohlund · February 19, 2018, 2:39pm

No problem, we are here to help. Glad you got it going!

Konstantinos_Konstan · February 19, 2018, 5:39pm

For completeness sake, after a lot of head banging on the desk…

Here is our problem…

We are doing hexagonal architecture and we have abstracted away the messaging gateways. The dependency of our messaging gateway was an IEndpointInstance(what we used to publish the event that was put to the bus prematurely…) instead of an IMessageContext. After fixing that, the problem never happened again.

Thanks for the help guys, really appreciate it.

ramonsmits · February 20, 2018, 12:07pm

I just looked at the code that you shared earlier and there you are using the context. I verified to see if I missed that.

Not sure how your code base is structure and how you are using dependency injection but we have the Uniform Session package. Potentially this can be useful for your project.

Uniform Session • UniformSession Support • Particular Docs

– Ramon