I have encountered a problem which is described below:
When a handler finishes processing a message, it stores some data in the database, the raises a message or an event and a local handler or an external endpoint is picking up that message to do something.
What I have noticed though, is that in a minority of cases, perhaps 5-10% of the times, the second handler does not find the data updated or inserted into the database by the time it handles the message or event. A retry fixes the situation as the data is now available.
This has been observed in two different projects which are described below:
Project 1-
NServiceBus" version=“5.2.14"
NServiceBus.Autofac” version=“5.0.0"
NServiceBus.CommonLogging” version=“2.1.0"
NServiceBus.NHibernate” version=“6.2.7"
Autofac” version=“3.5.2"
FluentNHibernate” version=“2.0.3.0”
My understanding on how nservicebus works is that when a message is published or sent and before something else picks it up, that the changes in the database from the first handled message will be persisted and available. Is that not always the case? If not so, what are the conditions that this may happen?
Since batched dispatch isn’t available in v5 those endpoints would have to rely on the outgoing message operations being enlisted in the incoming receive transaction. This depends on the transport transaction mode you are using and if the transport supports it. (Rabbit has no support for this)
In v6 and above batched dispatch would make sure that this is true unless you are running in TransactionMode.None
We are running with v6 on the latest project and Rabbit. The transactionMode is set as ReceiveOnly currently. I tried None but the issue is still manifesting.
We have an service test endpoint subscribing to an event that is raised as part of a handler which also changes the business data. Once the test framework receives that event it’s trying to immediately post a rest api request which is invalid unless the business data is updated from the first handler. It looks like the data is not always updated in time though(maybe 5-10 times out of a 100 repeats). Everything runs locally for this in Dev which includes the RabbitMQ instance, the nsb endpoint and the service test’s framework nsb endpoint
Here’s how we configure the RabbitMQ aspect of this:
var transport = endpointConfiguration.UseTransport<RabbitMQTransport>();
transport.ConnectionStringName(rabbitConnectionStringName);
transport.Transactions(TransportTransactionMode.None);
routingConfigurator.Configure(transport);
endpointConfiguration.UseSerialization<JsonSerializer>();
endpointConfiguration.EnableOutbox();
This is not correct and would cause the behavior you’re seeing since the operations would be executed immediately (but if read you correctly that was just a test and you are having the same issues with .ReceiveOnly)
With the outbox enbled its important to make sure that your “business data operations” enlist in the same storage transaction as the outbox data.
Can you verify that you’re following the instructions outlined below?
That’s correct, we are using ReceiveOnly and None was a test.
In the link that you posted, under nsb6 it instructs to use NHibernateStorageContext.Session(passed from the constructor) instead of the IMessageHandlerContext.SynchronizedStorageSession.Session()(Which is passed in the Handle method in the nsb7 documentation). The former, is marked as obsolete and I can’t use it in case it makes a difference. For the record, we are using the nsb7 method which is the IMessageHandlerContext.SynchronizedStorageSession.Session()
I can’t spot anything else Outbox specific in terms of the configuration in the wiki… here’s our nhibernate configuration in case it helps figure out the issue:
public static EndpointConfiguration EnableNHibernatePersistence(
this EndpointConfiguration endpointConfiguration)
{
var nhibernateConfig = new NHibernate.Cfg.Configuration();
nhibernateConfig.SetProperty(Environment.ConnectionProvider,
"NHibernate.Connection.DriverConnectionProvider");
nhibernateConfig.SetProperty(Environment.ConnectionDriver, "NHibernate.Driver.Sql2008ClientDriver");
nhibernateConfig.SetProperty(Environment.Dialect, "NHibernate.Dialect.MsSql2008Dialect");
nhibernateConfig.SetProperty(Environment.DefaultSchema, ConfigurationManager.AppSettings["NSBDefaultSchema"]);
nhibernateConfig.SetProperty(Environment.ConnectionStringName, NHibernateConnectionStringAppConfigVariable);
Fluently.Configure(nhibernateConfig)
.Mappings(m => m.FluentMappings.AddFromAssembly(Assembly.Load("MyFluentNHibernateMappings")))
.BuildConfiguration();
var persistence = endpointConfiguration.UsePersistence<NHibernatePersistence>();
persistence.UseConfiguration(nhibernateConfig);
return endpointConfiguration;
}
I have a hunch that the database operations are not being committed properly, can you either turn on more NH logging or do a trace against sqlserver to make sure?
I couldn’t find anything interesting via the sql server profiler to be honest. Not sure how to go troubleshooting this frankly.
One interesting fact though is that when I switch nihbernate’s logging level in log4net.config from WARN to DEBUG, the issue happens 100% of the time. It’s as if, whatever is asking for NHibernate to commit fully isn’t waiting for the entire work to finish before firing over the event to RabbitMQ that the service test framework picks up and tries to do something.
Andreas if you have any specific hints on what would be a good starting point in investigating this via the logs I’d appreciate it but I am out of ideas at this point
We are storing milliseconds in the OccuredAtUtc column and have some other means to prevent duplicates.
Unfortunately there are requirements to keep statuses as a separate insert only table and as such I can’t move it to the parent table.
In relation to caching, is NSB re-using session across different handlers? We are not using level 2 caching and as such(with my limited nhib understanding…) level 1 caching only occurs when the session is shared. The mapping as you can see from the sample is already configured with Not.LazyLoad which enforces eager loading
I verified the transaction commits via the SQL profiler and they are committed in time.
I just ran your repro. I get continuous duplicate timestamps on the status fields. You are not storing milliseconds, that requires a different NHibernate mapping. The following stores milliseconds:
Still, you are vulnerable to duplicate. The only way to solve this in your model is to make a unique index on your composite status key on OrderStatus columns OrderId and OccurredAtUtc.
If order is important then I would advise you to use a mapping configuration that makes that ordering explicit like a list:
I’ve never mapped this via code but I see that mapping has a .AsList, I tried to following and that seems to work:
.AsList(x=>x.Column("i"))
Probably that resolves your issue. Only thing that needs to change in your code is that you do not do any ordering and just use .LastOrDefault().
Thanks for taking the time to go through this and provide feedback. It’s embarrassing but I did forget to include the fluent nhib extension that forces the storing of milliseconds in the sample(sorry for wasting your time Andreas). Ramon, that index sounds like a good idea!
Let me get back into testing this and will update you.
For completeness sake, after a lot of head banging on the desk…
Here is our problem…
We are doing hexagonal architecture and we have abstracted away the messaging gateways. The dependency of our messaging gateway was an IEndpointInstance(what we used to publish the event that was put to the bus prematurely…) instead of an IMessageContext. After fixing that, the problem never happened again.
I just looked at the code that you shared earlier and there you are using the context. I verified to see if I missed that.
Not sure how your code base is structure and how you are using dependency injection but we have the Uniform Session package. Potentially this can be useful for your project.