SQL Persistence Saga race condition issue?

sql-persistence
nservicebus
sagas

#1

I am using
NServiceBus 6.3.4
NServiceBus.Persistence.Sql 3.0.4

I had, what appears to be a race condition in the SQL Persistence so that the Saga was not marked as completed when it should have been.

2018-02-27 11:53:28.3693 - ThreadId=201 Starting Saga…

2018-02-27 11:55:01.1513 - ThreadId=133 Skipping. Already in progress.
2018-02-27 11:55:01.1513 - ThreadId=147 Sending [0] changed messages.
2018-02-27 11:55:01.1513 - ThreadId=147 Finished - saga
2018-02-27 11:55:01.183 - ThreadId=133 Saga: ‘XXX’ with Id: ‘5827dccd-210c-4a1e-b295-a8940115690d’ has completed.

A saga was kicked off, running happily coordinating messages.

It received the message to let it know that it should be completed at the same time that it also received another start message for the same ID. The final message listed above is outputted AFTER the call to MarkAsComplete() by NServiceBus.

The IAmStartedBy handler of the saga checks the saga data to see if it is already running, and if so, just exists
if (Data.IsRunning)
{
Log.Warn(“Skipping. Already in progress.”);
return Task.CompletedTask;
}

However, the Saga was never completed and stayed as a row in the database. Anything else kicked off for that ID would get the “already in progress” warning.

2018-02-27 12:00:01.2163 - ThreadId=141 Skipping. Already in progress.

There were no exceptions, errors, etc.


(Mark Gould) #2

I’ve also had this issue, I haven’t been able to reliably reproduce it yet.


(Tim Bussmann) #3

Hey @macdonald-k ,

I’m not sure I understood all the details you described, could you share your saga code with us? How do you set Data.IsRunning?

Are you sure this is not a new saga created for the same correlation property value after the saga has been completed as stated in the log file?


(Mark Gould) #4

Sorry for threadjacking, but this might help. I’ve been able to reproduce it with something very similar to our existing setup.

Essentially:

Saga can be started by two messages (StartOrder, ChangeOrder) and each requests a timeout if one hasn’t already been requested. In the client I’m sending StarOrder followed by 10 ChangeOrder at random intervals. I only request a timeout once, so I set a property called TimeoutRequested in the saga to know if I should request another.

Upon timeout completion the Saga will complete. In this example, lots of concurrency exceptions get thrown, and then the saga completes, but doesn’t remove itself, so when a message comes in again (after completion) it picks up the existing Saga and it still sees TimeoutRequested is set, so it never completes.

It may be that I have to stop using the TimeoutRequested logic but it seems strange that a saga can complete (as you can see in the attached log) and then not clean up.

To reproduce, open client/server and press S from client (modified persistence sample). To verify the bug you can simply query the saga table after it finishes and see the row is still there. It doesn’t seem to happen every time, so it may take a couple of attempts.

simple_sqlpersistence_3.zip (74.8 KB)nsb_log_2018-03-02_0.zip (3.1 KB)

I have also attached a log of where it has happened. I think Macdonald-k is doing something similar with his IsRunning property.

Hope that helps. Let me know if I can provide more info.


(Mark Gould) #5

To try and pinpoint the issue I’ve logged the Saga Id, so you can see the same saga instance completes and then receives further messages. I’ve bolded the completion lines and removed the stack traces.

2018-03-02 21:40:46.196 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Received StartOrder message 6995c33a-3bfc-440c-9de1-d23fb6fe5a60. Starting Saga
2018-03-02 21:40:46.202 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order will complete in 3 seconds
2018-03-02 21:40:47.111 INFO ShipOrderHandler Order Shipped. OrderId 6995c33a-3bfc-440c-9de1-d23fb6fe5a60
2018-03-02 21:40:47.172 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:47.173 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:47.172 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:47.195 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:47.181 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:47.188 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:48.120 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:48.121 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:48.155 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:48.128 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘9cc452de-0d68-40f0-9dc2-a898003ca307’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 1.
2018-03-02 21:40:48.128 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘1e3386ef-62a3-4446-baf2-a898003ca357’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 1.
2018-03-02 21:40:48.191 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:48.182 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:48.207 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:48.148 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:48.199 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.133 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘8a88e746-d7ad-4a08-bce3-a898003ca462’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 2.
2018-03-02 21:40:49.136 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.176 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.182 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.213 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.137 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.135 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘1e3386ef-62a3-4446-baf2-a898003ca357’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 3.
2018-03-02 21:40:49.136 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.308 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.136 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.556 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.486 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:49.520 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:49.662 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:50.297 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘65eb16d2-5756-4ab5-9483-a898003ca5ba’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 4.
2018-03-02 21:40:50.297 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘84b5f966-88f8-486b-9ee4-a898003ca5c4’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 4.
2018-03-02 21:40:50.298 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘8a88e746-d7ad-4a08-bce3-a898003ca462’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 4.
2018-03-02 21:40:50.332 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:50.340 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:50.299 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:50.357 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:50.299 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘85867864-4f47-49eb-b612-a898003ca538’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 4.
2018-03-02 21:40:50.322 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:50.310 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:50.599 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:50.538 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:50.507 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:50.780 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:51.309 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘65eb16d2-5756-4ab5-9483-a898003ca5ba’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 6.
2018-03-02 21:40:51.320 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 with OrderId 6995c33a-3bfc-440c-9de1-d23fb6fe5a60 completed
2018-03-02 21:40:51.344 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:51.357 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:51.311 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘84b5f966-88f8-486b-9ee4-a898003ca5c4’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 6.
2018-03-02 21:40:51.311 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘2634f5aa-8c01-4e3f-b2a3-a898003ca68e’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 6.
2018-03-02 21:40:51.378 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:51.400 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:51.414 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:51.406 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested
2018-03-02 21:40:52.320 INFO NServiceBus.RecoverabilityExecutor Immediate Retry is going to retry message ‘84b5f966-88f8-486b-9ee4-a898003ca5c4’ because of an exception:
System.Exception: Optimistic concurrency violation when trying to save saga OrderSaga e1203148-024a-4ac1-8434-a898003ca2e2. Expected version 9.
2018-03-02 21:40:52.332 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Order changed.
2018-03-02 21:40:52.338 INFO OrderSaga Saga Id e1203148-024a-4ac1-8434-a898003ca2e2 Timeout already requested


(Mark Gould) #6

I believe I’ve fixed the issue: https://github.com/Particular/NServiceBus.Persistence.Sql/pull/261/files


(Szymon Pobiega) #7

Thanks for reporting this @macdonald-k. I can confirm that the fix @markgould provided solves the problem. Awesome job! We’ll be releasing it as soon as possible.


(Szymon Pobiega) #8

@macdonald-k We’ve just released the 3.0.5 patch based on the fix provided by @markgould. Can you give it a try?


#9

Thanx. Will upgrade and run a test.