So we are looking into switching our persistence from Azure Table Storage to something else, and our eye has fallen on MongoDB.
The reason we are switching is that we are hitting the 64 KB limit of Azure Table Storage.
We have provisioned Azure CosmosDB for MongoDB in Azure for this.
We provisioned 2 types
- Serverless (RU)
- vCore model
Next, we swapped to each of them to test and run checks; however, we noticed the following.
We have 2 sagas (an old one (v2) and a new one (v3)) which both respond to OrderResponseV1 via a Handle method.
However, we have noticed that on the serverless model, when this message comes back, the system fails to work with the following error:
Response status code does not indicate success: BadRequest (400); Substatus: 1101; ActivityId: 9d16641a-bb80-4a2f-acd9-66d97816e127; Reason: (Message: {"Errors":["Transaction is not active"]}
We do not get this error on the vCore model, however.
How does this happen and can we fix this?
Hi
Multi-document ACID transactions are only supported in the vCore model of Azure Cosmos DB for MongoDB, not in the serverless (RU) model.
You could use MongoDB Atlas, which appears to support transactions in the Flex model.
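For completeness: the error comes from the persistence attempting multi-document transactions. If running without them were acceptable, the MongoDB persistence exposes a setting for that. A minimal sketch, assuming the NServiceBus.Storage.MongoDB MongoPersistence API, with a placeholder endpoint name, connection string, and database name:

```csharp
using MongoDB.Driver;
using NServiceBus;

var endpointConfiguration = new EndpointConfiguration("Sales.OrderProcessing");

var persistence = endpointConfiguration.UsePersistence<MongoPersistence>();
persistence.MongoClient(new MongoClient("mongodb://localhost:27017"));
persistence.DatabaseName("SalesPersistence");

// Opting out of multi-document transactions trades away atomicity of
// saga-state updates, so weigh this carefully before relying on it.
persistence.UseTransactions(false);
```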
Are you hitting the size limits across the board, or only for a few sagas? How much storage do you actually require for those sagas? We might be able to give you some kind of workaround so that you don't necessarily have to switch persistence, but we would first need to know a bit more about your use cases. Given that you are in Azure, is there any reason you are not looking at CosmosDB or SQL Azure?
Regards,
Daniel
Hey Daniel,
Thanks for that document; that tells us enough, unfortunately.
Going to MongoDB Atlas is not an option as it would run “outside our network” as far as I can see.
Additionally, we have checked CosmosDB and SQL; however, MongoDB would be the easiest to shift to.
For CosmosDB we would need to implement an additional interface on our messages, which is not too bad to do but requires additional work. However, going that way would also mean a hard vendor lock-in with Azure, which we are trying to prevent as much as possible.
SQL, however, is a whole different story: we have separated our saga data in our domain, so it lives in a different assembly. Additionally, because of some bad decisions in the past, some models are shared between sagas; for example, a customer model can be used in multiple sagas.
This caused issues with the creation of indexes, so going this route would mean more rework.
As far as hitting this limit goes, for now it is only a few sagas. These are sales that require quite a bit of data in order to send them to our ERP system.
We have already reduced the amount of data in the past, but with growth we are hitting this limit again. Additionally, we do not want to be constrained by this 64 KB limit on Table Storage anymore, hence the investigation into switching our persistence.
Hi
Thanks for the update.
I think the DataBus / claim check pattern or NServiceBus.Attachments should also work. Have you considered offloading parts of the data on the message to data bus properties?
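A minimal sketch of the data bus approach, assuming the NServiceBus 7-style file share data bus (endpoint name, path, and property names here are placeholders; newer versions ship this as a separate claim check package):

```csharp
using System;
using NServiceBus;

public class OrderResponseV1 : IMessage
{
    // Regular small properties stay on the message body.
    public Guid BasketId { get; set; }

    // Declared as DataBusProperty<T>, so the payload is offloaded to
    // the data bus store instead of being serialized into the message.
    public DataBusProperty<byte[]> ItemsPayload { get; set; }
}

public static class EndpointSetup
{
    public static EndpointConfiguration Configure()
    {
        var endpointConfiguration = new EndpointConfiguration("Sales.OrderProcessing");

        // Large properties are written to the share; only a reference
        // travels with the message itself.
        var dataBus = endpointConfiguration.UseDataBus<FileShareDataBus>();
        dataBus.BasePath(@"\\SharedFolder\DataBusStorage");

        return endpointConfiguration;
    }
}
```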
Going to MongoDB Atlas is not an option as it would run “outside our network” as far as I can see.
With the Flex tier, yes. With a dedicated cluster it is a different story, but then you are back to the vCore model anyway and could use the CosmosDB offering.
Daniel
Hello Daniel,
The issue is not the size of the message, but the size of the saga data stored in Table Storage.
Certain saga data is simply too much for Table Storage to handle.
Our RabbitMQ system can handle these large messages just fine.
Is all that data used for orchestration? A common pattern is to keep only the data relevant for flow control in the saga; if there is a lot of data, you can store it separately from the saga state.
This also has the benefit of not having to do large amounts of IO for orchestration, and it prevents reading data that is not relevant for orchestration.
You can have both a handler and a saga process one or more of the same messages: use the saga for orchestration and the handlers for storage interaction, as in the sketch below.
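A minimal sketch of that split, with entirely hypothetical message and saga-data types (nothing here is your actual contract):

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

// Hypothetical contracts for illustration only.
public class PlaceOrder : ICommand
{
    public Guid BasketId { get; set; }
}

public class OrderResponseV1 : IMessage
{
    public Guid BasketId { get; set; }
    public string[] Items { get; set; }
}

// Saga state holds flow-control flags only, so it stays small.
public class OrderSagaData : ContainSagaData
{
    public Guid BasketId { get; set; }
    public bool ItemsReceived { get; set; }
}

public class OrderSaga : Saga<OrderSagaData>,
    IAmStartedByMessages<PlaceOrder>,
    IHandleMessages<OrderResponseV1>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
    {
        mapper.ConfigureMapping<PlaceOrder>(m => m.BasketId).ToSaga(s => s.BasketId);
        mapper.ConfigureMapping<OrderResponseV1>(m => m.BasketId).ToSaga(s => s.BasketId);
    }

    public Task Handle(PlaceOrder message, IMessageHandlerContext context)
    {
        Data.BasketId = message.BasketId;
        return Task.CompletedTask;
    }

    public Task Handle(OrderResponseV1 message, IMessageHandlerContext context)
    {
        // Record only that the step completed; the heavy payload is
        // persisted by the storage handler below.
        Data.ItemsReceived = true;
        return Task.CompletedTask;
    }
}

// A plain handler for the same message stores the bulky payload in a
// store of your choosing, outside the saga state.
public class OrderResponseStorageHandler : IHandleMessages<OrderResponseV1>
{
    public Task Handle(OrderResponseV1 message, IMessageHandlerContext context)
    {
        // e.g. write message.Items to a document or SQL table keyed by
        // message.BasketId; left out here to keep the sketch store-agnostic.
        return Task.CompletedTask;
    }
}
```

The persistence then only ever serializes the small flag-style saga state, which is what keeps you under the size limits.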
Another option is to have the saga forward the data, but that tightly couples the data schema and also makes the saga responsible for it, and you might want to split those responsibilities.
– Ramon
Hello Ramon,
Unfortunately we need all this data.
Additionally, even if we manage to cut down the data, the problem can still occur.
This is one of the processes that holds all the items which are sent to our ERP when an order is placed. This can be 10 items, but also 100.
These sales used to be smaller, but they are growing with time, so even if we trim the data down we would still run into this issue in the future.
A slight delay because of IO is acceptable. But currently, when it happens, the sale needs to be split, which takes a significant amount of time.
Hence the change in persistence.
I agree with Ramon. Saga data/state is designed for orchestration. Using it for anything else is a solution design issue. The documentation has a good amount of information on the topic.
Warning
Other than interacting with its own internal state, a saga should not access a database, call out to web services, or access other resources - neither directly nor indirectly by having such dependencies injected into it.
Can you maybe provide hints on what the source is that creates these messages currently processed by the saga? Is the data source considered part of the same service boundary?
Based on your mentioning the ERP, it seems you're dealing with an integration scenario. Likely the ERP isn't really part of a specific service boundary, and this saga is responsible for replicating sales data to the ERP. Is it an aggregating saga that takes action when all items have been aggregated?
– Ramon
I think there is some confusion; I will try to answer Ramon as well.
Our Saga is used to place an order.
What this basically does is:
- Receive a BasketId for which we want to place an order
- The system sends out multiple messages to multiple other endpoints to retrieve data like Customers, Stock and Products
- These responses get aggregated in the orchestration, and when all data is available the order is sent to our external ERP system
Within the saga we make sure that all data is loaded before we send it to the ERP system; roughly the shape sketched below.
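For reference, the shape of that orchestration is roughly the following, with made-up message names and only two of the lookups shown:

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

// Made-up contracts standing in for the real ones.
public class StartOrder : ICommand { public Guid BasketId { get; set; } }
public class RequestCustomer : ICommand { public Guid BasketId { get; set; } }
public class RequestStock : ICommand { public Guid BasketId { get; set; } }
public class CustomerResponse : IMessage { public Guid BasketId { get; set; } }
public class StockResponse : IMessage { public Guid BasketId { get; set; } }
public class SendOrderToErp : ICommand { public Guid BasketId { get; set; } }

public class PlaceOrderSagaData : ContainSagaData
{
    public Guid BasketId { get; set; }
    public bool HasCustomer { get; set; }
    public bool HasStock { get; set; }
}

public class PlaceOrderSaga : Saga<PlaceOrderSagaData>,
    IAmStartedByMessages<StartOrder>,
    IHandleMessages<CustomerResponse>,
    IHandleMessages<StockResponse>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<PlaceOrderSagaData> mapper)
    {
        mapper.ConfigureMapping<StartOrder>(m => m.BasketId).ToSaga(s => s.BasketId);
        mapper.ConfigureMapping<CustomerResponse>(m => m.BasketId).ToSaga(s => s.BasketId);
        mapper.ConfigureMapping<StockResponse>(m => m.BasketId).ToSaga(s => s.BasketId);
    }

    public async Task Handle(StartOrder message, IMessageHandlerContext context)
    {
        Data.BasketId = message.BasketId;
        // Fan out to the endpoints that own the data.
        await context.Send(new RequestCustomer { BasketId = message.BasketId });
        await context.Send(new RequestStock { BasketId = message.BasketId });
    }

    public Task Handle(CustomerResponse message, IMessageHandlerContext context)
    {
        Data.HasCustomer = true;
        return DispatchIfComplete(context);
    }

    public Task Handle(StockResponse message, IMessageHandlerContext context)
    {
        Data.HasStock = true;
        return DispatchIfComplete(context);
    }

    async Task DispatchIfComplete(IMessageHandlerContext context)
    {
        // Only when every response has arrived is the order handed to the ERP.
        if (Data.HasCustomer && Data.HasStock)
        {
            await context.Send(new SendOrderToErp { BasketId = Data.BasketId });
            MarkAsComplete();
        }
    }
}
```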
The issue we have is that the customer adds so many products to their basket that Table Storage cannot handle the amount of saga data.
No data is being manipulated anywhere in this process, and we are not calling an external database whatsoever.
Initially this process used more generic messages, which included data that was not needed for this orchestration. That data was already cut out.
However, since we are running into this issue again, one of the questions was whether we could cut out any other data so it does not get persisted to Table Storage.
However, at this time we have decided to look into moving to SQL Persistence again and are going to run a PoC with it.
If that does not pan out the way we want, we will look further into MongoDB Atlas.