Hello,
We are just beginning to use domain events in our system. Handlers for these domain events listen to the domain event queue. Currently we have one queue (topic?) for all events and handlers. The problem I foresee is that if any one handler (and there can be a lot of them) cannot process an event, e.g. due to a logic error or a dependent service being down, the entire domain event queue gets held up. Ensuring that each event is processed at least once (ideally exactly once, but that’s a hard problem) is vital. However, I understand that if an event continuously fails to be processed (logic error or dependent service down), I should probably skip it or put it into a dead letter queue.
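To make the "retry a few times, then dead-letter" idea concrete, here is a minimal sketch. It assumes a hypothetical `processWithRetry` wrapper around each handler, with the dead letter queue modelled as a plain array; in a real system that array would be another topic or table. The point is that a poisoned event ends up parked instead of retried forever.

```typescript
// Minimal sketch: run a handler up to maxAttempts times; if it still fails,
// park the event in a dead-letter list so the queue can move on.
// DomainEvent, Handler, DeadLetter and processWithRetry are hypothetical names.

interface DomainEvent {
  id: string;
  type: string;
  payload: unknown;
}

type Handler = (event: DomainEvent) => Promise<void>;

interface DeadLetter {
  event: DomainEvent;
  handlerName: string;
  error: string;
}

async function processWithRetry(
  event: DomainEvent,
  handlerName: string,
  handler: Handler,
  deadLetters: DeadLetter[],
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<"processed" | "dead-lettered"> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(event);
      return "processed";
    } catch (err) {
      if (attempt === maxAttempts) {
        // Give up on this event for this handler and record why.
        deadLetters.push({ event, handlerName, error: String(err) });
        return "dead-lettered";
      }
      // Exponential backoff before the next attempt: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  // Only reachable if maxAttempts < 1.
  throw new Error("maxAttempts must be at least 1");
}
```

The caller would then treat both outcomes as "done" for that event (e.g. commit the offset), which keeps at-least-once delivery for transient failures without letting a permanently failing event block everything behind it.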
An example: I have 2 different domain event handlers, AddToDailySalesReportHandler and RecordSaleForAuditHandler, which both handle the domain event MachineMadeSale. Today my subscriber reads from the domain event queue, then calls each handler in turn. If the first handler (AddToDailySalesReportHandler) succeeds but the second fails, the subscriber considers the event unprocessed and retries it again and again, blocking the queue. Additionally, I either need logic in my AddToDailySalesReportHandler to de-duplicate being called with the same MachineMadeSale event multiple times, OR the effect of the domain event has to be naturally idempotent anyway.
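The de-duplication option could look roughly like this: a minimal sketch, assuming each MachineMadeSale carries a unique event ID. `IdempotentDailySalesReport` and its fields are hypothetical; in production the set of processed IDs would be a table updated in the same database transaction as the report itself, not an in-memory `Set`.

```typescript
// Hypothetical event shape; assumes every event has a unique eventId.
interface MachineMadeSale {
  eventId: string;
  machineId: string;
  amount: number;
}

// Hypothetical report state with idempotent application of sale events.
class IdempotentDailySalesReport {
  private readonly processedEventIds = new Set<string>();
  private readonly totals = new Map<string, number>();

  // Returns true if the event changed the report, false if it was a redelivery.
  handle(event: MachineMadeSale): boolean {
    if (this.processedEventIds.has(event.eventId)) {
      return false; // already applied: skip so a retry doesn't double-count
    }
    const current = this.totals.get(event.machineId) ?? 0;
    this.totals.set(event.machineId, current + event.amount);
    this.processedEventIds.add(event.eventId);
    return true;
  }

  totalFor(machineId: string): number {
    return this.totals.get(machineId) ?? 0;
  }
}
```

With this in place, redelivering the whole event after RecordSaleForAuditHandler fails is harmless for the sales report.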
But is there a better way than blocking the entire domain event queue? One solution, though I don’t know how easy it is to implement, is to have one queue listener per handler. Each listener would keep track of its own position in the queue. I don’t know how NServiceBus handles this, because I thought that once a consumer consumes a message, it is removed from the queue. You could do this with Kafka: each handler would basically be its own consumer group and store its own topic/queue offset(s)/cursor.
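The "one listener per handler, each with its own cursor" idea can be modelled in memory: the log is append-only and reading does not delete anything, and each subscription commits its own offset only after its handler succeeds. `EventLog` and `Subscription` are hypothetical names; this is a sketch of the semantics, not of any Kafka client API.

```typescript
interface DomainEvent {
  id: string;
  type: string;
}

// Append-only log: reading never removes entries (like a Kafka partition).
class EventLog {
  private readonly entries: DomainEvent[] = [];

  append(event: DomainEvent): void {
    this.entries.push(event);
  }

  read(offset: number): DomainEvent | undefined {
    return this.entries[offset];
  }
}

// One subscription per handler, each with its own committed offset.
class Subscription {
  private offset = 0;

  constructor(
    readonly name: string,
    private readonly log: EventLog,
    private readonly handle: (event: DomainEvent) => void,
  ) {}

  // Process from the committed offset onward. On failure, stop and keep the
  // offset, so the event is retried on the next poll; other subscriptions on
  // the same log are unaffected.
  poll(): void {
    let event = this.log.read(this.offset);
    while (event !== undefined) {
      try {
        this.handle(event);
      } catch {
        return;
      }
      this.offset++; // commit only after success
      event = this.log.read(this.offset);
    }
  }

  get position(): number {
    return this.offset;
  }
}
```

In Kafka terms, each `Subscription` corresponds to a consumer with its own group ID (for example, kafkajs's `kafka.consumer({ groupId })`), so a stuck audit handler only holds back the audit group's offset.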
Additionally, to avoid bringing down the entire system, I could have different queues/topics for different kinds or categories of domain event (e.g. one queue for all sale-related events, one for all inventory-related events).
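A per-category split mostly comes down to a routing table from event type to topic. A minimal sketch: the topic names and the `StockReplenished` event type are hypothetical, and unknown event types fail loudly rather than silently landing in a default topic.

```typescript
// Hypothetical topic names, one per event category.
const topicByCategory = {
  sales: "domain-events.sales",
  inventory: "domain-events.inventory",
} as const;

type Category = keyof typeof topicByCategory;

// Hypothetical mapping of event types to categories.
const categoryByEventType: Record<string, Category> = {
  MachineMadeSale: "sales",
  StockReplenished: "inventory",
};

function topicFor(eventType: string): string {
  const category = categoryByEventType[eventType];
  if (category === undefined) {
    throw new Error(`No topic mapping for event type ${eventType}`);
  }
  return topicByCategory[category];
}
```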
The big reveal: I am actually using Node.js + TypeScript, and my understanding is that I cannot use NServiceBus as it’s only for .NET. I’m not sure there is any equivalent of NServiceBus in the Node/JS world. We ingest millions of telemetry signals from IoT devices every day, and as such we use Kafka, with different consumers of Kafka each maintaining their own position in the topic. I have seen "Let's talk about Kafka" from Particular Software and agree with most of what is said, but (i) I’m still not convinced moving away from Kafka is smart, and (ii) I think using a different consumer group per handler would be a good solution.
So, my questions:
- Does NServiceBus have the concept of multiple consumers (domain event handlers in my case) which won’t block the message queue, and will the message remain until all consumers have successfully processed it?
- I find it hard to find people talking about this on the internet. Is that because most devs don’t care? Am I doing something wrong?
- Should I use something other than Kafka for my domain event message queue? If so, please recommend something for the Node.js/JS/TS world.
I’m new to this, and I’m trying not to over-engineer but also not to leave gaps in my system. Not processing messages means my data becomes inconsistent, since my domain events are primarily used for eventual consistency.
Thanks
For those who care about details: I use a ‘domain event outbox’ table in my database. When domain events are responses to entity mutations, I write to the domain event outbox in the same transaction as I write to the other tables to persist the entity. Then I have Debezium change data capture (CDC) listening to changes on the domain event outbox table, and those changes get published to a Kafka topic. This Kafka topic is my domain event ‘message’ queue.
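The key property of the outbox write is atomicity: the entity row and the outbox row are committed together or not at all. Here is a minimal sketch, using a hypothetical in-memory `InMemoryDb` in place of a real database driver's transaction. The outbox column names are illustrative and not necessarily what a given Debezium configuration expects.

```typescript
// Hypothetical row shapes.
interface OutboxRow {
  id: string;
  aggregateType: string;
  aggregateId: string;
  type: string;
  payload: string;
}

interface Sale {
  id: string;
  machineId: string;
  amount: number;
}

interface Tables {
  sales: Sale[];
  outbox: OutboxRow[];
}

// Hypothetical stand-in for a database with transactions.
class InMemoryDb {
  tables: Tables = { sales: [], outbox: [] };

  // All-or-nothing: mutate shallow copies of the tables and only swap them
  // in if fn completes without throwing.
  transaction(fn: (tx: Tables) => void): void {
    const draft: Tables = {
      sales: [...this.tables.sales],
      outbox: [...this.tables.outbox],
    };
    fn(draft);
    this.tables = draft;
  }
}

let nextId = 0; // hypothetical ID generator; use UUIDs in practice

// Persist the entity and its domain event in one transaction.
function recordSale(db: InMemoryDb, sale: Sale): void {
  db.transaction((tx) => {
    tx.sales.push(sale);
    tx.outbox.push({
      id: `evt-${++nextId}`,
      aggregateType: "sale",
      aggregateId: sale.id,
      type: "MachineMadeSale",
      payload: JSON.stringify(sale),
    });
  });
}
```

CDC then only has to tail the outbox table; it never needs to know which business tables changed.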