Let’s imagine we have an endpoint that collects seller ratings (1 to 5 stars) for every order and publishes a SellerRatedEvent containing SellerId, OrderId and Rating. Another endpoint subscribes to SellerRatedEvent and aggregates the data per day: a SQL table holds one row per seller and day with SellerId, Date, NumberOfRatings and AggregatedRating, from which we can calculate the average rating for that day.
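To make the daily aggregate concrete, here is a minimal Python sketch of one row of that table. It assumes AggregatedRating stores the running sum of the ratings, so the average is AggregatedRating / NumberOfRatings. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailySellerRating:
    """One row of the hypothetical daily aggregate table.

    Assumes aggregated_rating holds the sum of all ratings for the day.
    """
    seller_id: str
    day: date
    number_of_ratings: int = 0
    aggregated_rating: int = 0

    def add(self, rating: int) -> None:
        # Ratings are 1 to 5 stars.
        if not 1 <= rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {rating}")
        self.number_of_ratings += 1
        self.aggregated_rating += rating

    @property
    def average(self) -> float:
        # Average rating for the day, or 0.0 if nothing was rated yet.
        if self.number_of_ratings == 0:
            return 0.0
        return self.aggregated_rating / self.number_of_ratings


row = DailySellerRating("seller-1", date(2024, 1, 1))
row.add(5)
row.add(3)
print(row.average)  # 4.0
```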
The implementation of the SellerRatedEvent handler is something like this (pseudocode):
var aggregatedRating = unitOfWork.GetSellerRatingForToday(message.SellerId);
This works fine, except when there is a failure after the changes have been committed. For example, NServiceBus (NSB) fails to send the message to the audit queue because the queue is full (a real scenario). The event handler is then retried a number of times, and every retry aggregates the same rating again.
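The failure mode can be reproduced with a small simulation. This is a hypothetical Python model, not NServiceBus code: a dictionary stands in for the SQL table, the message is assumed to carry the day it belongs to, and forward_to_audit raises a made-up AuditQueueFullError after the handler's changes are already committed.

```python
from collections import defaultdict


class AuditQueueFullError(Exception):
    """Hypothetical failure raised after the handler's changes are committed."""


# (SellerId, Day) -> [NumberOfRatings, AggregatedRating]; stands in for the SQL table.
table = defaultdict(lambda: [0, 0])


def handle_seller_rated(message):
    # Naive handler: read today's row, add the rating, commit. No duplicate check.
    row = table[(message["SellerId"], message["Day"])]
    row[0] += 1
    row[1] += message["Rating"]


def forward_to_audit(attempt, audit_failures):
    # Fails for the first `audit_failures` attempts, like a full audit queue.
    if attempt <= audit_failures:
        raise AuditQueueFullError("audit queue is full")


def process(message, audit_failures, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        handle_seller_rated(message)  # the changes are already committed here
        try:
            forward_to_audit(attempt, audit_failures)
            return attempt
        except AuditQueueFullError:
            pass  # the message is retried and the handler runs again
    raise RuntimeError("message moved to the error queue")


msg = {"SellerId": "seller-1", "OrderId": "order-1", "Rating": 4, "Day": "2024-01-01"}
attempts = process(msg, audit_failures=2)
print(attempts, table[("seller-1", "2024-01-01")])  # 3 [3, 12]
```

With two audit failures the message is processed three times, so a single order ends up counted as three ratings.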
What are good patterns to avoid aggregating duplicate data? The options I can think of are:
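- Use the Outbox in the aggregation endpoint, so that the aggregate update is committed in the same transaction as the record of the processed message, and a retry of an already processed message does not run the handler again.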
- Store the individual ratings in a database table, in the same transaction as the aggregated data, and check that no rating for that OrderId has been stored yet before aggregating it.
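A minimal sketch of the second option, using SQLite from the Python standard library in place of the real SQL database. The table layout and function names are hypothetical. A primary key on OrderId turns the duplicate check into part of the same transaction as the aggregate update: a retried message fails the insert, the transaction rolls back, and the aggregate is left untouched.

```python
import sqlite3


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS SellerRating (
            OrderId  TEXT PRIMARY KEY,  -- at most one rating per order
            SellerId TEXT NOT NULL,
            Rating   INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS DailySellerRating (
            SellerId         TEXT NOT NULL,
            Date             TEXT NOT NULL,
            NumberOfRatings  INTEGER NOT NULL,
            AggregatedRating INTEGER NOT NULL,
            PRIMARY KEY (SellerId, Date)
        );
    """)


def handle_seller_rated(conn, seller_id, order_id, rating, day):
    """Returns True if the rating was aggregated, False if it was a duplicate."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            conn.execute(
                "INSERT INTO SellerRating (OrderId, SellerId, Rating) VALUES (?, ?, ?)",
                (order_id, seller_id, rating),
            )
            # Upsert the daily row (requires SQLite 3.24 or later).
            conn.execute(
                """INSERT INTO DailySellerRating
                       (SellerId, Date, NumberOfRatings, AggregatedRating)
                   VALUES (?, ?, 1, ?)
                   ON CONFLICT (SellerId, Date) DO UPDATE SET
                       NumberOfRatings  = NumberOfRatings + 1,
                       AggregatedRating = AggregatedRating + excluded.AggregatedRating""",
                (seller_id, day, rating),
            )
        return True
    except sqlite3.IntegrityError:
        # Assumes the OrderId primary key is the only constraint that can fail here:
        # the rating for this order was already aggregated.
        return False


conn = sqlite3.connect(":memory:")
create_schema(conn)
handle_seller_rated(conn, "seller-1", "order-1", 4, "2024-01-01")  # True
handle_seller_rated(conn, "seller-1", "order-1", 4, "2024-01-01")  # False: retry of the same order
```

Relying on the database constraint instead of a separate SELECT before the INSERT also covers the case where two copies of the same message are processed concurrently.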
Any other options?
The way I see the above approaches, the first one mainly solves the problem with infrastructure and keeps the business logic very simple, whereas in the second one the deduplication is part of the actual feature implementation. I can see pros and cons in both approaches.
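One practical difference between the two approaches is the deduplication key. The Outbox deduplicates on the incoming message ID, while the second approach deduplicates on a business key (OrderId). The simplified, hypothetical model below (not the NServiceBus implementation) shows the consequence: a redelivered message is skipped, but the same rating published again under a new message ID is aggregated twice.

```python
class DedupByMessageId:
    """Hypothetical model of message-ID-based deduplication, as the Outbox does.

    The processed message ID is recorded together with the business data,
    and a message whose ID was already recorded is not handled again.
    """

    def __init__(self):
        self.processed_message_ids = set()
        self.number_of_ratings = 0
        self.aggregated_rating = 0

    def handle(self, message_id, rating):
        if message_id in self.processed_message_ids:
            return False  # redelivery of the same message: skip the handler
        # In a real database these updates would share one transaction.
        self.processed_message_ids.add(message_id)
        self.number_of_ratings += 1
        self.aggregated_rating += rating
        return True


agg = DedupByMessageId()
agg.handle("msg-1", 4)  # aggregated
agg.handle("msg-1", 4)  # redelivery: skipped
agg.handle("msg-2", 4)  # same order's rating published again under a new ID: aggregated
```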