Complex data types in messages

What are the preferred practices when it comes to putting complex data types in messages?

This is a major point of contention within our office, with several senior developers expressing very different views on what data is appropriate for a message and what should be persisted at source and passed through with identifiers.

Is it better to persist data to a defined data structure before sending a message to other services, as below, allowing each service to retrieve the data it needs to perform its operation?

public class OrderEventA : IEvent
{
    public int OrderID { get; set; }
}

Or is it better to simplify data storage and let NServiceBus take care of the serialisation, persisting the data until all nodes have performed the operations they need?

public class OrderEventB : IEvent
{
    public Order order { get; set; }
}

public class Order
{
    public int id { get; set; }
    public Customer customer { get; set; }
    public List<Products> products { get; set; }
    public decimal TotalPrice { get; set; }
}

public class Customer
{
    public int id { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
    public Address Address  { get; set; }
}

public class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string AddressLine3 { get; set; }
}

public class Products
{
    public int Id { get; set; }
    public string ProductName { get; set; }
    public decimal Price { get; set; }
}

  • When the message makes use of identifiers, we can pivot our data model without affecting any of the existing messages in the queue, which simplifies migrations and product releases. We also gain finer control over data access, without publishing possibly sensitive information on the bus. Messages are smaller and have a reduced footprint during processing. You can control data domain contexts and boundaries.
  • When the message transfers complex types, the handler has all the information it needs without querying a database explicitly (web API/database repositories). The code is simpler, with no unnecessary repositories or API clients, and it is quicker to extend the functionality of existing messages. Shared DTO objects prevent cross-data contamination. Event sourcing is simpler. When debugging messages, you see the entire state of the data at the time of processing.

Any help or insight is greatly appreciated.

Hi Gary,

During his ADSD training, Udi Dahan explains how you should define the boundaries within your system and then only transmit identifiers within messages. The goal is to have such clear and hard boundaries between services that they never have to exchange any data other than these identifiers. In the UI this comes together by composing data from these various services. I would highly recommend this approach, as it helped me enormously to reduce coupling. But that requires a visit to ADSD.

When you have ServiceA and ServiceB and they both need the same data, but you want to notify each of them via events, you get the options you present. Both have their advantages and disadvantages.

When you transmit everything inside the message, you’ll have a (database) schema in ServiceA, a schema inside your message and a (database) schema inside ServiceB. So if something really simple changes, it’s likely that you need to update 3 schemas and likely some code as well.

On the other hand, if both ServiceA and ServiceB depend on a schema in the database (which I would advise be owned by one of the two services), it’s much harder to test the runtime behavior of both services when the schema changes.

You also mention this about complex message types:

"you see the entire state of the data at the time of processing."

Be aware though that data might be stale inside those messages. When you query data from a central database, the chance is higher that it’s up to date when you process it inside your code.
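
To make the staleness risk concrete, here is a minimal sketch. It assumes the owning service keeps a version number per order and copies it into the message; OrderSnapshot, Version and StalenessCheck are hypothetical names for illustration, not part of the original classes or of NServiceBus.

```csharp
using System;

// Hypothetical snapshot of an order as it travelled inside a message.
// Version is an assumed field that the owning service increments on every change.
public class OrderSnapshot
{
    public int Id { get; set; }
    public decimal TotalPrice { get; set; }
    public int Version { get; set; }
}

public static class StalenessCheck
{
    // Returns true when the copy in the message is older than
    // the version currently held by the service that owns the order.
    public static bool IsStale(OrderSnapshot fromMessage, int currentVersionAtOwner)
    {
        return fromMessage.Version < currentVersionAtOwner;
    }
}
```

Note that the check itself needs the owner's current version, so a handler that wants to detect stale data still has to ask the owning service something. That is part of the trade-off.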

Breaking down the big ball of mud

However, I could definitely see the benefit of complex message types if you’re trying to break up a big ball of mud. Imagine you have a legacy database that is extremely complex and slow. Every time a business event occurs, you send a message, and some component, completely isolated from all other components in your system, can now work with the data provided to make its own decisions. I’ve done this before and it was a relief to work on such a component.

But be very aware of data that could go out of sync. What if a component spawns an event based on data it received that is out of sync? Who is the real owner of the data? Who has the truth? For example, a customer becomes preferred in ServiceA, but ServiceB is unaware of this fact. These are exactly the kind of problems Udi Dahan talks about in his ADSD course.

All in all, I would advise keeping the amount of data inside your messages to an absolute minimum. Don’t spread too much data around. Try to work with just identifiers and services/components that own a piece of data, unless there is no other way because you actually want to separate the data, for example to break down that big ball of mud.
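
As a rough sketch of the identifier-only style, the handler below receives just an order ID and asks the owning component for what it needs. IOrderLookup, InMemoryOrderLookup, OrderSummary and BillingHandler are hypothetical names, and the NServiceBus types are left out so the sketch runs on its own. In a real endpoint the message would implement IEvent and the handler would implement IHandleMessages<OrderEventA>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read model exposed by the component that owns order data.
public class OrderSummary
{
    public int OrderId { get; set; }
    public decimal TotalPrice { get; set; }
}

// Hypothetical lookup abstraction; it could wrap a database or a web API.
public interface IOrderLookup
{
    OrderSummary Find(int orderId);
}

// In-memory stand-in for the owning component, used only for illustration.
public class InMemoryOrderLookup : IOrderLookup
{
    private readonly Dictionary<int, OrderSummary> _orders = new Dictionary<int, OrderSummary>();

    public void Save(OrderSummary order)
    {
        _orders[order.OrderId] = order;
    }

    public OrderSummary Find(int orderId)
    {
        OrderSummary order;
        return _orders.TryGetValue(orderId, out order) ? order : null;
    }
}

// Identifier-only message, shaped like OrderEventA (IEvent marker omitted here).
public class OrderEventA
{
    public int OrderID { get; set; }
}

public class BillingHandler
{
    private readonly IOrderLookup _lookup;

    public BillingHandler(IOrderLookup lookup)
    {
        _lookup = lookup;
    }

    // Resolves the identifier against the owner and returns the amount to bill.
    public decimal Handle(OrderEventA message)
    {
        var order = _lookup.Find(message.OrderID);
        if (order == null)
        {
            throw new InvalidOperationException("Order " + message.OrderID + " not found");
        }
        return order.TotalPrice;
    }
}
```

The message stays the same if the order model changes; only the lookup and its read model need to follow. The cost is an extra query per message and a runtime dependency on the owner being available.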

It’s all about trade-offs. It all depends.

Hope this answer provides some help. Let us know if you have additional questions.