Message validation - request for input

Simon_Cropp · June 3, 2018, 10:48pm

I created two new side projects that help validate messages:

One using FluentValidation:

And the other using DataAnnotations:

They are both currently in beta, and I would appreciate any feedback before settling on a stable API, behaviour, and functionality.

danielmarbach · June 4, 2018, 11:06am

Hi Simon,

I was wondering what the design decision was behind enabling validation of incoming messages by default? To me it feels a bit off to make that the default. One important thing we usually teach about commands, for example, is that once they are sent they should succeed in almost all cases. I would suggest encouraging validation on the outgoing send. Thoughts?

Regards
Daniel

Simon_Cropp · June 4, 2018, 11:20am

I took that approach based on the common practice for writing public APIs, i.e. the majority of public APIs validate incoming requests but not outgoing ones.

ramonsmits · June 4, 2018, 2:58pm

I would rather have an incoming message forwarded to the error queue immediately due to a validation exception than due to a null reference exception, or, even worse, have the null value stored in a data store where it goes unnoticed.

That way you are aware that it is in the error queue. You can fix the incoming data by specifying a different default, or use some other strategy, and send it back to the endpoint queue, where it passes validation and continues your process, or you can handle it via some other unhappy flow.

SzymonPobiega · June 4, 2018, 5:58pm

I fully agree with @danielmarbach that messages, by default, should be valid and making them valid is the sender’s responsibility.

That works within a single trust domain i.e. when the same party controls both the sender and the receiver. On the edges of the system (or component) where the arriving messages are created by a different party, receiver-side validation makes a lot of sense for various reasons, including security.

These were my mandatory $.02 anyway

Simon_Cropp · June 5, 2018, 6:54am

Thoughts on having it validate on both incoming and outgoing by default?

mookid8000 · June 5, 2018, 7:04am

I think this is a valid point – while the handler of a command of course needs to protect its own integrity and avoid poisoning itself with bad input, it’s generally a good approach to be more forgiving at the receiving end and stricter at the sending end.

It’s basically just Postel’s Law: be conservative in what you send, be liberal in what you accept.

Thinking about Postel’s Law with regard to this specific implementation made me consider whether it would make sense to be asymmetric about it, i.e. to apply two sets of validation: a stricter one for outgoing messages, and a more forgiving one for incoming messages (e.g. crafted for each subscriber, requiring only the subset of the message that each particular subscriber is interested in).

Thoughts?
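To make the asymmetric idea concrete, here is a minimal sketch in Python (chosen only for brevity; the libraries discussed are .NET). The `OrderPlaced` contract, its fields, and the billing subscriber are hypothetical; the point is that the sender’s validator covers the whole contract while a subscriber’s validator requires only the fields that subscriber reads.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OrderPlaced:
    # Hypothetical contract, used only for illustration.
    order_id: Optional[str]
    customer_id: Optional[str]
    shipping_address: Optional[str]


def missing(msg: OrderPlaced, required: List[str]) -> List[str]:
    """Return an error for every required field that is None."""
    return [f"{name} is required" for name in required if getattr(msg, name) is None]


def strict_outgoing(msg: OrderPlaced) -> List[str]:
    # Sender side: the whole contract must be populated.
    return missing(msg, [f.name for f in fields(OrderPlaced)])


def billing_incoming(msg: OrderPlaced) -> List[str]:
    # Hypothetical billing subscriber: only requires the fields it actually reads.
    return missing(msg, ["order_id", "customer_id"])


msg = OrderPlaced(order_id="o-1", customer_id="c-1", shipping_address=None)
print(strict_outgoing(msg))   # ['shipping_address is required']
print(billing_incoming(msg))  # []
```

A message missing only `shipping_address` would be rejected by the sender but still accepted by the billing subscriber.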

ramonsmits · June 5, 2018, 7:45am

But what if the sender made an error and is unaware of it, or the sender just passed along data that it received in error?

What if the sender doesn’t validate, accidentally inserts a null value, you do not check for it either, and now you have corrupted data in your system because the receiver didn’t validate?

Basically there are 4 options:

1. Message is valid, process
2. Message is invalid, process
3. Message is invalid, move to error queue
4. Message is invalid, discard

I’m all in on option 3, and on moving the message to the error queue immediately instead of cycling through all immediate and delayed retries, because a schema validation error is not a transient error.
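A minimal sketch of that policy, assuming a hypothetical `MessageValidationError` type and a simple retry loop (this is not NServiceBus’s recoverability API): validation failures skip retries entirely, while other exceptions are treated as possibly transient and retried until the budget is spent.

```python
from typing import Any, Callable, Tuple


class MessageValidationError(Exception):
    """Hypothetical: raised when a message fails validation. Never transient."""


def process(handler: Callable[[Any], None], message: Any, max_retries: int = 3) -> Tuple[str, int]:
    """Return (outcome, attempts), where outcome is 'processed' or 'error_queue'."""
    attempts = 0
    while True:
        attempts += 1
        try:
            handler(message)
            return "processed", attempts
        except MessageValidationError:
            # Retrying cannot fix invalid content: go to the error queue immediately.
            return "error_queue", attempts
        except Exception:
            # Possibly transient (timeout, deadlock, ...): retry until the budget is spent.
            if attempts > max_retries:
                return "error_queue", attempts


def reject(message):
    raise MessageValidationError("order_id is required")


print(process(reject, {}))  # ('error_queue', 1)
```

With `max_retries=3`, a handler that keeps timing out is attempted four times (one initial attempt plus three retries), whereas an invalid message is attempted exactly once.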

It then ends up in the error queue, and you can either choose to archive or manually discard the message and do any manual task needed, or, as stated before, add a compensating action. Alternatively, you can apply a just-in-time (JIT) fix so that the message now passes validation, deploy that fix, and retry the message, which then processes successfully without any chance of corrupting your system. In other words, the message eventually gets processed.

I’ve almost never seen systems with well-thought-out error flows or very good input validation. IMHO you should take validation steps to ensure you are not accidentally corrupting your business data with rogue or erroneous input. I have always been a big fan of XSD schema validation in the past: tightening schemas as much as possible saved me many, many times, and it gave me opportunities to process incoming data differently, because you explicitly develop such alternate flows based on the validation, and those flows grow over time.

Regarding Postel’s Law: yes, definitely, the recipient should be as liberal as it can be, but it should still validate.

danielmarbach · June 5, 2018, 8:50am

Hi all,

Great discussion!

Just to clarify my viewpoint: I’m not saying you should never do any validation on the receiver side. I’m also not saying the message should not be moved to the error queue if validation fails on the receiver side. The point I’m trying to make is that I would encourage validation on the sender side first. That’s why I brought up my viewpoint and suggested that the library default should be switched to the sender side.

Regards
Daniel

Dennis · June 5, 2018, 9:04am

You do as much validation on both sides as possible. There are technical exceptions (null value) and business exceptions, like an ordered product that is no longer in inventory.

You check as much as you can sender-side. You can create validation logic and deploy it at both the sender and the receiver. We use that example in the SOA Done Right workshop, where we point out that you don’t need to map continuously between layers. If your SPA needs JSON, why not feed it JSON directly from a DocumentDb that stores JSON? There are even document databases that support hosting JavaScript for validating incoming data, in case someone tries to circumvent the JavaScript in the SPA. And this example doesn’t even involve messaging.

Again, try to do as much as you can on the sender side. But the receiver side makes the final call on whether the data is correct.

Maybe we’re all in agreement and are just saying the same thing in different ways.
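One way to read this split, sketched in Python with hypothetical names: stateless technical checks live in shared code that both sides run, and the receiver adds the business check that needs its own state (here a hypothetical `inventory` dict) before making the final call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlaceOrder:
    # Hypothetical command.
    product_id: Optional[str]
    quantity: int


def technical_errors(cmd: PlaceOrder) -> List[str]:
    """Stateless checks: shared code that the sender and the receiver both run."""
    errors = []
    if not cmd.product_id:
        errors.append("product_id is required")
    if cmd.quantity <= 0:
        errors.append("quantity must be positive")
    return errors


def receiver_errors(cmd: PlaceOrder, inventory: Dict[str, int]) -> List[str]:
    """The receiver makes the final call: shared checks first, then state-dependent ones."""
    errors = technical_errors(cmd)
    if not errors and inventory.get(cmd.product_id, 0) < cmd.quantity:
        errors.append("product no longer in inventory")
    return errors


cmd = PlaceOrder(product_id="p-1", quantity=10)
print(technical_errors(cmd))             # [] so the sender sends it
print(receiver_errors(cmd, {"p-1": 5}))  # ['product no longer in inventory']
```

The sender can only run `technical_errors`, because it does not own the inventory; the business check necessarily happens where the data lives.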

boblangley · June 5, 2018, 5:09pm

Should a data validation error go to the error queue? Messages in the error queue should be retryable without modification.

Instead, shouldn’t validation errors be handled via an explicit business process?
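For comparison, a minimal sketch of the explicit-business-process route, with hypothetical message types and a list standing in for the bus’s publish: instead of throwing, the handler publishes a rejection event, so the message is consumed successfully and never reaches the error queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlaceOrder:
    # Hypothetical command.
    order_id: str
    amount: float


@dataclass
class OrderRejected:
    # Hypothetical event that a follow-up business process would subscribe to.
    order_id: str
    reason: str


class PlaceOrderHandler:
    def __init__(self, credit_limit: float):
        self.credit_limit = credit_limit
        self.published: List[object] = []  # stand-in for bus.publish(...)
        self.accepted: List[str] = []

    def handle(self, cmd: PlaceOrder) -> None:
        if cmd.amount > self.credit_limit:
            # A business outcome, not an exception: the message is consumed
            # normally and the rejection drives an explicit follow-up process.
            self.published.append(OrderRejected(cmd.order_id, "credit limit exceeded"))
            return
        self.accepted.append(cmd.order_id)
```

This fits rules a business user cares about (credit limits, stock); purely technical faults still fit the error queue better.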

Simon_Cropp · June 6, 2018, 6:42am

OK, I changed validation to be enabled for both incoming and outgoing messages by default.
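A sketch of what "validate both directions by default" means, assuming a hypothetical `ValidatingBus` wrapper with a plain list standing in for the queue (this is not the API of either library): both flags default to on, and either direction can be switched off.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class ValidationError(Exception):
    pass


class ValidatingBus:
    """Hypothetical wrapper; a plain list stands in for the transport queue."""

    def __init__(self, validators: Dict[type, Callable[[Any], List[str]]], queue: List[Any],
                 validate_outgoing: bool = True, validate_incoming: bool = True):
        self.validators = validators
        self.queue = queue
        self.validate_outgoing = validate_outgoing
        self.validate_incoming = validate_incoming

    def _check(self, message: Any) -> None:
        validator = self.validators.get(type(message))
        errors = validator(message) if validator else []
        if errors:
            raise ValidationError(f"{type(message).__name__}: {'; '.join(errors)}")

    def send(self, message: Any) -> None:
        if self.validate_outgoing:
            self._check(message)  # fail at the sender, before the message leaves
        self.queue.append(message)

    def receive(self, handler: Callable[[Any], None]) -> None:
        message = self.queue.pop(0)
        if self.validate_incoming:
            self._check(message)  # the receiver still protects itself
        handler(message)


@dataclass
class CreateUser:
    # Hypothetical message.
    name: str


VALIDATORS = {CreateUser: lambda m: [] if m.name else ["name is required"]}
```

If a sender has outgoing validation turned off, the receiver’s default still catches the bad message.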

@mookid8000

basically apply two sets of validation: a stricter one for outgoing messages, and a more forgiving one for incoming

I am not sure that is a good idea; see the "Robustness principle" article on Wikipedia.

I think there is one scenario where this matters: when expectations/assumptions about messages need to change while a message is in flight. But I thought that would be handled either by a message mutator or by serializer configuration?

I am happy to consider different validations for incoming and outgoing, but I’m not sure the above is enough justification. I would perhaps prefer
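A sketch of the mutator approach, assuming a hypothetical dict-shaped message whose newer contract requires a `currency` field: the mutator fills an assumed default for older in-flight messages before validation runs, so only one validator has to exist.

```python
from typing import Callable, Dict, List, Sequence

Message = Dict[str, object]


def add_default_currency(message: Message) -> Message:
    """Hypothetical incoming mutator: upgrades messages sent before 'currency' existed."""
    upgraded = dict(message)  # copy, leave the original untouched
    upgraded.setdefault("currency", "USD")  # assumed default, for illustration only
    return upgraded


def validate_order(message: Message) -> List[str]:
    return [] if message.get("currency") else ["currency is required"]


def receive(message: Message,
            mutators: Sequence[Callable[[Message], Message]] = (add_default_currency,)) -> List[str]:
    # Mutators run first, so validation only ever sees the current contract shape.
    for mutate in mutators:
        message = mutate(message)
    return validate_order(message)


in_flight = {"order_id": "o-1"}  # sent under the old contract
print(receive(in_flight))               # []
print(receive(in_flight, mutators=()))  # ['currency is required']
```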

@ramonsmits

I’m all in on option 3 (Message is invalid, move to error queue), and move it to the error queue ASAP instead of cycling through all immediate and delayed retries as a schema validation error is not a transient error.

Yep, that is the current behavior, and it is documented in the docs (Community extensions and integrations • NServiceBus • Particular Docs).

I personally have always been a big fan of XSD Schema validation of the past,

I did consider doing a Json.NET Schema variant (Json.NET Schema - Newtonsoft).

@SzymonPobiega

That works within a single trust domain i.e. when the same party controls both the sender and the receiver.

I think that works when you have a very small domain and a small team. As it grows to many endpoints, teams, and message contracts, I think those message contract assumptions and validations need to be formalised.

@Dennis

You do as much validation on both sides as possible.

Agreed

You check as much as you can sender-side. You can create validation logic and deploy that both at the sender and the receiver.

We are shipping our validators inside our message assembly, then scanning all *.Messages.dll assemblies for validators.
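Python has no direct counterpart to scanning `*.Messages.dll` assemblies, so this sketch swaps in a decorator-based registry that is populated when the shared messages module is imported. `validator_for`, `VALIDATORS`, and `PlaceOrder` are illustrative names, not part of either library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

VALIDATORS: Dict[type, List[Callable]] = {}


def validator_for(message_type: type):
    """Register a validator next to the message contract it checks."""
    def register(fn: Callable) -> Callable:
        VALIDATORS.setdefault(message_type, []).append(fn)
        return fn
    return register


# --- would live in the shared messages package, next to the contract ---
@dataclass
class PlaceOrder:
    order_id: str


@validator_for(PlaceOrder)
def order_id_present(msg: PlaceOrder) -> List[str]:
    return [] if msg.order_id else ["order_id is required"]


# --- used unchanged by both the sender and the receiver ---
def validate(message) -> List[str]:
    return [err for fn in VALIDATORS.get(type(message), []) for err in fn(message)]
```

Because the validators ship with the contracts, every endpoint that references the messages package gets the same rules without extra wiring.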

If you SPA needs json, why not feed it json directly from a DocumentDb that stores json? There are even DocumentDB that support hosting javascript for validating incoming data, if someone tries to circumvent the javascript in the SPA. And this example is even without messaging.

We have a similar scenario, since we are using an extension listed under Community extensions and integrations • NServiceBus • Particular Docs that essentially places the HTTP request directly into the SQL table. At that point we only validate the message type/namespace and the destination; full message validation is then done on the receiver.

Also, as mentioned above, using a JSON schema before you “feed it json directly from a DocumentDb” might be a viable approach?
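A sketch of the edge check described above, with hypothetical message type names and destinations: the HTTP entry point only checks an allow-list of type/destination pairs, leaving full message validation to the receiver.

```python
from typing import Dict, List, Set

# Hypothetical allow-list: namespace-qualified message type -> permitted destinations.
ALLOWED: Dict[str, Set[str]] = {
    "Sales.Messages.PlaceOrder": {"Sales"},
    "Billing.Messages.ChargeCard": {"Billing"},
}


def edge_errors(message_type: str, destination: str) -> List[str]:
    """Cheap checks at the HTTP entry point; full validation happens on the receiver."""
    if message_type not in ALLOWED:
        return [f"unknown message type {message_type}"]
    if destination not in ALLOWED[message_type]:
        return [f"{message_type} cannot be sent to {destination}"]
    return []
```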

@boblangley

Should a data validation error go to the error queue? Messages in the error queue should be retryable without modification.

Unfortunately that is not always possible. Often you have bugs that make the message content problematic, and it needs to be fixed before a retry.

Instead, shouldn’t validation errors be handled via an explicit business process?

Some validations map well to an explicit business process; others do not. A business user doesn’t care that a Guid property should not be Guid.Empty, and no business process needs to happen if one is detected.
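The `Guid.Empty` case as a purely technical rule, sketched in Python, where `uuid.UUID(int=0)` plays the role of .NET’s `Guid.Empty`; `ShipOrder` is a hypothetical message.

```python
import uuid
from dataclasses import dataclass
from typing import List

NIL_UUID = uuid.UUID(int=0)  # counterpart of .NET's Guid.Empty


@dataclass
class ShipOrder:
    # Hypothetical message.
    order_id: uuid.UUID


def validate_ship_order(msg: ShipOrder) -> List[str]:
    # Purely technical rule: no business process reacts to it, the message just fails.
    if msg.order_id == NIL_UUID:
        return ["order_id must not be empty"]
    return []
```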