That's definitely a concern, but it's also way outside of what I was talking about. I would also expect any JSON parser, even one in a dynamic language, to fail on JSON that is straight-up malformed. And ambiguous formats are always bad news.
I'm talking about situations where the JSON is formatted fine, it's just that some field wasn't specified, so then the entire input gets rejected. Even though there was zero need to read the contents of that field in the first place. It just happened to be included in some domain object that gets re-used everywhere, including some other places where the field's contents do matter.
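For concreteness, here's roughly what that failure mode looks like in Rust with serde (a sketch; the Metric struct is made up for illustration):

    use serde::Deserialize;

    // A hypothetical domain object that gets reused everywhere.
    #[derive(Deserialize)]
    struct Metric {
        id: String,
        timestamp: String,
        value: String,
        tags: Vec<String>, // required by the type, even where nobody reads it
    }

    fn main() {
        // Well-formed JSON; "tags" simply wasn't specified.
        let input = r#"{ "id": "1", "timestamp": "12:30pm", "value": "999" }"#;

        // Suppose this code path only needs "value". The decode still
        // rejects the whole input with: missing field `tags`.
        let parsed: Result<Metric, _> = serde_json::from_str(input);
        assert!(parsed.is_err());
    }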
Keep in mind that, when we're dealing with anything that might be transmitted in JSON, thinking that there might be a published spec, and that it manages to accurately cover all these details, is really optimistic. I've honestly never seen it happen in the wild. Oftentimes, any validation rules you might try to impose are guesswork as much as they are anything else. So complaining that a piece of data didn't conform to the spec might not even be a valid thing to do. All you can say for sure is that the data didn't meet the needs of some piece of business logic.
It's not perfect, but that's life. This tension, for example, is at the heart of why proto3 dropped proto2's required fields, and why using proto3 is strongly encouraged if you're looking to build robust infrastructure.
There are huge debates at Google internally over required vs optional in proto2 and proto3.
Beyond that, I think you’re operating from a misconception about JSON parsing in static languages. There’s no requirement to convert to domain objects and reject data over a trivial mismatch; you’re just required to specify explicitly what happens when you encounter unexpected structure or data.
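In Rust with serde, for instance, that explicit choice can be a single attribute (a sketch, not the only way to do it):

    use serde::Deserialize;

    #[derive(Deserialize)]
    struct Metric {
        id: String,
        value: String,
        // Explicitly chosen behavior: a missing "tags" becomes an
        // empty Vec instead of failing the whole decode.
        #[serde(default)]
        tags: Vec<String>,
    }

    fn main() {
        let m: Metric =
            serde_json::from_str(r#"{ "id": "1", "value": "999" }"#).unwrap();
        assert!(m.tags.is_empty()); // absent field handled, not rejected
    }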
Sorry if I wasn't being clear. I'm not saying that's the only way it can work in static languages. I'm saying that that's the way it tends to work out in practice, because the ergonomics of most popular static languages tend to discourage a less brittle approach.
Whereas the ergonomics of popular dynamic languages tend to favor an approach that I find, for this specific purpose, to be both less verbose and more robust.
> For example, suppose we have JSON that represents a set of metric data (this isn't our real JSON, this is just a thought experiment) that should look like this, with "tags" being an optional attribute:
{ "id": "1", "timestamp":"12:30pm", "value":"999", "tags": [ "myapp" ] }
> Suppose a Python client sends tags but calls the attribute "tag" rather than "tags" (it's missing the "s"). It's an optional attribute, so the server won't consider it an error if the "tags" attribute is missing. But it also won't fail due to this unknown attribute called "tag" - it will just silently ignore it. The Python developer is wondering why his tags aren't being stored - he is getting no errors, they are just silently being ignored. He would need to figure out that he is sending the wrong attribute name, with no error messages to help him out.
> That's the use-case I'm asking about - the "silent error" that will occur due to malformed JSON messages.
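For what it's worth, that silent-error case is exactly what strict decoding options are for. A sketch in Rust with serde, using the field names from the quoted example:

    use serde::Deserialize;

    // Rejecting unknown keys turns the "tag"/"tags" typo into a loud
    // error instead of silently dropped data.
    #[derive(Deserialize, Debug)]
    #[serde(deny_unknown_fields)]
    struct Metric {
        id: String,
        timestamp: String,
        value: String,
        #[serde(default)]
        tags: Vec<String>, // still optional when genuinely absent
    }

    fn main() {
        let input =
            r#"{ "id": "1", "timestamp": "12:30pm", "value": "999", "tag": ["myapp"] }"#;
        let err = serde_json::from_str::<Metric>(input).unwrap_err();
        // Prints something like: unknown field `tag`, expected one of
        // `id`, `timestamp`, `value`, `tags`
        println!("{err}");
    }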
What is the difference in approach between these? I've programmed extensively in dynamic and static languages, and don't understand what you're talking about. Less verbose, I might concede. More robust though, I need some more evidence.
Reminds me of Rich Hickey’s “Maybe Not” talk, in which I understand him to be suggesting that programming with “sets” is better than programming with “records” that may contain optional values.
Yes, I know it, and he seems to mostly ignore the fact that you can still fall back to manual typechecking in a statically typed language. That’s the part I don’t get. There’s nothing stopping you from manipulating JSON structurally in a static language.
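E.g., in Rust, serde_json's Value type lets you poke at the structure directly and typecheck by hand, reading only what the code path needs (a sketch):

    use serde_json::Value;

    fn main() {
        let input =
            r#"{ "id": "1", "timestamp": "12:30pm", "value": "999", "tags": ["myapp"] }"#;
        let doc: Value = serde_json::from_str(input).unwrap();

        // Manual typechecking: this code path only cares about "value",
        // so everything else in the document is simply ignored.
        match doc.get("value").and_then(Value::as_str) {
            Some(v) => println!("value = {v}"),
            None => eprintln!("no usable \"value\" field"),
        }
    }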