The topic is about decision problems, not about backward compatibility. That said, JSON has corner cases too, as bad or a little worse than XML's. All those complaints are just rationalizations of hipster propaganda.
JSON has corner cases but the advantage is that every JSON document has a single obvious mapping to language primitives — dicts, lists, strings, floats. Nobody has to agree beforehand how to load JSON data.
In contrast, the generic mapping for XML is a Tree[str (name), Dict[str, str] (attrs), Union[str, Tree] (body)], which maps so poorly between languages that people do one of two things: implement serialization formats on top of XML, which leads to non-interoperability when different software does it differently, or parse into a database-like "Abstract XML" object that you query with XPath.
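Here's roughly what the generic parse hands you in each case (a minimal sketch with Python's standard library; the payloads are made up):

    import json
    import xml.etree.ElementTree as ET

    # JSON: the generic parse is already dicts, lists, strings and numbers.
    doc = json.loads('{"users": [{"name": "a", "age": 3}]}')
    print(doc["users"][0]["age"])      # 3, an int, no schema involved

    # XML: the generic parse is a tree of (tag, attrs, children, text) nodes;
    # how lists, mappings and numbers are encoded in it is up to each format.
    root = ET.fromstring('<users><user name="a" age="3"/></users>')
    node = root.find("user")
    print(node.tag, node.attrib)       # user {'name': 'a', 'age': '3'}
    print(int(node.attrib["age"]))     # still a string until you convert it yourself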
JSON maps well to JavaScript types, but not anything else. And float isn't an obvious mapping: the standard does have a concept of an integer number, and most numbers are indeed integers. Arrays do everything objects do, but with better performance and better-defined behavior.
>Nobody has to agree beforehand how to load JSON data.
Such agreement is never necessary; it's up to the programmer what to write, and the standard doesn't specify the behavior of JSON parsers anyway, it only defines JSON documents. For example, there's no need to use hashtables: that's an accidental JavaScript artifact of originally parsing JSON with the eval function.
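For what it's worth, nothing forces a parser to hand you hashtables. A small sketch with Python's standard json module (the object_pairs_hook parameter is real; the payload is made up):

    import json

    # Load JSON objects as ordered lists of (key, value) pairs instead of dicts.
    pairs = json.loads('{"b": 1, "a": 2, "a": 3}', object_pairs_hook=lambda p: p)
    print(pairs)   # [('b', 1), ('a', 2), ('a', 3)]; duplicates and order preserved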
>when different software does it differently
I assume you mean schemaless documents here. Those are always abstract databases, in XML and JSON alike. I suppose there's jq for querying abstract JSON databases.
> JSON maps well to JavaScript types, but not anything else.
I grant you that JSON might be equally as awkward as XML in languages like C, but pretty much every high-level language (Python, Ruby, Java) has a very sane mapping to and from JSON types. You don't ever really have to "query" a JSON object; you just `json.loads` and `for item in obj["key"]:`. Even in the cases with schemas you're still usually only working with primitive types.
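Concretely, something like this (a trivial sketch; the payload is made up):

    import json

    obj = json.loads('{"key": [1, 2, 3]}')
    for item in obj["key"]:
        print(item + 1)   # plain Python ints, no casting or schema involved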
> Such agreement is never necessary, it's up to the programmer what to write...
What I mean is that there's no weirdness like having to encode types in the base document. You don't have to do things like annotating every value with its type (a sketch of what I mean is below), where different projects / parsers might do it differently. The "abstract JSON types" are actually useful and expressive, whereas in XML everyone has to carve out their own way to represent lists, mappings, and numbers out of trees, because basically nobody works with just trees in day-to-day work.
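The sketch, roughly what I have in mind (Python standard library; the type="int" attribute is a hypothetical convention, which is exactly the problem):

    import json
    import xml.etree.ElementTree as ET

    # JSON round-trips the number as a number.
    assert json.loads(json.dumps({"count": 5})) == {"count": 5}

    # XML needs a home-grown convention to say "this is an int".
    xml_doc = '<count type="int">5</count>'      # hypothetical convention
    el = ET.fromstring(xml_doc)
    value = int(el.text) if el.get("type") == "int" else el.text
    print(value)   # 5, but only because this reader knows this project's convention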
I think we might be talking about two different use-cases. If what you want to do with XML / JSON is serialize arbitrary classes in $specific_language and then read them back, then nothing really matters; the on-disk format is just an implementation detail. But abstract JSON works really, really well as a schema that everyone agrees on and that every language supports.
> You don't have to do things like [...] carve out their own way to represent lists, mappings, and numbers
I work with XML extensively, and out of hundreds of classes and fields I've needed an arbitrary dictionary maybe a handful of times. A mapping/dictionary is JSON's abysmal replacement for a class/struct, in which case you'd have XML like:
<MyClass>
<Field1>Value</Field1>
</MyClass>
IOW, _the tag is the key_! A list? Simply repeated elements. Numbers? What are you talking about, they're directly representable in XML, and XSD knows about integers, floats, etc. (unlike JSON).
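Consuming that convention generically is a few lines anyway; a rough sketch (Python's ElementTree; the to_dict helper is mine, not anything standard):

    import xml.etree.ElementTree as ET

    def to_dict(el):
        # Treat each child tag as a key; repeated tags become a list.
        out = {}
        for child in el:
            value = to_dict(child) if len(child) else child.text
            if child.tag in out:
                existing = out[child.tag]
                out[child.tag] = existing if isinstance(existing, list) else [existing]
                out[child.tag].append(value)
            else:
                out[child.tag] = value
        return out

    root = ET.fromstring('<MyClass><Field1>Value</Field1><Item>1</Item><Item>2</Item></MyClass>')
    print(to_dict(root))   # {'Field1': 'Value', 'Item': ['1', '2']}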
That doesn't show anything. Sure, you can walk through any schemaless JSON document, because it has the generic JSON document structure, but the same can be done for XML too in any language. You can't make sense of the document this way beyond its well-formedness. The presence of numbers doesn't help you much either: you can't tell anything about them beyond them being numbers.
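To illustrate, a generic walk looks about the same for both (rough sketch, Python standard library, made-up documents):

    import json
    import xml.etree.ElementTree as ET

    def walk_json(node, depth=0):
        # Recurse over whatever structure json.loads produced.
        print("  " * depth + type(node).__name__)
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            children = []
        for child in children:
            walk_json(child, depth + 1)

    def walk_xml(el, depth=0):
        # Recurse over the element tree; all you get generically is tags and text.
        print("  " * depth + el.tag)
        for child in el:
            walk_xml(child, depth + 1)

    walk_json(json.loads('{"a": [1, 2]}'))
    walk_xml(ET.fromstring('<a><b/><b/></a>'))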
>JSON maps well to JavaScript types, but not anything else.
Except it does. Take Python. Ruby. Any language that has a notion of dicts, lists and strings/ints/floats. That's basically every high-level language ever. Even exotic stuff like Tcl. And even C, a low-level language, has a thousand implementations of those same structures.
The often-mentioned design paralysis of choosing between elements and attributes has a JSON counterpart: in JSON there can be many ways to implement a collection of name/value pairs. One interesting case is a compound key: you can use a mini serialization format for the key and still make it an object (I actually saw this in the wild).
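A sketch of that compound-key case (made-up data; the "row|col" string is the mini serialization format I mean):

    import json

    # A mapping keyed by (row, col) pairs, flattened into string keys.
    grid = {f"{row}|{col}": row * col for row in range(2) for col in range(2)}
    text = json.dumps(grid)   # {"0|0": 0, "0|1": 0, "1|0": 0, "1|1": 1}

    # Every reader now has to know how to split the key back apart.
    loaded = {tuple(map(int, k.split("|"))): v for k, v in json.loads(text).items()}
    print(loaded[(1, 1)])     # 1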
> in JSON there can be many ways to implement a collection of name/value pairs.
Are there? JSON has dicts and lists. I mean you could store a collection of name value pairs in a list, maybe even a list of lists, but that's just stupid and an incorrect usage of the format.
Whereas in XML there really are tons of ways to do that, and ALL of them are awkward.
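For example, here is a rough sketch (Python's ElementTree; the variants are illustrative, not from any particular spec) of four encodings of the same {"a": 1, "b": 2}, each needing its own reader:

    import xml.etree.ElementTree as ET

    # Four different XML encodings of the same name/value collection.
    variants = [
        '<pairs a="1" b="2"/>',
        '<pairs><a>1</a><b>2</b></pairs>',
        '<pairs><pair name="a" value="1"/><pair name="b" value="2"/></pairs>',
        '<pairs><pair><name>a</name><value>1</value></pair><pair><name>b</name><value>2</value></pair></pairs>',
    ]

    # Each one needs its own extraction logic.
    readers = [
        lambda r: dict(r.attrib),
        lambda r: {c.tag: c.text for c in r},
        lambda r: {c.get("name"): c.get("value") for c in r},
        lambda r: {c.findtext("name"): c.findtext("value") for c in r},
    ]

    for xml_text, read in zip(variants, readers):
        print(read(ET.fromstring(xml_text)))   # {'a': '1', 'b': '2'} every time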
Most of these just amount to 'some parsers are bad', which... I mean, sure? But XML's surface area for parsers to be 'bad' is so much greater, and its gotchas are thus much more subtle. I hope you're not trying to suggest that all XML parsers are identical and perfect.
Your assertion was that JSON's edge cases are "as bad or a little worse", but IMO this document doesn't suggest that at all. Every single thing listed in it can go more wrong with XML, not less.