I agree with the sentiment of this post but I believe the parent is specifically referring to "bloat" that comes from XML being "perhaps too simple".
For example:
<e1> first <e2> second </e2> third </e1>
If you're parsing this into a data structure, how do you store "first" and "third"? There are a few different, very similar options. Because of this ambiguity, XML parsers each handle it differently, and a programmer has to sort through the various options (dom/etree/xpath/sax/etc) to find the one that fits. JSON (mostly) forces the author to decide how it should be parsed before encoding the data.
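To make that concrete, here is a rough sketch using two parsers from Python's standard library. ElementTree hangs "first" on the parent's `text` and "third" on the child's `tail`, while minidom keeps both as sibling text nodes. The JSON at the end is only one hypothetical encoding I picked for illustration, not a standard mapping.

```python
import json
import xml.etree.ElementTree as ET
from xml.dom import minidom

doc = "<e1> first <e2> second </e2> third </e1>"

# ElementTree: leading text belongs to the parent (.text),
# trailing text belongs to the preceding child (.tail).
root = ET.fromstring(doc)
e2 = root[0]
etree_view = (root.text, e2.text, e2.tail)

# minidom: mixed content is an ordered list of child nodes,
# with text stored in separate Text nodes.
dom_root = minidom.parseString(doc).documentElement
dom_view = [
    (node.nodeName, node.data if node.nodeType == node.TEXT_NODE else None)
    for node in dom_root.childNodes
]

# JSON: the author has to commit to a shape up front.
# This list-of-children layout is an assumption for illustration only.
as_json = json.dumps({"e1": [" first ", {"e2": " second "}, " third "]})

print(etree_view)
print(dom_view)
print(as_json)
```

Same document, two different in-memory shapes, and neither is wrong. With JSON that decision is made once by whoever writes the data, not by whichever parser the reader happens to pick.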
I'd say it's unfair to call XML itself bloated, but it's reasonable to say that the "world of XML parsing" is bloated and confusing, especially if you're a programmer looking for a data serialization format.