No matter what encoding your JSON file is in, gzip will output a compressed bag of bytes that, when decompressed, yields the same file at the other end. This is true of movie codecs, Word 97 files, or anything else, and none of the maintainers of those formats had to be consulted to make it work. That's what is meant by "thin waist" here.
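As a rough sketch of that round trip, assuming Python's standard `gzip` module (the `round_trips` helper and the sample payloads are made up for illustration):

```python
import gzip


def round_trips(payload: bytes) -> bool:
    """Return True if gzip hands back exactly the bytes it was given."""
    return gzip.decompress(gzip.compress(payload)) == payload


# The same JSON text in two encodings, plus bytes that are not text at all.
doc = '{"name": "café"}'
samples = {
    "utf-8": doc.encode("utf-8"),
    "utf-16": doc.encode("utf-16"),
    "binary": bytes(range(256)),
}

# gzip never looks at what the bytes mean, so every payload survives.
results = {label: round_trips(data) for label, data in samples.items()}
print(results)
```

Nothing in the helper knows or cares that two of the payloads happen to be JSON.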
I know, but it’s not “just bytes” as per the parent comment. You cannot infer the length of the content without decoding it. “By definition” it is variable-width character data. I think it’s fair to be pedantic about a fairly dramatic oversimplification.
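A small illustration of the length point, assuming the content is UTF-8 (the `lengths` helper and the sample string are hypothetical):

```python
def lengths(raw: bytes, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (number of bytes, number of code points) for encoded text."""
    return len(raw), len(raw.decode(encoding))


# "ü" takes two bytes in UTF-8 and "✓" takes three, so the counts diverge.
raw = '{"city": "Zürich", "ok": "✓"}'.encode("utf-8")
byte_len, char_len = lengths(raw)
print(byte_len, char_len)
```

The byte count is available immediately; the character count only exists after a decode with the right encoding.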
Less specific interfaces let you do less interesting things, but are more resilient. It's an engineering tradeoff. Purpose-built interfaces that fully expose and understand domain-level semantics are great in certain circumstances, but other times you want a certain minimum abstraction (IP packets and 'bags-of-bytes' POSIX file semantics are good examples) that can be used to build better ones.
If the rollout of HTTP had required that all the IP routers on the internet be updated to account for it, we likely would not have it. Likewise, if we required that all the classic Unix text utilities like wc, sort, paste, etc. did meaningful things to JSON before we could standardize JSON, adoption would likely have suffered.
There isn’t a UTF-8LE/BE because the byte order of a multi-byte sequence is fixed by the encoding itself: the lead byte always comes first. Any byte in a multi-byte sequence cannot be meaningfully interpreted (except for coarse properties such as character class or page) without its companions, so it is not just bytes. There is an element of presentation that must happen before “mere bytes” are eligible for JSON.
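To make the byte-order and multi-byte points concrete, here is a minimal sketch using Python's built-in codecs (the `decodes_alone` helper is made up for this example):

```python
ch = "€"  # U+20AC, three bytes in UTF-8

utf8 = ch.encode("utf-8")          # one possible byte sequence
utf16_le = ch.encode("utf-16-le")  # byte order matters for UTF-16...
utf16_be = ch.encode("utf-16-be")  # ...so two variants exist


def decodes_alone(chunk: bytes) -> bool:
    """Return True if chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Try each byte of the UTF-8 sequence in isolation.
alone = [decodes_alone(bytes([b])) for b in utf8]
print(utf8, utf16_le, utf16_be, alone)
```

The two UTF-16 encodings are byte-swapped versions of each other, while UTF-8 has a single form, and none of its three bytes decodes without the others.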
This is not a trivial point -- there are plenty of specs which are sloppy about text, try to abstract over code points, and result in non-working software.
Of course it’s all bytes. It’s all bytes. That doesn’t change the fact that you need some awareness of the encoding before those bytes are fully sensible.
Yes. The encoded content is “just bytes”; once decoded, it is logically something else (variable-width character data structured as JSON) that transcends bytes at the data level.