
The thread was originally about CRA vs Vite size on disk (or implicitly, if we're applying it to real-world applications, network cost in CI job startup times). And like I said, surrogate pairs don't apply to ASCII.

See this[0] for reference. Note how the first 16-bit code unit (not byte) must fall within a certain range (0xD800 to 0xDBFF) in order to signal the start of a surrogate pair. This range quite deliberately falls outside the ASCII range. JS parsers take advantage of this to make parsing of ASCII substrings faster by special-casing that range, since checking for a valid character in the entire Unicode range is quite a bit more expensive[1].
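
To make the range check concrete, here's a rough sketch in Python rather than JS. It is not what esprima actually does; the helper names (utf16_code_units, is_high_surrogate, is_low_surrogate, count_code_points) are made up for illustration. It splits a string into UTF-16 code units and walks them with a cheap ASCII branch first, only looking for surrogate pairs when a unit falls in the reserved ranges.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    # First half of a surrogate pair: 0xD800..0xDBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Second half of a surrogate pair: 0xDC00..0xDFFF.
    return 0xDC00 <= unit <= 0xDFFF


def count_code_points(units):
    """Count code points, taking a cheap branch for ASCII units."""
    count = 0
    i = 0
    while i < len(units):
        unit = units[i]
        if unit < 0x80:
            # ASCII fast path: can never be part of a surrogate pair.
            i += 1
        elif (is_high_surrogate(unit)
              and i + 1 < len(units)
              and is_low_surrogate(units[i + 1])):
            # A valid pair encodes a single code point in two units.
            i += 2
        else:
            # Any other BMP unit (or a lone surrogate) counts as one.
            i += 1
        count += 1
    return count


ascii_units = utf16_code_units("hello world")
emoji_units = utf16_code_units("\U0001F600")  # U+1F600, outside the BMP
```

Every unit in ascii_units is below 0x80, so the loop never leaves the first branch, while emoji_units comes out as two units (a high and a low surrogate) that together count as one code point.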

IMHO nitpicking about memory consumption of the underlying data structure is a bit meaningless, since the spec doesn't actually guarantee anything about memory layout. An implementation can take more memory for a pointer to the prototype, to cache the hash code/length, etc., and there are also considerations such as whether the underlying data structure is polymorphic or monomorphic due to JIT, whether the string is boxed/unboxed, whether it's implemented in terms of C strings vs slices, etc.

Regardless, it doesn't change the fact that the octet sequence "hello world" takes 11 bytes in ASCII/UTF-8 encoding (disregarding implementation metadata).
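
A quick way to check the byte count, sketched in Python since its str.encode makes the encoded size explicit. The UTF-16 line is only there for comparison; it says nothing about how a particular JS engine actually stores the string in memory.

```python
text = "hello world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # no byte-order mark, for comparison only

# ASCII and UTF-8 produce identical bytes for pure-ASCII text.
print(len(ascii_bytes), len(utf8_bytes), len(utf16_bytes))  # 11 11 22
```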

[0] https://github.com/jquery/esprima/blob/0911ad869928fd218371b...

[1] https://github.com/jquery/esprima/blob/0911ad869928fd218371b...




All great points. Not trying to nitpick, just trying to satisfy curiosity.



