> OSX uses NFD, while pretty much everyone else uses NFC (by default).
It depends on the file system. HFS+ uses UTF-8 NFD, as you noted.
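To make the NFC/NFD difference concrete, here is a minimal Python sketch using the standard `unicodedata` module. It shows that the same visible character "é" becomes different UTF-8 bytes depending on the normal form. (HFS+ strictly uses its own variant of NFD, so treat this as an approximation of what it stores.)

```python
import unicodedata

name = "\u00e9"  # "é" as a single precomposed code point

nfc = unicodedata.normalize("NFC", name)  # stays precomposed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301 (combining acute)

print(nfc.encode("utf-8"))  # b'\xc3\xa9'
print(nfd.encode("utf-8"))  # b'e\xcc\x81'
print(nfc == nfd)           # False: same glyph, different code points and bytes
```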
On Linux, ext4 uses whatever encoding your tools happen to use, as long as it's an 8-bit ASCII superset capable of representing / and \0.
On the Mac, this means you can rely on a consistent file encoding, and you never wind up with weird encoding issues when using different software, sharing files between machines, or when writing code to parse file names.
On Linux, this means you're at the whim of whatever happened to write the file. There's no real way of telling -what- the file name encoding might be, and if you wind up with files with mismatched encodings, you just wind up with garbage.
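A rough Python sketch of the Linux situation: the file system only sees bytes, so names written by tools with different encoding assumptions are all accepted, and nothing records which encoding was meant. The `is_acceptable_name` helper is hypothetical; it models only the "no `/`, no NUL" rule plus ext4's 255-byte name limit, and ignores the reserved names `.` and `..`.

```python
def is_acceptable_name(name: bytes) -> bool:
    # Hypothetical model: any non-empty byte string up to 255 bytes
    # is accepted, as long as it contains no '/' and no NUL byte.
    return 0 < len(name) <= 255 and b"/" not in name and b"\x00" not in name

# One tool writes "café" assuming Latin-1, another assuming UTF-8.
latin1_name = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
utf8_name = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

# Both are legal names, and they are two unrelated names on disk.
print(is_acceptable_name(latin1_name), is_acceptable_name(utf8_name))  # True True
print(latin1_name == utf8_name)  # False

# Reading each name under the *other* tool's assumption gives garbage:
# the Latin-1 name ends in U+FFFD (replacement character) when read as UTF-8,
print(latin1_name.decode("utf-8", errors="replace"))
# and the UTF-8 name turns into mojibake when read as Latin-1.
print(utf8_name.decode("latin-1"))  # cafÃ©
```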
I think Apple chose the right approach; it's the only way for things to 'just work'.
> I don't disagree that it might be a corner case that doesn't concern a lot of people, but it's not like I was doing black magic.
Well, technically you were doing black magic, in that:
- scp(1) is not file name encoding-aware.
- The Linux ext* file systems do not guarantee any particular encoding.
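Here is a small Python illustration of why the two points above bite in practice, assuming the Mac stored the name decomposed and the Linux user's tools produce NFC. Since scp copies name bytes as-is, a byte-for-byte comparison fails even though both names look identical. The `same_name` helper is hypothetical and shows one way a name-aware tool could compare names instead.

```python
import unicodedata

received = "cafe\u0301".encode("utf-8")  # decomposed name, copied verbatim from the Mac
typed = "caf\u00e9".encode("utf-8")      # precomposed name, as typed on Linux

# The kernel compares raw bytes, so these are different files.
print(received == typed)  # False

def same_name(a: bytes, b: bytes) -> bool:
    # Hypothetical helper: decode both as UTF-8, normalize to NFC, then compare.
    return (unicodedata.normalize("NFC", a.decode("utf-8"))
            == unicodedata.normalize("NFC", b.decode("utf-8")))

print(same_name(received, typed))  # True
```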
If you used a name-encoding-aware tool/protocol (such as modern network file systems), and you have consistent and correct encoding on the server (there's no guarantee of that on Linux), then transferring files between machines should always "just work".
http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_form...
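As a rough sketch of what "name-encoding-aware" could mean in practice: the sender's encoding must be known and correct, and the tool converts every name to one canonical form before writing. The `to_canonical_name` function is hypothetical, and picking NFC UTF-8 as the target is an assumption (a Mac-side receiver would want NFD).

```python
import unicodedata

def to_canonical_name(raw: bytes, source_encoding: str) -> bytes:
    # Hypothetical transfer step: decode with the sender's declared encoding,
    # then re-encode as NFC UTF-8. Raises UnicodeDecodeError if the bytes
    # are not valid in the declared encoding.
    text = raw.decode(source_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")

from_mac = to_canonical_name("cafe\u0301".encode("utf-8"), "utf-8")
from_legacy = to_canonical_name("caf\u00e9".encode("latin-1"), "latin-1")

print(from_mac)                 # b'caf\xc3\xa9'
print(from_mac == from_legacy)  # True: both senders end up with the same name
```

This only works because each sender's encoding is declared correctly; with a wrong declaration the decode either fails or silently produces garbage, which is exactly the Linux problem described above.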