Unicode (really, knowing the width and printability of any given character) is mostly up to the C library on the client and server -- Mosh has historically stayed out of that. Unfortunately Apple is not always super-great about keeping their C library up to date with new Unicode releases, which leads to some frustration, and even on Linux it takes too long for a new Unicode release to percolate down to the libc on the client AND server. (I think the model of "libc implements wcwidth" probably made more sense in the era before Unicode decided to start pumping out new releases every year with lots of new characters everybody wants to use. But we try to keep Mosh's own attack/implementation surface small.)
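To make the "libc implements wcwidth" model concrete: a program asks its own C library, under the current locale, how many columns a character takes, and the answer comes from whatever Unicode tables that libc was built with. The minimal sketch below assumes a POSIX system whose libc exports `wcwidth` and calls it from Python through ctypes. `libc_wcwidth` is a hypothetical helper, not part of Mosh. Python's `unicodedata` module ships its own, separately versioned copy of the Unicode tables, so even two components on one machine can disagree about a recently added character.

```python
import ctypes
import ctypes.util
import locale
import unicodedata

# Use the user's locale, as a terminal program would; wcwidth answers
# according to LC_CTYPE.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # stay in the "C" locale; most non-ASCII characters then report -1

# find_library may return None; CDLL(None) then searches the already-loaded
# libraries, which include libc on POSIX systems.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.wcwidth.argtypes = [ctypes.c_wchar]
_libc.wcwidth.restype = ctypes.c_int


def libc_wcwidth(ch):
    """Column width of one character according to this machine's libc (-1 = unknown/nonprintable)."""
    return _libc.wcwidth(ch)


if __name__ == "__main__":
    # U+1FAE0 MELTING FACE was added in Unicode 14; an older libc reports -1.
    for ch in ("A", "\u4e2d", "\U0001FAE0"):
        print(f"U+{ord(ch):04X}: libc wcwidth={libc_wcwidth(ch)}, "
              f"Python's tables (Unicode {unicodedata.unidata_version}): "
              f"east_asian_width={unicodedata.east_asian_width(ch)!r}")
```

Running the same script on a client and on an older server is a quick way to see the skew described above.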
With fish, we've found that relying on libc isn't good enough.
That's especially true when you connect from a desktop with a much newer system to a server that typically has an old libc with old Unicode information, and for ambiguous-width characters, where libc will just give you one width that might not match what the terminal actually renders (and terminals frequently have configuration options to change it!).
So we've made something we call widecharwidth (https://github.com/ridiculousfish/widecharwidth), which is a Python script that parses the Unicode data files (UnicodeData.txt, emoji-data.txt and friends) and generates a header you can #include.
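A stripped-down sketch of the same idea, under stated assumptions: this is not widecharwidth's actual code, and it reads only EastAsianWidth.txt, whereas widecharwidth also consults UnicodeData.txt, emoji-data.txt and others for combining marks, emoji and so on. The function names, the `wide_table` array and the `ambiguous_width` switch are illustrative. The switch stands in for the terminal setting for ambiguous-width characters: when it is 2, characters marked `A` are treated as wide.

```python
def parse_wide_ranges(lines, ambiguous_width=1):
    """Collect merged (lo, hi) codepoint ranges that should be width 2.

    Expects lines in the EastAsianWidth.txt format: "LO..HI ; PROP # comment".
    W (wide) and F (fullwidth) are always wide; A (ambiguous) only if
    ambiguous_width == 2.
    """
    wide_props = {"W", "F"} | ({"A"} if ambiguous_width == 2 else set())
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, prop = (field.strip() for field in data.split(";"))
        if prop not in wide_props:
            continue
        lo, _, hi = cps.partition("..")  # single codepoints have no ".."
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    ranges.sort()
    # Merge touching or overlapping ranges to keep the generated table small.
    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def emit_c_header(ranges, name="wide_table"):
    """Render the ranges as a sorted C array suitable for binary search."""
    rows = ",\n".join(f"    {{0x{lo:05X}, 0x{hi:05X}}}" for lo, hi in ranges)
    return (
        "/* Generated from EastAsianWidth.txt; do not edit. */\n"
        f"static const struct {{ unsigned lo, hi; }} {name}[] = {{\n{rows}\n}};\n"
    )
```

Usage would be along the lines of `print(emit_c_header(parse_wide_ranges(open("EastAsianWidth.txt", encoding="utf-8"))))`. The C side then only needs a binary search over the sorted array, and picking up a new Unicode version means rerunning the script and rebuilding.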
It's tempting, and thank you for the PR, but do we then have to push a new Mosh release every time there's a Unicode update? And do users then have to get that release installed on both the server and their client? It seems like it fixes part of the problem, sure, but only by us taking on more of the load and locking ourselves into an annual release cadence.
I think if we give up on libc, my "perfect" solution would probably be something like:
(a) user runs a script that prints every single Unicode character in their local terminal and learns its width (probably we can do this in some smart way)
(b) client somehow communicates this info to the server at runtime, over the protocol.
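Step (a) could work roughly like this: print a character at the start of the line, ask the terminal where the cursor ended up using the standard Cursor Position Report request (`ESC [ 6 n`, answered with `ESC [ row ; col R`), and take the column difference. A minimal POSIX-only sketch, assuming it runs attached to a real terminal; `parse_cpr` and `measure_width` are hypothetical names, not part of Mosh.

```python
import os
import re
import sys
import termios
import tty

# Cursor Position Report sent by the terminal in answer to "ESC [ 6 n".
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (row, column) from a reply such as b'\\x1b[12;3R'."""
    match = _CPR.search(reply)
    if match is None:
        raise ValueError(f"no cursor position report in {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measure_width(ch, fd=None):
    """Print ch at column 1 and return how many columns the cursor moved.

    Needs a real terminal on stdin/stdout; a robust tool would add a
    timeout and batch many characters per round trip.
    """
    fd = sys.stdin.fileno() if fd is None else fd
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)  # no echo, read the reply byte by byte
        sys.stdout.write("\r" + ch + "\x1b[6n")
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):
            reply += os.read(fd, 1)
        _, column = parse_cpr(reply)
        return column - 1  # the cursor started in column 1
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        sys.stdout.write("\r\x1b[K")  # erase the probe character
        sys.stdout.flush()
```

Calling this once per codepoint costs one round trip per character, which is why probing every assigned character would need the "smart way" mentioned in (a).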
But that's a lot of protocol work and kind of annoying. The good-enough solution might be:
(a) user runs a script that prints every single Unicode character in their local terminal and learns its width, or perhaps user just runs your script on the Unicode tables (+ user-supplied info about how their terminal handles ambiguous-width East Asian characters)
(b) user is responsible for distributing this data file to every server they feel like connecting to and putting it in some well-known location in their homedir.
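A sketch of what the server side of that good-enough plan might look like, assuming a hypothetical plain-text file (here `~/.mosh-widths`; Mosh defines no such file or format) with one hex codepoint or range and a width per line. Entries in the file override the width; anything not listed falls back to another source, such as libc's `wcwidth` or a bundled table.

```python
import bisect
import os

# Hypothetical location and format: Mosh does not define any such file.
DEFAULT_PATH = os.path.expanduser("~/.mosh-widths")


def parse_widths(text):
    """Parse lines like '1F600..1F64F 2' (hex range, width); '#' starts a comment."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        cps, width = line.split()
        lo, _, hi = cps.partition("..")
        ranges.append((int(lo, 16), int(hi or lo, 16), int(width)))
    return ranges


class WidthTable:
    """User-supplied width overrides, falling back to another width source."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)  # assumed non-overlapping
        self.starts = [lo for lo, _, _ in self.ranges]

    def width(self, cp, fallback):
        # Find the last range starting at or before cp, then check its end.
        i = bisect.bisect_right(self.starts, cp) - 1
        if i >= 0:
            _, hi, w = self.ranges[i]
            if cp <= hi:
                return w
        return fallback(cp)


def load_table(path=DEFAULT_PATH):
    """Load the user's width file; a missing file simply means no overrides."""
    try:
        with open(path, encoding="utf-8") as f:
            return WidthTable(parse_widths(f.read()))
    except FileNotFoundError:
        return WidthTable([])
```

Falling back quietly when the file is absent keeps servers without the file working as they do today, just with libc's answers.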
I think the current maintainership has their own idea of what they want to do that's not quite this either.
Just to be clear: It's not my PR. The person who made that must've just seen widecharwidth and thought it was a potential solution.
>do we then have to push a new Mosh release every time there's a Unicode update?
In theory, yes. In practice the new codepoints take long enough to be available anywhere and there are few enough of them that being a bit out of date isn't a problem.
(Case in point: widecharwidth is apparently still on Unicode 12. I should update that.)
>user runs a script that prints every single Unicode character in their local terminal and learns its width
And they would have to re-run that regularly, whenever their terminal updates or they switch terminals.
>user is responsible for distributing this data file to every server they feel like connecting to and putting it in some well-known location in their homedir.
And they would have to do all that setup.
That's a lot of annoyance to put on your users when you can solve 99% of the problem by just incorporating a semi-up-to-date width table yourself.
The perfect is very much the enemy of the good here.