The reason you mentioned (requiring every system to use the same string encoding) matters. Interpreting a UCS-2 byte offset in Rust (which uses UTF-8 internally) isn't easy. Nor, symmetrically, is patching a JavaScript string based on a UTF-8 byte offset. It's especially hard if you want to do better than an O(n) linear scan of the entire document's contents.
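For illustration, here's a rough Rust sketch of that naive O(n) conversion (the function name `utf16_offset_to_utf8_byte` is mine, not from any library): it walks the string to turn a UTF-16 code-unit offset, the kind a JavaScript index gives you, into a UTF-8 byte offset.

```rust
/// Convert an offset in UTF-16 code units (what a JavaScript string index
/// counts) into a UTF-8 byte offset in the same text.
/// Naive O(n) scan; doing better requires maintaining an index structure.
fn utf16_offset_to_utf8_byte(s: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in s.char_indices() {
        if units == utf16_offset {
            return Some(byte_idx);
        }
        units += ch.len_utf16();
    }
    // An offset pointing at the very end of the string is also valid.
    if units == utf16_offset { Some(s.len()) } else { None }
}

fn main() {
    let s = "naïve 🦀 text";
    // 'ï' is 1 UTF-16 code unit but 2 UTF-8 bytes,
    // '🦀' is 2 UTF-16 code units but 4 UTF-8 bytes.
    assert_eq!(utf16_offset_to_utf8_byte(s, 0), Some(0));
    assert_eq!(utf16_offset_to_utf8_byte(s, 3), Some(4)); // just past 'ï'
    println!("ok");
}
```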
Using byte offsets also makes it possible to express a change which corrupts the encoding - like inserting in the middle of a multi-byte codepoint. That goes against the principle of "make invalid data unrepresentable". Your code is simpler if you don't have to guard against this sort of thing, and you don't have to worry about it at all if these invalid changes are impossible to represent in the patch format.
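A small Rust sketch of the failure mode (the string and offsets are just made-up examples): a byte-offset patch can point into the middle of a codepoint, and then something has to catch it.

```rust
fn main() {
    let mut s = String::from("héllo"); // 'é' occupies bytes 1..3 in UTF-8
    // Byte offset 2 falls inside the 'é' codepoint.
    assert!(!s.is_char_boundary(2));
    // A byte-offset patch format can express this insert. Rust refuses it
    // with a runtime panic; a less strict language might silently corrupt
    // the string instead.
    // s.insert_str(2, "x"); // panics: not on a char boundary
    // If the patch format indexes by codepoint (or a higher-level unit),
    // that bad patch simply can't be written down.
    s.insert_str(1, "x"); // fine: byte 1 is a char boundary
    assert_eq!(s, "hxéllo");
}
```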