The plague of emoji insertion in French docs (bibelo.info)
185 points by bibelo on Nov 14, 2022 | 203 comments



This space before the colon and other marks in French should actually be a narrow non-breaking space (U+202F) [0]. There's no key for it in the AZERTY layout.

This has been a problem since the typewriter age. People having to get on with their jobs coped with it by using a full, breaking em-space. Unless this gets replaced automatically by the word processor, you get horrid typography and misplaced line breaks all over the place.

The Académie Française should have dealt with this years ago, if their ass wasn't stuck in the 17th century.

[0] https://www.compart.com/en/unicode/U+202F


A modern solution is simply to have non-breaking space easily accessible in your keyboard layout for when you need it. In the BÉPO layout this is at SHIFT + space. Especially simple since all double punctuations (:?!; those that require an nbsp before them) are also accessed with SHIFT.


Better still, why not abandon the space as a character, and render the colon with extra space to the left when the locale is French?


The AZERTY layout could certainly map Shift+Space to a non-breaking space too
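
On X11 a stock xkb option already does something like this (a sketch, assuming the `nbsp` option group shipped with xkeyboard-config):

    setxkbmap fr -option nbsp:level2   # Shift+Space then produces NO-BREAK SPACE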

But honestly, rather than changing keyboards (which is hard), why doesn't Google just pick a shorthand that doesn't break typography rules, like `@` instead of `:`


The : allows users to discover it by coincidence when trying to type a smiley :)

If triggering it by accident wasn't intended, they could have used a regular keyboard shortcut like Alt+Shift+E or even instruct users to press Win+. to use the system-wide emoji menu.


A modern solution is to abandon old, no-longer-relevant typographic language rules, or to make typographic language rules context-specific.

But I agree that we need to make several alternative space characters easy to type:

  - non-breaking space (for this French rule)
  - wide space (for disambiguating sentence
    ending periods from non-sentence-ending
    periods)
  - zero-width non-breaking space (for
    preventing word-splitting?)
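
For reference, a rough mapping of those names to codepoints (my assumption of which characters are meant; exact choices vary):

    // A sketch in TypeScript, just to pin down the codepoints:
    const NBSP = "\u00A0";        // NO-BREAK SPACE (the French rule above)
    const NNBSP = "\u202F";       // NARROW NO-BREAK SPACE
    const EM_SPACE = "\u2003";    // EM SPACE, one candidate for a "wide space"
    const WORD_JOINER = "\u2060"; // zero-width, non-breaking; the recommended
                                  // replacement for U+FEFF ZERO WIDTH NO-BREAK SPACE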


What makes the French rules old, no-longer-relevant, invalid? Why should we change language to appease lazy software developers?


Not "rules", just this rule. It has to do with typographic considerations that apply to old typesetting technologies that are no longer in use.

> Why should we change language to appease lazy software developers?

It's been done.

For example, in Spanish it is no longer the rule that "ch" and "ll" sort as if they were distinct letters (this change was made in 2010) precisely because that was such a difficult rule to implement. And that was a 256-year-old rule, per Wikipedia:

  The digraphs "ch" and "ll" were
  considered single letters of the
  alphabet from 1754 to 2010 (and
  sorted separately from "c" and "l"
  from 1803 to 1994).
For another example, in Spanish capital letters used to be written without accents, but now they are expected to carry them. This was due to overstriking on typewriters: it worked for accenting lower-case letters but not upper-case ones (the apostrophe would collide with the glyphs for upper-case vowels). But the technology to resolve this has existed in the Spanish-speaking world for a long time now, so the rule was finally dropped. (Not accenting upper-case letters can lead to annoying ambiguities.)

It's not just precedent. It's that the original reason for some typographic (not even orthographic) rule is simply not relevant in 2022.

And it's not unreasonable for French people, non-French French speakers, or even people who don't use French at all to propose ditching hard-to-implement French rules. Now, this particular rule is decidedly not difficult to implement, but it is an annoying rule to apply as a user -- I should know, since I speak and write French (though I am not French).

Also, it doesn't matter what the French Academy says, or what the Spanish Royal Academy says, or what Webster's dictionary says, or whatever. Language evolves, even to their consternation. Moreover, developers don't have to care that much -- I18N/G11N is fun enough, and employers have to care for legal reasons, but rules like the Spanish ch/ll rule can be much too hard even for non-lazy developers, and the Royal Spanish Academy can and did have to change, and it was for the better.


In contrast, the Hungarian "cs", "dz", "dzs", "gy", "ny", "sz", "ty", and "zs" all remain distinct, as do their accented vowels. Polish is hybrid, considering digraphs as being 'composed' of single letters (i.e. 'sz' = 's+z'), rather than being distinct, but the accented characters are considered distinct from their unaccented cousins.

This problem has primarily come about because the Catholic church enforced the Latin alphabet on languages where a different alphabet might have been more appropriate. Spanish, although more closely related to Latin, still has a few sounds for which there's no good Latin character. There's no (particular) reason (as far as I'm aware) that 'ch' and 'll' became digraphs, while ñ acquired an accent.

Why should traditions change just because it's a bit more difficult to do things the 'old way'? why do we still bother with capital letters at the beginning of sentences? or speling things with two leters when one wil do, or riting silent leters wen you cant tell the difrens? & i dont think we need apostrofees n e more.


Indeed, there's no reason for 'ch' and 'll' to have been distinct digraphs in Spanish and ñ not to have been 'gn' as in other Romance languages. I'm not familiar with the whys of that.

I didn't know that Hungarian had a similar issue.

> Why should traditions change just because it's a bit more difficult to do things the 'old way'? why do we still bother with capital letters at the beginning of sentences? or speling things with two leters when one wil do, or riting silent leters wen you cant tell the difrens? & i dont think we need apostrofees n e more.

I distinguish typographic and orthographic rules. The non-breaking, thin space before punctuation rule is typographic and outdated (i.e., motivated by outdated typographic technology).

I do want some orthographic rules reformed too, but I'm more interested in the ones that are just hard. In particular I'm interested in collation reform because we do often have to collate multi-language text items but with one collation (this is especially true in databases), so having collations for Latin-script-using languages be similar is rather useful. This is also true given that I'm not going to be switching locales when I switch languages -- I speak, read, and write multiple languages, but I never ever change locales.


  >A modern solution is simply to have non-breaking space easily accessible in your keyboard layout 
An even better solution would be --for grown up people who have progressed beyond cave-painting and want to communicate using, you know, actual words-- to be able to disable emojis completely.

Fucking moronic shite that they are. I've seen people on Twatter and FB have entire conversations in bloody emojis. Talk about reverse evolution! Why don't we just go back to grunting and gesturing and have done with it?


I'm surprising myself by saying this, but emoji are pretty useful in certain contexts.

Certainly more useful than the requirement to put a space before a colon, non breaking or not.


I guess you only speak monotonically and avoid interpreting pitch, pauses, and physical cues in conversation as well, since those would be archaic, right?


smiley face. crying face. row of hearts in different colours. strange yellow thing that might be a banana or an arm flexing its bicep. sports trophy. winking face.

My god! --you're right. This is so much better and clearer than using those boring old fashioned words.


Emotions aren't clear either, but there are certain societal expectations to read and respond to them appropriately.


If you fail to put a space before a colon when writing in French, what happens next? Do people point and laugh? You get disciplined? Or would French speakers accept this as a better way to use the colon character?

It looks like this rule is based on old typographic considerations. Much like the Spanish Royal Academy's rule that capitalized letters carry no accents (unlike the opposite French rule that capitalized letters do carry accents!), which stems from typewriters not having accented letters, so one would type a vowel, backspace, then an apostrophe to make an accented vowel, but for capitals there's not enough space so you couldn't and wouldn't overstrike them.

Users and language academies should distinguish typographic from non-typographic language rules, and typographic rules should be context-specific (well technology-specific, since technology is the context).


I don't know, what would happen in English if you didn't capitalize the days of the week? Do people point and laugh? You get disciplined? Or would English speakers accept this as a better way to write the days of the week?

No human language on Earth is in a position to laugh at others for their idiosyncrasies.


I do not think that was their point (to laugh at other languages), but rather, that there may be contemporary situations which warrant rethinking the idioms of a language. As an aside, I do not think many people who speak english would even notice a lack of capitalization for the days of the week.


> As an aside, I do not think many people who speak english would even notice a lack of

English


I've had colleagues who refuse to use capitals in most cases. And yet their writing is completely comprehensible.


In English most people wouldn't care. You're going to get in more trouble in French schools for this than in U.S. schools (I don't know about UK schools, or elsewhere in the Anglosphere). My impression is that European culture is a lot more sensitive to these things than U.S. culture.


You'd certainly be corrected in the UK, and it would count as a regular spelling mistake.


Is Esperanto clear of exceptions and illogicalisms?


i never bother capitalising things. i was pulled up on it once, ever.

that person was wrong. :P


Or, word processors could understand that the pattern

"some-chars" + <whitespace> + ":"

must be treated as a single word in French.
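
A minimal sketch of that idea, assuming a simple auto-replace pass and ignoring the locale-detection problem raised in the reply below:

    // Glue French double punctuation to the preceding word with the
    // appropriate non-breaking space, so line breaking can't separate them.
    function frenchSpacing(text: string): string {
      return text
        .replace(/ +(:)/g, "\u00A0$1")      // NO-BREAK SPACE before ":"
        .replace(/ +([;?!])/g, "\u202F$1"); // NARROW NO-BREAK SPACE before ";", "?", "!"
    }

    frenchSpacing("Attention : vraiment ?"); // -> "Attention\u00A0: vraiment\u202F?"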

(I guess it's more complicated than I imagine it is, alright)


It's more complicated than you imagine it is. Basically you need to know both: whether the current locale is a French locale, and also whether the original text was written in a French locale.

The former is easy enough, but also very annoying to multilingual people since one might run in a Spanish locale but occasionally write in French. So that's not a solution.

The latter is... hard to do, because while Unicode has language tags that you can embed in documents, those are deprecated and they were never well supported, and so there's no way to mark-up text as being in one language or another, and a document-wide setting wouldn't be enough nor sufficiently generic and standard and portable.

The best solution here is to relax the French typographic rule (since it isn't needed anymore). But that would take time to filter through to French speakers (writers, and readers) so that they learn to not put that pesky space before punctuation, but also so that they don't complain when it's missing.

Or... you know, this business of emoji pickers could be something you could turn off. Nahhh, that would never fly! (/s)


> Unicode has language tags that you can embed in documents, those are deprecated and they were never well supported,

Huh? MDN doesn’t mention this … why are they deprecated?


Yes, [sadly] deprecated by the Unicode Consortium in Unicode 5.2, and by the IETF in RFC 6082. Nor have they been undeprecated since then (see https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf#G115...).

IMO we should have tried harder to make Unicode language tags useful and used. But it didn't happen, so they're a thing of the past. Of course, they're still there, and one could attempt to resurrect them, but most likely one would fail.

Choice quotes below:

https://www.rfc-editor.org/rfc/rfc6082

  > RFC 2482, "Language Tagging in Unicode Plain
  > Text" [RFC2482], describes a mechanism
  > for using special Unicode language tag
  > characters to identify languages when needed.
  > It is an idea whose time never quite came.
  > It has been superseded by whole-transaction
  > language identification such as the MIME
  > Content-language header [RFC3282] and more
  > general markup mechanisms such as those
  > provided by XML.  The Unicode Consortium
  > has deprecated the language tag character
  > facility and strongly recommends against
  > its use.  RFC 2482 has been moved to
  > Historic status to reduce the possibility
  > that Internet implementers would consider
  > that tagging system an appropriate mechanism
  > for identifying languages.
  >
  > A discussion of the status of the language tag
  > characters and their applicability appears
  > in Section 16.9 of The Unicode Standard
  > [Unicode52].
https://www.unicode.org/versions/Unicode5.2.0/ch16.pdf (section 9 of that chapter, 16)

  > 16.9 Deprecated Tag Characters
  >
  > Deprecated Tag Characters: U+E0000–U+E007F
  >
  > The characters in this block provide a
  > mechanism for language tagging in Unicode
  > plain text. These characters are deprecated,
  > and should not be used—particularly with any
  > protocols that provide alternate means of
  > language tagging. The Unicode Standard
  > recommends the use of higher-level protocols,
  > such as HTML or XML, which provide for
  > language tagging via markup. See Unicode
  > Technical Report #20, “Unicode in XML and
  > Other Markup Languages.” The requirement for
  > language information embedded in plain text
  > data is often overstated, and markup or other
  > rich text mechanisms constitute best current
  > practice. See Section 5.10, Language
  > Information in Plain Text for further
  > discussion.
(Reformatting is mine.)


Oh, “tag” made my brain parse as HTML tags. TIL


And thus it becomes impossible to use the emoji-inserting feature, plus you get other bugs (silently replacing whitespace with different but identical-looking whitespace will make for fun issues)


Well, a double colon :: is just as convenient, makes more sense, and is not a problem in either French or English. Even in English, you might want to stick an emoji to the previous word. Like Hell::devil:: (my nephew would like that though).


Double colon would make talking about perl even worse (:: is namespace separator) -- it's already bad enough that any namespace starting with D becomes an emoticon in pretty much all online editors.

This stuff should just stop. Operating Systems / Browser vendors should instead standardize on a hotkey to bring up an emoji selector that steals focus to filter via typing and inserts on enter key.


Code should be quoted upfront.

Finding some sequence that also isn’t a valid Perl substring is impossible, in any case.


Indeed. Replace scroll lock with emoji lock


MS Word actually replaces normal whitespaces with non breaking spaces in some cases. Kinda works.


LibreOffice, and I believe MS Word too, automatically replaces the space before a colon with a non-breaking one


This sounds like it's not actually true? If it was, the French code page 646 that we used until Unicode finally won would have included a narrow space, but it doesn't. "Regular" computer text in French has only ever used a normal space, even if handwriting and/or "true" typesetting using typesetting solutions like TeX or PageMaker etc. allowed for a narrow space.


FWIW, LibreOffice automatically inserts an actual Unicode NO-BREAK SPACE when I type ":" at the end of a word (if the language is set for French of course). If I insert an actual SPACE and then hit ":", it even replaces the SPACE with a NO-BREAK SPACE.

I'd be surprised MS Word doesn't do the same. No need for a "true" typesetting solution.


I think the point of the GP is that it inserts NO-BREAK SPACE instead of a NARROW NO-BREAK SPACE. If that's the case, it's a bug.


Codepages are from back in the era of little memory available and monospaced fonts.

And another "US English" centered thing; spacing does not really matter in English, but can have functional differences in other languages and scripts.

https://www.youtube.com/watch?v=2yWWFLI5kFU is a fun look at one of the problems with Unicode in general.


This? https://en.wikipedia.org/wiki/Code_page_1010 There’s no space for it nor for many other more useful characters (like â).


This is mostly correct, but I don't see how it contradicts my statement.

Did the 646 standard account for variable-width characters at all?


Not so much contradict as wondering about the claim that it should be a specific Unicode codepoint when Unicode wasn't around when we started "computering" text (and the Académie Française can't have possibly formally declared things in terms of Unicode =)

What are the actual official rules in this case (and are there links to those? Because that'd be fascinating information to read through)?


Best reference I could find is here: https://www.lalanguefrancaise.com/articles/espace-insecable

Actually before the ":" specifically there should be a regular non-breaking space, not a narrow one. Except in Switzerland. Other punctuation marks take the narrow non-breaking space.


Love how it's "recommandé" (recommended), "pour des raisons esthétiques" (for aesthetic reasons).

Which I guess means we're completely free to ignore it. Pour les mêmes raisons (for the same reasons) =P


> Especially simple since all double punctuations (:?!; those that require an nbsp before them) are also accessed with SHIFT.

I just tested with `setxkmap fr`. These are not shifted:

    :!;
Only this requires Shift:

    ?
Also French layouts use an inverted number row (although none of those are accessed through that row).


Correction: `setxkbmap fr`.


This is partially false, because there isn't one true AZERTY layout. There are various platform implementations, with MS Windows being the most common.

In fact, the French standards body AFNOR actually updated their AZERTY layout standard three years ago to include more characters, including the narrow non-breaking space. In traditional ISO-like fashion, one must pay to access this standard, but you can find an example here:

https://commons.wikimedia.org/wiki/File:KB_-_AZERTY_-_AFNOR....

It's mapped to AltGr + Maj + Space. Now you just need to find how to install/enable this layout on the platforms you care about.


CORRECTION: narrow non-breaking space goes before ";", "!" and "?". Before ":" you should use regular non-breaking space. That is, in France. In French-speaking Switzerland it's a narrow non-breaking space everywhere.

This is the best reference I could find: https://www.lalanguefrancaise.com/articles/espace-insecable


Personally I like auto suggestions like this, but I never want it to interrupt typing, so it should never grab input when you press typing-related keyboard keys, including the enter key, space and arrow keys which you need to move the cursor around.

I do not want to have to close boxes that appear at arbitrary times with the esc key to continue regular typing, since the box interrupts flow and there is a reaction time between seeing the box and closing it with esc.

It would be ok to me if you have to use another less regularly used key, e.g. tab or ctrl+space, first to get into the suggestion box and then use arrows to select one.

This applies also (especially in fact) in code editors, chrome devtools, etc...

I wish others felt the same so this feature would be implemented in a less interrupting way by default. I've never seen it implemented in a non-interrupting way in anything ever.


Yes, at some point I reconfigured autocomplete in to only complete when you press <tab>. I almost never press <tab> otherwise when I’m typing something.


I've remapped emojis to `::` and it's been great. Barely more work than `:`, and don't have to deal with a clunky popup when I don't intend it.


I like the implementation in a bunch of IM clients, at least Teams and iMessage: emojis are treated as autocorrect, so if you type the full text version of the emoji (not just the first character!) it gets replaced with the emoji. In the rare cases it's wrong, hitting escape after the autocorrect fixes it.

I don't see why the Google Docs implementation wouldn't work that way. If it really must pop up a box, allow the user to continue typing in the document and gradually reduce the list of candidate emojis. As soon as a character is typed that either completes the emoji or rules all of them out, the box goes away.
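
A sketch of a non-focus-stealing box along those lines (the `editor`/`panel` objects are hypothetical, and Tab-to-accept is borrowed from suggestions further up the thread, not Google Docs' actual behaviour):

    // Hypothetical editor and suggestion panel.
    declare const editor: HTMLElement;
    declare const panel: {
      isOpen: boolean;
      isEmpty: boolean;
      filter(ch: string): void;   // narrow the candidate list
      acceptSelected(): void;     // insert the highlighted emoji
      close(): void;
    };

    editor.addEventListener("keydown", (e: KeyboardEvent) => {
      if (!panel.isOpen) return;                 // normal typing, nothing to do
      if (e.key === "Tab") {
        e.preventDefault();                      // only an explicit, rarely-typed key accepts
        panel.acceptSelected();
      } else if (e.key === "Enter" || e.key === " " || e.key === "Escape") {
        panel.close();                           // dismiss, but let the key reach the document
      } else if (e.key.length === 1) {
        panel.filter(e.key);                     // keep typing; close once nothing matches
        if (panel.isEmpty) panel.close();
      }
    });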


The issue is that a very common key to hit after a colon is Enter - which selects the emoji in question. If you simply keep typing other letters, you don't have a problem.


Yes; since emojis don't contain <enter>, in my model it should dismiss the dialog and not do a replacement. The error is in the emoji panel stealing focus so <enter> is seen as a selection in the panel. If the panel did not have focus and keystrokes still went to the main app, this would work fine.


Interestingly, when using Kate as a Markdown editor I'm interrupted by the word-based completion panel popping up, grabbing input, and incorrectly completing words when I press Enter. I wish it either didn't auto-popup or didn't grab input in Markdown mode.


In Kate you can turn off auto complete in the Settings, Editing, Auto Completion tab, or set a minimal word length to complete to something long so it's less likely to appear for common letter combinations but still possible for longer words

Of course, as always, there's no setting to enable auto-completion without it using the standard typing keys :(


Typographic nerd: technically, you should use a non-breaking space (espace insécable) before colons in French. Doing that does not show the popup. Wait, what do you mean your keyboard layout does not have it?

Keyboard layout nerd: you should use a keymap that has it. Like AFNOR's latest AZERTY or BÉPO.

Historical nerd: traditionally wordprocessors have auto-inserted non-breaking spaces before colons, why should I care now ?


A narrow non-breaking space, please.


Of course, sorry.


> Historical nerd: traditionally wordprocessors have auto-inserted non-breaking spaces before colons, why should I care now ?

Because I don't want to switch locales in order to write short bits of text in another language. I don't even want to have to switch keyboard layouts -- just use compose sequences for diacritical marks and so on. And I don't use a French locale, but I do write French text sometimes.

I would imagine that in Europe this is a big deal. So many Europeans are multi-lingual... But you don't expect a Spaniard to switch to a French locale to write in French and vice-versa -- that's too disruptive.


Regular spaces are transformed automatically into non-breaking ones when writing French in Word, LaTeX, and other text processors. Why would you want to learn the many ways of writing a spacing character, when your computer can do it for you?


Thanks for repeating the last sentence of my post in your own words :-)


Or you should use a line-breaking algorithm that does not suggest a break at that location. Line breaking is supposed to be tailored to language.


This is all a big pain in multi-lingual documents. What, am I supposed to switch locales every time I switch languages? No no, that's absolutely not OK.


I'd agree, but this is even more subtle, as a sibling comment to yours said: the space is actually smaller. So text rendering should take this rule into account as well. But then you're assuming that you're doing single-language text rendering. Putting the character at writing time still seems like a simpler tradeoff (maybe with editing help like auto-replace or correction).


This annoys the hell out of me and I only speak English.

Microsoft is probably the biggest development-focused company in the world, but their own work communications app, Teams, doesn't allow you to paste small pieces of code in a chat because it replaces punctuation with emoji.

Regardless, if I want an emoji, I will type the emoji. If I typed ':)' I want ':)' not some stupid yellow face.

Meta's Facebook Messenger doesn't allow those forbidden strings either. Meta's WhatsApp mobile does, but WhatsApp web does not.

My phone has an emoji picker if I want an emoji. I don't want what I type or paste replaced with an emoji. Anywhere. Ever. This practice should stop.


> their own work communications app, Teams, doesn't allow you to paste small pieces of code in a chat because it replaces punctuation with emoji.

Don't you just put it in backticks `like this`? Even for really short things, it seems clearer and it avoids the machine "helpfully" changing things.

(Your general point is correct, of course, just suggesting a solution to this very particular nuisance)


In Skype you have to write {code}...{code}. Or prefix your entire message with @@ (but it used to be you could prefix !! -- why oh why did they change it?!).

I so so wish we could standardize on GitHub markdown for all these things!


My way of getting around the emoji conversions from ':)' to a smiley face is typing it in reverse: '(:' or '):'.


But I'm not left-handed!


I don’t remember which software, but I’ve seen those being converted as well.


Plus then you can stick out your tongue like this: c(: c):


> Regardless, if I want an emoji, I will type the emoji. If I typed ':)' I want ':)' not some stupid yellow face.

But are you sure that you didn’t really want `:slightly_smiling_face:`? (Auto-complete to passive-aggressiveness)


IIRC, Teams supports Markdown-style code, either `single line` or

```
multi
line
```


Truly a joy on German keyboards, where ` is a dead key that makes the system wait for an e or an a to receive that accent, and certain OSes don't come with a nodeadkeys alternative keyboard map. I often find myself with the ` in the copy/paste buffer when writing code-heavy markdown because pasting is so much easier...


Can't you just type [deadkey]+Space to get the ‘undead’ version? That works on Linux and MacOS and ChromeOS.


Sure, but it's super annoying when you write something in markdown knowing that being lazy with the proportional/monospace mix will noticeably harm understandability. Particularly when you suspect that readers will be close to failing to understand, doubly so when you suspect that this reader might be a future self. Is it worth the time writing? Only if I get the monospace right. Dead keys can make that writing situation a weirdly subtle kind of hell.

So annoying in fact that I've just spent the last two hours with Microsoft Keyboard Layout Creator (yeah, I'm still a Windows person) shifting the dead key version to hide behind alt-gr.

Then I used it to mitigate some oddities on the dynabook keyboard (pipe and smaller/greater than weirdly moved to where the windows connect menu button usually resides) and continued duplicating some of the usual alt-gr suspects at locations inspired by their US KB position. Right now I'm an inappropriately happy person, let's see how I'll think about it in the future (getting too used to something non-standard can be a terrible cost).


Probably, but having to type six keys just to open a code block is probably painful.


Six keys, with a distracting lack of visual feedback on half of them...


Totally agree. In some software, a strategically placed backslash helps.


use the code blocks?

even messenger supports the backticks and triple backticks now


There's more of this implicit US American bias in modern computing than you would think after the whole ASCII/Unicode mess - I thought they finally learnt something about computers actually being used by other people back then, but no.

One of my pet peeves: Badly implemented "press CTRL + / to search". Entering an actual slash, which requires e.g. SHIFT+7 on German keyboards, won't open the search box, but pressing the key where American layouts have a slash will.


As a multilingual user constantly switching between 3 languages, that's why I fully committed to the US_intl keyboard layout a while ago and ditched the abomination that is QWERTZ. Took a while to get used to, but overall it has been a very noticeable improvement. One layout to cover all languages, no more constant switching.

Edit for clarity: I type "a to get ä and 'a to get à. If I want to type actual quotes followed by a vowel I need to “confirm” the quotes with space before typing the vowel. That’s all there is to it, really.


It may work for you, but there are users of languages that simply cannot be covered by QWERTY, and in these cases they would like developers to simply fix their goddamn code, and bind to keys, not characters.


You are correct. It works for me.


For me too, maybe we should containerize users


They'll containerize themselves when they see basic shortcuts not working, no action necessary.


It works for you now.


For various reasons, I've been switching both in hardware and software between two different QWERTY (US and UK) and two non-English keyboard layouts for years (usually, I'll have multiple keyboard layouts configured, but I'll generally go with the one that is actually printed on the keys unless I really need a different one).

The funny thing is that for all the getting used to different layouts, the one thing that really trips me up all the time is the different hand positioning for Control+C on Mac vs PC. Luckily, you can change the modifier key bindings in MacOS independently from the keyboard layout. But yeah, shortcuts that assume everyone has a / or whatever key are a pain too.


It is fine as long as you don't have to use other people's computers.

I used a QWERTY keyboard in France for a while, but I often had to use someone else's computer, and each time I had to switch back and forth between AZERTY and QWERTY, qnd it is qnnoying.

You can learn to touch type but keep in mind that not all layouts are the same, you can bring your own keyboard, change the mapping, etc... There are solutions, but it wasn't worth it for me, so I got back to that terrible AZERTY keyboard because that's what everyone has in France.


BYOK ;)


I envy you for all your languages using Latin-based alphabets. I guess I'm stuck with two keyboard layouts for life because I need to be able to type in both Latin and Cyrillic.


Yes, this. Though I prefer the X11 compose sequence business where you type <Compose>'a to get á and so on. X11 compose sequences are a lot more intuitive than the Windows U.S. international keyboard because there's no ambiguities, so, for example, a cedille is just <Compose>,c whereas in the Windows U.S. international keyboard layout it's 'c -- X11 for the win here, as the latter is super a) counter-intuitive and b) conflicts with ć (U+0107).

Also, if you program or use a shell, then the Windows U.S. international keyboard layout is just unspeakably unbearable.


> Though I prefer the X11 compose sequence business where you type <Compose>'a to get á and so on. X11 compose sequences are a lot more intuitive than the Windows U.S. international keyboard because there's no ambiguities, so, for example, a cedille is just <Compose>,c whereas in the Windows U.S. international keyboard layout it's 'c

That sounds pretty good indeed. Any idea how could I go about setting this up under macOS?

> Also, if you program or use a shell, then the Windows U.S. international keyboard layout is just unspeakably unbearable.

I have indeed a couple of little, weird problems when working in the terminal. That's inconvenient but still beats switching between 3 layouts 50 times a day.


> Any idea how could I go about setting this up under macOS?

Sorry, no idea, but I'd expect it to be possible on OS X. And I really want this for Windows, too, as I have to use it.

> I have indeed a couple of little, weird problems when working in the terminal. That's inconvenient but still beats switching between 3 layouts 50 times a day.

Oof, I can't get used to having to type '' to make one ', "" to make one ", `` to make one `, in the shell, in $EDITOR, etc. It's too painful. So I switch layouts as needed, and I hate it. Compose keys would be so much better!


That’s quite memorable. On the Mac I have to press option-u and then a, u, or o to get an ä, ü, or ö. Optionally holding the shift key to get an uppercase vowel with the diacritic.

option-` is for grave (è), option-i is for circumflex (ê), option-n is for eñe (ñ), option-c is for cédille (ç)

When you called QWERTZ an abomination I simply had to upvote your comment.


I used to be pretty good at those but now I just use the feature that works (almost) everywhere in Mac OS and iPhone - press and hold the character to get a little pop-up of options: møøśę

It's slow if you have to type in another language, but in English it's fast "enough" for words like résumé.


I am on macOS as well and the diacritics I mentioned ("a → ä, `a → à) work out of the box with the "US International PC" layout. If I press ⌥a I get å with my layout.


I for some reason have a very low tolerance for dead keys. Two keypresses for one common letter becomes too much.

Thankfully I only need to write in two languages. With almost the same script.


Yup, did the same. And the layout I happen to work with also has a third layer (menu button) that includes äöüß--that was a glorious discovery.


How do you enable the third layer? I don't have a menu button on my Mac.


Do you pick the "a for an umlaut? It seems like :a would be typed less commonly than "a, so less need to confirm the punctuation.


I didn't pick it, that's how the layout works out of the box. I agree that :a would be probably more convenient.


How do you type ä ö ü ß ?


There's loads of ways. Dead keys is one, the compose key is another suitable one for multilingual folk. I can write ä ö ü ß pretty much automatically with the compose key on a US layout, as well as any é ë ê ü I might need in Dutch.

In the Netherlands letters with diacritics don't even have their own key (Dutch computers tend to use US plus €).


Personally, I don't. I live in Muenchen, not München. When strictly needed, CTRL+C/CTRL+V from a Google search does the trick.

When writing email/docs, autocorrector for the win.

Same thing in Italian: `e instead of è. Although, the German replacement is cleaner, I'd say.


> I live in Muenchen, not München.

That's… quite certainly not an opinion shared by all of your fellow inhabitants.


You are right, but this is the internet, and the official website for the city is https://www.muenchen.de/, so ¯\_(ツ)_/¯


> I live in Muenchen, not München

As someone from Espanya, I welcome you to the club.


Interestingly that's the Catalan spelling.


Not that you guys have it any better (with the ç and two, rather than one, sets of accent marks)


Oh we have more than two sets of accents, we have acute, grave, circumflex and dieresis. Spanish also has ¨ I think, like in güero or piragüismo.

We don't use the accents in the same way as Spanish, Portuguese, Catalan and Italian, it doesn't mark stress.


You might want to consider using a keyboard layout that supports diacritics. I use UK Extended Winkeys for example, where I can do AltGr+[ followed by a letter to get it with a diaeresis, allowing me to type äöü. It supports almost every European language character.


I like that. Makes me wish I could just input all my text in Emacs which would make mapping such things to the “proper” sequence of strings very simple.


Not OP, but I have a Mac and use the "ABC - Extended" keyboard, which has a full set of diacritics. You type them by using the option (⌥) key and a letter to set the diacritics and then type the letter.

    ⌥u  e = ë

    ⌥w  e = ė
For example, the key sequence required to type Kayodé is:

    ⇧k  a  y  o  d  ⌥e  e
etc.


I use US_Intl keyboard for Swedish and Portuguese and can type all of this without the need of special char keys: ä ö å - á à â ã. Just a mix of Opt+(u/a/e/i/n/`) and the next character I want with the diacritic.

Edit: and ß is just a Opt+s. It becomes quite logical after you get the hang of it.


Personally I use a compose-key on Linux and Wincompose on Windows, with SyncThing to keep my .XCompose file synchronized between machines when I add a custom sequence.

I mapped my Caps-Lock key as Compose, so I hit compose-a-" to get ä, and so on through the set. I've made customs for Ω and µ since I use those a lot and couldn't remember their default sequences. It's handy having ® and ™ on tap as well, and I get typographical niceties like — and … vs - and ...
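
For anyone curious, custom entries in ~/.XCompose look roughly like this (these particular sequences are made up; pick whatever you can remember):

    include "%L"                      # keep the locale's default compose sequences
    <Multi_key> <o> <m> : "Ω" U03A9   # Compose, o, m -> capital omega
    <Multi_key> <m> <u> : "µ" U00B5   # Compose, m, u -> micro sign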


Not GP but I use a similar setup. I have an "extended" US keyboard layout where AltGr+(location of character on native keyboard layout) types the character. For example, AltGr+[ is "ü". I find this more convenient than the compose-like approaches suggested elsewhere in this thread.

I use it because programming language syntax is mostly ergonomic only on US-like layouts, e.g. typing "{" causes a lot of strain if you have to do it on every other line.


On Linux I use the "altgr-intl" layout variation.

After pressing the right Alt key, AltGr, all of those characters are possible to type. Here's the layout:

https://upload.wikimedia.org/wikipedia/commons/thumb/2/22/KB...


⌥u creates the two dots, then you press the key for the vowel to put underneath.

⌥s creates the ß.

(Both obviously only applicable for Macs)


Applicable to linux if you set the mac qwerty keyboard layout, which is surprisingly convenient.


That would be… inconvenient. That’s not how I use it. I type "a to get ä and 'a to get à. If I want to type actual quotes followed by a vowel I need to “confirm” the quotes with space before typing the vowel.


French speaker on US intl here: ä ö ü ß are easy to make on such a setup, but so are accented letters like é à ô or the infamous ç. It is even easier to type them as capitals than it typically is on AZERTY (the French layout) or QWERTZ (the German one)


I swapped to us_intl as well full time a few years back (Swedish)

right alt + q = ä

right alt + w = å

right alt + s = ß

right alt + y = ü

It took a day or two to get the hang of it but it's such a relief not being constrained by everything being developed for US layouts


Using the xkb map us(intl-unicode), it's a then AltGr + Shift + " (I'm not totally sure that's correct), and ß is AltGr + s.


AltGr+q; AltGr+p; AltGr+y; AltGr+s, and bonus: § (AltGr+Shift+s)

Once committed to muscle memory it's very easy


On a mac keyboard you can just long-press?


That's fine if you live elsewhere and type your Danish colleague Lærke's name once in a while, but if you speak a language using non-ASCII characters it feels rather slow to long-press every few characters.

> Danmark, officielt Kongeriget Danmark, er et land i Skandinavien og en suveræn stat. Det er den sydligste af de skandinaviske nationer, sydvest for Sverige og syd for Norge, og det grænser op til Tyskland mod syd. Grønland og Færøerne indgår også i den danske stat som rigsdele med selvstyre indenfor Danmarks Rige (Danmarks forhold til rigsdelene kaldes Rigsfællesskabet).

Eight Æ+Ø+Å in 382 characters.


This kind of thing usually only works in native apps that use NSTextView for text input, which is unfortunately not all of them.


I've never had that work reliably for me, across three completely different Mac installs. :/


It works in "most places" but not in others (like (some) terminals, for example).

    defaults write -g ApplePressAndHoldEnabled -bool true
Might help if it got turned off.


Same here


Stuff like this is super, super common. Even where you have things like CTRL + X/C/V for cut/copy/paste it's clear that the letters were picked to be near each other - but then often the shortcuts will remain the same even on keyboards where those keys are nowhere near each other.

This can be one of the reasons that switching to Dvorak can be super hard, all the shortcuts become quite weird unless you remap them each by hand.


9ev: pressing the key where American layouts have a slash will [open the search box]

bombcar: often the shortcuts will remain the same even on keyboards where those keys are nowhere near each other

You're objecting to opposite things: you don't like that the shortcuts stayed with the same symbols instead of staying with the physical key, and 9ev the other way around.

(And also 9ev is objecting to misleading documentation)


Yeah - and balancing the two is something often entirely overlooked and people stuck with the problem have to learn the options and try to determine “is this key the character or the key (in a US layout)”.

Issues like this are why people often just give up and learn US English enough to use the computer, then at least it's tested and consistent.


Yeah, if I could go back in time I would have told my younger self that it's okay to use something other than qwerty, and it's okay to use vim, but trying to do both was a hilariously terrible idea.


Some go as far as translating shortcuts, which I dislike even more. In Spanish Windows, copy is CTRL + C (copiar), but paste is... CTRL + P (pegar)


Reminds me of a US tap in Mexico I saw once - the installer obviously decided C must be for caliente which meant the H should be hielo.


That's great. To be fair, caliente and helado doesn't seem a very unreasonable interpretation if you're installing in a place where people don't speak English


> but paste is... CTRL + P (pegar)

... Oh, for fuck's sake! What English word do they think "V" is short for in the first place, there?


CTRL + Z for undo in QWERTZ keyboards is just absolutely impractical.


You mean it's not X for xterminate, C for copy and V for velcro?


I live in France but use QWERTY instead of AZERTY, and I once had an opposite problem: on a French website, there was an <input> for phone number, validating that it stays [0-9] on each keystroke.

The validation was done by checking for the AZERTY-specific key combos that result in 0-9; on AZERTY to get a <number> you need to press SHIFT+<number>.

So if you had a QWERTY keyboard, you could not enter your phone number, the input stayed blank. (If you tried with SHIFT, you could get !@#$%^&*() into the field however.)
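
For what it's worth, a layout-independent sketch of how such a field could validate (hypothetical element id; the point is to constrain the resulting value rather than the key combos):

    const phone = document.querySelector<HTMLInputElement>("#phone");
    if (phone) {
      phone.addEventListener("input", () => {
        // Strip anything that isn't a digit, space or "+", no matter which
        // physical keys (or paste operation) produced the characters.
        phone.value = phone.value.replace(/[^0-9 +]/g, "");
      });
    }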


They didn’t allow NumPad?


My latest peeve here is the websites that try to filter out non-numeric key presses in form fields. You don’t know where my number keys are! Sometimes the only way to enter a number is to copy and paste it from a text editor.


Ooh, good point. I should test if that works with a French layout in our app :)


I use Dvorak-dvp, and I hate those form fields.


Even at a multinational like Apple. I was fighting with the provisioning dashboard a few years ago because it would not load. The issue was my last name, which has an umlaut in it. This wasn't 1990, it was 2012-ish, and I still see companies unable to deal with umlauts.

I believe either FedEx or UPS still can't deal with it.

On top of that, Swiss mail is quite strict about how a letter is addressed and what is written on the mailbox. Luckily the mail carriers know to ignore garbled umlauts.


When I was living in Japan I always had to "romanize" my Japanese address if I got something shipped from the USA because USPS only allows ASCII. Even if this did create problems locally - post always arrives quicker (in my experience) if the address is in proper Japanese. When I got mail from my home country I got it forwarded via my company and they always just stuck on a printed label with my address, in Kanji. No problems there. Worked fine for years.

However, my home country postal system has since changed to an online system for sending post out of the country: You go to a website and enter the destination address, then you pay and it'll print a code which you just give to the actual post office when you bring the letter/package whatever. What's infuriating is that they don't allow Japanese addresses - I can enter it, but it just creates an error message. So I again have to come up with a crude romanized address and cross fingers that it'll arrive (which it usually does - but up to a few weeks delayed).

In short - things are not getting better over the years.


The problem is that they were dealing with it successfully before, by either treating the umlaut as a "u" or a "ue". It's not like umlauts are this new technology that English speakers have never been faced with in their business ledgers. But what you see in older business ledgers, is a variety of different spellings, and a human brain that can map them to each other, with a little bit of training. And when software came along, most systems just used the "u" approach, and that kinda worked as software systems grew to massive scale.

Now the hard part is changing how to deal with it, because now you need to patch all these huge production systems that were working before, and it's a breaking change.

So if you have a big stack: UI --> business layer --> DB

You can't just change the UI, as that will cause breakage in the pieces below. You start with the DB change, then the business layer, and you do the UI last.

Moreover this is not a simple change. Getting a DB to handle unicode is a known thing, but how do you train your customer support personnel to do data entry with unicode characters? Do you expect the customer to remember the code point? There are many characters that look the same which have different code points. So what you need to do is come up with a list of officially supported extended character code points and do entry with those, but still you will have others -- users of Asian scripts, for example, whose characters will be missing.

And you will get customers still doing the old "u" approach together with the official umlaut code point, sometimes for the same name and the same package. People will call up, trying to find their package, and randomly used one convention, since even though you can train your staff (which isn't cheap) you can't train your customers. So you need some system of identifying different orthography for the same underlying name, some with "u" and some with "ü" and some with "ue". And then that's going to require changes in a lot of other systems, for example reporting systems, etc. We see this problem today, taken to the extreme, in the various different spellings of "Gadafi".

But really it's worse, as a distributed system has

UI <--> business layer <--> regional DB <--> business layer <--> regional UI

And now it's much harder to make these changes without taking systems off line, for a distributed global network.

What they probably did, as most companies do, is look at all these problems and costs, compare them to relatively small benefits, and kick the can down the road, waiting for the next major system upgrade, and sometimes these next major upgrades can take 20 years to happen. Or even longer.

In other words, the hard part is the incremental breaking change to the system, not the dealing with the umlaut. If this was a brand new code base that was just being written, they could make it handle umlauts with much less cost.


Unless it's changed recently, IntelliJ IDEA has similar annoyances[0]. Again, using the equivalently placed key works instead. Are they using scan-codes for some reason? Seems extra bizarre given that JetBrains isn't even a US based company!

[0] https://intellij-support.jetbrains.com/hc/en-us/community/po...


Every small indie game ever: action buttons are x and z.

Guess what, y and z are swapped on a German keyboard.


That's an illustration that the problem is not so easy.

What your parent comment complains about is that the keyboard shortcut is based on the location of the key, and not the letter.

What you complain about is that the keyboard shortcut is based on the letter, and not the location.

Obviously, these complaints don't contradict each other, both make sense in different circumstances. But figuring that out requires awareness of different keyboard layouts, and of the difference between KeyboardEvent.code (location) and KeyboardEvent.key (letter).
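
A tiny sketch of that distinction (handler body is illustrative):

    document.addEventListener("keydown", (e: KeyboardEvent) => {
      // e.key is what the active layout produces: "/" matches Shift+7 on a
      // German layout and the bare slash key on a US one.
      if (e.ctrlKey && e.key === "/") console.log("open search");
      // e.code is the physical position: "Slash" is the key right of ".",
      // which types "-" on a German layout, so matching on it silently breaks
      // the documented "Ctrl + /" shortcut there.
      if (e.ctrlKey && e.code === "Slash") console.log("same physical key, different character");
    });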


Give the actions appropriate names and allow users to remap them. Have a few common layouts you can select from. Allow users to select a language and locale (separately) when they first launch. Refer to the actions not the keys in the documentation, and when you do specify key (e.g. in parenthesis) have it dynamically reflect the current setup.

Seems rather straightforward.


And W and Z, and Q and A on French keyboard. WASD is totally unusable, and unfortunately quite common in web apps for instance.


I would say it's more a problem of the underlying system (game engine or OS) that should manage the compatibility between keyboard layouts


I really hate that so many things break after you switch keyboard layouts.

Many shortcuts are positional, like "hjkl" or "wasd" for moving, or control/meta+numbers, but there's no good way to denote them without assuming you are using a qwerty keyboard. Programming against them is possible, but many times there are limitations, ignorance about the different layouts, or straight neglect.


That might be a problem with a specific website, but you can easily implement that correctly using the event key property[1].

[1]: https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEve...


Thank you for the suggestion, but that's indeed a problem with lots of websites that get this utterly wrong. Browsing the web using a non-US keyboard is fun.


A very common bug is people confusing key codes (the raw codes for the physical keys on the keyboard) and characters (what they type). This "works" when you use the QWERTY layout, but completely breaks when it's something else, like ЙЦУКЕН.


It's amusing that for the huge amounts Google pay developers it doesn't necessarily buy common sense or empathy for the user.

Google's currently demanding I settle an invoice for an account I closed... and the only way to do this, or even get support, is if you have a Google account. And there's zero way of replying to the email from their Collections department.

On the one hand they have a massive dominant position in multiple markets. On the other hand, if you're even slightly off the happy path you don't exist.


There was a comment the other day in the discussion about incorporation in Germany. Someone complained that in Germany, they conduct their business in German: the justification was, it's our country, our language, get used to it.

Maybe the blunt response to complaints about the US American bias is, it's our software, our (ASCII/Unicode) language, get used to it.


If I were trying to incorporate in the USA and the registration form couldn't process diacritics, sure (even though that would neglect a large part of your own population; see the ASCII debacle). But we're talking about internet services marketed globally. I don't think it's too much to ask to have an app at least work as designed. We're not even talking about translating stuff!


If people of other countries didn’t want American bias in their systems they should’ve made their own and made them better than American ones are.


If American businesses wanted the money of people from other countries, they should've made their systems internationally accessible.

I mean, seriously, what kind of antiquated statement is that? Do you actually want to keep geographical borders to persist to the internet, or work towards lower barriers for everyone..?


> If American businesses wanted the money of people from other countries, they should've made their systems internationally accessible.

They haven’t and they got the money anyway.


Assuming what they got was the maximum obtainable amount, yes :)


I think the title should say "in French *Google docs." As is, it seems like French code documentation/papers are littered with emojis?


But it also occurs on Atlassian suite, Gitlab, Github and probably more tools…


It's the same with Gitlab issues and comments and it's a small but very real pain.


Same for Github as well. Super annoying when you're in the middle of an issue


UX testing should be done in a stressful and chaotic environment. Good UX becomes invisible, bad UX becomes abundantly obvious.


A similar insensitivity is the observation that none of the major Android keyboards allows true disabling of auto-blank, which is super annoying in a language like German where custom composite words occur a lot. Modern keyboards offer a wide selection of clever tricks to keep the auto-blank from messing up punctuation, but allowing one word to be swiped directly after the one before? I'm sorry, Dave, I'm afraid I can't let you do that. This post is written on Swype, final release back in 2014.


I have a partial solution in TamperMonkey here: https://gist.github.com/derac/887efd8891caa026322b7624954893...

I wrote it over my lunch break quickly, it only closes the menu the first time. I was trying to add the functionality to close it every time you enter the key combo, but it wasn't working. If someone wants to improve this feel free. MIT licensed.


doesn't really matter since they are fixing it, but I made this work


The same happens with Jira. It's a nightmare since if you press escape to quit the emoji mode, you abort your input and lose focus. There should be a way to configure that shortcut to whatever suits you. Anyway, I don't get why web apps bother implementing their own emoji input, the operating system does it already (the Windows key + ; shortcut for example). And to reply to those who simply ask people to change their habit: it's rude. Imagine it the other way around: all the English-typing people having to insert a space before a : for whatever reason, would that make sense to you?


So this was reported on October 31st, then escalated on November 3rd, and the "feature" is still in there. Seeing how severely this impacts users from France, this is slightly disappointing.

Here is a discussion in the product support forum:

https://support.google.com/docs/thread/186496870/how-to-deac...


We have some templates for gitlab issues at work, and it always bothered me that all the colons had a space before them. Both because it triggers the same annoying emoji feature, and also because it looks weird to me. Now it all makes sense: although the templates are in English, they have been authored by a French person, so the rule just bled into English writing.


It was also the case on gitlab/GitHub at one point


Seems like the person who came up with the idea got fired and then hired by Google ;)


In Zendesk, you get a Halloween pumpkin emoji instead of a thumbs-up. I always wondered if it was an annoying easter egg or a bug.


Auto insertions or anything that does things behind the users' back should go die in a fire.


Problem is that this is totally normal for CJK users, where you really want to type in a syllabic or phonetic or similar system and get drop-down selection menus for Chinese/Kanji characters to replace your writing with. Doing that for emoji is basically the same thing for CJK users. For Latin script users, however, this is very disruptive.


I dabbled in Japanese; you are not comparing the same thing.


It's not that different.


It is different, because the CJK conversion is required to produce the wanted characters and is usually implemented by the OS / browser, not at the final application level, where it would interfere with that process.


The UX is still that you get a drop-down picker. What component of the system implements that seems of little importance to users unless that becomes obvious (each app doing it differently, though obviously that happens).

I expect that the emoji picker will become part of the OS eventually. Heck, Windows has one that you have to call up explicitly (with <Win>-.), iOS has an emoji "keyboard" (input mode), etc, so we're headed in that direction.


The same thing happens in Pascal on a lot of forums, in:

var x:Pointer;

The :P changes to a tongue-out emoji


8)


i have caps lock and escape swapped on my computer. soooo nice for vim and just in general.


I didn't realize French was written with spaces before colons. It might be time to lose that space, similar to how we don't double space after a period.

I'm not saying G.Docs or any editor should dictate how French is written, only being pragmatic if I have to choose between (no space): or unwanted emojis ending up in docs.


We used to double-space after a period because that looked better on monospaced typewriters. But in French it's simply a part of punctuation, and getting it wrong means you wrote it wrong. It's as wrong as if I'm writing English and put a space before the ':' or '!', or put a comma after a quotation mark. French people shouldn't have to change their language to type in Google Docs. It should be the other way around.


In French it's a typographic rule that became an orthographic rule. It doesn't have to be that way.


>It might be time to lose that space, similar to how we don't double space after period.

Well, that's a bit like saying "it might be time to write your whenever you mean you're" just because Google Docs' autocorrect feature kept messing up the two when you wrote English.

How would that look in an email to a client?


>if I have to choose between (no space): or unwanted emojis ending up in docs

We absolutely do not have to choose between those two things. We need to stop implementing cute automatic bullshit that makes assumptions about the way people interact with UIs.


I'd upvote for the first part of the second sentence, but downvote for the second part of the second sentence.

Typographically speaking English language text should have a wide space following a sentence-ending period [that isn't also a paragraph-ending period]. However, because wide spaces are not easy to type on normal keyboard layouts, the simplest thing to do is to type two spaces after sentence-ending periods and let the word processor change those to wide spaces when using a proportional font. When using a fixed-width font, however, it should always be two spaces after a sentence-ending period for two reasons: to disambiguate non-sentence-ending periods, and to make it easier to read. Coders write a lot of text in fixed-width fonts, so we should write two spaces after sentence-ending periods in, e.g., code comment blocks.
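
A sketch of that substitution (assuming EM SPACE U+2003 is the "wide space" meant, which is my guess):

    // Turn the two typed spaces after a sentence-ending period into a wide
    // space for proportional-font rendering; leave single spaces alone.
    function widenSentenceSpaces(text: string): string {
      return text.replace(/\. {2,}/g, ".\u2003");
    }

    widenSentenceSpaces("One sentence.  Another one."); // "One sentence.\u2003Another one."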


> similar to how we don't double space after period

What do you mean? Every sentence in this comment has two spaces after the punctuation. That one and this one. Just because browsers and html condense it all into one space doesn't make it right.


The French aren’t gonna change their language over an emoji picker. :slightly_smiling_face:


They should abandon out of date typographic rules though. As a French speaker, I want to.


If we had the opportunity to simplify some rules in French, it's not the one I'd do first.

How about getting rid somehow of the letter ù which is used for only one word (où)? Or replacing quatre-vingt by huitante (and same for 70 and 90) like in Switzerland? Or getting rid of œ and æ?

But none of it is realistic. We officially changed from oignon to ognon 32 years ago and people still don't know about it...


I find the 20-based counting system cute. I like it. In Belgian French they dropped it, but to my ear "septante", "huitante", and "nonante" sound weird!

I don't mind "où". I especially do not want the circumflex accents removed (you didn't mention it, but there was an attempt in... the 90s?).

I wouldn't mind some spelling simplifications around all those eau/eaux/oeu/oeux type word endings.

> Or getting rid of œ and æ?

Yes.


the us is fairly unique in the rate it accepts common mistakes into the language. its part of how they diverged from the standard english used in the rest of the world so quickly...



