This space before the colon and other marks in French should actually be a narrow non-breaking space (U+202F) [0]. There's no key for it in the AZERTY layout.
This has been a problem since the typewriter age. People having to get on with their jobs coped with it by using a full, breaking em-space. Unless this gets replaced automatically by the word processor, you get horrid typography and misplaced line breaks all over the place.
The Académie Française should have dealt with this years ago, if their ass wasn't stuck in the 17th century.
A modern solution is simply to have the non-breaking space easily accessible in your keyboard layout for when you need it. In the BÉPO layout it is at SHIFT+space. Especially simple, since all double punctuation marks (: ? ! ; -- those that require a non-breaking space before them) are also accessed with SHIFT.
The AZERTY layout could certainly map SHIFT+space to a non-breaking space too.
But honestly, rather than changing keyboards (which is hard), why doesn't Google just pick a shorthand that doesn't break typography rules, like `@` instead of `:`?
The : allows users to discover it by coincidence when trying to type a smiley :)
If triggering it by accident wasn't intended, they could have used a regular keyboard shortcut like Alt+Shift+E or even instruct users to press Win+. to use the system-wide emoji menu.
A modern solution is to abandon old, no-longer-relevant typographic language rules, or to make typographic language rules context-specific.
But I agree that we need to make several alternative space characters easy to type:
- non-breaking space (for this French rule)
- wide space (for disambiguating sentence-ending periods from non-sentence-ending periods)
- zero-width non-breaking space (for preventing word-splitting?)
Not "rules", just this rule. It has to do with typographic considerations that apply to old typesetting technologies that are no longer in use.
> Why should we change language to appease lazy software developers?
It's been done.
For example, in Spanish it is no longer the rule that "ch" and "ll" sort as if they were distinct letters (this change was made in 2010), precisely because that was such a difficult rule to implement. And that was a 256-year-old rule, per Wikipedia:
> The digraphs "ch" and "ll" were considered single letters of the alphabet from 1754 to 2010 (and sorted separately from "c" and "l" from 1803 to 1994).
For another example, in Spanish capital letters were required to not carry accents, but now they are required to carry them. This was due to overstriking on typewriters working to accent lower-case letters but not upper-case ones (the apostrophe would collide with the glyphs for upper-case vowels). But the technology to resolve this has existed in the Spanish-speaking world for a long time now, so the old rule was finally dropped. (Not accenting upper-case letters can lead to ambiguities that are annoying.)
It's not just precedent. It's that the original reason for some typographic (not even orthographic) rule is simply not relevant in 2022.
And it's not unreasonable for French people, non-French French speakers, or even people who don't use French at all to propose ditching hard-to-implement French rules. Now, this particular rule is decidedly not difficult to implement, but it is an annoying rule to apply as a user -- I should know, since I speak and write French (though I am not French).
Also, it doesn't matter what the French Academy says, or what the Spanish Royal Academy says, or what Webster's dictionary says, or whatever. Language evolves, even to their consternation. Moreover, developers don't have to care that much -- I18N/G11N is fun enough, and employers have to care for legal reasons, but rules like the Spanish ch/ll rule can be much too hard even for non-lazy developers, and the Royal Spanish Academy can and did have to change, and it was for the better.
In contrast, the Hungarian "cs", "dz", "dzs", "gy", "ny", "sz", "ty", and "zs" all remain distinct, as do their accented vowels. Polish is hybrid, considering digraphs as being 'composed' of single letters (i.e. 'sz' = 's+z'), rather than being distinct, but the accented characters are considered distinct from their unaccented cousins.
This problem has primarily come about because the Catholic church enforced the Latin alphabet on languages where a different alphabet might have been more appropriate. Spanish, although more closely related to Latin, still has a few sounds for which there's no good Latin character. There's no (particular) reason (as far as I'm aware) that 'ch' and 'll' became digraphs, while ñ acquired a tilde.
Why should traditions change just because it's a bit more difficult to do things the 'old way'? why do we still bother with capital letters at the beginning of sentences? or speling things with two leters when one wil do, or riting silent leters wen you cant tell the difrens? & i dont think we need apostrofees n e more.
Indeed, there's no reason for 'ch' and 'll' to have been distinct digraphs in Spanish and ñ not to have been 'gn' as in other Romance languages. I'm not familiar with the whys of that.
I didn't know that Hungarian had a similar issue.
> Why should traditions change just because it's a bit more difficult to do things the 'old way'? why do we still bother with capital letters at the beginning of sentences? or speling things with two leters when one wil do, or riting silent leters wen you cant tell the difrens? & i dont think we need apostrofees n e more.
I distinguish typographic and orthographic rules. The non-breaking, thin space before punctuation rule is typographic and outdated (i.e., motivated by outdated typographic technology).
I do want some orthographic rules reformed too, but I'm more interested in the ones that are just hard. In particular I'm interested in collation reform because we do often have to collate multi-language text items but with one collation (this is especially true in databases), so having collations for Latin-script-using languages be similar is rather useful. This is also true given that I'm not going to be switching locales when I switch languages -- I speak, read, and write multiple languages, but I never ever change locales.
>A modern solution is simply to have non-breaking space easily accessible in your keyboard layout
An even better solution would be --for grown up people who have progressed beyond cave-painting and want to communicate using, you know, actual words-- to be able to disable emojis completely.
Fucking moronic shite that they are. I've seen people on Twatter and FB have entire conversations in bloody emojis. Talk about reverse evolution! Why don't we just go back to grunting and gesturing and have done with it?
I guess you only speak monotonically and avoid interpreting pitch, pauses, and physical cues in conversation as well, since those would be archaic, right?
smiley face. crying face. row of hearts in different colours. strange yellow thing that might be a banana or an arm flexing its bicep. sports trophy. winking face.
My god! --you're right. This is so much better and clearer than using those boring old fashioned words.
If you fail to put a space before a colon when writing in French, what happens next? Do people point and laugh? You get disciplined? Or would French speakers accept this as a better way to use the colon character?
It looks like this rule is based on old typographic considerations. Much like the Spanish Royal Academy's rule that capitalized letters carry no accents (unlike the opposite French rule that capitalized letters do carry accents!), which stems from typewriters not having accented letters, so one would type a vowel, backspace, then an apostrophe to make an accented vowel, but for capitals there's not enough space so you couldn't and wouldn't overstrike them.
Users and language academies should distinguish typographic from non-typographic language rules, and typographic rules should be context-specific (well technology-specific, since technology is the context).
I don't know, what would happen in English if you didn't capitalize the days of the week? Do people point and laugh? You get disciplined? Or would English speakers accept this as a better way to write the days of the week?
No human language on Earth is in a position to laugh at others for their idiosyncrasies.
I do not think that was their point (to laugh at other languages), but rather that there may be contemporary situations which warrant rethinking the idioms of a language. As an aside, I do not think many people who speak English would even notice a lack of capitalization for the days of the week.
In English most people wouldn't care. You're going to get in more trouble in French schools for this than in U.S. schools (I don't know about UK schools, or elsewhere in the Anglosphere). My impression is that European culture is a lot more sensitive to these things than U.S. culture.
It's more complicated than you imagine it is. Basically you need to know both: whether the current locale is a French locale, and also whether original text was written in a French locale.
The former is easy enough, but also very annoying to multilingual people since one might run in a Spanish locale but occasionally write in French. So that's not a solution.
The latter is... hard to do, because while Unicode has language tags that you can embed in documents, those are deprecated and they were never well supported, and so there's no way to mark-up text as being in one language or another, and a document-wide setting wouldn't be enough nor sufficiently generic and standard and portable.
The best solution here is to relax the French typographic rule (since it isn't needed anymore). But that would take time to filter through to French speakers (writers, and readers) so that they learn to not put that pesky space before punctuation, but also so that they don't complain when it's missing.
Or... you know, this business of emoji pickers could be something you could turn off. Nahhh, that would never fly! (/s)
IMO we should have tried harder to make Unicode language tags useful and used. But it didn't happen, so they're a thing of the past. Of course, they're still there, and one could attempt to resurrect them, but most likely one would fail.
> RFC 2482, "Language Tagging in Unicode Plain Text" [RFC2482], describes a mechanism for using special Unicode language tag characters to identify languages when needed. It is an idea whose time never quite came. It has been superseded by whole-transaction language identification such as the MIME Content-language header [RFC3282] and more general markup mechanisms such as those provided by XML. The Unicode Consortium has deprecated the language tag character facility and strongly recommends against its use. RFC 2482 has been moved to Historic status to reduce the possibility that Internet implementers would consider that tagging system an appropriate mechanism for identifying languages.
>
> A discussion of the status of the language tag characters and their applicability appears in Section 16.9 of The Unicode Standard [Unicode52].
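For the curious, the deprecated mechanism is easy to sketch. A minimal illustration (the function name is mine, not from any library): a LANGUAGE TAG character introduces the tag, and each ASCII character of the BCP 47 code is shifted into the Plane 14 tag-character block.

```typescript
// The deprecated Unicode "language tag" scheme from RFC 2482:
// U+E0001 LANGUAGE TAG, followed by the language code spelled out
// in tag characters (U+E0020..U+E007E mirror ASCII 0x20..0x7E).
// Purely illustrative -- the Unicode Consortium recommends against it.
function languageTag(bcp47: string): string {
  const TAG_BASE = 0xe0000; // tag char = ASCII code point + 0xE0000
  let out = String.fromCodePoint(0xe0001); // LANGUAGE TAG introducer
  for (const ch of bcp47) {
    out += String.fromCodePoint(TAG_BASE + ch.codePointAt(0)!);
  }
  return out;
}
```

So `languageTag("fr")` yields U+E0001 U+E0066 U+E0072 -- invisible in most renderers, which is part of why the scheme never caught on.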
And thus it would be impossible to use the emoji-inserting feature, plus you'd get other bugs (silently replacing whitespace with a different but identical-looking whitespace will make for fun issues).
Well, a double colon :: is just as convenient, makes more sense, and isn't a problem in either French or English. Even in English, you might want to stick an emoji to the previous word, like Hell::devil:: (my nephew would like that, though).
Double colon would make talking about perl even worse (:: is namespace separator) -- it's already bad enough that any namespace starting with D becomes an emoticon in pretty much all online editors.
This stuff should just stop. Operating Systems / Browser vendors should instead standardize on a hotkey to bring up an emoji selector that steals focus to filter via typing and inserts on enter key.
This sounds like it's not actually true? If it was, the French code page 646 that we used until Unicode finally won would have included a narrow space, but it doesn't. "Regular" computer text in French has only ever used a normal space, even if handwriting and/or "true" typesetting using typesetting solutions like TeX or PageMaker etc. allowed for a narrow space.
FWIW, LibreOffice automatically inserts an actual Unicode NO-BREAK SPACE when I type ":" at the end of a word (if the language is set for French of course). If I insert an actual SPACE and then hit ":", it even replaces the SPACE with a NO-BREAK SPACE.
I'd be surprised MS Word doesn't do the same. No need for a "true" typesetting solution.
Codepages are from back in the era of little memory available and monospaced fonts.
And it's another "US English"-centered thing: spacing does not really matter in English, but it can have functional differences in other languages and scripts.
Not so much contradicting as wondering about the claim that it should be a specific Unicode codepoint, when Unicode wasn't around when we started "computering" text (and the Académie Française can't possibly have formally declared things in terms of Unicode =)
What are the actual official rules in this case (and are there links to those? Because that'd be fascinating information to read through)?
Actually before the ":" specifically there should be a regular non-breaking space, not a narrow one. Except in Switzerland. Other punctuation marks take the narrow non-breaking space.
This is partially false, because there isn't one true AZERTY layout. There are various platform implementations, with MS Windows being the most common.
In fact, the French standards body AFNOR actually updated their AZERTY layout standard three years ago to include more characters, including the narrow non-breaking space. In traditional ISO-like fashion, one must pay to access this standard, but you can find an example here:
CORRECTION: narrow non-breaking space goes before ";", "!" and "?". Before ":" you should use regular non-breaking space. That is, in France. In French-speaking Switzerland it's a narrow non-breaking space everywhere.
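To make the rule concrete, here is a rough sketch of what an autocorrect pass applying it (for France) might look like. The function name is mine, and real autocorrect engines are smarter about URLs, emoticons, and the like:

```typescript
// French (France) spacing rule: U+00A0 NO-BREAK SPACE before ":",
// U+202F NARROW NO-BREAK SPACE before ";", "!" and "?".
// Naive sketch -- it would also mangle "http://", smileys, etc.
function frenchSpacing(text: string): string {
  return text
    .replace(/ ?:/g, "\u00A0:") // regular NBSP before colons
    .replace(/ ?([;!?])/g, "\u202F$1"); // narrow NBSP before ; ! ?
}
```

This is roughly what word processors have been doing for decades, which is why most French writers never had to learn where the character lives on the keyboard.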
Personally I like auto suggestions like this, but I never want it to interrupt typing, so it should never grab input when you press typing-related keyboard keys, including the enter key, space and arrow keys which you need to move the cursor around.
I do not want to have to close boxes that appear at arbitrary times with the esc key to continue regular typing, since the box interrupts flow and there is a reaction time between seeing the box and closing it with esc.
It would be ok to me if you have to use another less regularly used key, e.g. tab or ctrl+space, first to get into the suggestion box and then use arrows to select one.
This applies also (especially in fact) in code editors, chrome devtools, etc...
I wish others felt the same so this feature would be implemented in a less interrupting way by default. I've never seen it implemented in a non-interrupting way in anything ever.
Yes, at some point I reconfigured autocomplete in to only complete when you press <tab>. I almost never press <tab> otherwise when I’m typing something.
I like the implementation in a bunch of IM clients, at least Teams and iMessage: emojis are treated as autocorrect, so if you type the full text version of the emoji (not just the first character!) it gets replaced with the emoji. In the rare cases it's wrong, hitting escape after the autocorrect fixes it.
I don't see why the Google Docs implementation wouldn't work that way. If it really must pop up a box, allow the user to continue typing in the document and gradually reduce the list of candidate emojis. As soon as a character is typed that either completes the emoji or rules all of them out, the box goes away.
The issue is that a very common key to hit after a colon is Enter - which selects the emoji in question. If you simply keep typing other letters, you don't have a problem.
Yes; since emojis don't contain <enter>, in my model it should dismiss the dialog and not do a replacement. The error is in the emoji panel stealing focus, so <enter> is seen as a selection in the panel. If the panel did not have focus and keystrokes still went to the main app, this would work fine.
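The autocorrect-style behaviour described here is simple to sketch: nothing pops up and nothing steals focus; a shortcode is only replaced once it has been typed in full, closing colon included. The table and function name are stand-ins, not any real app's API:

```typescript
// Hypothetical shortcode table; real apps ship hundreds of entries.
const EMOJI: Record<string, string> = { ":smile:": "😄", ":heart:": "❤️" };

// Called with the text typed so far; replaces a completed trailing
// shortcode, otherwise returns the text untouched. Since <enter> can
// never be part of a shortcode, it can never trigger a replacement.
function autocorrectEmoji(text: string): string {
  const m = text.match(/:[a-z_]+:$/);
  return m && EMOJI[m[0]] ? text.slice(0, m.index) + EMOJI[m[0]] : text;
}
```

Note that "ratio 1:2" or a half-typed ":smile" pass through unchanged, which is exactly the non-interrupting behaviour people are asking for.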
Interestingly, when using Kate as a Markdown editor I'm interrupted by the word-based completion panel popping up, grabbing input, and incorrectly completing words when I press Enter. I wish it either didn't auto-popup or didn't grab input in Markdown mode.
In Kate you can turn off auto-completion in the Settings, Editing, Auto Completion tab, or set the minimal word length to complete to something long, so it's less likely to appear for common letter combinations but still possible for longer words.
Of course, as always, there's no setting to enable auto-completion without it taking over the standard typing keys :(
Typographic nerd: technically, you should use a non-breaking space (espace insécable) before colons in French. Doing that does not show the popup. Wait, what do you mean your keyboard layout does not have it?
Keyboard layout nerd: you should use a keymap that has it. Like AFNOR's latest AZERTY or BÉPO.
Historical nerd: traditionally wordprocessors have auto-inserted non-breaking spaces before colons, why should I care now ?
> Historical nerd: traditionally wordprocessors have auto-inserted non-breaking spaces before colons, why should I care now ?
Because I don't want to switch locales in order to write short bits of text in another language. I don't even want to have to switch keyboard layouts -- just use compose sequences for diacritical marks and so on. And I don't use a French locale, but I do write French text sometimes.
I would imagine that in Europe this is a big deal. So many Europeans are multi-lingual... But you don't expect a Spaniard to switch to a French locale to write in French and vice-versa -- that's too disruptive.
Regular spaces are transformed automatically into non-breaking spaces when writing French in Word, LaTeX, and other text processors. Why would you want to learn the many ways of writing a spacing character when your computer can do it for you?
This is all a big pain in multi-lingual documents. What, am I supposed to switch locales every time I switch languages? No no, that's absolutely not OK.
I'd agree, but this is even more subtle, as a sibling comment to yours said: the space is actually smaller. So text rendering should take this rule into account as well. But then you're assuming that you're doing single-language text rendering. Putting the character at writing time still seems like a simpler tradeoff (maybe with editing help like auto-replace or correction).
This annoys the hell out of me and I only speak English.
Microsoft is probably the biggest development-focused company in the world, but their own work communications app, Teams, doesn't allow you to paste small pieces of code in a chat because it replaces punctuation with emoji.
Regardless, if I want an emoji, I will type the emoji. If I typed ':)' I want ':)' not some stupid yellow face.
Meta's Facebook Messenger doesn't allow those forbidden strings either. Meta's WhatsApp mobile does, but WhatsApp web does not.
My phone has an emoji picker if I want an emoji. I don't want what I type or paste replaced with an emoji. Anywhere. Ever. This practice should stop.
In Skype you have to write {code}...{code}. Or prefix your entire message with @@ (but it used to be you could prefix !! -- why oh why did they change it?!).
I so so wish we could standardize on GitHub markdown for all these things!
Truly a joy on German keyboards, where ` is a dead key that makes the system wait for an e or an a to receive that accent, and certain OSes don't come with a nodeadkeys alternative keyboard map. I often find myself with the ` in the copy/paste buffer when writing code-heavy markdown, because pasting is so much easier...
Sure, but it's super annoying when you write something in markdown knowing that being lazy with the proportional/monospace mix will noticeably harm understandability. Particularly when you suspect that readers will be close to failing to understand, doubly so when you suspect that this reader might be a future self. Is it worth the time writing? Only if I get the monospace right. Dead keys can make that writing situation a weirdly subtle kind of hell.
So annoying in fact that I've just spent the last two hours with Microsoft Keyboard Layout Creator (yeah, I'm still a Windows person) shifting the dead key version to hide behind alt-gr.
Then I used it to mitigate some oddities on the dynabook keyboard (pipe and smaller/greater-than weirdly moved to where the Windows connect-menu button usually resides) and continued duplicating some of the usual AltGr suspects at locations inspired by their US keyboard positions. Right now I'm an inappropriately happy person; let's see how I'll think about it in the future (getting too used to something non-standard can be a terrible cost).
There's more of this implicit US American bias in modern computing than you would think after the whole ASCII/Unicode mess - I thought they finally learnt something about computers actually being used by other people back then, but no.
One of my pet peeves: Badly implemented "press CTRL + / to search". Entering an actual slash, which requires e.g. SHIFT+7 on German keyboards, won't open the search box, but pressing the key where American layouts have a slash will.
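The bug described here typically comes from matching the physical key position (`KeyboardEvent.code`, which is layout-independent) instead of the character the layout actually produced (`KeyboardEvent.key`). A minimal sketch, with the browser event reduced to the fields involved:

```typescript
// Minimal shape of a browser KeyboardEvent, for illustration only.
interface KeyEventLike {
  ctrlKey: boolean;
  key: string; // character produced, layout-aware ("/" on any layout)
  code: string; // physical key position ("Slash" = US "/" key)
}

function isSearchShortcut(e: KeyEventLike): boolean {
  // Wrong: e.code === "Slash" names the key at the US "/" position,
  // so Shift+7 on a German layout (which types "/") never matches.
  // Better: match the character the user actually typed.
  return e.ctrlKey && e.key === "/";
}
```

With this check, CTRL plus SHIFT+7 on a German layout fires the search box, and the key sitting at the US slash position (which types "-" on German layouts) does not.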
As multilingual user constantly switching between 3 languages, that's why I fully committed to the US_intl keyboard layout a while ago and ditched the abomination that is QWERTZ. Took a while to get used to, but overall it has been a very noticeable improvement. One layout to cover all languages, no more constant switching.
Edit for clarity: I type "a to get ä and 'a to get à. If I want to type actual quotes followed by a vowel, I need to “confirm” the quotes with space before typing the vowel. That’s all there is to it, really.
It may work for you, but there are users of languages that simply cannot be covered by QWERTY, and in those cases they would like developers to simply fix their goddamn code and bind to keys, not characters.
For various reasons, I've been switching both in hardware and software between two different QWERTY (US and UK) and two non-English keyboard layouts for years (usually, I'll have multiple keyboard layouts configured, but I'll generally go with the one that is actually printed on the keys unless I really need a different one).
The funny thing is that for all the getting used to different layouts, the one thing that really trips me up all the time is the different hand positioning for Control+C on Mac vs PC. Luckily, you can change the modifier key bindings in MacOS independently from the keyboard layout. But yeah, shortcuts that assume everyone has a / or whatever key are a pain too.
It is fine as long as you don't have to use other people's computers.
I used a QWERTY keyboard in France for a while, but I had to switch back and forth between AZERTY and QWERTY each time I went on someone else's computer, which happened often, qnd it is qnnoying.
You can learn to touch type but keep in mind that not all layouts are the same, you can bring your own keyboard, change the mapping, etc... There are solutions, but it wasn't worth it for me, so I got back to that terrible AZERTY keyboard because that's what everyone has in France.
I envy you for all your languages using Latin-based alphabets. I guess I'm stuck with two keyboard layouts for life because I need to be able to type in both Latin and Cyrillic.
Yes, this. Though I prefer the X11 compose sequence business where you type <Compose>'a to get á and so on. X11 compose sequences are a lot more intuitive than the Windows U.S. international keyboard because there are no ambiguities: for example, a cedilla is just <Compose>,c, whereas in the Windows U.S. international keyboard layout it's 'c -- X11 for the win here, as the latter is a) super counter-intuitive and b) conflicts with ć (U+0107).
Also, if you program or use a shell, then the Windows U.S. international keyboard layout is just unspeakably unbearable.
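For anyone who wants to try this: custom sequences live in ~/.XCompose, in the Compose(5) file format. The entries below are illustrative (these particular sequences are already in the default tables); the include line pulls in your locale's defaults so you only add to them:

```
include "%L"

<Multi_key> <comma> <c>       : "ç"  ccedilla
<Multi_key> <apostrophe> <a>  : "á"  aacute
<Multi_key> <quotedbl> <u>    : "ü"  udiaeresis
```

Tools like WinCompose understand the same file format, so one file can be shared across machines.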
> Though I prefer the X11 compose sequence business where you type <Compose>'a to get á and so on. X11 compose sequences are a lot more intuitive than the Windows U.S. international keyboard because there's no ambiguities, so, for example, a cedille is just <Compose>,c whereas in the Windows U.S. international keyboard layout it's 'c
That sounds pretty good indeed. Any idea how could I go about setting this up under macOS?
> Also, if you program or use a shell, then the Windows U.S. international keyboard layout is just unspeakably unbearable.
I have indeed a couple of little, weird problems when working in the terminal. That's inconvenient but still beats switching between 3 layouts 50 times a day.
> Any idea how could I go about setting this up under macOS?
Sorry, no idea, but I'd expect it to be possible on OS X. And I really want this for Windows, too, as I have to use it.
> I have indeed a couple of little, weird problems when working in the terminal. That's inconvenient but still beats switching between 3 layouts 50 times a day.
Oof, I can't get used to having to type '' to make one ', "" to make one ", `` to make one `, in the shell, in $EDITOR, etc. It's too painful. So I switch layouts as needed, and I hate it. Compose keys would be so much better!
That’s quite memorable. On the Mac I have to press option-u and then a, u, or o to get an ä, ü, or ö. Optionally holding the shift key to get an uppercase vowel with the diacritic.
option-` is for grave (è)
option-i is for circumflex (ê)
option-n is for eñe (ñ)
option-c is for cédille (ç)
When you called QWERTZ an abomination I simply had to upvote your comment.
I used to be pretty good at those but now I just use the feature that works (almost) everywhere in Mac OS and iPhone - press and hold the character to get a little pop-up of options: møøśę
It's slow if you have to type in another language, but in English it's fast "enough" for words like résumé.
I am on macOS as well and the diacritics I mentioned ("a → ä, `a → à) work out of the box with the "US International PC" layout. If I press ⌥a I get å with my layout.
There's loads of ways. Dead keys is one, the compose key is another suitable one for multilingual folk. I can write ä ö ü ß pretty much automatically with the compose key on a US layout, as well as any é ë ê ü I might need in Dutch.
In the Netherlands letters with diacritics don't even have their own key (Dutch computers tend to use US plus €).
You might want to consider using a keyboard layout that supports diacritics. I use UK Extended Winkeys for example, where I can do AltGr+[ followed by a letter to get it with a diaeresis, allowing me to type äöü. It supports almost every European language character.
I like that. Makes me wish I could just input all my text in Emacs which would make mapping such things to the “proper” sequence of strings very simple.
Not OP, but I have a Mac and use the "ABC - Extended" keyboard, which has a full set of diacritics. You type them by using the option (⌥) key and a letter to set the diacritics and then type the letter.
⌥u e = ë
⌥w e = ė
For example, the key sequence required to type Kayodé is:
I use US_Intl keyboard for Swedish and Portuguese and can type all of this without the need of special char keys: ä ö å - á à â ã. Just a mix of Opt+(u/a/e/i/n/`) and the next character I want with the diacritic.
Edit: and ß is just a Opt+s. It becomes quite logical after you get the hang of it.
Personally I use a compose-key on Linux and Wincompose on Windows, with SyncThing to keep my .XCompose file synchronized between machines when I add a custom sequence.
I mapped my Caps-Lock key as Compose, so I hit compose-a-" to get ä, and so on through the set. I've made customs for Ω and µ since I use those a lot and couldn't remember their default sequences. It's handy having ® and ™ on tap as well, and I get typographical niceties like — and … vs - and ...
Not GP but I use a similar setup. I have an "extended" US keyboard layout where AltGr+(location of character on native keyboard layout) types the character. For example, AltGr+[ is "ü". I find this more convenient than the compose-like approaches suggested elsewhere in this thread.
I use it because programming language syntax is mostly ergonomic only on US-like layouts, e.g. typing "{" causes a lot of strain if you have to do it on every other line.
That would be… inconvenient. That’s not how I use it. I type "a to get ä and 'a to get à. If I want to type actual quotes followed by a vowel, I need to “confirm” the quotes with space before typing the vowel.
French speaker on US intl here: ä ö ü ß are easy to make on such a setup, but so are accented letters like é à ô or the infamous ç. It is even easier to type them as capitals than it typically is on AZERTY (the French layout) or QWERTZ (the German one).
That's fine if you live elsewhere and type your Danish colleague Lærke's name once in a while, but if you speak a language using non-ASCII characters it feels rather slow to long-press every few characters.
> Danmark, officielt Kongeriget Danmark, er et land i Skandinavien og en suveræn stat. Det er den sydligste af de skandinaviske nationer, sydvest for Sverige og syd for Norge, og det grænser op til Tyskland mod syd. Grønland og Færøerne indgår også i den danske stat som rigsdele med selvstyre indenfor Danmarks Rige (Danmarks forhold til rigsdelene kaldes Rigsfællesskabet).
Stuff like this is super, super common. Even where you have things like CTRL + X/C/V for cut/copy/paste it's clear that the letters were picked to be near each other - but then often the shortcuts will remain the same even on keyboards where those keys are nowhere near each other.
This can be one of the reasons that switching to Dvorak can be super hard, all the shortcuts become quite weird unless you remap them each by hand.
9ev: pressing the key where American layouts have a slash will [open the search box]
bombcar: often the shortcuts will remain the same even on keyboards where those keys are nowhere near each other
You're objecting to opposite things: you don't like that the shortcuts stayed with the same symbols instead of staying with the physical key, and 9ev the other way around.
(And also 9ev is objecting to misleading documentation)
Yeah - and balancing the two is something often entirely overlooked and people stuck with the problem have to learn the options and try to determine “is this key the character or the key (in a US layout)”.
Issues like this are why often people just give up and learn US English enough to use the computer, then at least it’s tested and consistent.
Yeah, if I could go back in time I would have told my younger self that it's okay to use something other than qwerty, and it's okay to use vim, but trying to to do both was a hilariously terrible idea.
That's great. To be fair, caliente and helado don't seem a very unreasonable interpretation if you're installing in a place where people don't speak English.
I live in France but use QWERTY instead of AZERTY, and I once had an opposite problem: on a French website, there was an <input> for phone number, validating that it stays [0-9] on each keystroke.
The validation was done by checking for the AZERTY-specific key combos that result in 0-9; on AZERTY to get a <number> you need to press SHIFT+<number>.
So if you had a QWERTY keyboard, you could not enter your phone number; the input stayed blank. (If you tried with SHIFT, you could get !@#$%^&*() into the field, however.)
My latest peeve here is the websites that try to filter out non-numeric key presses in form fields. You don’t know where my number keys are! Sometimes the only way to enter a number is to copy and paste it from a text editor.
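The layout-proof fix for both of these peeves is to validate what ended up in the field rather than which key was pressed (which also handles paste for free). A sketch, with a hypothetical field id:

```typescript
// Validate the field's value, not the keystroke: works on any
// keyboard layout and with pasted text.
function digitsOnly(value: string): string {
  return value.replace(/[^0-9]/g, "");
}

// In a browser you'd wire it up roughly like this (hypothetical id):
// const field = document.querySelector<HTMLInputElement>("#phone")!;
// field.addEventListener("input", () => {
//   field.value = digitsOnly(field.value);
// });
```

On an AZERTY keyboard, SHIFT+digit produces the digit character in the field's value just like an unshifted digit does on QWERTY, so no per-layout key logic is needed at all.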
Even at a multinational like Apple. I was fighting with the provisioning dashboard a few years ago because it would not load. The issue was my last name, which has an umlaut in it. This wasn't 1990, it was 2012-ish, and I still see companies unable to deal with umlauts.
I believe either FedEx or UPS still can't deal with it.
On top of that, Swiss mail is quite strict about how a letter is addressed and what is written on the mailbox. Luckily the mail carriers know to ignore garbled umlauts.
When I was living in Japan I always had to "romanize" my Japanese address if I got something shipped from the USA, because USPS only allows ASCII. This did create problems locally - post always arrives quicker (in my experience) if the address is in proper Japanese. When I got mail from my home country I got it forwarded via my company, and they always just stuck on a printed label with my address, in Kanji. No problems there. Worked fine for years.
However, my home country postal system has since changed to an online system for sending post out of the country: You go to a website and enter the destination address, then you pay and it'll print a code which you just give to the actual post office when you bring the letter/package whatever. What's infuriating is that they don't allow Japanese addresses - I can enter it, but it just creates an error message. So I again have to come up with a crude romanized address and cross fingers that it'll arrive (which it usually does - but up to a few weeks delayed).
In short - things are not getting better over the years.
The problem is that they were dealing with it successfully before, by either treating the umlaut as a "u" or a "ue". It's not like umlauts are this new technology that English speakers have never been faced with in their business ledgers. But what you see in older business ledgers, is a variety of different spellings, and a human brain that can map them to each other, with a little bit of training. And when software came along, most systems just used the "u" approach, and that kinda worked as software systems grew to massive scale.
Now the hard part is changing how to deal with it, because now you need to patch all these huge production systems that were working before, and it's a breaking change.
So if you have a big stack: UI --> business layer --> DB
You can't just change the UI, as that will cause breakage in the pieces below. You start with the DB change, then the business layer, and you do the UI last.
Moreover this is not a simple change. Getting a DB to handle unicode is a known thing, but how do you train your customer support personnel to do data entry with unicode characters? Do you expect the customer to remember the code point? There are many characters that look the same which have different code points. So what you need to do is come up with a list of officially supported extended character code points and do entry with those, but still you will have others -- users of Asian scripts, for example, whose characters will be missing.
And you will get customers still doing the old "u" approach together with the official umlaut code point, sometimes for the same name and the same package. People will call up, trying to find their package, and randomly used one convention, since even though you can train your staff (which isn't cheap) you can't train your customers. So you need some system of identifying different orthography for the same underlying name, some with "u" and some with "ü" and some with "ue". And then that's going to require changes in a lot of other systems, for example reporting systems, etc. We see this problem today, taken to the extreme, in the various different spellings of "Gadafi".
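That matching problem can be sketched in a few lines. This is a naive illustration, not a production approach (real systems would use locale-aware collation, e.g. ICU, rather than a hand-rolled table); the function names are mine:

```javascript
// Fold a German name into a set of spelling variants so that
// "Müller" can be matched against both "Mueller" and "Muller".
const UMLAUTS = { "ä": ["ae", "a"], "ö": ["oe", "o"], "ü": ["ue", "u"], "ß": ["ss", "s"] };

function variants(name) {
  let forms = [name.toLowerCase()];
  for (const [uml, subs] of Object.entries(UMLAUTS)) {
    forms = forms.flatMap(f =>
      f.includes(uml) ? subs.map(s => f.split(uml).join(s)) : [f]
    );
  }
  return new Set(forms);
}

function sameName(a, b) {
  // Two spellings match if their variant sets overlap.
  const va = variants(a);
  return [...variants(b)].some(f => va.has(f));
}
```

Note the trap: `sameName("Müller", "Mueller")` and `sameName("Müller", "Muller")` both match, but `sameName("Mueller", "Muller")` does not, because neither spelling contains an umlaut to fold. Matching isn't even transitive, which is exactly why this is expensive to retrofit.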
But really it's worse, as a distributed system has
UI <--> business layer <--> regional DB <--> business layer <--> regional UI
And now it's much harder to make these changes without taking systems off line, for a distributed global network.
What they probably did, as most companies do, is look at all these problems and costs, compare them to relatively small benefits, and kick the can down the road, waiting for the next major system upgrade, and sometimes these next major upgrades can take 20 years to happen. Or even longer.
In other words, the hard part is the incremental breaking change to the system, not dealing with the umlaut itself. If this were a brand new code base that was just being written, they could make it handle umlauts at much less cost.
Unless it's changed recently, IntelliJ IDEA has similar annoyances[0]. Again, using the equivalently placed key works instead. Are they using scan-codes for some reason? Seems extra bizarre given that JetBrains isn't even a US based company!
That's an illustration that the problem is not so easy.
What your parent comment complains about is that the keyboard shortcut is based on the location of the key, and not the letter.
What you complain about is that the keyboard shortcut is based on the letter, and not the location.
Obviously, these complaints don't contradict each other, both make sense in different circumstances. But figuring that out requires awareness of different keyboard layouts, and of the difference between KeyboardEvent.code (location) and KeyboardEvent.key (letter).
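The distinction can be sketched in a few lines. Here `allowDigit` filters on the layout-aware `event.key`, while `allowDigitBuggy` is a hypothetical reconstruction of the positional approach (requiring Shift plus a physical digit key, as on AZERTY) that rejects plain QWERTY digits:

```javascript
// Correct: event.key is the character produced, so it is layout-aware.
// "5" typed on AZERTY (Shift+Digit5) still arrives with key === "5".
function allowDigit(event) {
  return /^[0-9]$/.test(event.key);
}

// Buggy: event.code is the physical key position, regardless of layout.
// Requiring Shift + DigitN matches AZERTY habits but rejects QWERTY,
// where Shift + the digit row produces !@#$%^&*() instead.
function allowDigitBuggy(event) {
  return event.shiftKey && /^Digit[0-9]$/.test(event.code);
}
```

The mock events below mimic the browser's `KeyboardEvent` shape; no DOM is needed to see the failure mode.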
Give the actions appropriate names and allow users to remap them. Have a few common layouts you can select from. Allow users to select a language and locale (separately) when they first launch. Refer to the actions not the keys in the documentation, and when you do specify key (e.g. in parenthesis) have it dynamically reflect the current setup.
I really hate that so many things break after you switch keyboard layouts.
Many shortcuts are positional, like "hjkl" or "wasd" for moving, or Ctrl/Meta+number, but there's no good way to denote them without assuming you are using a QWERTY keyboard.
Programming against them is possible, but many times there are limitations, ignorance about the different layouts, or straight neglect.
Thank you for the suggestion, but that's indeed a problem with lots of websites that get this utterly wrong. Browsing the web using a non-US keyboard is fun.
A very common bug is people confusing key codes (the raw codes for the physical keys on the keyboard) and characters (what they type). This "works" when you use the QWERTY layout, but completely breaks when it's something else, like ЙЦУКЕН.
It's amusing that for the huge amounts Google pay developers it doesn't necessarily buy common sense or empathy for the user.
Google's currently demanding I settle an invoice for an account I closed... and the only way to do this, or even get support, is if you have a Google account. And there's zero way of replying to the email from their Collections department.
On the one hand they have a massive dominant position in multiple markets. On the other hand, if you're even slightly off the happy path you don't exist.
There was a comment the other day in the discussion about incorporation in Germany. Someone complained that in Germany, they conduct their business in German: the justification was, it's our country, our language, get used to it.
Maybe the blunt response to complaints about the US American bias is, it's our software, our (ASCII/Unicode) language, get used to it.
If I was trying to incorporate in the USA, and the registration form would be unable to process diacritics, sure — even though that would neglect a large part of your own population (see the ASCII debacle) — but we’re talking about internet services marketed globally. I don’t think it’s too much to ask to have an app at least working as designed. We’re not even talking about translating stuff!
If American businesses wanted the money of people from other countries, they should've made their systems internationally accessible.
I mean, seriously, what kind of antiquated statement is that? Do you actually want geographical borders to persist on the internet, or to work towards lower barriers for everyone..?
A similar insensitivity is the observation that none of the major Android keyboards allows true disabling of auto-blank, which is super annoying in a language like German where custom composite words occur a lot. Modern keyboards offer a wide selection of clever tricks to keep the auto-blank from messing up punctuation, but allowing one word to be swiped directly after the one before? I'm sorry, Dave, I'm afraid I can't let you do that. This post is written on Swype, final release back in 2014.
I wrote it over my lunch break quickly, it only closes the menu the first time. I was trying to add the functionality to close it every time you enter the key combo, but it wasn't working. If someone wants to improve this feel free. MIT licensed.
The same happens with Jira. It's a nightmare, since if you press escape to quit the emoji mode, you abort your input and lose focus. There should be a way to configure that shortcut to whatever suits you. Anyway, I don't get why web apps bother implementing their own emoji input; the operating system does it already (the Windows key + ; shortcut, for example).
And to reply to those who simply ask people to change their habit: it's rude. Imagine it the other way around: all the English-typing people having to insert a space before a : for whatever reason. Would that make sense to you?
So this was reported on October 31st, then escalated on November 3rd, and the "feature" is still in there. Seeing how severely this impacts users from France, this is slightly disappointing.
Here is a discussion in the product support forum:
We have some templates for gitlab issues at work, and it always bothered me that all the colons had a space before it. Both because it triggers the same annoying emoji feature, and also because it looks weird to me. Now it all makes sense: although the templates are in English, they have been authored by a French person, so the rule just bled into English writing.
Problem is that this is totally normal for CJK users, where you really want to type in a syllabic or phonetic or similar system and get drop-down selection menus for Chinese/Kanji characters to replace your writing with. Doing that for emoji is basically the same thing for CJK users. For Latin script users, however, this is very disruptive.
That's because it's required to produce the wanted characters, and it's usually implemented by the OS / browser, not at the final application level, which would interfere with this process.
The UX is still that you get a drop-down picker. What component of the system implements that seems of little importance to users unless that becomes obvious (each app doing it differently, though obviously that happens).
I expect that the emoji picker will become part of the OS eventually. Heck, Windows has one that you have to call up explicitly (with Win+.), iOS has an emoji "keyboard" (input mode), etc., so we're headed in that direction.
I didn't realize French was written with spaces before colons. It might be time to lose that space, similar to how we don't double space after period.
I'm not saying G.Docs or any editor should dictate how French is written, only being pragmatic if I have to choose between (no space): or unwanted emojis ending up in docs.
We used to double-space after a period because that looked better on monospaced typewriters. But in French it's simply a part of punctuation, and getting it wrong means you wrote it wrong. It's as wrong as if I'm writing English and put a space before the ':' or '!', or put a comma after a quotation mark. French people shouldn't have to change their language to type in Google Docs. It should be the other way around.
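The "other way around" is a one-line fix on the editor's side. A sketch of what a word processor could do for French text, replacing an ordinary space before double punctuation with the narrow non-breaking space (U+202F) mentioned upthread, so the typography is correct and the mark never wraps to the next line alone:

```javascript
// Replace a plain space before French double punctuation (: ; ! ?)
// with a narrow non-breaking space, U+202F.
function frenchPunctuationSpaces(text) {
  return text.replace(/ ([:;!?])/g, "\u202F$1");
}
```

This is a deliberately simple sketch; a real implementation would also need to skip code spans, URLs, and text in other languages.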
>It might be time to lose that space, similar to how we don't double space after period.
Well, that's a bit like saying "it might be time to write your whenever you mean you're" just because Google Docs' autocorrect feature kept messing up the two when you wrote English.
>if I have to choose between (no space): or unwanted emojis ending up in docs
We absolutely do not have to choose between those two things. We need to stop implementing cute automatic bullshit that makes assumptions about the way people interact with UIs.
I'd upvote for the first part of the second sentence, but downvote for the second part of the second sentence.
Typographically speaking English language text should have a wide space following a sentence-ending period [that isn't also a paragraph-ending period]. However, because wide spaces are not easy to type on normal keyboard layouts, the simplest thing to do is to type two spaces after sentence-ending periods and let the word processor change those to wide spaces when using a proportional font. When using a fixed-width font, however, it should always be two spaces after a sentence-ending period for two reasons: to disambiguate non-sentence-ending periods, and to make it easier to read. Coders write a lot of text in fixed-width fonts, so we should write two spaces after sentence-ending periods in, e.g., code comment blocks.
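The "let the word processor change those" step can be sketched as a one-line substitution: collapse two typed spaces after a sentence-ending mark into a single em space (U+2003) for proportional-font rendering. This is a heuristic sketch only; on its own it can't distinguish abbreviations from real sentence boundaries:

```javascript
// Turn "period + two spaces" into "period + em space" (U+2003),
// the wide space a proportional font would use between sentences.
function widenSentenceSpaces(text) {
  return text.replace(/([.!?])  /g, "$1\u2003");
}
```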
> similar to how we don't double space after period
What do you mean? Every sentence in this comment has two spaces after the punctuation. That one and this one. Just because browsers and html condense it all into one space doesn't make it right.
If we had the opportunity to simplify some rules in French, that's not the one I'd fix first.
How about getting rid somehow of the letter ù which is used for only one word (où)? Or replacing quatre-vingt by huitante (and same for 70 and 90) like in Switzerland? Or getting rid of œ and æ?
But none of it is realistic. We officially changed from oignon to ognon 32 years ago and people still don't know about it...
I find the 20-based counting system cute. I like it. In Belgian French they dropped it, but to my ear "septante", "huitante", and "nonante" sound weird!
I don't mind "où". I especially do not want the circumflex accents removed (you didn't mention it, but there was an attempt in... the 90s?).
I wouldn't mind some spelling simplifications around all those eau/eaux/oeu/oeux type word endings.
The US is fairly unique in the rate at which it accepts common mistakes into the language. It's part of how they diverged from the standard English used in the rest of the world so quickly...
[0] https://www.compart.com/en/unicode/U+202F