Pretty much every major site will completely ignore not only your browser's language settings but also onsite user account settings and other desperate attempts at selecting the language. The only thing that matters is your IP address.
For example, accessing Google: my browser is set to accept English only. I'm entering the English URL. In my account settings I periodically reset everything I can find to English (settings apparently decay, too). Google knows I want the English version. Yet, they still give me the interface in whatever language my IP address comes from. And not only the UI, search results as well.
Recently it's gotten even worse than that: Google figured out I'm actually German, so they start defaulting to German more often now - ignoring everything else. At least with the IP address-based routing it was impersonal.
I happened to be in Sweden when I linked my Facebook calendar to my Google calendar. Ever since that day, my friends' birthdays are given to me in Swedish. Facebook knows I want English, yet for some reason this is how it's got to be.
The same abuse is apparently considered best practice at new startups as well: recently I was testing a browser game for an acquaintance who's on their development team. Because I was in Portugal at the time, I of course got the site in Portuguese. Manually switching that to English, the game still started up in Portuguese. It's been doing that ever since. Every email I get from that company is in Portuguese, too, even though I tried everything I could to set my language to English.
It's a source of endless frustration, maybe even a hostile act. They're effectively saying "Your choices don't matter, we know what's best for you. You're from country X, so you _must_ speak Xish. People are on the internet to enjoy regional separation. Really, it's best."
can anyone provide insight into the business reasoning behind this? i really can't conceive of why you would want to supersede a user's exact, known language with a guess. sites are pretty difficult to use when you can't read anything. maybe there are technical issues for some sites, but Google search is the worst i've ever dealt with, and they def have some resources behind that.
Mostly it's because people don't know how to change that setting. Imagine walking up to a computer in a shared space (hotel lobby, library, etc.) and it's been configured to send out accept-language: <something you don't understand>.
Many of the people reading Hacker News will be able to find and change that setting. My mom never will. She'll just know she went to google.com, and saw Chinese.
If you're using a computer in a country, and websites seem to be showing you things in the language of that country, that's something you can probably understand. If you're using a computer and some websites insist on showing you some other language, you'll be confused.
It's true that most people leave the defaults. However, there may be an easy solution for a subset of cases.
Do the browser defaults in any country include multiple language settings? It seems likely that in most countries, the default would be only one language. And if this is the case, then if multiple alternatives are present in the request headers, it's very likely the user or computer admin has deliberately changed it, and that in turn would mean that sites should respect the choices.
This might still be wrong when the settings were made by someone other than the current user, or when multiple languages are configured by default, but it might be a step in the right direction.
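A minimal sketch of that heuristic, assuming "multiple alternatives" means two or more distinct primary languages (so "en-US,en" still counts as one). The function name is made up for illustration. Note that it would misfire on defaults that already list two languages, such as the Chrome header "en-US,en;q=0.8,sv;q=0.6" reported elsewhere in this thread.

```python
def looks_deliberately_configured(accept_language: str) -> bool:
    """Guess whether an Accept-Language value was changed on purpose.

    Heuristic only: a default install is assumed to list a single
    language (possibly with regional variants), so two or more distinct
    primary languages hint at a deliberate change.
    """
    primary_languages = set()
    for item in accept_language.split(","):
        tag = item.split(";", 1)[0].strip().lower()  # drop ";q=..." weights
        if tag and tag != "*":
            primary_languages.add(tag.split("-", 1)[0])  # "en-us" -> "en"
    return len(primary_languages) > 1
```

A site could treat a True result as a reason to trust the header over an IP-based guess.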
> Imagine walking up to a computer in a shared space (hotel lobby, library, etc.) and it's been configured to send out accept-language: <something you don't understand>.
If web browsers could somehow figure out they were running either under a guest/public-use account, or in a kiosk mode, they could avoid sending an Accept-Language header at all. Then,
1. in cases where the header is sent, it would mean a lot more (and hopefully override both online-profile-stored and IP-detection-based answers);
and 2. in cases where the header isn't sent, using an answer from an online profile setting or IP-detection would no longer be against-standard.
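Roughly, the server side of that proposal could look like the sketch below. All names are hypothetical, and it simplifies by taking the first listed language instead of sorting by q-values or checking what the site actually offers.

```python
from typing import Optional


def choose_language(accept_language: Optional[str],
                    profile_language: Optional[str],
                    geoip_language: str) -> str:
    """Pick a language under the proposal above.

    A header that was sent wins outright. Only when it is absent
    (guest account, kiosk mode) do the profile setting and the
    IP-based guess come into play, in that order.
    """
    if accept_language and accept_language.strip():
        first = accept_language.split(",")[0]   # simplification: ignore q-values
        return first.split(";")[0].strip()
    if profile_language:
        return profile_language
    return geoip_language
```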
While that excuses ignoring the Accept-Language header, it doesn't make sense for overriding the user's explicit configuration in their profile; you wouldn't expect that to be shared.
That, and I'd expect public computers to disallow that sort of configuration anyway, so it would be stuck at the default value, which should be sensible for the location it's set up in; it's not like it would move around...
And don't forget that no matter what your IP, no matter what your language setting, no matter what country you have set, Google Maps will still default you to showing a view of the continental United States. You're in Tokyo, searching for Yokohama? Here's Yokohama Sushi in downtown Los Angeles. For a while, the new web Google Maps redesign even removed the HTML5 Location API "my position" button. It's back now, but it should be defaulting to showing your location, like the mobile apps do.
Huh, I didn't know that. But why geoIP-based redirects on some sites and not others? google.com redirects me to google.dk, and even blogspot.com redirects me to blogspot.dk, but maps.google.com doesn't take me to maps.google.dk.
I believe the logic was that "google.com" always does the redirect, because people tend to type google.com manually into their address bar a lot. Google has never bothered to set up any other redirects itself.
Other services that Google has acquired, though (e.g. Blogger, Youtube, etc.) may have come pre-set-up with redirection logic, and Google has mostly left that untouched.
You probably want something like "en-US,en;q=0.9,ko;q=0.8". (Note the addition of "en" between "en-US" and "ko".) Some quick testing with Firefox, which lets you directly alter the Accept-Language: header in requests in about:config, shows that fedoraproject.org has "en" versions of resources but not "en-US" versions. Since your Accept-Language: header only lists "en-US", "ko" ends up being selected.
EDIT: I just noticed your guesses at the bottom of the post. Your second guess is correct. See §14.4 of RFC 2616:
As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
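To illustrate why the missing "en" matters, here is a small sketch of the prefix-matching rule from that section of the RFC: a language range matches a tag it equals, or a tag it is a prefix of up to a "-". So "en" matches "en-US", but "en-US" does not match "en". The header "en-US,ko;q=0.8" and the list of available languages are assumptions for the example, and this is not necessarily how fedoraproject.org implements it.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-range, q) pairs sorted by descending q."""
    ranges = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag, q))
    # sorted() is stable, so equal weights keep their listed order
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def matches(language_range: str, available_tag: str) -> bool:
    """Prefix rule: a range matches a tag it equals or is a prefix of."""
    available_tag = available_tag.lower()
    return (language_range == "*"
            or available_tag == language_range
            or available_tag.startswith(language_range + "-"))


def pick(header: str, available: List[str]) -> Optional[str]:
    """Return the first available tag matched by the best-ranked range."""
    for language_range, q in parse_accept_language(header):
        if q <= 0:
            continue
        for tag in available:
            if matches(language_range, tag):
                return tag
    return None


site = ["en", "ko"]  # assumed: "en" and "ko" resources exist, "en-US" does not
print(pick("en-US,ko;q=0.8", site))           # ko
print(pick("en-US,en;q=0.9,ko;q=0.8", site))  # en
```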
As a French guy living in the German part of Switzerland with Accept-Language configured to get English content, I'm kind of ashamed to have that kind of bug in my language detection code. I'm always complaining about other websites' language detection, looks like I should have looked at my own code first!
Thanks! There is a corollary to this that would have prevented all this - when I went back in the Chrome settings and set the settings to the same order, it reset my header to this: "en-US,en;q=0.8,ko;q=0.6" - which makes things work for all sites again. I haven't touched my language settings since ~2012, so it's possible Chrome "fixed" this a while back, but didn't change my existing settings.
Having now read the full code and not just the diff, I have to say it looks pretty good. I note that plain "zh" is not redirected to the cn site. ;) Whether it should or not is debatable though -- I actually think ignoring "zh" altogether is a rather prudent move if it is intentional.
Language choices are a mess. There can easily (and often) be conflicting data based on:
- accept-language header
- URL that includes language/region codes as a subdomain or part of the path
- language preferences set in a cookie or account
- IP region detection
In the end, any website is trying to provide the right language most often for their users, and there are no easy answers. When I access webmail from an Internet cafe in China, I don't want the interface popping up in Chinese just because the browser's accept-language is configured for Chinese. Fortunately, it doesn't.
Most web users have never even heard of accept-language, it's just automatically configured by whatever language their browser was installed in, which isn't always the language you want to be browsing in. (E.g. you bought your laptop overseas because it was cheaper, so it runs in English instead of your own language.) It's not a surprise that IP address detection provides the best default experience most of the time, which can then be overridden by URL or user choice, and that accept-language is fairly irrelevant.
* In all cases, a fairly visible language picker is displayed at the top of the page, with internationalized language names.
* If someone goes to a language-specific subdomain (fr.dolphin-emu.org, cy.dolphin-emu.org, ast.dolphin-emu.org, ...), they get this version.
* If someone goes to the generic/english dolphin-emu.org, the system checks whether the user has a "nocr" cookie. If so, they get the english website. Otherwise, they get redirected based on their Accept-Language.
* If a user uses the language picker, we assume they know what they want and set the "nocr" cookie to disable redirections in the future.
* When the user gets redirected from the standard/english version to an internationalized version, a message is shown in english saying that they have been redirected based on their browser preferences, with a link to go back to the english version (and set the "nocr" cookie).
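The redirect decision in the rules above can be sketched as follows. This is not the site's actual code: the function name, the short list of translations, and the assumption that each subdomain is named exactly after its language tag are all made up for illustration. Setting the "nocr" cookie and showing the "you have been redirected" message are left out.

```python
from typing import List, Optional

DEFAULT_HOST = "dolphin-emu.org"
# Assumed subset of translated subdomains; the real list is longer.
TRANSLATIONS = {"fr", "cy", "ast"}


def redirect_target(host: str, has_nocr_cookie: bool,
                    preferred: List[str]) -> Optional[str]:
    """Return the host to redirect to, or None to serve `host` as-is.

    `preferred` is the visitor's Accept-Language list, best first,
    already parsed into lowercase tags such as ["fr-ch", "fr", "en"].
    """
    if host != DEFAULT_HOST:
        return None          # language-specific subdomain: serve it
    if has_nocr_cookie:
        return None          # visitor used the picker: stay on English
    for tag in preferred:
        primary = tag.split("-", 1)[0]
        if primary == "en":
            return None      # English outranks any translation we have
        if tag in TRANSLATIONS:
            return f"{tag}.{DEFAULT_HOST}"
        if primary in TRANSLATIONS:
            return f"{primary}.{DEFAULT_HOST}"
    return None              # nothing matched: fall back to English
```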
I thought for a pretty long time about this and think it is a good compromise between providing the best version for our users and not being annoying/guessing too much. In the end, more than 50% of our users now are shown internationalized versions of our website, which is a very good number in my opinion.
They do make sense for many users, and they are the closest you can find to a proper graphical representation of languages. When I add a language that I know to be official in several countries, I look at my analytics to see where most users come from and use the flag from their country. I can't remember a time where it did not also match the country with the most speakers.
It's a common enough practice that most people usually know what it means, but there's a reason you don't see flags on Wikipedia, Facebook, or Youtube. Languages are spoken in many countries, and countries are multilingual. There are quite a few articles around the web on this topic, but that's basically what they boil down to: languages are not countries. Some users may be confused or offended that their flag is not represented.
And as a Canadian I find it generally a little weird that the Canadian flag often means Canadian French, and I have to click the US flag to get English (which is of course a slightly different English than Canadian English which is probably unavailable).
I guess it's something like "language most unique to that country", no but that's not right either... I don't know.
Unless you have different pricing per country or something orthogonal to language, I'm sure that a speaker of Canadian French can figure out that clicking the French flag may help them understand this page better. It's a common enough idiom on the web.
I think in the case of more than one country per language, you're right, just picking a big and/or well-known country as "representative" is fine: French flag for French, US or UK flag for English, German flag for German.
The bigger problem is the other situation, of more than one language per country. India has ~13 languages with >10m native speakers, and using the Indian flag for all of them would be pretty confusing. You could pick state flags (e.g. the flag of Gujarat to represent Gujarati), but that can be a politically tricky issue. In some cases choosing a representative flag for a language has even stronger political overtones, like using the flag of the Kurdistan independence movement to represent the Kurdish language. Plus, it's not always that clear which flag to pick, and user recognition may not be as high as in the French-flag-for-French case.
Google is^H^Hwas* really bloody annoying when it comes to this. English (en-us and en) is the only language in my accept header. When I lived in Geneva though, Google always used to serve me pages in German (presumably Swiss-German). Gee, that's logical. (Geneva is a mainly French speaking city, though over 40% of the population is non-Swiss.)
Where I live now is another French speaking area. I just checked and it seems they are no longer serving French pages to me. But they were even just one year ago. (I don't use Google by default, so I don't know when they changed.)
Admittedly, that was an issue with geo-detecting rather than the website having bad language detection.
* They seemed to have stopped.
Air France is (though they have many faults) actually alright at detecting my language. And mostly gives me English pages...
My accept-language header only has en_GB and en in it. Google still randomly serves me pages in Swedish and German (both of which I speak, but both of which I explicitly disabled in my Google account settings).
The best case of this was when they launched the preview for the new Google Maps version - there was a landing page with some information and a button in the middle. This page was served to me in three languages at the same time (the header, the button and the info text) - presumably served by different internal components that all handle languages differently.
I always have to open google.com/ncr in a separate tab which sets a session cookie (I don't accept permanent cookies from google.)
I guess they've changed their logic for some places, just not where I am :(
Google has become worse when it comes to language detection. I often get Brazilian Portuguese on Google Analytics, even though I'm in Denmark. I believe one of my co-workers often got Russian.
IIRC google also "detects" language based on DNS geolocation, i.e. doing a dig google.com may reveal different IP addresses in every country (depending on the language).
There are some IP addresses which, when viewed "raw" like http://aaa.bbb.ccc.ddd/ will return a localized Google.
Google is a Royal Pain in the Ass on this point. They completely disregard any request configuration and decide on output language based on IP geolocation (which is pretty much always Not What I Want, even more so in multilingual countries such as Belgium or Switzerland[0]), then Chrome "helpfully" suggests translating documents.
[0] where it won't even send you something matching your actual geographical location's language, usually sending the country's most common language — dutch in Belgium and german in Switzerland
I live in Switzerland and google does follow my accept-languages (en-US,en;q=0.8,fr;q=0.6,de;q=0.4). When going to google.com in incognito I get google.ch in english which is what I asked.
I don't get that behavior in Firefox. I have 'en-US' and 'en' as my preferred languages (in that order), set via Preferences->Content->Languages. But when I go to google.com in incognito, I get google.dk in Danish.
I guess English is preferred here commonly enough that it's at least listed as one of the two alternate google.dk languages in an easy-to-find place under the search box, along with Faroese. "Google.dk på: Føroyskt English". And if you click "English" it stays with it for the session. No luck if you wanted something else like German, though.
This is what so many websites do now. It causes me constant aggravation. It's nothing to do with your browser settings. They infer your language from your IP address, and for most cases that IS the right thing to do. However for me it isn't.
I really really wish there was a way to configure your browser to force websites to accept your language settings.
The only other option is to enable cookies so that the website language choice is saved - which also invites countless tracking cookies which I do NOT want.
Your web site does NOT know better than me which language I want to read.
Very minor point: the right thing to do is to not infer the language, especially not from the IP, if no explicit information is available. You should make the user choose or take him/her to the default language (there should be an obvious way to change the language from there anyway).
If you have multiple languages, hopefully you already have a scheme to differentiate the language (e.g. Wikipedia has the language in the URL). If the user went to a specific language URL you should ignore the other settings.
If he/she didn't go directly to a specific language, it's fair to assume he/she is in a non standard situation or is OK with the defaults, and applying heuristics doesn't help.
Would assuming that the default language is English be valid? I know a large percentage of the Internet probably doesn't "know English", but if they can connect to the Internet, would they at least recognise enough words (like "language") that they can choose a different language?
I think it could be anything that makes sense (the default for a German company could be German, for instance) as long as switching away is smooth and discoverable. People not familiar with the English alphabet, for instance, could be lost in the site, overwhelmed by the unknown information, even if they know the word "english" or "language". For people in that situation, the page could just as well be in French; it wouldn't make much difference.
As a visual marker for language switching I imagined having a flag, but looking at the replies, that seems non optimal.
The best behavior could be a popup shown only to users whose accepted languages don't match the current language, and keep the choice in a cookie perhaps?
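As a rough sketch of that condition (hypothetical names, comparing primary languages only so that "en-US" counts as a match for an "en" page):

```python
from typing import List, Optional


def should_offer_switch(page_lang: str, accepted: List[str],
                        saved_choice: Optional[str]) -> bool:
    """Decide whether to show the "switch language?" popup.

    `accepted` holds the tags from the visitor's Accept-Language header;
    `saved_choice` is the language stored in the cookie, if any.
    """
    if saved_choice is not None:
        return False             # the visitor already chose: never nag again
    if not accepted:
        return False             # no header, nothing to compare against
    page_primary = page_lang.lower().split("-", 1)[0]
    return all(tag.lower().split("-", 1)[0] != page_primary
               for tag in accepted)
```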
What makes you think that? If their computer came with an OS preinstalled, with Mongolian, for example, as the language, they would not have to ever see any English anywhere. A flag might be more universally recognized way of selecting the right language.
The flag doesn't work for the billion people of India. We are massively multilingual; we have 22 official languages. Google.co.in is offered in 9 of those 22 languages -- each one with its own unique script, which are mutually un-intelligible. One Indian flag for all languages? No, thanks. Look at the confusion caused by the Metropolitan Police website [1].
If you can afford two pop-up lists, the approach of Lufthansa is the best [2]. If you want just one list, follow the installation screen of Ubuntu -- they write out the name of the language in its own script.
You're preaching to the choir, I'm in the suffering minority; that's my problem exactly. I was just pre-empting the usual replies. Every time the subject comes up, people always respond with "Most people can't configure their browsers correctly" and "these websites do extensive testing and for most visitors they are right".
I have blogged about this several times. Google are one of the worst offenders. I'm not sure if it is insular non-travelled US developers with a deep love of IP-to-geolocation databases, or an anally retentive legal department, but it really sucks as a user experience.
From an advertising perspective this is a major market that is being overlooked, because guess what, I don't look at ads that much, but you can bet your bottom advertising dollar that I'm definitely not going to read it in a language that isn't my mother tongue.
IP address != language preference
It is about time that developers got that through their thick skulls.
Finally, over here in Europe we can live in whichever EU country we want to. This means that we can move countries easily. I've already been in four of them. I don't think I'm an edge case by any means. People migrate.
I'm working on locale stuff for a Rails app right now (just updated the i18n_data in fact).
The assumption will be that country is mostly orthogonal to language b/c people are übermobile. Further, that the dialect of the language should not force assumptions of other preferences... only autodiscover initial settings as close to desired as possible. (Fuck, why isn't there a standard for this common, hard-to-manage shit the OS already knows.)
i18n is taking up tons of time to get (mostly) right, but I believe it's one of those things not to botch because it's such a huge signal to everything else about your app.
If I want to be the most obscure hipster paying in Lesotho Loti, read Catalonian, have a "," for thousands separator and use UTC tz, by Flying Spaghetti Monster that's what it's gonna allow.
Those sites are not relying on accept-language, but rather on IP geolocation to select the default language. I sometimes use a non-US proxy when I'm feeling vigilant, and Google always uses the IP of the proxy to determine what language to serve me (even though my browser accept-language hasn't changed).
I'm running an English version of Windows 7 in Sweden. The accept-language headers in each of my installed browsers are:
* IE9: sv-SE
* Firefox: en,sv;q=0.5
* Chrome: en-US,en;q=0.8,sv;q=0.6
I'm going to go ahead and suggest that the reason English comes before Swedish is due to my system language, and that Swedish otherwise would come first. The "users will have the wrong settings" argument seems moot to me.
I know people who have their OS in English even though they're not fluent in English - they can't be bothered to reinstall the OS that came with the laptop (or a cracked torrent), and understanding a few words like "ok/cancel" is enough.
To avoid such problems in Rack/Rails Ruby project I suggest the rack-i18n_best_langs gem (regardless of the name, it does not depend on the i18n gem) I wrote:
https://github.com/gioele/rack-i18n_best_langs
> Differently from other similar Rack middleware components, rack-i18n_best_langs returns a list of languages in order of guessed importance, not a single language.
> Language discovery is done using three clues:
> * the presences of language tags in paths (e.g. /service/warranty/ita),
> * the content of the HTTP Accept-Language header,
> * the content of the rack.i18n_best_langs cookie when set.
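To show the idea of returning an ordered list instead of a single language, here is a language-agnostic sketch in Python. It is not the gem's API or its actual ranking: the function name and the precedence (path, then cookie, then header) are assumptions made for the example.

```python
from typing import Iterable, List


def best_langs(path_langs: Iterable[str],
               cookie_langs: Iterable[str],
               header_langs: Iterable[str]) -> List[str]:
    """Merge language clues into one ordered, de-duplicated list.

    Assumed precedence (not taken from the gem): path first, then
    cookie, then Accept-Language header.
    """
    ordered: List[str] = []
    for source in (path_langs, cookie_langs, header_langs):
        for lang in source:
            if lang not in ordered:
                ordered.append(lang)
    return ordered
```

The application can then walk the list and serve the first language it actually has content for.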
Unrelated to the accept-language issue, but somewhat related:
I needed to create a yahoo account, and I registered it selecting the kimo.com domain (kimo.com is a Chinese domain owned by yahoo). Since the first moment I set my language preferences to English.
No matter which yahoo service I'm visiting, I always get welcomed by at least the login prompt in Chinese. I can't really complain, because I was the one who looked for a rare domain, but it's an annoyance for me, because yahoo assumes that I understand Chinese because of the domain.
There is also the problem of getting the original content. I speak 3 languages, and I want to read the original source if it's in one of those languages. I don't want unpaid-intern translation.
MS C# documentation deserves a special kind of hell, because they detected I reverted the language to English, and now they present me a special translation mode of their freaking doc where there are huge tooltip texts everywhere.
I don't know when, might be due to the language of your Windows installation, but sometimes in .NET they even translate exceptions and other error codes to your local language making it impossible to use google for troubleshooting.
It isn't the default for most people. Download a browser and OS localized to German, French, or British English and Accept-Language defaults to that instead of "en-US".
accept-language is like any of 1000 other idealistic parts of Internet specs that has good intentions but is so poorly used (or misused) that almost no-one implements it correctly, instead simply doing the simplest thing that works best for 99% of the audience.