Hacker News
Accept my accept-language (choibean.com)
111 points by hc5 on March 16, 2014 | 73 comments



Pretty much every major site will completely ignore not only your browser's language settings but also onsite user account settings and other desperate attempts at selecting the language. The only thing that matters is your IP address.

For example, accessing Google: my browser is set to accept English only. I'm entering the English URL. In my account settings I periodically reset everything I can find to English (settings apparently decay, too). Google knows I want the English version. Yet, they still give me the interface in whatever language my IP address comes from. And not only the UI, search results as well.

Recently it's gotten even worse than that: Google figured out I'm actually German, so they've started defaulting to German more often now, ignoring everything else. At least the IP address-based routing was impersonal.

I happened to be in Sweden when I linked my Facebook calendar to my Google calendar. Ever since that day, my friends' birthdays are given to me in Swedish. Facebook knows I want English, yet for some reason this is how it's got to be.

The same abuse is apparently considered best practice at new startups as well: recently I was testing a browser game for an acquaintance who's on their development team. Because I was in Portugal at the time, I of course got the site in Portuguese. Manually switching that to English, the game still started up in Portuguese. It's been doing that ever since. Every email I get from that company is in Portuguese, too, even though I tried everything I could to set my language to English.

It's a source of endless frustration, maybe even a hostile act. They're effectively saying "Your choices don't matter, we know what's best for you. You're from country X, so you _must_ speak Xish. People are on the internet to enjoy regional separation. Really, it's best."


Go to http://www.google.com/ncr (once, it sets a cookie) to fix this at least for the search results.


Can anyone provide insight into the business reasoning behind this? I really can't conceive of why you would want to supersede a user's exact, known language with a guess. Sites are pretty difficult to use when you can't read anything. Maybe there are technical issues for some sites, but Google search is the worst I've ever dealt with, and they definitely have some resources behind that.


Mostly it's because people don't know how to change that setting. Imagine walking up to a computer in a shared space (hotel lobby, library, etc.) and it's been configured to send out accept-language: <something you don't understand>.

Many of the people reading Hacker News will be able to find and change that setting. My mom never will. She'll just know she went to google.com, and saw Chinese.

If you're using a computer in a country, and websites seem to be showing you things in the language of that country, that's something you can probably understand. If you're using a computer and some websites insist on showing you some other language, you'll be confused.


It's true that most people leave the defaults. However, there may be an easy solution for a subset of cases.

Do the browser defaults in any country include multiple language settings? It seems likely that in most countries, the default would be only one language. And if this is the case, then if multiple alternatives are present in the request headers, it's very likely the user or computer admin has deliberately changed it, and that in turn would mean that sites should respect the choices.

This might still be wrong in cases where the settings were made by someone other than the current user, or where multiple languages are configured by default, but it might be a step in the right direction.
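A minimal Ruby sketch of that heuristic, assuming the raw header string is available on the server. The method name `deliberate_language_choice?` is made up for illustration, and since some browser defaults already list several languages, a `true` result is a hint, not proof.

```ruby
# Hypothetical heuristic: if the header names more than one distinct primary
# language (ignoring regions and q-values), someone probably edited it, so
# treat it as a deliberate choice. A default like "en-US,en;q=0.8" names only
# one primary language ("en") and returns false.
def deliberate_language_choice?(accept_language)
  primaries = accept_language.to_s.split(",").map do |part|
    # "en-US;q=0.8" -> "en"; empty entries become nil
    part.split(";").first.to_s.strip.downcase.split("-").first
  end
  primaries.compact.reject { |lang| lang == "*" }.uniq.size > 1
end

deliberate_language_choice?("en-US,en;q=0.8")          # => false
deliberate_language_choice?("en-US,en;q=0.8,ko;q=0.6") # => true
```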


> Imagine walking up to a computer in a shared space (hotel lobby, library, etc.) and it's been configured to send out accept-language: <something you don't understand>.

If web browsers could somehow figure out they were running either under a guest/public-use account, or in a kiosk mode, they could avoid sending an Accept-Language header at all. Then,

1. in cases where the header is sent, it would mean a lot more (and hopefully override both online-profile-stored and IP-detection-based answers);

and 2. in cases where the header isn't sent, using an answer from an online profile setting or IP detection would no longer go against the standard.


While that excuses ignoring the Accept-Language header, it doesn't make sense for overriding the user's explicit configuration in their profile; you wouldn't expect that to be shared.

That, and I'd expect public computers to disallow that sort of configuration anyway, so it would be stuck at the default value, which should be sensible for the location it's set up in; it's not like it would move around...


Could it be for legal reasons? I know different countries have different agreements with, or restrictions on, Google regarding search results.


Well, the AdWords value of your visit is considerably higher in the primary language of your location.


And don't forget that no matter what your IP, no matter what your language setting, no matter what country you have set, Google Maps will still default you to showing a view of the continental United States. You're in Tokyo, searching for Yokohama? Here's Yokohama Sushi in downtown Los Angeles. For a while, the new web Google Maps redesign even removed the HTML5 Location API "my position" button. It's back now, but it should be defaulting to showing your location, like the mobile apps do.


The web version is actually localized by domain.

http://maps.google.com is the US-local Google Maps; http://maps.google.ca defaults to Canada; http://maps.google.co.jp sends you to Japan; etc.


Huh, I didn't know that. But why geoIP-based redirects on some sites and not others? google.com redirects me to google.dk, and even blogspot.com redirects me to blogspot.dk, but maps.google.com doesn't take me to maps.google.dk.


I believe the logic was that "google.com" always does the redirect, because people tend to type google.com manually into their address bar a lot. Google has never bothered to set up any other redirects itself.

Other services that Google has acquired, though (e.g. Blogger, Youtube, etc.) may have come pre-set-up with redirection logic, and Google has mostly left that untouched.


Blogspot changed their behavior long after Google bought them.


You probably want something like "en-US,en;q=0.9,ko;q=0.8". (Note the addition of "en" between "en-US" and "ko".) Some quick testing with Firefox, which lets you directly alter the Accept-Language: header in requests in about:config, shows that fedoraproject.org has "en" versions of resources but not "en-US" versions. Since your Accept-Language: header only lists "en-US", "ko" ends up being selected.

EDIT: I just noticed your guesses at the bottom of the post. Your second guess is correct. See §14.4 of RFC 2616:

As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
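For reference, a minimal Ruby sketch of how a server might read the header. It shows the rule behind the first guess: an entry with no q-value has quality 1, not 0. The helper name `parse_accept_language` is made up for illustration, and it skips details such as special handling of the `*` wildcard.

```ruby
# Hypothetical helper (not a library API): turn an Accept-Language header
# into [[tag, q], ...], best first. A missing q-value means q=1.0, and
# tags with q=0 are dropped because they mean "not acceptable".
def parse_accept_language(header)
  entries = header.to_s.split(",").each_with_index.map do |part, index|
    tag, *params = part.strip.split(";").map(&:strip)
    next nil if tag.nil? || tag.empty?

    q = 1.0 # default quality when no q parameter is given
    params.each do |param|
      key, value = param.split("=", 2)
      next unless key.to_s.strip.downcase == "q"
      q = Float(value.to_s.strip, exception: false) || 0.0
    end
    [tag.downcase, q, index]
  end

  entries.compact
         .reject { |_tag, q, _index| q <= 0 }
         .sort_by { |_tag, q, index| [-q, index] } # ties keep header order
         .map { |tag, q, _index| [tag, q] }
end

parse_accept_language("en-US,en;q=0.8,ko;q=0.6")
# => [["en-us", 1.0], ["en", 0.8], ["ko", 0.6]]
```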


I just fixed dolphin-emu.org, this was a bug in our code that would not detect en-US as being "compatible" with en. See https://github.com/dolphin-emu/www/commit/ddef974c6f601bc2db...

As a French guy living in the German part of Switzerland with Accept-Language configured to get English content, I'm kind of ashamed to have that kind of bug in my language detection code. I'm always complaining about other websites' language detection; looks like I should have looked at my own code first!
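The matching behind that kind of fix can be sketched as lookup with subtag truncation, roughly the "Lookup" scheme of RFC 4647: take each requested tag in order, and if "en-us" isn't offered, try "en" before moving on to the next language. This is a simplified sketch, not the dolphin-emu code; `lookup_language` and its `default:` parameter are hypothetical, and the input is assumed to be sorted by q-value already.

```ruby
# Hypothetical helper: lookup with truncation over an already-sorted list
# of requested tags (highest preference first).
def lookup_language(preferred_tags, available, default: "en")
  offered = available.map(&:downcase)
  preferred_tags.each do |tag|
    subtags = tag.downcase.split("-")
    # Try "en-us", then "en": drop trailing subtags until something matches.
    until subtags.empty?
      candidate = subtags.join("-")
      return candidate if offered.include?(candidate)
      subtags.pop
    end
  end
  default
end

lookup_language(["en-us", "ko"], %w[en fr ko]) # => "en"
```

Blind truncation has a known downside: if a site's plain "zh" pages are Simplified Chinese, a visitor who only sent zh-hk lands on them.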


Thanks! There is a corollary to this that would have prevented all this - when I went back in the Chrome settings and set the settings to the same order, it reset my header to this: "en-US,en;q=0.8,ko;q=0.6" - which makes things work for all sites again. I haven't touched my language settings since ~2012, so it's possible Chrome "fixed" this a while back, but didn't change my existing settings.


Please don't do this... I detest sites that try to be clever and serve me Simplified Chinese even though I only have zh-hk in Accept-Language:.


I already have exceptions for things like that. I think our code handles zh_{CN,TW,HK} separately, as well as things like pt_BR vs. pt.

    > curl -I -H 'Accept-Language: zh-hk,en;q=0.8' https://dolphin-emu.org/
    HTTP/1.1 200 OK  # No zh_HK translation (yet!)

    > curl -I -H 'Accept-Language: zh-cn,en;q=0.8' https://dolphin-emu.org/
    HTTP/1.1 302 FOUND
    Location: http://cn.dolphin-emu.org/?cr=cn

    > curl -I -H 'Accept-Language: pt,en;q=0.8' https://dolphin-emu.org/
    HTTP/1.1 200 OK  # No pt translation (yet!)

    > curl -I -H 'Accept-Language: pt-br,en;q=0.8' https://dolphin-emu.org/
    HTTP/1.1 302 FOUND
    Location: http://br.dolphin-emu.org/?cr=br

i18n is hard, but I think I've been doing a fairly good job on it. Proud to have more than 50% of our visitors from outside of the US!
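A sketch of that kind of exception handling, assuming a hypothetical `VARIANTS` table mapping tags to site subdomains. Regional tags like zh-cn and pt-br get their own sites, while bare "zh" and "pt" are never used as truncation targets, so zh-hk and plain pt stay on the default English site, as in the curl output. This is an illustration, not the dolphin-emu implementation.

```ruby
# Hypothetical mapping from language tags to translated site subdomains.
VARIANTS = {
  "zh-cn" => "cn",
  "pt-br" => "br",
  "fr"    => "fr"
}.freeze

# Languages whose regional tags must never fall back to the bare tag.
NO_TRUNCATION = %w[zh pt].freeze

# preferred_tags is assumed to be sorted by preference already.
# Returns a subdomain, or nil for "serve the default English site".
def site_variant(preferred_tags)
  preferred_tags.each do |tag|
    tag = tag.downcase
    return VARIANTS[tag] if VARIANTS.key?(tag)

    primary = tag.split("-").first
    next if NO_TRUNCATION.include?(primary)
    return VARIANTS[primary] if VARIANTS.key?(primary)
  end
  nil
end

site_variant(["zh-hk", "en"]) # => nil (no redirect)
site_variant(["pt-br", "en"]) # => "br"
```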


Having now read the full code and not just the diff, I have to say it looks pretty good. I note that plain "zh" is not redirected to the cn site. ;) Whether it should or not is debatable though -- I actually think ignoring "zh" altogether is a rather prudent move if it is intentional.


Language choices are a mess. There can easily (and often) be conflicting data based on:

- accept-language header

- URL that includes language/region codes as a subdomain or part of the path

- language preferences set in a cookie or account

- IP region detection

In the end, any website is trying to provide the right language most often for their users, and there are no easy answers. When I access webmail from an Internet cafe in China, I don't want the interface popping up in Chinese just because the browser's accept-language is configured for Chinese. Fortunately, it doesn't.

Most web users have never even heard of accept-language, it's just automatically configured by whatever language their browser was installed in, which isn't always the language you want to be browsing in. (E.g. you bought your laptop overseas because it was cheaper, so it runs in English instead of your own language.) It's not a surprise that IP address detection provides the best default experience most of the time, which can then be overridden by URL or user choice, and that accept-language is fairly irrelevant.
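The four signals listed above can be combined with a fixed precedence. This Ruby sketch assumes one particular order (URL, then a saved cookie/account preference, then Accept-Language, then IP geolocation); the comment above argues IP should outrank the header, which would just mean moving one line. `LanguageSignals`, `SUPPORTED` and `resolve_language` are hypothetical names, and only primary language subtags are compared.

```ruby
# Hypothetical request context; the field names are illustrative only.
LanguageSignals = Struct.new(:url_lang, :saved_lang, :header_langs, :ip_lang,
                             keyword_init: true)

SUPPORTED = %w[en fr de sv].freeze # hypothetical list of site languages

def resolve_language(signals, default: "en")
  candidates = [
    signals.url_lang,             # 1. language code in URL or subdomain
    signals.saved_lang,           # 2. cookie or account setting
    *Array(signals.header_langs), # 3. Accept-Language tags, best first
    signals.ip_lang               # 4. guess from IP geolocation
  ]
  candidates.compact
            .map { |tag| tag.downcase.split("-").first }
            .find { |lang| SUPPORTED.include?(lang) } || default
end

resolve_language(LanguageSignals.new(header_langs: ["en-US"], ip_lang: "sv"))
# => "en"
```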


What we've done for dolphin-emu.org:

* In all cases, a fairly visible language picker is displayed at the top of the page, with internationalized language names.

* If someone goes to a language-specific subdomain (fr.dolphin-emu.org, cy.dolphin-emu.org, ast.dolphin-emu.org, ...), they get this version.

* If someone goes to the generic/English dolphin-emu.org, the system checks whether the user has a "nocr" cookie. If so, they get the English website. Otherwise, they get redirected based on their Accept-Language.

* If a user uses the language picker, we assume they know what they want and set the "nocr" cookie to disable redirections in the future.

* When the user gets redirected from the standard/English version to an internationalized version, a message is shown in English saying that they have been redirected based on their browser preferences, with a link to go back to the English version (and set the "nocr" cookie).

I thought for a pretty long time about this and think it is a good compromise between providing the best version for our users and not being annoying/guessing too much. In the end, more than 50% of our users now are shown internationalized versions of our website, which is a very good number in my opinion.
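The rules listed above fit in a single routing function. This is a sketch, not the actual dolphin-emu.org code: `route_request`, the `TRANSLATED` list and the return shape are assumptions, and the Accept-Language tags are assumed to be sorted by q-value already.

```ruby
# Hypothetical list of translated subdomains (e.g. fr.dolphin-emu.org).
TRANSLATED = %w[fr de it ja].freeze

# Returns [:serve, lang] or [:redirect, host].
def route_request(host:, cookies:, accept_langs:)
  subdomain = host.split(".").first

  # 1. A language-specific subdomain always wins.
  return [:serve, subdomain] if TRANSLATED.include?(subdomain)

  # 2. On the generic site, the "nocr" cookie disables redirection.
  return [:serve, "en"] if cookies.key?("nocr")

  # 3. Otherwise redirect to the first preferred language we have,
  #    unless English comes first.
  match = accept_langs.map { |tag| tag.downcase.split("-").first }
                      .find { |lang| lang == "en" || TRANSLATED.include?(lang) }
  return [:serve, "en"] if match.nil? || match == "en"

  [:redirect, "#{match}.dolphin-emu.org"]
end
```

The language picker and the "back to English" link would then only need to set the `nocr` cookie; rendering the redirect notice is left out here.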


This seems like a pretty good solution, except that your language picker includes country flags, which don't make sense for many users.


They do make sense for many users, and they are the closest you can find to a proper graphical representation of languages. When I add a language that I know to be official in several countries, I look at my analytics to see where most users come from and use the flag from their country. I can't remember a time where it did not also match the country with the most speakers.


It's a common enough practice that most people usually know what it means, but there's a reason you don't see flags on Wikipedia, Facebook, or Youtube. Languages are spoken in many countries, and countries are multilingual. There are quite a few articles around the web on this topic, but that's basically what they boil down to: languages are not countries. Some users may be confused or offended that their flag is not represented.


And as a Canadian I find it generally a little weird that the Canadian flag often means Canadian French, and I have to click the US flag to get English (which is of course a slightly different English than Canadian English which is probably unavailable).

I guess it's something like "the language most unique to that country"... no, that's not right either. I don't know.


Unless you have different pricing per country or something orthogonal to language, I'm sure that a speaker of Canadian French can figure out that clicking the French flag may help them understand this page better. It's a common enough idiom on the web.


I think in the case of more than one country per language, you're right, just picking a big and/or well-known country as "representative" is fine: French flag for French, US or UK flag for English, German flag for German.

The bigger problem is the other situation, of more than one language per country. India has ~13 languages with >10m native speakers, and using the Indian flag for all of them would be pretty confusing. You could pick state flags (e.g. the flag of Gujarat to represent Gujarati), but that can be a politically tricky issue. In some cases choosing a representative flag for a language has even stronger political overtones, like using the flag of the Kurdistan independence movement to represent the Kurdish language. Plus, it's not always that clear which flag to pick, and user recognition may not be as high as in the French-flag-for-French case.


Google is^H^Hwas* really bloody annoying when it comes to this. English (en-us and en) is the only language in my accept header. When I lived in Geneva though, Google always used to serve me pages in German (presumably Swiss-German). Gee, that's logical. (Geneva is a mainly French speaking city, though over 40% of the population is non-Swiss.)

Where I live now is another French speaking area. I just checked and it seems they are no longer serving French pages to me. But they were even just one year ago. (I don't use Google by default, so I don't know when they changed.)

Admittedly, that was an issue with geo-detecting rather than the website having bad language detection.

* They seemed to have stopped.

Air France is (though they have many faults) actually alright at detecting my language. And mostly gives me English pages...


My accept-language header only has en-GB and en in it. Google still randomly serves me pages in Swedish and German (both languages I speak, but both of which I explicitly disabled in my Google account settings).

The best case of this was when they launched the preview for the new Google Maps version - there was a landing page with some information and a button in the middle. This page was served to me in three languages at the same time (the header, the button and the info text) - presumably served by different internal components that all handle languages differently.


Worse: Google went through a period of normally serving me French-language pages.

I'm not in a French-speaking country. I don't have French in my accept header. I never expressed any preference towards the French language.

But, my ISP was Orange (France Telecom) and I had a variable IP from them.


I always have to open google.com/ncr in a separate tab which sets a session cookie (I don't accept permanent cookies from google.) I guess they've changed their logic for some places, just not where I am :(


Google has become worse when it comes to language detection. I often get Brazilian Portuguese in Google Analytics, and I'm in Denmark. I believe one of my co-workers often got Russian.


IIRC google also "detects" language based on DNS geolocation, i.e. doing a dig google.com may reveal different IP addresses in every country (depending on the language).

There are some IP addresses which, when viewed "raw" like http://aaa.bbb.ccc.ddd/ will return a localized Google.


My accept header only contains en-US and en. I tend to get served German content (and Google's especially bad about this).

Please, I hope someone hears your complaints and starts fixing things. That issue is highly annoying.


> and Google's especially bad about this

Google is a Royal Pain in the Ass on this point. They completely disregard any request configuration and decide on output language based on IP geolocation (which is pretty much always Not What I Want, even more so in multilingual countries such as Belgium or Switzerland[0]), then Chrome "helpfully" suggests translating documents.

[0] where it won't even send you something matching your actual geographical location's language, usually sending the country's most common language: Dutch in Belgium and German in Switzerland


I live in Switzerland and google does follow my accept-languages (en-US,en;q=0.8,fr;q=0.6,de;q=0.4). When going to google.com in incognito I get google.ch in english which is what I asked.


I don't get that behavior in Firefox. I have 'en-US' and 'en' as my preferred languages (in that order), set via Preferences->Content->Languages. But when I go to google.com in incognito, I get google.dk in Danish.

I guess English is preferred here commonly enough that it's at least listed as one of the two alternate google.dk languages in an easy-to-find place under the search box, along with Faroese. "Google.dk på: Føroyskt English". And if you click "English" it stays with it for the session. No luck if you wanted something else like German, though.


Ok, I think the trick is to have at least one more in addition to en-US+en (I have a couple more).


This gets very fun once you start using a vpn from another country.


Indeed, and sometimes strange things happen:

ssh tunnel from Australia to server located in California. Geo IP tool [1] reports it's in Fremont, CA. But Google assigns Taiwanese locale..

Outdated/erroneous geolocation database? or did I take a wrong turn somewhere ;-)

[1] http://www.geoiptool.com


This. I travel a lot, I've had this happen too often. Sometimes they even lock payment methods based on where you are, it's horrible.


> 1. the default quality value is being parsed wrong, and English is being assigned q=0 instead of q=1 or

> 2. en-US doesn't match en and is being bypassed

Or: 3. They are simply checking your IP address and not looking at your header at all.


This is what so many websites do now. It causes me constant aggravation. It's nothing to do with your browser settings. They infer your language from your IP address, and for most cases that IS the right thing to do. However for me it isn't. I really really wish there was a way to configure your browser to force websites to accept your language settings.

The only other option is to enable cookies so that the website language choice is saved - which also invites countless tracking cookies which I do NOT want.

Your web site does NOT know better than me which language I want to read.


Very minor point: the right thing to do is to not infer the language, especially not from the IP, if no explicit information is available. You should make the user choose, or take him/her to the default language (there should be an obvious way to change the language from there anyway).

If you have multiple languages, hopefully you already have a scheme to differentiate them (e.g. Wikipedia has the language in the URL). If the user went to a specific language URL, you should ignore the other settings.

If he/she didn't go directly to a specific language, it's fair to assume he/she is in a non standard situation or is OK with the defaults, and applying heuristics doesn't help.


Would assuming that the default language is English be valid? I know a large percentage of Internet users probably don't "know English", but if they can connect to the Internet, would they at least recognise enough words (like "language") that they can choose a different language?


I think it could be anything that makes sense (the default for a German company could be German, for instance) as long as switching away is smooth and discoverable. People not familiar with the English alphabet, for instance, could be lost on the site and overwhelmed by the unknown information, even if they know the word "English" or "language". For people in that situation, the page could just as well be in French; it wouldn't make much difference.

As a visual marker for language switching I imagined having a flag, but looking at the replies, that seems non optimal.

The best behavior could be a popup shown only to users whose accepted languages don't match the current language, with the choice kept in a cookie, perhaps?


What makes you think that? If their computer came with an OS preinstalled, with Mongolian, for example, as the language, they would not have to ever see any English anywhere. A flag might be more universally recognized way of selecting the right language.


The flag doesn't work for the billion people of India. We are massively multilingual; we have 22 official languages. Google.co.in is offered in 9 of those 22 languages -- each one with its own unique script, and they are mutually unintelligible. One Indian flag for all languages? No, thanks. Look at the confusion caused by the Metropolitan Police website [1].

If you can afford two pop-up lists, the approach of Lufthansa is the best [2]. If you want just one list, follow the installation screen of Ubuntu -- they write out the name of the language in its own script.

[1]: http://flagsarenotlanguages.com/blog/2011/09/the-metropolita...

[2]: http://www.lufthansa.com/de/en/pre-homepage?command=cc&lang=...



Or just have each language option written in that language, so that speakers who only know their native language can pick it out.


> They infer your language from your IP address, and for most cases that IS the right thing to do.

If IP and accept-language don't match, why not make a prominent button (in the language they didn't pick) to allow you to quickly change?


This is the best option.

"It looks like you are in Japan, but your computer's language is set to English. Which language would you prefer? [English][日本語]"
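That prompt can be sketched as a small predicate, assuming the server already has the Accept-Language tags and a language guessed from the IP's country. The `show_language_prompt?` name and the `lang_choice` cookie are hypothetical.

```ruby
# Hypothetical check: offer a one-time "which language?" prompt only when the
# IP-based guess disagrees with every language the browser accepts. Once the
# user answers (remembered in the assumed "lang_choice" cookie), never ask again.
def show_language_prompt?(accept_langs, ip_country_lang, cookies)
  return false if cookies.key?("lang_choice") # user already chose
  return false if ip_country_lang.nil?        # no geolocation guess

  accepted = accept_langs.map { |tag| tag.downcase.split("-").first }
  !accepted.include?(ip_country_lang.downcase)
end

show_language_prompt?(["en-US", "en"], "ja", {}) # => true: ask
show_language_prompt?(["ja", "en"], "ja", {})    # => false: no conflict
```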


> and for most cases that IS the right thing to do.

The problem is that for a large minority of people this is absolutely catastrophic. Think of the Western business traveller going to Japan or China...


You're preaching to the choir; I'm in the suffering minority, and that is exactly my problem. I was just pre-empting the usual replies. Every time the subject comes up, people always respond with "Most people can't configure their browsers correctly" and "these websites do extensive testing and for most visitors they are right".


Not just travelers. What content do you serve to people with an IP from Belgium? Switzerland? Canada?


His location is listed as California, so that wouldn't make much sense.

The current top poster actually figured out the exact issue in this case.


I have blogged about this several times. Google are one of the worst offenders. I'm not sure if it is insular non-travelled US developers with a deep love of IP-to-geolocation databases, or an anally retentive legal department, but it really sucks as a user experience.

From an advertising perspective this is a major market that is being overlooked, because guess what, I don't look at ads that much, but you can bet your bottom advertising dollar that I'm definitely not going to read it in a language that isn't my mother tongue.

IP address != language preference

It is about time that developers got that through their thick skulls.

Finally, over here in Europe we can live in whichever EU country we want to. This means that we can move countries easily. I've already been in four of them. I don't think I'm an edge case by any means. People migrate.


I'm working on locale stuff for a Rails app right now (just updated the i18n_data in fact).

The assumption will be that country is mostly orthogonal to language b/c people are übermobile. Further, that the dialect of the language should not force assumptions of other preferences... only autodiscover initial settings as close to desired as possible. (Fuck, why isn't there a standard for this common, hard-to-manage shit the OS already knows.)

i18n is taking up tons of time to get (mostly) right, but I believe it's one of those things not to botch because it's such a huge signal to everything else about your app.

If I want to be the most obscure hipster paying in Lesotho Loti, read Catalonian, have a "," for thousands separator and use UTC tz, by Flying Spaghetti Monster that's what it's gonna allow.



Interesting, thanks.

Current Gemfile:

  # ...

  # i18n
  gem 'rails_locale_detection' # consider locale_setter
  gem 'rails-i18n', github: 'steakknife/rails-i18n'
  gem 'i18n_data', github: 'steakknife/i18n_data'
  gem 'countries_and_languages', require: 'countries_and_languages/rails'
  gem 'country_select' # for simple_form

  # tz
  gem 'tzinfo-data', '>= 1.2014.1'
  gem 'tzinfo'

  # symbols and images
  gem 'svg-flags-rails'

  # idn
  gem 'resolv-idn'   # resolv unicode patch
  gem 'idn-ruby'     # unicode IDNA domain resolution
  # ...

  # ...


Those sites are not relying on accept-language, but rather on IP geolocation to select the default language. I sometimes use a non-US proxy when I'm feeling vigilant, and Google always uses the IP of the proxy to determine what language to serve me (even though my browser accept-language hasn't changed).


I'm running an English version of Windows 7 in Sweden. The accept-language headers in each of my installed browsers are:

* IE9: sv-SE

* Firefox: en,sv;q=0.5

* Chrome: en-US,en;q=0.8,sv;q=0.6

I'm going to go ahead and suggest that the reason English comes before Swedish is due to my system language, and that Swedish otherwise would come first. The "users will have the wrong settings" argument seems moot to me.


I know people who have their OS in English even though they're not fluent in English - they can't be bothered to reinstall OS that came with the laptop (or cracked torrent) and understanding a few words like "ok/cancel" is enough.


To avoid such problems in Rack/Rails Ruby project I suggest the rack-i18n_best_langs gem (regardless of the name, it does not depend on the i18n gem) I wrote:

    https://github.com/gioele/rack-i18n_best_langs

> Differently from other similar Rack middleware components, rack-i18n_best_langs returns a list of languages in order of guessed importance, not a single language.

> Language discovery is done using three clues:

> * the presence of language tags in paths (e.g. /service/warranty/ita),

> * the content of the HTTP Accept-Language header,

> * the content of the rack.i18n_best_langs cookie when set.


Not about the accept-language header itself, but somewhat related:

I needed to create a Yahoo account, and I registered it with the kimo.com domain (kimo.com is a Chinese domain owned by Yahoo). From the very beginning I set my language preference to English.

No matter which Yahoo service I'm visiting, I always get welcomed by at least a login prompt in Chinese. I can't really complain, because I was the one who went looking for a rare domain, but it's still an annoyance: Yahoo assumes I understand Chinese because of the domain.


There is also the problem of getting the original content. I speak 3 languages, and I prefer to read the original source if it's in one of those. I don't want an unpaid-intern translation.

MS C# documentation deserves a special kind of hell: they detected that I switched the language back to English, and now they present me with a special translation mode of their freaking docs, with huge tooltip texts everywhere.


I don't know exactly when it happens (it might depend on the language of your Windows installation), but sometimes .NET even translates exceptions and other error messages into your local language, making it impossible to use Google for troubleshooting.


Yeah, translated error messages are a pain to google. Sometimes we can get away with error codes.

(I'm using mono on a mac, and I'm not really doing important stuff).


Sorry, when I implement language negotiation I interpret "en-US" as lack of preference.

The problem is that en-US is the default, and I can't tell the difference between a user who never set a language and a user who chose en-US.

Add "en" or even "en-GB" to your Accept-Language header.
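That rule is small enough to sketch in Ruby. `explicit_language_preference` is a hypothetical name; it treats a header made of nothing but "en-US" as no preference (nil) and anything else, including "en-US,en", as a choice. It assumes the tags are already in preference order and ignores q-values.

```ruby
# Hypothetical rule: a bare "en-US" header carries no usable signal, so
# return nil and let other signals decide; otherwise return the first tag.
def explicit_language_preference(accept_language)
  tags = accept_language.to_s.downcase.delete(" ")
                        .split(",")
                        .map { |part| part.split(";").first } # drop q-values
                        .compact
                        .reject(&:empty?)
  return nil if tags.empty? || tags == ["en-us"]

  tags.first
end

explicit_language_preference("en-US")    # => nil
explicit_language_preference("en-US,en") # => "en-us"
```

As the reply points out, browsers localized elsewhere default to other tags, so this only resolves one specific ambiguity.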


It isn't the default for most people. Download a browser and OS localized to German, French, or British English and Accept-Language defaults to that instead of "en-US".


accept-language is like any of 1000 other idealistic parts of Internet specs that has good intentions but is so poorly used (or misused) that almost no-one implements it correctly, instead simply doing the simplest thing that works best for 99% of the audience.


If I access Google using a European IP, it shows the Google page for the country I am in, regardless of my accept-language

(I don't have an accept-language for Dutch, Italian, German, French but in all these cases I was shown the local page)



