zbraniecki · on Aug 11, 2023

Hi! Thank you for your critique!

> 1) “Your knight has killed a dragon with a crossbow”

We have a proposal for dynamic references to address this problem - https://github.com/projectfluent/fluent/issues/80 - it's non-trivial but I hope we'll see it solved in Fluent and/or in MessageFormat 2.

> 2) The parser is extremely sensitive

True. It's on purpose. We wanted to start with strict and loosen, rather than the opposite.

> 3) The input files mandate a weird arrangement of new lines for even the simplest branching

Same as above.

> 4) The documentation is too Spartan to know what happens in edge cases.

We're a small team :)

> It heralds itself to be the saviour of all i18n, but it’s literally worse than the mess that came before it.

I'm sorry to hear it doesn't work for you. I'm relieved that your criticism seems mostly subjective, except for one missing feature that no other l10n system has as of yet. We'll keep pushing, but if you encounter a better l10n system, please let me know! We're working on Unicode MessageFormat 2.0 based on Fluent and incorporating lessons learned.

troad · on Aug 12, 2023

I wish you guys the best, but I think you’re being a little self-congratulatory here.

The first feature is not optional - it has been a feature of i18n systems since the 1990s, possibly earlier. I’ve seen kludged-together in-house solutions that can do it without breaking a sweat. It is currently not feasible to use Fluent to localise any substantive, dynamic content in languages with case or gender - which is the main challenge an i18n package exists to solve. (I note the issue you link is five years old, dismisses the problem as not significant, and flat out states it is not being worked on.)

Translation files are generally made by translators, not programmers, and the fact that Fluent falls over in a slight breeze makes it difficult to imagine a translator being able to produce working Fluent files. This is not a “subjective” problem. Translators do not, and should not, work for free. Using Fluent adds considerable (and needless!) complexity and therefore expense.

As you point out, you’re working on a new data format, so it’s unclear why anyone should adopt (and pay for translations in) the current moribund format.

I genuinely do wish you guys the best, and I apologise if I spoke too bluntly above, but it is not merely a matter of personal opinion that Fluent is de facto still in alpha.

zbraniecki · on April 18, 2023

We're happy to announce ICU4X 1.2, containing a host of new features with a focus on text engines. The new ML-powered break iterators and HarfBuzz bindings enable developers to perform text layout on many platforms and resource-constrained environments.

zbraniecki · on July 16, 2022

Hi, just for context - this was a comparison of `unicode-normalizer` crate to ICU4C.

Since then @hsivonen from Mozilla wrote a new normalizer that recently got merged into ICU4X - https://github.com/unicode-org/icu4x/tree/main/components/no...

I don't have perf numbers yet, but I expect its performance to be at least comparable to ICU4C.

zbraniecki · on June 2, 2021

Hi all!

I've been working on ECMA-402 for the last 6 years on behalf of Mozilla. I'm excited to see it showcased on HN! We have an amazing, inclusive and open community of engineers, linguists and standardization experts from all over the world, from the largest corporations to the smallest non-profits and maintainers of small open source libraries.

If you'd like to see what we're working on, see https://github.com/tc39/proposals/blob/master/ecma402/README...

If you'd like to join us, please check out https://github.com/tc39/ecma402/blob/master/CONTRIBUTING.md

Our biggest challenge now is to make sure that anyone can use ECMA-402 - either with a well supported JS engine (V8, SpiderMonkey, JSShell etc.) or via a library. To achieve that we're working on a Rust project called ICU4X, which aims to be able to back ECMA-402 in web browsers, on servers, and in client solutions, and to offer FFI to many programming languages, including JS over WASM.

If you'd like to help us with that, there's tons of work and we're very eager to grow our community! Check out https://github.com/unicode-org/icu4x/blob/main/CONTRIBUTING.... and https://github.com/unicode-org/icu4x/tree/main/docs

If you have any questions, AMA!

zbraniecki · on June 2, 2021

`strftime` is not an internationalization API! It's a datetime formatting API! :)

I can see an appeal for it, and I wouldn't mind `Date` or `Temporal` to have such pattern-driven formatting, but it is critical to recognize that it is not internationalization and thus doesn't belong there.

In particular, if you use `strftime` like formatting you're doing the opposite - you're hardcoding the formatting into a single pattern. It may be the right thing for your project, but it definitely is not i18n :)

csande17 · on June 2, 2021

In practice, though, to implement Intl you pretty much need a robust implementation of a strftime-like date formatting function, as well as a full set of month/weekday/etc strings for every language you support. So it's less "add this functionality" and more "expose the functionality this clearly already has".

zbraniecki · on June 2, 2021

You're correct. Intl formatting has two components:

- Selecting the appropriate pattern for a given locale
- Formatting numbers and words into the given locale (example: Eastern Arabic numerals and Arabic month/week names).

You might say "I want to supply my own pattern, you just do step two for me" and technically we can provide that functionality by exposing it.

The issue is that, from the API design perspective, it would lead to people misusing the API, misunderstanding what is going on, and believing that they "internationalized their UI", which is not the case. In fact, they'd make things worse for their users than if they just displayed a date in a single locale with a consistent pattern+localization, because in some cases "MM/DD" and "DD/MM" are indistinguishable when expanded, and that may lead to data loss, security loss, or just confusion.

I'd argue that every case where you want to supply your own pattern is a case where you should not attempt to internationalize that pattern. I also recognize that it's just my opinion.
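
To make the "MM/DD" versus "DD/MM" ambiguity described above concrete, here is a minimal sketch. The date, the two locales, and the `formatFixedPattern` helper are arbitrary choices made for illustration, not anything from ECMA-402 itself; only `Intl.DateTimeFormat` is a real API here.

```javascript
// Hypothetical illustration: the date and locales are arbitrary choices.
const date = new Date(Date.UTC(2021, 2, 4)); // 4 March 2021 (months are 0-based)
const options = { month: "2-digit", day: "2-digit", timeZone: "UTC" };

// Intl does both steps: it selects the pattern for the locale and then
// formats the fields into that pattern.
const enUS = new Intl.DateTimeFormat("en-US", options).format(date);
const enGB = new Intl.DateTimeFormat("en-GB", options).format(date);

// A hand-rolled, strftime-style "MM/DD" pattern (hypothetical helper),
// applied the same way no matter which locale the reader expects.
function formatFixedPattern(d) {
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return `${mm}/${dd}`;
}
const fixed = formatFixedPattern(date);

console.log({ enUS, enGB, fixed });
```

The two `Intl` results order the fields differently, while the fixed pattern shows the same string to every reader, so a reader who expects day-first ordering would likely read it as a different date.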

zbraniecki · on June 2, 2021

I'm one of the leaders of the ICU4X project which is a Rust implementation of ECMA-402 aiming to back client-side solutions.

We hope to eventually back the SpiderMonkey (Firefox JS engine) implementation with it, but we also want to target WASM and, as a result, expose it as a polyfill for any browser to use.

I don't know if by the time ICU4X is 1.0 IE11 will still matter, but it may be possible to compile it to asm.js and run in IE11 maybe?

zbraniecki · on June 2, 2021

I18n parsing is indeed hell. I'm very very reluctant to try to add it to ECMA-402 for that particular reason.

Thank you for writing a user land library for it. It's a thankless task and a very important one and I think such a complex problem deserves a userland solution!

zbraniecki · on June 2, 2021

Thanks! If you'd like to help we just merged Segmenter API into ICU4X with an intention to back the ECMA-402 Segmenter API! See https://github.com/unicode-org/icu4x/pull/717

We could definitely use some help :)

zbraniecki · on June 2, 2021

We have been talking about this problem a bit. The tension here is that UX mocks a particular locale as "pixel perfect" and doesn't really "care" about how it'll i18n into other locales.

So they want "do the right thing" for 103 out of 104 locales, but "do what I told you" for "MY" locale.

This is a bit of a conundrum because we don't want to treat any locale as "special". To translate it to code you'd write something like:

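  let value;
  if (currentLocale === "THE_ONE_UX_CREATED_MOCKS_FOR") {
    // Use the exact pattern from the UX mocks for this one locale.
    value = formatToUXProvidedPattern(data);
  } else {
    // Let Intl choose the locale-appropriate pattern everywhere else.
    value = data.toLocaleString(currentLocale);
  }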

I don't have a great answer for how to approach it, since there are severe drawbacks and risks to all known potential approaches to "squeeze just one particular format that UX provided into i18n database", but I just wanted to say that we're aware that the Intl API clashes with UX-driven development.

arvinsim · on June 2, 2021

I think the problem is that design teams don't bake in internationalization when it is not explicitly asked for. It used to be the same thing for responsive pages.

felixfbecker · on June 2, 2021

Design tools are also severely lacking here. Even modern tools like Figma have no way to manage copy well (e.g. swap out copy for different "sets" of strings to test the design with variable-length translations). Heck, you can't even use the spell checker or Grammarly on text boxes or comments despite it being a browser-based app.

zbraniecki · on June 2, 2021

The list you linked is from 2012. In the last 9 years ECMA-402 has been heavily influenced by Mozilla's projects like Firefox OS and MozIntl APIs. :)

capableweb · on June 2, 2021

Yeah, I guess the specification was finalized in 2012? The specification itself has the same list of contributors: https://402.ecma-international.org/ecma-402/1.0/ECMA-402.pdf

Unless the specification has been rewritten under another name, I'm not sure why the date matters? The specification was mostly written by non-mozillians and I still don't quite get the point from m0llusk. Firefox OS and MozIntl APIs are not mentioned or linked in the submission either. Maybe I'm just missing both of your points entirely; if so, I'm sorry.

zbraniecki · on June 2, 2021

Fair question, thanks for asking!

There have been 9 revisions of ECMA-402 since 2012. We are, in fact, following TC39 on a yearly cadence of releases.

You can see the history of editions here: https://www.ecma-international.org/publications-and-standard...

The first edition (finished in 2012), had just NumberFormat, Collator and basic DateTimeFormat.

Since then, we added Locale, PluralRules, ListFormat, RelativeTimeFormat, DisplayNames, two new revisions of NumberFormat, two major additions to DateTimeFormat, and we're now adding Segmenter, LocaleInfo, CalendarInfo and working on MessageFormat 2.0.

Here's a very incomplete list of finished proposals that have been merged into the standard and implemented in browsers - https://github.com/tc39/proposals/blob/master/ecma402/finish...

I may or may not be involved in the majority of them! :)
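
For readers who haven't used the newer additions, a small sketch of four of the APIs named above follows. It assumes a JS engine with full locale data (recent browsers and Node.js builds ship it); the `en-US` locale and the sample inputs are arbitrary.

```javascript
// Assumption: the runtime ships full ICU/CLDR data for "en-US".
const plural = new Intl.PluralRules("en-US");
const list = new Intl.ListFormat("en-US", { style: "long", type: "conjunction" });
const relative = new Intl.RelativeTimeFormat("en-US", { numeric: "auto" });
const names = new Intl.DisplayNames(["en-US"], { type: "region" });

// Plural category for each count; English distinguishes only "one" and "other".
const pluralCategories = [0, 1, 2].map((n) => plural.select(n));

// Locale-aware list joining, using the first-edition constructors as sample items.
const joined = list.format(["NumberFormat", "Collator", "DateTimeFormat"]);

// Relative time; numeric: "auto" allows words such as "yesterday".
const yesterday = relative.format(-1, "day");

// Localized display name for a region code.
const regionName = names.of("PL");

console.log(pluralCategories, joined, yesterday, regionName);
// [ 'other', 'one', 'other' ] NumberFormat, Collator, and DateTimeFormat yesterday Poland
```

In each case the caller states what it wants formatted and for which locale, and the implementation supplies the pattern, which is the division of labour argued for elsewhere in this thread.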

capableweb · on June 3, 2021

Thank you a lot for clarifying and educating! Also thanks to your contributions, you've definitely made the web a better place :)

elangoc · on June 2, 2021

Ah, I see... :-T