RFC: Enterprise-ready internationalization (i18n) for Rails

We have been working on a Rails application for a big company that will be rolled out in many countries. We have carefully followed Rails development over the last 8 months and tried out different tool sets for internationalization: Rails 2.1 together with the gettext 1.93 gem, and ActiveSupport.I18n as part of Rails 2.3 with and without fast_gettext.

All our approaches still required extensive monkey patching resulting in high, unexpected efforts. The solutions work in 95% of all cases, which is probably sufficient for most Rails applications, but in our case it is not.

We think it is possible to implement a reliable 100% correct localization with reasonable effort, but some important principles have to be considered in the Rails core though.

# Internationalization principles

### Separation of roles

Often software developers are not able to or are not allowed to create translations of their applications' texts for several reasons:

* There are only a few developers who perfectly speak several languages.
* For commercial-grade applications the exact wording usually is controlled by
  the marketing department.
* In open source projects it should be possible for enthusiastic non-technical
  users to contribute improvements or complete translations. Consequentially,
  some open source projects have defined even [more roles][gnome-roles].

Important note: non-developers need [tools][poedit] for editing language files!

### Streamlining the development and translation process

Identifying messages by a symbol in a YAML file (like in Rails 2.3) is problematic, because it breaks the developer's flow: you have to stop coding, come up with a good identifier (symbol) name for your message, go to a YAML file, and type in the message.

Later on the translator does not know a message's context and needs to open two YAML files side by side - one contains the context and the other one gets filled with the translations. In contrast to that, the [gettext approach][gettext-approach] works smoothly - for C and for Python; for open source and for commercial projects.

    _('Archive is invalid')
    _('%{attribute} must not be empty') % attr

are both easier to write and easier to translate.

With command line tools such as [msgfmt -cv][msgfmt] you can also check the well-formedness and completeness of your translations as part of the *continuous build process*.

A reliable, high-quality, feature-rich parsing tool for Ruby and for Rails still needs to be implemented, but [ruby-gettext] is a good starting point.

### Linguistically correct translations

ActiveRecord validations support the concept of error messages and full error messages. From a linguistic point of view this does not work: there is no way to infer a correct full message from its short message counterpart and vice versa. The string concatenation approach used by Rails (almost) works for English but rarely for other languages.

If you cannot infer one message from another, the distinction does not make sense. You only need one kind of message, preferably with a placeholder for the attribute name. (See the example above.)

The current implementation is both overengineered and not sufficient:

    # lib/active_record/validations.rb:196
    full_messages << attr_name + I18n.t('activerecord.errors.format.separator',
      :default => ' ') + message

The main problem with this solution is: if a language needs a different separator for different parts of the sentence, then it will probably also differ in more vital aspects. For example, it might insist on a different order of words in a sentence.

A message can only be translated as a whole. Hence, it should be possible to provide custom ActiveRecord validation messages at any time. For us it was only possible with [a dirty hack][custom-validation-messages].

Usage of string concatenation for building error messages in the framework makes it [extremely complicated][remove-prefix] to avoid the corruption of error messages with a prefix derived from attribute/relation names.

*String concatenation should never be used to create human-readable messages. Use string interpolation instead (as it has been used in other frameworks and platforms for decades).*
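To illustrate the difference, here is a minimal sketch using Ruby's built-in `format` with named placeholders. The `MESSAGES` table, the `error_message` helper and the German wording are illustrative assumptions, not part of Rails or gettext.

```ruby
# Whole-sentence templates, one per locale (hypothetical data).
# The German template puts the attribute in the middle of the sentence,
# which "attribute + separator + message" concatenation cannot express.
MESSAGES = {
  :en => '%{attribute} must not be empty',
  :de => 'Bitte %{attribute} angeben'
}

# Interpolate the attribute name into the complete, translated sentence.
def error_message(locale, attribute)
  format(MESSAGES.fetch(locale), :attribute => attribute)
end

puts error_message(:en, 'Name')
puts error_message(:de, 'Name')
```

Because the translator owns the whole sentence, word order is no longer dictated by the framework.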

In addition, ActiveRecord should allow for proc-based validation messages:

    validates_format_of :account, :message => proc {...}

Of course, a robust pluralization implementation, as provided by gettext, is important, too.

### Locale selection

All the different localization libraries try to select an appropriate locale corresponding to their own rules and in a transparent way. The corresponding logic is often buried deep in the library's implementation and cannot be fixed using monkey patching.

Even if our application only offers English and Italian, for example, the gettext library with its ActiveRecord extensions sometimes shows validation messages in Greek (depending on the user's browser settings). Of course, libraries and Rails should be able to provide translations in plenty of languages, but the application should have the last word in deciding which subset of the possible languages is offered to the user.

A callback in the application controller which can be overridden by an application developer would be an advantage. A before_filter would also do, but it has to be executed before all other before_filters.

    # initializer/internationalization.rb
    offer_locales :en_UK, :en_ZA, :nl
    default_text_domain 'myapp'

    # application_controller.rb
    class ApplicationController
      def compute_effective_locale
        # application specific implementation, that uses
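One way an application could keep the last word on locale selection is sketched below in plain Ruby. The method name `compute_effective_locale` is borrowed from the proposal above; its signature, the `OFFERED_LOCALES` constant and the `preferred_languages` helper are assumptions made for this sketch, and no Rails or gettext API is involved.

```ruby
# Locales the application offers; the first entry is the default (assumed values).
OFFERED_LOCALES = %w[en it]

# Parse an Accept-Language header into language tags, best match first.
def preferred_languages(header)
  entries = header.to_s.split(',').map do |entry|
    tag, q = entry.strip.split(/\s*;\s*q=/)
    [tag.to_s.downcase, q ? q.to_f : 1.0]
  end
  entries = entries.reject { |tag, _q| tag.empty? }
  # sort_by is not stable, so the original position breaks ties.
  sorted = entries.each_with_index.sort_by { |(_tag, q), index| [-q, index] }
  sorted.map { |(tag, _q), _index| tag }
end

# Return the first preferred language the application offers, else the default.
def compute_effective_locale(header, offered = OFFERED_LOCALES)
  preferred_languages(header).each do |tag|
    primary = tag.split('-').first
    return primary if offered.include?(primary)
  end
  offered.first
end

# A Greek browser that also accepts Italian gets Italian, never Greek.
puts compute_effective_locale('el-GR,el;q=0.9,it;q=0.8,en;q=0.7')
```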

I noticed that the formatting somehow got screwed up. You may find the HTML version easier to read: http://github.com/geekq/rails/blob/854140d9401ee25fae5b5e0f8c1436818507e796/rfc-internationalization.markdown

Regards,

Vladimir

Hey guys,

thanks a lot for your proposal. I think we all agree that rails-i18n can be improved and input on that is highly appreciated.

This is a very long mail, so I'll cut some of it.

All our approaches still required extensive monkey patching resulting in high, unexpected efforts.

Could you please list the bits you had to monkey patch, maybe providing some code we can look at? Or does this only refer to AR messages?

The solutions work in 95% of all cases, which is probably sufficient for most Rails applications, but in our case it is not.

Yes. The initial goal of the rails-i18n project was to a) provide a common API all I18n solutions could build on while b) providing an implementation (simple backend) that works for English. It turned out that this implementation seems to work for (as you say) 95% of all use cases, which is much more than we expected.

We are currently seeing competing implementations (backends), which IMO is a good thing but doesn't necessarily mean we need to integrate all of them into Rails core right now.

Often software developers are not able to or are not allowed to create translations of their applications' texts for several reasons:

Agreed.

Important note: non-developers need [tools][poedit] for editing language files!

I agree that currently a solid tool for managing large collections of translations is missing. There are efforts to build such tools though. (E.g. see http://github.com/newsdesk/translate)

Identifying messages by a symbol in a YAML file (like in Rails 2.3) is problematic, because it breaks the developer's flow: you have to stop coding, come up with a good identifier (symbol) name for your message, go to a YAML file, and type in the message.

This is certainly a highly debated topic.

We are referring to this as "default translations as keys" vs "symbols as keys". Gettext also embeds "contexts" (scopes) into the default translation. There are several variations of these concepts and there are pros and cons to all of them.

You name one of the problems with "defaults as keys":

* For commercial-grade applications the exact wording usually is controlled by the marketing department.

"You don't want those guys to mess with your code." seems like a good reason to use Symbols as keys.

Afaik another reason is that the initially picked default translation might need to be changed during the process so there's a risk for keys getting out of sync. Also, with "defaults as keys" there's no way to compute defaults (fallbacks) (which of course affects another highly debated topic: reusing keys). E.g. you can not fallback

In the end we agreed to go with Symbols as keys because we felt that they a) are a better fit for framework needs and b) provide even better means for separating roles (i.e. devs mess with Symbols, translators mess with translations).

There's no reason though why you could not add a helper method _() to your application and then use the same syntax as in your examples:

   _('Archive is invalid')
   _('%{attribute} must not be empty') % attr

In Rails I18n Strings can be used as keys. The only drawback here is that you'd need to escape/unescape dots to something else in your _() implementation because they'd be interpreted as scope (context) separators. (This might be an opportunity to improve the API. We've never discussed this further.)

With command line tools such as [msgfmt -cv][msgfmt] you can also check the well-formedness and completeness of your translations as part of the *continuous build process*.

This again concerns the tools layer and isn't necessarily related to the API and/or backend implementation.

Afaik these Gettext tools rely on the keys not being computed though. E.g. devs must stick to using _('Archive is invalid') instead of _(foo.msg), right? This obviously is a limitation that we might not want to rely on for Rails itself. It might be a perfect fit for userland apps though so I can't see what's holding anybody back from using Rails I18n like this.

ActiveRecord validations support the concept of error messages and full error messages. From a linguistic point of view this does not work: there is no way to infer a correct full message from its short message counterpart and vice versa. The string concatenation approach used by Rails (almost) works for English but rarely for other languages.

I'm actually not sure about the current status of this issue, but it's been a known issue when we implemented Rails I18n. AR error messages were subject to an ongoing discussion at that time so we simply ported the existing functionality even though it's suboptimal.

A message can only be translated as a whole. Hence, it should be possible to provide custom ActiveRecord validation messages at any time. For us it was only possible with [a dirty hack][custom-validation-messages].

Allowing Procs for AR messages seems like a good idea to me. It's not the only place where Rails translates stuff though so it's probably not sufficient for replacing Rails I18n with "something else" (i.e. Gettext in your case). I feel the more appropriate way would be to use the API and use a Gettext enabled backend instead.

*String concatenation should never be used to create human-readable messages. Use string interpolation instead (as it has been used in other frameworks and platforms for decades).*

I think we all agree on this.

Of course, a robust pluralization implementation, as provided by gettext, is important, too.

What's wrong or not robust with the current pluralization API derived from CLDR?

All the different localization libraries try to select an appropriate locale corresponding to their own rules and in a transparent way. The corresponding logic is often buried deep in the library's implementation and cannot be fixed using monkey patching.

Rails I18n does not ship locale detection/selection, so there's nothing to monkey patch?

But yeah, you're laying out why we decided to leave features like these to plugin land for now.

If the handling of text messages needs to be refactored anyway, it would be advantageous to switch to the less invasive, proven, and familiar gettext syntax:

   _("The billing system is not available. Please, try again later.")

instead of

   I18n.t(:billing_not_available)

Providing context for translation:

  "Gadget|Title" => (German) "Bezeichnung"

Tbh I don't fully understand how this is less invasive than

   t('gadget.title', :default => 'Title')

It's a bit shorter, sure, but that comes at a price, too. And you can always add your own accessor layer/helper on top of I18n#t, no?

The word "Title" is translated differently depending on its context. Hierarchical contexts are not needed, that is YAML files with deeper nesting as in Rails 2.3 do not make sense.

You don't need to nest your keys/scopes/contexts if you don't want to. Even the GNU gettext manual though seems to suggest that there are situations where this makes sense: Contexts (GNU gettext utilities)

Am I missing your point here?

The current interface for plugging in different localization storage backends is a nice intention, but in this case flexibility is not needed. A perfectly designed and working backend would be sufficient.

That's a strong statement as it suggests that there's a silver-bullet solution for all needs. Our experience with several concurrent I18n solutions in Rails' history rather seemed to suggest the opposite.

I believe the way forward can't be to force everybody to use Gettext but instead to make sure Rails I18n supports a full stack Gettext solution through the API as seamlessly as possible. That might mean: implement a Gettext backend, provide some helper syntax, maybe make the scope separator configurable.

All that said, thanks again for bringing this up!

Sven

Get this man hooked up to the wagon! It's great to see this level of thought going into Rails I18n, and equally wonderful to know that his efforts could positively impact the framework.

Greetings, Folks!

having implemented a few "internationalized" applications with increasing complexity in the past, I have to agree with some valid points Vladimir names in his post that do not (yet?) seem to have been solved in Rails. But let's take a look first at what we as web developers mostly expect from multilingual applications. Obviously, we do not want to introduce new code into our existing application if a new language has to be supported by it. By "code" I mean everything from "logic concerning timezone calculation", "formatting rules concerning different punctuation, date, time, currency, pluralization" to "everything that cannot be edited with M$ WORD without significant danger of losing important information". Important information can be anything like indentation and quotation that is easily screwed up as soon as the average non-tech guy from YourBigComp Inc.'s marketing department starts to translate things using his favourite "text editor".

So while most of us do write internationalized applications using no more than two languages targeted at an audience within one timezone and exchanging the same currency most of the time, that "separation of roles" is, as Vladimir pointed out, a non-issue for us most of the time. This picture however changes drastically as soon as we're lucky enough to catch a "big fish" client that is actually willing to pay us huge amounts of money for developing an "enterprisey", yet web2.0-ish application that will serve business content to a target audience spread across the globe. Huge amounts of money come, however, with a catch: that application has to conform to certain rules that we only find in enterprise level companies:

* You're not going to be the one that will deploy the application and decide which platform it will run on
* You're not going to be the one that will run the application
* You're not going to be the one that will maintain the application two years from now
* You're certainly not going to be the one that is going to translate the whole UI into half a gazillion languages (by "languages" I mean "locales"!)

In larger companies, interfacing with other departments that usually are far less competent in technical issues is a major concern. "technical issues" can really be things like using a different "text editor" than WORD or conforming to any formatting rules that are not like "insert a bulleted list here" or "make this font look bold" -- sad but true, but writing YAML means "programming" to most folks out there! So you definitely do not want to offer anyone outside your own developer team anything else than the simplest possible type of plain text, "ideally" even encapsulated in a .doc- or .xls-file.

This non-technical world out there seems infinitely far away, but for some of us it's a daily struggle we have to live with. In my current project we have actually managed to get translation done for all different countries using gettext and by sending our gettext PO-files to the translators. After a few training iterations, they were able to translate the strings marked with "msgid" into the line marked with "msgstr" -- that really IS more than one is actually allowed to expect from people in those positions (no offense meant at this point).

The major advantages of the gettext PO-file format over any other format currently used in different i18n frameworks are, as I see them:

* it's a not-so technical looking format (go on, flame me for that)
* it's incredibly robust: it has enough "syntactical overhead" to allow automated sanity checks and _error correction_! Even if the translator screws up whitespace and newlines, you'd still be able to recover a valid file only by parsing for "msgstr" etc. This is a HUGE advantage over YAML files.
* it is only as verbose as it needs to be: XML would of course be more robust than anything else (validation and re-formatting tools everywhere...), but it is way too much overhead (tech stuff! OMG!) to be handled by a non-techie.
* there's _excellent_, robust tool support and huge dictionary resources for translating Gettext files. Hey, GNU Gettext has been around for about 30 years or so, tools are mature and platform independent - Poedit, KBabel, Gorm, ... you name it.

Some more remarks on that last point: Even if, at some point in time, there will be "tool support" for the Rails I18n API: tools take time to mature. And if there's one thing about big companies that really gets annoying at times, it's that big companies love solid tools. Companies like (or rather call it 'demand') tools if you want to introduce new technologies, and they want them right away. It is at times not acceptable to hope for someone to come up with a half-baked translation tool that has to be used by those aforementioned marketing guys. An 80% solution just doesn't do the job, neither does a 95% solution -- even if that is "far more than _we_ expected": It's either 100% support or none at all -- which would mean either a working solution right now or you're gonna do it with J2EE or C++ or Perl and some whacky CGI based framework, because that's what they've been working with for the past couple of decades -- chances are you'll end up using Gettext anyway.

I do not intend to offend anyone contributing to Rails, but it has to be said that Rails itself is, in large parts, far from mature. Especially its new I18n API, which basically seems to me like a re-invention of the wheel. It would be ignorant to say that a rock solid tool for managing internationalization in its full complexity can be expected to emerge from the community within the next year(s), because it is an incredibly complex subject. Things take time, so at this point time would be wisely spent on adopting proven technology into the rails core.

Bottom line: either we offer those translators a simple file format they can handle using the big four-letter-WORD, or we provide them the rock solid tool chain necessary to handle our "strange files". A minimal-effort solution would indeed be to depend on Gettext as a whole (not just as an optional backend, more reasons for that later), since it is neither code-invasive nor visually invasive and it has actually worked really, really well for major OSS projects for decades (and will continue to do so, my bet ;)).

Convention over configuration has always been a really helpful, productive and, for that matter, cool design philosophy of Rails. Conventions should not be broken, because they make it easy to implement things using a simple and well-known set of basic rules. Pulling in at least something that looks like the Gettext API into the Rails core would probably make it easier and more productive for the "senior" SW developers amongst us (well, I'm not counting myself in here ;)) to start with I18n right away -- just because it would be done in Rails in the exact same way as it has been done in dozens of other frameworks and applications (<flamebait>even the Python guys over at Pylons use it</flamebait>). And the really neat thing would be that you could migrate from your old Perl/CGI based web app that has to be replaced with a nifty new Rails app without even touching your translation files, granted the UI would look roughly alike. Think companies with a strong need for continuity at their customer care level. Why bother with scripts for converting one file format into the other -- just take what you already have and what everybody already knows.

Some words targeted at Sven's remarks: I18n goes much further than just translating strings, applying a new date and currency format and possibly determining the timezone of the current user. You also have to account for some languages actually having an arbitrary amount of pluralization formats, even different pluralization rules depending on whether the subject is male or female and possibly the grammatical gender of the object spoken about. To take that even further: there exist languages in this world that have different words for things that come in two, three or more and additionally depending on the shape of those things. Other languages use "measure words", i.e. classifiers used in conjunction with quantities and different "classes" of objects. Those are concepts unheard of in English or German, but such languages are actually being spoken in target locations of some companies: just take Japanese or Russian, for example. I'm sure Vladimir could tell us more about the latter -- and drop Matz himself a line to hear more about the former, I guess :wink: The point I want to make here is: if we look at e.g. the ActiveRecord validation mechanism, we have to consider more than just a field name, possibly a number (length, range, size) and some default message. It requires logic that nobody amongst us wants to re-invent from scratch as soon as we need it. However, some of us need it right now.
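As one concrete case of locale-specific plural logic, here is a small sketch of plural category selection for integer counts in English and Russian. The category rules follow the CLDR plural rules for these two languages; the `PLURAL_RULES` and `FILES` tables and the `pluralize` helper are names made up for this example and do not belong to Rails I18n or gettext.

```ruby
# Plural category selectors for integer counts (CLDR rules for en and ru).
PLURAL_RULES = {
  :en => lambda { |n| n == 1 ? :one : :other },
  :ru => lambda { |n|
    if n % 10 == 1 && n % 100 != 11
      :one
    elsif (2..4).include?(n % 10) && !(12..14).include?(n % 100)
      :few
    else
      :many
    end
  }
}

# Whole-sentence templates per plural category (hypothetical data).
FILES = {
  :en => { :one => '%{count} file', :other => '%{count} files' },
  :ru => { :one => '%{count} файл', :few => '%{count} файла', :many => '%{count} файлов' }
}

# Pick the template for the count's plural category and interpolate the count.
def pluralize(locale, count)
  category = PLURAL_RULES.fetch(locale).call(count)
  format(FILES.fetch(locale).fetch(category), :count => count)
end

[1, 2, 5, 11, 21].each { |n| puts pluralize(:ru, n) }
```

Note that 21 takes the same form as 1 in Russian while 11 does not, which is exactly the kind of rule a simple "singular or plural" switch cannot express.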

I personally would very much appreciate a solid Gettext based I18n Implementation making it into the Rails core. If whatever company or individual has the resources to actually implement it in a way an enterprise level application could work flawlessly with it without having to monkey patch Rails with every new release, go for it!

May the source be with you

Willem

(Note: I'm biased towards the Gettext approach, after having used it to translate desktop applications)

After reading all the replies I think the issue boils down to "default translations as key" vs "symbols as keys".

"default translations as key":

1. Pro: lots of existing, mature Gettext tools for creating translations, detecting stale/outdated translations, generating translation statistics, etc.
   For example non-tech savvy translators can use Poedit to create the translations, which should be the most fool-proof thing after a web interface.
2. Pro: can easily fall back to the default translation.
   This is a huge benefit if you don't have a reliable translation team, i.e. not all translations are always kept up-to-date. This way the user interface can at least fall back to an English string, which is still better than presenting the user with an empty string, a symbol or an error message. This is the case for many open source projects, but probably not so for enterprise developers.
3. Pro: the default translation makes the code easier to understand. Symbols are usually a lot more opaque.
4. Con: it's not 100% straightforward. Developers who implement a localization framework themselves for the first time would probably use the symbol approach. Developers need some training in order to get used to Gettext's workflow of marking strings for translation, extracting them with tools, editing the translation files and compiling the translation files.
5. Con: not possible for translators to change the default text without editing the source code.
6. Con: Ruby-related Gettext tools still suck. For example the Ruby-Gettext Rails plugin cannot extract strings from Haml templates. I've seen someone reinvent his own localization framework based on symbols because of this.

"symbols as keys":

7. Pro: easy to understand for new developers. Most people who implement a localization framework for the first time would probably use this approach.
8. Pro: possible for translators to change the default text without editing the source code.
9. Pro: allows falling back to a related string, e.g. "errors.article.invalid" => "errors.model.invalid".
10. Con: very limited tool support, even worse than Ruby-Gettext.
11. Con: makes code more opaque; the meaning of a symbol is not always immediately obvious until the programmer sees the associated string.

It's arguable whether 9 really is a pro. Has there even been any need for this feature? I figure that in most applications, most symbols have no related fallback symbol, and so a missing translation usually results in an error. Gettext falls back to the default string which is usually English, which is still better than presenting the user with the symbol or with an empty string.

4 is pretty awkward for developers who are just getting into localization, but I blame it on documentation. There shouldn't be any problems if the documentation is good. The need to manually compile .po files to .mo files can be solved with the right code. Rails could, for example, auto-compile modified .po files during startup, or someone could write a .po parser and load .po files directly.
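To give an idea of how little is needed to load .po files directly, here is a deliberately minimal parser sketch. It is an assumption-laden illustration, not the ruby-gettext implementation: it ignores `msgctxt`, plural forms and most escape sequences, and the helper names `parse_po`, `unquote` and `untranslated` are invented here.

```ruby
# Undo the escape sequences this sketch understands: \", \n and \\.
def unquote(str)
  str.gsub(/\\(["n\\])/) { $1 == 'n' ? "\n" : $1 }
end

# Parse the text of a .po file into a { msgid => msgstr } hash.
def parse_po(text)
  translations = {}
  msgid = nil
  target = nil # the string that continuation lines are appended to
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    if line =~ /\Amsgid\s+"(.*)"\z/
      msgid = unquote($1)
      target = msgid
    elsif line =~ /\Amsgstr\s+"(.*)"\z/
      target = translations[msgid] = unquote($1)
    elsif line =~ /\A"(.*)"\z/ && target
      target << unquote($1) # continuation of the previous msgid/msgstr
    end
  end
  translations.delete('') # drop the header entry (empty msgid)
  translations
end

# Message ids that still have an empty translation.
def untranslated(translations)
  translations.select { |_msgid, msgstr| msgstr.empty? }.keys
end

SAMPLE_PO = <<'PO'
# German translations
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Archive is invalid"
msgstr "Archiv ist ungueltig"

msgid "%{attribute} must not be empty"
msgstr ""
PO

catalog = parse_po(SAMPLE_PO)
puts catalog.inspect
puts "untranslated: #{untranslated(catalog).inspect}"
```

The same pass that loads the catalog can also report missing translations, which is a small step towards the statistics that the Gettext tools provide.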

6 and 10 can be fixed given enough effort.

So this leaves 5/8 and 3/11 as the only fundamental issues, which are also mutually exclusive: the ability to change the default string without letting translators mess with the source code (Rails I18n), and whether embedding default strings in the source code makes it easier to understand than using symbols (Gettext). 3/11 might be arguable, I'm sure there are developers out there who don't think that using symbols makes their code more opaque.

Hi Hongli,

cool, that's a great writeup! Thanks for turning this discussion
towards more practical points :slight_smile:

Perhaps it helps when I also add some disclaimer about myself. I'm not
biased towards or against Gettext in any way either. I've used it a lot,
quite some years ago. In fact it was me who repeatedly tried to get
Gettext people on board while we worked on Rails I18n. I do think
though that the API in fact is the best breed of all solutions for
Rails we had previously, including ruby-gettext. That of course
doesn't mean it can not be improved, but to me it means we should not
go back to a less flexible API.

In your list I'd suggest that 9.) is just an example of a more
abstract point: "Symbols as keys" makes it possible to compute keys.
You can not compute default translations. Rails itself leverages that
for validation messages, I've seen people using it for "resourceful
controllers" (e.g. flash messages) and there are tons of other
situations where this is useful. Computing keys allows you to define a
generic translation that works for most of the situations and
overwrite that for particular situations where you need something
special - thus effectively reducing the amount of repetition a lot. You
can also react to contexts (e.g. pick a particular translation
depending on the type of an object) flexibly where you'd otherwise
need to use generic translations/messages. Thus, I believe that
"Symbols as keys" allow for a more abstract way of coding.
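The idea of a computed key with a generic fallback can be sketched without any framework. The `TRANSLATIONS` table and the `lookup` and `flash_message` helpers below are hypothetical stand-ins for a real backend and only mimic the "specific key first, generic key second" behaviour described above.

```ruby
# Flat translation table standing in for a real backend (hypothetical data).
TRANSLATIONS = {
  'flash.create.success'          => 'Successfully created.',
  'flash.articles.create.success' => 'Your article has been published.'
}

# Return the translation for the first key that exists, trying the
# computed key first and then each fallback in order.
def lookup(key, *fallbacks)
  found = [key, *fallbacks].find { |candidate| TRANSLATIONS.key?(candidate.to_s) }
  found ? TRANSLATIONS[found.to_s] : "translation missing: #{key}"
end

# Compute a controller-specific key and fall back to the generic one.
def flash_message(controller_name, action)
  lookup("flash.#{controller_name}.#{action}.success", "flash.#{action}.success")
end

puts flash_message('articles', 'create') # specific translation exists
puts flash_message('comments', 'create') # falls back to the generic message
```

Only the special case needs its own entry; every other controller shares the generic message.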

I'd also suggest to add another pair of pro/con arguments to the list.
Using "defaults as keys" usually means that you have the actual
translations cluttered throughout your code. Of course, Gettext allows
you to "announce" translated strings through gettext_noop when you
want to collect messages at a central place but that requirement
really feels much more like jumping through hoops than just using
Symbols in the first place.

(Also, I'd like to remind you of the motivations that led to such
solutions as Gibberish, Globalite (not Globalize) and
SimpleLocalization. People wanted a simple and clean API, they
explicitly did not want to mess with Gettext which was designed,
let's face it, in 1994 for C. It feels old and awkward to many.)

I really wonder though if both approaches actually are mutually
exclusive, or mutually exclusive in all areas (Rails core, plugin
land, user/dev/app land).

Imagine a helper like this:

   def _(msg)
     I18n.t(msg, :default => msg)
   end

For pluralization there could be a similar helper. This should work
for all messages that do not contain a dot. I wonder if we can get rid
of this limitation. Approaches that come to my mind:

1. Make the scope separator (dot) configurable. That might mean that
   Rails core should not continue using dots as separators (but instead
   just use Arrays for scopes).
2. Escape/unescape dots in the helper.
3. ?
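The escaping approach could look roughly like the sketch below. `TinyI18n` is a hypothetical stand-in for a backend that splits keys on dots; it is not the Rails I18n API. The `<dot>` placeholder and the `escape_key` helper are likewise assumptions made for this example.

```ruby
# Stand-in for a backend that treats "." as a scope separator
# (hypothetical; it only mimics the behaviour discussed here).
module TinyI18n
  STORE = {
    'gadget' => { 'title' => 'Bezeichnung' },
    'Saved<dot> Thank you<dot>' => 'Gespeichert. Danke.'
  }

  # Walk the nested store along the dot-separated key, else use the default.
  def self.t(key, options = {})
    value = key.to_s.split('.').inject(STORE) do |node, part|
      node.is_a?(Hash) ? node[part] : nil
    end
    value.is_a?(String) ? value : options[:default]
  end
end

# Replace dots so that a whole sentence is looked up as a single key.
def escape_key(msg)
  msg.gsub('.', '<dot>')
end

# Gettext-style accessor: the message is both the key and the default.
def _(msg)
  TinyI18n.t(escape_key(msg), :default => msg)
end

puts _('Saved. Thank you.')      # translated via the escaped key
puts _('Archive is invalid')     # no translation, falls back to the message
puts TinyI18n.t('gadget.title')  # scoped keys keep working
```

The cost of this approach is that translation files then contain the escaped form of each message rather than the original text.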

Btw. re "Gettext falls back to the default string which is usually
English" - you can do the same thing easily with Rails I18n. It just
wasn't part of our original requirements ("the simple backend works
for English") so we left locale fallbacks to the plugin land.

The remaining issues are *really* subjective.

- Putting default translations in the code is clutter in your opinion. In my opinion it's the opposite: it makes the code easier to read. :slight_smile: gettext_noop is a bit weird at first but I don't see it as any worse than what all the other localization frameworks provide.
- You view the fact that SimpleLocalization, Globalite and co are not designed with the Gettext style as proof that people want something simple and clean. The way I see it is that they haven't seriously tried Gettext. I think their view of "simple" is like coding a web application without using MVC - it's simpler but it gives you more headache down the road. I find it "interesting" that pretty much all open source desktop applications use Gettext. Gettext has been used to translate hundreds, if not thousands, of desktop applications to dozens of languages. Yet the web applications world seems to completely ignore Gettext. For PHP I can understand, everybody's reinventing the wheel there. But Rails?

Your idea regarding the computability of symbols is interesting. On an abstract level it does seem to fit within the Rails philosophy, but it remains to be seen how useful it is in practice and whether anyone can come up with a good implementation.

In any case, what is clear is that at the very least, Rails should have better I18n tools. There should be tools that alert translators which translations need to be updated, how many strings still need translations, for writing the translations, etc.
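A first building block for such a tool is a report of keys that exist in a reference locale but not in another one. The sketch below works on plain nested hashes, as one would get from loading a YAML translation file; `flatten_keys` and `missing_keys` are hypothetical helper names, not existing Rails tasks.

```ruby
# Flatten a nested translation hash into dot-separated keys.
def flatten_keys(tree, prefix = nil)
  tree.flat_map do |key, value|
    full = [prefix, key].compact.join('.')
    value.is_a?(Hash) ? flatten_keys(value, full) : [full]
  end
end

# Keys present in the reference locale but absent from the other one.
def missing_keys(reference, other)
  flatten_keys(reference) - flatten_keys(other)
end

en = { 'errors' => { 'blank' => 'must not be empty', 'invalid' => 'is invalid' },
       'title'  => 'Title' }
de = { 'errors' => { 'blank' => 'darf nicht leer sein' } }

missing = missing_keys(en, de)
puts "#{missing.size} of #{flatten_keys(en).size} keys untranslated: #{missing.join(', ')}"
```

Detecting translations that are present but outdated needs more information (for example the source text at translation time), which this sketch does not track.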

Hi Hongli,

The remaining issues are *really* subjective. - Putting default translations in the code is clutter in your opinion.

I didn't say it's my opinion :slight_smile: My role in the Rails I18n group was more the one of being a moderator.

- You view the fact that SimpleLocalization, Globalite and co are not designed with the Gettext style as proof that people want something simple and clean. The way I see it is that they haven't seriously tried Gettext.

I'm pretty sure they did.

I think their view of "simple" is like coding a web application without using MVC - it's simpler but it gives you more headache down the road.

Regarding the API quite the opposite is true. You just don't have this feature set with gettext's _(). Regarding the tools layer I agree, but hey, if you want to use poedit for 95% of your messages you can just do that, no? Just add fast_gettext and use it. Also, I bet a converter that takes a flat yaml translations file and converts it to po should not be that hard to do.
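As a rough illustration of the converter idea, here is a minimal sketch in plain Ruby. It assumes the YAML keys become the `msgid` and the YAML values the `msgstr`; the helper names `flatten_keys`, `po_escape` and `yaml_to_po` are made up for this example, and plurals, comments and PO headers are ignored.

```ruby
require "yaml"

# Turn a nested hash into a flat one with dotted keys,
# e.g. {"flash" => {"create" => "x"}} becomes {"flash.create" => "x"}.
def flatten_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    full_key = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flat.merge!(flatten_keys(value, full_key))
    else
      flat[full_key] = value.to_s
    end
  end
end

# Escape backslashes, double quotes and newlines for a PO string literal.
def po_escape(text)
  text.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
end

# Emit one msgid/msgstr pair per flattened key (assumption: key => msgid).
def yaml_to_po(yaml_text)
  flatten_keys(YAML.safe_load(yaml_text)).map do |key, text|
    %(msgid "#{po_escape(key)}"\nmsgstr "#{po_escape(text)}"\n)
  end.join("\n")
end

SAMPLE = <<~YAML
  flash:
    create:
      success: Successfully created.
YAML

PO_TEXT = yaml_to_po(SAMPLE)
```

With symbolic keys as `msgid`, the translator still sees an opaque identifier rather than the source sentence, which is exactly the trade-off being argued about in this thread.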

I find it "interesting" that pretty much all open source desktop applications use Gettext. Gettext has been used to translate hundreds, if not thousands, of desktop applications to dozens of languages. Yet the web applications world seems to completely ignore Gettext. For PHP I can understand, everybody's reinventing the wheel there. But Rails?

Look at the history of Rails. There were tons of competing implementations, Gettext being one of them. Gettext hasn't been able to win the race in any way and I think that's for a reason.

Also, we haven't reinvented the wheel. We've extracted what we (based on the experience of several implementors) believed the best ideas are.

Your idea regarding the computability of symbols is interesting. On an abstract level it does seem to fit within the Rails philosophy, but it remains to be seen how useful it is in practice and whether anyone can come up with a good implementation.

Hm? People are doing stuff like this.

flash[:notice] = t(:"flash.#{controller_name}.#{action}.success", :default => :"flash.#{action}.success")

Do that in gettext. Obviously, flash messages are only one place where computability of keys is quite useful.
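To make the "computed key with a generic fallback" idea concrete, here is a minimal sketch in plain Ruby. It does not use the I18n gem; `TRANSLATIONS`, `translate_with_fallback` and `flash_notice` are hypothetical names that only mimic the lookup order of the `t(..., :default => ...)` call above.

```ruby
# Hypothetical, simplified translation store (not the I18n gem).
TRANSLATIONS = {
  "flash.create.success"       => "Successfully created.",
  "flash.posts.create.success" => "Your post has been published."
}.freeze

# Try the most specific key first, then each fallback key in order.
def translate_with_fallback(key, *fallbacks)
  [key, *fallbacks].each do |candidate|
    text = TRANSLATIONS[candidate.to_s]
    return text if text
  end
  "translation missing: #{key}"
end

# Compute the key from the controller and action, as in the flash example.
def flash_notice(controller_name, action)
  translate_with_fallback(:"flash.#{controller_name}.#{action}.success",
                          :"flash.#{action}.success")
end

flash_notice("posts", "create")    # uses the controller-specific override
flash_notice("comments", "create") # falls back to the generic message
```

Note that a translator looking only at the flat store cannot tell which of the two entries is the override, which is the objection raised later in the thread.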

Again, there was a reason why so many people weren't happy with gettext before and invented their own APIs for years.

In any case, what is clear is that, at the very least, Rails should have better I18n tools. There should be tools that alert translators which translations need to be updated, that show how many strings still need translations, that help with writing the translations, etc.

I agree.

Aside from that, though, I think some thought should be put into how to integrate a gettext style accessor _('foo') and a gettext backend.

I forgot to add the "default argument" against "default as keys" as
another pro/contra pair: keys can easily get out of sync. If it's hard
for a developer to come up with a good key for a translation (while
focussing on development) then it's even harder to come up with the
final English message at this point: there's a good chance for it to
change, so one has to propagate that change to translation files.
(Again, depending on your setup and environment that might be more or
less hassle.)

Btw having default translations in your code, no matter how clean or
cluttered that seems to anybody, will fight one of the major original
points that brought up this discussion: separation of roles (dev vs
editors vs marketing vs translators etc.)

Hi,

= Executive summary :wink:

the most important question is, whether the core team would sacrifice *some* parts of humanize, pluralize and other string concatenation voodoo, especially in ActiveRecord to allow for 100% linguistically correct translations and smooth, enterprise-ready localization workflow.

This, together with other improvements, would make broader adoption of Rails in a more traditional environment, outside start-ups, possible. Currently we have to put a lot of effort into monkey patching to work around the opinionated decisions baked into Rails. My hope was that Rails3 is planned to become more of a general-purpose web framework.

Other questions are only technical details, supporting such a decision.

= Details

Regarding the default human-readable string as a key: Hongli Lai listed some pros and cons. Let me turn the three remaining cons into pros, and we get a solution that has only advantages :wink:

1. Pro: lots of existing, mature Gettext tools for creating translations, detecting stale/outdated translations, generating translation statistics, etc.
   For example non-tech savvy translators can use Poedit to create the translations, which should be the most fool-proof thing after a web interface.
2. Pro: can easily fallback to the default translation.
   This is a huge benefit if you don't have a reliable translation team, i.e. not all translations are always kept up-to-date. This way the user interface can at least fallback to an English string, which is still better than presenting the user with an empty string, a symbol or an error message. This is the case for many open source projects, but probably not so for enterprise developers.
3. Pro: the default translation makes the code easier to understand. Symbols are usually a lot more opaque.
4. Con: it's not 100% straightforward. Developers who implement a localization framework themselves for the first time would probably use the symbol approach. Developers need some training in order to get used to Gettext's workflow of marking strings for translation, extracting them with tools, editing the translation files and compiling the translation files.

I've found the gettext workflow easy to grasp for new developers in every team I have worked with so far. For the self-taught, high-quality documentation (to be written) should be enough.

5. Con: not possible for translators to change the default text without editing the source code.

Missing possibility of changing the default string has never been an issue, neither in my personal decade of writing international applications (different open source platforms, Microsoft.net and pre-dot-net) nor for big open source projects with longer history. For commercial-grade applications the marketing department or release team is going to translate our hacker-English to marketing-conform English anyway. BTW, this is done separately for every English-speaking country to account for cultural differences, e.g. translations differ between the UK and South Africa.

The case with a typo in the default message can be handled the same way as a typo in the symbol-name - as a bug, it can be corrected in all the relevant files. There is even some tool support in gettext for this - fuzzy matching and checking for missing translations.

6. Con: Ruby-related Gettext tools still suck. For example Ruby-Gettext Rails plugin cannot extract strings from Haml templates. I've seen someone reinvent his own localization framework based on symbols because of this.

Masao is currently rewriting ruby-gettext. I currently prefer fast_gettext, not because it is fast, but because it has a more straightforward implementation. Unlike ruby-gettext, it does not attempt to monkey-patch Rails.

Regarding the parsing of Haml templates - the implementation will likely be up to the Haml users. It can be based on ruby-gettext's usual approach of parsing source code for string literals.

Sven Fuchs wrote:

It turned out that this implementation seems to work for (as you say) 95% of all usecases which is much more than we expected.

So this solution does not qualify for any serious enterprise or governmental (European Union) application. 100% linguistically correct translations are required. 95% is much less than is expected from us.

In your list I'd suggest that 9.) is just an example of a more abstract point: "Symbols as keys" makes it possible to compute keys. You can not compute default translations. Rails itself leverages that for validation messages, I've seen people using it for "resourceful controllers" (e.g. flash messages) and there are tons of other situations where this is useful. Computing keys allows you to define a generic translation that works for most of the situations and overwrite that for particular situations where you need something special - thus effectively reducing the amount of repetition a lot. You can also react to contexts (e.g. pick a particular translation depending on the type of an object) flexibly where you'd otherwise need to use generic translations/messages. Thus, I believe that "Symbols as keys" allow for a more abstract way of coding.

All the kinds of hierarchically organized scopes (computed keys) and (optional) translation inheritance do not work in an environment with role separation. Inheritance and method overriding work in OOP. But they do not work for translations. A translation agency needs a comprehensive and flat list of strings to be translated. To decide whether, and where, to override a translation, they would need to analyse the application source code. Only a manually created and obligatory translation scope makes sense. This is a kind of message from the developer to the translation team. The gettext convention is to use the pipe character.

    _("Search|I'm feeling lucky")
    _("Mood poll|I'm feeling lucky")
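A minimal sketch of how such a pipe-scoped lookup can behave, in plain Ruby. The catalogue and the helper name `s_` are illustrative assumptions (real gettext bindings offer a similar scoped helper, but this is not their implementation), and the French strings are only sample translations.

```ruby
# Hypothetical flat catalogue, as a translator would see it in a PO file:
# the same English text appears twice, each time with an explicit scope.
FRENCH = {
  "Search|I'm feeling lucky"    => "J'ai de la chance",
  "Mood poll|I'm feeling lucky" => "Je me sens chanceux"
}.freeze

# Look up the full "scope|message" key. If there is no translation,
# fall back to the text after the last pipe, so the scope is never shown.
def s_(msgid, catalogue = FRENCH)
  catalogue.fetch(msgid) { msgid.rpartition("|").last }
end

s_("Search|I'm feeling lucky")    # translated, search context
s_("Mood poll|I'm feeling lucky") # translated, poll context
s_("Footer|Imprint")              # untranslated: falls back to "Imprint"
```

The scope here is written by the developer once and is visible verbatim in the flat list handed to the translation team; nothing has to be derived from the source code.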

People wanted a simple and clean API, they explicitly did not want to mess with Gettext which was designed, let's face it, in 1994 for C. It feels old and awkward to many.

And since then adapted for 20 different programming languages. Bindings for dynamic languages, e.g. Python, are very nice. The same API can be used for Ruby too.

For pluralization there could be a similar helper. This should work for all messages that do not contain a dot. I wonder if we can get rid of this limitation. Approaches that come to my mind:

1. Make the scope separator (dot) configurable. That might mean that Rails core should not continue using dots as separators (but instead just use Arrays for scopes).
2. Escape/unescape dots in the helper.
3. ?

Sounds complicated...

BTW,

* pluralization rules are different for different languages, gettext uses a formula in a programming language per language for that
* some languages have 3 or 5 plural forms as opposed to 2 in English and German
* only a complete sentence can be pluralized, not a single word

Gettext accounts for all that, in code and in the tool chain; Rails I18n does not. So before spending much more time and effort on yet another I18n implementation, we should focus on integrating a perfectly solid solution that is known to work. Fixing and patching and hoping for a >95% solution won't get Rails where many would like to see it in the near future, i.e. in bigger enterprise-y environments.
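The per-language plural formula can be sketched in plain Ruby as follows. The selectors are modelled on the kind of plural-form expression gettext keeps per language (two forms for English, three for Russian); the constant and method names and the sample sentences are assumptions made for this illustration.

```ruby
# One selector per language: maps a count to the index of the plural form.
PLURAL_RULES = {
  en: ->(n) { n == 1 ? 0 : 1 },
  ru: lambda do |n|
    if n % 10 == 1 && n % 100 != 11
      0                                   # 1, 21, 31, ...
    elsif (2..4).cover?(n % 10) && !(12..14).cover?(n % 100)
      1                                   # 2-4, 22-24, ...
    else
      2                                   # 0, 5-20, 25-30, ...
    end
  end
}.freeze

# Whole sentences per plural form, never single words glued together.
PLURAL_MESSAGES = {
  en: ["%d file found.", "%d files found."],
  ru: ["Найден %d файл.", "Найдено %d файла.", "Найдено %d файлов."]
}.freeze

# Pick the sentence for this locale and count, then fill in the number.
def n_translate(locale, count)
  form = PLURAL_RULES.fetch(locale).call(count)
  format(PLURAL_MESSAGES.fetch(locale).fetch(form), count)
end

n_translate(:en, 1)
n_translate(:ru, 22)
```

The translator supplies all sentence variants for a language, and the selector is written once per language rather than once per message.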

The question remains whether this really is the direction Rails is heading in. If so, we would contribute a solid Gettext-based I18n implementation that addresses the aforementioned issues. This, however, requires some breaking changes within the Rails core and a consensus about the necessity of them being addressed.

Best Regards,

Vladimir

Hi Vladimir,

the most important question is, whether the core team would sacrifice *some* parts of humanize, pluralize and other string concatenation voodoo, especially in ActiveRecord to allow for 100% linguistically correct translations and smooth, enterprise-ready localization workflow.

Exactly which parts are you referring to?

Currently we have to put a lot of effort into monkey patching to work around the opinionated decisions baked into Rails.

Again, it would be great if you could list the exact places that you found need monkeypatching.

Missing possibility of changing the default string has never been an issue, neither in my personal decade of writing international applications (different open source platforms, Microsoft.net and pre-dot-net) nor for big open source projects with longer history.

It has been an issue which is why people implemented key based solutions.

It turned out that this implementation seems to work for (as you say) 95% of all usecases which is much more than we expected.

So this solution does not qualify for any serious enterprise or governmental (European Union) application. 100% linguistically correct translations are required. 95% is much less than is expected from us.

Right. Which is why we have a pluggable backend so you can implement your needs in plugin land. If you need patching to core, please list the places that need patching. If you need changes to the API, please do so, too.

All the kinds of hierarchically organized scopes (computed keys) and (optional) translation inheritance do not work in an environment with role separation. Inheritance and method overriding work in OOP. But they do not work for translations. A translation agency needs a comprehensive and flat list of strings to be translated. To decide whether, and where, to override a translation, they would need to analyse the application source code. Only a manually created and obligatory translation scope makes sense.

Maybe they don't make sense for the most part of translation agencies. That doesn't mean they don't make sense for the rest.

People wanted a simple and clean API, they explicitly did not want to mess with Gettext which was designed, let's face it, in 1994 for C. It feels old and awkward to many.

And since then adapted for 20 different programming languages. Bindings for dynamic languages, e.g. Python, are very nice. The same API can be used for Ruby too.

Yeah, still assuming a C'ish API and compilation stage though.

* pluralization rules are different for different languages, gettext uses a formula in a programming language per language for that
* some languages have 3 or 5 plural forms as opposed to 2 in English and German
* only a complete sentence can be pluralized, not a single word

Yup. The API covers that.

The question remains whether this really is the direction Rails is heading in. If so, we would contribute a solid Gettext-based I18n implementation that addresses the aforementioned issues. This, however, requires some breaking changes within the Rails core and a consensus about the necessity of them being addressed.

I believe this ship has sailed about 1 year ago. It's not the question anymore whether or not we want that API. The question is if everybody who has good ideas rolls up their sleeves and implements them *using* this common API. If you want to do that for Gettext I'm absolutely sure the community will welcome that with big applause.

For pluralization there could be a similar helper. This should work for all messages that do not contain a dot. I wonder if we can get rid of this limitation. Approaches that come to my mind:

1. Make the scope separator (dot) configurable. That might mean that Rails core should not continue using dots as separators (but instead just use Arrays for scopes).
2. Escape/unescape dots in the helper.
3. ?

Sounds complicated...

Btw I've just pushed some experiments with gettext'ish accessors on top of Rails I18n:

http://github.com/svenfuchs/i18n/tree/gettext

You might particularly want to look at the helper layer and the tests:

For a fullstack gettext support that uses the Rails I18n API there seem to be three things missing:

- complete the helpers (trivial)
- implement a gettext backend (anybody?)
- figure out a gettext'ish way to announce expected translations for computed keys

Any help and/or feedback would be appreciated!

Hi Sven

Nice.

I'd use this even without having gettext as the backend.

With helpers you refer to the ability to accept named arguments? Would you add that to the _ method or would you do it ruby-gettext style by extending String with a % method?

(I'd prefer _ to accept the arguments directly, saves polluting the String object)

Cheers, Lawrence

Hi Lawrence,

I'd use this even without having gettext as the backend.

heh :slight_smile:

With helpers you refer to the ability to accept named arguments? Would you add that to the _ method or would you do it ruby-gettext style by extending String with a % method?

No idea, I was just checking this out for some kind of proof of concept.

(I'd prefer _ to accept the arguments directly, saves polluting the String object)

Sure. I guess the question would be whether one wants to rebuild the exact gettext api with all of its C'ish methods (sgettext, pgettext, psgettext, ngettext, nsgettext, ...) or not.

I've continued playing with this stuff and added an experimental
Gettext backend:

http://github.com/svenfuchs/i18n/commit/fb7fcfff5e94510dbc1cb0b9b12a374c6828fb6f

It extends from the Simple backend, reads PO files using Masao Mutoh's
poparser [1] and simply loads the translations to the standard Hash
format. They can then be read both using the gettext'ish helper
methods I've played with recently as well as the standard I18n gem API.

     I18n.load_path = [File.dirname(__FILE__) + '/../locale/de.po']
     I18n.backend = I18n::Backend::Gettext.new
     I18n.locale = :de
     assert_equal 'Auto', _('car')
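For readers who want to see the idea without the gem, here is a rough stand-alone sketch in plain Ruby of loading PO content into a Hash and reading it through a `_` accessor. It is not the experimental backend and not Masao Mutoh's poparser; `parse_simple_po` is a made-up helper that only understands single-line `msgid`/`msgstr` pairs (no plurals, contexts, escapes or multi-line strings).

```ruby
# Parse single-line msgid/msgstr pairs into a Hash. The PO header entry
# (empty msgid) and untranslated entries (empty msgstr) are skipped.
def parse_simple_po(text)
  translations = {}
  msgid = nil
  text.each_line do |line|
    case line.strip
    when /\Amsgid\s+"(.*)"\z/
      msgid = Regexp.last_match(1)
    when /\Amsgstr\s+"(.*)"\z/
      msgstr = Regexp.last_match(1)
      translations[msgid] = msgstr unless msgid.nil? || msgid.empty? || msgstr.empty?
      msgid = nil
    end
  end
  translations
end

# Illustrative PO content standing in for locale/de.po.
DE_PO = <<~PO
  # de.po (illustrative)
  msgid ""
  msgstr ""

  msgid "car"
  msgstr "Auto"

  msgid "bike"
  msgstr ""
PO

CATALOGUE = parse_simple_po(DE_PO).freeze

# gettext-style accessor: return the translation, or the msgid itself
# when no translation exists.
def _(msgid)
  CATALOGUE.fetch(msgid, msgid)
end

_("car")  # => "Auto"
_("bike") # => "bike" (untranslated, falls back to the msgid)
```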

Please note that this is really just an experimental proof of concept
thing. I want to show that it's possible but don't have a real use for
that myself right now. So, any feedback or help with this is highly
appreciated!

Also, maybe this is a good time to take this discussion over to the
rails-i18n mailinglist [2] to work out implementation details? I'll
just post a follow-up over there.

Lemme also point out that there are efforts from other people to
improve Gettext integration for or use alongside with Rails I18n.
Maybe most notably:

- Masao Mutoh's gettext_rails [3]
- Sam Lown's i18n_gettext [4]
- Michael Grosser's fast_gettext [5]

[1] http://github.com/mutoh/gettext/blob/d36e97af7dc801af1b1ceb5a47450cab90ed078f/lib/gettext/poparser.rb
[2] http://groups.google.com/group/rails-i18n
[3] http://github.com/mutoh/gettext_rails
[4] http://github.com/ferblape/i18n_gettext
[5] http://github.com/grosser/fast_gettext

Hi,

Sven Fuchs wrote:

Hi Vladimir,

the most important question is, whether the core team would sacrifice *some* parts of humanize, pluralize and other string concatenation voodoo, especially in ActiveRecord to allow for 100% linguistically correct translations and smooth, enterprise-ready localization workflow.

Exactly which parts are you referring to?

The problem is best visible in the following line:

full_messages << attr_name + I18n.t('activerecord.errors.format.separator', :default => ' ') + message

The counterpart in Rails3 is

errors_with_attributes << (attribute.to_s.humanize + " " + error)

This makes the whole ActiveRecord validation subsystem impossible to use for linguistically correct validation messages.

There are probably more places where string concatenation is used, but *validation* causes trouble all the time.
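One way to picture the alternative to concatenation is a per-locale template that the translator controls as a whole. The following plain-Ruby sketch is only an illustration; `FULL_MESSAGE_FORMATS` and `full_message` are hypothetical names, not Rails API.

```ruby
# Hypothetical per-locale templates. Each is a complete, translatable
# pattern, so word order (or dropping the attribute) is the translator's call.
FULL_MESSAGE_FORMATS = {
  en: "%{attribute} %{message}",
  de: "%{message} (%{attribute})",
  plain: "%{message}"
}.freeze

# Build the full error message by interpolation instead of
# attribute + " " + message.
def full_message(locale, attribute, message)
  format(FULL_MESSAGE_FORMATS.fetch(locale), attribute: attribute, message: message)
end

full_message(:en, "Email", "is invalid")
full_message(:plain, "Email", "Please enter a valid email address.")
```

With concatenation, the attribute name is always glued in front of the message; with a template, a message such as "Please enter a valid email address." can stand on its own.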

The second issue with ActiveRecord validations is using custom messages. Gettext cannot be used here without a monkey patch that adds lambda support.

= Known Monkeys

* in our project we monkey patched as follows http://blog.geekq.net/2009/04/09/i18n-remove-validation-message-prefix/

* Masao Mutoh pointed out, that we do not need any monkey patching, if we use N_ from his gettext library because he has already monkey patched everything.

* more monkey patching from Masao Mutoh (mutoh on GitHub)

* following library also overrides the full_messages()

* pluralization rules are different for different languages, gettext uses a formula in a programming language per language for that
* some languages have 3 or 5 plural forms as opposed to 2 in English and German
* only a complete sentence can be pluralized, not a single word

Yup. The API covers that.

Did not find documentation for that in activesupport-2.3 / I18n. Now I've found some hints in the current Rails guide.

But people are still forced to do a lot of programming per language, like in

I believe this ship has sailed about 1 year ago. It's not the question anymore whether or not we want that API. The question is if everybody who has good ideas rolls up their sleeves and implements them *using* this common API. If you want to do that for Gettext I'm absolutely sure the community will welcome that with big applause.

Could you point to at least one complete backend implementation, that is entirely based on the Rails.I18n public API, without the need for extensive monkey patching?

Lemme also point out that there are efforts from other people to improve Gettext integration for or use alongside with Rails I18n. Maybe most notably:

- Masao Mutoh's gettext_rails
- Sam Lown's i18n_gettext
- Michael Grosser's fast_gettext

No, it is not possible to implement serious Gettext or serious internationalization on the basis of Rails I18n API, that is why

- gettext_rails is a pure monkey patch solution, without any usage of the mentioned API
- Michael Grosser's fast_gettext does not use the mentioned API in any way
- Sam Lown's i18n_gettext is a Rails plugin that simply wraps Masao's library and uses it as a fallback in addition to the Rails simple backend. i18n_gettext is not a standalone internationalization solution

Also, maybe this is a good time to take this discussion over to the rails-i18n mailinglist [2] to work out implementation details? I'll just post a follow-up over there.

No, I was discussing the ActiveRecord and Rails core issues here, not rails-i18n issues. If there is no interest here, I'll not bother with further mails.

= Conclusion

I've noticed, that

1. Internationalization is out of scope of the Rails core team
2. Those responsible for Rails.I18n do not grasp the important concepts of internationalization
3. All cool hackers who need real internationalization currently do this by monkey patching Rails, especially ActiveRecord

So I'll concentrate on doing the third until something changes on the first.

Best Regards and good-bye,

Vladimir

Vladimir,

Sven has tried repeatedly to get specifics out of you throughout this thread. Until this message there's been nothing but vague statements and rehashing of discussions which came to conclusion months ago. Sven and the rails-i18n team *do* grasp the issues that you've mentioned and have their i18n patches applied straight to rails. The guys on that list are responsible for directing the rails i18n effort and we listen to them and take their patches. You've clearly identified a few key points where the existing i18n api is lacking, and the ActiveRecord code is inflexible. Let's address those issues. The right place to do that is the rails-i18n list, and Sven and co are the ones to talk with.

Rather than throwing your toys out of the cot and feeling self-satisfied in the superior enterprisiness of your approach, you should try to work with Sven and the team to iron out all the issues with the existing api and let everyone benefit from the amount of work you've clearly put into this. If you're genuinely interested in enabling 'true gettext' support and removing the string concatenations in the validations API, then it will surely be a few small, targeted patches.

If on the other hand you're looking to dump wiki markup into mailing list threads and talk dismissively about the work of other programmers, then perhaps you should do that elsewhere.

We're all working towards the same goal here. Just because you've found some shortcomings doesn't mean that the people who did the existing work are evil or clueless.

Hi Vladimir,

The problem is best visible in the following line:

http://github.com/rails/rails/blob/09a976ac58d2d7637003b92d51637f59f647b53a/activerecord/lib/active_record/validations.rb#L207

full_messages << attr_name + I18n.t('activerecord.errors.format.separator', :default => ' ') + message

The counterpart in Rails3 is http://github.com/rails/rails/blob/bab2bfa69220ca1b6c7b56dccc79cf8e41245306/activemodel/lib/active_model/errors.rb#L65

errors_with_attributes << (attribute.to_s.humanize + " " + error)

Great, thanks for pointing that out. This is a known issue and I agree that we should get that fixed. There are a few options to do that and discussion about that has already started over at http://groups.google.com/group/rails-i18n

Please join in! We're keen on hearing your opinions.

The second issue with ActiveRecord validations is using custom messages. Gettext cannot be used here without a monkey patch that adds lambda support.

Integrating lambda support to the I18n API has been a request for a long time. It's also useful for localizing dates to rather funky rules and such.

I've worked with Clemens yesterday on integrating and polishing his contributions and pushed it to a branch: http://github.com/svenfuchs/i18n/commits/lambda

So this should then be possible:

   validates_format_of :account, :message => lambda { _("foo") }

Another option to solve this situation might be:

   validates_format_of :account, :message => gettext_noop("foo").to_sym
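Why a lambda helps here can be shown with a small plain-Ruby sketch: validation declarations are evaluated once when the class is loaded, while the locale is usually chosen per request. The `Demo` module, its catalogue and its `resolve` helper are hypothetical stand-ins, not the I18n or gettext API.

```ruby
module Demo
  # Hypothetical catalogue standing in for a gettext backend.
  CATALOGUES = { de: { "is too short" => "ist zu kurz" } }.freeze

  class << self
    attr_accessor :locale

    # gettext-style lookup in the current locale, falling back to the msgid.
    def _(msgid)
      CATALOGUES.fetch(locale, {}).fetch(msgid, msgid)
    end

    # Messages may be plain strings or callables; call the latter lazily.
    def resolve(message)
      message.respond_to?(:call) ? message.call : message
    end
  end
end

Demo.locale = :en                       # locale at class-load time
eager = Demo._("is too short")          # translated once, too early
lazy  = -> { Demo._("is too short") }   # translated when rendered

Demo.locale = :de                       # locale of the current request
Demo.resolve(eager) # => "is too short"
Demo.resolve(lazy)  # => "ist zu kurz"
```

The `gettext_noop(...)` variant addresses the same timing problem differently: it only marks the string for extraction and leaves the actual lookup to render time.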

This is also being discussed on the rails-i18n list. Please let us know about your opinion.

= Known Monkeys

* in our project we monkey patched as follows http://blog.geekq.net/2009/04/09/i18n-remove-validation-message-prefix/

* Masao Mutoh pointed out, that we do not need any monkey patching, if we use N_ from his gettext library because he has already monkey patched everything.

* more monkey patching from Masao Mutoh (mutoh on GitHub)

* following library also overrides the full_messages() http://github.com/yaroslav/russian/blob/7960596ede5159462c41d5dcd07b137953bf1b3d/lib/russian/active_record_ext/custom_error_message.rb

Great list, this is helpful. Thanks!

* pluralization rules are different for different languages, gettext uses a formula in a programming language per language for that
* some languages have 3 or 5 plural forms as opposed to 2 in English and German
* only a complete sentence can be pluralized, not a single word

Yup. The API covers that.

Did not find documentation for that in activesupport-2.3 / I18n. Now I've found some hints in the current Rails guide.

But people are still forced to do a lot of programming per language, like in http://github.com/yaroslav/russian/blob/7960596ede5159462c41d5dcd07b137953bf1b3d/lib/russian/backend/advanced.rb

Sure. Please distinguish the API from their implementations (backends). This is on purpose.

There are a few more backend implementations in Globalize2:

Please think about the I18n API in Rails as similar to the Rack API support. Rack allows for previously unseen extensibility and exchangeability of competing implementations of rather focussed features.

Even though Rails 2.x now supports that API, it doesn't leverage all of the features it provides. E.g. Rack routing/url_generation is not supported yet (will be there in Rails 3, afaik). Nobody's arguing Rails should stop supporting Rack for this reason, though. Similarly, the fact that Rails does not perfectly support all features required for proper I18n/L10n does not mean it should stop supporting the I18n API.

I believe this ship has sailed about 1 year ago. It's not the question anymore whether or not we want that API. The question is if everybody who has good ideas rolls up their sleeves and implements them *using* this common API. If you want to do that for Gettext I'm absolutely sure the community will welcome that with big applause.

Could you point to at least one complete backend implementation, that is entirely based on the Rails.I18n public API, without the need for extensive monkey patching?

If by "extensive monkey patching" you mean the bug/shortcoming in AR#full_messages then, no.

No, it is not possible to implement serious Gettext or serious internationalization on the basis of Rails I18n API, that is why

- gettext_rails is a pure monkey patch solution, without any usage of the mentioned API
- Michael Grosser's fast_gettext does not use the mentioned API in any way
- Sam Lown's i18n_gettext is a Rails plugin that simply wraps Masao's library and uses it as a fallback in addition to the Rails simple backend. i18n_gettext is not a standalone internationalization solution

Yeah, I know these are different approaches from what you have in mind.

I've listed them because I have received some angry private messages that were based on the perception I wouldn't know about or conceal or downplay these efforts. Just wanted to make sure people know that I don't, these are great contributions.

Thanks again!

Hi Sven,

I am happy to hear, that we totally agree on the important items.

We should tackle the problems, as you are describing:

1. use string interpolation instead of concatenation everywhere

> Great, thanks for pointing that out. This is a known issue and I agree that we should get that fixed. There are a few options to do that and discussion about that has already started over at http://groups.google.com/group/rails-i18n

2. introducing lambda support for error messages for maximum flexibility

> Integrating lambda support to the I18n API has been a request for a long time. It's also useful for localizing dates to rather funky rules and such.

3. discuss/improve the API. My opinion always was that supporting different *storage* backends is a good idea. Every developer is comfortable with YAML files. Others can use gettext-specific .mo or .po files (bypassing, as you pointed out, the dated compilation approach). On the other hand, some things should not be optional and cannot be plugged in through the API, but let's discuss this later, after we have succeeded with the first two things.

> Please think about the I18n API in Rails as similar to the Rack API support. Rack allows for previously unseen extensibility and exchangeability of competing implementations of rather focussed features.

This is an excellent example!

It illustrates two things: successful technical design and the importance of experience and solution maturity. Instead of reinventing the wheel, the Ruby community adopted a successful and proven solution from the Python world, where it is known under the name of WSGI, and further improved it.

I was talking about gettext the whole time not because I admire the obscure .mo file format of the GNU gettext utilities, but because the folks at GNU have already seen all the possible problems and addressed them in the design, the tools, and the best practices.

Best Regards and see you on the http://groups.google.com/group/rails-i18n shortly,

Vladimir