Beginnings of an i18n plugin-agnostic API for Rails edge

Hi all,

I just added a ticket (http://dev.rubyonrails.org/ticket/9726) with a very rough proposal of what an i18n plugin-agnostic API for Rails might look like. I've added details in the ticket, so would all those interested please read it and provide as much feedback as possible.

Many thanks,

Saimon Moore (Globalize for Rails plugin developer)

This is great! Kudos to you and Corp.

I hope this gets accepted and committed soon. Rails i18n doesn’t have to be core (as most of us agreed after previous discussions), but supporting plugins that do this is a great step forward. I guess it’s clear to everyone that plugins that monkeypatch or redefine core methods are not something we want to have in our new, Rails 2.0 apps.

Oh my. I so heart that.

> I just added a ticket (http://dev.rubyonrails.org/ticket/9726) with a very rough proposal of what an i18n plugin-agnostic API for Rails might look like. I've added details in the ticket, so would all those interested please read it and provide as much feedback as possible.

There are a few comments in the trac ticket which could do with addressing. But I'm a little confused by the functionality of String#t, that doesn't look like the no-op it should be?

I guess we should continue the discussion here instead of on Trac.

My points were:

- I'm ok with a translation hook that could be used by all translation/l10n plugins.
- String#t isn't clear enough and should be named String#translate or something more obvious (you can always use alias in your plugin if you want to).
- The translation method should be 100% transparent for Rails and not change the existing pluralization system.
- The translation method should be added to Symbol, Date, DateTime, Time.
- Namespacing should NOT be forced by the API by reserving the Symbol class for that effect.
- This API should wait until after 2.0.

Koz, the only way we have to localize Rails at the moment is to overwrite all places where content is localized in English only. For instance, I had to overwrite a lot of helpers to display a date, currency, ActiveRecord error properly in a different locale. Having a String#translate method for instance would create a simple hook we could use by extending this method. By default the #translate method would just return what was passed to it. However a plugin developer could decide to use localization files to translate the passed string.
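A minimal sketch of the hook described above, assuming a method named `translate` on String. The `DemoTranslations` module and its table are made up for illustration, and `Module#prepend` is a modern Ruby feature used here only to keep the example short; it is not what the patch in the ticket does.

```ruby
# Hypothetical default hook: a no-op that returns the string unchanged.
class String
  def translate(*_args)
    self
  end
end

# A plugin could later replace the no-op with a real lookup.
# DemoTranslations and its contents are invented for this example.
module DemoTranslations
  TABLE = { "is invalid" => "n'est pas valide" }.freeze

  def translate(*_args)
    TABLE.fetch(self, self) # fall back to the original string
  end
end

default_result = "is invalid".translate  # no plugin loaded yet
String.prepend(DemoTranslations)          # "install" the plugin
plugin_result = "is invalid".translate   # now looked up in TABLE
untranslated  = "is reserved".translate  # no entry, so returned as is

puts default_result
puts plugin_result
puts untranslated
```

Without a plugin the call costs one method dispatch and changes nothing, which is the transparency asked for in the list above.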

Each localization plugin deals differently with translation, interpolation, pluralization, namespacing, conventions, locales...

I don't think there is one 'proper' way of doing this kind of thing since it really depends on what you are trying to do. How many languages are you gonna support: one, 10, 50? Do you want to store the translation in the database or in files? Do you have a 'master' language or do you use translation keys? Do you want to organize your translation by namespaces? Etc.

Because of all these challenges, the guys from Globalize came up with this patch allowing a 'cleaner' hook into the core of Rails. My only worry is that this patch is too biased to their view of localization.

The whole reason behind me writing the GlobaLite plugin is that Globalize was way too complicated for what I needed to do (at least in 90% of my projects requiring localization). (GlobaLite automatically "translates" the Rails core into the user's language without worrying about database migration etc. However, GlobaliZe handles database content translation.)

To sum up my thoughts, I'm 200% for a localization API/hook so we wouldn't have to monkey patch the core every time you guys change something. But honestly, having a diversity of localization solutions is awesome and I would hate to see this new API/hook forcing people to do things a certain way and limiting other developers.

-Matt

> To sum up my thoughts, I'm 200% for a localization API/hook so we wouldn't have to monkey patch the core every time you guys change something. But honestly, having a diversity of localization solutions is awesome and I would hate to see this new API/hook forcing people to do things a certain way and limiting other developers.

This view is shared by all of us, and as you can see, the patch by no means forces you to do anything, since you can override the method completely. While the *default* implementation proposed here reflects some of the concepts from Globalize, it doesn't come near the issue of # of languages to support, etc. Those are separate issues unrelated to this patch.

The only question right now is how to handle the default implementation of things like pluralization for Rails internals. If you can come up with a simpler, better way to do it, please provide a patch we can look at. As we have seen, these discussions don't go too far without actual code to review.

Joshua

> This view is shared by all of us, and as you can see, the patch by no means forces you to do anything, since you can override the method completely.

That's exactly my point: I should not have to override anything. That's exactly what we are doing with Rails at the moment, and the whole purpose of this patch is to add a no-op method as mentioned by Koz.

I'm super busy at the moment and it's hard for me to find the time to write a no-op translation method covering all of 2.0

Since 2.0 PR is out, I would expect that until 2.0 final is released, the main focus would be on stabilizing and fixing the core, not adding new features. In the meantime, I'd love to hear from other people such as Chris Wanstrath from errtheblog (author of Gibberish), Jeremy Voorhis (Globalize), DHH (author of a simple localization plugin ;)) and other developers using localization plugins / gettext.

I don't see a big rush to get that api/hook implemented since most plugins work well, but maybe I'm wrong.

-Matt

Hi all,

Sorry for my delayed response (I was feeling a bit under the weather).

I'm very happy that our patch is receiving this kind of attention as it shows that a lot of people are seriously interested in i18n hooks appearing in rails core.

Also, the main reason we decided to add the patch (in its current state) was to generate exactly this kind of discussion, so it is great to see it happening. Whatever API eventually does get into Rails core, it should be one that has the blessing of all developers directly involved in i18n Rails development.

Here are my views on what has been said (I'm including things mentioned in trac)

- In the current patch, the translation method has been added to String, Date, DateTime & Time.

- The reason we decided to use String#t rather than String#translate was mainly one of convenience and readability. As this method will need to be added to many fixed strings throughout the Rails core codebase, we opted for something short and sweet which wouldn't add too much baggage to the Rails code itself. I think that String#t can quickly become a symbol that every Rails developer associates with translation. That said, I'm not really bothered either way. A String#translate method (with aliasing as required) would also be acceptable to me, but I think we should get a consensus on this between all interested parties. I don't think the API itself should do any aliasing, i.e. if the hook method is translate, any aliasing should be done by the API implementors and the Rails core code itself should use the unaliased method.

- The issue of pluralization is a tricky one. In order for a string to be translatable the entire string needs to be passed to String#t.

   i.e. consider the following (from action_view/helpers/active_record_helper.rb, line 134):

    header_message = "#{pluralize(count, 'error')} prohibited this #{(options[:object_name] || params.first).to_s.gsub('_', ' ')} from being saved"

There are two ways of ensuring that this string is translatable:

1. header_message = "#{pluralize(count, 'error')} prohibited this #{(options[:object_name] || params.first).to_s.gsub('_', ' ')} from being saved".t

2. header_message = "%d errors prohibited this #{(options[:object_name] || params.first).to_s.gsub('_', ' ')} from being saved".t(count)

Version 1 just passes the entire string to String#t (noop) and assumes the api implementer can handle the translation of this string.

Version 2 modifies the string making it clear that the string has a plural that has a dynamic value making it easy for api implementers to handle the pluralization of this string in other languages. Yes, this isn't a noop but it's of very little cost and provides a significant advantage over version 1.

- Simple strings that don't require pluralization are, in this patch, just returned as is (i.e. noop).

- The patch currently provides for simple/multiple interpolation (again another non-noop) but this by no means is required to be done by the String#t method for any string, rather it's mainly used when a string has a plural and avoiding the noop is not possible.

- The issue of namespacing is again a topic that is best discussed up front, which is partly the reason why we introduced the assumption in the patch. Namespacing of translatable strings has many obvious advantages. But it seems to me that a big win for namespaces is their use in exactly the situation we're in, i.e. making fixed strings in a framework translatable. It seems logical to me that all fixed strings in Rails core be automatically (and silently) added to a particular namespace in order to avoid clashing.

  In fact, the default implementation does not even have to physically add these strings to any particular namespace, it can just be assumed that for the api all rails core strings should be added to the 'rails' namespace by any implementors (Thus reducing clutter in rails core).

  As far as the actual syntax for namespaces is concerned, we like the idea of using a Symbol (or Array of symbols for nested namespaces) to denote the namespace, but again this is definitely open for discussion. We like the idea of the following mapping of arguments to String#t:

    String  -> Interpolated value
    Integer -> Plural value
    Symbol  -> Namespace

  But it's true that other i18n implementations use Symbol as the key for translations, so we should agree on some common ground. IMO, a translatable string is already a key, in the same regard as using a symbol for this purpose. You can always provide a distinct 'translation' in the language it's written in.

  I also think it's beneficial that the API be flexible enough to allow the use of Strings and/or Symbols as the translation key. Saying this, I don't see how this clashes with the use of Symbol as an argument to #t/#translate

  As I just mentioned, the handling of namespaces (which I think is an integral part of any i18n lib) could be left completely up to the implementor, but I think it's a good idea that the API forces a particular syntax. Otherwise you end up with lots of different types of syntax for namespaces, making it less flexible for users when switching between implementations. Again, let's discuss this further.

- I don't see any particular reason why this API should have to 'purposefully' wait until after Rails 2.0, though in fact I assumed this would be the case given the lateness of its appearance and the time required to get a consensus between interested parties and a clean, usable patch against the entire Rails codebase.

But I do think this is something to be pushed along as quickly as possible to perhaps make it in for an inevitable 2.0.x Rails release. Josh and I will certainly be promoting this.

Just to reiterate: this patch is meant for people to play with and discuss, NOT as a real proposed patch, i.e. we by no means consider it +1able yet.
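To make the argument mapping and "version 2" above concrete, here is a rough sketch of what a default String#t could look like. It is not the code from the ticket: the dispatch on argument class and the use of `String#%` for interpolation are assumptions, and the sketch makes no attempt to choose between singular and plural forms, which is left to implementers.

```ruby
class String
  # Hypothetical default implementation. Arguments are sorted by class:
  #   Integer -> plural count, String -> interpolated value,
  #   Symbol  -> namespace (ignored here; left to implementers)
  def t(*args)
    values = args.reject { |arg| arg.is_a?(Symbol) }
    return self if values.empty? # plain strings stay a no-op
    self % values                # format-style interpolation
  end
end

count   = 3
message = "%d errors prohibited this %s from being saved".t(count, "user", :rails)
plain   = "is invalid".t

puts message
puts plain
```

An implementer overriding this method would receive the untouched format string together with the count, which is what makes the plural form translatable in other languages.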

Finally, let me add my voice to the call for other prominent i18n/core rails developers to voice their opinions on this issue so we can get to a consensus as soon as possible.

Regards,

Saimon

Hey Saimon,

I hope you feel better. Thanks for clarifying a few details and thanks for generating this healthy discussion.

Since we are talking about the core localization, we obviously have to talk about its internationalization.

I have a problem when we use real language strings as translation keys. If you don't mind I'd like to discuss this issue.

Let's say we use English strings as the base for localization and our string looks like: 'There were problems with the following fields:'. The string itself becomes the translation key, or translation reference if you will. Each language will use this string as reference. The obvious problem occurs when the English is modified. If the string becomes 'There were problems with the following fields' or 'A problem occurred with the following fields:' then all the translations are broken. (In Globalize, I believe the recommended way is to create a dummy language and to use it as the base language.)

Which leads me to another question: should the Core be language agnostic? We could certainly use translation keys linked to no specific languages. For instance: active_record_default_error_messages.
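A small illustration of the breakage described above, using plain hashes as stand-in translation tables. The Dutch text and the `:error_header` key are invented for the example.

```ruby
# Stand-in translation tables; contents are made up for illustration.
DUTCH_BY_STRING = {
  "There were problems with the following fields:" =>
    "Er waren problemen met de volgende velden:"
}.freeze

DUTCH_BY_SYMBOL = {
  :error_header => "Er waren problemen met de volgende velden:"
}.freeze

original_copy = "There were problems with the following fields:"
revised_copy  = "There were problems with the following fields" # colon dropped

before = DUTCH_BY_STRING[original_copy] # found
after  = DUTCH_BY_STRING[revised_copy]  # nil: the edited sentence is a new key
stable = DUTCH_BY_SYMBOL[:error_header] # unaffected by the English edit

puts before.inspect
puts after.inspect
puts stable.inspect
```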

We already have something quite similar in AR. In this specific case we have a class variable containing a hash using symbols to reference each message:

    module ActiveRecord

      # Active Record validation is reported to and from this object, which is used by Base#save to
      # determine whether the object is in a valid state to be saved. See usage example in Validations.
      class Errors
        include Enumerable

        @@default_error_messages = {
          :inclusion => "is not included in the list",
          :exclusion => "is reserved",
          :invalid => "is invalid",
          :confirmation => "doesn't match confirmation",
          :accepted => "must be accepted",
          :empty => "can't be empty",
          :blank => "can't be blank",
          :too_long => "is too long (maximum is %d characters)",
          :too_short => "is too short (minimum is %d characters)",
          :wrong_length => "is the wrong length (should be %d characters)",
          :taken => "has already been taken",
          :not_a_number => "is not a number",
          :greater_than => "must be greater than %d",
          :greater_than_or_equal_to => "must be greater than or equal to %d",
          :equal_to => "must be equal to %d",
          :less_than => "must be less than %d",
          :less_than_or_equal_to => "must be less than or equal to %d",
          :odd => "must be odd",
          :even => "must be even"
        }

        # (rest of the class omitted)
      end
    end

It makes things really clean and easy to change and/or overwrite.
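As an illustration of why a symbol-keyed hash is easy to overwrite, here is a sketch that runs without Rails. `DemoErrors` is a stand-in for the class above, not the real ActiveRecord::Errors, and the Dutch messages are invented for the example.

```ruby
# Stand-in for the Rails class, so the example runs without Rails.
class DemoErrors
  @default_error_messages = {
    :blank    => "can't be blank",
    :too_long => "is too long (maximum is %d characters)"
  }

  class << self
    attr_accessor :default_error_messages
  end
end

# Hypothetical overrides supplied by an application or plugin.
DemoErrors.default_error_messages.update({
  :blank    => "mag niet leeg zijn",
  :too_long => "is te lang (maximaal %d tekens)"
})

blank_message    = DemoErrors.default_error_messages[:blank]
too_long_message = DemoErrors.default_error_messages[:too_long] % 20

puts blank_message
puts too_long_message
```

The keys stay the same whatever the language, so the code that looks up `:blank` never needs to change.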

To help people better understand the challenge, here is the list of all the hardcoded English strings that need to be overwritten to use Rails in a different language:

http://globalite.googlecode.com/svn/trunk/lang/rails/en-US.yml

Not a big deal really, but the problem is that we all have to monkey patch many core classes to translate the hardcoded strings. (It's not always as cleanly done as in ActiveRecord Errors.)

The advantage of using such a format is that you can create your own namespacing by 1/ using class variables, 2/ using constants (please don't), or 3/ namespacing your key: active_record_errors_key

My other concern is that if we do our job properly, I *think* we should be able to release a Rails version (without any required plugins) where "ONLY" the core should be available in multiple languages.

During the last RailsConf I had a discussion with David Heinemeier Hansson regarding the very same issue. If I remember correctly, the agreement was that the core team should only be responsible for the English localization of the Core; the rest would be up to the community, as is the case at the moment.

Why not make this API a bit more useful, provide for the needs of 80% of the foreign users out there and make the core dead simple to localize? I'm not talking about anything complicated: just drop/overwrite/load a file in Rails and all the core strings are now in your language.

Anything else can be handled by 3rd party plugins: UI localization/internationalization, database content, anything other than the core is up to the developer to deal with. But when it comes to the core, a simple file should be sufficient. (I'm not talking about supporting multiple languages at the same time.)

If you think about it, most Rails developers (I would guess above 98%) only use Rails in one language. It might be English, Spanish, French, Chinese or Swahili, their app is only in one language. I believe this is what we should focus on: how to make their life easier. Let's not focus our energy on the 2% managing content in multiple languages at the same time, let's keep things as simple as possible.

My suggested way of doing that is to push any hardcoded English strings into a class or a YAML file and load it at runtime. That's exactly the concept behind Simple Localization ( http://simple-localization.arkanis.de/ ), GlobaLite and Gibberish ( http://errtheblog.com/post/4396 ). Dead simple, already done by many plugins and not 'too' obtrusive.
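A rough sketch of the "load a file at runtime" idea, assuming a flat YAML file of key/value pairs. The YAML is inlined so the example is self-contained; the key names and the Dutch strings are invented and do not come from any of the plugins mentioned.

```ruby
require "yaml"

# Inline stand-in for a locale file dropped into the application
# (hypothetical keys and contents).
LOCALE_YAML = <<~YAML
  error_message_blank: mag niet leeg zijn
  error_message_taken: is al in gebruik
YAML

ENGLISH_DEFAULTS = {
  "error_message_blank"   => "can't be blank",
  "error_message_taken"   => "has already been taken",
  "error_message_invalid" => "is invalid"
}.freeze

# Loaded once at startup; keys missing from the file keep their English text.
CORE_STRINGS = ENGLISH_DEFAULTS.merge(YAML.safe_load(LOCALE_YAML)).freeze

def core_string(key)
  CORE_STRINGS.fetch(key.to_s)
end

puts core_string(:error_message_blank)
puts core_string(:error_message_invalid)
```

This covers a single language per application; nothing here handles switching locales per request.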

I'd be interested in knowing what you guys think of that suggestion and the key vs English string issue.

-Matt

p.s: if you are a developer using Rails in a language other than English, please let us know what you would expect when it comes to having Rails 'speak' your language.

I tried this approach, and if you do semi-decent copywriting for your messages and views, the keys in the string resources become obsolete on each commit. I found it to be pretty disastrous.

I fully agree Julian.

We are going to have an IRC meeting on Thursday 4 at 17:00 UTC / 19:00 UTC+2 (Barcelona) / 1pm EST at #globalize over on freenode.net to discuss the API. (at least 3 l10n rails plugin teams will try to find a compromise to provide Rails with an awesome localization API)

Feel free to join us.

-Matt

Guys,

Strings-as-keys are really just keys, like Symbols-as-keys are. (I.e.
they are used to lookup your translation.) You get into trouble when
you change your keys (something we know from elsewhere). But that's
not special to Strings-as-keys and Symbols-as-keys do not solve this.

> But that's not special to Strings-as-keys and Symbols-as-keys do not solve this.

Sven, I think you missed the point. The issue with Strings-as-keys is that most of the time you use real sentences as keys, usually in your 'base' language. That's where the problem is. It's not really a technical issue; it's just that you are giving developers a rope to hang themselves. I would expect most people would use the real strings they want to use in their app as keys. That's where the nightmare starts.

Using symbols/hash keys helps a lot since you are not tempted to enter the real thing, as they don't translate directly into real strings.

It's all about being agile and offering flexibility when it comes to copy. I never worked on a project where the copy was right the first time. We always went back and changed things around.

-Matt

Julian has noted dissatisfaction with string keys, but I suspect he didn't use a dummy language...

I'm a big believer that the "dummy language" approach is very effective. While the concept's name leaves a bit of a sour taste in your mouth ('dummy'), the concept provides a great ability to segregate the "programmer's" language from the "production" language. For example, as the programmer, I will write an error message like this and make it a translatable string:

  "The userid you supplied was not found"

The product owner laughs at me and says that sounds stoopid and tells me to replace it with "Your login was incorrect". That's when I whip out my dummy language bomb and tell him to translate it into his "Marketing" English if he doesn't like it.

So the dummy language serves as a language of last resort (if no translation is available) and as the "development" language used before the product owner comes in and cleans up (or wreaks havoc on...) the textual representation of the site. The translation keys rarely change, because the programmer has got better things to do than iteratively refine ("peaufiner") the text content he sees. And if they do change, it probably signals a need to review the translations anyway.

Once you solve the problem of changing keys, string keys have some nice properties: they are expressive and tend to convey enough meaning that a translator doesn't need to search around a lot to translate them. Of course you could do the same thing with symbols, but 30-character symbols start to look strange. To take an extreme example, which would you rather translate, "red" or :clr2? If that seems contrived, how about "The userid you supplied was not found" or :login_failure_23?

Finally, if the Rails Core team adopts i18n comprehensively, it is unlikely we will be satisfied with the totality of their choices of words, even when selected by a native (English) speaker. So the benefit of a dummy language also extends to the case where the programmer isn't even you: it's DHH et al. Or even a plugin developer.

All that to say that string keys + dummy language works VERY nicely. I manage a Rails site that renders content in about 25 different languages (including four versions of English) and the dummy language works great.

FYI, 'en-XX' is the private RFC 3066 code I use for my dummy language.
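The "language of last resort" behaviour described above can be sketched as a lookup that falls back to the key itself. The table layout, locale codes other than the dummy one, the French text and the method name are assumptions made for this example, not how Globalize stores translations.

```ruby
# Hypothetical translation tables, keyed by the programmer's
# dummy-language string.
TRANSLATIONS = {
  "en-US" => { "The userid you supplied was not found" => "Your login was incorrect" },
  "fr-FR" => { "The userid you supplied was not found" => "Votre identifiant est incorrect" }
}.freeze

# Look the key up in the requested locale; fall back to the key itself
# (the dummy-language text) when no translation exists.
def translate_with_fallback(key, locale)
  TRANSLATIONS.fetch(locale, {}).fetch(key, key)
end

key = "The userid you supplied was not found"
puts translate_with_fallback(key, "en-US") # the "Marketing" English
puts translate_with_fallback(key, "de-DE") # no table: the dummy text is shown
```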

+1 for string keys

How about all the benefits of string keys and all the performance of symbol keys?

    irb(main):001:0> :"red"
    => :red
    irb(main):002:0> :"The userid you supplied was not found"
    => :"The userid you supplied was not found"
    irb(main):003:0> :"#{1}-#{$$}"
    => :"1-3280"
    irb(main):004:0> :"#{$0}"
    => :"irb.bat"
    irb(main):005:0> :'#{1}-#{$$}'
    => :"\#{1}-\#{$$}"

You just need to prepend a string with : to turn it into a symbol.

Jeremy

Thanks Jeremy, but I'm not sure I understand your point. Are you suggesting to use symbolized strings?

-Matt

> Thanks Jeremy, but I'm not sure I understand your point. Are you suggesting to use symbolized strings?

I'm not really suggesting anything. I'm just demonstrating that string keys are no more expressive than symbol keys, assuming you use appropriate syntax. Symbols have better performance, so is there any reason to use string keys?

Jeremy

FYI people a bunch of us got together today to discuss the proposed API.

Globalize, Simple Localization and GlobaLite plugins were represented. We had a good discussion and tried to come to an agreement and a suggestion for the core team. We will meet one more time next week to finish the discussion.

For your information, we are keeping track of our progress and will share the result of our work with you more than likely towards the end of next week (that's if everything goes according to plan)

-Matt

What was the outcome? This discussion died out …

Interesting bit of trivia: this was first proposed 3 years ago, but in a less extensible way:

http://dev.rubyonrails.org/ticket/1089

It has also been proposed multiple times in the last 2 years. Every single discussion or patch died out. Let’s not let that happen again!

Sorry, we got together a few times, but I’m in the middle of the San Diego fires so we had to postpone our meetings. We are really close to a proposal, just working on the details.

-Matt