Should Inflector consider accented vowels while underscoring?

I'm not sure whether this belongs in a plugin, so I am asking here.

By default, Rails works well in Italian, except that it does not treat accented vowels as letters during underscore inflection.

<pre>

"FacoltàController".underscore

=> "facoltàcontroller" </pre>

but it should be

<pre>

"FacoltàController".underscore

=> "facoltà_controller" </pre>

Note the underscore after the 'à'. The fix is easy:

<pre>
--- a/activesupport/lib/active_support/inflector.rb
+++ b/activesupport/lib/active_support/inflector.rb
@@ -205,8 +205,8 @@ module ActiveSupport
     # "ActiveRecord::Errors".underscore # => active_record/errors
     def underscore(camel_cased_word)
       camel_cased_word.to_s.gsub(/::/, '/').
-        gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
-        gsub(/([a-z\d])([A-Z])/,'\1_\2').
+        gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/,'\1_\2').
+        gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/,'\1_\2').
         tr("-", "_").
         downcase
     end
</pre>
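To try the patched expressions without touching ActiveSupport, here is a minimal standalone sketch. The method name `accented_underscore` is made up for illustration; the regular expressions are the ones from the diff above. It assumes a Ruby version where source files and string literals are UTF-8.

```ruby
# Hypothetical standalone copy of the patched method; not part of Rails.
def accented_underscore(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    # Split an uppercase run from a following capitalized word, e.g. "HTMLParser".
    gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/, '\1_\2').
    # Insert "_" between a lowercase letter (or digit) and an uppercase letter.
    gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/, '\1_\2').
    tr("-", "_").
    downcase
end

puts accented_underscore("FacoltàController")    # prints "facoltà_controller"
puts accented_underscore("ActiveRecord::Errors") # prints "active_record/errors"
```

Note that the ranges `À-Ù` and `à-ù` are code point ranges, so they also include a couple of non-letters such as `×` and `÷`.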

Should this go into Rails core?

Hi Marcello,

The fix for Central European languages is relatively easy, but the
solution for languages in general is very hard. I'm a bit hesitant to
root for a solution that singles out a few languages and doesn't
support the rest.

If you really want to explore this, it might be a good idea to start a
plugin that replaces the inflector implementation and see if other
people are interested?

Manfred

A good starting point could be this plugin:

  http://github.com/rsl/stringex/tree/master

It already deals with accents to generate slugs.

Perhaps an incremental solution would be better -- add support for Central European characters first and see if there's a need for a more elaborate solution.

Only thing I have against this is that I believe all code should be written in English (I'm Danish myself).

Cheers, Daniel

This is what I do, but there are exceptions.

For example, I am writing ActiveRecord models for an Italian university. There are no good English translations for many model and attribute names, because these are technical terms used only in Italy.

The same is true for many other applications.

In my opinion Ruby (and Rails) support for other languages is excellent because you can do:

  # app/models/facoltà.rb
  class Facoltà < ActiveRecord::Base
    def unità
    end
  end

and it does the right thing! You can use UTF-8 characters everywhere and it just works. I only changed 5 regular expressions to accept [à-ùÀ-Ù] and now I can use accents in file, class, function, attribute, and variable names. This also works fine with RSpec and Cucumber. In RSpec you only need to add

  $KCODE = 'u'

Cucumber just works. And you can also use accents in URLs (as Wikipedia does...).

It makes no sense to avoid non-English words when there's no good English translation for them.

Marcello

The fix for Central European languages is relatively easy, but the
solution for languages in general is very hard. I'm a bit hesitant to
root for a solution that singles out a few languages and doesn't
support the rest.

This is already what Rails does. Support for English-like languages is given by default (e.g. support for localization). For more complex languages you need a plugin.

If you really want to explore this, it might be a good idea to start a
plugin that replaces the inflector implementation and see if other
people are interested?

OK, I will try to do it. But this plugin will have the same problems as the old localization plugins, because I need to monkey-patch functions.

thanks, Marcello

Hi José, I want to do just the opposite, i.e. allow the use of accents everywhere, slug included. An example is

  Usabilità - Wikipedia

This URL is ugly, but in your browser it looks like this:

  Usabilità - Wikipedia

Rails can do it and it works. I have also written working tests with RSpec and Cucumber. But you need to change 5 regular expressions to accept [à-ùÀ-Ù] as word characters.

The nice thing is you need to use '%C3%A0' only to build the route

  map.resource :usabilità, :as => 'usabilit%C3%A0'

but in the tests (spec and features), the view, the url, the controller, the models, the migrations, and the database you will use 'à'.
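For reference, the '%C3%A0' in the route is just the percent-encoded UTF-8 bytes of 'à'. A small sketch using Ruby's standard `uri` library shows where it comes from (the variable names are only illustrative, and this uses form-component encoding, which gives the same result for this word):

```ruby
require 'uri'

segment = "usabilità"

# Percent-encode the UTF-8 bytes of every non-ASCII character.
encoded = URI.encode_www_form_component(segment)
puts encoded                                   # prints "usabilit%C3%A0"

# Decoding gives back the accented form that the browser displays.
decoded = URI.decode_www_form_component(encoded)
puts decoded == segment                        # prints "true"
```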

Marcello

Could we use the Unicode character classes instead of hardcoding the valid characters?
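A rough sketch of what that could look like, assuming Ruby 2.4 or later (where regexps understand the `\p{Lu}` / `\p{Ll}` Unicode properties and `downcase` handles non-ASCII letters). The method name `unicode_underscore` is hypothetical and this is not what Rails ships:

```ruby
# Hypothetical variant of underscore using Unicode properties
# instead of hardcoded ranges such as À-Ù.
def unicode_underscore(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    # \p{Lu} = any uppercase letter, \p{Ll} = any lowercase letter.
    gsub(/(\p{Lu}+)(\p{Lu}\p{Ll})/, '\1_\2').
    gsub(/([\p{Ll}\d])(\p{Lu})/, '\1_\2').
    tr("-", "_").
    downcase
end

puts unicode_underscore("FacoltàController") # prints "facoltà_controller"
puts unicode_underscore("ÉcoleController")   # prints "école_controller"
```

This only helps scripts that have upper and lower case at all; it does nothing for languages without that distinction.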

There is a way to make regexps work with accents and other characters (including Chinese ones) if we use some of Regexp's special characters:

  \w  Any word character (letter, number, underscore)
  \W  Any non-word character

What we want our regexp to catch in this case is: "word_character AND NOT number AND NOT underscore".

Regexp does not support ANDs, but it supports ORs (|, or listing several items inside a character class) and NOTs ([^]). By De Morgan's law, we can rewrite the sentence above as: "NOT (NOT word_character OR number OR underscore)".

Which we can translate to regexp as: /[^\W\d\_]/

It actually works on Ruby 1.8.6 and you can check on Rubular: http://rubular.com/regexes/5550
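One caveat if you try this on newer Rubies: from Ruby 1.9 on, `\w` and `\W` are ASCII-only, so the same expression no longer matches accented letters. A minimal sketch of an equivalent "letters only" class for those versions, using the POSIX bracket `[[:alpha:]]` (the constant names are made up for illustration):

```ruby
# On Ruby 1.9+, \w only covers [a-zA-Z0-9_], so "à" counts as \W here.
ASCII_LETTER = /[^\W\d_]/

# [[:alpha:]] is Unicode-aware: letters only, no digits, no underscore.
ANY_LETTER = /[[:alpha:]]/

puts ("à" =~ ASCII_LETTER).inspect         # prints "nil"
puts "Facoltà_2".scan(ANY_LETTER).join     # prints "Facoltà"
```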

The only problem is that it does not distinguish uppercase from lowercase characters, so it won't help in the underscore inflector. But since you have to change other regexps too, this may help you cover more cases, and it would also be faster than using Unicode character classes (which might actually be the only way to solve the underscore case).

Regards,