Should Inflector consider accented vowels when underscoring?

Not sure if this is a plugin functionality, so I am asking here.

By default Rails works well in Italian, except that it does not treat
accented vowels as letters when doing the underscore inflection.

<pre>

"FacoltàController".underscore

=> "facoltàcontroller"
</pre>

but it should be

<pre>

"FacoltàController".underscore

=> "facoltà_controller"
</pre>

Note the underscore after the 'à'. The fix is easy:

<pre>
--- a/activesupport/lib/active_support/inflector.rb
+++ b/activesupport/lib/active_support/inflector.rb
@@ -205,8 +205,8 @@ module ActiveSupport
     # "ActiveRecord::Errors".underscore # => active_record/errors
     def underscore(camel_cased_word)
       camel_cased_word.to_s.gsub(/::/, '/').
-        gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
-        gsub(/([a-z\d])([A-Z])/,'\1_\2').
+        gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/,'\1_\2').
+        gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/,'\1_\2').
         tr("-", "_").
         downcase
     end
</pre>
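For anyone who wants to try the patched expressions without touching ActiveSupport, here is a minimal standalone sketch. The method name `underscore_accented` is made up for this example, and it assumes Ruby 2.4 or later, where `downcase` handles non-ASCII letters.

```ruby
# Standalone copy of the patched method, for experimentation only.
# `underscore_accented` is a hypothetical name, not part of ActiveSupport.
def underscore_accented(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/, '\1_\2').
    gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/, '\1_\2').
    tr("-", "_").
    downcase # Unicode-aware only on Ruby 2.4 or later
end

puts underscore_accented("FacoltàController")
puts underscore_accented("ActiveRecord::Errors")
```

Note that the ranges À-Ù and à-ù cover U+00C0..U+00D9 and U+00E0..U+00F9, so they include the signs × and ÷ and leave out letters such as Ú or ü.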

Should it go into Rails core?

Hi Marcello,

The fix for Central European languages is relatively easy, but the
solution for languages in general is very hard. I'm a bit hesitant to
root for a solution that singles out a few languages and doesn't
support the rest.

If you really want to explore this, it might be a good idea to start a
plugin that replaces the inflector implementation and see if other
people are interested?

Manfred

A good starting point could be this plugin:

  http://github.com/rsl/stringex/tree/master

It already deals with accents to generate slugs.

Perhaps an incremental solution would be better -- add support for
Central European characters first and see if there's a need for a more
elaborate solution.

The only thing I have against this is that I believe all code should
be written in English (I'm Danish myself).

Cheers,
Daniel

This is what I do, but there are exceptions.

For example, I am writing ActiveRecord models for an Italian
university. There are no good English translations for many model and
attribute names, because these are technical terms used only in
Italy.

The same is true for many other applications.

In my opinion Ruby (and Rails) support for other languages is
excellent because you can do:

  # app/models/facoltà.rb
  class Facoltà < ActiveRecord::Base
    def unità
    end
  end

and it does the right thing! You can use UTF-8 characters everywhere
and it just works. I only changed 5 regular expressions to accept
[à-ùÀ-Ù] and now I can use accents in file, class, function, attribute,
and variable names.
This also works fine with RSpec and Cucumber. In RSpec you only need
to add

  $KCODE = 'u'

Cucumber just works. And you can also use accents in URLs (as
Wikipedia does).

It makes no sense to avoid non-English words when there is no good
English translation for them.

Marcello

> The fix for Central European languages is relatively easy, but the
> solution for languages in general is very hard. I'm a bit hesitant to
> root for a solution that singles out a few languages and doesn't
> support the rest.

This is already what Rails does. Support for English-like languages is
given by default (e.g. support for localization). For more complex
languages you need a plugin.

> If you really want to explore this, it might be a good idea to start a
> plugin that replaces the inflector implementation and see if other
> people are interested?

OK, I will try to do it. But this plugin will have the same problems
as the old localization plugins, because I need to monkey patch functions.

thanks,
Marcello

Hi José,
I want to do just the opposite, i.e. allow the use of accents
everywhere, slugs included. An example is

  http://it.wikipedia.org/wiki/Usabilit%C3%A0

This URL is ugly, but in your browser it looks like this:

  http://it.wikipedia.org/wiki/Usabilità

Rails can do it and it works. I have also written working tests with
RSpec and Cucumber. But you need to change 5 regular expressions to
accept [à-ùÀ-Ù] as word characters.

The nice thing is you need to use '%C3%A0' only to build the route

  map.resource :usabilità, :as => 'usabilit%C3%A0'

but in the tests (spec and features), the view, the url, the
controller, the models, the migrations, and the database you will use
'à'.
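To show where the '%C3%A0' comes from, here is a small sketch that uses Ruby's standard `uri` library to escape and unescape the accented name. It only illustrates the percent-encoding of the UTF-8 bytes; it is not how Rails builds the route.

```ruby
require 'uri'

# "à" is two bytes in UTF-8 (0xC3 0xA0), so it is escaped as %C3%A0.
readable = "Usabilità"
escaped  = URI.encode_www_form_component(readable)

puts escaped
puts URI.decode_www_form_component(escaped) == readable
```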

Marcello

Could we use the unicode character classes instead of hardcoding in the valid characters?

There is a way to make Regexps work with accents and other characters
(including Chinese ones) if we use some of Regexp's special character
classes:

\w Any word character (letter, number, underscore)
\W Any non-word character

What we want our regexp to catch in this case is: "word_character AND
NOT number AND NOT underscore".

Regexp does not support ANDs, but it supports ORs ([]) and NOTs ([^]).
We can translate the same sentence above as: "NOT (NOT word_character
OR number OR underscore)".

Which we can translate to regexp as: /[^\W\d\_]/

It actually works on Ruby 1.8.6 and you can check on Rubular:
http://rubular.com/regexes/5550
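A small sketch of the same double negation, for anyone trying it on a newer Ruby. As far as I know, on Ruby 1.9 and later `\w` only matches ASCII word characters, so the sketch also shows the Unicode letter property `\p{L}` as a possible replacement there. Both helper names are invented for this example.

```ruby
# "NOT (NOT word_character OR digit OR underscore)": a letter.
# Hypothetical helper; \w and \W are ASCII-only on Ruby 1.9 and later.
def ascii_letter?(char)
  char.match?(/\A[^\W\d_]\z/)
end

# Assumed modern equivalent: the Unicode "letter" property.
def unicode_letter?(char)
  char.match?(/\A\p{L}\z/)
end

%w[a Z 7 _ à 中].each do |char|
  puts "#{char}: ascii=#{ascii_letter?(char)} unicode=#{unicode_letter?(char)}"
end
```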

The only problem is that it does not check for uppercase or lowercase
characters, so this won't help in the underscore inflector.
But since you have to change other regexps too, this may help you: it
works for more cases, and it would also be faster than using Unicode
character classes (which might actually be the only way to solve the
underscore case).
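For the underscore case itself, a sketch with Unicode character classes could look like the following. It assumes Ruby 1.9 or later for the `\p{Lu}` (uppercase letter) and `\p{Ll}` (lowercase letter) properties and Ruby 2.4 or later for a Unicode-aware `downcase`. The name `underscore_unicode` is made up here and is not an ActiveSupport method.

```ruby
# Same structure as the inflector's underscore, but with Unicode
# properties instead of hardcoded ranges. Hypothetical name.
def underscore_unicode(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    gsub(/(\p{Lu}+)(\p{Lu}\p{Ll})/, '\1_\2').
    gsub(/([\p{Ll}\d])(\p{Lu})/, '\1_\2').
    tr("-", "_").
    downcase
end

puts underscore_unicode("FacoltàController")
puts underscore_unicode("ÉcoleController")
```

Scripts without an upper/lower case distinction are simply left alone, since neither property matches them.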

Regards,