Should Inflector consider accented vowels while underscoring?

marcello.nuccio · December 31, 2008, 4:47pm

I'm not sure whether this belongs in a plugin, so I am asking here.

By default Rails works well in Italian, except that the underscore inflection does not treat accented vowels as letters.

<pre>
"FacoltàController".underscore
=> "facoltàcontroller"
</pre>

but it should be

<pre>
"FacoltàController".underscore
=> "facoltà_controller"
</pre>

Note the underscore after the 'à'. The fix is easy:

<pre>
--- a/activesupport/lib/active_support/inflector.rb
+++ b/activesupport/lib/active_support/inflector.rb
@@ -205,8 +205,8 @@ module ActiveSupport
     #   "ActiveRecord::Errors".underscore # => active_record/errors
     def underscore(camel_cased_word)
       camel_cased_word.to_s.gsub(/::/, '/').
-        gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
-        gsub(/([a-z\d])([A-Z])/,'\1_\2').
+        gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/,'\1_\2').
+        gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/,'\1_\2').
         tr("-", "_").
         downcase
     end
</pre>
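To try the patched regular expressions outside Rails, here is a minimal standalone sketch. The method name `underscore_with_accents` is made up for illustration and is not part of ActiveSupport. It assumes a UTF-8 source file on Ruby 2.0 or later.

```ruby
# Hypothetical standalone helper mirroring the patched method above;
# it is NOT the ActiveSupport implementation itself.
def underscore_with_accents(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    # Split an uppercase run from a following capitalized word.
    gsub(/([A-ZÀ-Ù]+)([A-ZÀ-Ù][a-zà-ù])/, '\1_\2').
    # Insert "_" between a lowercase letter (or digit) and an uppercase letter.
    gsub(/([a-zà-ù\d])([A-ZÀ-Ù])/, '\1_\2').
    tr("-", "_").
    downcase
end

puts underscore_with_accents("FacoltàController")    # facoltà_controller
puts underscore_with_accents("ActiveRecord::Errors") # active_record/errors
```

The ranges `À-Ù` and `à-ù` only cover part of the Latin-1 block, so this handles Italian vowels but not accented letters in general.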

Should it go to core rails?

Manfred_Stienstra · January 3, 2009, 10:54am

Hi Marcello,

The fix for Central European languages is relatively easy, but the
solution for languages in general is very hard. I'm a bit hesitant to
root for a solution that singles out a few languages and doesn't
support the rest.

If you really want to explore this, it might be a good idea to start a
plugin that replaces the inflector implementation and see if other
people are interested?

Manfred

Jose_Valim · January 3, 2009, 3:47pm

A good starting point could be this plugin:

http://github.com/rsl/stringex/tree/master

It already deals with accents to generate slugs.

Daniel_Schierbeck · January 3, 2009, 5:30pm

Perhaps an incremental solution would be better -- add support for Central European characters first and see if there's a need for a more elaborate solution.

The only thing I have against this is that I believe all code should be written in English (I'm Danish myself).

Cheers, Daniel

marcello.nuccio · January 4, 2009, 12:26pm

This is what I do, but there are exceptions.

For example, I am writing ActiveRecord models for an Italian university. There are no good English translations for many model and attribute names, because these are technical terms used only in Italy.

The same is true for many other applications.

In my opinion Ruby (and Rails) support for other languages is excellent because you can do:

# app/models/facoltà.rb
class Facoltà < ActiveRecord::Base
  def unità
  end
end

and it does the right thing! You can use UTF-8 characters everywhere and it just works. I only changed 5 regular expressions to accept [à-ùÀ-Ù], and now I can use accents in file, class, function, attribute, and variable names. This also works fine with RSpec and Cucumber. In RSpec you only need to add

$KCODE = 'u'

Cucumber just works. And you can also use accents in URLs (as Wikipedia does...).

It makes no sense to avoid non-English words when there is no good English translation for them.

Marcello

marcello.nuccio · January 4, 2009, 1:05pm

> The fix for Central European languages is relatively easy, but the
> solution for languages in general is very hard. I'm a bit hesitant to
> root for a solution that singles out a few languages and doesn't
> support the rest.

This is already what Rails does. Support for English-like languages is given by default (e.g. support for localization). For more complex languages you need a plugin.

> If you really want to explore this, it might be a good idea to start a
> plugin that replaces the inflector implementation and see if other
> people are interested?

OK, I will try to do it. But this plugin will have the same problems as the old localization plugins, because I need to monkey-patch functions.

thanks, Marcello

marcello.nuccio · January 4, 2009, 1:22pm

Hi José, I want to do just the opposite, i.e. allow the use of accents everywhere, slug included. An example is

Usabilità - Wikipedia

This URL looks ugly, but in your browser it is displayed like this:

Usabilità - Wikipedia

Rails can do it and it works. I have also written working tests with RSpec and Cucumber. But you need to change 5 regular expressions to accept [à-ùÀ-Ù] as word characters.

The nice thing is that you need to use '%C3%A0' only to build the route:

map.resource :usabilità, :as => 'usabilit%C3%A0'

but in the tests (spec and features), the view, the url, the controller, the models, the migrations, and the database you will use 'à'.
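For reference, '%C3%A0' is simply the percent-encoded form of the two UTF-8 bytes of 'à'. A small sketch with Ruby's standard `uri` library shows the round trip. The variable names are illustrative only, and a UTF-8 source file is assumed.

```ruby
require 'uri'

resource = "usabilità"

# Percent-encode the name the way it would appear in a raw URL.
encoded = URI.encode_www_form_component(resource)
# Decode it back to the accented form used everywhere else.
decoded = URI.decode_www_form_component(encoded)

puts encoded # usabilit%C3%A0
puts decoded # usabilità

# The escape sequence is just the UTF-8 bytes of "à" in hex.
puts "à".bytes.map { |b| format("%%%02X", b) }.join # %C3%A0
```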

Marcello

Will_Bryant · January 5, 2009, 12:43am

Could we use the unicode character classes instead of hardcoding in the valid characters?
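As a rough sketch of what that could look like: Ruby 1.9 and later support Unicode property classes such as `\p{Lu}` (uppercase letter) and `\p{Ll}` (lowercase letter) in regular expressions. The method name `underscore_unicode` below is hypothetical, and the example assumes Ruby 2.4 or later so that `downcase` also lowercases non-ASCII letters.

```ruby
# Hypothetical variant of underscore using Unicode property classes
# instead of hardcoded character ranges. Not the ActiveSupport code.
def underscore_unicode(camel_cased_word)
  camel_cased_word.to_s.gsub(/::/, '/').
    # \p{Lu} = any uppercase letter, \p{Ll} = any lowercase letter.
    gsub(/(\p{Lu}+)(\p{Lu}\p{Ll})/, '\1_\2').
    gsub(/([\p{Ll}\d])(\p{Lu})/, '\1_\2').
    tr("-", "_").
    downcase
end

puts underscore_unicode("FacoltàController") # facoltà_controller
puts underscore_unicode("ÉcoleNormale")      # école_normale
puts underscore_unicode("HTMLParser")        # html_parser
```

This only helps scripts that distinguish upper and lower case; for scripts without case there is no word boundary for these patterns to find.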

Jose_Valim · January 5, 2009, 1:58pm

There is a way to make regexps work with accents and other characters (including Chinese ones) by using some of Regexp's special character classes:

\w Any word character (letter, number, underscore)
\W Any non-word character

What we want our regexp to catch in this case is: "word_character AND NOT number AND NOT underscore".

Regexp does not support ANDs, but it supports ORs () and NOTs ([^]). We can translate the same sentence above as: "NOT (NOT word_character OR number OR underscore)".

Which we can translate to regexp as: /[^\W\d\_]/

It actually works on Ruby 1.8.6 and you can check on Rubular: http://rubular.com/regexes/5550

The only problem is that it does not distinguish uppercase from lowercase characters, so it won't help in the underscore inflector. But since you have to change other regexps too, it may help you there: it works for more cases and would also be faster than using Unicode character classes (which might actually be the only way to solve the underscore case).
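One caveat when trying this on newer Rubies: from Ruby 1.9 on, `\w` matches only ASCII word characters, so the class above no longer picks up accented letters there. The sketch below contrasts it with `\p{L}` (any Unicode letter), which is a close but not identical substitute. The constant and method names are made up for illustration.

```ruby
# The class from the post: word character, minus digits and underscore.
# On Ruby 1.9+ \w is ASCII-only, so this matches ASCII letters only.
ASCII_LETTER = /[^\W\d_]/

# Assumption: \p{L} (any Unicode letter) as the closest modern equivalent.
UNICODE_LETTER = /\p{L}/

# Collect every character of str that matches the given pattern.
def letters(str, pattern)
  str.scan(pattern).join
end

puts letters("Facoltà_2009", ASCII_LETTER)   # Facolt
puts letters("Facoltà_2009", UNICODE_LETTER) # Facoltà
```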

Regards,
