feedback on a few ActiveSupport::Multibyte patches

Norman_Clarke1 · May 10, 2010, 6:01pm

Hi all,

In response to Rodrigo Rosas's message about mb_chars.upcase not giving the expected result on 1.9, I've done some work in a fork to make String#mb_chars always return an instance of a proxy class, both with Ruby 1.8 and Ruby 1.9. The end result of the patch is (hopefully) to make Rails' multibyte functionality behave the same way in 1.8.7 and 1.9.x.

http://github.com/norman/rails/tree/multibyte

Basically, the problem is that with current edge Rails and 1.9.x, `"café".mb_chars.upcase` will return "CAFé" rather than the expected "CAFÉ".
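For readers on a current Ruby, the difference can be reproduced without Rails. This is only a sketch and assumes Ruby 2.4 or later, where the native `String#upcase` became Unicode-aware and accepts an `:ascii` option; the `:ascii` form approximates what 1.9's native `upcase` did.

```ruby
# encoding: utf-8

word = "café"

# ASCII-only case mapping, similar to what Ruby 1.9's String#upcase did:
# only a-z are touched, so the accented character is left alone.
ascii_only = word.upcase(:ascii)

# Full Unicode case mapping (the default on Ruby 2.4+), which is the
# result mb_chars.upcase is expected to give.
unicode_aware = word.upcase

puts ascii_only
puts unicode_aware
```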

In my changes, the proxy class leaves some methods undefined for 1.9 because they have a native equivalent, but redefines a few others because either they are buggy or, like String#upcase, don't have the same behavior as AS::Multibyte::Chars.
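The general shape of such a proxy can be sketched in plain Ruby. This is not the code from the branch: the class name `CharsSketch` and the three-entry `UPCASE_MAP` are hypothetical, made up for illustration, and a real implementation would consult the Unicode database instead of a hand-written table.

```ruby
# encoding: utf-8

# Hypothetical sketch of a string proxy; not the ActiveSupport implementation.
class CharsSketch
  # Tiny illustrative table (an assumption for this example only).
  UPCASE_MAP = { "é" => "É", "ç" => "Ç", "ã" => "Ã" }.freeze

  def initialize(string)
    @wrapped = string
  end

  # Redefined in the proxy: map each character through the table and
  # fall back to ASCII-only upcasing for everything else.
  def upcase
    mapped = @wrapped.each_char.map do |char|
      UPCASE_MAP.fetch(char) { char.tr("a-z", "A-Z") }
    end
    self.class.new(mapped.join)
  end

  def to_s
    @wrapped
  end

  # Left undefined in the proxy: anything else is forwarded to the native
  # String, and String results are wrapped again so calls can be chained.
  def method_missing(name, *args, &block)
    return super unless @wrapped.respond_to?(name)

    result = @wrapped.public_send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped.respond_to?(name, include_private) || super
  end
end

puts CharsSketch.new("ação").upcase.to_s
puts CharsSketch.new("café").reverse.to_s
```

Methods with a good native equivalent (here `reverse` and `length`) fall through to the wrapped string, while `upcase` is overridden because the native version does not behave the way the proxy should.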

Additionally, I refactored all of the Unicode support in ActiveSupport into a new module, ActiveSupport::Multibyte::Unicode. This makes some useful functionality like UTF-8 normalization/composition/decomposition easier to reuse since it's no longer bound to the ActiveSupport::Multibyte::Chars class.
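As a rough illustration of what composition and decomposition mean, here is a sketch that uses Ruby's own `String#unicode_normalize` (available in Ruby 2.2 and later) rather than the ActiveSupport module described above, so it runs without Rails. It shows the same character in its composed and decomposed forms.

```ruby
# encoding: utf-8

# "é" as a single precomposed codepoint (U+00E9).
composed = "caf\u00E9"

# NFD splits it into "e" followed by a combining acute accent (U+0301).
decomposed = composed.unicode_normalize(:nfd)

# NFC puts the pieces back together.
recomposed = decomposed.unicode_normalize(:nfc)

puts composed.codepoints.inspect
puts decomposed.codepoints.inspect
puts recomposed == composed
```

The two forms look identical on screen but compare as different strings, which is why normalizing before comparison or storage is useful.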

I'd be very grateful for any feedback.

Regards,

Norman

Rodrigo_Rosenfeld_R1 · May 11, 2010, 7:51pm

Norman, I checked out your multibyte branch but it is not working for me. Here is what I did:

$ cd ~/src/rails
$ git remote add norman http://github.com/norman/rails.git
$ git remote update
$ git checkout norman/multibyte -b multibyte
$ rvm ruby-head
$ gem install thor bundle
$ ruby bin/rails ~/temp/multibyte --dev
$ cd ~/temp/multibyte
$ script/rails c
> 'ação'.mb_chars.upcase # yields 'AO' instead of 'AÇÃO'
> 'ação'.mb_chars.class # yields ActiveSupport::Multibyte::Chars - OK

Any ideas?

Also, from the diffs between master and your branch I can see that there is a lot of multibyte code in ActiveSupport. Maybe it could be moved into an external gem that AS would depend on. That would make AS cleaner and would allow testing other gems as proxies... For instance, when running on JRuby, a different approach would probably be better, since strings in Java are Unicode and String#toUpperCase() would already give the expected results... Any thoughts?

Thank you for your effort in fixing this multibyte issue for Ruby 1.9 on Rails,

Rodrigo.

Norman_Clarke1 · May 11, 2010, 8:24pm

Norman, I checked out your multibyte branch but it is not working for me. Here is what I did: <...> Any ideas?

No, not off the top of my head. But I'll retrace your steps and see if I get the same problems. Thanks for looking into it and getting back to me with your detailed feedback.

Also, from the diffs between master and your branch I can see that there is a lot of multibyte code in ActiveSupport. Maybe it could be moved into an external gem that AS would depend on. That would make AS cleaner and would allow testing other gems as proxies... For instance, when running on JRuby, a different approach would probably be better, since strings in Java are Unicode and String#toUpperCase() would already give the expected results... Any thoughts?

I don't think there's "a lot" of multibyte code in ActiveSupport; it's around 1000 lines, or roughly twice the size of inflector. Maintaining it in a separate gem would mean more project management overhead for something that doesn't usually see a lot of developer activity and is going to be required anyway. Also, it's very easy to write your own proxy class if you want, for example, to use one that relies on Java's native string handling for JRuby. I wouldn't be opposed if the Rails team wanted to do that, but I just don't see any significant benefit.

-Norman

Norman_Clarke1 · May 12, 2010, 4:02pm

I just checked this out and it is working correctly for me. I'm not sure where things are going wrong for you, but I'm unable to reproduce your problem. Here's more or less what I just did:

cd ~/work/rails
git checkout master
git pull origin master
git checkout multibyte
git rebase master
cd activesupport
rvm ruby-head
rake test # this pukes because of recent changes to String
rvm 1.9.2
rake test # segfault
rvm 1.9.1
rake test # ok, all tests pass.
cd ..
ruby bin/rails /tmp/mb --dev
cd /tmp/mb

Now create temp.rb with the following contents:

# encoding: utf-8
puts 'ação'.mb_chars.upcase

ruby script/rails runner temp.rb # works
rvm ruby-head
bundle install
ruby script/rails runner temp.rb # also works
rvm ree
ruby script/rails runner temp.rb # also works

These are the Rubies I have installed (I'm on 64-bit Snow Leopard)

$ rvm list

rvm Rubies

jruby-1.4.0 [ [x86_64-java] ]
ree-1.8.7-2010.01 [ x86_64 ]
ruby-1.8.6-p399 [ x86_64 ]
ruby-1.9.1-p243 [ x86_64 ]
ruby-1.9.1-p378 [ x86_64 ]
ruby-1.9.2-preview1 [ x86_64 ]
=> ruby-head [ x86_64 ]

System Ruby

system [ x86_64 i386 ppc ]

-Norman

Rodrigo_Rosenfeld_R1 · May 12, 2010, 11:59pm

On 12-05-2010 13:02, Norman Clarke wrote:

Rodrigo_Rosenfeld_R1 · May 13, 2010, 7:41pm

Using this approach (a runner with a file specifying the encoding), your branch works on my machine at work too.

But at home, I can run 'ação'.mb_chars.upcase in the Rails console and it works there as well. At work, 'ação'.mb_chars yields 'ao' in the console. Any idea why the two environments are not consistent?

Thanks,

Rodrigo.

Norman_Clarke1 · May 13, 2010, 8:01pm

If you're trying it on the console, it's probably a difference in the way your consoles are set up to handle UTF-8 characters. I think the only really reliable way to test this is by putting the text in a file.
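One way to compare two environments is to run the same small script in each and look at what Ruby actually received. The following is only a diagnostic sketch; the variable names are arbitrary. If the console is dropping non-ASCII input, as the 'ao' result suggests, the same checks typed into that console would report fewer bytes and characters than the file-based run.

```ruby
# encoding: utf-8

text = "ação"

# Collect the facts that usually differ between a correctly and an
# incorrectly configured environment.
report = {
  source_encoding:  __ENCODING__.name,
  string_encoding:  text.encoding.name,
  valid:            text.valid_encoding?,
  bytes:            text.bytesize,   # 6 when the text arrived as UTF-8
  chars:            text.length,     # 4 when the encoding is understood
  default_external: Encoding.default_external.name
}

report.each { |key, value| puts "#{key}: #{value}" }
```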
