UTF-8 String.strip bug (and several other methods)

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir" but "Café ".strip => "Caf\303\251"

In fact, strip() doesn't work if the last printable character is accented. Surprisingly, " écologie".strip works fine.

I've tried to dig deeper into the active_support multibyte source code but didn't find any solution.

Any help?

Strange, I get:

$ rails console
Loading development environment (Rails 3.0.3)
ruby-1.9.2-p0 > "Café Noir ".strip
 => "Café Noir"
ruby-1.9.2-p0 > "Café ".strip
 => "Café"

Which Ruby are you using?

Colin

I don't see the 'é' in your code snippet. Did you try with real accented chars?

I'm using Ruby Enterprise Edition 1.8.x. I didn't think about a possible bug in Ruby itself. I might try a more recent 1.8 version or REE... I don't want to switch to 1.9 just for such a small (but annoying) problem...

I don't see the 'é' in your code snippet. Did you try with real accented chars?

Is this in reply to my response? You have not quoted anything and have changed the subject line so gmail has not linked up the thread.

If so, I don't understand when you say you do not see the accented char. Copying from my previous post:

$ rails console
Loading development environment (Rails 3.0.3)
ruby-1.9.2-p0 > "Café Noir ".strip
 => "Café Noir"
ruby-1.9.2-p0 > "Café ".strip
 => "Café"

I see the accented char é.

It is interesting, though, that the é in your mail looks different to the one here, even though I have just copied and pasted it from your email into mine. It does look like yours though when I paste it into the ruby console. What happens if you copy it from here and use it in your console?

I'm using Ruby Enterprise Edition 1.8.x. I didn't think about a possible bug in Ruby itself. I might try a more recent 1.8 version or REE... I don't want to switch to 1.9 just for such a small (but annoying) problem...

This is the result in 1.8.7:

$ ruby script/console
Loading development environment (Rails 2.3.2)
ruby-1.8.7-p302 > "Café Noir ".strip
 => "Café Noir"
ruby-1.8.7-p302 > "Café ".strip
 => "Café"
ruby-1.8.7-p302 >

Of course maybe your response was not to my mail at all, in which case I have been wasting my time.

Colin

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir" but "Café ".strip => "Caf\303\251"

While it may not look pretty, this is accurate if you are using UTF-8: é is 0xC3 0xA9 in UTF-8, which is 0o303 0o251 in octal. I'm not sure why inspect is choosing to show the octal escape codes, but your string does contain the correct bytes (maybe some heuristic that tries to determine whether the string is UTF-8 and should be displayed as such, or whether it just contains random binary gunk).
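To check the byte arithmetic above, here is a minimal sketch. The helper name octal_escapes is made up for illustration and is not part of Ruby or Rails. It assumes the source file is saved as UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper (not a Ruby or Rails method): render every byte of a
# string as a backslash followed by its three-digit octal value.
def octal_escapes(str)
  str.unpack("C*").map { |byte| "\\%03o" % byte }.join
end

stripped = "Café ".strip

# Byte values of the stripped string: three ASCII bytes plus the two bytes of é.
puts stripped.unpack("C*").inspect

# The two bytes of é, written as octal escapes.
puts octal_escapes("é")   # prints \303\251
```

If the last two bytes come out as 195 and 169 (0xC3 and 0xA9), strip has left the é intact and only the way the string is displayed differs.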

Fred

Colin Law wrote in post #968503:

Is this in reply to my response? You have not quoted anything and have changed the subject line so gmail has not linked up the thread.

I'm posting through ruby-forum, so maybe something got mixed up in the process?

So, in your case String.strip() does work correctly with both versions of Ruby. I really don't understand why it goes wrong for me. Maybe it's a bug in the REE code.

Have you seen Fred's reply back in your original thread?

Colin

Colin Law wrote in post #968534:

If I understand Fred correctly there is nothing wrong with the string, it is just the display that is wrong in the console. Are you seeing the same thing when you show it on a web page?

Colin

Frederick Cheung wrote in post #968521:

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir" but "Café ".strip => "Caf\303\251"

While it may not look pretty, this is accurate if you are using UTF-8: é is 0xC3 0xA9 in UTF-8, which is 0o303 0o251 in octal. I'm not sure why inspect is choosing to show the octal escape codes, but your string does contain the correct bytes (maybe some heuristic that tries to determine whether the string is UTF-8 and should be displayed as such, or whether it just contains random binary gunk).

Fred

I tried 3 different versions of Ruby, and the way the string is rendered in irb is indeed different (and confusing):

ruby-1.8.7-p302 > "Caf\303\251"
 => "Caf\303\251"
...
ree-1.8.7-2010.02 > "Caf\303\251"
 => "Caf\303\251"
...
ruby-1.9.2-head > "Caf\303\251"
 => "Café"
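A rough way to confirm that only the rendering differs is to compare the bytes directly and to print the string with puts instead of relying on inspect. This is only a sketch: same_bytes? is a hypothetical helper name, and the file is assumed to be saved as UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper: compare two strings byte for byte, regardless of how
# either one is displayed by the console.
def same_bytes?(a, b)
  a.unpack("C*") == b.unpack("C*")
end

escaped = "Caf\303\251"  # octal escapes for the two UTF-8 bytes of é
literal = "Café"         # the same text typed directly

puts same_bytes?(escaped, literal)  # true when both hold identical bytes
puts escaped                        # puts writes the raw bytes to the terminal
p escaped                           # p goes through inspect, which may escape them
```

On a UTF-8 terminal the puts line should show the accented character, whichever form inspect picks.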

@Bob, are you sure you use UTF-8 encoding for your web page?

HTH,

Peter