UTF-8 String.strip bug (and several other methods)

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir"
but
"Café ".strip => "Caf\303\251"

In fact, strip() doesn't work if the last printable character is
accented.
Surprisingly " écologie".strip works fine.

I've tried digging deeper into the active_support multibyte source code but
didn't find any solution.

Any help?

Strange, I get:
$ rails console
Loading development environment (Rails 3.0.3)
ruby-1.9.2-p0 > "Café Noir ".strip
=> "Café Noir"
ruby-1.9.2-p0 > "Café ".strip
=> "Café"

Which Ruby are you using?

Colin

I don't see the 'é' in your code snippet. Did you try with real
accented chars?

I'm using Ruby Enterprise Edition 1.8.x - I hadn't thought about a
possible bug in Ruby itself. I might try a more recent 1.8 version or
REE... I don't want to switch to 1.9 just for such a small (but annoying)
problem...

I don't see the 'é' in your code snippet. Did you try with real
accented chars?

Is this in reply to my response? You have not quoted anything and
have changed the subject line so gmail has not linked up the thread.

If so I don't understand when you say you do not see the accented
char, copying from my previous post:
$ rails console
Loading development environment (Rails 3.0.3)
ruby-1.9.2-p0 > "Café Noir ".strip
=> "Café Noir"
ruby-1.9.2-p0 > "Café ".strip
=> "Café"
I see accented char é.

It is interesting, though, that the é in your mail looks different to
the one here, even though I have just copied and pasted it from your
email into mine. It does look like yours though when I paste it into
the ruby console. What happens if you copy it from here and use it in
your console?

I'm using Ruby Enterprise Edition 1.8.x - I hadn't thought about a
possible bug in Ruby itself. I might try a more recent 1.8 version or
REE... I don't want to switch to 1.9 just for such a small (but annoying)
problem...

This is the result in 1.8.7
$ ruby script/console
Loading development environment (Rails 2.3.2)
ruby-1.8.7-p302 > "Café Noir ".strip
=> "Café Noir"
ruby-1.8.7-p302 > "Café ".strip
=> "Café"
ruby-1.8.7-p302 >

Of course maybe your response was not to my mail at all, in which case
I have been wasting my time.

Colin

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir"
but
"Café ".strip => "Caf\303\251"

While it may not look pretty, this is accurate if you are using UTF-8:
é is 0xC3 0xA9 in UTF-8, which is 0o303 0o251 in octal. I'm not sure
why inspect is choosing to show the octal escape codes, but your string
does contain the correct bytes. (Maybe some heuristic that tries to
determine whether the string is UTF-8 and should be displayed as such or
whether it just contains random binary gunk.)
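
A minimal sketch of the point above, assuming the file is saved as UTF-8: it strips the string and then lists the raw bytes in hex and octal, so the escapes printed by inspect can be compared with what the string really contains. The variable names are only illustrative.

```ruby
# encoding: utf-8
# Sketch only: checks that strip leaves the bytes of "é" intact.
# Assumes this file is saved as UTF-8.

stripped = "Café ".strip

# unpack("C*") returns the raw bytes as integers (works on Ruby 1.8 and 1.9+).
bytes = stripped.unpack("C*")

hex   = bytes.map { |b| format("0x%02X", b) }
octal = bytes.map { |b| format("\\%o", b) }

puts hex.join(" ")    # the last two bytes are 0xC3 0xA9, the UTF-8 encoding of é
puts octal.join("")   # \103\141\146\303\251
```

If the trailing space is gone and the last two bytes are 0xC3 0xA9, strip has done its job and only the display differs.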

Fred

Colin Law wrote in post #968503:

Is this in reply to my response? You have not quoted anything and
have changed the subject line so gmail has not linked up the thread.

I'm posting through ruby-forum, so maybe something got mixed up during
the process?

So, in your case String.strip() does work correctly with both versions
of Ruby. I really don't understand why it goes wrong for me. Maybe a
bug in the REE code.

Have you seen Fred's reply back in your original thread?

Colin

Colin Law wrote in post #968534:

If I understand Fred correctly there is nothing wrong with the string,
it is just the display that is wrong in the console. Are you seeing
the same thing when you show it on a web page?

Colin

Frederick Cheung wrote in post #968521:

Hello, with Rails 3.0.3

"Café Noir ".strip => "Café noir"
but
"Café ".strip => "Caf\303\251"

While it may not look pretty, this is accurate if you are using UTF-8:
é is 0xC3 0xA9 in UTF-8, which is 0o303 0o251 in octal. I'm not sure
why inspect is choosing to show the octal escape codes, but your string
does contain the correct bytes. (Maybe some heuristic that tries to
determine whether the string is UTF-8 and should be displayed as such or
whether it just contains random binary gunk.)

Fred

I tried in 3 different versions of Ruby and the way it is rendered in
irb is indeed different (and is confusing):

ruby-1.8.7-p302 > "Caf\303\251"
=> "Caf\303\251"
...
ree-1.8.7-2010.02 > "Caf\303\251"
=> "Caf\303\251"
...
ruby-1.9.2-head > "Caf\303\251"
=> "Café"
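
From Ruby 1.9 onwards every string carries an encoding tag, which may be why the rendering changes there. A small sketch, assuming Ruby 1.9 or later (force_encoding does not exist in 1.8) and a UTF-8 source file: the same five bytes count as text or as raw bytes depending only on that tag.

```ruby
# encoding: utf-8
# Sketch only, Ruby 1.9+ (String#force_encoding is not available in 1.8).

utf8   = "Caf\303\251"                      # same octal escapes as above
binary = utf8.dup.force_encoding("BINARY")  # same bytes, no text encoding

puts utf8.bytes.to_a == binary.bytes.to_a   # true: the content is identical
puts utf8.valid_encoding?                   # true: the bytes are well-formed UTF-8
puts utf8.length                            # 4 characters
puts binary.length                          # 5 bytes
puts binary.inspect                         # "Caf\xC3\xA9" (escaped, nothing marks it as text)
```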

@Bob, are you sure you use UTF-8 encoding for your web page?

HTH,

Peter