Encoding problems with Rails 3 + Ruby 1.9.1 (big surprise)

I have kind of an interesting problem.

I have a form wherein people enter information. Big surprise. If they enter any "weird" characters like ø or é or whatever, the form will submit and all is well. However, I have a select box for the state which, if you're looking at Spain, has states like A Coruña, Cádiz and País Vasco. These are pulled from the database which is set to have everything encoded in UTF-8. Everything we're doing is in UTF-8.

However... when it renders the template IF someone used a non-ASCII character in a field that appears BEFORE the select I get this error:

incompatible character encodings: ASCII-8BIT and UTF-8 (on the same line as f.select :state)

If one of the fields AFTER the state field (like the postal code) contains a non-ASCII character the error is reversed:

incompatible character encodings: UTF-8 and ASCII-8BIT (on the same line as f.select :postal_code)
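The same pair of errors can be reproduced in plain Ruby, outside Rails. This is a minimal sketch, assuming the submitted form value arrives tagged as ASCII-8BIT (binary) while the database strings are UTF-8. The variable names are made up for illustration, and the exact wording of the message can differ between Ruby versions.

```ruby
# encoding: utf-8

# Hypothetical stand-ins: a raw form parameter (binary) and a database value (UTF-8).
raw_param = "Cádiz".dup.force_encoding(Encoding::ASCII_8BIT)
db_value  = "País Vasco"

# Concatenating the two raises Encoding::CompatibilityError. The order of the
# operands decides the order in which the encodings are named in the message.
messages = [[raw_param, db_value], [db_value, raw_param]].map do |left, right|
  begin
    left + right
    nil
  rescue Encoding::CompatibilityError => e
    e.message
  end
end
puts messages

# A binary string that holds only ASCII bytes concatenates without error,
# which is why the form works until someone types a non-ASCII character.
ascii_only = "28001".dup.force_encoding(Encoding::ASCII_8BIT)
combined = ascii_only + db_value
puts combined.encoding
```

This would explain why the error flips depending on whether the non-ASCII field is rendered before or after the select: whichever string is already in the output buffer is named first.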

The more I work with encodings in Rails and Ruby in general, the more I find myself confused and frustrated. I added config.encoding = Encoding::UTF_8 to my application.rb, but that doesn't appear to affect templates at all. The problem, so far as I can see, is in one of two places:

I either need to tell Rack to make all my string parameters encoded in UTF-8 or I need to set my template default encoding to UTF-8. A quick fix is:

params[:form].each { |k, v| v.force_encoding 'UTF-8' if v.is_a? String }

I know this is not ideal, but I don't understand how the view works well enough to do this better.
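The one-liner above only touches top-level values. A slightly more general sketch, assuming the parameters are nested hashes and arrays of strings, is shown here. `force_utf8` is a hypothetical helper, not part of Rails or Rack, and it only relabels the bytes as UTF-8 without converting them.

```ruby
# encoding: utf-8

# Hypothetical helper: returns a copy of a params-like structure in which
# every String is tagged as UTF-8. The underlying bytes are left unchanged.
def force_utf8(value)
  case value
  when String then value.dup.force_encoding(Encoding::UTF_8)
  when Hash   then value.each_with_object({}) { |(k, v), out| out[k] = force_utf8(v) }
  when Array  then value.map { |v| force_utf8(v) }
  else value
  end
end

# Simulated request parameters whose strings arrive as ASCII-8BIT.
raw = {
  "form" => {
    "city" => "Cádiz".dup.force_encoding(Encoding::ASCII_8BIT),
    "tags" => ["ø".dup.force_encoding(Encoding::ASCII_8BIT)],
    "age"  => 42
  }
}

clean = force_utf8(raw)
puts clean["form"]["city"].encoding
puts clean["form"]["city"] + " / País Vasco"
```

If a client sends bytes that are not valid UTF-8, the relabelled string will report `valid_encoding?` as false, so this is a workaround rather than a validation step.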

What should I do to fix this problem? (Oh, and I'm using ERB, as an FYI.)

Could you try latest Rails master?

If you're using the 'mysql' driver, please try mysql2 or ruby-mysql instead.

jeremy

cult hero wrote:

I have kind of an interesting problem.

No, this problem is boring and has been known since at least 2008. It just bit me with Rails 2.3.8.

Here is a beautiful fix: http://redmine.ruby-lang.org/issues/show/1238

I don't know exactly how to fix all this mess.

It's definitely not mysql related. I'm not using MySQL. I'm using PostgreSQL and I'm using Sequel. All the strings coming from the database are UTF-8.

And I saw A LOT about the magic comment, but where do I put it in a template? And there's no way to basically set a "default" magic comment?

Oh, and I should add I'm using beta3.

Same problem for me. I fixed the models, controllers and helpers by adding the magic comments, but I don't know how to fix the problem in the view. Anyone?
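For what it is worth, Ruby's standard-library ERB honours a magic comment written as an ERB comment on the first line of the template. The sketch below uses plain ERB only. Whether a given Rails version passes templates through in a way that preserves this behaviour is an assumption that would need checking.

```ruby
require 'erb'

# The same template body, tagged as US-ASCII to mimic a file read under a "C" locale.
body   = "<%= __ENCODING__ %>"
plain  = body.dup.force_encoding(Encoding::US_ASCII)
tagged = ("<%# -*- coding: UTF-8 -*- %>\n" + body).force_encoding(Encoding::US_ASCII)

# __ENCODING__ reports the source encoding the compiled template runs under.
without_comment = ERB.new(plain).result.strip
with_comment    = ERB.new(tagged).result.strip

puts without_comment
puts with_comment
```

Without the comment the template is evaluated in the encoding of the input string. With it, the encoding named in the comment is used.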

Adding this to environment.rb:

Encoding.default_external = Encoding::UTF_8

helps fix a few problems, until it explodes somewhere else.
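A rough illustration of what that setting changes: strings read from files are tagged with the default external encoding. The temporary file and its content are made up for the example, and the sketch assumes no default internal encoding is set.

```ruby
# encoding: utf-8
require 'tempfile'

original = Encoding.default_external
file = Tempfile.new('encoding-demo')
begin
  File.binwrite(file.path, "A Coruña")             # stored as raw UTF-8 bytes

  Encoding.default_external = Encoding::US_ASCII   # what a "C" locale typically yields
  as_ascii = File.read(file.path)

  Encoding.default_external = Encoding::UTF_8      # the environment.rb setting
  as_utf8 = File.read(file.path)
ensure
  Encoding.default_external = original             # restore the process-wide default
  file.close!
end

puts "#{as_ascii.encoding}: valid=#{as_ascii.valid_encoding?}"
puts "#{as_utf8.encoding}: valid=#{as_utf8.valid_encoding?}"
```

The setting affects only data read after it is assigned, which may be why it fixes some problems and not others.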

Ruby 1.9 is a catastrophe!

Hi, I would recommend using Rails 3 Beta 4 and Ruby 1.9.2. This has worked well for me for the last 4 months. Also, I would recommend using the mysql2 gem if you're using MySQL.

Good luck,

-Conrad

Hi Conrad, thanks for the tip. Yeah I'm eagerly waiting for Rails 3 to get released!

In the meantime I managed to make Rails 2.3.8 play nicely with Ruby 1.9.1, and that was very painful to do. I would suggest that people stick with Ruby 1.8 for some time until everything gets settled. Third-party gems also have to be updated to be compatible with the new 1.9 encoding handling.

I have also been using Rails 2.3.5 and Ruby 1.9.2 on one project for 6+ months. It has been super simple to get everything working by using RVM, and that will make it super simple to migrate this project to Rails 3. Also, if you're using Ruby 1.9.1, then you're definitely using the wrong version of Ruby, because it does have bugs. Furthermore, Ruby 1.9.2 is the first C Ruby version to pass 100% of the RubySpec. Last but not least, Ruby 1.9.2 cleans up the Ruby syntax and provides a much-needed speed boost in production.

Good luck,

-Conrad

Many gems have been updated to support Ruby 1.9, and it should be super simple to fix the ones that are not compatible. I had a very large code base using a lot of gems and plugins. The ones that had associated tests were much easier to fix in general. Getting up to speed with the syntax and semantic changes made porting easiest for me as I worked through the various issues. In short, you'll have to make changes to your code either now or later, so I prefer to make incremental improvements over time: first moving to Ruby 1.9.2, then moving to Rails 3. I go in knowing that some things will not work and will need to be fixed, which is a part of software engineering. Just create another branch and just do it. :)

Good luck,

-Conrad

Ruby 1.9.2 is not yet released; I'll wait until it goes final to update my FreeBSD port. Until then I'll be running buggy 1.9.1.

My main problem was handling differently encoded strings. So I had to add magic comments all over the place, and call force_encoding on rdiscount's output, which is US-ASCII.

Moreover, my native language uses accented characters, so if you only write English you might not have run into the same issues as me. But if one of your users posts an accented character I guess your app will explode. Have you tried?

1.9.2 is currently in preview and I'm using it on several production applications with great success. For me, it works better than 1.9.1.

The application that I'm working on supports German, Spanish, Russian, Japanese, French, Portuguese, and Chinese.

Good luck,

-Conrad

Hi, do you have a test case that I can run locally? I have done a lot of work in this regard.

-Conrad

Uh? Did you have to add plenty of magic comments to your files? Do you need to call force_encoding on certain strings, such as those returned by rdiscount or hpricot?

What changes did you make to your Rails app when you moved from Ruby 1.8 to 1.9.x to avoid the dreaded US-ASCII conflict?

Last question: do you currently use Rails 3 or Rails 2 with Ruby 1.9? It's not clear from your previous posts. Thx

I did not use either hpricot or rdiscount in our development or production application. Also, I did not have to force encoding because the underlying OS environment is UTF-8 by default. So you need to make sure that your external encoding (i.e. the encoding used for files) and internal encoding (i.e. the encoding that's used for the creation of new strings) match up. BTW, I found this information when I did my initial research on putting together a multilingual application using Ruby/Rails. The default encoding used in Ruby 1.9 is 7-bit US-ASCII, which is the same as for Ruby 1.8. Did you read the relevant chapters in "Programming Ruby 1.9"? If not, I would highly recommend reading them because they provide a wealth of information.

I remember having to set the default internal encoding and I was good to go. I had to do the following:

Encoding.default_internal = 'utf-8'

Also, I'm using HTML 5 technologies, and the view templates are set to utf-8 within the head tag.
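As a rough sketch of what the default internal encoding does: data read from a file is transcoded from the external encoding to the internal one. The file here is deliberately stored as ISO-8859-1 to make the conversion visible. The file name and content are assumptions for the example.

```ruby
# encoding: utf-8
require 'tempfile'

original = Encoding.default_internal
file = Tempfile.new('latin1-demo')
begin
  # Store the text as ISO-8859-1 bytes, i.e. not UTF-8 on disk.
  File.binwrite(file.path, "País Vasco".encode(Encoding::ISO_8859_1))

  Encoding.default_internal = nil
  untouched = File.read(file.path, mode: 'r:ISO-8859-1')

  Encoding.default_internal = 'utf-8'
  transcoded = File.read(file.path, mode: 'r:ISO-8859-1')
ensure
  Encoding.default_internal = original   # restore the process-wide default
  file.close!
end

puts untouched.encoding
puts transcoded.encoding
```

With no internal encoding set, the string keeps the external encoding. With it set to UTF-8, the same read returns a UTF-8 string.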

Good luck,

-Conrad

The current version (i.e. production) uses Rails 2.3.5 and Ruby 1.9.2, and the development version uses Rails 3.0 beta 4 and Ruby 1.9.2. Both applications currently use the mysql2 Ruby gem.

Good luck,

-Conrad

Hi Conrad,

My files are all encoded in UTF-8 because I use TextMate, and I double-checked on my server with:

$ file --mime-encoding app/views/layout/application.html.erb

My layout is defined with an html5 doctype and . I

tested it with w3c validator and it detects utf-8.

Now if I remove the magic comment <%# # -*- coding: UTF-8 -*- %> from application.html.erb and put Encoding.default_internal = Encoding::UTF_8 at the top of environment.rb, I get the following error:

=> Booting WEBrick

=> Rails 2.3.8 application starting on http://0.0.0.0:3000

[gem_path/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in `read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)
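This kind of error can be provoked without Rails. The sketch below assumes the cause is a file containing UTF-8 bytes being read with US-ASCII as the external encoding and UTF-8 as the internal one, which forces a transcode on read. The file content is invented, and this is not a claim about what i18n does internally.

```ruby
# encoding: utf-8
require 'tempfile'

file = Tempfile.new('locale-demo')
error = nil
begin
  File.binwrite(file.path, "name: Cádiz\n")   # UTF-8 bytes; "á" starts with 0xC3
  # External US-ASCII (as under a "C" locale) plus internal UTF-8: Ruby has to
  # convert the bytes, and 0xC3 is not a valid US-ASCII byte.
  File.read(file.path, encoding: 'US-ASCII:UTF-8')
rescue Encoding::InvalidByteSequenceError => e
  error = e
ensure
  file.close!
end

puts error.class
```

If this matches the situation on the server, the fix would be to correct the external encoding rather than to remove the internal one.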

Actually, I'm using Bundler, but I would put this statement at the bottom of environment.rb.

Also, you can set the internal and the external encodings by doing the following:

ruby -E <external_encoding>:<internal_encoding>

For example, you could try using something similar to the following:

PassengerRuby <path_to_ruby_executable>/ruby -E utf-8:utf-8
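A quick way to check what `-E` does is to ask a child interpreter which defaults it ended up with. This sketch spawns the current Ruby via `RbConfig.ruby`. The encoding pair is chosen only to make the result easy to recognise.

```ruby
require 'rbconfig'

# One-line script run by the child interpreter: report both default encodings.
script = 'print Encoding.default_external.name, ":", Encoding.default_internal.name'

# Start the child with -E <external>:<internal> and capture what it prints.
reported = IO.popen([RbConfig.ruby, '-E', 'ISO-8859-1:UTF-8', '-e', script], &:read)
puts reported
```

The same check can be run by hand on the server to confirm that the interpreter started by the web server really receives the option.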

Next, the w3c validator detection is not all that relevant to how Ruby processes the file. The w3c validator parses the file from top to bottom, checking that it is syntactically correct. The ERB engine parses the file looking for relevant tags and replaces them with the appropriate HTML.

-Conrad

Wait!

I noticed that on my FreeBSD box, the locale environment variables are not set.

$ locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

Have you tried setting the LANG environment variable for your OS? It is currently set in my environment.
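Ruby derives its default external encoding from the locale, so an unset or "C" locale usually means US-ASCII. The sketch below spawns a child interpreter with a forced "C" locale to show this. `default_external_under` is a hypothetical helper, and the result is what is typical on Linux and BSD systems rather than something guaranteed on every platform.

```ruby
require 'rbconfig'

# Hypothetical helper: start a child Ruby with the given environment
# overrides and return the name of its default external encoding.
def default_external_under(env)
  script = 'print Encoding.default_external.name'
  IO.popen([env, RbConfig.ruby, '-e', script], &:read)
end

# Force the "C" locale and clear RUBYOPT so no -E option leaks in.
c_locale = default_external_under({ 'LC_ALL' => 'C', 'LANG' => 'C', 'RUBYOPT' => nil })
puts c_locale
```

Passing a UTF-8 locale name that is installed on the machine (for example through LANG) to the same helper should report UTF-8 instead.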

Also this problem does not happen in development mode!

$ ./script/console
Loading development environment (Rails 2.3.8)
puts Encoding.default_internal
UTF-8
=> nil

$ RAILS_ENV=production ./script/console
Loading production environment (Rails 2.3.8)
/usr/local/lib/ruby/gems/1.9/gems/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in `read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)

Any idea?

In regards to Rails 2.3, I'm using Rails 2.3.5 and 2.3.8 as well as the mysql2 Ruby gem, which does UTF-8 by default. Do you have a small test case or small application which reproduces the issue?

-Conrad