Rails and Ruby 1.9 encoding issues

Hey guys,

I'm currently trying to run a couple of apps on Rails 2.3.2 and Ruby 1.9. Everything seems to work except for these nasty encoding issues along the lines of:

incompatible character encodings: UTF-8 and ASCII-8BIT

I've searched through Lighthouse and found a couple of tickets addressing those issues, but the patches look like they fight the symptoms and not the real cause. Also, I haven't found anything addressing this that got applied to master or to 2.3.2, so my first question is: is there some agreed solution, or does nobody have a solution for now?

It starts with really basic things. I just created a small empty Rails app with a scaffold controller and a sqlite3 database (sqlite3-ruby 1.2.4).

Now in script/console, when I create a Post with p = Post.create :title => "ü" I get:

p.title.encoding => #<Encoding:ASCII-8BIT>

p.title => "\xC3\xBC"

Where in Ruby 1.8.x I'd get the "ü" back.
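For anyone following along, here is a minimal sketch of how the error above can arise without Rails or a database involved. It assumes the driver hands back correct UTF-8 bytes tagged as ASCII-8BIT; the variable names are made up for illustration and none of this is taken from the actual sqlite3-ruby code.

```ruby
# encoding: utf-8

# Stand-in for a value returned by a driver that is not encoding aware:
# the bytes are valid UTF-8 for "ü", but the string is tagged as binary.
from_db = "\xC3\xBC".dup.force_encoding("ASCII-8BIT")

# Stand-in for template output: a UTF-8 string with a non-ASCII character.
from_template = "Title: ü"

error_message = nil
begin
  from_template + from_db
rescue Encoding::CompatibilityError => e
  # Both strings contain non-ASCII bytes and carry different encoding tags,
  # so Ruby 1.9 refuses to concatenate them.
  error_message = e.message
end

puts from_db.encoding
puts from_db.inspect
puts error_message
```

On 1.8 the same concatenation simply glued the bytes together, which is why the "ü" came back intact there.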

It would also be great if you could point me to where I should look to see the actual problem.

Kind regards, John

> I've searched through Lighthouse and found a couple of tickets addressing those issues, but the patches look like they fight the symptoms and not the real cause. Also, I haven't found anything addressing this that got applied to master or to 2.3.2, so my first question is: is there some agreed solution, or does nobody have a solution for now?

No, it's quite a large undertaking and I don't see it happening in the near future to be honest. Read on to learn why.

> It would also be great if you could point me to where I should look to see the actual problem.

First you probably want to read:

   Gray Soft (two links; both targets now return "Not Found")

After that you basically have to fix every library you use that's not encoding aware. In your specific case that means making sure that the SQLite 3 bindings return strings with the encoding properly set. After that you have to implement a way to specify an encoding for ERB templates and make it work in Rails (I think some work has been done on this).

Once that is done you have to make sure that none of the code in Rails that touches your templates and strings tries to do something incompatible with them. I don't believe it does, but it would be nice to have some test cases that prove this.

Finally you have to make sure that anything you write the response body to doesn't try to concatenate it with a string in an incompatible encoding, which means making sure Rack and/or Mongrel do the right thing.

Manfred

Hrm,

meanwhile I’ve read through many forum threads (which are more confusing than helpful) and I also played around with that issue a bit more.

First of all I set up a clean install of Ruby 1.9, Rails, PostgreSQL 8.3 and http://github.com/qoobaa/pg/tree/master as the Postgres adapter, which claims to support Ruby 1.9 UTF-8 (whatever that means). I even initialized the DB with UTF-8, but the results seem to be the same as on sqlite3 and mysql.

Whenever I enter something like this into the console:

Post.create :title => "ä"

I’m getting back:

Post.first.title => "\xC3\xA4"

Now as I said, this is the same for sqlite3, postgresql and mysql in various configurations. So the DB is returning UTF-8 as expected, but I still don’t understand where it’s getting interpreted as US-ASCII. Is it the DB adapter, like this one http://github.com/qoobaa/pg/tree/master, or is it really an ActiveRecord problem, as is suggested in many forum posts?

I’d love to help fix this, but I’m just unsure whose problem this really is. In my optimistic opinion this can’t be too hard for the DB part.
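One way to narrow this down is to check whether the bytes or only the encoding tag are wrong. The sketch below uses a hand-built string as a stand-in for Post.first.title (an assumption, since no database is involved here) and looks at the raw bytes, then retags a copy as UTF-8 and checks that it is valid.

```ruby
# encoding: utf-8

# Stand-in for Post.first.title as the console shows it above.
returned = "\xC3\xA4".dup.force_encoding("ASCII-8BIT")

# The raw bytes are exactly the UTF-8 encoding of "ä".
hex = returned.bytes.map { |b| "%02X" % b }.join(" ")
puts hex

# Retagging a copy changes no bytes, only how Ruby interprets them.
retagged = returned.dup.force_encoding("UTF-8")
puts retagged.valid_encoding?
puts retagged == "ä"
```

If the retagged copy is valid and compares equal, the data made the round trip intact and only the tag on the returned string is off.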

These encoding issues should be fixed as soon as possible to get Rails going on 1.9.1. There are so many reasons to do so!

Kind regards, John

No Ruby database libraries fully support 1.9 string encodings yet. You have to explicitly set the encoding of the strings returned.
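A rough sketch of what explicitly setting the encoding could look like in application code. tag_row is a hypothetical helper, not part of ActiveRecord or any adapter, and it assumes the database really does store UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper: retag every String in a result row with the given
# encoding. Bytes are left untouched; non-string values pass through.
def tag_row(row, encoding = "UTF-8")
  row.map do |value|
    value.is_a?(String) ? value.dup.force_encoding(encoding) : value
  end
end

# Stand-in for a row as an encoding-unaware driver might return it.
raw_row = [1, "\xC3\xA4".dup.force_encoding("ASCII-8BIT"), nil]

row = tag_row(raw_row)
puts row[1].encoding
puts row[1] == "ä"
```

Note that force_encoding only relabels the string. If the database is not actually UTF-8, the result would be tagged incorrectly rather than converted.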

jeremy

Just a plug, I have the latest SQL Server adapter working just fine in
1.9 with unicode string support. http://github.com/rails-sqlserver/2000-2005-adapter/tree/master

I really found James' articles on string encoding helpful when
learning this stuff.

  - Ken

Nice! There are similar patches for mysql, pg, and sqlite, but all are unicode-only also. These databases and Ruby both support a wide range of encodings.
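To illustrate the point about non-Unicode encodings, here is a sketch of tagging a string with the database's own charset and then transcoding it. The DB_TO_RUBY_ENCODING table and the decode helper are invented for this example, and the charset labels are assumptions rather than what any particular adapter reports.

```ruby
# encoding: utf-8

# Hypothetical mapping from database charset labels to Ruby encoding names.
DB_TO_RUBY_ENCODING = {
  "utf8"   => "UTF-8",
  "latin1" => "ISO-8859-1",
  "sjis"   => "Shift_JIS"
}

# Tag the raw bytes with the encoding the database used, then transcode
# to the encoding the application works in.
def decode(raw, db_charset, internal = "UTF-8")
  ruby_name = DB_TO_RUBY_ENCODING.fetch(db_charset)
  raw.dup.force_encoding(ruby_name).encode(internal)
end

# "ä" is the single byte 0xE4 in ISO-8859-1.
latin1_bytes = "\xE4".dup.force_encoding("ASCII-8BIT")

title = decode(latin1_bytes, "latin1")
puts title.encoding
puts title == "ä"
```

The difference from a Unicode-only patch is the extra encode step: force_encoding says what the bytes are, encode converts them into what the application expects.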

jeremy

Hey again,

meanwhile the pg guy wrote me back on github which also explains some issues (at least for me)

qoobaa sent you a message.


Currently I’m doing a project using Ruby 1.9.1 and RoR 2.3.2. The main problem is that RoR, Rack and every single DB adapter have problems with encodings. The best solution for now is to convert everything to ASCII-8BIT (it’s a lot easier) - especially force encodings in translations if you use I18n. You can check out my sqlite3-ruby gem (it’s Ruby 1.9.1 ready, there’s only UTF-16 support to be done). My pg gem should also handle UTF-8 strings (the encoding is hardcoded, which is an ugly solution, but it works for now). Good luck.
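A small sketch of what the "convert everything to ASCII-8BIT" workaround amounts to, and what it costs. The strings are stand-ins (one for a database value, one for an I18n translation); nothing here is taken from qoobaa's gems.

```ruby
# encoding: utf-8

# Stand-in for a value from an encoding-unaware driver.
from_db = "\xC3\xA4".dup.force_encoding("ASCII-8BIT")

# Stand-in for an I18n translation, forced to binary as suggested above.
translation = "Titel: ä".dup.force_encoding("ASCII-8BIT")

# Both sides are binary now, so concatenation no longer raises.
combined = translation + from_db
puts combined.encoding

# The price: string operations now count bytes instead of characters.
byte_length = combined.length
char_length = combined.dup.force_encoding("UTF-8").length
puts byte_length
puts char_length
```

This gets back roughly the 1.8 behaviour: no encoding errors, but length, slicing and regexps all operate on bytes.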

Sounds like this will take years. *sigh*

Kind regards and thanks for your answers so far!