Character encoding problems.

I've been fighting these problems since I moved to Ruby 1.9. I am now using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The symptom is errors with this message:

(incompatible character encodings: UTF-8 and ISO-8859-1)

I've marked all my files with:

# -*- coding: utf-8 -*-

and I've tried everything I can to get everything to be UTF-8, but I still have other character sets -- sometimes ISO-8859-1 as in this case, other times ASCII-8BIT. These errors usually happen in the views as the various parts are being concatenated together.

I need a general approach to these problems. They are killing me and there doesn't seem to be a clear way to deal with them.

Thank you, pedz

Maybe this helps:

http://blog.grayproductions.net/articles/ruby_19s_string

Regards, Igor Escobar Software Engineer

Another good link:

http://nuclearsquid.com/writings/ruby-1-9-encodings/

Regards, Igor Escobar Software Engineer

Yes, I understand Ruby's encodings. That isn't my question.

My question is: what is a good, consistent way to deal with them in Rails, given that the input may be in any language, you do not have control over the various source files in the gems, etc.?

i.e. since the Rails files do not have the UTF-8 encoding tag, their default is whatever the default is for my country. But that setting may be different from the end user's browser settings. There are just all sorts of ways that different encodings can get into the mix. Hasn't anyone else fought these wars besides me?
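One way to check which defaults are actually in effect is to ask Ruby directly. In Ruby 1.9 a source file without a magic comment is read as US-ASCII, while Encoding.default_external (used for data read through IO) is taken from the environment. The helper name encoding_report is hypothetical, and pinning both defaults to UTF-8 early in boot is only one possible policy, assumed here for illustration rather than prescribed anywhere in this thread.

```ruby
# Hypothetical helper: report the encodings Ruby is currently assuming.
def encoding_report
  {
    :source   => __ENCODING__,               # encoding of this source file
    :external => Encoding.default_external,  # assumed for data read via IO
    :internal => Encoding.default_internal   # nil unless it has been set
  }
end

p encoding_report

# One possible policy (an assumption): pin both defaults to UTF-8 early in boot.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

p encoding_report
```

This only changes what Ruby assumes about data it reads; it does not convert strings that already carry a different encoding tag.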

Hi Perry,

I've run into such encoding issues under ruby 1.9.x and rails 3.x when dealing with unknown/untrusted/bad non-utf8 data params that need to be handled by the rails app as utf-8.

The fix I found (that works for the needs of my apps) was by following a strategy outlined by Paul Battley (http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/) using Iconv for forcing a string to utf-8:

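  require 'iconv'

  ...
  # Converting UTF-8 to UTF-8 with //IGNORE drops invalid byte sequences.
  UTF8_IC = Iconv.new('UTF-8//IGNORE', 'UTF-8')

  ...
  # A space is appended and then stripped again so that an invalid byte
  # at the very end of the string is handled as well.
  def force_utf8(v)
    return (v.is_a?(String)) ? UTF8_IC.iconv("#{v} ")[0..-2] : v
  end

  ...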

and then adding a before_filter which modifies submitted params strings forcing them to utf-8 before accessing/using those params in the app.
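For readers on a Ruby where Iconv is not available (it left the standard library in Ruby 2.0), roughly the same idea can be sketched with String#encode alone. The name force_utf8_deep and the sample hash are assumptions for illustration, not the code described above; the round trip through UTF-16LE is a commonly used way to drop invalid bytes.

```ruby
# Hypothetical stand-in for the Iconv-based force_utf8, extended to walk
# nested params-like structures. Non-string values pass through untouched.
def force_utf8_deep(value)
  case value
  when String
    utf8 = value.dup.force_encoding(Encoding::UTF_8)
    return utf8 if utf8.valid_encoding?
    # Round-trip through UTF-16LE so that invalid byte sequences are dropped.
    utf8.encode(Encoding::UTF_16LE, :invalid => :replace, :undef => :replace, :replace => "")
        .encode(Encoding::UTF_8)
  when Hash
    value.each_with_object({}) { |(key, val), out| out[key] = force_utf8_deep(val) }
  when Array
    value.map { |item| force_utf8_deep(item) }
  else
    value
  end
end

# Made-up sample: "caf" followed by a stray 0xE9 byte, which is not valid UTF-8.
params = { "name" => [0x63, 0x61, 0x66, 0xE9].pack("C*"), "tags" => ["ok", 42] }
clean  = force_utf8_deep(params)

puts clean["name"]                  # caf
puts clean["name"].valid_encoding?  # true
```

Like the //IGNORE approach, this discards bytes it cannot interpret rather than guessing their original encoding, so it suits untrusted input better than data whose real encoding is known.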

(Maybe there's a better/simpler way since I last tested this issue?)

Jeff

Perry Smith wrote in post #1020442:

> I've been fighting these problems since I moved to Ruby 1.9. I am now using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The symptom is errors with this message:

> (incompatible character encodings: UTF-8 and ISO-8859-1)

> I've marked all my files with:

> # -*- coding: utf-8 -*-

All that does is tell ruby that the strings in your source code should be interpreted as utf-8. Here is an example:

if data == "€" #That's a euro symbol
  ...

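By default, source files in Ruby 1.9 are treated as US-ASCII, so without the comment a non-ASCII literal like that one is rejected with an "invalid multibyte char" error.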
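A small sketch of that point: the magic comment decides how literals in the file are tagged, but bytes that arrive from elsewhere keep whatever tag they were given. The helper name describe and the simulated input byte are assumptions for illustration.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: show a string's encoding tag next to its raw bytes.
def describe(str)
  "#{str.encoding}: #{str.bytes.to_a.inspect}"
end

literal  = "é"                                             # literal, tagged by the magic comment
external = [0xE9].pack("C").force_encoding("ISO-8859-1")  # simulated input: the same letter in Latin-1

puts describe(literal)                   # UTF-8: [195, 169]
puts describe(external)                  # ISO-8859-1: [233]
puts describe(external.encode("UTF-8"))  # UTF-8: [195, 169]
```

The same character has different bytes under each encoding, which is why the tag on incoming data has to be right before it is transcoded.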

> and I've tried doing everything I can to try and get everything to be UTF-8 but I still have other character sets -- sometimes ISO-8859-1 as in this case, other times it is ASCII-8BIT. These errors usually happen in the views as the various parts are being concatenated together.

> I need a general approach to these problems.

There is none. The bottom line is you have to know the encoding of any data you read into your program. You can try to guess at an encoding, but whether that works is hit and miss.
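To make that concrete, here is a minimal sketch of the error from the original post and of the fix once the encoding is known. The helper name to_utf8 and the sample strings are assumptions for illustration. Note that the error only appears when both sides contain non-ASCII characters, which is one reason it can look intermittent.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: tag raw bytes with the encoding they are known to be in,
# then transcode them to UTF-8 so they can be mixed with other UTF-8 strings.
def to_utf8(raw, known_encoding)
  raw.dup.force_encoding(known_encoding).encode(Encoding::UTF_8)
end

fragment = "Prénom: "                                 # UTF-8 literal with a non-ASCII character
input    = [0x4A, 0x6F, 0x73, 0xE9].pack("C*")        # "José" as Latin-1 bytes (made-up input)
latin1   = input.dup.force_encoding("ISO-8859-1")

begin
  fragment + latin1                                   # both sides non-ASCII, different encodings
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end

# An ASCII-only UTF-8 string concatenates without error; the result takes the other encoding.
mixed = "Name: " + latin1
puts mixed.encoding                                   # ISO-8859-1

# Once the source encoding is known, transcoding first makes the concatenation safe.
puts fragment + to_utf8(input, "ISO-8859-1")          # Prénom: José
```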

Writing programs that handle anything but ascii data is hard. But the world isn't an ascii world, so you have to adapt. Nevertheless, ruby's handling of utf-8 is broken in so many places, it's astonishing. Hopefully, it will get better. Some people are sticking with ruby 1.8.6 until things get better--but it will never be the case anymore that you can assume that all data is ascii.

> I've been fighting these problems since I moved to Ruby 1.9. I am now using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The symptom is errors with this message:

> (incompatible character encodings: UTF-8 and ISO-8859-1)

> I've marked all my files with:

> # -*- coding: utf-8 -*-

> and I've tried doing everything I can to try and get everything to be UTF-8 but I still have other character sets -- sometimes ISO-8859-1 as in this case, other times it is ASCII-8BIT. These errors usually happen in the views as the various parts are being concatenated together.

> I need a general approach to these problems. They are killing me and there doesn't seem to be a clear way to deal with them.

I think Rails 3 is overall better at dealing with these encoding problems - I can't say I've run into much in the way of encoding problems. I believe the mysql2 gem is better at ensuring that the data that comes out of the database is flagged as UTF-8.

Fred