Character encoding problems.

I've been fighting these problems since I moved to Ruby 1.9. I am now
using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The
symptom is errors with this message:

(incompatible character encodings: UTF-8 and ISO-8859-1)

I've marked all my files with:

# -*- coding: utf-8 -*-

and I've tried doing everything I can to try and get everything to be
UTF-8 but I still have other character sets -- sometimes ISO-8859-1 as
in this case, other times it is ASCII-8BIT. These errors usually happen
in the views as the various parts are being concatenated together.

I need a general approach to these problems. They are killing me and
there doesn't seem to be a clear way to deal with them.

Thank you,
pedz

Maybe this will help:

http://blog.grayproductions.net/articles/ruby_19s_string

Regards,
Igor Escobar
Software Engineer

Another good link:

http://nuclearsquid.com/writings/ruby-1-9-encodings/

Regards,
Igor Escobar
Software Engineer

Yes, I understand Ruby's encodings. That isn't my question.

My question is what is a good, consistent way to deal with them in Rails,
given that the input may be in any language, you do not have control over
the various source files in the gems, etc.

i.e. since the Rails files do not have the UTF-8 encoding tag, their
default is whatever the default is for my country. But that
setting may be different from the end user's browser settings. There
are all sorts of ways that different encodings can get into the
mix. Hasn't anyone else fought these wars besides me?

Hi Perry,

I've run into such encoding issues under ruby 1.9.x and rails 3.x when
dealing with unknown/untrusted/bad non-utf8 data params that need to
be handled by the rails app as utf-8.

The fix I found (that works for the needs of my apps) follows
a strategy outlined by Paul Battley
(http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/ ),
using Iconv to force a string to utf-8:

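  ...
  require 'iconv'  # part of the standard library in Ruby 1.9

  # //IGNORE tells Iconv to drop byte sequences that are not valid UTF-8.
  UTF8_IC = Iconv.new('UTF-8//IGNORE', 'UTF-8')

  ...
  # Non-strings pass through untouched. A space is appended and then sliced
  # off again so that an invalid byte at the very end is also dropped.
  def force_utf8(v)
    v.is_a?(String) ? UTF8_IC.iconv("#{v} ")[0..-2] : v
  end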

  ...

and then adding a before_filter which modifies submitted params
strings forcing them to utf-8 before accessing/using those params in
the app.
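
For anyone without Iconv available, here is a rough sketch of the same idea
using only String#encode. This is a swapped-in technique, not Jeff's code: it
relabels the string as UTF-8, then round-trips it through UTF-16LE so that
invalid byte sequences are dropped. The method names to_valid_utf8 and
deep_to_valid_utf8 are made up for illustration, and the params hash is a
plain Ruby Hash standing in for what a before_filter would receive.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: returns a valid UTF-8 copy of a string.
# Non-strings are returned unchanged.
def to_valid_utf8(value)
  return value unless value.is_a?(String)
  value.dup.force_encoding(Encoding::UTF_8)                      # relabel, bytes unchanged
       .encode(Encoding::UTF_16LE, invalid: :replace, replace: '') # drop invalid sequences
       .encode(Encoding::UTF_8)                                  # back to UTF-8
end

# Hypothetical helper: walks nested hashes and arrays, as params usually are.
def deep_to_valid_utf8(obj)
  case obj
  when Hash  then obj.each_with_object({}) { |(k, v), h| h[k] = deep_to_valid_utf8(v) }
  when Array then obj.map { |v| deep_to_valid_utf8(v) }
  else to_valid_utf8(obj)
  end
end

# "ab" + an invalid byte (0xFF) + "c", as raw binary input.
dirty  = [97, 98, 255, 99].pack('C*')
params = { "name" => dirty, "tags" => [dirty, 1], "nested" => { "x" => dirty } }
clean  = deep_to_valid_utf8(params)
puts clean["name"]
```

Note that this silently discards bad bytes, just like //IGNORE does. It does
not guess what encoding the data was really in.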

(Maybe there's a better/simpler way since I last tested this issue?)

Jeff

Perry Smith wrote in post #1020442:

I've been fighting these problems since I moved to Ruby 1.9. I am now
using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The
symptom is errors with this message:

(incompatible character encodings: UTF-8 and ISO-8859-1)

I've marked all my files with:

# -*- coding: utf-8 -*-

All that does is tell ruby that the strings in your source code should
be interpreted as utf-8. Here is an example:

if data == "€" #That's a euro symbol
  ...

By default, Ruby 1.9 treats strings in source code as US-ASCII.
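
A small sketch of what the magic comment does and does not affect. The
variable name is only for illustration. It changes how literals in that one
file are read, and nothing else: data coming from params, files, the database
or other gems keeps whatever encoding it arrived with.

```ruby
# -*- coding: utf-8 -*-
# The magic comment above sets the source encoding for this file only.
euro = "€"

puts __ENCODING__    # the encoding Ruby assumes for literals in this file
puts euro.encoding   # string literals inherit the source encoding
puts euro.bytesize   # the euro sign takes 3 bytes in UTF-8
puts euro.length     # but counts as 1 character
```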

and I've tried doing everything I can to try and get everything to be
UTF-8 but I still have other character sets -- sometimes ISO-8859-1 as
in this case, other times it is ASCII-8BIT. These errors usually happen
in the views as the various parts are being concatenated together.

I need a general approach to these problems.

There is none. The bottom line is you have to know the encoding of any
data you read into your program. You can try to guess at an encoding,
but whether that works is hit and miss.
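
As a sketch of what "knowing the encoding" buys you, assume some bytes arrive
that you know are ISO-8859-1. The byte values and variable names below are
made up for the example. Once the string is labelled correctly it can be
transcoded, and only then is it safe to concatenate with UTF-8 text.

```ruby
# -*- coding: utf-8 -*-
utf8 = "résumé"

# Raw bytes for "café" in ISO-8859-1, as they might arrive from a file or socket.
# pack('C*') returns an ASCII-8BIT (binary) string.
raw = [99, 97, 102, 233].pack('C*')

# force_encoding only relabels the bytes; it does not change them.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

error = nil
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  error = e   # the same kind of error as in the original post
end

# encode transcodes the bytes, so the result can be mixed with UTF-8 strings.
converted = latin1.encode(Encoding::UTF_8)
combined  = utf8 + " " + converted
puts combined
```

If the label is wrong (say the bytes were really Windows-1252), encode will
happily produce the wrong characters, which is why guessing is hit and miss.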

Writing programs that handle anything but ascii data is hard. But the
world isn't an ascii world, so you have to adapt. Nevertheless, ruby's
handling of utf-8 is broken in so many places, it's astonishing.
Hopefully, it will get better. Some people are sticking with ruby 1.8.6
until things get better--but it will never be the case anymore that you
can assume that all data is ascii.

I've been fighting these problems since I moved to Ruby 1.9. I am now
using Ruby 1.9.2 with Rails 2.3.11 and I am still having problems. The
symptom is errors with this message:

(incompatible character encodings: UTF-8 and ISO-8859-1)

I've marked all my files with:

# -*- coding: utf-8 -*-

and I've tried doing everything I can to try and get everything to be
UTF-8 but I still have other character sets -- sometimes ISO-8859-1 as
in this case, other times it is ASCII-8BIT. These errors usually happen
in the views as the various parts are being concatenated together.

I need a general approach to these problems. They are killing me and
there doesn't seem to be a clear way to deal with them.

I think Rails 3 is overall better at dealing with these encoding
problems - I can't say I've run into much in the way of encoding
problems. I believe the mysql2 gem is better at ensuring that the data
that comes out of the database is flagged as utf8.

Fred